RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training

Ziyue Qiao, Yanjie Fu, Pengyang Wang, Meng Xiao, Zhiyuan Ning, Denghui Zhang, Yi Du, Yuanchun Zhou

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

With the growth of academic search engines, the mining and analysis of massive researcher data, such as collaborator recommendation and researcher retrieval, have become indispensable for improving the quality and intelligence of services. However, most existing studies on researcher data mining focus on a single task for a particular application scenario and learn a task-specific model, which is usually unable to transfer to out-of-scope tasks. In this paper, we propose a multi-task self-supervised learning-based researcher data pre-training model named RPT, which can efficiently accomplish multiple researcher data mining tasks. Specifically, we divide researcher data into a semantic document set and a community graph. We design a hierarchical Transformer and a local community encoder to capture information from these two categories of data, respectively. Then, we propose three self-supervised learning objectives to train the whole model. For RPT's main task, we leverage contrastive learning to discriminate whether the two kinds of captured information belong to the same researcher. In addition, two auxiliary tasks, a hierarchical masked language model and community relation prediction for extracting semantic and community information, are integrated to improve pre-training. Finally, we also propose two transfer modes of RPT for fine-tuning in different scenarios. We conduct extensive experiments to evaluate RPT; results on three downstream tasks verify the effectiveness of pre-training for researcher data mining.
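The main pre-training task described above pairs each researcher's semantic (document) embedding with their community embedding and trains the model to tell matched pairs from mismatched ones. A minimal sketch of such an InfoNCE-style contrastive objective is shown below; the function names, the cosine similarity choice, and the temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(doc_embs, com_embs, temperature=0.1):
    """InfoNCE-style loss (illustrative): for researcher i, the semantic
    document embedding doc_embs[i] should be most similar to its own
    community embedding com_embs[i], with the other researchers in the
    batch serving as negatives."""
    n = len(doc_embs)
    loss = 0.0
    for i in range(n):
        sims = [cosine(doc_embs[i], com_embs[j]) / temperature
                for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(sims)
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += -(sims[i] - log_denom)  # -log softmax of the true pair
    return loss / n
```

As a sanity check, a batch where each researcher's two embeddings agree yields a lower loss than one where the pairings are shuffled, which is exactly the discrimination signal the pre-training task exploits.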

Original language: English
Pages (from-to): 186-199
Number of pages: 14
Journal: IEEE Transactions on Big Data
Volume: 9
Issue number: 1
DOIs
State: Published - 1 Feb 2023

Keywords

  • Pre-training
  • contrastive learning
  • graph representation learning
  • transformer
