TY - JOUR
T1 - Multi-Faceted Knowledge-Driven Pre-Training for Product Representation Learning
AU - Zhang, Denghui
AU - Liu, Yanchi
AU - Yuan, Zixuan
AU - Fu, Yanjie
AU - Chen, Haifeng
AU - Xiong, Hui
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - As a key component of e-commerce computing, product representation learning (PRL) benefits a variety of applications, including product matching, search, and categorization. Existing PRL approaches have poor language understanding ability because they fail to capture contextualized semantics. In addition, the representations learned by existing methods are not easily transferable to new products. Inspired by recent advances in pre-trained language models (PLMs), we adapt PLMs for PRL to mitigate the above issues. In this article, we develop KINDLE, a Knowledge-drIven pre-trainiNg framework for proDuct representation LEarning, which preserves contextual semantics and multi-faceted product knowledge robustly and flexibly. Specifically, we first extend traditional one-stage pre-training to a two-stage pre-training framework and exploit a deliberate knowledge encoder to ensure smooth knowledge fusion into the PLM. In addition, we propose a multi-objective heterogeneous embedding method to represent thousands of knowledge elements. This helps KINDLE calibrate knowledge noise and sparsity automatically by replacing isolated classes as training targets in knowledge acquisition tasks. Furthermore, an input-aware gating network is proposed to select the most relevant knowledge for different downstream tasks. Finally, extensive experiments demonstrate the advantages of KINDLE over state-of-the-art baselines across three downstream tasks.
AB - As a key component of e-commerce computing, product representation learning (PRL) benefits a variety of applications, including product matching, search, and categorization. Existing PRL approaches have poor language understanding ability because they fail to capture contextualized semantics. In addition, the representations learned by existing methods are not easily transferable to new products. Inspired by recent advances in pre-trained language models (PLMs), we adapt PLMs for PRL to mitigate the above issues. In this article, we develop KINDLE, a Knowledge-drIven pre-trainiNg framework for proDuct representation LEarning, which preserves contextual semantics and multi-faceted product knowledge robustly and flexibly. Specifically, we first extend traditional one-stage pre-training to a two-stage pre-training framework and exploit a deliberate knowledge encoder to ensure smooth knowledge fusion into the PLM. In addition, we propose a multi-objective heterogeneous embedding method to represent thousands of knowledge elements. This helps KINDLE calibrate knowledge noise and sparsity automatically by replacing isolated classes as training targets in knowledge acquisition tasks. Furthermore, an input-aware gating network is proposed to select the most relevant knowledge for different downstream tasks. Finally, extensive experiments demonstrate the advantages of KINDLE over state-of-the-art baselines across three downstream tasks.
KW - Product representation learning
KW - pre-trained language models
KW - product classification
KW - product matching
KW - product search
UR - http://www.scopus.com/inward/record.url?scp=85137917620&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137917620&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2022.3200921
DO - 10.1109/TKDE.2022.3200921
M3 - Article
AN - SCOPUS:85137917620
SN - 1041-4347
VL - 35
SP - 7239
EP - 7250
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 7
ER -