Structural Contrastive Representation Learning for Zero-shot Multi-label Text Classification

Tianyi Zhang, Zhaozhuo Xu, Tharun Medini, Anshumali Shrivastava

Research output: Contribution to conferencePaperpeer-review

9 Scopus citations

Abstract

Zero-shot multi-label text classification (ZMTC) is a fundamental task in natural language processing with applications in the cold start problem of recommendation systems. Ideally, one would learn an expressive representation of both input text and label features so that ZMTC is transformed into a nearest neighbor search problem. However, the existing representation learning approaches for ZMTC struggle with accuracy as well as poor training efficiency. Firstly, the input text is structural, consisting of both short title sentences and long content paragraphs. It is challenging to model the correlation between short label descriptions and long structural input documents. Secondly, the enormous label space in ZMTC forces the existing approaches to perform multi-stage learning with label engineering. As a result, the training overhead is significant. In this paper, we address both problems by introducing an end-to-end structural contrastive representation learning approach. We propose a randomized text segmentation (RTS) technique to generate high-quality contrastive pairs. This RTS technique allows us to model title-content correlation. Additionally, we simplify the multi-stage ZMTC learning strategy by avoiding label engineering. Extensive experiments demonstrate that our approach leads to up to 2.33% improvement in precision@1 and 5.94× speedup in training time on publicly available datasets. Our code is available publicly.

Original languageEnglish
Pages4966-4976
Number of pages11
StatePublished - 2022
Event2022 Findings of the Association for Computational Linguistics: EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 7 Dec 202211 Dec 2022

Conference

Conference2022 Findings of the Association for Computational Linguistics: EMNLP 2022
Country/TerritoryUnited Arab Emirates
CityAbu Dhabi
Period7/12/2211/12/22

Fingerprint

Dive into the research topics of 'Structural Contrastive Representation Learning for Zero-shot Multi-label Text Classification'. Together they form a unique fingerprint.

Cite this