Paper Title

Label-Wise Document Pre-Training for Multi-Label Text Classification

Authors

Han Liu, Caixia Yuan, Xiaojie Wang

Abstract

A major challenge of multi-label text classification (MLTC) is to simultaneously exploit possible label differences and label correlations. In this paper, we tackle this challenge by developing a Label-Wise Pre-Training (LW-PT) method to obtain a document representation with label-aware information. The basic idea is that a multi-label document can be represented as a combination of multiple label-wise representations, and that correlated labels always co-occur in the same or similar documents. LW-PT implements this idea by constructing label-wise document classification tasks and training label-wise document encoders. Finally, the pre-trained label-wise encoder is fine-tuned on the downstream MLTC task. Extensive experimental results validate that the proposed method has significant advantages over previous state-of-the-art models and is able to discover reasonable label relationships. The code is released to facilitate other researchers.
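The abstract does not say how the label-wise classification tasks are built, so the following is only a minimal sketch of one possible reading: each label gets its own binary task over the same documents (a plain one-vs-rest decomposition, which may differ from the authors' construction), and label co-occurrence is counted per document to illustrate the "correlated labels co-occur" premise. The toy corpus and the function names `label_wise_tasks` and `label_cooccurrence` are hypothetical and do not come from the paper or its released code.

```python
from collections import Counter
from itertools import combinations

# Toy multi-label corpus (hypothetical data, for illustration only).
corpus = [
    ("stocks fall as rates rise", {"finance", "economy"}),
    ("central bank raises rates", {"finance", "economy"}),
    ("team wins the final match", {"sports"}),
    ("club reports record revenue", {"sports", "finance"}),
]


def label_wise_tasks(corpus):
    """Build one binary task per label: (text, 1 if the label applies else 0).

    Assumption: a one-vs-rest decomposition stands in for the paper's
    label-wise document classification tasks.
    """
    labels = sorted(set().union(*(labs for _, labs in corpus)))
    return {
        label: [(text, int(label in labs)) for text, labs in corpus]
        for label in labels
    }


def label_cooccurrence(corpus):
    """Count how often each unordered label pair appears on the same document."""
    counts = Counter()
    for _, labs in corpus:
        # Sort so that each pair is always stored in the same order.
        counts.update(combinations(sorted(labs), 2))
    return counts


tasks = label_wise_tasks(corpus)
cooc = label_cooccurrence(corpus)
```

Here `tasks` maps each label to its own labelled copy of the corpus, which is the kind of input a per-label encoder could be pre-trained on, and `cooc` gives a crude view of which labels tend to appear together.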
