Paper Title

Human in the loop: How to effectively create coherent topics by manually labeling only a few documents per class

Authors

Anton Thielmann, Christoph Weisser, Benjamin Säfken

Abstract

Few-shot methods for accurate modeling under sparse label settings have improved significantly. However, applications of few-shot modeling in natural language processing have so far been confined to document classification. With recent performance improvements, supervised few-shot methods combined with a simple topic extraction method pose a significant challenge to unsupervised topic modeling methods. Our research shows that supervised few-shot learning, combined with a simple topic extraction method, can outperform unsupervised topic modeling techniques in terms of generating coherent topics, even when only a few labeled documents per class are used.
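The abstract does not say which topic extraction method is paired with the few-shot classifier. The following is a minimal sketch under two assumptions. First, the documents already carry class labels, which are hard-coded here in place of a few-shot classifier's predictions. Second, a class-based TF-IDF weighting serves as the extraction step. The sketch turns labeled documents into a short word list per class. The function names, the stop-word list and the exact weighting are illustrative choices, not the paper's method.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def class_tfidf_topics(docs, labels, top_n=3):
    """Return the top_n words per class under a class-based TF-IDF weighting."""
    # Merge all documents of one class into a single bag of words.
    class_counts = {}
    for doc, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(tokenize(doc))

    # Number of classes in which each word occurs at least once.
    n_classes = len(class_counts)
    class_freq = Counter()
    for counts in class_counts.values():
        class_freq.update(counts.keys())

    topics = {}
    for label, counts in class_counts.items():
        total = sum(counts.values())
        # Term frequency within the class times an inverse class frequency.
        scores = {
            word: (count / total) * math.log(1 + n_classes / class_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties broken alphabetically so output is deterministic.
        topics[label] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return topics


# Hypothetical labels standing in for a few-shot classifier's predictions.
docs = [
    "The striker scored a late goal",
    "The goal keeper saved the match",
    "The central bank raised interest rates",
    "Bank profits and interest margins rose",
]
labels = ["sports", "sports", "finance", "finance"]

topics = class_tfidf_topics(docs, labels)
print(topics)
# {'sports': ['goal', 'keeper', 'late'], 'finance': ['bank', 'interest', 'central']}
```

Each class is treated as one concatenated document. Words that are frequent within a class but occur in few other classes therefore rank highest. The resulting word lists are what a topic coherence measure would then score when comparing against unsupervised topic models.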
