Paper Title
CLIP also Understands Text: Prompting CLIP for Phrase Understanding
Paper Authors
Paper Abstract
Contrastive Language-Image Pretraining (CLIP) efficiently learns visual concepts by pre-training with natural language supervision. CLIP and its visual encoder have been explored on various vision and language tasks and achieve strong zero-shot or transfer learning performance. However, the application of its text encoder solely for text understanding has been less explored. In this paper, we find that the text encoder of CLIP actually demonstrates a strong ability for phrase understanding, and can even significantly outperform popular language models such as BERT with a properly designed prompt. Extensive experiments validate the effectiveness of our method across different datasets and domains on entity clustering and entity set expansion tasks.
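To make the idea concrete, here is a minimal sketch of prompting CLIP's text encoder for phrase embeddings and using them for entity clustering. This is not the paper's exact setup: the prompt template ("a photo of a {phrase}."), the `openai/clip-vit-base-patch32` checkpoint, and the use of k-means are all illustrative assumptions; the paper's designed prompt may differ.

```python
# Sketch only: embed phrases with CLIP's text encoder via a hand-written
# prompt template, then cluster the embeddings. Template and checkpoint
# are assumptions, not the paper's exact configuration.
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection
from sklearn.cluster import KMeans

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

phrases = ["golden retriever", "siamese cat", "boeing 747", "airbus a380"]
prompts = [f"a photo of a {p}." for p in phrases]  # hypothetical prompt template

with torch.no_grad():
    inputs = tokenizer(prompts, padding=True, return_tensors="pt")
    # text_embeds: pooled, projected text features (one vector per phrase)
    embeds = model(**inputs).text_embeds
    embeds = embeds / embeds.norm(dim=-1, keepdim=True)  # L2-normalize

# Entity clustering on the normalized phrase embeddings
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeds.numpy())
print(dict(zip(phrases, labels)))  # e.g. animals vs. aircraft
```

Wrapping each phrase in a natural-language prompt, rather than encoding the bare phrase, is the core design choice the abstract highlights: the template moves the input closer to the caption-like text CLIP saw during pre-training.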