Exploring the Impact of Negative Samples of Contrastive Learning: A Case Study of Sentence Embedding

Paper title

Exploring the Impact of Negative Samples of Contrastive Learning: A Case Study of Sentence Embedding

Authors

Rui Cao, Yihao Wang, Yuxin Liang, Ling Gao, Jie Zheng, Jie Ren, Zheng Wang

Abstract

Contrastive learning is emerging as a powerful technique for extracting knowledge from unlabeled data. This technique requires a balanced mixture of two ingredients: positive (similar) and negative (dissimilar) samples. This is typically achieved by maintaining a queue of negative samples during training. Prior work in the area typically uses a fixed-length negative sample queue, but how the negative sample size affects model performance remains unclear. This opaque impact of the number of negative samples on performance motivated our in-depth exploration. This paper presents a momentum contrastive learning model with a negative sample queue for sentence embedding, namely MoCoSE. We add a prediction layer to the online branch to make the model asymmetric, which, together with the EMA update mechanism of the target branch, prevents the model from collapsing. We define a maximum traceable distance metric, through which we learn to what extent text contrastive learning benefits from the historical information of negative samples. Our experiments find that the best results are obtained when the maximum traceable distance is within a certain range, demonstrating that there is an optimal range of historical information for a negative sample queue. We evaluate the proposed unsupervised MoCoSE on the semantic textual similarity (STS) task and obtain an average Spearman's correlation of $77.27\%$. Source code is available at https://github.com/xbdxwyh/mocose.
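
The three mechanisms named in the abstract (an EMA-updated target branch, a fixed-length queue of negative samples, and a contrastive loss over one positive plus the queued negatives) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the momentum of 0.99, the temperature of 0.05, the queue length, and the toy two-dimensional vectors are all assumptions chosen for illustration, and the prediction layer and the maximum traceable distance metric are omitted because the abstract does not define them.

```python
import math
from collections import deque


def ema_update(target, online, momentum=0.99):
    """Move each target weight a small step toward the matching online weight.

    The momentum value is an illustrative assumption.
    """
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss: cross-entropy of picking the positive
    among the positive plus all queued negatives.

    The temperature value is an illustrative assumption.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Fixed-length FIFO queue: once full, the oldest negatives are dropped.
queue = deque(maxlen=4)  # queue length is an illustrative assumption
for step in range(6):
    # Toy stand-ins for target-branch embeddings produced at each step.
    queue.append([float(step), 1.0])

loss = info_nce(query=[1.0, 0.0], positive=[1.0, 0.1], negatives=list(queue))
```

In this sketch the queue length directly controls how far back in training the negatives reach, since older entries were produced by older target-branch weights. That is the kind of historical information whose useful range the abstract says the paper investigates.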
