Paper Title

Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures

Paper Authors

Li, M., Bai, H., Tan, L., Xiong, K., Li, M., Lin, J.

Paper Abstract

Measuring sentence semantic similarity using pre-trained language models such as BERT generally yields unsatisfactory zero-shot performance, and one main reason is ineffective token aggregation methods such as mean pooling. In this paper, we demonstrate under a Bayesian framework that distances between primitive statistics, such as the mean of word embeddings, are fundamentally flawed for capturing sentence-level semantic similarity. To remedy this issue, we propose to learn a categorical variational autoencoder (VAE) on top of off-the-shelf pre-trained language models. We theoretically prove that measuring the distance between the latent categorical mixtures, namely Latte-Mix, better reflects true sentence semantic similarity. In addition, our Bayesian framework explains why models finetuned on labelled sentence pairs have better zero-shot performance. We also empirically demonstrate that these finetuned models can be further improved by Latte-Mix. Our method not only yields state-of-the-art zero-shot performance on semantic similarity datasets such as STS, but also enjoys fast training and a small memory footprint.
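Since only the abstract is reproduced here, the following is a minimal, hypothetical PyTorch sketch of the idea as described: fit a categorical VAE on top of frozen pre-trained token embeddings, represent each sentence by the mixture (average) of its tokens' categorical posteriors, and score similarity by a distance between those mixtures. The layer sizes, the number of latent categories, the Gumbel-softmax training objective, and the total-variation-based similarity below are all illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the Latte-Mix idea as summarized in the abstract.
# All architectural choices here are assumptions for illustration.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class CategoricalVAE(nn.Module):
    def __init__(self, embed_dim: int = 768, num_categories: int = 128):
        super().__init__()
        # q(z|x): maps a token embedding to logits over K latent categories.
        self.encoder = nn.Linear(embed_dim, num_categories)
        # p(x|z): reconstructs the token embedding from a latent sample.
        self.decoder = nn.Linear(num_categories, embed_dim)

    def posterior(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # (num_tokens, embed_dim) -> (num_tokens, K) category probabilities.
        return F.softmax(self.encoder(token_embeds), dim=-1)

    def forward(self, token_embeds: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.encoder(token_embeds)
        # Differentiable samples from the categorical posterior (Gumbel-softmax).
        z = F.gumbel_softmax(logits, tau=tau, hard=False)
        recon = self.decoder(z)
        # Negative ELBO: reconstruction error + KL(q(z|x) || uniform prior).
        recon_loss = F.mse_loss(recon, token_embeds)
        log_q = F.log_softmax(logits, dim=-1)
        q = log_q.exp()
        kl = (q * (log_q + math.log(logits.size(-1)))).sum(dim=-1).mean()
        return recon_loss + kl


def latte_mix(vae: CategoricalVAE, token_embeds: torch.Tensor) -> torch.Tensor:
    # Sentence representation: the mixture (average) of per-token categorical
    # posteriors, itself a distribution over the K latent categories.
    return vae.posterior(token_embeds).mean(dim=0)


def mixture_similarity(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # One simple choice of distance between two categorical mixtures:
    # 1 - total variation distance (the paper may use a different measure).
    return 1.0 - 0.5 * (p - q).abs().sum()
```

A toy usage, with random tensors standing in for frozen BERT token embeddings (in practice these would come from a pre-trained language model):

```python
vae = CategoricalVAE()
sent_a = torch.randn(12, 768)  # 12 tokens
sent_b = torch.randn(9, 768)   # 9 tokens
loss = vae(torch.cat([sent_a, sent_b]))  # one training step's loss
sim = mixture_similarity(latte_mix(vae, sent_a), latte_mix(vae, sent_b))
```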
