Paper Title

Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures

Paper Authors

Li, M., Bai, H., Tan, L., Xiong, K., Li, M., Lin, J.

Paper Abstract

Measuring sentence semantic similarity using pre-trained language models such as BERT generally yields unsatisfactory zero-shot performance, and one main reason is ineffective token aggregation methods such as mean pooling. In this paper, we demonstrate under a Bayesian framework that distances between primitive statistics, such as the mean of word embeddings, are fundamentally flawed for capturing sentence-level semantic similarity. To remedy this issue, we propose to learn a categorical variational autoencoder (VAE) on top of off-the-shelf pre-trained language models. We theoretically prove that measuring the distance between the latent categorical mixtures, namely Latte-Mix, better reflects true sentence semantic similarity. In addition, our Bayesian framework explains why models finetuned on labelled sentence pairs have better zero-shot performance. We also empirically demonstrate that these finetuned models can be further improved by Latte-Mix. Our method not only yields state-of-the-art zero-shot performance on semantic similarity datasets such as STS, but also enjoys fast training and a small memory footprint.
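Since only the abstract is reproduced here, the following is a minimal, hypothetical PyTorch sketch of the idea as described: fit a categorical VAE on top of frozen pre-trained token embeddings, represent each sentence by the mixture (average) of its tokens' categorical posteriors, and score similarity by a distance between those mixtures. The layer sizes, the number of latent categories, the Gumbel-softmax training objective, and the total-variation-based similarity below are all illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the Latte-Mix idea as summarized in the abstract.
# All architectural choices here are assumptions for illustration.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class CategoricalVAE(nn.Module):
    def __init__(self, embed_dim: int = 768, num_categories: int = 128):
        super().__init__()
        # q(z|x): maps a token embedding to logits over K latent categories.
        self.encoder = nn.Linear(embed_dim, num_categories)
        # p(x|z): reconstructs the token embedding from a latent sample.
        self.decoder = nn.Linear(num_categories, embed_dim)

    def posterior(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # (num_tokens, embed_dim) -> (num_tokens, K) category probabilities.
        return F.softmax(self.encoder(token_embeds), dim=-1)

    def forward(self, token_embeds: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.encoder(token_embeds)
        # Differentiable samples from the categorical posterior (Gumbel-softmax).
        z = F.gumbel_softmax(logits, tau=tau, hard=False)
        recon = self.decoder(z)
        # Negative ELBO: reconstruction error + KL(q(z|x) || uniform prior).
        recon_loss = F.mse_loss(recon, token_embeds)
        log_q = F.log_softmax(logits, dim=-1)
        q = log_q.exp()
        kl = (q * (log_q + math.log(logits.size(-1)))).sum(dim=-1).mean()
        return recon_loss + kl


def latte_mix(vae: CategoricalVAE, token_embeds: torch.Tensor) -> torch.Tensor:
    # Sentence representation: the mixture (average) of per-token categorical
    # posteriors, itself a distribution over the K latent categories.
    return vae.posterior(token_embeds).mean(dim=0)


def mixture_similarity(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # One simple choice of distance between two categorical mixtures:
    # 1 - total variation distance (the paper may use a different measure).
    return 1.0 - 0.5 * (p - q).abs().sum()
```

A toy usage, with random tensors standing in for frozen BERT token embeddings (in practice these would come from a pre-trained language model):

```python
vae = CategoricalVAE()
sent_a = torch.randn(12, 768)  # 12 tokens
sent_b = torch.randn(9, 768)   # 9 tokens
loss = vae(torch.cat([sent_a, sent_b]))  # one training step's loss
sim = mixture_similarity(latte_mix(vae, sent_a), latte_mix(vae, sent_b))
```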
