Paper Title

From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective

Paper Authors

Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant

Paper Abstract

Neural retrievers based on dense representations combined with Approximate Nearest Neighbors search have recently received a lot of attention, owing their success to distillation and/or better sampling of examples for training -- while still relying on the same backbone architecture. In the meantime, sparse representation learning fueled by traditional inverted indexing techniques has seen a growing interest, inheriting from desirable IR priors such as explicit lexical matching. While some architectural variants have been proposed, a lesser effort has been put in the training of such models. In this work, we build on SPLADE -- a sparse expansion-based retriever -- and show to which extent it is able to benefit from the same training improvements as dense models, by studying the effect of distillation, hard-negative mining as well as the Pre-trained Language Model initialization. We furthermore study the link between effectiveness and efficiency, on in-domain and zero-shot settings, leading to state-of-the-art results in both scenarios for sufficiently expressive models.
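For readers unfamiliar with SPLADE, the sketch below (not the authors' released code) illustrates the sparse expansion idea the abstract builds on: a masked-language-model head maps each text to a vocabulary-sized vector of non-negative term weights, and relevance is a lexical dot product that is compatible with an inverted index. The checkpoint name, pooling choice, and helper function are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of SPLADE-style sparse expansion scoring (assumed setup,
# using the Hugging Face `transformers` library and a BERT-style MLM checkpoint).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

@torch.no_grad()
def splade_encode(texts):
    """Map texts to |vocab|-dimensional sparse vectors (SPLADE max pooling)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(**batch).logits                 # (batch, seq_len, vocab)
    weights = torch.log1p(torch.relu(logits))      # log-saturation keeps weights positive
    mask = batch["attention_mask"].unsqueeze(-1)   # zero out padding positions
    return torch.max(weights * mask, dim=1).values # (batch, vocab)

q = splade_encode(["what is sparse retrieval"])
d = splade_encode(["SPLADE learns sparse lexical expansions for inverted indexes"])
score = (q * d).sum(dim=-1)                        # lexical dot-product relevance
print(score)
```

The training improvements studied in the paper (distillation, hard-negative mining, and pre-trained language model initialization) leave this scoring function and backbone unchanged; they alter how training examples and supervision signals are constructed.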
