Paper Title

From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective

Paper Authors

Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant

Paper Abstract

Neural retrievers based on dense representations combined with Approximate Nearest Neighbors search have recently received a lot of attention, owing their success to distillation and/or better sampling of examples for training -- while still relying on the same backbone architecture. In the meantime, sparse representation learning fueled by traditional inverted indexing techniques has seen a growing interest, inheriting from desirable IR priors such as explicit lexical matching. While some architectural variants have been proposed, a lesser effort has been put in the training of such models. In this work, we build on SPLADE -- a sparse expansion-based retriever -- and show to which extent it is able to benefit from the same training improvements as dense models, by studying the effect of distillation, hard-negative mining as well as the Pre-trained Language Model initialization. We furthermore study the link between effectiveness and efficiency, on in-domain and zero-shot settings, leading to state-of-the-art results in both scenarios for sufficiently expressive models.
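For readers unfamiliar with SPLADE, the sketch below (not the authors' released code) illustrates the sparse expansion idea the abstract builds on: a masked-language-model head maps each text to a vocabulary-sized vector of non-negative term weights, and relevance is a lexical dot product that is compatible with an inverted index. The checkpoint name, pooling choice, and helper function are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of SPLADE-style sparse expansion scoring (assumed setup,
# using the Hugging Face `transformers` library and a BERT-style MLM checkpoint).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

@torch.no_grad()
def splade_encode(texts):
    """Map texts to |vocab|-dimensional sparse vectors (SPLADE max pooling)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(**batch).logits                 # (batch, seq_len, vocab)
    weights = torch.log1p(torch.relu(logits))      # log-saturation keeps weights positive
    mask = batch["attention_mask"].unsqueeze(-1)   # zero out padding positions
    return torch.max(weights * mask, dim=1).values # (batch, vocab)

q = splade_encode(["what is sparse retrieval"])
d = splade_encode(["SPLADE learns sparse lexical expansions for inverted indexes"])
score = (q * d).sum(dim=-1)                        # lexical dot-product relevance
print(score)
```

The training improvements studied in the paper (distillation, hard-negative mining, and pre-trained language model initialization) leave this scoring function and backbone unchanged; they alter how training examples and supervision signals are constructed.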
