Paper Title
DSI++: Updating Transformer Memory with New Documents
Paper Authors
Paper Abstract
Differentiable Search Indices (DSIs) encode a corpus of documents in model parameters and use the same model to answer user queries directly. Despite the strong performance of DSI models, deploying them in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model. In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents. Across different model scales and document identifier representations, we show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We also hypothesize and verify that the model experiences forgetting events during training, leading to unstable learning. To mitigate these issues, we investigate two approaches. The first focuses on modifying the training dynamics. Flatter minima implicitly alleviate forgetting, so we optimize for flatter loss basins and show that the model stably memorizes more documents ($+12\%$). Next, we introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task. Extensive experiments on novel continual indexing benchmarks based on Natural Questions (NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting significantly. Concretely, it improves the average Hits@10 by $+21.1\%$ over competitive baselines for NQ and requires $6$ times fewer model updates compared to re-training the DSI model for incrementally indexing five corpora in a sequence.
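To make the generative-memory idea from the abstract concrete, here is a minimal Python sketch, not the authors' implementation: it assumes hypothetical `model`, `query_generator`, `train_step`, and `generate` interfaces, and simply interleaves replay examples (pseudo-query → docid) for previously indexed documents with indexing examples (document text → docid) for each newly arriving corpus.

```python
# Minimal sketch (illustrative only) of continual indexing with a generative
# memory: when a new corpus arrives, train on (document -> docid) indexing
# examples for it, interleaved with (pseudo-query -> docid) replay examples
# generated for previously indexed documents, to reduce forgetting of the
# retrieval task. All object names and methods below are assumptions.

import random

def continual_index(model, query_generator, corpora,
                    replay_ratio=0.3, steps_per_corpus=1000):
    """Sequentially index corpora D_1..D_T while replaying pseudo-queries."""
    seen_docs = []  # documents indexed so far, available for replay
    for corpus in corpora:
        for _ in range(steps_per_corpus):
            if seen_docs and random.random() < replay_ratio:
                # Replay: sample an old document, generate a pseudo-query for
                # it, and train on (pseudo-query -> docid).
                doc = random.choice(seen_docs)
                pseudo_query = query_generator.generate(doc["text"])
                batch = {"input": pseudo_query, "target": doc["docid"]}
            else:
                # Indexing: train on (document text -> docid) for the new corpus.
                doc = random.choice(corpus)
                batch = {"input": doc["text"], "target": doc["docid"]}
            model.train_step(batch)  # one gradient update on the DSI model
        seen_docs.extend(corpus)
    return model
```

The `replay_ratio` and `steps_per_corpus` values are placeholders; the key design choice this sketch illustrates is that replay examples are generated on the fly from old documents rather than stored user queries, so no original query data needs to be retained between indexing rounds.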