Paper Title
Multi-Query Video Retrieval
Paper Authors
Paper Abstract
Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years. Despite recent progress, imperfect annotations in existing video retrieval datasets pose significant challenges to model evaluation and development. In this paper, we tackle this issue by focusing on the less-studied setting of multi-query video retrieval, where multiple descriptions are provided to the model for searching over the video archive. We first show that the multi-query retrieval task effectively mitigates the dataset noise introduced by imperfect annotations and correlates better with human judgement when evaluating the retrieval abilities of current models. We then investigate several methods that leverage multiple queries at training time, and demonstrate that multi-query-inspired training leads to superior performance and better generalization. We hope further investigation in this direction can bring new insights into building systems that perform better in real-world video retrieval applications.
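To make the multi-query setting concrete, the sketch below shows one simple way a retrieval system could use several text descriptions of the same target video at inference time: embed each query, score every video against every query, and fuse the per-query similarities before ranking. The embedding model, the mean-pooling fusion rule, and the function name `rank_videos` are illustrative assumptions for this sketch, not the specific method proposed in the paper.

```python
# Minimal sketch of multi-query video retrieval via similarity-score fusion.
# Assumes L2-normalized text and video embeddings from some dual-encoder model;
# the mean-pooling fusion rule is one of several possible choices.
import numpy as np

def rank_videos(query_embs: np.ndarray, video_embs: np.ndarray) -> np.ndarray:
    """Rank videos for a set of queries describing the same target video.

    query_embs: (num_queries, dim) L2-normalized text embeddings.
    video_embs: (num_videos, dim) L2-normalized video embeddings.
    Returns video indices sorted from best to worst match.
    """
    # Cosine similarity of every query against every video: (num_queries, num_videos).
    sims = query_embs @ video_embs.T
    # Fuse per-query scores by averaging, then rank videos by the fused score.
    fused = sims.mean(axis=0)
    return np.argsort(-fused)

# Toy usage with random, normalized embeddings standing in for real encoder outputs.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 512)); q /= np.linalg.norm(q, axis=1, keepdims=True)
v = rng.normal(size=(100, 512)); v /= np.linalg.norm(v, axis=1, keepdims=True)
print(rank_videos(q, v)[:5])  # indices of the top-5 candidate videos
```

Averaging scores is only one fusion strategy; combining queries earlier (e.g., pooling their embeddings) or later (e.g., rank aggregation) are equally plausible baselines in this setting.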