Title
Table Retrieval May Not Necessitate Table-specific Model Design
Authors
Abstract
Tables are an important form of structured data for human and machine readers alike, providing answers to questions that cannot, or cannot easily, be found in texts. Recent work has designed special models and training paradigms for table-related tasks such as table-based question answering and table retrieval. Though effective, they add complexity in both modeling and data acquisition compared to generic text solutions and obscure which elements are truly beneficial. In this work, we focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval, or can a simpler text-based model be effectively used to achieve a similar result?" First, we perform an analysis on a table-based portion of the Natural Questions dataset (NQ-table), and find that structure plays a negligible role in more than 70% of the cases. Based on this, we experiment with a general Dense Passage Retriever (DPR) based on text and a specialized Dense Table Retriever (DTR) that uses table-specific model designs. We find that DPR performs well without any table-specific design and training, and even achieves superior results compared to DTR when fine-tuned on properly linearized tables. We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases. However, none of these yields significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
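To make the compared inputs concrete, here is a minimal pure-Python sketch, not the authors' implementation. It shows (a) flattening a table into one text passage that a text retriever such as DPR could encode, and (b) the structural signals the three modules rely on: per-token row/column ids (which could index auxiliary row/column embeddings), a hard attention mask, and a soft relation-based bias. The function names, the `" | "` / `" ; "` separators, the whitespace tokenization, the rule that title tokens are globally visible, and the hand-set `BIAS` values are all illustrative assumptions. A real model would work on subword tokens of the linearized sequence, learn the bias scalars, and apply the mask as negative infinity on attention logits.

```python
from typing import Dict, List, Tuple

# (token text, row id, column id). Row 0 is the header row; -1 marks title
# tokens, which this sketch (by assumption) treats as globally visible.
Token = Tuple[str, int, int]


def linearize_text(title: str, header: List[str], rows: List[List[str]],
                   cell_sep: str = " | ", row_sep: str = " ; ") -> str:
    """Flatten a table into a plain passage for a text retriever such as DPR.
    The separators are illustrative choices, not the paper's exact format."""
    body = row_sep.join(cell_sep.join(row) for row in [header] + rows)
    return f"{title}{row_sep}{body}"


def tag_tokens(title: str, header: List[str], rows: List[List[str]]) -> List[Token]:
    """Whitespace-tokenize the table row by row and record each token's row and
    column index; these ids could index auxiliary row/column embeddings."""
    tokens: List[Token] = [(w, -1, -1) for w in title.split()]
    for r, row in enumerate([header] + rows):
        for c, cell in enumerate(row):
            tokens.extend((w, r, c) for w in cell.split())
    return tokens


def relation(a: Token, b: Token) -> str:
    """Classify the structural relation between two tagged tokens."""
    _, ra, ca = a
    _, rb, cb = b
    if ra < 0 or rb < 0:
        return "global"
    if ra == rb and ca == cb:
        return "same_cell"
    if ra == rb:
        return "same_row"
    if ca == cb:
        return "same_col"
    return "other"


def hard_attention_mask(tokens: List[Token]) -> List[List[bool]]:
    """True where token i may attend to token j: same row, same column,
    or either token belongs to the title."""
    return [[relation(a, b) != "other" for b in tokens] for a in tokens]


def soft_attention_bias(tokens: List[Token],
                        bias: Dict[str, float]) -> List[List[float]]:
    """Additive attention-logit bias looked up by relation type. In a real
    model these scalars would be learned; here they are supplied by hand."""
    return [[bias[relation(a, b)] for b in tokens] for a in tokens]


# Hypothetical hand-set biases, for illustration only.
BIAS = {"global": 0.0, "same_cell": 1.0, "same_row": 0.5,
        "same_col": 0.5, "other": -1.0}

if __name__ == "__main__":
    title, header = "Olympic hosts", ["Year", "City"]
    rows = [["2008", "Beijing"], ["2012", "London"]]
    print(linearize_text(title, header, rows))
    # Olympic hosts ; Year | City ; 2008 | Beijing ; 2012 | London
    toks = tag_tokens(title, header, rows)
    mask = hard_attention_mask(toks)
    print(mask[4][7])  # "2008" vs "London": different row and column -> False
```

The plain-text route needs only `linearize_text`; the structure-aware variants additionally consume the ids, mask, or bias. That extra machinery is what the abstract reports as yielding no significant improvement.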