Paper Title
Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format
Paper Authors
Abstract
There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and builds wrappers that allow one system to directly read the indexes of another. The second involves sharing indexes across systems via a data exchange specification that we have developed, called the Common Index File Format (CIFF). We demonstrate the first approach with the Java systems Anserini and Terrier, and the second approach with Anserini, JASSv2, OldDog, PISA, and Terrier. Together, these systems provide a wide range of implementations and features, with different research goals. Overall, we recommend CIFF as a low-effort approach to support independent innovation while enabling the types of fair evaluations that are critical for driving the field forward.