Paper Title

ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?

Paper Authors

Philipp Scharpf, Moritz Schubotz, Andre Greiner-Petter, Malte Ostendorff, Olaf Teschke, Bela Gipp

Paper Abstract

The zbMATH database contains more than 4 million bibliographic entries. We aim to provide easy access to these entries. Therefore, we maintain different index structures, including a formula index. To optimize the findability of the entries in our database, we continuously investigate new approaches to satisfy the information needs of our users. We believe that the findings from the ARQMath evaluation will generate new insights into which index structures are most suitable to satisfy mathematical information needs. Search engines, recommender systems, plagiarism checking software, and many other added-value services acting on databases such as the arXiv and zbMATH need to combine natural language and formula language. One initial approach to addressing this challenge is to enrich the mostly unstructured document data via Entity Linking. The ARQMath Task at CLEF 2020 aims to tackle the problem of linking newly posted questions from Math Stack Exchange (MSE) to existing ones that have already been answered by the community. To gain a deep understanding of MSE information needs, answer types, and formula types, we performed manual runs for Tasks 1 and 2. Furthermore, for Task 2 we explored several formula retrieval methods, such as fuzzy string search, k-nearest neighbors, and our recently introduced approach of retrieving Mathematical Objects of Interest (MOI) with textual search queries. The task results show that neither our automated methods nor our manual runs achieved good scores in the competition. However, the perceived quality of the hits returned by the MOI search particularly motivates us to conduct further research on MOI.
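
To make the fuzzy string search baseline mentioned in the abstract more concrete, the following is a minimal sketch in Python, assuming formulas are compared as raw LaTeX strings using the standard library's difflib.SequenceMatcher. The sample corpus, the query, and the helper fuzzy_search are hypothetical illustrations, not the authors' actual implementation or data.

```python
# Minimal sketch of fuzzy string search over LaTeX formula strings
# (illustrative only; not the system used in the ARQMath runs).
from difflib import SequenceMatcher

# Hypothetical formula corpus, stored as raw LaTeX strings.
formula_corpus = [
    r"\int_0^\infty e^{-x^2} \, dx",
    r"\sum_{n=1}^{\infty} \frac{1}{n^2}",
    r"e^{i\pi} + 1 = 0",
]

def fuzzy_search(query, corpus, top_k=3):
    """Rank corpus formulas by character-level similarity to the query."""
    scored = [(SequenceMatcher(None, query, formula).ratio(), formula)
              for formula in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Example query: a slightly different spelling of the Basel-problem sum.
print(fuzzy_search(r"\sum_{n=1}^\infty 1/n^2", formula_corpus))
```

Because the similarity is purely string-based, notational variants of the same formula (e.g. with or without explicit braces) still receive high scores, which is the appeal of this simple baseline; it does not, however, capture structural or semantic equivalence.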
