Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search

Paper Title

Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search

Authors

Reddy, Chandan K., Màrquez, Lluís, Valero, Fran, Rao, Nikhil, Zaragoza, Hugo, Bandyopadhyay, Sambaran, Biswas, Arnab, Xing, Anlu, Subbian, Karthik

Abstract

Improving the quality of search results can significantly enhance users' experience and engagement with search engines. Despite recent advances in machine learning and data mining, correctly classifying items for a particular user search query remains a long-standing challenge with considerable room for improvement. This paper introduces the "Shopping Queries Dataset", a large dataset of difficult Amazon search queries and results, publicly released with the aim of fostering research on improving the quality of search results. The dataset contains around 130 thousand unique queries and 2.6 million manually labeled (query, product) relevance judgements. The dataset is multilingual, with queries in English, Japanese, and Spanish. The Shopping Queries Dataset is being used in one of the KDDCup'22 challenges. In this paper, we describe the dataset and present three evaluation tasks along with baseline results: (i) ranking the results list, (ii) classifying product results into relevance categories, and (iii) identifying substitute products for a given query. We anticipate that this data will become the gold standard for future research on product search.
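
As a rough illustration of how one set of (query, product) judgements could feed the three tasks named in the abstract, the following Python sketch groups judgements per query (task i), keeps the relevance label as a class target (task ii), and derives a binary substitute flag (task iii). The single-letter codes E/S/C/I (Exact, Substitute, Complement, Irrelevant), the sample rows, and the function names are assumptions made for illustration; they are not taken from the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical judgement rows: (query, product_id, label).
# The label codes E/S/C/I are an assumed encoding, not the dataset's schema.
judgements = [
    ("wireless mouse", "P1", "E"),
    ("wireless mouse", "P2", "S"),
    ("wireless mouse", "P3", "I"),
    ("usb c cable", "P4", "C"),
    ("usb c cable", "P5", "E"),
]


def group_by_query(rows):
    """Task (i): collect each query's labeled products into one list to rank."""
    grouped = defaultdict(list)
    for query, product_id, label in rows:
        grouped[query].append((product_id, label))
    return dict(grouped)


def classification_targets(rows):
    """Task (ii): the relevance label itself is the class target."""
    return [(query, product_id, label) for query, product_id, label in rows]


def substitute_targets(rows):
    """Task (iii): binary target, 1 if the product is a substitute, else 0."""
    return [(query, product_id, int(label == "S")) for query, product_id, label in rows]


if __name__ == "__main__":
    for query, items in group_by_query(judgements).items():
        print(query, items)
    print(substitute_targets(judgements))
```

The three views share the same rows, so a baseline for one task can reuse the data preparation of the others.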
