Paper Title
Supervised Contrastive Learning for Product Matching
Paper Authors
Paper Abstract
Contrastive learning has moved the state of the art for many tasks in computer vision and information retrieval in recent years. This poster is the first work that applies supervised contrastive learning to the task of product matching in e-commerce, using product offers from different e-shops. More specifically, we employ a supervised contrastive learning technique to pre-train a Transformer encoder, which is afterwards fine-tuned for the matching task using pair-wise training data. We further propose a source-aware sampling strategy that enables contrastive learning to be applied to use cases in which the training data does not contain product identifiers. We show that applying supervised contrastive pre-training in combination with source-aware sampling significantly improves the state-of-the-art performance on several widely used benchmarks: for Abt-Buy, we reach an F1-score of 94.29 (+3.24 compared to the previous state of the art), and for Amazon-Google, 79.28 (+3.7). For the WDC Computers datasets, we reach improvements of between +0.8 and +8.84 in F1-score depending on the training set size. Further experiments with data augmentation and self-supervised contrastive pre-training show that the former can be helpful for smaller training sets, while the latter leads to a significant decline in performance due to inherent label noise. We thus conclude that contrastive pre-training has high potential for product matching use cases in which explicit supervision is available.
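
To make the pre-training step concrete, below is a minimal PyTorch sketch of a supervised contrastive (SupCon-style, Khosla et al., 2020) loss over a batch of encoder embeddings, in which offers sharing a product label act as positives. The function name, signature, and temperature value are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def supcon_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.07) -> torch.Tensor:
    """SupCon-style loss: offers with the same product label are pulled
    together in embedding space, all other offers are pushed apart."""
    z = F.normalize(embeddings, dim=1)            # project onto the unit sphere
    sim = z @ z.T / temperature                   # scaled pairwise cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))   # exclude self-pairs
    # positives: offers with the same label, excluding the anchor itself
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # row-wise log-softmax
    pos_counts = pos_mask.sum(dim=1)
    # mean log-likelihood of the positives per anchor, averaged over the batch
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts.clamp(min=1)
    return per_anchor[pos_counts > 0].mean()

# Toy usage: four offers describing two products; in practice the embeddings
# would come from the Transformer encoder over serialized offer attributes.
emb = torch.randn(4, 128)
labels = torch.tensor([0, 0, 1, 1])
loss = supcon_loss(emb, labels)

Under the source-aware sampling strategy mentioned in the abstract, explicit product identifiers are unavailable; batches would instead be composed so that offers from the same e-shop, which are assumed not to match each other, never count as positives.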