Paper Title

CompanyName2Vec: Company Entity Matching Based on Job Ads

Authors

Ran Ziv, Ilan Gronau, Michael Fire

Abstract

Entity matching is an essential part of all real-world systems that take in structured and unstructured data from different sources. Typically, no common key is available for connecting records. Massive data cleaning and integration processes must be completed before any data analytics or further processing can be performed. Although record linkage is frequently regarded as a somewhat tedious but necessary step, it reveals valuable insights, supports data visualization, and guides further analytic approaches to the data. Here, we focus on organization entity matching. We introduce CompanyName2Vec, a novel algorithm that solves company entity matching (CEM) by using a neural network model to learn company name semantics from a job ad corpus, without relying on any information about the matched company besides its name. Based on real-world data, we show that CompanyName2Vec outperforms the other evaluated methods and solves the CEM challenge with an average success rate of 89.3%.
