Paper Title

Domain Divergences: a Survey and Empirical Analysis

Authors

Kashyap, Abhinav Ramesh, Hazarika, Devamanyu, Kan, Min-Yen, Zimmermann, Roger

Abstract

Domain divergence plays a significant role in estimating the performance of a model in new domains. While there is significant literature on divergence measures, researchers find it hard to choose an appropriate divergence for a given NLP application. We address this shortcoming by both surveying the literature and through an empirical study. We develop a taxonomy of divergence measures consisting of three classes -- Information-theoretic, Geometric, and Higher-order measures -- and identify the relationships between them. Further, to understand the common use-cases of these measures, we recognise three novel applications -- 1) Data Selection, 2) Learning Representations, and 3) Decisions in the Wild -- and use them to organise our literature. From this, we identify that Information-theoretic measures are prevalent for 1) and 3), and Higher-order measures are more common for 2). To further help researchers choose appropriate measures to predict performance drop -- an important aspect of Decisions in the Wild -- we perform a correlation analysis spanning 130 domain adaptation scenarios, 3 varied NLP tasks, and 12 divergence measures identified from our survey. To calculate these divergences, we consider current contextual word representations (CWR) and contrast them with older distributed representations. We find that traditional measures over word distributions still serve as strong baselines, while higher-order measures with CWR are effective.
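For concreteness, below is a minimal, hypothetical sketch (not code from the paper) of two of the divergence families the taxonomy covers: an information-theoretic measure (Jensen-Shannon divergence over smoothed unigram word distributions) and a geometric measure (cosine distance over pooled representation vectors). The toy corpora and the random vectors standing in for contextual word representations are illustrative assumptions, not the paper's data or setup.

```python
import numpy as np
from collections import Counter

def unigram_distribution(tokens, vocab):
    """Add-one-smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(tokens)
    freqs = np.array([counts[w] for w in vocab], dtype=float) + 1.0
    return freqs / freqs.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence, an information-theoretic measure."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cosine_distance(u, v):
    """Cosine distance, a geometric measure over pooled representations."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy source/target domain corpora (hypothetical stand-ins for real domains).
source = "the movie was great the plot was thrilling".split()
target = "the patient was stable the dosage was adjusted".split()
vocab = sorted(set(source) | set(target))

p = unigram_distribution(source, vocab)
q = unigram_distribution(target, vocab)
print(f"JS divergence over word distributions: {js_divergence(p, q):.4f}")

# Random vectors standing in for mean-pooled contextual word representations.
rng = np.random.default_rng(0)
src_repr, tgt_repr = rng.normal(size=768), rng.normal(size=768)
print(f"Cosine distance over CWR-style vectors: {cosine_distance(src_repr, tgt_repr):.4f}")
```

Higher-order measures (e.g., CORAL or MMD) instead compare second- or higher-order statistics of the two domains' representations rather than single pooled vectors, which is the family the study finds effective with CWR.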
