Paper title
Getting BART to Ride the Idiomatic Train: Learning to Represent Idiomatic Expressions
Authors
Abstract
Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have been a classical challenge for NLP, including for the pre-trained language models that drive today's state of the art. Prior work has identified deficiencies in their contextualized representation, stemming from the underlying compositional paradigm of representation. In this work, we take a first-principles approach to build idiomaticity into BART, using an adapter as a lightweight non-compositional language expert trained on idiomatic sentences. The improved capability over baselines (e.g., BART) is shown via intrinsic and extrinsic methods: idiom embeddings achieve a homogeneity score 0.19 points higher in embedding clustering, and sequence accuracy is up to 25% higher on the idiom processing tasks of IE sense disambiguation and span detection.