Paper Title

Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer

Paper Authors

Benjamin Muller, Deepanshu Gupta, Siddharth Patwardhan, Jean-Philippe Fauconnier, David Vandyke, Sachin Agarwal

Paper Abstract

Multi-lingual language models (LMs), such as mBERT, XLM-R, mT5, and mBART, have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather, only unannotated texts from each language are presented to the model separately and independently of one another, and the model appears to implicitly learn cross-lingual connections. This raises several questions that motivate our study, such as: Are the cross-lingual connections between every language pair equally strong? What properties of source and target language impact the strength of cross-lingual transfer? Can we quantify the impact of those properties on the cross-lingual transfer? In our investigation, we analyze a pre-trained mT5 to discover the attributes of cross-lingual connections learned by the model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret the cross-lingual understanding of the mT5 model. Through these observations, one can favorably choose the best source language for a task, and can anticipate its training data demands. A key finding of this work is that similarity of syntax, morphology and phonology are good predictors of cross-lingual transfer, significantly more so than just the lexical similarity of languages. For a given language, we are able to predict zero-shot performance, which increases on a logarithmic scale with the number of few-shot target language data points.
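
As a rough illustration of the kind of statistical interpretation framework the abstract describes, one could regress transfer performance on a handful of similarity features plus the log of the number of few-shot target-language examples. The feature names, data, and model form below are assumptions for the sketch, not the authors' released code or actual results:

```python
# Hypothetical sketch: modeling cross-lingual transfer performance from
# linguistic-similarity features and log(number of few-shot examples).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy data for 90 source/target language pairs. Feature names mirror the
# abstract (syntactic / morphological / phonological / lexical similarity);
# the values are synthetic placeholders, not measurements from the paper.
n_pairs = 90
X_similarity = rng.uniform(0.0, 1.0, size=(n_pairs, 4))   # syn, morph, phon, lex
n_fewshot = rng.integers(1, 10_000, size=n_pairs)          # target-language examples
X = np.column_stack([X_similarity, np.log(n_fewshot)])     # log-scale data term

# Synthetic "transfer performance" constructed so that syntax, morphology,
# phonology and log(n) matter more than lexical overlap, echoing the key finding.
y = (0.30 * X_similarity[:, 0] + 0.25 * X_similarity[:, 1]
     + 0.20 * X_similarity[:, 2] + 0.05 * X_similarity[:, 3]
     + 0.04 * np.log(n_fewshot) + rng.normal(0.0, 0.02, n_pairs))

model = LinearRegression().fit(X, y)
print("fitted coefficients:", dict(zip(
    ["syntax", "morphology", "phonology", "lexical", "log_n_fewshot"],
    model.coef_.round(3))))
```

Under this toy setup, the fitted coefficients recover the relative importance of each feature; on real task scores, such coefficients would indicate which language properties drive transfer and how performance grows logarithmically with few-shot data.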
