Paper Title


Refinement of Unsupervised Cross-Lingual Word Embeddings

Authors

Biesialska, Magdalena, Costa-jussà, Marta R.

Abstract


Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages by enabling the learning of multilingual word representations even without any direct bilingual signal. The lion's share of these methods are projection-based approaches that map pre-trained embeddings into a shared latent space. These methods mostly rely on an orthogonal transformation, which assumes the language vector spaces to be isomorphic. However, this criterion does not necessarily hold, especially for morphologically rich languages. In this paper, we propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings. The proposed model moves the vectors of words and their corresponding translations closer to each other and enforces length- and center-invariance, thus allowing cross-lingual embeddings to be aligned more accurately. The experimental results demonstrate the effectiveness of our approach, as in most cases it outperforms state-of-the-art methods on a bilingual lexicon induction task.
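The abstract mentions two ingredients common to projection-based alignment pipelines: enforcing length- and center-invariance of the embedding matrices, and an orthogonal map between the two vector spaces. The sketch below illustrates these two steps with toy numpy data; it is an illustrative reconstruction of the standard normalization-plus-orthogonal-Procrustes recipe, not the authors' actual refinement model, and all variable names are invented for the example.

```python
import numpy as np

# Toy "pre-trained" embeddings for two languages (rows = words).
# We assume row i of X translates to row i of Y, as in a seed dictionary.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))  # source-language embeddings
Y = rng.normal(size=(5, 4))  # target-language embeddings

def normalize(E):
    """Enforce length-invariance (unit-length rows) and center-invariance
    (zero-mean dimensions), then re-normalize lengths."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)   # unit length
    E = E - E.mean(axis=0, keepdims=True)              # zero mean
    return E / np.linalg.norm(E, axis=1, keepdims=True)

Xn, Yn = normalize(X), normalize(Y)

# Orthogonal Procrustes: the rotation W minimizing ||Xn @ W - Yn||_F
# has the closed-form solution W = U @ Vt with U, S, Vt = svd(Xn.T @ Yn).
U, _, Vt = np.linalg.svd(Xn.T @ Yn)
W = U @ Vt

# W is orthogonal, so it preserves the lengths enforced above.
mapped = Xn @ W  # source embeddings projected into the target space
```

Because `W` is orthogonal, the mapping preserves inner products and vector lengths; the paper's point is that when the two spaces are not isomorphic (e.g. for morphologically rich languages), such a rigid rotation alone aligns them imperfectly, which motivates a subsequent refinement step.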
