Paper Title

USCORE: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation

Paper Authors

Jonas Belouadi, Steffen Eger

Paper Abstract

The vast majority of evaluation metrics for machine translation are supervised, i.e., (i) are trained on human scores, (ii) assume the existence of reference translations, or (iii) leverage parallel data. This hinders their applicability to cases where such supervision signals are not available. In this work, we develop fully unsupervised evaluation metrics. To do so, we leverage similarities and synergies between evaluation metric induction, parallel corpus mining, and MT systems. In particular, we use an unsupervised evaluation metric to mine pseudo-parallel data, which we use to remap deficient underlying vector spaces (in an iterative manner) and to induce an unsupervised MT system, which then provides pseudo-references as an additional component in the metric. Finally, we also induce unsupervised multilingual sentence embeddings from pseudo-parallel data. We show that our fully unsupervised metrics are effective, i.e., they beat supervised competitors on 4 out of our 5 evaluation datasets. We make our code publicly available.
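The core loop the abstract describes, mining pseudo-parallel pairs with an unsupervised similarity and using them to iteratively remap a deficient vector space, can be sketched in a toy form. This is only an illustration under stated assumptions: it uses mutual nearest neighbours as the mining criterion and orthogonal Procrustes as the remapping step, which are common stand-ins and not necessarily the exact components the paper uses.

```python
import numpy as np

def mine_pairs(src, tgt):
    """Mine pseudo-parallel pairs as mutual nearest neighbours
    under cosine similarity (a stand-in for the unsupervised metric)."""
    s = src / np.linalg.norm(src, axis=1, keepdims=True)
    t = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = s @ t.T
    fwd = sim.argmax(axis=1)   # best target for each source sentence
    bwd = sim.argmax(axis=0)   # best source for each target sentence
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]

def remap(src, tgt, pairs):
    """Orthogonal Procrustes: rotation W minimizing ||src[i] @ W - tgt[j]||
    over the mined pairs (one possible realization of 'remapping')."""
    X = np.stack([src[i] for i, _ in pairs])
    Y = np.stack([tgt[j] for _, j in pairs])
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy data: the "deficient" source space is a rotated copy of the target space.
rng = np.random.default_rng(0)
tgt = rng.normal(size=(50, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
src = tgt @ Q

# Iterate: mine pairs, remap the source space, re-mine on the improved space.
for _ in range(3):
    pairs = mine_pairs(src, tgt)
    W = remap(src, tgt, pairs)
    src = src @ W

print(len(mine_pairs(src, tgt)))  # number of mutual-NN pairs after alignment
```

In the full method this loop additionally feeds the mined pseudo-parallel data into an unsupervised MT system, whose outputs then serve as pseudo-references inside the metric.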
