Paper Title

PreQuEL: Quality Estimation of Machine Translation Outputs in Advance

Authors

Shachar Don-Yehiya, Leshem Choshen, Omri Abend

Abstract

We present the task of PreQuEL, Pre-(Quality-Estimation) Learning. A PreQuEL system predicts how well a given sentence will be translated, without recourse to the actual translation, thus eschewing unnecessary resource allocation when translation quality is bound to be low. PreQuEL can be defined relative to a given MT system (e.g., some industry service) or generally relative to the state-of-the-art. From a theoretical perspective, PreQuEL places the focus on the source text, tracing properties, possibly linguistic features, that make a sentence harder to machine translate. We develop a baseline model for the task and analyze its performance. We also develop a data augmentation method (from parallel corpora), that improves results substantially. We show that this augmentation method can improve the performance of the Quality-Estimation task as well. We investigate the properties of the input text that our model is sensitive to, by testing it on challenge sets and different languages. We conclude that it is aware of syntactic and semantic distinctions, and correlates and even over-emphasizes the importance of standard NLP features.
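To make the task setup concrete, below is a minimal sketch (not the authors' implementation) of a PreQuEL-style baseline: a multilingual encoder fine-tuned as a regressor that maps a source sentence alone, with no access to any translation, to a predicted translation-quality score. The choice of xlm-roberta-base, the toy training pairs, and the quality scores are illustrative assumptions; in the paper's setting, supervision would come from quality-estimation annotations or from the parallel-corpus data augmentation mentioned above.

# Minimal PreQuEL-style sketch (hypothetical setup, for illustration only):
# regress a translation-quality score from the source sentence alone.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "xlm-roberta-base"  # assumed encoder choice, not necessarily the paper's

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=1 turns the classification head into a single-output regression head.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)

# Toy (source sentence, quality score) pairs; real labels would come from QE
# annotations or from scores derived via the parallel-corpus augmentation.
train_pairs = [
    ("The cat sat on the mat.", 0.92),
    ("Colorless green ideas sleep furiously in the archive.", 0.55),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.MSELoss()

model.train()
for source, score in train_pairs:
    batch = tokenizer(source, return_tensors="pt", truncation=True)
    target = torch.tensor([[score]])
    loss = loss_fn(model(**batch).logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Inference: estimate quality for a new source sentence *before* translating it.
model.eval()
with torch.no_grad():
    batch = tokenizer("How hard is this sentence to translate?", return_tensors="pt")
    predicted_quality = model(**batch).logits.item()
print(f"Predicted translation quality: {predicted_quality:.3f}")

Such a score could then gate downstream resource allocation, e.g., routing low-scoring sentences to a stronger MT system or to human translators.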
