Paper Title
An Overview on Machine Translation Evaluation
Paper Authors
Paper Abstract
Since the 1950s, machine translation (MT) has been one of the primary tasks of artificial intelligence, and it has passed through several distinct stages of development, including rule-based methods, statistical methods, and the recently proposed neural-network-based learning methods. Each of these leaps has been accompanied by research and development in MT evaluation, with evaluation methods playing an especially important role in statistical and neural translation research. The task of MT evaluation is not only to assess the quality of machine translation output, but also to give machine translation researchers timely feedback on the problems in MT itself, how to improve it, and how to optimise it. In some practical application settings, such as when no reference translation is available, quality estimation of machine translation plays an important role as an indicator of the credibility of the automatically translated target-language text. This report covers the following: a brief history of machine translation evaluation (MTE), a classification of MTE research methods, and cutting-edge progress, including human evaluation, automatic evaluation, and the evaluation of evaluation methods (meta-evaluation). Both human and automatic evaluation include reference-dependent and reference-free approaches; automatic evaluation methods include traditional n-gram string matching, models that apply syntax and semantics, and deep learning models; the evaluation of evaluation methods includes estimating the credibility of human evaluations, the reliability of automatic evaluations, the reliability of test sets, and so on. Advances in cutting-edge evaluation methods include task-based evaluation, pre-trained language models built on big data, and lightweight models optimised with distillation techniques.
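As a minimal illustration of the traditional n-gram string matching mentioned above, the following sketch computes a clipped n-gram precision in the style of BLEU against a single reference translation. The function name and example sentences are illustrative assumptions, not taken from the paper:

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision of a tokenised candidate translation
    against a single tokenised reference (the core of BLEU-style
    string matching; full BLEU also adds a brevity penalty and
    combines several n-gram orders)."""
    cand_ngrams = Counter(
        tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)
    )
    ref_ngrams = Counter(
        tuple(reference[i:i + n]) for i in range(len(reference) - n + 1)
    )
    if not cand_ngrams:
        return 0.0
    # Clipping: each candidate n-gram is credited at most as many
    # times as it occurs in the reference.
    matched = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return matched / sum(cand_ngrams.values())

# Hypothetical example pair.
cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(ngram_precision(cand, ref, n=2))  # 3 of 5 candidate bigrams match
```

Such surface-level matching is cheap and language-independent, which is why syntactic, semantic, and deep-learning-based metrics were later proposed to capture adequacy beyond exact string overlap.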