Paper Title

Continuous Rating as Reliable Human Evaluation of Simultaneous Speech Translation

Authors

Dávid Javorský, Dominik Macháček, Ondřej Bojar

Abstract


Simultaneous speech translation (SST) can be evaluated on simulated online events where human evaluators watch subtitled videos and continuously express their satisfaction by pressing buttons (so-called Continuous Rating). Continuous Rating is easy to collect, but little is known about its reliability or its relation to SST users' comprehension of a foreign-language document. In this paper, we contrast Continuous Rating with factual questionnaires administered to judges with different levels of source language knowledge. Our results show that Continuous Rating is an easy and reliable SST quality assessment if the judges have at least limited knowledge of the source language. Our study indicates users' preferences regarding subtitle layout and presentation style and, most importantly, provides significant evidence that users with advanced source language knowledge prefer low latency over fewer re-translations.
