Knowledge Based Template Machine Translation In Low-Resource Setting

Paper Title

Knowledge Based Template Machine Translation In Low-Resource Setting

Authors

Zilu Tang, Derry Wijaya

Abstract

Incorporating tagging into neural machine translation (NMT) systems has shown promising results in helping translate rare words such as named entities (NEs). However, translating NEs in low-resource settings remains a challenge. In this work, we investigate the effect of using tags and NE hypernyms from knowledge graphs (KGs) in parallel corpora under different resource conditions. We find that the tag-and-copy mechanism (tag the NEs in the source sentence and copy them to the target sentence) improves translation in high-resource settings only. Introducing copying also has polarizing effects on the translation of different parts of speech (POS). Interestingly, we find that copy accuracy for hypernyms is consistently higher than that for entities. As a way of avoiding "hard" copying and utilizing hypernyms in bootstrapping rare entities, we introduce a "soft" tagging mechanism and find consistent improvement in both high- and low-resource settings.
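
To make the parenthetical description of tag-and-copy concrete, the following is a minimal sketch of a generic placeholder-based variant: entities in the source are replaced with indexed tags, and the tags that survive in the model output are replaced with the original entity strings. It is an illustration only, not the authors' implementation. The function names `tag_entities` and `copy_entities`, the `<NE0>`-style tag format, and the assumption that the entity list is supplied in advance are all hypothetical.

```python
import re


def tag_entities(sentence, entities):
    """Replace each listed entity found in `sentence` with an indexed tag.

    Returns the tagged sentence and a tag -> entity mapping.
    The tag format "<NE{i}>" is an assumption made for this sketch.
    """
    mapping = {}
    tagged = sentence
    # Process longer entities first so that a multi-word entity is not
    # broken up by a shorter entity contained inside it.
    for i, entity in enumerate(sorted(entities, key=len, reverse=True)):
        pattern = r"\b" + re.escape(entity) + r"\b"
        if re.search(pattern, tagged):
            tag = f"<NE{i}>"
            tagged = re.sub(pattern, tag, tagged)
            mapping[tag] = entity
    return tagged, mapping


def copy_entities(translation, mapping):
    """Copy the original entity strings back in place of their tags."""
    for tag, entity in mapping.items():
        translation = translation.replace(tag, entity)
    return translation


# The "translation" here is written by hand to stand in for model output.
tagged, mapping = tag_entities("Derry lives in Boston", ["Boston", "Derry"])
restored = copy_entities("<NE1> vive en <NE0>", mapping)
```

In this sketch the entity is copied verbatim, which corresponds to the "hard" copying the abstract mentions; the "soft" tagging mechanism is not specified in the abstract and is not modeled here.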
