Paper Title
Neural Machine Translation with Joint Representation
Paper Authors
Paper Abstract
Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e.g., alignment, recent Neural Machine Translation (NMT) systems resort to attention, which only partially encodes these interactions, for the sake of efficiency. In this paper, we employ Joint Representation, which fully accounts for each possible interaction. We sidestep the inefficiency issue by refining representations with the proposed efficient attention operation. The resulting Reformer models offer a new Sequence-to-Sequence modelling paradigm besides the Encoder-Decoder framework and outperform the Transformer baseline by about 1 BLEU point on both the small-scale IWSLT14 German-English, English-German, and IWSLT15 Vietnamese-English tasks and the large-scale NIST12 Chinese-English translation task. We also propose a systematic model scaling approach, allowing the Reformer model to beat the state-of-the-art Transformer on IWSLT14 German-English and NIST12 Chinese-English with about 50% fewer parameters. The code is publicly available at https://github.com/lyy1994/reformer.
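To make the idea of a joint representation concrete, below is a minimal PyTorch sketch, not the authors' implementation (see the linked repository for that). It assumes the joint representation is a tensor holding one d_model-dimensional vector per (target position, source position) pair, built here by summing token embeddings, and refines it by attending separately along the source axis and then causally along the target axis. The class name `SeparableAttention` and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SeparableAttention(nn.Module):
    """Illustrative sketch: refine a joint representation by attending over
    one axis at a time, first across source positions, then (causally)
    across target positions. Names and structure are assumptions, not the
    paper's actual architecture."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.src_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tgt_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, joint: torch.Tensor) -> torch.Tensor:
        # joint: (tgt_len, src_len, d_model) for a single sentence pair.
        tgt_len, src_len, _ = joint.shape
        # Source-axis attention: each target row attends over all source positions.
        x, _ = self.src_attn(joint, joint, joint)          # (tgt_len, src_len, d)
        # Target-axis attention: transpose so target is the sequence axis, with a
        # causal mask so position i only sees targets <= i (autoregressive decoding).
        y = x.transpose(0, 1)                              # (src_len, tgt_len, d)
        causal = torch.triu(torch.ones(tgt_len, tgt_len, dtype=torch.bool), 1)
        y, _ = self.tgt_attn(y, y, y, attn_mask=causal)
        return y.transpose(0, 1)                           # (tgt_len, src_len, d)

# Build a toy joint representation: one vector per (target i, source j) pair.
src_len, tgt_len, d_model = 7, 5, 64
src_emb = torch.randn(src_len, d_model)                    # stand-in embeddings
tgt_emb = torch.randn(tgt_len, d_model)
joint = tgt_emb[:, None, :] + src_emb[None, :, :]          # (5, 7, 64)

refined = SeparableAttention(d_model, n_heads=4)(joint)
print(refined.shape)                                       # torch.Size([5, 7, 64])
```

Note that the joint tensor grows with src_len × tgt_len, which is precisely the inefficiency the abstract refers to; the paper's proposed efficient attention operation is what makes refining such a representation practical, and the repository above contains the actual Reformer implementation.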