Gui at MixMT 2022: English-Hinglish: An MT approach for translation of code mixed data

Paper title

Gui at MixMT 2022: English-Hinglish: An MT approach for translation of code mixed data

Authors

Gahoi, Akshat; Duneja, Jayant; Padhi, Anshul; Mangale, Shivam; Rajput, Saransh; Kamble, Tanvi; Sharma, Dipti Misra; Varma, Vasudeva

Abstract

Code-mixed machine translation has become an important task in multilingual communities, and extending machine translation to code-mixed data has become a common task for these languages. In the shared tasks of WMT 2022, we tackle this in both directions: English + Hindi to Hinglish and Hinglish to English. The first task dealt with both Roman and Devanagari script, as we had monolingual data in both English and Hindi, whereas the second task only had data in Roman script. To our knowledge, we achieved one of the top ROUGE-L and WER scores for the first task of Monolingual to Code-Mixed machine translation. In this paper, we discuss in detail the use of mBART with some special pre-processing and post-processing (transliteration from Devanagari to Roman) for the first task, and the experiments that we performed for the second task of translating code-mixed Hinglish to monolingual English.
