PreCogIIITH at HinglishEval: Leveraging Code-Mixing Metrics & Language Model Embeddings To Estimate Code-Mix Quality

Paper Title

PreCogIIITH at HinglishEval: Leveraging Code-Mixing Metrics & Language Model Embeddings To Estimate Code-Mix Quality

Paper Authors

Prashant Kodali, Tanmay Sachan, Akshay Goindani, Anmol Goel, Naman Ahuja, Manish Shrivastava, Ponnurangam Kumaraguru

Abstract

Code-mixing is the phenomenon of mixing two or more languages in a single speech event, and it is prevalent in multilingual societies. Because code-mixed data is low-resource, machine generation of code-mixed text is a common approach to data augmentation. However, evaluating the quality of such machine-generated code-mixed text remains an open problem. In our submission to HinglishEval, a shared task co-located with INLG 2022, we attempt to model the factors that impact the quality of synthetically generated code-mixed text by predicting ratings for code-mix quality.
