Paper Title

JUNLP@SemEval-2020 Task 9: Sentiment Analysis of Hindi-English Code Mixed Data Using Grid Search Cross Validation

Authors

Avishek Garain, Sainik Kumar Mahata, Dipankar Das

Abstract

Code-mixing is a phenomenon that arises mainly in multilingual societies. Multilingual people who are well versed in both their native languages and English tend to code-mix, using English-based phonetic typing and inserting anglicisms into their main language. This linguistic phenomenon poses a great challenge to conventional NLP domains such as Sentiment Analysis, Machine Translation, and Text Summarization, to name a few. In this work, we focus on working out a plausible solution for Code-Mixed Sentiment Analysis. This work was done as part of our participation in the SemEval-2020 Sentimix Task, where we focused on the sentiment analysis of English-Hindi code-mixed sentences. Our username for the submission was "sainik.mahata" and our team name was "JUNLP". We used feature extraction algorithms in conjunction with traditional machine learning techniques such as SVR and Grid Search in an attempt to solve the task. Our approach achieved an F1-score of 66.2% when tested using the metrics prepared by the organizers of the task.
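The abstract does not spell out the pipeline, so the following is only a minimal sketch of what grid search with cross-validation looks like in general, not the authors' implementation. To stay dependency-free, it swaps SVR for a simple k-nearest-neighbour regressor and uses character n-gram counts as the feature extractor. The sentences, labels, function names, and hyperparameter grid are all made up for illustration and are not taken from the task data.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Hypothetical toy data: invented code-mixed sentences with sentiment
# scores (-1 negative, 0 neutral, 1 positive). Not from the SemEval dataset.
DATA = [
    ("yeh movie bahut acchi hai loved it", 1.0),
    ("kya bakwas film hai total waste", -1.0),
    ("mujhe yeh song bahut pasand hai", 1.0),
    ("worst service bilkul bekar hai", -1.0),
    ("kal office jana hai", 0.0),
    ("bahut acchi acting loved the story", 1.0),
    ("bekar phone hai total waste of money", -1.0),
    ("aaj meeting hai office mein", 0.0),
]


def char_ngrams(text, n):
    """Feature extraction: bag of character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[key] for key, v in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, query, n, k):
    """Stand-in regressor (not SVR): mean label of the k most similar texts."""
    q = char_ngrams(query, n)
    ranked = sorted(train, key=lambda item: cosine(char_ngrams(item[0], n), q), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


def cross_val_mse(data, n, k, folds):
    """Mean squared error averaged over interleaved folds."""
    errors = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th item, starting at index f
        train = [item for i, item in enumerate(data) if i % folds != f]
        fold_err = sum((knn_predict(train, text, n, k) - label) ** 2 for text, label in test)
        errors.append(fold_err / len(test))
    return sum(errors) / folds


def grid_search_cv(data, grid, folds=4):
    """Score every hyperparameter combination; return the best and all scores."""
    results = {}
    for n, k in product(grid["n"], grid["k"]):
        results[(n, k)] = cross_val_mse(data, n, k, folds)
    best = min(results, key=results.get)  # lowest cross-validated error wins
    return best, results


best_params, scores = grid_search_cv(DATA, {"n": [2, 3], "k": [1, 3]})
print("best (n, k):", best_params, "CV MSE:", round(scores[best_params], 3))
```

The point of the sketch is the selection loop: each candidate setting is scored only on held-out folds, so the chosen hyperparameters are not tuned to the data they were fitted on. In practice, a library implementation such as scikit-learn's GridSearchCV wrapped around an SVR would replace the hand-written parts.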
