A Novel Sentiment Analysis Engine for Preliminary Depression Status Estimation on Social Media

Paper Title

A Novel Sentiment Analysis Engine for Preliminary Depression Status Estimation on Social Media

Authors

Suman, Sudhir Kumar; Shalu, Hrithwik; Agrawal, Lakshya A; Agrawal, Archit; Kadiwala, Juned

Abstract

Text sentiment analysis is a widely used and feasible method for preliminary depression status estimation of users on social media. However, the immense variety of users accessing social media websites, and their broad mix of vocabularies, makes it difficult for commonly applied deep learning-based classifiers to perform well. In addition, the lack of adaptability of traditional supervised machine learning could hurt at many levels. We propose a cloud-based smartphone application with a deep learning-based backend, primarily to perform depression detection on Twitter. The backend model consists of a RoBERTa-based Siamese sentence classifier that compares a given tweet (Query) with a labeled set of tweets with known sentiment (Standard Corpus). The standard corpus is varied over time with expert opinion so as to improve the model's reliability. A psychologist (with the patient's permission) could leverage the application to assess the patient's depression status prior to counseling, which provides better insight into the patient's mental health status. In addition, the psychologist could be referred to cases with similar characteristics, which could in turn help in more effective treatment. We evaluate our backend model after fine-tuning it on a publicly available dataset. The fine-tuned model is made to predict depression on a large set of tweet samples with random noise factors. The model achieved its best results with a testing accuracy of 87.23% and an AUC of 0.8621.
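
The query-versus-standard-corpus comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The `embed` function is a toy bag-of-words stand-in for the RoBERTa encoder. The aggregation rule (mean cosine similarity per label) is an assumption, since the abstract does not say how the comparisons are combined. The names `embed`, `cosine`, `classify`, and `STANDARD_CORPUS`, and the example tweets, are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in encoder: bag-of-words counts (the paper uses RoBERTa)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query, corpus):
    """Compare the query with every labeled tweet using the same encoder
    on both sides (the Siamese idea), then pick the label whose tweets
    are most similar on average. The averaging rule is an assumption."""
    q = embed(query)
    by_label = {}
    for text, label in corpus:
        by_label.setdefault(label, []).append(cosine(q, embed(text)))
    means = {label: sum(s) / len(s) for label, s in by_label.items()}
    return max(means, key=means.get), means


# Hypothetical standard corpus; in the paper it is revised over time
# with expert opinion, which here amounts to editing this list.
STANDARD_CORPUS = [
    ("i feel hopeless and empty every day", "depressed"),
    ("nothing matters anymore and i cannot sleep", "depressed"),
    ("had a great day at the beach with friends", "not_depressed"),
    ("excited about the new job starting monday", "not_depressed"),
]

label, scores = classify("i feel empty and cannot sleep", STANDARD_CORPUS)
print(label, scores)
```

Because the label comes from comparison against the corpus rather than from fixed output classes, changing the labeled tweets changes the predictions without retraining the encoder, which is one plausible reading of the adaptability the abstract emphasizes.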
