Paper Title
The Moral Foundations Reddit Corpus
Paper Authors
Paper Abstract
Moral framing and sentiment can affect a variety of online and offline behaviors, including donation, pro-environmental action, political engagement, and even participation in violent protests. Various computational methods in Natural Language Processing (NLP) have been used to detect moral sentiment in text, but achieving better performance on such subjective tasks requires large sets of hand-annotated training data. Previous corpora annotated for moral sentiment have proven valuable and have generated new insights both within NLP and across the social sciences, but they have been limited to Twitter. To improve our understanding of the role of moral rhetoric, we present the Moral Foundations Reddit Corpus, a collection of 16,123 Reddit comments curated from 12 distinct subreddits and hand-annotated by at least three trained annotators for 8 categories of moral sentiment (i.e., Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, Implicit/Explicit Morality) based on the updated Moral Foundations Theory (MFT) framework. We provide baseline moral-sentiment classification results for this new corpus using a range of methodologies, e.g., cross-domain classification and knowledge transfer.