Paper Title

"Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses

Authors

Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng

Abstract

Ad hominem attacks are those that target some feature of a person's character instead of the position the person is maintaining. These attacks are harmful because they propagate implicit biases and diminish a person's credibility. Since dialogue systems respond directly to user input, it is important to study ad hominems in dialogue responses. To this end, we propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts. We specifically compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad hominems could further amplify the skew of power away from marginalized populations. Furthermore, we propose a constrained decoding technique that uses salient $n$-gram similarity as a soft constraint for top-$k$ sampling to reduce the amount of ad hominems generated. Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.
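
The abstract does not say how the salient $n$-gram similarity is computed or how it enters the sampling step, so the following is only a toy sketch of the general idea and not the authors' method. It assumes a hypothetical set of "salient" $n$-grams (`salient_ngrams`), a made-up next-token distribution, and one possible soft constraint: each top-$k$ candidate's probability is multiplied by `exp(-alpha * overlap)`, so candidates that would complete a salient $n$-gram become unlikely but are never strictly forbidden. All function names and parameters here are illustrative assumptions.

```python
import math
import random


def top_k(probs, k):
    """Keep the k most probable tokens and renormalize them to sum to 1."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def ngram_overlap(tokens, salient_ngrams, n):
    """Fraction of the n-grams in `tokens` that appear in `salient_ngrams`."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    return sum(g in salient_ngrams for g in grams) / len(grams)


def soft_constrained_step(context, probs, salient_ngrams,
                          k=3, n=2, alpha=5.0, rng=random):
    """One decoding step: top-k sampling with a soft n-gram penalty.

    ASSUMPTION: the exponential penalty is a hypothetical choice made for
    this sketch; the abstract does not specify the form of the constraint.
    """
    candidates = top_k(probs, k)
    tail = context[-(n - 1):] if n > 1 else []
    weights = {}
    for tok, p in candidates.items():
        # Overlap of the n-gram that this candidate would complete.
        overlap = ngram_overlap(tail + [tok], salient_ngrams, n)
        weights[tok] = p * math.exp(-alpha * overlap)
    total = sum(weights.values())
    adjusted = {tok: w / total for tok, w in weights.items()}
    tokens = list(adjusted)
    choice = rng.choices(tokens, weights=[adjusted[t] for t in tokens])[0]
    return choice, adjusted


# Made-up example: the bigram ("are", "stupid") is treated as salient.
context = ["you", "are"]
probs = {"right": 0.3, "stupid": 0.4, "welcome": 0.2, "the": 0.1}
salient = {("are", "stupid")}
token, adjusted = soft_constrained_step(context, probs, salient,
                                        rng=random.Random(0))
```

In this example "stupid" is the most probable raw candidate, but after the penalty its adjusted probability falls below that of "right" and "welcome". Because the constraint is soft, it can still be sampled occasionally; a larger `alpha` makes that rarer.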
