Paper Title

Discovering Differences in the Representation of People using Contextualized Semantic Axes

Authors

Li Lucy, Divya Tadimeti, David Bamman

Abstract

A common paradigm for identifying semantic differences across social and temporal contexts is the use of static word embeddings and their distances. In particular, past work has compared embeddings against "semantic axes" that represent two opposing concepts. We extend this paradigm to BERT embeddings, and construct contextualized axes that mitigate the pitfall where antonyms have neighboring representations. We validate and demonstrate these axes on two people-centric datasets: occupations from Wikipedia, and multi-platform discussions in extremist, men's communities over fourteen years. In both studies, contextualized semantic axes can characterize differences among instances of the same word type. In the latter study, we show that references to women and the contexts around them have become more detestable over time.
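To make the idea of a semantic axis concrete, here is a minimal sketch rather than the authors' implementation. It assumes one common construction: the axis is the difference between the mean embeddings of two opposing sets of pole words, and an embedding is scored by its cosine similarity to that axis. The three-dimensional vectors and the names `pleasant`, `unpleasant`, `occurrence_1`, and `occurrence_2` are invented for illustration. In the contextualized setting the abstract describes, the inputs would instead be BERT embeddings of words in context.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def semantic_axis(pole_a, pole_b):
    """Axis pointing from pole_b toward pole_a: mean(pole_a) - mean(pole_b)."""
    mean_a = mean_vector(pole_a)
    mean_b = mean_vector(pole_b)
    return [a - b for a, b in zip(mean_a, mean_b)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Toy embeddings (assumed values, not real BERT output) for two opposing poles.
pleasant = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
unpleasant = [[-0.7, 0.1, 0.0], [-0.9, 0.3, 0.1]]
axis = semantic_axis(pleasant, unpleasant)

# Two hypothetical occurrences of the same word type in different contexts.
occurrence_1 = [0.6, 0.2, 0.05]
occurrence_2 = [-0.5, 0.2, 0.05]

score_1 = cosine(occurrence_1, axis)  # positive: closer to the "pleasant" pole
score_2 = cosine(occurrence_2, axis)  # negative: closer to the "unpleasant" pole
print(score_1, score_2)
```

Because each occurrence receives its own score, two instances of the same word type can land on opposite sides of the axis, which is the kind of within-word difference the abstract refers to.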
