Paper title
Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation
Paper authors
Paper abstract
Gender bias is largely recognized as a problematic phenomenon affecting language technologies, with recent studies underscoring that it might surface differently across languages. However, most current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions. Such protocols overlook key features of grammatical gender languages, which are characterized by morphosyntactic chains of gender agreement, marked on a variety of lexical items and parts-of-speech (POS). To overcome this limitation, we enrich the natural, gender-sensitive MuST-SHE corpus (Bentivogli et al., 2020) with two new linguistic annotation layers (POS and agreement chains), and explore to what extent different lexical categories and agreement phenomena are impacted by gender skews. Focusing on speech translation, we conduct a multifaceted evaluation on three language directions (English-French/Italian/Spanish), with models trained on varying amounts of data and with different word segmentation techniques. By shedding light on model behaviours, gender bias, and its detection at several levels of granularity, our findings emphasize the value of dedicated analyses beyond aggregated overall results.