Paper Title

Does Debiasing Inevitably Degrade the Model Performance

Paper Authors

Yiran Liu, Xiao Liu, Haotian Chen, Yang Yu

Paper Abstract

Gender bias in language models has attracted considerable attention because it threatens social justice. However, most current debiasing methods degrade the model's performance on other tasks, and the degradation mechanism remains poorly understood. We propose a theoretical framework explaining three candidate mechanisms of a language model's gender bias. We use our theoretical framework to explain why current debiasing methods cause performance degradation. We also discover a pathway through which debiasing does not degrade model performance. We further develop a causality-detection fine-tuning approach to correct gender bias. Numerical experiments demonstrate that our method yields a double dividend: it partially mitigates gender bias while avoiding performance degradation.
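
For context, below is a minimal sketch of one common way to probe the kind of gender bias the abstract discusses: compare the probabilities a masked language model assigns to gendered pronouns in otherwise identical occupation templates. This is a generic illustration assuming the HuggingFace transformers fill-mask pipeline and a hypothetical template; it is not the paper's causality-detection fine-tuning method.

# A minimal sketch of probing gender bias in a masked language model
# by comparing pronoun probabilities in fixed templates. Illustration
# only, NOT the paper's method; model name and template are assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def pronoun_gap(occupation: str) -> float:
    """Return P('he') - P('she') for one occupation template."""
    template = f"The {occupation} said that [MASK] is busy."
    # Restrict scoring to the two target pronouns.
    results = fill_mask(template, targets=["he", "she"])
    scores = {r["token_str"]: r["score"] for r in results}
    return scores.get("he", 0.0) - scores.get("she", 0.0)

for job in ["nurse", "engineer", "teacher"]:
    print(f"{job}: {pronoun_gap(job):+.4f}")  # positive -> skewed toward 'he'

A positive gap suggests the model associates the occupation more strongly with "he"; the paper's concern is that methods shrinking such gaps often also hurt performance on other tasks.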
