Paper Title

On Adversarial Bias and the Robustness of Fair Machine Learning

Authors

Hongyan Chang, Ta Duy Nguyen, Sasi Kumar Murakonda, Ehsan Kazemi, Reza Shokri

Abstract

Optimizing prediction accuracy can come at the expense of fairness. Towards minimizing discrimination against a group, fair machine learning algorithms strive to equalize the behavior of a model across different groups, by imposing a fairness constraint on models. However, we show that giving the same importance to groups of different sizes and distributions, to counteract the effect of bias in training data, can be in conflict with robustness. We analyze data poisoning attacks against group-based fair machine learning, with the focus on equalized odds. An adversary who can control sampling or labeling for a fraction of training data, can reduce the test accuracy significantly beyond what he can achieve on unconstrained models. Adversarial sampling and adversarial labeling attacks can also worsen the model's fairness gap on test data, even though the model satisfies the fairness constraint on training data. We analyze the robustness of fair machine learning through an empirical evaluation of attacks on multiple algorithms and benchmark datasets.
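For reference, equalized odds, the fairness notion the abstract focuses on, requires the classifier's true positive and false positive rates to match across protected groups. A standard formalization (notation is ours, not necessarily the paper's), with prediction \hat{y}, label y, and group attribute a \in \{0, 1\}:

\Pr[\hat{y} = 1 \mid y = v, a = 0] = \Pr[\hat{y} = 1 \mid y = v, a = 1], \quad v \in \{0, 1\}

The fairness gap mentioned above can then be read as the largest violation of these equalities on the data in question, e.g.

\max_{v \in \{0, 1\}} \left| \Pr[\hat{y} = 1 \mid y = v, a = 0] - \Pr[\hat{y} = 1 \mid y = v, a = 1] \right|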
