Paper Title
FAIR: Fair Adversarial Instance Re-weighting
Paper Authors
Paper Abstract
With growing awareness of the societal impact of artificial intelligence, fairness has become an important aspect of machine learning algorithms. The issue is that human biases toward certain population groups, defined by sensitive features such as race and gender, are introduced into the training data through data collection and labeling. Two important directions of fairness-ensuring research have focused on (i) instance weighting, to decrease the impact of more biased instances, and (ii) adversarial training, to construct data representations that are informative of the target variable but uninformative of the sensitive attributes. In this paper we propose a Fair Adversarial Instance Re-weighting (FAIR) method, which uses adversarial training to learn an instance weighting function that ensures fair predictions. By merging the two paradigms, it inherits desirable properties from both: the interpretability of reweighting and the end-to-end trainability of adversarial training. We propose four different variants of the method and, among other things, demonstrate how the method can be cast in a fully probabilistic framework. Additionally, we provide an extensive theoretical analysis of the properties of FAIR models. We compare FAIR models to seven other related and state-of-the-art models and demonstrate that FAIR achieves a better trade-off between accuracy and unfairness. To the best of our knowledge, this is the first model that merges reweighting and adversarial approaches by means of a weighting function that can provide interpretable information about the fairness of individual instances.
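The abstract describes the core mechanism: an instance-weighting function trained adversarially alongside the classifier, so that the weighted predictions stay accurate on the target while revealing little about the sensitive attribute. Below is a minimal, hedged sketch of that idea in PyTorch. The module sizes, the choice of a single adversary acting on the classifier's output, the combined objective `pred_loss - lam * adv_loss`, and the alternating update scheme are all illustrative assumptions, not the paper's exact architecture or training procedure (the paper proposes four variants, including a fully probabilistic one).

```python
# Illustrative sketch of adversarial instance re-weighting (assumptions throughout;
# not the FAIR paper's exact formulation).
import torch
import torch.nn as nn

class AdversarialReweighter(nn.Module):
    def __init__(self, d_in, d_hidden=32):
        super().__init__()
        # predicts the target variable y from features x
        self.classifier = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                        nn.Linear(d_hidden, 1))
        # learned instance-weighting function w(x) in (0, 1): the interpretable part
        self.weighter = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                      nn.Linear(d_hidden, 1), nn.Sigmoid())
        # adversary tries to recover the sensitive attribute s from the prediction
        self.adversary = nn.Sequential(nn.Linear(1, d_hidden), nn.ReLU(),
                                       nn.Linear(d_hidden, 1))

    def forward(self, x):
        logits = self.classifier(x).squeeze(-1)
        w = self.weighter(x).squeeze(-1)
        s_logits = self.adversary(logits.unsqueeze(-1)).squeeze(-1)
        return logits, w, s_logits

bce = nn.BCEWithLogitsLoss(reduction="none")

def losses(model, x, y, s, lam=1.0):
    logits, w, s_logits = model(x)
    pred_loss = (w * bce(logits, y)).mean()   # weighted target loss
    adv_loss = (w * bce(s_logits, s)).mean()  # adversary's loss on the sensitive attribute
    # classifier + weighter minimize pred_loss - lam * adv_loss (i.e. fool the adversary);
    # the adversary separately minimizes adv_loss
    return pred_loss - lam * adv_loss, adv_loss

# Toy alternating updates (a fresh forward pass per step keeps the graphs separate).
model = AdversarialReweighter(d_in=10)
opt_main = torch.optim.Adam(list(model.classifier.parameters()) +
                            list(model.weighter.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(model.adversary.parameters(), lr=1e-3)
x = torch.randn(64, 10)
y = torch.randint(0, 2, (64,)).float()
s = torch.randint(0, 2, (64,)).float()
for _ in range(100):
    main_loss, _ = losses(model, x, y, s)
    opt_main.zero_grad(); main_loss.backward(); opt_main.step()
    _, adv_loss = losses(model, x, y, s)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()
```

In this sketch, reading off w(x) for individual instances is what corresponds to the interpretable per-instance fairness information mentioned in the abstract.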