Paper Title

Towards Fair Classification against Poisoning Attacks

Paper Authors

Han Xu, Xiaorui Liu, Yuxuan Wan, Jiliang Tang

Paper Abstract

Fair classification aims to train classification models that achieve equality (in treatment or prediction quality) among different sensitive groups. However, fair classification can be at risk from poisoning attacks, which deliberately insert malicious training samples to manipulate the trained classifier's performance. In this work, we study the poisoning scenario where the attacker can insert a small fraction of samples into the training data, with arbitrary sensitive attributes as well as other predictive features. We demonstrate that fairly trained classifiers can be highly vulnerable to such poisoning attacks, with a much worse accuracy-fairness trade-off, even when we apply some of the most effective defenses (originally proposed to defend traditional classification tasks). As a countermeasure for fair classification tasks, we propose a general and theoretically guaranteed framework that adapts traditional defense methods to fair classification against poisoning attacks. Through extensive experiments, the results validate that the proposed defense framework obtains better robustness, in terms of both accuracy and fairness, than representative baseline methods.
