Fair Feature Subset Selection using Multiobjective Genetic Algorithm

Paper Title

Fair Feature Subset Selection using Multiobjective Genetic Algorithm

Authors

Rehman, Ayaz Ur; Nadeem, Anas; Malik, Muhammad Zubair

Abstract

The feature subset selection problem aims at selecting a relevant subset of features to improve the performance of a Machine Learning (ML) algorithm on training data. Some features in data can be inherently noisy, costly to compute, improperly scaled, or correlated with other features, and they can adversely affect the accuracy, cost, and complexity of the induced model. The goal of traditional feature selection approaches has been to remove such irrelevant features. In recent years, ML has had a noticeable impact on the decision-making processes of our everyday lives. We want to ensure that these decisions do not reflect biased behavior towards certain groups or individuals based on protected attributes such as age, sex, or race. In this paper, we present a feature subset selection approach that improves both fairness and accuracy objectives and computes Pareto-optimal solutions using the NSGA-II algorithm. We use statistical disparity as a fairness metric and F1-Score as a metric for model performance. Our experiments on the most commonly used fairness benchmark datasets with three different machine learning algorithms show that, using the evolutionary algorithm, we can effectively explore the trade-off between fairness and accuracy.
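The abstract names the two objectives (statistical disparity and F1-Score) and the search strategy (NSGA-II over feature subsets), but not their exact definitions. The sketch below illustrates how candidate feature subsets can be scored on both objectives and ranked by Pareto dominance. It rests on several assumptions. Statistical disparity is taken to be the absolute difference in positive-prediction rates between the two groups of a binary protected attribute. Both objectives are minimized as (disparity, 1 - F1). The ranking step is the fast non-dominated sort that NSGA-II uses. Crowding distance, selection, crossover, mutation, and model training are omitted. The helper names, feature-subset bitmasks, and predictions are hypothetical toy data, not the paper's implementation or results.

```python
from typing import Dict, List, Sequence, Tuple

Objectives = Tuple[float, float]  # (disparity, 1 - F1), both minimized


def statistical_disparity(y_pred: Sequence[int], protected: Sequence[int]) -> float:
    """Absolute gap in positive-prediction rates between groups A=0 and A=1.

    Assumption: this is the statistical (demographic) parity difference;
    `protected` is binary and both groups are non-empty.
    """
    rates = []
    for group in (0, 1):
        preds = [p for p, a in zip(y_pred, protected) if a == group]
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: List[Objectives]) -> List[List[int]]:
    """NSGA-II style fast non-dominated sort; returns fronts as lists of indices."""
    n = len(objs)
    dominated: List[List[int]] = [[] for _ in range(n)]  # solutions that i dominates
    dom_count = [0] * n  # number of solutions that dominate i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                dom_count[i] += 1
        if dom_count[i] == 0:
            fronts[0].append(i)  # nobody dominates i: first (Pareto) front
    k = 0
    while fronts[k]:
        next_front: List[int] = []
        for i in fronts[k]:
            for j in dominated[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical toy data: labels, a binary protected attribute, and the
# predictions of models trained on three made-up feature subsets (bitmasks).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
protected = [0, 0, 0, 0, 1, 1, 1, 1]
candidates: Dict[str, List[int]] = {
    "1101": [1, 0, 1, 1, 0, 0, 0, 0],
    "1011": [1, 1, 1, 0, 1, 0, 1, 0],
    "0110": [1, 1, 1, 0, 1, 1, 0, 0],
}
names = list(candidates)
objs = [(statistical_disparity(p, protected), 1 - f1_score(y_true, p))
        for p in candidates.values()]
fronts = non_dominated_sort(objs)
print([[names[i] for i in front] for front in fronts])  # [['1101', '1011'], ['0110']]
```

In this toy run, "1101" (higher F1, larger disparity) and "1011" (lower disparity, lower F1) share the first front. This is the kind of fairness/accuracy trade-off the abstract describes. "0110" is dominated by "1011". In a full NSGA-II search, each bitmask would be an individual in the population. Its objectives would come from training and evaluating a classifier on the selected columns, and crowding distance would order solutions within a front.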