Demonstrating Rosa: the fairness solution for any Data Analytic pipeline

Paper Title

Demonstrating Rosa: the fairness solution for any Data Analytic pipeline

Authors

Kate Wilkinson, George Cevora

Abstract

Most datasets of interest to the analytics industry are affected by various forms of human bias. The outcomes of Data Analytics [DA] or Machine Learning [ML] on such data are therefore prone to replicating that bias. As a result, a large number of biased decision-making systems based on DA/ML have recently attracted attention. In this paper we introduce Rosa, a free, web-based tool for easily de-biasing datasets with respect to a chosen characteristic. Rosa is based on the principles of Fair Adversarial Networks, developed by illumr Ltd., and can therefore remove interactive, non-linear, and non-binary bias. Rosa is a stand-alone pre-processing step/API, so it can easily be used with any DA/ML pipeline. We test the efficacy of Rosa in removing bias from data-driven decision-making systems by performing standard DA tasks on five real-world datasets, selected for their relevance to current DA problems and their high potential for bias. We use simple ML models to model a characteristic of analytical interest, and compare the level of bias in the model output with and without Rosa as a pre-processing step. We find that in all cases the bias of the data-driven decision-making systems decreases substantially when the data is pre-processed with Rosa.
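
The evaluation protocol described in the abstract (fit a simple model with and without a de-biasing pre-processing step, then compare the bias in its output) can be illustrated with a minimal sketch. Several assumptions apply: the bias metric is demographic parity difference (the abstract does not name the metric used), the toy data are invented for illustration, and the pre-processing step is a hypothetical group-wise mean-centering stand-in. It is not Rosa and not a Fair Adversarial Network, and unlike Rosa it removes only linear, first-order differences between groups.

```python
from statistics import mean


def demographic_parity_difference(preds, groups):
    """Absolute gap in positive-prediction rate between group 1 and group 0."""
    rate = {}
    for g in (0, 1):
        members = [p for p, a in zip(preds, groups) if a == g]
        rate[g] = sum(members) / len(members)
    return abs(rate[1] - rate[0])


def center_by_group(values, groups):
    """Hypothetical stand-in for a de-biasing step (NOT Rosa): subtract each
    group's mean so the feature's mean no longer differs between groups.
    This removes only linear, first-order bias."""
    group_mean = {
        g: mean(v for v, a in zip(values, groups) if a == g) for g in set(groups)
    }
    return [v - group_mean[a] for v, a in zip(values, groups)]


def threshold_model(values, threshold):
    """A deliberately simple 'model': predict 1 when the feature exceeds the threshold."""
    return [1 if v > threshold else 0 for v in values]


# Invented toy data: the feature is shifted upward for group 1.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
feature = [1, 2, 3, 4, 3, 4, 5, 6]

# Without pre-processing: threshold the raw feature at its overall mean.
raw_bias = demographic_parity_difference(
    threshold_model(feature, mean(feature)), groups
)

# With the stand-in pre-processing step applied first.
centered = center_by_group(feature, groups)
debiased_bias = demographic_parity_difference(
    threshold_model(centered, mean(centered)), groups
)

print(raw_bias, debiased_bias)  # 0.5 0.0
```

On this toy data the raw model predicts positives for 25% of group 0 and 75% of group 1, while after centering both groups receive 50%. A real comparison would substitute the actual pre-processing step and a trained model for these placeholders.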
