Paper title
Demonstrating Rosa: the fairness solution for any Data Analytic pipeline
Paper authors
Abstract
Most datasets of interest to the analytics industry are affected by various forms of human bias. The outcomes of Data Analytics [DA] or Machine Learning [ML] on such data are therefore prone to replicating that bias. As a result, a large number of biased decision-making systems based on DA/ML have recently attracted attention. In this paper we introduce Rosa, a free, web-based tool for easily de-biasing datasets with respect to a chosen characteristic. Rosa is based on the principles of Fair Adversarial Networks, developed by illumr Ltd., and can therefore remove interactive, non-linear, and non-binary bias. Rosa is a stand-alone pre-processing step / API, meaning it can be used easily with any DA/ML pipeline. We test the efficacy of Rosa in removing bias from data-driven decision-making systems by performing standard DA tasks on five real-world datasets, selected for their relevance to current DA problems and for their high potential for bias. We use simple ML models to model a characteristic of analytical interest, and compare the level of bias in the model output with and without Rosa as a pre-processing step. We find that in all cases the bias of the data-driven decision-making systems decreases substantially when the data is pre-processed with Rosa.
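The comparison described in the abstract (bias in model output with and without a de-biasing pre-processing step) can be illustrated with a minimal sketch. The abstract does not say which bias measure the paper uses, so the metric below, the gap in positive-prediction rate between groups, is an assumption chosen for illustration. The function name `positive_rate_gap` and the toy predictions are hypothetical, not results from the paper, and the sketch does not call Rosa or reproduce its API.

```python
from collections import defaultdict


def positive_rate_gap(predictions, groups):
    """Return the largest difference in positive-prediction rate between groups.

    `predictions` holds binary model outputs (0 or 1); `groups` holds the value
    of the protected characteristic for each row. Any number of groups is
    allowed, so the characteristic does not have to be binary.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data, invented purely to show the comparison.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
preds_raw = [1, 1, 1, 0, 1, 0, 0, 0]       # model fitted on the raw data
preds_debiased = [1, 1, 0, 0, 1, 1, 0, 0]  # model fitted on de-biased data

gap_raw = positive_rate_gap(preds_raw, groups)
gap_debiased = positive_rate_gap(preds_debiased, groups)
print(f"gap without pre-processing: {gap_raw:.2f}")
print(f"gap with pre-processing:    {gap_debiased:.2f}")
```

A smaller gap after pre-processing would indicate that the model output depends less on the chosen characteristic under this particular measure; other measures may rank the same models differently.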