Paper Title

LUCID: Exposing Algorithmic Bias through Inverse Design

Paper Authors

Carmen Mazijn, Carina Prunkl, Andres Algaba, Jan Danckaert, Vincent Ginis

Paper Abstract

AI systems can create, propagate, support, and automate bias in decision-making processes. To mitigate biased decisions, we need both to understand the origin of the bias and to define what it means for an algorithm to make fair decisions. Most group fairness notions assess a model's equality of outcome by computing statistical metrics on the outputs. We argue that these output metrics encounter intrinsic obstacles and present a complementary approach that aligns with the increasing focus on equality of treatment. By Locating Unfairness through Canonical Inverse Design (LUCID), we generate a canonical set that shows the desired inputs for a model given a preferred output. The canonical set reveals the model's internal logic and exposes potential unethical biases by repeatedly interrogating the decision-making process. We evaluate LUCID on the UCI Adult and COMPAS data sets and find that some biases detected by a canonical set differ from those of output metrics. The results show that by shifting the focus towards equality of treatment and looking into the algorithm's internal workings, the canonical sets are a valuable addition to the toolbox of algorithmic fairness evaluation.
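
The inverse-design idea described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration of gradient-based inverse design, assuming a differentiable tabular classifier: starting from random candidate inputs, it optimizes the inputs (rather than the model weights) towards the preferred output and keeps the optimized inputs as a canonical set. The model architecture, feature layout, and hyperparameters here are placeholder assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch of gradient-based inverse design for a "canonical set".
# Assumptions (hypothetical, not from the paper): a small feed-forward binary
# classifier over 8 tabular features, with column 0 treated as a sensitive feature.
import torch
import torch.nn as nn

torch.manual_seed(0)

n_features = 8    # hypothetical number of tabular features
n_samples = 256   # size of the canonical set

# Stand-in for a trained binary classifier (in practice, load a fitted model).
model = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, 1))
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # freeze weights; only the inputs are optimized

# Start from random candidate inputs and push them towards the preferred
# output (label 1, e.g. "loan approved").
inputs = torch.randn(n_samples, n_features, requires_grad=True)
optimizer = torch.optim.Adam([inputs], lr=0.05)
target = torch.ones(n_samples, 1)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(500):
    optimizer.zero_grad()
    logits = model(inputs)
    loss = loss_fn(logits, target)  # gradient flows into the inputs, not the weights
    loss.backward()
    optimizer.step()

# The optimized inputs form the canonical set: they show which feature values
# the model "wants to see" for the preferred outcome. Inspecting the
# distribution of a sensitive feature (here, the hypothetical column 0) can
# expose unequal treatment in the model's internal logic.
canonical_set = inputs.detach()
print("mean of sensitive feature in canonical set:", canonical_set[:, 0].mean().item())
```

In this sketch, repeatedly interrogating the frozen model with gradient steps on the inputs plays the role of "inverse design": instead of asking how the model scores given data, it asks what data the model would prefer, which is then inspected for bias.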
