Paper Title
Fairness-aware Configuration of Machine Learning Libraries
Paper Authors
Paper Abstract
This paper investigates the role of the hyperparameter space of machine learning (ML) algorithms in aggravating or mitigating fairness bugs. Data-driven software is increasingly applied in socially critical applications where ensuring fairness is of paramount importance. Existing approaches address fairness bugs by either modifying the input dataset or modifying the learning algorithm. The selection of hyperparameters, on the other hand, which provide finer control over ML algorithms, may enable a less intrusive way to influence fairness. Can hyperparameters amplify or suppress discrimination present in the input dataset? How can we help programmers detect, understand, and exploit the role of hyperparameters to improve fairness? We design three search-based software testing algorithms to uncover the precision-fairness frontier of the hyperparameter space. We complement these algorithms with statistical debugging to explain the role of these parameters in improving fairness. We implement the proposed approaches in the tool Parfait-ML (PARameter FAIrness Testing for ML Libraries) and show its effectiveness and utility over five mature ML algorithms as used in six socially critical applications. In these applications, our approach successfully identified hyperparameters that significantly improve fairness (vis-a-vis the state-of-the-art techniques) without sacrificing precision. Surprisingly, for some algorithms (e.g., random forest), our approach showed that certain configurations of hyperparameters (e.g., restricting the search space of attributes) can amplify biases across applications. Upon further investigation, we found intuitive explanations of these phenomena, and the results corroborate similar observations from the literature.
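The abstract does not fix a particular fairness metric, but the precision-fairness frontier it describes presupposes some quantitative group-fairness measure of a trained model's predictions. As a minimal, self-contained sketch (the function name and the toy data are illustrative, not taken from Parfait-ML), the snippet below computes the demographic-parity difference, a common such measure: the gap in positive-prediction rates between two protected groups.

```python
def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups.

    y_pred: iterable of 0/1 model predictions
    group:  iterable of 0/1 protected-attribute values, same length
    A value near 0 indicates group-fair predictions; larger values
    indicate the kind of bias a fairness-aware search would penalize.
    """
    rates = {}
    for g in (0, 1):
        preds = [p for p, a in zip(y_pred, group) if a == g]
        rates[g] = sum(preds) / len(preds) if preds else 0.0
    return abs(rates[0] - rates[1])

# Toy example: group 0 receives a positive outcome 2/3 of the time,
# group 1 only 1/3 of the time.
y_pred = [1, 1, 0, 1, 0, 0]
group  = [0, 0, 0, 1, 1, 1]
print(demographic_parity_difference(y_pred, group))  # ≈ 0.333
```

A hyperparameter search of the kind the paper describes would evaluate a metric like this, together with precision, for each sampled configuration, and keep the configurations that are not dominated on either objective.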