Paper Title

Downstream Fairness Caveats with Synthetic Healthcare Data

Paper Authors

Bhanot, Karan, Baldini, Ioana, Wei, Dennis, Zeng, Jiaming, Bennett, Kristin P.

Abstract

This paper evaluates synthetically generated healthcare data for biases and investigates the effect of fairness mitigation techniques on the utility-fairness trade-off. Privacy laws limit access to health data such as Electronic Medical Records (EMRs) to preserve patient privacy. Albeit essential, these laws hinder research reproducibility. Synthetic data is a viable solution that can enable access to data similar to real healthcare data without the privacy risks. Healthcare datasets may have biases in which certain protected groups experience worse outcomes than others. Because the real data carries these biases, the fairness of synthetically generated health data comes into question. In this paper, we evaluate the fairness of models trained on two healthcare datasets with respect to gender and race biases. We generate synthetic versions of the datasets using a Generative Adversarial Network called HealthGAN, and compare the real and synthetic models' balanced accuracy and fairness scores. We find that synthetic data has different fairness properties compared to real data and that fairness mitigation techniques perform differently on it, highlighting that synthetic data is not bias-free.
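The evaluation the abstract describes can be illustrated with a minimal sketch: train the same classifier on real and on synthetic training data, then compare balanced accuracy and a fairness score on a held-out test set. This is not the authors' code; the toy data generator, the logistic-regression model, and the choice of demographic parity difference as the fairness score are all illustrative assumptions (the paper uses HealthGAN-generated data and its own datasets and metrics).

```python
# Hedged sketch (not the paper's implementation): compare balanced accuracy
# and a simple fairness score for models trained on real vs. synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)

def make_data(n=2000):
    # Toy stand-in for a healthcare dataset with a binary protected attribute
    # (e.g., gender encoded 0/1); real experiments would load EMR-style data.
    group = rng.integers(0, 2, n)
    x = rng.normal(size=(n, 3)) + group[:, None] * 0.5  # group-correlated features
    y = (x.sum(axis=1) + rng.normal(size=n) > 0.75).astype(int)
    return np.column_stack([x, group]), y, group

def demographic_parity_diff(y_pred, group):
    # |P(yhat=1 | group=0) - P(yhat=1 | group=1)|; 0 means parity.
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def evaluate(X_tr, y_tr, X_te, y_te, g_te):
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    return balanced_accuracy_score(y_te, pred), demographic_parity_diff(pred, g_te)

X_real, y_real, _ = make_data()
# Placeholder "generator": perturb the real features. In the paper, the
# synthetic data would instead come from HealthGAN.
X_syn = X_real + rng.normal(scale=0.3, size=X_real.shape)
X_syn[:, -1] = X_real[:, -1]  # keep the protected attribute intact
y_syn = y_real

X_test, y_test, g_test = make_data(1000)
acc_r, dp_r = evaluate(X_real, y_real, X_test, y_test, g_test)
acc_s, dp_s = evaluate(X_syn, y_syn, X_test, y_test, g_test)
print(f"real:      balanced acc={acc_r:.3f}, DP diff={dp_r:.3f}")
print(f"synthetic: balanced acc={acc_s:.3f}, DP diff={dp_s:.3f}")
```

A gap between the two rows, on either metric, is the kind of discrepancy the paper flags: models trained on synthetic data need not inherit the utility or the fairness profile of models trained on the real data.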
