Decoding Demographic un-fairness from Indian Names

Paper Title

Decoding Demographic un-fairness from Indian Names

Authors

Vahini, Medidoddi; Bantupalli, Jalend; Chakraborty, Souvic; Mukherjee, Animesh

Abstract

Demographic classification is essential for fairness assessment in recommender systems and for measuring unintended bias in online networks and voting systems. Important fields like education and politics, which often lay a foundation for the future of equality in society, need scrutiny to design policies that can better foster equality in resource distribution constrained by the unbalanced demographic distribution of people in the country. We collect three publicly available datasets to train state-of-the-art classifiers in the domain of gender and caste classification. We train the models in the Indian context, where the same name can have different styling conventions (Jolly Abraham/Kumar Abhishikta in one state may be written as Abraham Jolly/Abishikta Kumar in the other). We also perform cross-testing (training and testing on different datasets) to understand the efficacy of the above models, and carry out an error analysis of the prediction models. Finally, we attempt to assess the bias in the existing Indian system as case studies and find some intriguing patterns manifesting in the complex demographic layout of the sub-continent across the dimensions of gender and caste.
