Paper Title
Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions
Paper Authors
Paper Abstract
Similarity functions measure how comparable pairs of elements are, and play a key role in a wide variety of applications, e.g., notions of Individual Fairness abiding by the seminal paradigm of Dwork et al., as well as Clustering problems. However, access to an accurate similarity function should not always be considered guaranteed, and this point was even raised by Dwork et al. For instance, it is reasonable to assume that when the elements to be compared are produced by different distributions, or in other words belong to different "demographic" groups, knowledge of their true similarity might be very difficult to obtain. In this work, we present an efficient sampling framework that learns these across-groups similarity functions, using only a limited amount of experts' feedback. We show analytical results with rigorous theoretical bounds, and empirically validate our algorithms via a large suite of experiments.