Paper Title
Distance and Kernel-Based Measures for Global and Local Two-Sample Conditional Distribution Testing
Paper Authors
Paper Abstract
Testing the equality of two conditional distributions is crucial in various modern applications, including transfer learning and causal inference. Despite its importance, this fundamental problem has received surprisingly little attention in the literature. This work aims to present a unified framework based on distance and kernel methods for both global and local two-sample conditional distribution testing. To this end, we introduce distance and kernel-based measures that characterize the homogeneity of two conditional distributions. Drawing from the concept of conditional U-statistics, we propose consistent estimators for these measures. Theoretically, we derive the convergence rates and the asymptotic distributions of the estimators under both the null and alternative hypotheses. Utilizing these measures, along with a local bootstrap approach, we develop global and local tests that can detect discrepancies between two conditional distributions at global and local levels, respectively. Our tests demonstrate reliable performance through simulations and real data analyses.
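To make the idea of a local, distance-based discrepancy concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a scalar covariate X and a scalar response Y, and computes a kernel-weighted energy distance between the two samples' responses near a chosen point x0. The function names, the Gaussian smoothing kernel, and the bandwidth h are all illustrative assumptions. The paper builds its estimators on conditional U-statistics, whereas this sketch uses a simpler plug-in (V-statistic style) form.

```python
import numpy as np

def gaussian_weights(x, x0, h):
    """Normalized Gaussian kernel weights centered at x0 (hypothetical helper)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return w / w.sum()

def local_energy_distance(x1, y1, x2, y2, x0, h):
    """Weighted energy distance between Y | X near x0 in two samples.

    Illustrative plug-in form only; this is NOT the paper's estimator.
    """
    w1 = gaussian_weights(x1, x0, h)
    w2 = gaussian_weights(x2, x0, h)
    # Pairwise absolute differences between and within samples.
    d12 = np.abs(y1[:, None] - y2[None, :])
    d11 = np.abs(y1[:, None] - y1[None, :])
    d22 = np.abs(y2[:, None] - y2[None, :])
    # 2 E|Y1 - Y2| - E|Y1 - Y1'| - E|Y2 - Y2'| under the local weights.
    return 2 * w1 @ d12 @ w2 - w1 @ d11 @ w1 - w2 @ d22 @ w2

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x_a = rng.uniform(-1, 1, 200)
y_a = x_a + rng.normal(size=200)
x_b = rng.uniform(-1, 1, 200)
y_b = x_b + 2.0 + rng.normal(size=200)  # conditional mean shifted by 2

stat_shifted = local_energy_distance(x_a, y_a, x_b, y_b, x0=0.0, h=0.5)
stat_same = local_energy_distance(x_a, y_a, x_a, y_a, x0=0.0, h=0.5)
print(stat_shifted, stat_same)
```

A statistic of this kind is near zero when the two conditional distributions agree around x0 and grows as they diverge. Turning it into a test would additionally require a calibrated null distribution, which the abstract says is obtained with a local bootstrap. That step is not shown here.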