Paper Title

Leveraging Structure for Improved Classification of Grouped Biased Data

Authors

Daniel Zeiberg, Shantanu Jain, Predrag Radivojac

Abstract

We consider semi-supervised binary classification for applications in which data points are naturally grouped (e.g., survey responses grouped by state) and the labeled data is biased (e.g., survey respondents are not representative of the population). The groups overlap in the feature space and consequently the input-output patterns are related across the groups. To model the inherent structure in such data, we assume the partition-projected class-conditional invariance across groups, defined in terms of the group-agnostic feature space. We demonstrate that under this assumption, the group carries additional information about the class, over the group-agnostic features, with provably improved area under the ROC curve. Further assuming invariance of partition-projected class-conditional distributions across both labeled and unlabeled data, we derive a semi-supervised algorithm that explicitly leverages the structure to learn an optimal, group-aware, probability-calibrated classifier, despite the bias in the labeled data. Experiments on synthetic and real data demonstrate the efficacy of our algorithm over suitable baselines and ablative models, spanning standard supervised and semi-supervised learning approaches, with and without incorporating the group directly as a feature.
