Paper Title

PC-GAIN: Pseudo-label Conditional Generative Adversarial Imputation Networks for Incomplete Data

Authors

Yufeng Wang, Dan Li, Xiang Li, Min Yang

Abstract

Datasets with missing values are very common in real-world applications. GAIN, a recently proposed deep generative model for missing data imputation, has been shown to outperform many state-of-the-art methods. However, GAIN only uses a reconstruction loss in the generator to minimize the imputation error on the non-missing part, ignoring the potential category information that can reflect the relationships between samples. In this paper, we propose a novel unsupervised missing data imputation method named PC-GAIN, which utilizes potential category information to further enhance the imputation power. Specifically, we first propose a pre-training procedure to learn the potential category information contained in a subset of low-missing-rate data. An auxiliary classifier is then determined using the synthetic pseudo-labels. This classifier is further incorporated into the generative adversarial framework to help the generator yield higher-quality imputation results. The proposed method significantly improves the imputation quality of GAIN. Experimental results on various benchmark datasets show that our method is also superior to other baseline approaches. Our code is available at https://github.com/WYu-Feng/pc-gain.
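To make the steps named in the abstract more concrete, the following is a minimal NumPy sketch of three ingredients: a reconstruction loss restricted to observed entries, selection of a low-missing-rate subset, and pseudo-label assignment. It is not the authors' implementation. All function names, the `fraction` parameter, and the use of a toy k-means for pseudo-labels are assumptions made for illustration; the abstract does not say how the pseudo-labels are synthesized.

```python
import numpy as np


def masked_reconstruction_loss(x, x_hat, mask):
    """Mean squared error over observed entries only (mask == 1)."""
    observed = mask.sum()
    if observed == 0:
        return 0.0
    return float((mask * (x - x_hat) ** 2).sum() / observed)


def select_low_missing_subset(mask, fraction=0.2):
    """Return indices of the rows with the lowest missing rate.

    `fraction` is a hypothetical parameter: the share of rows to keep.
    """
    missing_rate = 1.0 - mask.mean(axis=1)
    n_keep = max(1, int(round(fraction * mask.shape[0])))
    return np.argsort(missing_rate, kind="stable")[:n_keep]


def assign_pseudo_labels(x_imputed, n_clusters=2, n_iter=20, seed=0):
    """Toy k-means on already-imputed rows (an assumed clustering choice)."""
    x = np.asarray(x_imputed, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from distinct data points as initial cluster centers.
    centers = x[rng.choice(len(x), n_clusters, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels
```

In a full pipeline, the pseudo-labels would supervise an auxiliary classifier whose output adds a term to the generator's objective; that adversarial training loop is omitted here.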
