Title
ExpertNet: A Symbiosis of Classification and Clustering
Authors
Abstract
A widely used paradigm for improving the generalization performance of high-capacity neural models is to add auxiliary unsupervised tasks during supervised training. Tasks such as similarity matching and input reconstruction have been shown to provide a beneficial regularizing effect by guiding representation learning. Real data often have complex underlying structure and may be composed of heterogeneous subpopulations that current approaches do not learn well. In this work, we design ExpertNet, which uses novel training strategies to learn clustered latent representations and leverage them by effectively combining cluster-specific classifiers. We theoretically analyze the effect of clustering on ExpertNet's generalization gap, and empirically show that the clustered latent representations learned by ExpertNet disentangle the intrinsic structure of the data and improve classification performance. ExpertNet also meets an important real-world need for classifiers tailored to distinct subpopulations, as in clinical risk models. We demonstrate the superiority of ExpertNet over state-of-the-art methods on 6 large clinical datasets, where our approach yields valuable insights into group-specific risks.
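The abstract does not spell out how the cluster-specific classifiers are combined, so the following is only a minimal sketch of one plausible reading, not ExpertNet's actual architecture or training strategy. It assumes a latent vector `z` has already been produced by some encoder, that each cluster is represented by a centroid, that soft cluster memberships come from a softmax over negative squared distances (an assumed choice; the paper may use a different kernel), and that each cluster has its own logistic classifier. All names (`soft_assignments`, `cluster_classifier_prob`, `combined_risk`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-cluster classifier: (weight vector, bias) of a logistic model.
Classifier = Tuple[Sequence[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_assignments(z: Sequence[float], centroids: Sequence[Sequence[float]],
                     temperature: float = 1.0) -> List[float]:
    """Assumed soft cluster membership: softmax of negative squared distances."""
    scores = [-sum((zi - ci) ** 2 for zi, ci in zip(z, c)) / temperature
              for c in centroids]
    return softmax(scores)


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cluster_classifier_prob(z: Sequence[float], clf: Classifier) -> float:
    """Probability of the positive class from one cluster's logistic classifier."""
    weights, bias = clf
    return sigmoid(sum(w * zi for w, zi in zip(weights, z)) + bias)


def combined_risk(z: Sequence[float], centroids: Sequence[Sequence[float]],
                  classifiers: Sequence[Classifier], temperature: float = 1.0) -> float:
    """Mixture prediction: each cluster's classifier weighted by its membership."""
    memberships = soft_assignments(z, centroids, temperature)
    return sum(pi * cluster_classifier_prob(z, clf)
               for pi, clf in zip(memberships, classifiers))


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (4.0, 4.0)]
    classifiers = [((0.0, 0.0), 0.0), ((0.0, 0.0), 2.0)]
    # A point at the first centroid is dominated by the first cluster's classifier.
    print(combined_risk((0.0, 0.0), centroids, classifiers))
```

Because memberships sum to one, the combined output stays a valid probability, and a sample lying clearly inside one cluster is scored almost entirely by that cluster's classifier, which is the kind of group-specific behavior the abstract motivates for clinical risk models.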