Paper title

False membership rate control in mixture models

Authors

Ariane Marandon, Tabea Rebafka, Etienne Roquain, Nataliya Sokolovska

Abstract

The clustering task consists in partitioning elements of a sample into homogeneous groups. Most datasets contain individuals that are ambiguous and intrinsically difficult to attribute to one or another cluster. However, in practical applications, misclassifying individuals is potentially disastrous and should be avoided. To keep the misclassification rate small, one can decide to classify only a part of the sample. In the supervised setting, this approach is well known and referred to as classification with an abstention option. In this paper the approach is revisited in an unsupervised mixture model framework and the purpose is to develop a method that comes with the guarantee that the false membership rate (FMR) does not exceed a pre-defined nominal level $α$. A plug-in procedure is proposed, for which a theoretical analysis is provided, by quantifying the FMR deviation with respect to the target level $α$ with explicit remainder terms. Bootstrap versions of the procedure are shown to improve the performance in numerical experiments.
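To make the idea of classifying only part of the sample concrete, here is a minimal sketch of one way a plug-in selection rule could look. It assumes that posterior membership probabilities have already been estimated by some fitted mixture model, and it labels the most confident points for as long as the average estimated misclassification probability stays at or below $α$. The function name `plug_in_selection`, the use of one minus the largest posterior as the estimated error probability, and the averaging rule are assumptions made for illustration; the abstract does not spell out the procedure, so this should not be read as the authors' exact method.

```python
from typing import List, Optional, Sequence


def plug_in_selection(
    posteriors: Sequence[Sequence[float]], alpha: float
) -> List[Optional[int]]:
    """Label as many points as possible with estimated FMR <= alpha.

    posteriors[i][k] is the estimated probability that point i belongs to
    cluster k. Returns one entry per point: a cluster index, or None when
    the rule abstains. (Hypothetical helper, for illustration only.)
    """
    # MAP label and estimated misclassification probability of each point.
    map_labels = [max(range(len(row)), key=row.__getitem__) for row in posteriors]
    risks = [1.0 - max(row) for row in posteriors]

    # Visit points from most confident to least confident.
    order = sorted(range(len(risks)), key=risks.__getitem__)

    # Keep the largest prefix whose average estimated risk is at most alpha.
    n_selected, running_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += risks[i]
        if running_sum / rank <= alpha:
            n_selected = rank

    labels: List[Optional[int]] = [None] * len(risks)
    for i in order[:n_selected]:
        labels[i] = map_labels[i]
    return labels


# Toy posteriors for four points and two clusters (made-up numbers).
example = [[0.99, 0.01], [0.05, 0.95], [0.6, 0.4], [0.5, 0.5]]
print(plug_in_selection(example, alpha=0.1))  # [0, 1, None, None]
```

In this toy input the two ambiguous points are left unlabelled, because including either would push the average estimated error above 0.1. Since the posteriors are themselves estimates, the true FMR of such a rule can deviate from $α$, which is the deviation the abstract says the paper quantifies.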
