Paper Title

Is it all a cluster game? -- Exploring Out-of-Distribution Detection based on Clustering in the Embedding Space

Paper Authors

Sinhamahapatra, Poulami, Koner, Rajat, Roscher, Karsten, Günnemann, Stephan

Paper Abstract

It is essential for safety-critical applications of deep neural networks to determine when new inputs are significantly different from the training distribution. In this paper, we explore this out-of-distribution (OOD) detection problem for image classification using clusters of semantically similar embeddings of the training data and exploit the differences in distance relationships to these clusters between in- and out-of-distribution data. We study the structure and separation of clusters in the embedding space and find that supervised contrastive learning leads to well-separated clusters while its self-supervised counterpart fails to do so. In our extensive analysis of different training methods, clustering strategies, distance metrics, and thresholding approaches, we observe that there is no clear winner. The optimal approach depends on the model architecture and selected datasets for in- and out-of-distribution. While we could reproduce the outstanding results for contrastive training on CIFAR-10 as in-distribution data, we find standard cross-entropy paired with cosine similarity outperforms all contrastive training methods when training on CIFAR-100 instead. Cross-entropy provides competitive results as compared to expensive contrastive training methods.
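The core idea of the abstract — form clusters of semantically similar training embeddings, then score a new input by its distance (e.g. cosine similarity) to the nearest class cluster — can be illustrated with a minimal sketch. This is not the paper's actual pipeline; the function names, the per-class-centroid clustering strategy, and the toy 2-D "embeddings" are simplifying assumptions for illustration only.

```python
import math

def centroid(points):
    """Mean of a list of equal-length embedding vectors (one per class)."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def ood_score(x, centroids):
    """Score an embedding by its relation to the nearest class centroid:
    negative max cosine similarity, so a higher score means more OOD-like."""
    return -max(cosine(x, c) for c in centroids)

# Toy setup: two well-separated in-distribution "clusters" of embeddings.
cluster_a = [[5.0, 0.1], [4.9, -0.1], [5.1, 0.0]]
cluster_b = [[0.1, 5.0], [-0.1, 4.9], [0.0, 5.1]]
centroids = [centroid(cluster_a), centroid(cluster_b)]

score_in = ood_score([5.0, 0.2], centroids)     # close to cluster A
score_out = ood_score([-4.0, -4.0], centroids)  # far from both clusters
```

Thresholding `ood_score` then yields an accept/reject decision; the paper's analysis varies exactly these ingredients (clustering strategy, distance metric, thresholding approach) and finds no single combination wins across architectures and dataset pairs.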
