Paper title
MorphoCluster: Efficient Annotation of Plankton images by Clustering
Paper authors
Paper abstract
In this work, we present MorphoCluster, a software tool for data-driven, fast, and accurate annotation of large image data sets. Marine data is already acquired faster than human experts can annotate it, and its volume and complexity will continue to increase in the coming years. Still, this data requires interpretation. MorphoCluster augments the human ability to discover patterns and perform object classification in large amounts of data by embedding unsupervised clustering in an interactive process. By aggregating similar images into clusters, our novel approach to image annotation increases consistency, multiplies the throughput of an annotator, and allows experts to adapt the granularity of their sorting scheme to the structure in the data. By sorting a set of 1.2M objects into 280 data-driven classes in 71 hours (16k objects per hour), with 90% of these classes having a precision of 0.889 or higher, we show that MorphoCluster is at the same time fast, accurate, and consistent, provides a fine-grained and data-driven classification, and enables novelty detection. MorphoCluster is available as open-source software at https://github.com/morphocluster.