Affinity-VAE: incorporating prior knowledge in representation learning from scientific images

Paper title

Affinity-VAE: incorporating prior knowledge in representation learning from scientific images

Authors

Famili, Marjan; Mirecka, Jola; Smith, Camila Rangel; Kotańska, Anna; Juraschko, Nikolai; Costa-Gomes, Beatriz; Palmer, Colin M.; Thiyagalingam, Jeyan; Burnley, Tom; Basham, Mark; Lowe, Alan R.

Abstract

Learning compact and interpretable representations of data is a critical challenge in scientific image analysis. Here, we introduce Affinity-VAE, a generative model that enables us to impose our scientific intuition about the similarity of instances in the dataset on the learned representation during training. We demonstrate the utility of the approach in the scientific domain of cryo-electron tomography (cryo-ET), where a significant current challenge is to identify similar molecules within a noisy and low-contrast tomographic image volume. This task is distinct from classification in that, at inference time, it is unknown whether an instance is part of the training set or not. We trained Affinity-VAE using prior knowledge of protein structure to inform the latent space. Our model is able to create rotationally invariant, morphologically homogeneous clusters in the latent representation, with improved cluster separation compared to other approaches. It achieves competitive performance on protein classification with the added benefit of disentangling object pose, structural similarity and an interpretable latent representation. In the context of cryo-ET data, Affinity-VAE captures the orientation of identified proteins in 3D, which can be used as a prior for subsequent scientific experiments. Extracting physical principles from a trained network is of significant importance in scientific imaging, where a ground truth training set is not always feasible.
