Paper Title


Multi-Domain Multi-Definition Landmark Localization for Small Datasets

Paper Authors

David Ferman, Gaurav Bharaj

Paper Abstract


We present a novel method for multi-image-domain, multi-landmark-definition learning for facial landmark localization on small datasets. Training a small dataset alongside a large(r) one enables robust learning for the former, and provides a universal mechanism for facial landmark localization on new and/or smaller standard datasets. To this end, we propose a Vision Transformer encoder with a novel decoder that uses a definition-agnostic, shared landmark semantic-group structure as a prior, learnt as we train on more than one dataset concurrently. Thanks to this definition-agnostic group prior, the datasets may vary in both landmark definitions and domains. In the decoder stage we use cross- and self-attention, whose output is then fed into domain/definition-specific heads that minimize a Laplacian log-likelihood loss. We achieve state-of-the-art performance on standard landmark localization datasets such as COFW and WFLW when trained alongside a bigger dataset. We also show state-of-the-art performance on several small datasets from varied image domains: animals, caricatures, and facial portrait paintings. Further, we contribute a small dataset (150 images) of pareidolias to demonstrate the efficacy of our method. Finally, we provide several analyses and ablation studies to justify our claims.
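The abstract states that the definition-specific heads minimize a Laplacian log-likelihood loss. A minimal sketch of a per-coordinate Laplacian negative log-likelihood for landmark regression is shown below; the function name, NumPy formulation, and log-scale parameterization are our illustration of the general technique, not the paper's actual code:

```python
import numpy as np

def laplacian_nll(pred_xy, pred_log_b, target_xy):
    """Per-coordinate Laplacian negative log-likelihood.

    For a Laplacian with location mu and scale b, the NLL of an
    observation y is log(2b) + |y - mu| / b. Predicting log(b)
    (a common choice, assumed here) keeps the scale positive.

    pred_xy, pred_log_b, target_xy: arrays of shape (num_landmarks, 2).
    Returns the mean NLL over all landmarks and coordinates.
    """
    b = np.exp(pred_log_b)
    nll = np.log(2.0 * b) + np.abs(target_xy - pred_xy) / b
    return nll.mean()
```

With a perfect prediction and unit scale (log b = 0), each term reduces to log(2), which is also the mean; larger predicted scales down-weight the absolute error at the cost of the log(2b) penalty, letting the head express per-landmark uncertainty.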
