Paper Title
Robust Latent Representations via Cross-Modal Translation and Alignment
Paper Authors
Paper Abstract
Multi-modal learning relates information across observation modalities of the same physical phenomenon to leverage complementary information. Most multi-modal machine learning methods require that all the modalities used for training are also available at test time. This is a limitation when the signals from some modalities are unavailable or are severely degraded by noise. To address this limitation, we aim to improve the testing performance of uni-modal systems by using multiple modalities during training only. The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment to improve the representations of the weaker modalities. The translation from the weaker to the stronger modality generates a multi-modal intermediate encoding that is representative of both modalities. This encoding is then correlated with the stronger modality's representations in a shared latent space. We validate the proposed approach on the AVEC 2016 dataset for continuous emotion recognition and show that it achieves state-of-the-art uni-modal performance for the weaker modalities.
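To make the described training setup concrete, below is a minimal PyTorch sketch of one way such a framework could be wired together; it is not the authors' implementation. A weak-modality encoder produces a latent code that is (a) decoded into the stronger modality's feature space (cross-modal translation), (b) aligned with the stronger modality's latent code via a correlation objective, and (c) used for the downstream emotion regression. The feature dimensions, network sizes, unweighted loss sum, and the per-dimension Pearson correlation used for alignment are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical feature sizes, e.g. audio (weaker) and video (stronger) descriptors.
WEAK_DIM, STRONG_DIM, LATENT_DIM = 88, 512, 64


class Encoder(nn.Module):
    """Maps one modality's features into the shared latent space."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))

    def forward(self, x):
        return self.net(x)


class Decoder(nn.Module):
    """Translates a latent code into the stronger modality's feature space."""
    def __init__(self, latent_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))

    def forward(self, z):
        return self.net(z)


def correlation_loss(z_weak, z_strong, eps=1e-8):
    """Negative mean per-dimension Pearson correlation between the two batches
    of latent codes; a stand-in for a correlation-based alignment objective."""
    zw = z_weak - z_weak.mean(dim=0, keepdim=True)
    zs = z_strong - z_strong.mean(dim=0, keepdim=True)
    corr = (zw * zs).sum(dim=0) / (zw.norm(dim=0) * zs.norm(dim=0) + eps)
    return -corr.mean()


# Training-time components: both modalities are available during training.
enc_weak = Encoder(WEAK_DIM, LATENT_DIM)
enc_strong = Encoder(STRONG_DIM, LATENT_DIM)
dec_translate = Decoder(LATENT_DIM, STRONG_DIM)   # weak -> strong translation
regressor = nn.Linear(LATENT_DIM, 1)              # e.g. continuous arousal/valence

params = (list(enc_weak.parameters()) + list(enc_strong.parameters())
          + list(dec_translate.parameters()) + list(regressor.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)


def training_step(x_weak, x_strong, y):
    z_weak = enc_weak(x_weak)              # multi-modal intermediate encoding
    z_strong = enc_strong(x_strong)
    x_strong_hat = dec_translate(z_weak)   # cross-modal translation

    loss = (F.mse_loss(x_strong_hat, x_strong)                    # translation
            + correlation_loss(z_weak, z_strong)                  # latent alignment
            + F.mse_loss(regressor(z_weak).squeeze(-1), y))       # emotion regression
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# At test time only the weaker modality is required:
#   y_hat = regressor(enc_weak(x_weak_test))
```

The key design point the sketch tries to capture is that the stronger modality's encoder, the translation decoder, and the alignment loss exist only to shape the weak-modality encoder during training; inference uses the weak branch alone.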