Title
Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings
Authors
Abstract
Learned joint representations of images and text form the backbone of several important cross-domain tasks such as image captioning. Prior work mostly maps both domains into a common latent representation in a purely supervised fashion. This is rather restrictive, however, as the two domains follow distinct generative processes. Therefore, we propose a novel semi-supervised framework, which models shared information between domains and domain-specific information separately. The information shared between the domains is aligned with an invertible neural network. Our model integrates normalizing flow-based priors for the domain-specific information, which allows us to learn diverse many-to-many mappings between the two domains. We demonstrate the effectiveness of our model on diverse tasks, including image captioning and text-to-image synthesis.
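The abstract says that domain-specific information is modeled with normalizing flow-based priors, but it does not describe their architecture. The sketch below illustrates what such a prior computes. It builds one from RealNVP-style affine coupling layers, a common flow design; whether the paper uses exactly this form is an assumption. The class names `AffineCoupling` and `FlowPrior`, the layer count, the hidden size, the feature reversal between layers, and the random untrained weights are all hypothetical. The sketch shows the two operations a flow prior provides. The first is exact log-density via the change-of-variables formula, log p(z) = log N(f(z); 0, I) + log|det ∂f/∂z|. The second is sampling, done by inverting f on Gaussian noise.

```python
import numpy as np


class AffineCoupling:
    """RealNVP-style affine coupling layer (hypothetical, untrained weights).

    The first half of each vector passes through unchanged; the second half
    is scaled and shifted by functions of the first half. The Jacobian is
    triangular, so log|det J| is just the sum of the log-scales.
    """

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2
        out = dim - self.d
        # Small random MLP weights standing in for learned parameters.
        self.W1 = rng.normal(scale=0.1, size=(self.d, hidden))
        self.Ws = rng.normal(scale=0.1, size=(hidden, out))
        self.Wt = rng.normal(scale=0.1, size=(hidden, out))

    def _scale_shift(self, x1):
        h = np.tanh(x1 @ self.W1)
        log_s = np.tanh(h @ self.Ws)  # bounded log-scale for numerical stability
        t = h @ self.Wt
        return log_s, t

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        log_s, t = self._scale_shift(x1)
        y2 = x2 * np.exp(log_s) + t
        return np.concatenate([x1, y2], axis=1), log_s.sum(axis=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        log_s, t = self._scale_shift(y1)  # y1 == x1, so scale/shift are recomputable
        x2 = (y2 - t) * np.exp(-log_s)
        return np.concatenate([y1, x2], axis=1)


class FlowPrior:
    """Density p(z) = N(f(z); 0, I) * |det df/dz| defined by stacked couplings."""

    def __init__(self, dim, n_layers=4):
        self.dim = dim
        self.layers = [AffineCoupling(dim, seed=i) for i in range(n_layers)]

    def to_base(self, z):
        """Map latents z to the Gaussian base space; also return log|det J|."""
        logdet = np.zeros(z.shape[0])
        for layer in self.layers:
            z, ld = layer.forward(z)
            logdet += ld
            z = z[:, ::-1]  # reverse features so every dimension gets transformed
        return z, logdet

    def from_base(self, u):
        """Exact inverse of to_base."""
        for layer in reversed(self.layers):
            u = u[:, ::-1]  # undo the reversal before inverting the layer
            u = layer.inverse(u)
        return u

    def log_prob(self, z):
        u, logdet = self.to_base(z)
        base = -0.5 * np.sum(u ** 2, axis=1) - 0.5 * self.dim * np.log(2 * np.pi)
        return base + logdet

    def sample(self, n, rng):
        # Draw Gaussian noise in base space and push it back through the flow.
        return self.from_base(rng.standard_normal((n, self.dim)))


# Usage: draw several latent codes from the prior and score them.
rng = np.random.default_rng(42)
prior = FlowPrior(dim=4)
z = prior.sample(5, rng)
print(prior.log_prob(z).shape)  # (5,)
```

In a trained model, the coupling weights would be fit by maximizing `log_prob` on encoded domain-specific latents. Drawing several samples for a fixed shared representation is one plausible way such a prior could yield diverse outputs, in the spirit of the many-to-many mappings the abstract describes.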