Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter

Paper Title

Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter

Authors

Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro

Abstract

Conventional CNNs for texture synthesis consist of a sequence of (de)convolution and up/down-sampling layers, where each layer operates locally and lacks the ability to capture the long-term structural dependency required by texture synthesis. Thus, they often simply enlarge the input texture rather than perform reasonable synthesis. As a compromise, many recent methods sacrifice generalizability by training and testing on the same single texture image (or a fixed set of texture images), resulting in huge re-training time costs for unseen images. In this work, based on the discovery that the assembling/stitching operation in traditional texture synthesis is analogous to a transposed convolution operation, we propose a novel way of using the transposed convolution operation. Specifically, we directly treat the whole encoded feature map of the input texture as the transposed convolution filters, and the features' self-similarity map, which captures the auto-correlation information, as the input to the transposed convolution. Such a design allows our framework, once trained, to generalize to unseen textures, synthesizing them with a single forward pass in nearly real time. Our method achieves state-of-the-art texture synthesis quality based on various metrics. While self-similarity helps preserve the input textures' regular structural patterns, our framework can also take random noise maps, instead of self-similarity maps, as transposed convolution inputs for irregular input textures. This yields more diverse results and makes it possible to generate arbitrarily large texture outputs by directly sampling large noise maps in a single pass.
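The core idea in the abstract, using the encoded feature map itself as the transposed convolution filter and its self-similarity map as the input, can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names `self_similarity` and `transposed_conv` are hypothetical, the similarity measure (mean dot product over the overlapping region for each shift) is an assumed stand-in for whatever the paper actually uses, and the toy "feature map" is hand-written rather than produced by an encoder.

```python
def self_similarity(feat):
    """Auto-correlation of a C x H x W feature map over all 2D shifts.

    Entry (dy + H - 1, dx + W - 1) holds the mean dot product between the
    map and a copy of itself shifted by (dy, dx), taken over the overlap.
    ASSUMPTION: this similarity measure is illustrative only.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    sim = [[0.0] * (2 * W - 1) for _ in range(2 * H - 1)]
    for dy in range(-(H - 1), H):
        for dx in range(-(W - 1), W):
            total, count = 0.0, 0
            # Iterate only over positions where both copies overlap.
            for y in range(max(0, dy), min(H, H + dy)):
                for x in range(max(0, dx), min(W, W + dx)):
                    for c in range(C):
                        total += feat[c][y][x] * feat[c][y - dy][x - dx]
                    count += 1
            sim[dy + H - 1][dx + W - 1] = total / count
    return sim


def transposed_conv(inp, filt, stride=1):
    """Single-channel-input transposed convolution.

    Each input value inp[i][j] "stamps" a scaled copy of the whole
    C x H x W filter onto the output at offset (i * stride, j * stride);
    overlapping stamps are summed. This mirrors the assembling/stitching
    analogy described in the abstract.
    """
    C, H, W = len(filt), len(filt[0]), len(filt[0][0])
    h, w = len(inp), len(inp[0])
    out_h = (h - 1) * stride + H
    out_w = (w - 1) * stride + W
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(C)]
    for i in range(h):
        for j in range(w):
            weight = inp[i][j]
            if weight == 0.0:
                continue  # nothing to stamp
            for c in range(C):
                for u in range(H):
                    for v in range(W):
                        out[c][i * stride + u][j * stride + v] += weight * filt[c][u][v]
    return out


# Toy example: a 1-channel, 2 x 2 "encoded feature map" (hypothetical values).
features = [[[1.0, 2.0], [3.0, 4.0]]]
sim_map = self_similarity(features)                # 3 x 3 self-similarity map
synthesized = transposed_conv(sim_map, features)   # larger output feature map
```

In this sketch the output is larger than the input feature map because every entry of the similarity map places a weighted copy of the full feature map. Replacing `sim_map` with a larger random noise map would, under the same assumptions, correspond to the noise-input variant mentioned at the end of the abstract.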
