Paper Title

DTVNet+: A High-Resolution Scenic Dataset for Dynamic Time-lapse Video Generation

Authors

Zhang, Jiangning; Xu, Chao; Liu, Yong; Jiang, Yunliang

Abstract

This paper presents a novel end-to-end dynamic time-lapse video generation framework, named DTVNet, to generate diversified time-lapse videos from a single landscape image conditioned on normalized motion vectors. The proposed DTVNet consists of two submodules: an Optical Flow Encoder (OFE) and a Dynamic Video Generator (DVG). The OFE maps a sequence of optical flow maps to a normalized motion vector that encodes the motion information of the generated video. The DVG contains motion and content streams that learn from the motion vector and the single landscape image; it also contains an encoder that learns shared content features and a decoder that constructs video frames with the corresponding motion. Specifically, the motion stream introduces multiple adaptive instance normalization (AdaIN) layers to integrate multi-level motion information and control object motion. In the testing stage, videos with the same content but different motion can be generated from a single input image by varying the normalized motion vector. In addition, we propose a high-resolution scenic time-lapse video dataset, named Quick-Sky-Time, to evaluate different approaches; it can be viewed as a new benchmark for high-quality scenic image and video generation tasks. We further conduct experiments on the Sky Time-lapse, Beach, and Quick-Sky-Time datasets. The results demonstrate the superiority of our approach over state-of-the-art methods in generating high-quality and diverse dynamic videos.
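Since the abstract only outlines the architecture, the following is a minimal PyTorch sketch of the two mechanisms it describes: an optical-flow encoder that compresses a stack of flow maps into a normalized motion vector, and an AdaIN layer that injects that vector into content features. All module names, layer widths, and tensor shapes below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class AdaIN(nn.Module):
    """Adaptive instance normalization: modulates content features with
    per-channel scale/shift predicted from a motion vector (illustrative sketch)."""

    def __init__(self, motion_dim: int, num_channels: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        # Predict per-channel (gamma, beta) from the normalized motion vector.
        self.affine = nn.Linear(motion_dim, num_channels * 2)

    def forward(self, content: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        # content: (B, C, H, W), motion: (B, motion_dim)
        gamma, beta = self.affine(motion).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(content) + beta


class OpticalFlowEncoder(nn.Module):
    """Maps a stack of optical-flow maps (B, T*2, H, W) to a normalized motion vector."""

    def __init__(self, in_channels: int, motion_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, motion_dim),
        )

    def forward(self, flows: torch.Tensor) -> torch.Tensor:
        z = self.net(flows)
        # L2-normalize so different motions live on a unit hypersphere.
        return nn.functional.normalize(z, dim=1)


if __name__ == "__main__":
    flows = torch.randn(2, 32 * 2, 64, 64)    # 32 flow maps with 2 channels each (assumed shape)
    image_feat = torch.randn(2, 256, 16, 16)  # shared content features from an image encoder (assumed)
    ofe = OpticalFlowEncoder(in_channels=64)
    adain = AdaIN(motion_dim=128, num_channels=256)
    motion = ofe(flows)                       # (2, 128) normalized motion vector
    modulated = adain(image_feat, motion)     # motion-conditioned content features
    print(motion.shape, modulated.shape)
```

Under this reading, sampling or interpolating different normalized motion vectors while reusing the same content features corresponds to the abstract's claim that one input image can yield videos with identical content but varied motion.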
