Pretrained Diffusion Models for Unified Human Motion Synthesis

Paper title

Pretrained Diffusion Models for Unified Human Motion Synthesis

Authors

Jianxin Ma, Shuai Bai, Chang Zhou

Abstract

Generative modeling of human motion has broad applications in computer animation, virtual reality, and robotics. Conventional approaches develop separate models for different motion synthesis tasks, and typically use a model of a small size to avoid overfitting the scarce data available in each setting. It remains an open question whether developing a single unified model is feasible, which may 1) benefit the acquirement of novel skills by combining skills learned from multiple tasks, and 2) help in increasing the model capacity without overfitting by combining multiple data sources. Unification is challenging because 1) it involves diverse control signals as well as targets of varying granularity, and 2) motion datasets may use different skeletons and default poses. In this paper, we present MoFusion, a framework for unified motion synthesis. MoFusion employs a Transformer backbone to ease the inclusion of diverse control signals via cross attention, and pretrains the backbone as a diffusion model to support multi-granularity synthesis ranging from motion completion of a body part to whole-body motion generation. It uses a learnable adapter to accommodate the differences between the default skeletons used by the pretraining and the fine-tuning data. Empirical results show that pretraining is vital for scaling the model size without overfitting, and demonstrate MoFusion's potential in various tasks, e.g., text-to-motion, motion completion, and zero-shot mixing of multiple control signals. Project page: https://ofa-sys.github.io/MoFusion/
