Paper Title
Facial Landmark Predictions with Applications to Metaverse
Paper Authors
Paper Abstract
This research aims to make metaverse characters more realistic by adding lip animations learnt from in-the-wild videos. To achieve this, we extend the Tacotron 2 text-to-speech synthesizer to generate lip movements together with the mel spectrogram in a single pass. The encoder and gate layer weights are pre-trained on the LJ Speech 1.1 dataset, while the decoder is retrained on 93 clips of TED talk videos extracted from the LRS3 dataset. Our novel decoder predicts the displacement of 20 lip landmark positions across time, using labels automatically extracted by the OpenFace 2.0 landmark predictor. Training converged in 7 hours using less than 5 minutes of video. We conducted an ablation study on the Pre/Post-Net and the pre-trained encoder weights to demonstrate the effectiveness of transfer learning between audio and visual speech data.
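As an illustration of the displacement targets mentioned in the abstract, the sketch below converts per-frame lip landmark positions into displacements and back. It is not the authors' implementation. It assumes 2D (x, y) coordinates and frame-to-frame displacement; the abstract does not say whether displacement is measured against the previous frame or a fixed reference frame. The names `to_displacements` and `to_positions` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # assumed 2D (x, y) landmark coordinate
Frame = List[Point]           # one video frame: a list of lip landmarks

NUM_LIP_LANDMARKS = 20        # number of lip landmarks stated in the abstract


def to_displacements(frames: List[Frame]) -> List[Frame]:
    """Turn absolute positions into frame-to-frame displacements.

    Assumption: displacement means the change relative to the previous frame.
    The result has one fewer entry than the input.
    """
    steps = []
    for prev, curr in zip(frames, frames[1:]):
        steps.append([(cx - px, cy - py)
                      for (px, py), (cx, cy) in zip(prev, curr)])
    return steps


def to_positions(first: Frame, displacements: List[Frame]) -> List[Frame]:
    """Rebuild absolute positions by accumulating displacements from `first`."""
    frames = [list(first)]
    for step in displacements:
        prev = frames[-1]
        frames.append([(px + dx, py + dy)
                       for (px, py), (dx, dy) in zip(prev, step)])
    return frames
```

Under this assumption a predicted displacement sequence only needs one starting frame of landmarks to be turned back into an animation, which is one plausible reason to predict displacements rather than absolute positions.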