Paper Title
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse
Paper Authors
Paper Abstract
Metaverse expands the physical world into a new dimension, in which the physical environment and the Metaverse environment can be directly connected and entered. Voice is an indispensable communication medium in both the real world and Metaverse, and fusing the voice with environment effects is important for user immersion in Metaverse. In this paper, we propose a voice conversion based method for converting speech to a target environment effect. The proposed method, named MetaSpeech, introduces an environment effect module that contains an effect extractor to extract the environment information and an effect encoder to encode the environment effect condition. A gradient reversal layer is used for adversarial training to preserve the speech content and speaker information while disentangling the environment effects. Experimental results on the public LJSpeech dataset with four environment effects show that the proposed model can complete the specified environment effect conversion and outperforms baseline methods from the voice conversion task.
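To make the gradient reversal idea in the abstract concrete, the following is a minimal sketch of a generic gradient reversal layer in plain Python. It is not the authors' implementation: the class name `GradientReversal`, the scaling factor `lam`, and the example values are all hypothetical, and the abstract does not say where in MetaSpeech the layer is placed. The sketch only shows the defining behavior: the forward pass is the identity, while the backward pass negates (and optionally scales) the incoming gradient, so that an upstream encoder is pushed to remove whatever a downstream classifier is trying to predict.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Generic gradient reversal layer (illustrative, framework-free)."""

    # Hypothetical reversal strength; not a value taken from the paper.
    lam: float = 1.0

    def forward(self, x: List[float]) -> List[float]:
        # Forward pass: features pass through unchanged.
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        # Backward pass: flip the sign of the gradient and scale it by lam.
        return [-self.lam * g for g in grad_output]


# Assumed example values, chosen only for illustration.
grl = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]
out = grl.forward(features)               # identical to the input features
grad_in = grl.backward([1.0, 2.0, -4.0])  # sign-flipped, scaled gradients
```

In a deep learning framework the same behavior would normally be written as a custom autograd function, so that the reversed gradient flows automatically into the preceding layers during adversarial training.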