Paper Title

SEANet: A Multi-modal Speech Enhancement Network

Authors

Marco Tagliasacchi, Yunpeng Li, Karolis Misiunas, Dominik Roblek

Abstract

We explore the possibility of leveraging accelerometer data to perform speech enhancement in very noisy conditions. Although it is possible to only partially reconstruct the user's speech from the accelerometer, the latter provides a strong conditioning signal that is not influenced by noise sources in the environment. Based on this observation, we feed a multi-modal input to SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which adopts a combination of feature losses and adversarial losses to reconstruct an enhanced version of the user's speech. We trained our model with data collected by sensors mounted on an earbud and synthetically corrupted by adding different kinds of noise sources to the audio signal. Our experimental results demonstrate that it is possible to achieve very high quality results, even in the case of interfering speech at the same level of loudness. A sample of the output produced by our model is available at https://google-research.github.io/seanet/multimodal/speech.
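The abstract describes two key ingredients: a multi-modal input (noisy audio plus accelerometer channels fed to a wave-to-wave convolutional model) and a generator objective that combines feature losses with adversarial losses. The sketch below illustrates both ideas in numpy; the channel layout, the hinge-style adversarial term, and the `lambda_feat` weighting are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def make_multimodal_input(noisy_audio, accel_xyz):
    # Stack the waveform and the three accelerometer axes as input channels.
    # Assumes accel_xyz is already resampled to the audio rate (an
    # illustrative preprocessing choice, not specified here).
    return np.vstack([noisy_audio[None, :], accel_xyz])  # shape (4, T)

def feature_loss(real_feats, fake_feats):
    # Mean L1 distance between discriminator feature maps computed on the
    # clean reference and on the enhanced output.
    return sum(np.mean(np.abs(r - f))
               for r, f in zip(real_feats, fake_feats)) / len(real_feats)

def adversarial_loss(fake_scores):
    # Hinge-style generator term: push discriminator scores on the
    # enhanced audio above the margin.
    return np.mean([np.mean(np.maximum(0.0, 1.0 - s)) for s in fake_scores])

def generator_loss(real_feats, fake_feats, fake_scores, lambda_feat=100.0):
    # Weighted combination of adversarial and feature losses;
    # lambda_feat is a hypothetical hyperparameter.
    return adversarial_loss(fake_scores) + lambda_feat * feature_loss(real_feats, fake_feats)

# Toy example: a 1-second input at 16 kHz and two discriminator layers.
rng = np.random.default_rng(0)
x = make_multimodal_input(rng.standard_normal(16000), rng.standard_normal((3, 16000)))
real = [rng.standard_normal((4, 16)) for _ in range(2)]
fake = [rng.standard_normal((4, 16)) for _ in range(2)]
scores = [rng.standard_normal(4)]
loss = generator_loss(real, fake, scores)
```

In this sketch the accelerometer acts purely as extra input channels, matching the abstract's framing of it as a noise-robust conditioning signal rather than a separate prediction target.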
