Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks

Paper title

Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks

Authors

Wu, Zhenzong; Das, Rohan Kumar; Yang, Jichen; Li, Haizhou

Abstract

Modern text-to-speech (TTS) and voice conversion (VC) systems produce natural-sounding speech that calls into question the security of automatic speaker verification (ASV) systems. This makes the detection of such synthetic speech essential for safeguarding ASV systems from unauthorized access. Most existing spoofing countermeasures perform well when the nature of the attacks is known to the system during training; however, their performance degrades when they face attacks of an unseen nature. Compared with the synthetic speech created by the wide range of TTS and VC methods, genuine speech has a more consistent distribution. We believe that the difference between the distributions of synthetic and genuine speech is an important discriminative feature between the two classes. In this regard, we propose a novel method, referred to as feature genuinization, that learns a transformer with a convolutional neural network (CNN) using the characteristics of genuine speech only. We then use this genuinization transformer together with a light CNN (LCNN) classifier. The proposed method is evaluated on the ASVspoof 2019 logical access corpus. The experiments show that the proposed feature-genuinization-based LCNN system outperforms other state-of-the-art spoofing countermeasures, demonstrating its effectiveness for detecting synthetic speech attacks.
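The abstract does not detail the classifier, but light CNN (LCNN) architectures are typically built around the Max-Feature-Map (MFM) activation, which splits the channel axis into two halves and keeps the element-wise maximum of each pair. The pure-Python sketch below illustrates only that activation, not the genuinization transformer or the full system described in the paper. The function name `max_feature_map` and the list-of-lists channel layout are illustrative assumptions, not code from the paper.

```python
from typing import List

# Hypothetical layout: one channel is a flattened feature map (list of floats).
Channel = List[float]


def max_feature_map(channels: List[Channel]) -> List[Channel]:
    """Sketch of the Max-Feature-Map (MFM) activation used in light CNNs.

    Splits 2k input channels into two halves and returns k channels,
    where output channel i is the element-wise maximum of input
    channels i and i + k.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    out: List[Channel] = []
    for first, second in zip(channels[:half], channels[half:]):
        if len(first) != len(second):
            raise ValueError("paired channels must have the same size")
        # Competitive selection: keep the larger response at each position.
        out.append([max(a, b) for a, b in zip(first, second)])
    return out


if __name__ == "__main__":
    # Four toy channels of length 3 -> two output channels.
    feats = [
        [0.1, -0.5, 2.0],
        [1.0, 0.0, -1.0],
        [0.3, -0.7, 1.5],
        [-2.0, 0.4, 0.2],
    ]
    print(max_feature_map(feats))  # [[0.3, -0.5, 2.0], [1.0, 0.4, 0.2]]
```

Because MFM halves the number of channels, an LCNN layer meant to emit k feature maps usually has its preceding convolution produce 2k. The maximum then acts as a learned, competitive feature selector rather than a fixed nonlinearity such as ReLU.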
