Paper Title

Mel Frequency Spectral Domain Defenses against Adversarial Attacks on Speech Recognition Systems

Authors

Nicholas Mehlman, Anirudh Sreeram, Raghuveer Peri, Shrikanth Narayanan

Abstract

A variety of recent works have looked into defenses for deep neural networks against adversarial attacks, particularly within the image processing domain. Speech processing applications such as automatic speech recognition (ASR) are increasingly relying on deep learning models, and so are also prone to adversarial attacks. However, many of the defenses explored for ASR simply adapt the image-domain defenses, which may not provide optimal robustness. This paper explores speech-specific defenses using the mel spectral domain, and introduces a novel defense method called 'mel domain noise flooding' (MDNF). MDNF applies additive noise to the mel spectrogram of a speech utterance prior to re-synthesizing the audio signal. We test the defenses against strong white-box adversarial attacks such as projected gradient descent (PGD) and Carlini-Wagner (CW) attacks, and show better robustness compared to a randomized smoothing baseline across strong threat models.
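As a rough illustration of the MDNF pipeline described in the abstract, the sketch below computes a mel spectrogram, adds random noise to it, and re-synthesizes a waveform using librosa's Griffin-Lim based mel inversion. The choice of Gaussian noise, the noise scale `sigma`, and the librosa-based analysis/synthesis steps are illustrative assumptions and not the authors' exact implementation.

```python
# Minimal sketch of mel-domain noise flooding (MDNF), assuming Gaussian noise
# added directly to a (power) mel spectrogram. Noise scale and STFT settings
# are illustrative, not the paper's exact configuration.
import numpy as np
import librosa

def mdnf_defense(waveform, sr=16000, n_fft=512, hop_length=128,
                 n_mels=80, sigma=0.01, seed=None):
    """Add noise to the mel spectrogram of `waveform`, then re-synthesize
    audio (Griffin-Lim phase estimation) before it reaches the ASR model."""
    rng = np.random.default_rng(seed)

    # 1. Mel spectrogram of the (possibly adversarial) utterance.
    mel = librosa.feature.melspectrogram(
        y=waveform, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )

    # 2. "Flood" the mel representation with additive noise; clip at zero
    #    so the magnitude spectrogram stays non-negative.
    noisy_mel = np.maximum(mel + sigma * rng.standard_normal(mel.shape), 0.0)

    # 3. Invert back to a time-domain waveform.
    return librosa.feature.inverse.mel_to_audio(
        noisy_mel, sr=sr, n_fft=n_fft, hop_length=hop_length
    )
```

The re-synthesis step is what distinguishes this from simply adding noise to the input waveform: the perturbation is applied in the mel domain, and the lossy mel analysis/synthesis round trip itself discards fine spectral detail that adversarial perturbations often rely on.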
