Paper Title
Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples
Authors
Abstract
Faced with the threat of identity leakage when voice data is published, users confront a privacy-utility dilemma when enjoying convenient voice services. Existing studies de-identify users' voices through direct modification or text-based re-synthesis, but this results in inconsistent audibility in the presence of human participants. In this paper, we propose a voice de-identification system that uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples, which induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. Benefiting from this, our system can protect user identity from exposure by Automatic Speaker Identification (ASI) while maintaining perceptual voice quality for non-intrusive de-identification. Moreover, our system learns a compact speaker distribution through a conditional variational auto-encoder to sample diverse target embeddings on demand. Combining diverse target generation with input-specific perturbation construction, our system enables any-to-any identity transformation for adaptive de-identification. Experimental results show that our system achieves 98% and 79% successful de-identification on mainstream ASIs and commercial systems, respectively, with an objective Mel cepstral distortion of 4.31 dB and a subjective mean opinion score of 4.48.
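To make the contrast between additive and convolutional adversarial examples concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the exact formulation, so the sketch assumes that the perturbation is added to the taps of a room impulse response (RIR) and that the result is convolved with the speech signal. The function names, the synthetic RIR, and the random perturbation are all hypothetical; in a real system the perturbation would be optimized against a speaker model, which is omitted here.

```python
import numpy as np


def synthetic_rir(length=2048, decay=200.0, seed=0):
    """Toy RIR: a unit direct-path tap followed by an exponentially
    decaying noise tail. Hypothetical stand-in for a measured RIR."""
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    rir = 0.05 * rng.standard_normal(length) * np.exp(-t / decay)
    rir[0] = 1.0  # direct path
    return rir


def additive_example(x, delta):
    """Typical additive adversarial example: x' = x + delta,
    where delta has the same length as the waveform x."""
    return x + delta


def convolutional_example(x, rir, delta):
    """Assumed convolutional form: x' = x * (rir + delta).
    The perturbation lives in the impulse response, so the output
    resembles speech recorded in a slightly different room."""
    h = rir + delta
    # Full convolution, truncated to the input length.
    return np.convolve(x, h)[: len(x)]


# Example input: one second of a 220 Hz tone at 16 kHz (placeholder for speech).
sr = 16000
x = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
rir = synthetic_rir()

# Random placeholder perturbation on the RIR taps (not an optimized one).
delta = 0.01 * np.random.default_rng(1).standard_normal(len(rir))
x_conv = convolutional_example(x, rir, delta)
```

Under this assumption the perturbation has as many parameters as the RIR has taps rather than as many as the waveform has samples, and it appears in the output as a change in reverberation instead of superimposed noise.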