Paper Title
Unintended Memorization and Timing Attacks in Named Entity Recognition Models
Paper Authors
Paper Abstract
Named entity recognition (NER) models are widely used for identifying named entities (e.g., individuals, locations, and other information) in text documents. Machine learning-based NER models are increasingly being applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information to redact text for data sharing. In this paper, we study the setting where NER models are available as a black-box service for identifying sensitive information in user documents and show that these models are vulnerable to membership inference on their training datasets. Using updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models. Our first attack exploits unintended memorization in the NER model's underlying neural network, a phenomenon neural networks are known to be susceptible to. Our second attack leverages a timing side-channel to target NER models that maintain vocabularies constructed from the training data. We show that words from the training dataset follow different functional paths than previously unseen words, resulting in measurable differences in execution time. Revealing the membership status of training samples has clear privacy implications: in text redaction, for example, the sensitive words or phrases to be found and removed are at risk of being detected as members of the training dataset. Our experimental evaluation includes the redaction of both password and health data, raising both security risks and privacy/regulatory concerns. This is exacerbated by results showing that memorization occurs with only a single phrase. We achieve 70% AUC with our first attack on a text redaction use case. We also show overwhelming success in the timing attack, with 99.23% AUC. Finally, we discuss potential mitigation approaches to realize the safe use of NER models in light of the privacy and security implications of membership inference attacks.
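
As a rough illustration of the black-box setting described in the abstract, the sketch below (not the authors' attack code) queries a pre-trained spaCy pipeline with a candidate phrase and records which spans it tags as entities; a membership-inference adversary would compare such outputs on candidate training phrases versus control phrases the model cannot have seen. The pipeline name and the example text are illustrative assumptions.

```python
# Minimal sketch of querying a pre-trained NER pipeline as a black box.
# This is NOT the paper's attack implementation; it only illustrates the
# query interface a membership-inference adversary would build on.
# The pipeline name and the candidate phrase are illustrative assumptions.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed pre-trained English pipeline

def tagged_entities(text: str):
    """Return (span, label) pairs the NER component extracts from `text`."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

# An adversary would compare the model's behavior on candidate secrets
# (possible training-set members) against clearly unseen control phrases.
print(tagged_entities("Alice Smith was treated at St. Mary Hospital in Boston."))
```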
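
The second attack hinges on execution-time differences between in-vocabulary and out-of-vocabulary words. A minimal probe of that idea, again only a sketch under assumed conditions (the pipeline name and probe words are not from the paper), is to repeatedly time the pipeline on a word that is almost certainly in the model's vocabulary and on a random string that almost certainly is not, then compare the median latencies.

```python
# Sketch of a timing probe: compare processing time for a likely
# in-vocabulary word versus a likely unseen word. Illustrative only;
# the real attack would need careful measurement and calibration.
import time
import statistics
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed pre-trained pipeline

def median_latency(word: str, trials: int = 200) -> float:
    """Median wall-clock time to run the full pipeline on a single word."""
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        nlp(word)  # tokenization + NER; vocabulary lookups happen inside
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# "london" is very likely in an English pipeline's vocabulary;
# the random-looking string very likely is not.
print(f"likely in-vocab : {median_latency('london'):.6f} s")
print(f"likely unseen   : {median_latency('xqzvtrpl'):.6f} s")
```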