Paper Title
Planting and Mitigating Memorized Content in Predictive-Text Language Models
Paper Authors
Paper Abstract
Language models are widely deployed to provide automatic text completion services in user products. However, recent research has revealed that language models (especially large ones) bear considerable risk of memorizing private training data, which is then vulnerable to leakage and extraction by adversaries. In this study, we test the efficacy of a range of privacy-preserving techniques to mitigate unintended memorization of sensitive user text, while varying other factors such as model size and adversarial conditions. We test both "heuristic" mitigations (those without formal privacy guarantees) and Differentially Private training, which provides provable levels of privacy at the cost of some model performance. Our experiments show that, with the exception of L2 regularization, heuristic mitigations are largely ineffective at preventing memorization in our test suite, possibly because they make overly strong assumptions about the characteristics that define "sensitive" or "private" text. In contrast, Differential Privacy reliably prevents memorization in our experiments, despite its computational and model-performance costs.
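To make the Differentially Private training mentioned above concrete, the sketch below shows how DP-SGD is typically applied to a small next-token prediction model: per-example gradients are clipped and Gaussian noise is added before the optimizer step, which is what yields the formal privacy guarantee. This is a minimal illustration using the Opacus library; the toy model, synthetic data, and hyperparameters are assumptions for demonstration and are not the paper's actual experimental setup.

```python
# Minimal DP-SGD training sketch (assumed setup, not the paper's configuration).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy next-token prediction model standing in for a predictive-text LM.
vocab_size, embed_dim, context_len = 1000, 64, 8
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Flatten(),
    nn.Linear(embed_dim * context_len, vocab_size),
)

# Synthetic data: sequences of token ids, each labeled with a "next" token.
inputs = torch.randint(0, vocab_size, (512, context_len))
targets = torch.randint(0, vocab_size, (512,))
loader = DataLoader(TensorDataset(inputs, targets), batch_size=32)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# PrivacyEngine wraps the model, optimizer, and loader so that per-example
# gradients are clipped and Gaussian noise is added at each step.
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # illustrative value; controls the privacy/utility trade-off
    max_grad_norm=1.0,     # per-example gradient clipping bound
)

for epoch in range(1):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Report the privacy budget spent so far (smaller epsilon = stronger guarantee).
print("epsilon:", privacy_engine.get_epsilon(delta=1e-5))
```

The noise multiplier and clipping bound govern the trade-off the abstract refers to: stronger noise yields a smaller epsilon (stronger guarantee) but typically degrades model performance.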