Paper Title
Privacy-Preserving Collaborative Learning through Feature Extraction
Paper Authors
Paper Abstract
We propose a framework in which multiple entities collaborate to build a machine learning model while preserving the privacy of their data. The approach uses feature embeddings from shared or per-entity feature extractors, transforming data into a feature space in which the entities cooperate. We propose two specific methods and compare them with a baseline method. In Shared Feature Extractor (SFE) Learning, the entities use a shared feature extractor to compute feature embeddings of their samples. In Locally Trained Feature Extractor (LTFE) Learning, each entity uses a separate feature extractor, and models are trained on the concatenated features from all entities. As a baseline, in Cooperatively Trained Feature Extractor (CTFE) Learning, the entities train models by sharing raw data. Secure multi-party algorithms are used to train models without revealing data or features in plaintext. We investigate the trade-offs among SFE, LTFE, and CTFE with regard to performance, privacy leakage (measured with an off-the-shelf membership inference attack), and computational cost. LTFE provides the most privacy, followed by SFE and then CTFE. Computational cost is lowest for SFE, while the relative speed of CTFE and LTFE depends on the network architecture. CTFE and LTFE provide the best accuracy. We evaluate on MNIST, a synthetic dataset, and a credit card fraud detection dataset.
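The structural difference between SFE and LTFE feature computation can be sketched with a toy example. This is a minimal illustration under assumed details, not the paper's actual setup: the "feature extractors" are stand-in random linear projections, the data is synthetic, and one plausible reading of "concatenated features from all entities" (entities holding different views of the same sample set) is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two entities, each holding 4-dimensional samples.
# (Shapes and names here are illustrative assumptions.)
x_a = rng.normal(size=(5, 4))  # entity A's samples
x_b = rng.normal(size=(5, 4))  # entity B's samples

def make_extractor(seed, d_in=4, d_out=2):
    """Stand-in 'feature extractor': a fixed random linear projection."""
    w = np.random.default_rng(seed).normal(size=(d_in, d_out))
    return lambda x: x @ w

# SFE: both entities apply the SAME shared extractor to their own samples,
# and the pooled embeddings form the training set.
shared = make_extractor(seed=42)
sfe_features = np.vstack([shared(x_a), shared(x_b)])   # shape (10, 2)

# LTFE: each entity uses its OWN locally trained extractor; the model is
# trained on the concatenation of all entities' embeddings per sample.
ext_a = make_extractor(seed=1)
ext_b = make_extractor(seed=2)
ltfe_features = np.hstack([ext_a(x_a), ext_b(x_b)])    # shape (5, 4)

print(sfe_features.shape)   # (10, 2)
print(ltfe_features.shape)  # (5, 4)
```

In the full framework, these embeddings (or, for CTFE, the raw data) would be fed to a secure multi-party training protocol rather than shared in plaintext; that step is omitted here.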