Paper Title

SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation

Authors

Lushan Song, Guopeng Lin, Jiaxuan Wang, Haoqi Wu, Wenqiang Ruan, Weili Han

Abstract

Gathering high-quality training data from multiple data sources while preserving privacy is a crucial challenge in training high-performance machine learning models. Potential solutions could break the barriers among isolated data corpora and consequently enlarge the range of data available for processing. To this end, both academic researchers and industrial vendors have recently been strongly motivated to propose two mainstream families of solutions, mainly based on software constructions: 1) Secure Multi-party Learning (MPL for short); and 2) Federated Learning (FL for short). Each of these two families has its advantages and limitations when evaluated against the following five criteria: security, efficiency, data distribution, the accuracy of trained models, and application scenarios. To demonstrate the research progress and discuss insights on future directions, we thoroughly investigate the protocols and frameworks of both MPL and FL. First, we define the problem of Training machine learning Models over Multiple data sources with Privacy Preservation (TMMPP for short). Then, we compare recent studies of TMMPP in terms of technical routes, the number of parties supported, data partitioning, threat model, and machine learning models supported, to show their advantages and limitations. Next, we investigate and evaluate five popular FL platforms. Finally, we discuss potential directions for resolving the TMMPP problem in the future.
