具有部分分布式反馈的差异私有线性匪徒

论文标题

具有部分分布式反馈的差异私有线性匪徒

Differentially Private Linear Bandits with Partial Distributed Feedback

论文作者

Li, Fengjiao, Zhou, Xingyu, Ji, Bo

论文摘要

在本文中，我们仅使用部分分布式反馈来研究全球奖励最大化的问题。这个问题是由几个现实世界应用程序（例如蜂窝网络配置，动态定价和政策选择）激发的，其中中央实体采取的行动会影响大量人群，从而有助于全球奖励。但是，从整个人群那里收集这种奖励反馈不仅会产生高昂的成本，而且经常导致隐私问题。为了解决这个问题，我们考虑了差异化的分布式线性土匪，其中只选择了来自人群中的用户（称为客户）来参与学习过程，并且中央服务器通过以差异私人方式迭代地兼顾这些客户的本地反馈，从而从这种部分反馈中学习了全球模型。然后，我们提出了一个统一的算法学习框架，称为差异性分布式分布式消除（DP-DPE），该框架可以与流行的差分隐私（DP）模型（包括中央DP，Local DP，Local DP和Shuffle DP）自然集成。此外，我们证明DP-DPE既可以达到统一的遗憾，又实现了sublrinear沟通成本。有趣的是，DP-DPE在某种意义上也可以实现隐私保护``免费''，这是因为隐私保证是一个低阶添加剂术语。此外，作为我们技术的副产品，对于标准的差异私有线性匪徒，也可以实现``自由''隐私的相同结果。最后，我们进行了模拟以证实我们的理论结果并证明了DP-DPE的有效性。

In this paper, we study the problem of global reward maximization with only partial distributed feedback. This problem is motivated by several real-world applications (e.g., cellular network configuration, dynamic pricing, and policy selection) where an action taken by a central entity influences a large population that contributes to the global reward. However, collecting such reward feedback from the entire population not only incurs a prohibitively high cost but often leads to privacy concerns. To tackle this problem, we consider differentially private distributed linear bandits, where only a subset of users from the population are selected (called clients) to participate in the learning process and the central server learns the global model from such partial feedback by iteratively aggregating these clients' local feedback in a differentially private fashion. We then propose a unified algorithmic learning framework, called differentially private distributed phased elimination (DP-DPE), which can be naturally integrated with popular differential privacy (DP) models (including central DP, local DP, and shuffle DP). Furthermore, we prove that DP-DPE achieves both sublinear regret and sublinear communication cost. Interestingly, DP-DPE also achieves privacy protection ``for free'' in the sense that the additional cost due to privacy guarantees is a lower-order additive term. In addition, as a by-product of our techniques, the same results of ``free" privacy can also be achieved for the standard differentially private linear bandits. Finally, we conduct simulations to corroborate our theoretical results and demonstrate the effectiveness of DP-DPE.

下载PDF全文

下载文献需遵守相关版权规定

论文标题