Paper Title

On the Sample Complexity of Adversarial Multi-Source PAC Learning

Paper Authors

Nikola Konstantinov, Elias Frantar, Dan Alistarh, Christoph H. Lampert

Abstract

We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is known that in the single-source case, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability; that is, even in the limit of infinite training data, no learning system can approach the optimal test error. In this work we show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily corrupt a fixed fraction of the data sources. Our main results are a generalization bound that provides finite-sample guarantees for this learning setting, as well as corresponding lower bounds. Besides establishing PAC-learnability, our results also show that in a cooperative learning setting, sharing data with other parties has provable benefits, even if some participants are malicious.
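
To make the setting described in the abstract concrete, the following is a minimal, hypothetical formalization sketch. The notation (N sources, m samples per source, corruption fraction alpha, hypothesis class H) is our own illustrative choice and not necessarily the paper's; the specific form of the paper's generalization bound is not reproduced here.

% A minimal sketch of the multi-source adversarial PAC setting, under
% assumed notation; compile with pdflatex. It formalizes only what the
% abstract states, not the paper's actual bound.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
There are $N$ data sources, each contributing a dataset $S_i$ of $m$ samples
drawn i.i.d.\ from a common distribution $\mathcal{D}$. An adversary may
replace the datasets of up to $\lfloor \alpha N \rfloor$ sources (a fixed
fraction $\alpha$) with arbitrary data. The learner outputs a hypothesis
$\hat{h} \in \mathcal{H}$ from the possibly corrupted collection
$(\tilde{S}_1, \dots, \tilde{S}_N)$. PAC-learnability in this setting means
that, with probability at least $1-\delta$ over the clean samples,
\[
  \mathcal{R}_{\mathcal{D}}(\hat{h})
  \;-\; \inf_{h \in \mathcal{H}} \mathcal{R}_{\mathcal{D}}(h)
  \;\le\; \varepsilon(N, m, \alpha, \delta),
\]
where the excess-risk term $\varepsilon$ vanishes as the amount of clean data
grows. By contrast, in the single-source case with an $\alpha$-fraction of
corrupted \emph{samples}, no learner can guarantee such a vanishing excess
risk, even in the limit of infinite data.
\end{document}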
