Foundations of Learning from Synthetic Data

Paper Title

Foundations of Bayesian Learning from Synthetic Data

Authors

Harrison Wilde, Jack Jewson, Sebastian Vollmer, Chris Holmes

Abstract

There is significant growth and interest in the use of synthetic data as an enabler for machine learning in environments where the release of real data is restricted due to privacy or availability constraints. Despite a large number of methods for synthetic data generation, there are comparatively few results on the statistical properties of models learnt on synthetic data, and fewer still for situations where a researcher wishes to augment real data with another party's synthesised data. We use a Bayesian paradigm to characterise the updating of model parameters when learning in these settings, demonstrating that caution should be taken when applying conventional learning algorithms without appropriate consideration of the synthetic data generating process and learning task. Recent results from general Bayesian updating support a novel and robust approach to Bayesian synthetic-learning founded on decision theory that outperforms standard approaches across repeated experiments on supervised learning and inference problems.
