Paper Title

Dataset Condensation via Efficient Synthetic-Data Parameterization

Authors

Kim, Jang-Hyun; Kim, Jinuk; Oh, Seong Joon; Yun, Sangdoo; Song, Hwanjun; Jeong, Joonhyun; Ha, Jung-Woo; Song, Hyun Oh

Abstract

The great success of machine learning with massive amounts of data comes at a price of huge computation costs and storage for training and tuning. Recent studies on dataset condensation attempt to reduce the dependence on such massive data by synthesizing a compact training dataset. However, the existing approaches have fundamental limitations in optimization due to the limited representability of synthetic datasets without considering any data regularity characteristics. To this end, we propose a novel condensation framework that generates multiple synthetic data with a limited storage budget via efficient parameterization considering data regularity. We further analyze the shortcomings of the existing gradient matching-based condensation methods and develop an effective optimization technique for improving the condensation of training data information. We propose a unified algorithm that drastically improves the quality of condensed data against the current state-of-the-art on CIFAR-10, ImageNet, and Speech Commands.
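
The storage-efficient parameterization idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each stored synthetic image is treated as a factor x factor grid of downsampled sub-images, and each sub-image is upsampled back to full resolution, so one image's worth of storage yields factor**2 training examples. The function name multi_formation, the nearest-neighbour upsampling, and the use of NumPy are hypothetical illustrative choices, not the authors' implementation.

```python
import numpy as np


def multi_formation(stored: np.ndarray, factor: int) -> np.ndarray:
    """Decode one stored tensor of shape (C, H, W) into factor**2 images of shape (C, H, W).

    Hypothetical sketch (assumption, not the authors' code): the stored tensor is
    treated as a factor x factor grid of (H/factor, W/factor) sub-images, and each
    sub-image is upsampled back to (H, W) by nearest-neighbour repetition.
    """
    _, h, w = stored.shape
    if factor < 1 or h % factor or w % factor:
        raise ValueError("factor must be >= 1 and divide both H and W")
    sh, sw = h // factor, w // factor
    decoded = []
    for i in range(factor):          # grid row
        for j in range(factor):      # grid column
            sub = stored[:, i * sh:(i + 1) * sh, j * sw:(j + 1) * sw]
            # Nearest-neighbour upsampling: repeat each pixel `factor` times along H and W.
            decoded.append(sub.repeat(factor, axis=1).repeat(factor, axis=2))
    return np.stack(decoded)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stored = rng.random((3, 32, 32))           # storage cost: one 3x32x32 image
    images = multi_formation(stored, factor=2)
    print(stored.size, images.shape)           # 3072 (4, 3, 32, 32)
```

Under these assumptions, a budget of 10 stored 3x32x32 images per class with factor=2 would decode into 40 training images per class at the same storage cost. The trade-off is that each decoded image carries only 16x16 pixels of independent information, which is acceptable only to the extent that natural images are spatially redundant.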