Paper Title
Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations
Paper Authors
Paper Abstract
Online continual learning (OCL) aims to enable a model to learn from a non-stationary data stream, continuously acquiring new knowledge while retaining what has already been learnt, under the constraints of limited system size and computational cost, where the main challenge comes from the "catastrophic forgetting" issue: the inability to remember previously learnt knowledge well while learning new knowledge. With a specific focus on the class-incremental OCL scenario, i.e. OCL for classification, recent advances incorporate contrastive learning to learn more generalised feature representations and achieve state-of-the-art performance, but they are still unable to fully resolve catastrophic forgetting. In this paper, we follow the strategy of adopting contrastive learning but further introduce a semantically distinct augmentation technique, which leverages strong augmentation to generate additional data samples, and we show that treating these samples as semantically different from their original classes (and thus as related to out-of-distribution samples) in the contrastive learning mechanism contributes to alleviating forgetting and facilitating model stability. Moreover, in addition to contrastive learning, the typical classification mechanism and objective (i.e. a softmax classifier and cross-entropy loss) are included in our model design for faster convergence and for utilising the label information, and they are particularly equipped with a sampling strategy to tackle the tendency of favouring new classes (i.e. model bias towards the recently learnt classes). Upon conducting extensive experiments on the CIFAR-10, CIFAR-100, and Mini-ImageNet datasets, our proposed method is shown to achieve superior performance against various baselines.
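To make the abstract's central idea concrete, the following is a minimal sketch, not the authors' implementation, of how strongly augmented views might be assigned their own "virtual" class labels inside a SupCon-style contrastive loss. The choice of rotation as the strong augmentation, the exact loss form, and all function names (virtual_labels, supcon_loss, contrastive_step) are assumptions made purely for illustration; the paper's actual augmentations, architecture, and objective may differ.

```python
import torch
import torch.nn.functional as F

def virtual_labels(labels: torch.Tensor, transform_id: int, num_transforms: int = 4) -> torch.Tensor:
    # Give each strongly augmented view its own "virtual" class id, so that e.g.
    # a 90-degree-rotated cat is contrasted away from un-rotated cats.
    return labels * num_transforms + transform_id

def supcon_loss(features: torch.Tensor, labels: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    # Supervised contrastive loss over embeddings (N, D) and (virtual) labels (N,).
    features = F.normalize(features, dim=1)
    sim = features @ features.t() / temperature
    # Numerical stability: subtract the row-wise max before exponentiating.
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()
    n = labels.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye   # positives share a virtual label
    exp_logits = torch.exp(logits).masked_fill(eye, 0.0)             # exclude self-similarity
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True) + 1e-12)
    mean_log_prob_pos = (pos_mask.float() * log_prob).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return -mean_log_prob_pos.mean()

def contrastive_step(encoder, x, y, num_transforms: int = 4) -> torch.Tensor:
    # Rotate a (replay + stream) batch by 0/90/180/270 degrees, relabel each view
    # with a virtual class, and contrast all views jointly.
    views, labels = [], []
    for k in range(num_transforms):
        views.append(torch.rot90(x, k, dims=(2, 3)))
        labels.append(virtual_labels(y, k, num_transforms))
    z = encoder(torch.cat(views, dim=0))          # (B * num_transforms, D) embeddings
    return supcon_loss(z, torch.cat(labels, dim=0))
```

In this reading, the strongly augmented views act as in-batch negatives for their source class, which is one plausible way to realise "considering these samples semantically different from their original classes"; the abstract's softmax/cross-entropy branch and its class-balanced sampling against recency bias would be trained alongside this loss and are not sketched here.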