Paper Title
An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding
Paper Authors
Paper Abstract
Continual learning refers to a dynamic framework in which a model receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge. Unfortunately, neural networks fail to meet these two desiderata, incurring the so-called catastrophic forgetting phenomenon. Whereas a vast array of strategies has been proposed to attenuate forgetting in the computer vision domain, there is a dearth of work for speech-related tasks. In this paper, we consider the joint use of rehearsal and knowledge distillation (KD) approaches for spoken language understanding under a class-incremental learning scenario. We report on multiple KD combinations at different levels in the network, showing that combining feature-level and prediction-level KD leads to the best results. Finally, we provide an ablation study on the effect of the size of the rehearsal memory that corroborates the efficacy of our approach for low-resource devices.
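To make the described recipe more concrete, the following is a minimal PyTorch sketch of how a feature-level and a prediction-level distillation term might be combined, with the classification loss computed on the union of the current-task batch and samples drawn from the rehearsal memory. The function names, the weighting factor alpha, and the temperature are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def combined_kd_loss(student_feats, teacher_feats,
                     student_logits, teacher_logits,
                     temperature=2.0, alpha=0.5):
    """Hypothetical combination of feature-level and prediction-level KD.

    `alpha` and `temperature` are illustrative hyperparameters, not the
    values used in the paper.
    """
    # Feature-level KD: match intermediate representations of the old (teacher) model.
    feat_loss = F.mse_loss(student_feats, teacher_feats.detach())

    # Prediction-level KD: match temperature-softened output distributions.
    pred_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    return alpha * feat_loss + (1.0 - alpha) * pred_loss


if __name__ == "__main__":
    # Toy usage with random tensors standing in for real activations and logits.
    # In a class-incremental setup, the batch would mix current-task samples
    # with samples replayed from the rehearsal memory, and the total loss would
    # add a cross-entropy term on that joint batch to the KD term below.
    s_feat, t_feat = torch.randn(8, 256), torch.randn(8, 256)
    s_logits, t_logits = torch.randn(8, 10), torch.randn(8, 10)
    print(combined_kd_loss(s_feat, t_feat, s_logits, t_logits))
```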