Paper Title
Towards Understanding Fast Adversarial Training
Paper Authors
Paper Abstract
Current neural-network-based classifiers are susceptible to adversarial examples. The most empirically successful approach to defending against such adversarial examples is adversarial training, which incorporates a strong self-attack during training to enhance the model's robustness. This approach, however, is computationally expensive and hence hard to scale up. A recent work, called fast adversarial training, has shown that it is possible to markedly reduce computation time without sacrificing significant performance. This approach incorporates simple self-attacks, yet it can only run for a limited number of training epochs, resulting in sub-optimal performance. In this paper, we conduct experiments to understand the behavior of fast adversarial training and show that the key to its success is the ability to recover from overfitting to weak attacks. We then extend our findings to improve fast adversarial training, demonstrating robust accuracy superior to that of strong adversarial training, with much-reduced training time.
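To make the contrast between the two training schemes concrete, the sketch below shows one training step of FGSM-based fast adversarial training in PyTorch. It is a minimal illustration under stated assumptions, not the authors' exact procedure: the function name `fgsm_train_step`, the perturbation budget `epsilon`, the step size `alpha`, and the assumption that inputs are scaled to [0, 1] are all illustrative choices.

```python
import torch
import torch.nn.functional as F

def fgsm_train_step(model, optimizer, x, y, epsilon=8/255, alpha=10/255):
    """One step of FGSM-based fast adversarial training (illustrative sketch)."""
    # Random start inside the L-infinity epsilon-ball, a key ingredient of
    # fast adversarial training.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    delta.requires_grad_(True)

    # Single-step (FGSM) self-attack: one signed-gradient ascent step on the loss.
    attack_loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(attack_loss, delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()
    x_adv = (x + delta).clamp(0.0, 1.0)  # assumes inputs are scaled to [0, 1]

    # Standard parameter update on the adversarial examples.
    optimizer.zero_grad()
    train_loss = F.cross_entropy(model(x_adv), y)
    train_loss.backward()
    optimizer.step()
    return train_loss.item()
```

Replacing the single FGSM step with a multi-step inner attack (e.g., PGD) would yield the strong, but far more expensive, adversarial training baseline discussed in the abstract.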