Paper Title
Attack Agnostic Detection of Adversarial Examples via Random Subspace Analysis
Paper Authors
Paper Abstract
Whilst adversarial attack detection has received considerable attention, it remains a fundamentally challenging problem from two perspectives. First, while threat models can be well-defined, attacker strategies may still vary widely within those constraints. Therefore, detection should be considered an open-set problem, standing in contrast to most current detection approaches. These methods take a closed-set view and train binary detectors, thus biasing detection toward attacks seen during detector training. Second, limited information is available at test time and is typically confounded by nuisance factors, including the label and underlying content of the image. We address these challenges via a novel strategy based on random subspace analysis. We present a technique that utilizes properties of random projections to characterize the behavior of clean and adversarial examples across a diverse set of subspaces. The self-consistency (or inconsistency) of model activations is leveraged to discern clean from adversarial examples. Performance evaluations demonstrate that our technique ($AUC\in[0.92, 0.98]$) outperforms competing detection strategies ($AUC\in[0.30, 0.79]$), while remaining truly agnostic to the attack strategy (for both targeted and untargeted attacks). It also requires significantly less calibration data (composed only of clean examples) than competing approaches to achieve this performance.
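The abstract does not spell out the scoring rule, but the core idea of checking the self-consistency of activations across random subspaces can be sketched. The following is a minimal illustration, assuming access to penultimate-layer activations, Gaussian random projections, and a nearest-class-mean vote per subspace; the function names (`make_projections`, `fit_class_means`, `inconsistency_score`) and the majority-vote disagreement score are hypothetical choices for illustration, not the paper's exact method. Note that calibration here uses clean examples only, consistent with the abstract's claim.

```python
import numpy as np

def make_projections(dim, k_subspaces, sub_dim, seed=0):
    """Sample k random Gaussian projection matrices (dim -> sub_dim)."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((dim, sub_dim)) / np.sqrt(sub_dim)
            for _ in range(k_subspaces)]

def fit_class_means(activations, labels, projections):
    """Per subspace, compute class means from *clean* calibration data only."""
    means = []
    for P in projections:
        z = activations @ P  # project calibration activations into the subspace
        means.append(np.stack([z[labels == c].mean(axis=0)
                               for c in np.unique(labels)]))
    return means

def inconsistency_score(activation, projections, means):
    """Fraction of subspaces whose nearest-class-mean vote disagrees with the
    majority vote; a higher score suggests an adversarial example."""
    votes = []
    for P, M in zip(projections, means):
        z = activation @ P
        votes.append(int(np.argmin(np.linalg.norm(M - z, axis=1))))
    votes = np.asarray(votes)
    majority = np.bincount(votes).argmax()
    return float(np.mean(votes != majority))
```

The intuition behind this kind of score is that a clean example's activations tend to vote for the same class in most random subspaces, whereas an adversarial perturbation, crafted in the full feature space, is unlikely to survive many independent random projections, so its votes scatter and the disagreement rises.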