Paper title
Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation
Authors
Abstract
The paradigm of worst-group loss minimization has shown promise in avoiding the learning of spurious correlations, but it requires costly additional supervision on spurious attributes. To address this, recent works focus on developing weaker forms of supervision (e.g., hyperparameters tuned with a small number of validation samples that carry spurious attribute annotations), but none of these methods match the performance of methods that use full supervision on the spurious attribute. In this paper, instead of searching for weaker supervision, we ask: given access to a fixed number of samples with spurious attribute annotations, what is the best achievable worst-group loss if we "fully exploit" them? To this end, we propose a pseudo-attribute-based algorithm, coined Spread Spurious Attribute (SSA), for improving worst-group accuracy. In particular, we leverage samples both with and without spurious attribute annotations to train a model that predicts the spurious attribute, then use the pseudo-attributes predicted by the trained model as supervision on the spurious attribute to train a new robust model that minimizes the worst-group loss. Our experiments on various benchmark datasets show that our algorithm consistently outperforms baseline methods that use the same number of validation samples with spurious attribute annotations. We also demonstrate that the proposed SSA can achieve performance comparable to methods using full (100%) spurious attribute supervision while using a much smaller number of annotated samples: from 0.6% to 1.5%, depending on the dataset.
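The two-phase idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute predictor here is a hypothetical threshold function rather than a model trained on annotated and unannotated samples, the names `spread_attributes`, `group_mean_losses`, and `worst_group_loss` are invented for illustration, and keeping the ground-truth annotation where one exists is an assumption of this sketch. It only shows how pseudo-attributes can define (label, attribute) groups and how the worst-group loss is then read off those groups.

```python
from collections import defaultdict


def spread_attributes(features, known_attrs, predict_attr):
    """Return one spurious attribute per sample.

    Annotated samples keep their annotation (an assumption of this sketch);
    unannotated samples (None) receive a pseudo-attribute from predict_attr.
    """
    return [
        attr if attr is not None else predict_attr(x)
        for x, attr in zip(features, known_attrs)
    ]


def group_mean_losses(losses, labels, attrs):
    """Average the per-sample losses within each (label, attribute) group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, label, attr in zip(losses, labels, attrs):
        sums[(label, attr)] += loss
        counts[(label, attr)] += 1
    return {group: sums[group] / counts[group] for group in sums}


def worst_group_loss(losses, labels, attrs):
    """Return the largest group-average loss, the quantity a robust model minimizes."""
    return max(group_mean_losses(losses, labels, attrs).values())


# Toy data: four samples, two of which lack a spurious attribute annotation.
features = [0.9, 0.8, 0.1, 0.2]
known_attrs = [1, None, 0, None]
labels = [1, 1, 1, 0]
losses = [0.2, 0.4, 1.5, 0.3]


def toy_predictor(x):
    """Hypothetical stand-in for a trained spurious-attribute predictor."""
    return int(x > 0.5)


attrs = spread_attributes(features, known_attrs, toy_predictor)
per_group = group_mean_losses(losses, labels, attrs)
worst = worst_group_loss(losses, labels, attrs)
```

In this toy data the group with label 1 and attribute 0 contains a single high-loss sample, so it determines the worst-group loss. A robust training procedure would upweight that group instead of minimizing the average loss over all samples.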