Paper Title
Towards Understanding GD with Hard and Conjugate Pseudo-labels for Test-Time Adaptation
Paper Authors
Paper Abstract
We consider a setting in which a model needs to adapt to a new domain under distribution shift, given that only unlabeled test samples from the new domain are accessible at test time. A common idea in most related works is to construct pseudo-labels for the unlabeled test samples and apply gradient descent (GD) to a loss function with the pseudo-labels. Recently, \cite{GSRK22} proposed conjugate labels, a new kind of pseudo-label for self-training at test time. They empirically showed that conjugate labels outperform other pseudo-labeling schemes on many domain adaptation benchmarks. However, provably showing that GD with conjugate labels learns a good classifier for test-time adaptation remains open. In this work, we aim to theoretically understand GD with hard and conjugate labels for a binary classification problem. We show that for the square loss, GD with conjugate labels converges to an $ε$-optimal predictor under a Gaussian model for any arbitrarily small $ε$, while GD with hard pseudo-labels fails at this task. We also analyze both schemes under different loss functions for the update. Our results shed light on when and why GD with hard labels or conjugate labels works in test-time adaptation.
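As a toy illustration of the two update rules compared in the abstract (a sketch, not the paper's exact construction or proof setting): with a linear predictor $f_w(x) = \langle w, x\rangle$ and the square loss, hard pseudo-labeling regresses $f_w(x)$ onto $\mathrm{sign}(f_w(x))$, while the conjugate objective of \cite{GSRK22} reduces for the square loss to $-\tfrac{1}{2} f_w(x)^2$, so GD on it grows the predictor's squared margin. The Gaussian-mixture data, dimensions, step size, and iteration count below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup (illustrative, not the paper's exact model):
# test inputs x ~ N(y * mu, I) with an unobserved label y in {-1, +1},
# and a linear predictor f_w(x) = <w, x> adapted from unlabeled x only.
d, n = 5, 200
mu = np.ones(d) / np.sqrt(d)
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + rng.normal(size=(n, d))

def hard_label_grad(w, X):
    # Square loss with hard pseudo-labels:
    # (1/n) * sum_i 0.5 * (<w, x_i> - sign(<w, x_i>))^2
    f = X @ w
    return X.T @ (f - np.sign(f)) / len(X)

def conjugate_grad(w, X):
    # For the square loss, the conjugate objective is
    # -(1/n) * sum_i 0.5 * <w, x_i>^2, so GD on it grows the squared margin.
    f = X @ w
    return -(X.T @ f) / len(X)

def adapt(grad_fn, w, lr=0.1, steps=100):
    for _ in range(steps):
        w = w - lr * grad_fn(w, X)
    return w

w0 = 0.1 * mu  # a weakly aligned "source" model
w_hard = adapt(hard_label_grad, w0.copy())
w_conj = adapt(conjugate_grad, w0.copy())

for name, w in [("hard", w_hard), ("conjugate", w_conj)]:
    acc = np.mean(np.sign(X @ w) == y)
    print(f"{name:9s} pseudo-labels: accuracy = {acc:.2f}")
```

Note that the conjugate objective here is unbounded below, so $\|w\|$ grows during adaptation; only the direction of $w$ matters for the accuracy of $\mathrm{sign}(\langle w, x\rangle)$. The sketch only illustrates the two updates qualitatively; the paper's guarantee is an $ε$-optimality statement under its Gaussian model.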