Paper title
Diversify and Disambiguate: Learning From Underspecified Data
Paper authors
Paper abstract
Many datasets are underspecified: there exist multiple equally viable solutions to a given task. Underspecification can be problematic for methods that learn a single hypothesis because different functions that achieve low training loss can focus on different predictive features and thus produce widely varying predictions on out-of-distribution data. We propose DivDis, a simple two-stage framework that first learns a diverse collection of hypotheses for a task by leveraging unlabeled data from the test distribution. We then disambiguate by selecting one of the discovered hypotheses using minimal additional supervision, in the form of additional labels or inspection of function visualization. We demonstrate the ability of DivDis to find hypotheses that use robust features in image classification and natural language processing problems with underspecification.
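The two stages described above can be illustrated with a minimal Python sketch. It works on fixed, precomputed head predictions and does not train a network. It makes three assumptions: each hypothesis is one head of a multi-headed classifier; diversity is measured as the mutual information between two heads' predicted labels on unlabeled target data, estimated from batch-averaged predictions; and disambiguation picks the head with the highest accuracy on a handful of labeled target examples. The names `pairwise_mutual_information`, `diversity_loss`, and `disambiguate` are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence

# Hypothetical sketch of the two DivDis stages on precomputed head outputs.
# Each head is one hypothesis; its output is a list of class-probability vectors.
Probs = List[List[float]]  # shape: [num_examples][num_classes]


def pairwise_mutual_information(p_a: Probs, p_b: Probs) -> float:
    """Mutual information between two heads' predicted labels.

    Assumed estimator: the joint P(y_a, y_b) is the batch average of the
    outer product of the two heads' per-example probability vectors,
    computed on unlabeled target-distribution examples.
    """
    n = len(p_a)
    c = len(p_a[0])
    joint = [[sum(p_a[k][i] * p_b[k][j] for k in range(n)) / n
              for j in range(c)] for i in range(c)]
    marg_a = [sum(joint[i][j] for j in range(c)) for i in range(c)]
    marg_b = [sum(joint[i][j] for i in range(c)) for j in range(c)]
    mi = 0.0
    for i in range(c):
        for j in range(c):
            if joint[i][j] > 0:  # 0 * log 0 is taken as 0
                mi += joint[i][j] * math.log(joint[i][j] / (marg_a[i] * marg_b[j]))
    return mi


def diversity_loss(heads: List[Probs]) -> float:
    """Stage 1 (diversify): sum of pairwise mutual information over all head pairs.

    Lower values mean the heads' predictions are more statistically independent.
    """
    total = 0.0
    for a in range(len(heads)):
        for b in range(a + 1, len(heads)):
            total += pairwise_mutual_information(heads[a], heads[b])
    return total


def disambiguate(heads: List[Probs], labels: Sequence[int]) -> int:
    """Stage 2 (disambiguate): index of the head that is most accurate on a few
    labeled target examples (the "additional labels" form of supervision)."""
    def accuracy(probs: Probs) -> float:
        preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
        return sum(int(pr == y) for pr, y in zip(preds, labels)) / len(labels)
    return max(range(len(heads)), key=lambda h: accuracy(heads[h]))
```

For example, two binary heads that agree on every unlabeled example have mutual information log 2. Two heads whose predictions are independent across the batch score 0. A head that always predicts the same label also has zero mutual information with every other head, so minimizing this term alone can reward degenerate hypotheses. A practical training objective would need an additional term that keeps each head's predictions non-trivial.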