Paper Title

Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings

Paper Authors

Neeraj Varshney, Swaroop Mishra, Chitta Baral

Paper Abstract

In order to equip NLP systems with selective prediction capability, several task-specific approaches have been proposed. However, which approaches work best across tasks or even if they consistently outperform the simplest baseline 'MaxProb' remains to be explored. To this end, we systematically study 'selective prediction' in a large-scale setup of 17 datasets across several NLP tasks. Through comprehensive experiments under in-domain (IID), out-of-domain (OOD), and adversarial (ADV) settings, we show that despite leveraging additional resources (held-out data/computation), none of the existing approaches consistently and considerably outperforms MaxProb in all three settings. Furthermore, their performance does not translate well across tasks. For instance, Monte-Carlo Dropout outperforms all other approaches on Duplicate Detection datasets but does not fare well on NLI datasets, especially in the OOD setting. Thus, we recommend that future selective prediction approaches should be evaluated across tasks and settings for reliable estimation of their capabilities.
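For readers unfamiliar with the two approaches named in the abstract, here is a minimal sketch of the MaxProb baseline and Monte-Carlo Dropout confidence estimation, assuming a classifier that outputs softmax probabilities. The `predict_fn` callable, the abstention threshold of 0.8, and the number of dropout samples are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of two selective-prediction confidence estimators:
# the MaxProb baseline and Monte-Carlo Dropout. All names and values
# here are illustrative assumptions, not the paper's actual code.
import numpy as np

def maxprob_confidence(probs: np.ndarray) -> float:
    """MaxProb baseline: confidence is the largest softmax probability."""
    return float(np.max(probs))

def mc_dropout_confidence(predict_fn, x, n_samples: int = 10) -> float:
    """Monte-Carlo Dropout: average softmax outputs over stochastic
    forward passes (dropout kept active at inference), then take the
    max of the mean distribution. `predict_fn` is an assumed callable
    returning one softmax vector per call."""
    samples = np.stack([predict_fn(x) for _ in range(n_samples)])
    return float(np.max(samples.mean(axis=0)))

def selective_predict(probs: np.ndarray, confidence: float,
                      threshold: float = 0.8):
    """Answer only when confidence clears the threshold; else abstain."""
    if confidence >= threshold:
        return int(np.argmax(probs))
    return None  # abstain

# Usage with a toy 3-class softmax output:
probs = np.array([0.7, 0.2, 0.1])
print(selective_predict(probs, maxprob_confidence(probs)))  # None: 0.7 < 0.8
```

In both cases the model abstains when its confidence estimate falls below a threshold; the paper's finding is that the extra cost of methods like Monte-Carlo Dropout (multiple forward passes, held-out data) does not consistently buy better selective prediction than MaxProb across tasks and settings.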
