Title

Detecting and Exorcising Statistical Demons from Language Models with Anti-Models of Negative Data

Authors

Wick, Michael L., Silverstein, Kate, Tristan, Jean-Baptiste, Pocock, Adam, Johnson, Mark

Abstract

It's been said that "Language Models are Unsupervised Multitask Learners." Indeed, self-supervised language models trained on "positive" examples of English text generalize in desirable ways to many natural language tasks. But if such models can stray so far from an initial self-supervision objective, a wayward model might generalize in undesirable ways too, say to nonsensical "negative" examples of unnatural language. A key question in this work is: do language models trained on (positive) training data also generalize to (negative) test data? We use this question as a contrivance to assess the extent to which language models learn undesirable properties of text, such as n-grams, that might interfere with the learning of more desirable properties of text, such as syntax. We find that within a model family, as the number of parameters, training epochs, and data set size increase, so does a model's ability to generalize to negative n-gram data, indicating standard self-supervision generalizes too far. We propose a form of inductive bias that attenuates such undesirable signals with negative data distributions automatically learned from positive data. We apply the method to remove n-gram signals from LSTMs and find that doing so causes them to favor syntactic signals, as demonstrated by large error reductions (up to 46% on the hardest cases) on a syntactic subject-verb agreement task.
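The abstract describes "negative" data whose distribution is learned automatically from positive data, and n-grams as the undesirable signal to be removed. As a minimal illustration of that idea (not the paper's implementation), one can fit a simple bigram model on positive sentences and sample from it: the samples preserve the corpus's local n-gram statistics while generally lacking its syntax, making them usable as negative examples. All function names below are hypothetical.

```python
from collections import defaultdict
import random

def train_bigram(sentences):
    """Count bigram successors over whitespace-tokenized sentences,
    with explicit sentence-boundary markers."""
    model = defaultdict(list)
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            model[a].append(b)
    return model

def sample_negative(model, max_len=20, seed=0):
    """Sample a token sequence from the bigram model. Such samples carry
    the positive corpus's n-gram statistics but not its global syntax,
    so they can serve as 'negative' data in the sense of the abstract."""
    rng = random.Random(seed)
    out, cur = [], "<s>"
    for _ in range(max_len):
        cur = rng.choice(model[cur])  # next token, weighted by bigram counts
        if cur == "</s>":
            break
        out.append(cur)
    return " ".join(out)

# Toy "positive" corpus for illustration.
positive = [
    "the cat sits on the mat",
    "the dog chases the cat",
]
model = train_bigram(positive)
print(sample_negative(model))
```

In the paper's setting the anti-model would be far richer than a toy bigram sampler, but the principle is the same: the negative distribution is derived from the positive data alone, with no hand-labeled negative examples.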
