Paper Title
Rarely a problem? Language models exhibit inverse scaling in their predictions following few-type quantifiers
Paper Authors
Paper Abstract
How well do language models deal with quantification? In this study, we focus on 'few'-type quantifiers, as in 'few children like toys', which might pose a particular challenge for language models because the sentence components without the quantifier are likely to co-occur, and 'few'-type quantifiers are rare. We present 960 English sentence stimuli from two human neurolinguistic experiments to 22 autoregressive transformer models of differing sizes. Not only do all the models perform poorly on 'few'-type quantifiers, but overall, the larger the model, the worse its performance. This inverse scaling is consistent with previous work suggesting that larger models increasingly reflect online rather than offline human processing, and we argue that the decreasing performance of larger models may challenge uses of language models as the basis for natural language systems.
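The kind of probe the abstract describes can be illustrated with a short sketch. The following is a minimal, hypothetical example, not the authors' released code or stimuli: it uses the HuggingFace `transformers` library and GPT-2 (both illustrative assumptions) to compare the log-probability an autoregressive model assigns to a sentence-final word after a 'most'-type versus a 'few'-type quantifier, using the 'children like toys' example from the abstract.

```python
# Minimal sketch (illustrative, not the paper's method): compare the
# log-probability of a sentence-final word after 'Most' vs. 'Few'.
# The stimulus pair and the choice of GPT-2 are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def final_word_logprob(context: str, final_word: str) -> float:
    """Log-probability the model assigns to `final_word` after `context`."""
    context_ids = tokenizer(context, return_tensors="pt").input_ids
    # Leading space so the word is tokenized as it appears mid-sentence.
    word_ids = tokenizer(" " + final_word, return_tensors="pt").input_ids
    input_ids = torch.cat([context_ids, word_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    # Sum log-probabilities of the final word's tokens, each predicted
    # from the position immediately before it.
    total = 0.0
    start = context_ids.shape[1]
    for i in range(word_ids.shape[1]):
        pos = start + i - 1  # logits at pos predict the token at pos + 1
        token_id = input_ids[0, pos + 1]
        total += log_probs[0, pos, token_id].item()
    return total

# Hypothetical stimulus pair based on the abstract's example sentence.
for quantifier in ("Most", "Few"):
    lp = final_word_logprob(f"{quantifier} children like", "toys")
    print(f"{quantifier:>4} children like toys: log P = {lp:.3f}")
```

A model that handles quantification well should assign 'toys' a lower probability after 'Few' than after 'Most'; the abstract reports that models perform poorly on this kind of contrast, and that larger models perform worse.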