Paper Title
On the Ability and Limitations of Transformers to Recognize Formal Languages
Paper Authors
Paper Abstract
Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past works suggest that LSTMs generalize very well on regular languages and have close connections with counter languages. In this work, we systematically study the ability of Transformers to model such languages, as well as the role of their individual components in doing so. We first provide a construction of Transformers for a subclass of counter languages, including well-studied languages such as n-ary Boolean Expressions, Dyck-1, and its generalizations. In experiments, we find that Transformers do well on this subclass, and that their learned mechanism strongly correlates with our construction. Perhaps surprisingly, and in contrast to LSTMs, Transformers do well only on a subset of regular languages, with performance degrading as we make the languages more complex according to a well-known measure of complexity. Our analysis also provides insights into the role of the self-attention mechanism in modeling certain behaviors, and into the influence of positional encoding schemes on the learning and generalization abilities of the model.
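To make the language class named in the abstract concrete, the sketch below (Python, written for illustration and not taken from the paper; function names are ours) recognizes Dyck-1 with a single counter and Shuffle-Dyck-2, one common generalization, with one counter per bracket type. The fact that a handful of counters suffices is exactly what places these languages in the counter-language subclass the abstract refers to.

```python
def is_dyck1(s: str) -> bool:
    """Recognize Dyck-1 over {'(', ')'} with a single counter:
    increment on '(', decrement on ')', never go negative, end at zero."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:          # closing bracket with no matching opener
                return False
        else:
            return False           # symbol outside the alphabet
    return depth == 0              # every opener must be closed


def is_shuffle_dyck2(s: str) -> bool:
    """Recognize Shuffle-Dyck-2: two bracket types, each balanced on its own,
    with no nesting constraint across types. Two independent counters suffice,
    so this is still a counter language (but not context-free in general)."""
    counts = {'(': 0, '[': 0}
    closing = {')': '(', ']': '['}
    for ch in s:
        if ch in counts:
            counts[ch] += 1
        elif ch in closing:
            counts[closing[ch]] -= 1
            if counts[closing[ch]] < 0:
                return False
        else:
            return False
    return all(c == 0 for c in counts.values())


if __name__ == "__main__":
    print(is_dyck1("(()())"))        # True
    print(is_dyck1("())("))          # False
    print(is_shuffle_dyck2("([)]"))  # True: each type balanced independently
    print(is_shuffle_dyck2("(]"))    # False
```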