Paper Title

CxGBERT: BERT meets Construction Grammar

Authors

Harish Tayyar Madabushi, Laurence Romain, Dagmar Divjak, Petar Milin

Abstract

While lexico-semantic elements no doubt capture a large amount of linguistic information, it has been argued that they do not capture all information contained in text. This assumption is central to constructionist approaches to language which argue that language consists of constructions, learned pairings of a form and a function or meaning that are either frequent or have a meaning that cannot be predicted from its component parts. BERT's training objectives give it access to a tremendous amount of lexico-semantic information, and while BERTology has shown that BERT captures certain important linguistic dimensions, there have been no studies exploring the extent to which BERT might have access to constructional information. In this work we design several probes and conduct extensive experiments to answer this question. Our results allow us to conclude that BERT does indeed have access to a significant amount of information, much of which linguists typically call constructional information. The impact of this observation is potentially far-reaching as it provides insights into what deep learning methods learn from text, while also showing that information contained in constructions is redundantly encoded in lexico-semantics.
