Paper Title
Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis
Paper Authors
Abstract
As the name implies, contextualized representations of language are typically motivated by their ability to encode context. Which aspects of context are captured by such representations? We introduce an approach to address this question using Representational Similarity Analysis (RSA). As case studies, we investigate the degree to which a verb embedding encodes the verb's subject, a pronoun embedding encodes the pronoun's antecedent, and a full-sentence representation encodes the sentence's head word (as determined by a dependency parse). In all cases, we show that BERT's contextualized embeddings reflect the linguistic dependency being studied, and that BERT encodes these dependencies to a greater degree than it encodes less linguistically salient controls. These results demonstrate the ability of our approach to adjudicate between hypotheses about which aspects of context are encoded in representations of language.
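
As a rough illustration of the kind of comparison RSA makes, the sketch below scores how well the similarity structure of one set of representations matches that of another over the same stimuli. It is a generic, minimal sketch and not the authors' implementation: the choice of cosine distance and Pearson correlation, the helper names (`dissimilarity_vector`, `rsa_score`), and the toy vectors are all assumptions made for illustration. In the setting the abstract describes, `model` would stand in for contextualized embeddings (for example, of verbs), `hypothesis` for a representation of the linguistically relevant item (for example, each verb's subject), and `control` for a less linguistically salient alternative.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def cosine_distance(u, v):
    """1 - cosine similarity (an assumed choice of dissimilarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dissimilarity_vector(reps):
    """Upper triangle of the pairwise dissimilarity matrix, flattened."""
    n = len(reps)
    return [
        cosine_distance(reps[i], reps[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def rsa_score(reps_a, reps_b):
    """Correlate the dissimilarity structure of two representation sets.

    Row k of reps_a and row k of reps_b must describe the same stimulus;
    the two spaces may have different dimensionality.
    """
    return pearson(dissimilarity_vector(reps_a), dissimilarity_vector(reps_b))


# Toy data (hypothetical): four stimuli in three representation spaces.
model = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
hypothesis = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
control = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]

hypothesis_score = rsa_score(model, hypothesis)
control_score = rsa_score(model, control)
print(f"hypothesis: {hypothesis_score:.3f}  control: {control_score:.3f}")
```

In this toy setup the hypothesis space groups the stimuli the same way the model space does, so its score comes out higher than the control's. That higher-versus-lower comparison mirrors the sense in which the abstract speaks of adjudicating between hypotheses.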