Acceptability Judgements via Examining the Topology of Attention Maps

Paper Title

Acceptability Judgements via Examining the Topology of Attention Maps

Authors

Daniil Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina Proskurina, Laida Kushnareva, Ekaterina Artemova, Serguei Barannikov, Irina Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev

Abstract

The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP. However, the ability of the attention heads to judge the grammatical acceptability of a sentence has been underexplored. This paper approaches the paradigm of acceptability judgments with topological data analysis (TDA), showing that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics: binary judgments and linguistic minimal pairs. Topological features enhance the BERT-based acceptability classifier scores by $8$% to $24$% on CoLA in three languages (English, Italian, and Swedish). By revealing the topological discrepancy between attention maps of minimal pairs, we achieve human-level performance on the BLiMP benchmark, outperforming nine statistical and Transformer LM baselines. At the same time, TDA provides the foundation for analyzing the linguistic functions of attention heads and interpreting the correspondence between the graph features and grammatical phenomena.
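
To make the idea of topological features of an attention graph concrete, the following is a minimal sketch and not the authors' pipeline. It assumes one attention map is given as a square matrix of token-to-token weights, keeps an undirected edge wherever the weight in either direction reaches a chosen threshold, and reports two simple graph invariants: the number of connected components and the number of independent cycles. The function name `attention_graph_betti`, the toy matrix, and the thresholds are all hypothetical and chosen only for illustration.

```python
def attention_graph_betti(attention, threshold):
    """Return (beta0, beta1) for a thresholded attention graph.

    Assumption: tokens are vertices, and tokens i and j are joined by an
    undirected edge when the attention weight in either direction is at
    least `threshold`. Self-attention (the diagonal) is ignored.
    beta0 = number of connected components
    beta1 = number of independent cycles = edges - vertices + beta0
    """
    n = len(attention)
    parent = list(range(n))  # union-find forest over tokens

    def find(x):
        # Find the component root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    num_edges = 0
    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if max(attention[i][j], attention[j][i]) >= threshold:
                num_edges += 1
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
                    components -= 1

    beta0 = components
    beta1 = num_edges - n + components
    return beta0, beta1


# Hypothetical 4-token attention map (each row sums to 1).
toy_attention = [
    [0.6, 0.3, 0.1, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.4, 0.1, 0.5, 0.0],
    [0.0, 0.0, 0.1, 0.9],
]

# Sweeping the threshold shows how the graph's shape changes.
for t in (0.05, 0.25, 0.35):
    print(t, attention_graph_betti(toy_attention, t))
```

Features of this kind, computed per attention head and over several thresholds, could then be fed to a classifier alongside the model's own output; which features and thresholds the paper actually uses is not stated in the abstract.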
