Paper Title
Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings
Paper Authors
Paper Abstract
While the success of pre-trained language models has largely eliminated the need for high-quality static word vectors in many NLP applications, such vectors continue to play an important role in tasks where words need to be modelled in the absence of linguistic context. In this paper, we explore how the contextualised embeddings predicted by BERT can be used to produce high-quality word vectors for such domains, in particular related to knowledge base completion, where our focus is on capturing the semantic properties of nouns. We find that a simple strategy of averaging the contextualised embeddings of masked word mentions leads to vectors that outperform the static word vectors learned by BERT, as well as those from standard word embedding models, in property induction tasks. We notice in particular that masking target words is critical to achieve this strong performance, as the resulting vectors focus less on idiosyncratic properties and more on general semantic properties. Inspired by this view, we propose a filtering strategy which is aimed at removing the most idiosyncratic mention vectors, allowing us to obtain further performance gains in property induction.
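The following is a minimal sketch (not the authors' released code) of the masked-mention averaging and filtering idea described in the abstract. It assumes the HuggingFace transformers library; the model name, the distance-to-mean filtering criterion, and the helper function names are illustrative assumptions rather than the paper's exact choices.

```python
# Minimal sketch of masked-mention averaging (assumptions noted above).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def masked_mention_vectors(word, sentences):
    """One contextualised vector per mention, with the target word masked."""
    vectors = []
    for sent in sentences:
        # Naive string replacement; real preprocessing would match tokens.
        masked = sent.replace(word, tokenizer.mask_token)
        inputs = tokenizer(masked, return_tensors="pt", truncation=True)
        positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()
        if positions.numel() == 0:
            continue  # the word did not occur in this sentence
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        vectors.append(hidden[positions[0, 0]])  # embedding at the [MASK] slot
    return torch.stack(vectors)

def word_vector(word, sentences, keep_ratio=1.0):
    """Average the mention vectors, optionally dropping the most
    idiosyncratic ones (here approximated as those farthest from the mean)."""
    vecs = masked_mention_vectors(word, sentences)
    if keep_ratio < 1.0:
        mean = vecs.mean(dim=0, keepdim=True)
        dists = torch.cdist(vecs, mean).squeeze(1)
        keep = dists.argsort()[: max(1, int(keep_ratio * len(vecs)))]
        vecs = vecs[keep]
    return vecs.mean(dim=0)

# Example: a vector for "banana" from three (toy) mention sentences.
v = word_vector("banana", [
    "I ate a banana for breakfast.",
    "A banana is a yellow tropical fruit.",
    "She slipped on a banana peel.",
], keep_ratio=0.7)
print(v.shape)  # torch.Size([768])
```

Because the target word is replaced by [MASK], the vector at that position reflects what the context implies about the word rather than the word's own identity, which is why, per the abstract, such vectors emphasise general semantic properties over idiosyncratic ones.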