Paper Title

Transformer Language Models without Positional Encodings Still Learn Positional Information

Paper Authors

Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, Omer Levy

Paper Abstract

Causal transformer language models (LMs), such as GPT-3, typically require some form of positional encoding, such as positional embeddings. However, we show that LMs without any explicit positional encoding are still competitive with standard models, and that this phenomenon is robust across different datasets, model sizes, and sequence lengths. Probing experiments reveal that such models acquire an implicit notion of absolute positions throughout the network, effectively compensating for the missing information. We conjecture that causal attention enables the model to infer the number of predecessors that each token can attend to, thereby approximating its absolute position. Our findings indicate that causal LMs might derive positional awareness not only from the explicit positioning mechanism, but also from the effects of the causal mask.
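
The conjecture above, that causal attention lets a token infer how many predecessors it can attend to, can be illustrated with a small toy computation. The sketch below is not from the paper; it assumes uniform attention scores and random value vectors purely for illustration, and shows that the norm of a causally masked attention output shrinks with position even though no positional encoding is used anywhere.

```python
import torch

torch.manual_seed(0)
seq_len, d_model = 512, 64

# Random value vectors standing in for token representations
# (no positional encoding is added anywhere).
values = torch.randn(seq_len, d_model)

# Causal mask: token i may attend only to positions j <= i.
mask = torch.tril(torch.ones(seq_len, seq_len))

# With identical attention scores, softmax over the visible prefix gives
# each of the i+1 visible positions a weight of 1 / (i + 1).
scores = torch.zeros(seq_len, seq_len).masked_fill(mask == 0, float("-inf"))
attn = torch.softmax(scores, dim=-1)

out = attn @ values  # row i is the mean of the first i+1 value vectors

# The norm of the averaged output decays roughly like 1 / sqrt(i + 1),
# so the output statistics alone carry a (noisy) signal about absolute position.
for i in [0, 7, 63, 511]:
    print(f"position {i:3d}: ||out|| = {out[i].norm():.3f}")
```

Running this prints norms that decrease with position (roughly 8, 2.8, 1.0, 0.35 for positions 0, 7, 63, 511), which is one concrete way a causal mask alone can leak absolute-position information into later layers, in the spirit of the paper's conjecture.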
