Paper Title
PMI-Masking: Principled masking of correlated spans
Paper Authors
Paper Abstract
Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs) such as BERT. We show that such uniform masking allows an MLM to minimize its training objective by latching onto shallow local signals, leading to pretraining inefficiency and suboptimal downstream performance. To address this flaw, we propose PMI-Masking, a principled masking strategy based on the concept of Pointwise Mutual Information (PMI), which jointly masks a token n-gram if it exhibits high collocation over the corpus. PMI-Masking motivates, unifies, and improves upon prior more heuristic approaches that attempt to address the drawback of random uniform token masking, such as whole-word masking, entity/phrase masking, and random-span masking. Specifically, we show experimentally that PMI-Masking reaches the performance of prior masking approaches in half the training time, and consistently improves performance at the end of training.
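To make the mechanism in the abstract concrete, below is a minimal Python sketch of the idea: score corpus bigrams by PMI, treat high-PMI pairs as indivisible masking units, and sample whole units rather than independent tokens. This is an illustration under simplifying assumptions, not the paper's implementation: the paper considers n-grams beyond bigrams and operates at BERT pretraining scale, and the function names here (build_pmi_vocab, segment, pmi_mask) and all parameter values are hypothetical.

```python
import math
import random
from collections import Counter

MASK = "[MASK]"

def build_pmi_vocab(tokens, min_count=2, top_k=100):
    """Score each corpus bigram by pointwise mutual information,
    PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ),
    and keep the top_k highest-scoring pairs as collocations."""
    total = len(tokens)
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    scored = []
    for (x, y), c in bi.items():
        if c < min_count:  # skip pairs too rare for a reliable estimate
            continue
        score = math.log((c / total) / ((uni[x] / total) * (uni[y] / total)))
        scored.append(((x, y), score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return {pair for pair, _ in scored[:top_k]}

def segment(tokens, pmi_vocab):
    """Greedily group a token sequence into masking units: a bigram found
    in the PMI vocabulary is one unit, any other token stands alone."""
    units, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in pmi_vocab:
            units.append((i, i + 1))
            i += 2
        else:
            units.append((i,))
            i += 1
    return units

def pmi_mask(tokens, pmi_vocab, mask_ratio=0.15):
    """Sample whole units uniformly at random until roughly mask_ratio of
    the tokens are masked, so collocated spans are always masked jointly."""
    masked = list(tokens)
    units = segment(tokens, pmi_vocab)
    random.shuffle(units)
    budget = mask_ratio * len(tokens)
    for unit in units:
        if budget <= 0:
            break
        for idx in unit:
            masked[idx] = MASK
        budget -= len(unit)
    return masked

# Tiny demo on a toy corpus: "new york" is a strong collocation,
# so its two tokens are masked together or not at all.
corpus = ("new york is a big city in new york state "
          "los angeles is a big city too").split()
vocab = build_pmi_vocab(corpus, min_count=2, top_k=5)
print(pmi_mask(corpus, vocab, mask_ratio=0.3))
```

Masking whole collocations is what removes the shallow local signal the abstract describes: if only "york" were hidden, a visible "new" would often give the answer away, whereas masking the full span forces the model to rely on wider context.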