Paper Title

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

Authors

Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li

Abstract

Pre-trained models have achieved remarkable success in natural language processing (NLP). However, existing pre-training methods underutilize the benefits of language understanding for generation. Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the capabilities of language understanding and generation in a single model. Our model, named GanLM, is trained with two pre-training objectives: replaced token detection and replaced token denoising. Specifically, given a masked source sentence, the generator outputs the target distribution and the discriminator predicts whether the target tokens sampled from that distribution are incorrect. The target sentence is replaced with misclassified tokens to construct a noisy previous context, which is used to generate the gold sentence. In general, both tasks improve the ability of language understanding and generation by selectively using the denoising data. Extensive experiments on language generation benchmarks show that GanLM, with its powerful language understanding capability, outperforms various strong pre-trained language models (PLMs) and achieves state-of-the-art performance.
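
To make the two objectives described in the abstract concrete, below is a minimal, self-contained PyTorch sketch of how replaced token detection and replaced token denoising could be wired together. It is an illustrative sketch under stated assumptions, not the authors' implementation: the toy generator, discriminator, vocabulary size, and random data are all hypothetical, and the actual GanLM uses a full encoder-decoder Transformer.

```python
# Minimal, hypothetical sketch of GanLM-style pre-training losses (assumption:
# toy modules stand in for the real encoder-decoder generator and discriminator).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden = 100, 32
gold = torch.randint(0, vocab_size, (1, 8))  # toy gold target sentence (1 sentence, 8 tokens)

# Toy "generator": predicts a distribution over the vocabulary at each target position.
generator = nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, vocab_size))
logits = generator(gold)                                        # (1, 8, vocab_size)
sampled = torch.multinomial(F.softmax(logits[0], dim=-1), 1).T  # one sampled token per position, (1, 8)

# Replaced token detection: the discriminator predicts whether each sampled token is incorrect.
is_incorrect = (sampled != gold).float()                        # ground-truth labels
discriminator = nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, 1))
disc_logits = discriminator(sampled).squeeze(-1)                # (1, 8)
detection_loss = F.binary_cross_entropy_with_logits(disc_logits, is_incorrect)

# Replaced token denoising: positions the discriminator misclassifies keep the sampled
# (noisy) token; the model must then recover the gold sentence from this noisy context.
misclassified = (torch.sigmoid(disc_logits) > 0.5) != is_incorrect.bool()
noisy_context = torch.where(misclassified, sampled, gold)
denoise_logits = generator(noisy_context)                       # stand-in for the decoder pass
denoising_loss = F.cross_entropy(denoise_logits.view(-1, vocab_size), gold.view(-1))

(detection_loss + denoising_loss).backward()
print(float(detection_loss), float(denoising_loss))
```

In this sketch, the detection loss trains the discriminator to flag sampled tokens that differ from the gold target, while the denoising loss asks the model to recover the gold sentence from a context in which the misclassified positions keep the sampled (noisy) tokens, mirroring the data construction outlined in the abstract.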
