Paper Title
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Paper Authors
Paper Abstract
Despite the growing success of diffusion models in continuous-valued domains (e.g., images), similar efforts for discrete domains such as text have yet to match the performance of autoregressive language models. In this work, we present SSD-LM -- a diffusion-based language model with two key design choices. First, SSD-LM is semi-autoregressive, iteratively generating blocks of text, allowing for flexible output length at decoding time while enabling local bidirectional context updates. Second, it is simplex-based, performing diffusion on the natural vocabulary space rather than a learned latent space, allowing us to incorporate classifier guidance and modular control using off-the-shelf classifiers without any adaptation. We evaluate SSD-LM on unconstrained text generation benchmarks, and show that it matches or outperforms strong autoregressive GPT-2 models across standard quality and diversity metrics, while vastly outperforming diffusion-based baselines. On controlled text generation, SSD-LM also outperforms competitive baselines, with an extra advantage in modularity.
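Since the abstract compresses the decoding procedure into a few sentences, a toy sketch may make the two design choices concrete. The snippet below is a minimal, self-contained illustration of semi-autoregressive, simplex-based diffusion decoding, not the paper's implementation: the TinyDenoiser module, the almost-one-hot logit scale K, and the linear noise schedule are all illustrative assumptions. Classifier guidance would slot in by adding the gradient of an off-the-shelf classifier's log-probability with respect to the block simplex before each re-noising step; it is omitted here for brevity.

```python
# Toy sketch of semi-autoregressive simplex diffusion decoding in the spirit
# of SSD-LM. All names, sizes, and the noise schedule are illustrative
# assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

VOCAB, DIM, BLOCK, STEPS, K = 100, 64, 5, 20, 5.0  # K: almost-one-hot logit scale

class TinyDenoiser(torch.nn.Module):
    """Stand-in for the bidirectional model that predicts clean logits for the
    current block, given the prior context plus the noisy block simplex."""
    def __init__(self):
        super().__init__()
        self.inp = torch.nn.Linear(VOCAB, DIM)
        self.mix = torch.nn.GRU(DIM, DIM, batch_first=True, bidirectional=True)
        self.out = torch.nn.Linear(2 * DIM, VOCAB)

    def forward(self, simplex):             # simplex: (1, ctx_len + BLOCK, VOCAB)
        h, _ = self.mix(self.inp(simplex))
        return self.out(h)                  # predicted clean logits, same shape

def to_simplex(token_ids):
    """Map token ids to almost-one-hot logits: +K on the token, -K elsewhere."""
    return K * (2 * F.one_hot(token_ids, VOCAB).float() - 1)

@torch.no_grad()
def decode(model, prompt_ids, n_blocks=3):
    ctx = prompt_ids                                    # (1, ctx_len), long ids
    for _ in range(n_blocks):                           # semi-autoregressive loop
        x = torch.randn(1, BLOCK, VOCAB) * K            # noisy block simplex
        for t in reversed(range(1, STEPS + 1)):         # diffusion loop
            inp = torch.cat([to_simplex(ctx), x], dim=1)
            logits = model(inp)[:, -BLOCK:, :]          # clean-block prediction
            pred_ids = logits.argmax(-1)                # project onto vocabulary
            x0_hat = to_simplex(pred_ids)               # almost-one-hot estimate
            alpha = 1.0 - (t - 1) / STEPS               # toy linear schedule
            noise = torch.randn_like(x) * K
            x = alpha * x0_hat + (1 - alpha) * noise    # re-noise for next step
        ctx = torch.cat([ctx, pred_ids], dim=1)         # commit the decoded block
    return ctx

model = TinyDenoiser()
print(decode(model, torch.tensor([[1, 2, 3]])))
```

The structural points this sketch preserves are the outer block loop (semi-autoregressive, so output length is flexible at decoding time) and the inner diffusion loop operating directly on vocabulary-sized logits (simplex-based), so previously committed blocks provide left context while the current block is refined with bidirectional context.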