Paper Title
Momentum Decoding: Open-ended Text Generation As Graph Exploration
Paper Authors
Paper Abstract
Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing. However, maximization-based decoding methods (e.g., greedy search and beam search) often lead to the degeneration problem: the generated text is unnatural and contains undesirable repetitions. Existing solutions to this problem either introduce randomness, which is prone to incoherence, or require a look-ahead mechanism that incurs extra computational overhead. In this study, we formulate open-ended text generation from a new perspective: we view it as an exploration process within a directed graph, and we understand degeneration as circular loops within that graph. Based on this formulation, we propose a novel decoding method, momentum decoding, which encourages the LM to greedily explore new nodes outside the current graph. Meanwhile, it also allows the LM to return to existing nodes, with a momentum downgraded by a pre-defined resistance function. We extensively test our approach on three benchmarks from different domains through automatic and human evaluations. The results show that momentum decoding performs comparably with the current state of the art while notably improving inference speed and reducing computation FLOPs. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach. Our code and other related resources are publicly available at https://github.com/gmftbyGMFTBY/MomentumDecoding.
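
To make the graph view in the abstract concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation (the repository linked above is the reference for that). It rests on several assumptions that the abstract does not state: each node is a single token, the resistance function is linear in the number of prior visits, the score is the probability minus the resistance, and the language model is a hypothetical toy bigram table (`TOY_MODEL`). The names `resistance`, `momentum_decode_sketch`, and `alpha` are likewise made up for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical toy "language model": next-token probabilities given the last token.
# Plain greedy decoding from "a" loops forever between "a" and "b".
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "a": {"b": 0.6, "c": 0.4},
    "b": {"a": 0.7, "d": 0.3},
    "c": {"d": 0.9, "a": 0.1},
    "d": {"a": 0.5, "e": 0.5},
    "e": {"e": 1.0},
}


def toy_next_probs(tokens: List[str]) -> Dict[str, float]:
    """Return the toy next-token distribution for the current sequence."""
    return TOY_MODEL[tokens[-1]]


def resistance(visits: int, alpha: float) -> float:
    """Assumed resistance function: penalty grows linearly with prior visits."""
    return alpha * visits


def momentum_decode_sketch(
    next_probs: Callable[[List[str]], Dict[str, float]],
    prefix: List[str],
    steps: int,
    alpha: float = 0.3,
) -> List[str]:
    """Greedy decoding in which revisiting a node costs `resistance(visits)`.

    Unvisited nodes keep their full probability, so the search prefers to
    leave the current graph; a visited node can still win if its probability
    exceeds the alternatives after the penalty is subtracted.
    """
    tokens = list(prefix)
    visits = Counter(tokens)  # how often each node is already in the graph
    for _ in range(steps):
        probs = next_probs(tokens)
        best = max(probs, key=lambda t: probs[t] - resistance(visits[t], alpha))
        tokens.append(best)
        visits[best] += 1
    return tokens


if __name__ == "__main__":
    print(momentum_decode_sketch(toy_next_probs, ["a"], steps=5, alpha=0.0))
    print(momentum_decode_sketch(toy_next_probs, ["a"], steps=5, alpha=0.3))
```

With `alpha=0.0` the sketch reduces to plain greedy search and cycles between `a` and `b`. With `alpha=0.3` it returns to `a` once, since the penalized score of 0.4 still beats 0.3 for `d`, and then moves on through `c`, `d`, and `e` as the penalty on revisited nodes accumulates.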