Paper Title

AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses

Paper Authors

Tong Niu, Mohit Bansal

Paper Abstract

Many sequence-to-sequence dialogue models tend to generate safe, uninformative responses. There have been various useful efforts to eliminate them. However, these approaches either improve decoding algorithms during inference, rely on hand-crafted features, or employ complex models. In our work, we build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering. Specifically, we start with a simple yet effective automatic metric, AvgOut, which calculates the average output probability distribution of all time steps on the decoder side during training. This metric directly estimates which tokens are more likely to be generated, thus making it a faithful evaluation of the model diversity (i.e., for diverse models, the token probabilities should be more evenly distributed rather than peaked at a few dull tokens). We then leverage this novel metric to propose three models that promote diversity without losing relevance. The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch; the second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level; the third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal. Moreover, we experiment with a hybrid model by combining the loss terms of MinAvgOut and RL. All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation). Moreover, our approaches are orthogonal to the base model, making them applicable as an add-on to other emerging better dialogue models in the future.
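
To make the metric concrete, below is a minimal PyTorch sketch (not the authors' released code) of how an AvgOut-style averaged output distribution, and a simple diversity score derived from it, might be computed. The function names, tensor shapes, and the dull-token scoring rule are illustrative assumptions; the abstract does not spell out the exact formulation.

```python
import torch
import torch.nn.functional as F

def avg_output_distribution(decoder_logits, target_mask):
    """AvgOut-style metric: average the decoder's per-step output
    distributions over all non-padded time steps in a batch.

    decoder_logits: (batch, time, vocab) pre-softmax scores
    target_mask:    (batch, time), 1 for real tokens, 0 for padding
    """
    probs = F.softmax(decoder_logits, dim=-1)     # per-step output distributions
    mask = target_mask.unsqueeze(-1).float()      # (batch, time, 1)
    summed = (probs * mask).sum(dim=(0, 1))       # accumulate over batch and time
    return summed / mask.sum()                    # averaged distribution, shape (vocab,)

def diversity_score(avg_out, dull_token_ids):
    """Illustrative score (an assumption, not the paper's exact formula):
    the less probability mass the averaged distribution places on a set of
    'dull' tokens, the higher the diversity."""
    dull_mass = avg_out[dull_token_ids].sum()
    return 1.0 - dull_mass

# Usage sketch: a model that spreads probability mass evenly scores close to 1.
logits = torch.randn(2, 5, 10)                    # (batch=2, time=5, vocab=10)
mask = torch.ones(2, 5)
avg_out = avg_output_distribution(logits, mask)
score = diversity_score(avg_out, dull_token_ids=torch.tensor([0, 1]))
```

In this reading, MinAvgOut would add such a batch-level diversity score to the training objective, LFT would rescale a prepended label by it, and RL would use it as a reward; the precise loss terms are described in the paper itself.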
