Paper Title

Reward Gaming in Conditional Text Generation

Paper Authors

Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He

Paper Abstract

To align conditional text generation model outputs with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations. Under this framework, we identify three common cases where high rewards are incorrectly assigned to undesirable patterns: noise-induced spurious correlation, naturally occurring spurious correlation, and covariate shift. We show that even though learned metrics achieve high performance on the distribution of the data used to train the reward function, the undesirable patterns may be amplified during RL training of the text generation model. While there has been discussion about reward gaming in the RL or safety community, in this discussion piece, we would like to highlight reward gaming in the natural language generation (NLG) community using concrete conditional text generation examples and discuss potential fixes and areas for future work.
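To make the failure mode concrete, below is a minimal, hypothetical Python sketch of noise-induced spurious correlation; it is not the authors' code. The two templates, the `learned_reward` stand-in, and the spurious token "guaranteed" are illustrative assumptions: a reward model fit on noisy annotations over-values a surface cue, and a simple REINFORCE policy optimized against it amplifies that cue.

```python
# Minimal, hypothetical sketch (assumed setup, not the paper's code) of
# noise-induced spurious correlation in a learned reward.
# A toy "generator" chooses between two output templates; the "learned
# reward" was fit on noisy annotations and gives a small bonus to the
# token "guaranteed", a spurious cue the policy can exploit.

import math
import random

TEMPLATES = [
    "the study reports a modest improvement",      # faithful output
    "the study reports a guaranteed improvement",  # unfaithful, but spuriously rewarded
]

def learned_reward(text: str) -> float:
    """Stand-in for a reward model trained on noisy human ratings."""
    base = 1.0 if "improvement" in text else 0.0
    spurious_bonus = 0.3 if "guaranteed" in text else 0.0  # artifact of label noise
    return base + spurious_bonus

def train(steps: int = 3000, lr: float = 0.1, seed: int = 0) -> float:
    """REINFORCE on a one-parameter Bernoulli policy over the two templates."""
    random.seed(seed)
    logit = 0.0  # start indifferent between the two templates
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-logit))  # prob. of the spurious template
        a = 1 if random.random() < p else 0
        r = learned_reward(TEMPLATES[a])
        # REINFORCE: r * d log pi(a) / d logit for a Bernoulli(sigmoid(logit)) policy
        grad = r * ((1.0 - p) if a == 1 else -p)
        logit += lr * grad
    return 1.0 / (1.0 + math.exp(-logit))

if __name__ == "__main__":
    p_spurious = train()
    print(f"P(spurious 'guaranteed' template) after RL: {p_spurious:.2f}")
    # The policy drifts toward whichever output the reward model spuriously
    # prefers, even though the reward model scored its own training data well.
```

In a real system the same dynamic plays out with a full sequence model and a neural reward model; the sketch only illustrates why RL training tends to amplify any pattern the learned reward over-values.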
