Paper Title
Sentiment Analysis for Reinforcement Learning
Paper Authors
Paper Abstract
While reinforcement learning (RL) has been successful in natural language processing (NLP) domains such as dialogue generation and text-based games, it typically faces the problem of sparse rewards that lead to slow or no convergence. Traditional methods that use text descriptions to extract only a state representation ignore the feedback inherently present in them. In text-based games, for example, descriptions like "Good Job! You ate the food" indicate progress, and descriptions like "You entered a new room" indicate exploration. Positive and negative cues like these can be converted to rewards through sentiment analysis. This technique converts the sparse reward problem into a dense one, which is easier to solve. Furthermore, this can enable reinforcement learning without rewards, in which the agent learns entirely from these intrinsic sentiment rewards. This framework is similar to intrinsic motivation, where the environment does not necessarily provide the rewards, but the agent analyzes and realizes them by itself. We find that providing dense rewards in text-based games using sentiment analysis improves performance under some conditions.
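The idea of converting sentiment cues into dense rewards can be made concrete with a small sketch. The example below is illustrative, not the paper's implementation: the word lists, the shaping weight `lam`, and the `shaped_reward` helper are all assumptions, and in practice a trained sentiment classifier would replace the crude lexicon scorer.

```python
# Minimal sketch of sentiment-based reward shaping for a text-based game.
# Assumptions (not from the paper): the POSITIVE/NEGATIVE lexicons, the
# shaping weight `lam`, and the function names are hypothetical.

POSITIVE = {"good", "great", "ate", "found", "won", "new"}
NEGATIVE = {"died", "lost", "hurt", "locked", "nothing"}

def sentiment_score(description: str) -> float:
    """Crude lexicon-based sentiment in [-1, 1]."""
    words = [w.strip("!.,") for w in description.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def shaped_reward(env_reward: float, description: str, lam: float = 0.1) -> float:
    """Densify the sparse environment reward with an intrinsic sentiment bonus.

    Dropping env_reward entirely gives the reward-free variant, where the
    agent learns from sentiment rewards alone.
    """
    return env_reward + lam * sentiment_score(description)

# Example: the environment reward is 0 (sparse), but the description
# carries positive feedback, so the shaped reward is nonzero.
print(shaped_reward(0.0, "Good Job! You ate the food"))  # 0.1
```

Under this scheme the agent receives a learning signal at every step that produces a sentiment-bearing description, rather than only at the rare steps where the environment emits a score.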