Paper Title
L2C2: Locally Lipschitz Continuous Constraint towards Stable and Smooth Reinforcement Learning
Paper Authors
Paper Abstract
This paper proposes a new regularization technique for reinforcement learning (RL) that makes the policy and value functions smooth and stable. RL is known for the instability of its learning process and the sensitivity of the acquired policy to noise. Several methods have been proposed to resolve these problems, and in summary, the smoothness of the learned policy and value functions is the key factor they share. However, if these functions are made extremely smooth, their expressiveness is lost, and the global optimal solution can no longer be obtained. This paper therefore considers RL under a local Lipschitz continuity constraint, so-called L2C2. By designing the spatio-temporal locally compact space for L2C2 from the state transition at each time step, moderate smoothness can be achieved without loss of expressiveness. Numerical simulations with noise verified that the proposed L2C2 improves task performance while smoothing out the robot actions generated by the learned policy.
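To illustrate the general idea of a local Lipschitz regularizer, the following is a minimal PyTorch sketch, not the authors' implementation: it samples a virtual state inside the region spanned by the transition from s_t to s_{t+1} and penalizes the ratio of the change in the network output to the change in the state, which approximates the local Lipschitz constant around s_t. The function name l2c2_penalty, the uniform sampling of the virtual state, and the weight lambda_l2c2 are illustrative assumptions rather than details taken from the paper.

```python
import torch

def l2c2_penalty(net: torch.nn.Module, s: torch.Tensor, s_next: torch.Tensor,
                 eps: float = 1e-6) -> torch.Tensor:
    """Sketch of a local Lipschitz penalty over the transition (s_t, s_{t+1}).

    A virtual state is sampled from the box spanned by s and s_next
    (a stand-in for the spatio-temporal locally compact space mentioned
    in the abstract), and the output-to-input change ratio of `net`
    approximates its local Lipschitz constant near s.
    """
    # Sample a virtual state inside the local region defined by the transition.
    u = torch.rand_like(s)
    s_virtual = s + u * (s_next - s)
    # Ratio of output change to state change ~ local Lipschitz constant.
    out_diff = (net(s_virtual) - net(s)).norm(dim=-1)
    in_diff = (s_virtual - s).norm(dim=-1) + eps
    return (out_diff / in_diff).mean()

# Hypothetical usage: add the penalty to the usual actor (or critic) loss.
# loss = rl_loss + lambda_l2c2 * l2c2_penalty(policy_net, states, next_states)
```

Because the penalty is confined to the neighborhood of each observed transition rather than enforcing a global Lipschitz bound, this style of regularization can damp noise sensitivity while leaving the function free to vary elsewhere, which is the trade-off the abstract describes as moderate smoothness without loss of expressiveness.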