Paper Title

Risk-Sensitive Policy with Distributional Reinforcement Learning

Paper Authors

Thibaut Théate, Damien Ernst

Paper Abstract

Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the potential risk associated with the actions taken, which may be critical in certain applications. To address that issue, the present research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to the risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the $Q$ function generally standing at the core of learning schemes in RL by another function taking into account both the expected return and the risk. Named the risk-based utility function $U$, it can be extracted from the random return distribution $Z$ naturally learnt by any distributional RL algorithm. This makes it possible to span the complete potential trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a truly practical and accessible solution for learning risk-sensitive policies with minimal modification to the distributional RL algorithm, and with an emphasis on the interpretability of the resulting decision-making process.
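To make the core idea concrete, below is a minimal sketch of how a risk-based utility $U$ could be derived from a learnt return distribution $Z$ and used for action selection. It assumes a quantile representation of $Z$ (as in quantile-based distributional RL agents) and a CVaR-style lower-tail risk measure; the function names and the parameters `alpha` (trade-off weight) and `rho` (tail fraction) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def risk_based_utility(z_quantiles, alpha=0.5, rho=0.1):
    """Sketch of a risk-based utility U(s, a) computed from quantile
    estimates of the random return Z(s, a).

    alpha : trade-off weight (0 -> pure expected return, 1 -> fully risk-averse).
    rho   : fraction of the lower tail of Z used as the risk term (CVaR-style).
    Both parameters are hypothetical and only serve to illustrate the trade-off.
    """
    z_quantiles = np.sort(np.asarray(z_quantiles, dtype=float))
    expected_return = z_quantiles.mean()               # estimate of E[Z]
    k = max(1, int(np.ceil(rho * len(z_quantiles))))   # size of the lower tail
    tail_risk = z_quantiles[:k].mean()                 # mean of the worst outcomes
    return (1.0 - alpha) * expected_return + alpha * tail_risk

def greedy_risk_sensitive_action(z_quantiles_per_action, alpha=0.5, rho=0.1):
    """Select the action maximising U instead of the usual Q function."""
    utilities = [risk_based_utility(z, alpha, rho) for z in z_quantiles_per_action]
    return int(np.argmax(utilities))

if __name__ == "__main__":
    # Toy example: a safe action versus a risky action with a higher mean return.
    z_safe = np.random.normal(1.0, 0.2, size=51)
    z_risky = np.random.normal(1.5, 2.0, size=51)
    print(greedy_risk_sensitive_action([z_safe, z_risky], alpha=0.7, rho=0.2))
```

With `alpha = 0` the rule reduces to the usual expectation-maximising (Q-function) policy, while larger values of `alpha` shift the policy towards avoiding poor lower-tail outcomes, which is the trade-off the abstract describes.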
