Paper Title

MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

Paper Authors

Markus Peschl, Arkady Zgonnikov, Frans A. Oliehoek, Luciano C. Siebert

Paper Abstract

Inferring reward functions from demonstrations and pairwise preferences are auspicious approaches for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, thus rendering it difficult to trade off different reward functions from multiple experts. We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms into a Pareto-optimal policy. Through maintaining a distribution over scalarization weights, our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies. We empirically demonstrate the effectiveness of MORAL in two scenarios, which model a delivery and an emergency task that require an agent to act in the presence of normative conflicts. Overall, we consider our research a step towards multi-objective RL with learned rewards, bridging the gap between current reward learning and machine ethics literature.
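
The abstract mentions maintaining a distribution over scalarization weights so that a single deep RL agent can be tuned toward different preference trade-offs. The snippet below is a minimal sketch of that scalarization idea, not the authors' implementation: the Dirichlet distribution, the two-objective reward vector, and all variable names are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the MORAL paper's code) of collapsing a
# vector-valued reward into a scalar via weights sampled from a maintained
# distribution, so one policy can be steered toward different trade-offs.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical distribution over scalarization weights for two objectives
# (e.g., a learned norm-based reward and a primary task reward).
dirichlet_alpha = np.array([2.0, 5.0])

def scalarize(reward_vector: np.ndarray, weights: np.ndarray) -> float:
    """Linearly scalarize a multi-objective reward with the given weights."""
    return float(np.dot(weights, reward_vector))

# One interaction step: sample weights, scalarize the vector reward, and pass
# the resulting scalar to a standard deep RL update (omitted here).
weights = rng.dirichlet(dirichlet_alpha)            # sums to 1 by construction
vector_reward = np.array([0.3, 1.0])                # [norm reward, task reward]
scalar_reward = scalarize(vector_reward, weights)
print(f"weights={weights}, scalarized reward={scalar_reward:.3f}")
```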
