I Know What You Meant: Learning Human Objectives by (Under)estimating Their Choice Set

Paper Title

I Know What You Meant: Learning Human Objectives by (Under)estimating Their Choice Set

Authors

Ananth Jonnavittula, Dylan P. Losey

Abstract

Assistive robots have the potential to help people perform everyday tasks. However, these robots first need to learn what it is their user wants them to do. Teaching assistive robots is hard for inexperienced users, elderly users, and users living with physical disabilities, since often these individuals are unable to show the robot their desired behavior. We know that inclusive learners should give human teachers credit for what they cannot demonstrate. But today's robots do the opposite: they assume every user is capable of providing any demonstration. As a result, these robots learn to mimic the demonstrated behavior, even when that behavior is not what the human really meant! Here we propose a different approach to reward learning: robots that reason about the user's demonstrations in the context of similar or simpler alternatives. Unlike prior works -- which err towards overestimating the human's capabilities -- here we err towards underestimating what the human can input (i.e., their choice set). Our theoretical analysis proves that underestimating the human's choice set is risk-averse, with better worst-case performance than overestimating. We formalize three properties to generate similar and simpler alternatives. Across simulations and a user study, our resulting algorithm better extrapolates the human's objective. See the user study here: https://youtu.be/RgbH2YULVRo
