Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment

Paper title

Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment

Authors

Liu, Huihan; Nasiriany, Soroush; Zhang, Lance; Bao, Zhiyao; Zhu, Yuke

Abstract

With the rapid growth of computing power and recent advances in deep learning, we have witnessed impressive demonstrations of novel robot capabilities in research settings. Nonetheless, these learning systems exhibit brittle generalization and require excessive training data for practical tasks. To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work. In this framework, partially autonomous robots are tasked with handling the major portion of decision-making where they work reliably; meanwhile, human operators monitor the process and intervene in challenging situations. Such a human-robot team ensures safe deployments in complex tasks. Further, we introduce a new learning algorithm to improve the policy's performance on the data collected from the task executions. The core idea is re-weighting training samples with approximated human trust and optimizing the policies with weighted behavioral cloning. We evaluate Sirius in simulation and on real hardware, showing that Sirius consistently outperforms baselines over a collection of contact-rich manipulation tasks, achieving an 8% boost in simulation and a 27% boost on real hardware over the state-of-the-art methods in policy success rate, while converging twice as fast and reducing memory size by 85%. Videos and more details are available at https://ut-austin-rpl.github.io/sirius/
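The abstract names weighted behavioral cloning over re-weighted samples but gives no formulas, so the following is only a minimal sketch of the general idea and not the Sirius implementation. The source labels, the weight values in `ASSUMED_WEIGHTS`, and the one-dimensional linear policy are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Hypothetical weights: the abstract does not state the actual values or
# how human trust is approximated. Intervention samples simply count more.
ASSUMED_WEIGHTS = {"robot": 1.0, "human_intervention": 3.0}


def sample_weights(labels: Sequence[str]) -> List[float]:
    """Map each sample's source label to a weight, normalized to sum to 1."""
    raw = [ASSUMED_WEIGHTS[label] for label in labels]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_bc_loss(
    predicted: Sequence[float],
    demonstrated: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted squared-error imitation loss over scalar actions."""
    return sum(
        w * (p - a) ** 2 for p, a, w in zip(predicted, demonstrated, weights)
    )


def fit_linear_policy(
    states: Sequence[float],
    actions: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Closed-form minimizer of the weighted loss for a policy a = k * s."""
    numerator = sum(w * s * a for s, a, w in zip(states, actions, weights))
    denominator = sum(w * s * s for s, w in zip(states, weights))
    return numerator / denominator


if __name__ == "__main__":
    labels = ["robot", "robot", "human_intervention"]
    states = [1.0, 2.0, 1.0]
    actions = [1.0, 2.0, 3.0]
    weights = sample_weights(labels)
    gain = fit_linear_policy(states, actions, weights)
    predictions = [gain * s for s in states]
    print(gain, weighted_bc_loss(predictions, actions, weights))
```

In this toy data the two robot samples follow a = s while the intervention sample follows a = 3s, so raising the intervention weight pulls the fitted gain toward the human's correction compared with an unweighted fit.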

Paper title