Accelerating Grasp Exploration by Leveraging Learned Priors

Paper Title

Accelerating Grasp Exploration by Leveraging Learned Priors

Authors

Li, Han Yu; Danielczuk, Michael; Balakrishna, Ashwin; Satish, Vishal; Goldberg, Ken

Abstract

The ability of robots to grasp novel objects has industry applications in e-commerce order fulfillment and home service. Data-driven grasping policies have achieved success in learning general strategies for grasping arbitrary objects. However, these approaches can fail to grasp objects which have complex geometry or are significantly outside of the training distribution. We present a Thompson sampling algorithm that learns to grasp a given object with unknown geometry using online experience. The algorithm leverages learned priors from the Dexterity Network robot grasp planner to guide grasp exploration and provide probabilistic estimates of grasp success for each stable pose of the novel object. We find that seeding the policy with the Dex-Net prior allows it to more efficiently find robust grasps on these objects. Experiments suggest that the best learned policy attains an average total reward 64.5% higher than a greedy baseline and achieves within 5.7% of an oracle baseline when evaluated over 300,000 training runs across a set of 3000 object poses.
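
The abstract describes Thompson sampling seeded with a learned prior. A minimal sketch of that general idea is given below as a Beta-Bernoulli bandit over a fixed set of candidate grasps. It is an illustration under stated assumptions, not the authors' implementation: the function name `thompson_grasp_search`, the pseudo-count seeding rule, the `prior_strength` parameter, and the simulated success probabilities are all hypothetical, and the prior estimates stand in for whatever a planner such as Dex-Net would supply.

```python
import random


def thompson_grasp_search(true_success, prior_estimates,
                          prior_strength=5.0, steps=500, seed=0):
    """Beta-Bernoulli Thompson sampling over candidate grasps (illustrative).

    true_success:    simulated per-grasp success probabilities (hypothetical).
    prior_estimates: prior success estimates in [0, 1], one per grasp.
    prior_strength:  assumed number of pseudo-observations given to the prior.
    """
    rng = random.Random(seed)
    n = len(true_success)
    # Seed each Beta(alpha, beta) posterior with pseudo-counts from the prior.
    # The leading 1.0 keeps both parameters strictly positive.
    alpha = [1.0 + prior_strength * p for p in prior_estimates]
    beta = [1.0 + prior_strength * (1.0 - p) for p in prior_estimates]
    pulls = [0] * n
    total_reward = 0
    for _ in range(steps):
        # Draw one plausible success rate per grasp, then try the best draw.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        k = max(range(n), key=samples.__getitem__)
        success = rng.random() < true_success[k]
        # Update the chosen grasp's posterior with the observed outcome.
        if success:
            alpha[k] += 1.0
        else:
            beta[k] += 1.0
        pulls[k] += 1
        total_reward += int(success)
    posterior_mean = [a / (a + b) for a, b in zip(alpha, beta)]
    return total_reward, posterior_mean, pulls


if __name__ == "__main__":
    reward, means, pulls = thompson_grasp_search(
        true_success=[0.2, 0.5, 0.9],
        prior_estimates=[0.3, 0.4, 0.8],
    )
    print(reward, [round(m, 2) for m in means], pulls)
```

In this sketch, setting every prior estimate to 0.5 gives all grasps the same symmetric starting posterior, so it can serve as an uninformed comparison for checking how much a seeded prior changes the early choice of grasps.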
