Paper Title
Safe Self-Supervised Learning in Real of Visuo-Tactile Feedback Policies for Industrial Insertion
Authors
Abstract
Industrial insertion tasks are often performed repetitively with parts that are subject to tight tolerances and prone to breakage. Learning an industrial insertion policy in the real world is challenging because collisions between the part and the environment can cause the part to slip or break. In this paper, we present a safe self-supervised method to learn a visuo-tactile insertion policy that is robust to grasp pose variations. The method reduces both human input and collisions between the part and the receptacle. It divides the insertion task into two phases. In the first phase (align), a tactile-based grasp pose estimation model is learned to align the insertion part with the receptacle. In the second phase (insert), a vision-based policy is learned to guide the part into the receptacle. The robot uses force-torque sensing to achieve a safe self-supervised data collection pipeline. Physical experiments on the USB insertion task from the NIST Assembly Taskboard suggest that the resulting policies can achieve 45/45 insertion successes on 45 different initial grasp poses, improving on two baselines: (1) a behavior cloning agent trained on 50 human insertion demonstrations (1/45) and (2) an online RL policy (TD3) trained in the real world (0/45).
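The abstract does not give implementation details, so the following is only a minimal sketch of how force-torque sensing could guard a self-supervised data collection loop. It is not the authors' pipeline. Every name here (`Attempt`, `guarded_attempt`, `toy_force`, `collect`), the 5 N force limit, and the toy force model are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    offset_mm: float     # hypothetical lateral misalignment of the part
    success: bool        # self-generated label: did the motion finish?
    peak_force_n: float  # largest force reading seen during the attempt


def guarded_attempt(offset_mm: float,
                    read_force: Callable[[float, int], float],
                    force_limit_n: float,
                    steps: int = 10) -> Attempt:
    """Advance step by step and abort once the force limit is exceeded."""
    peak = 0.0
    for step in range(steps):
        force = read_force(offset_mm, step)
        peak = max(peak, force)
        if force > force_limit_n:
            # Stop early so the part is not pushed harder into the receptacle.
            return Attempt(offset_mm, False, peak)
    return Attempt(offset_mm, True, peak)


def toy_force(offset_mm: float, step: int) -> float:
    """Toy stand-in for a force-torque sensor (assumption, not real data):
    contact force grows with misalignment and with insertion depth."""
    return abs(offset_mm) * (step + 1)


def collect(offsets: List[float], force_limit_n: float = 5.0) -> List[Attempt]:
    """Label each attempt automatically, with no human in the loop."""
    return [guarded_attempt(o, toy_force, force_limit_n) for o in offsets]


if __name__ == "__main__":
    for attempt in collect([0.0, 0.2, 2.0]):
        print(attempt)
```

In this sketch the label for each attempt comes from the robot's own sensing rather than from a human, which is the sense in which the collection is self-supervised. The early abort is what keeps contact forces bounded during data collection.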