Paper Title

Visually Grounding Language Instruction for History-Dependent Manipulation

Paper Authors

Hyemin Ahn, Obin Kwon, Kyoungdo Kim, Jaeyeon Jeong, Howoong Jun, Hongjung Lee, Dongheui Lee, Songhwai Oh

Paper Abstract

This paper emphasizes the importance of a robot's ability to refer to its task history, especially when it executes a series of pick-and-place manipulations by following language instructions given one by one. The advantages of referring to the manipulation history are twofold: (1) language instructions that omit details but use expressions referring to the past can be interpreted, and (2) the visual information of objects occluded by previous manipulations can be inferred. For this, we introduce a history-dependent manipulation task whose objective is to visually ground a series of language instructions for proper pick-and-place manipulations by referring to the past. We also propose a relevant dataset and a model that can serve as a baseline, and show that our model trained on the proposed dataset can also be applied to the real world using CycleGAN. Our dataset and code are publicly available on the project website: https://sites.google.com/view/history-dependent-manipulation.
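To make the task formulation concrete, below is a minimal, hypothetical Python sketch of what a history-dependent grounding interface could look like: each round pairs an image with an instruction, the model resolves it to pick-and-place pixel coordinates, and the full history is kept so later instructions that refer to the past (e.g., "put it back where it was") have the context they need. The names GroundingStep, HistoryDependentGrounder, and ground are illustrative assumptions, not the authors' API; the actual model and dataset are on the project website.

```python
# Hypothetical sketch of a history-dependent grounding interface.
# Class and method names here are illustrative assumptions, not the authors' code.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import numpy as np


@dataclass
class GroundingStep:
    """One round of interaction: a table-top image and the instruction given for it."""
    image: np.ndarray                          # RGB observation, e.g. shape (H, W, 3)
    instruction: str                           # language command, possibly referring to the past
    pick_xy: Optional[Tuple[int, int]] = None  # grounded pick location in pixel coordinates
    place_xy: Optional[Tuple[int, int]] = None # grounded place location in pixel coordinates


@dataclass
class HistoryDependentGrounder:
    """Keeps the full manipulation history so later instructions can be
    resolved against earlier steps (omitted details, occluded objects)."""
    history: List[GroundingStep] = field(default_factory=list)

    def ground(self, image: np.ndarray, instruction: str) -> GroundingStep:
        # A real model would fuse the instruction, the current image, and
        # self.history (e.g. with visual/language encoders and attention over
        # past steps). Here we return dummy coordinates to show the data flow.
        h, w = image.shape[:2]
        step = GroundingStep(image=image, instruction=instruction,
                             pick_xy=(w // 2, h // 2), place_xy=(w // 4, h // 4))
        self.history.append(step)  # this step becomes context for the next instruction
        return step


if __name__ == "__main__":
    grounder = HistoryDependentGrounder()
    frame = np.zeros((240, 320, 3), dtype=np.uint8)
    grounder.ground(frame, "Pick up the red block and place it on the tray.")
    # The second instruction omits the object and refers to the previous step.
    out = grounder.ground(frame, "Now put it back where it was.")
    print(out.pick_xy, out.place_xy, len(grounder.history))
```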
