Title

Semi-Supervised Trajectory-Feedback Controller Synthesis for Signal Temporal Logic Specifications

Authors

Karen Leung, Marco Pavone

Abstract

There are spatio-temporal rules that dictate how robots should operate in complex environments, e.g., road rules govern how (self-driving) vehicles should behave on the road. However, seamlessly incorporating such rules into a robot control policy remains challenging especially for real-time applications. In this work, given a desired spatio-temporal specification expressed in the Signal Temporal Logic (STL) language, we propose a semi-supervised controller synthesis technique that is attuned to human-like behaviors while satisfying desired STL specifications. Offline, we synthesize a trajectory-feedback neural network controller via an adversarial training scheme that summarizes past spatio-temporal behaviors when computing controls, and then online, we perform gradient steps to improve specification satisfaction. Central to the offline phase is an imitation-based regularization component that fosters better policy exploration and helps induce naturalistic human behaviors. Our experiments demonstrate that having imitation-based regularization leads to higher qualitative and quantitative performance compared to optimizing an STL objective only as done in prior work. We demonstrate the efficacy of our approach with an illustrative case study and show that our proposed controller outperforms a state-of-the-art shooting method in both performance and computation time.
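The online phase described above performs gradient steps on an STL satisfaction objective. As a minimal illustration (not the paper's implementation), the sketch below refines an open-loop control sequence by gradient ascent on a smoothed robustness value for the STL formula "eventually x ≥ threshold" under single-integrator dynamics. The smoothing via log-sum-exp, the finite-difference gradients, and all function names (`softmin`, `rollout`, `robustness`, `refine`) are illustrative assumptions; a real system would use a learned trajectory-feedback policy and automatic differentiation.

```python
import math

def softmin(xs, beta=10.0):
    # Smooth, differentiable approximation of min (log-sum-exp trick):
    # softmin(xs) = -(1/beta) * log(sum_i exp(-beta * x_i)), shifted for stability.
    m = min(xs)
    return m - math.log(sum(math.exp(-beta * (x - m)) for x in xs)) / beta

def rollout(x0, controls, dt=0.1):
    # Single-integrator dynamics: x_{t+1} = x_t + u_t * dt.
    xs = [x0]
    for u in controls:
        xs.append(xs[-1] + u * dt)
    return xs

def robustness(controls, x0=0.0, threshold=1.0):
    # Smoothed STL robustness of "eventually x >= threshold":
    # max over time of (x_t - threshold), via softmax = -softmin(-values).
    vals = [x - threshold for x in rollout(x0, controls)]
    return -softmin([-v for v in vals])

def refine(controls, steps=50, lr=0.5, eps=1e-4):
    # Online refinement: finite-difference gradient ascent on robustness.
    u = list(controls)
    for _ in range(steps):
        base = robustness(u)
        grad = []
        for i in range(len(u)):
            u[i] += eps
            grad.append((robustness(u) - base) / eps)
            u[i] -= eps
        u = [ui + lr * g for ui, g in zip(u, grad)]
    return u
```

Starting from zero controls (robustness is negative, since x never reaches the threshold), a few dozen gradient steps push the trajectory toward satisfying the specification; in the paper, analogous steps are taken on the controller's outputs online.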
