Paper title
Follow the Clairvoyant: an Imitation Learning Approach to Optimal Control
Paper authors
Paper abstract
We consider control of dynamical systems through the lens of competitive analysis. Most prior work in this area focuses on minimizing regret, that is, the loss relative to an ideal clairvoyant policy that has noncausal access to past, present, and future disturbances. Motivated by the observation that the optimal cost only provides coarse information about the ideal closed-loop behavior, we instead propose directly minimizing the tracking error relative to the optimal trajectories in hindsight, i.e., imitating the clairvoyant policy. By embracing a system level perspective, we present an efficient optimization-based approach for computing follow-the-clairvoyant (FTC) safe controllers. We prove that these attain minimal regret if no constraints are imposed on the noncausal benchmark. In addition, we present numerical experiments to show that our policy retains the hallmark of competitive algorithms of interpolating between classical $\mathcal{H}_2$ and $\mathcal{H}_\infty$ control laws, while consistently outperforming regret minimization methods in constrained scenarios thanks to the superior ability to chase the clairvoyant.
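The contrast between regret and tracking error can be sketched in symbols. The notation below is an assumption made for illustration and does not come from the abstract: $\pi$ is a causal policy, $\psi$ the clairvoyant policy, $w$ a disturbance sequence over a horizon $T$, and $J$ a quadratic cost with hypothetical weights $Q$ and $R$. The unweighted squared norm in the tracking error is also an assumption; the paper may weight or aggregate these quantities differently.

```latex
% Sketch in assumed notation; x_t^\pi(w), u_t^\pi(w) denote the state and input
% at time t when policy \pi is applied under the disturbance sequence w.
\begin{align*}
J(\pi, w) &= \sum_{t=0}^{T-1} \Big( x_t^{\pi}(w)^\top Q \, x_t^{\pi}(w)
             + u_t^{\pi}(w)^\top R \, u_t^{\pi}(w) \Big)
  && \text{(closed-loop cost)} \\
\mathrm{Regret}(\pi, w) &= J(\pi, w) - J(\psi, w)
  && \text{(gap between two scalar costs)} \\
\mathrm{Err}(\pi, w) &= \sum_{t=0}^{T-1} \Big( \big\| x_t^{\pi}(w) - x_t^{\psi}(w) \big\|_2^2
             + \big\| u_t^{\pi}(w) - u_t^{\psi}(w) \big\|_2^2 \Big)
  && \text{(gap between trajectories)}
\end{align*}
```

Under this reading, regret compares only two numbers per disturbance sequence, whereas the tracking error compares the closed-loop trajectories at every time step, which is the sense in which the optimal cost gives only coarse information about the ideal behavior.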