Equipping Black-Box Policies with Model-Based Advice for Stable Nonlinear Control

Paper title

Equipping Black-Box Policies with Model-Based Advice for Stable Nonlinear Control

Authors

Li, Tongxin; Yang, Ruixiao; Qu, Guannan; Lin, Yiheng; Low, Steven; Wierman, Adam

Abstract

Machine-learned black-box policies are ubiquitous for nonlinear control problems. Meanwhile, crude model information is often available for these problems from, e.g., linear approximations of nonlinear dynamics. We study the problem of equipping a black-box control policy with model-based advice for nonlinear control on a single trajectory. We first show a general negative result that a naive convex combination of a black-box policy and a linear model-based policy can lead to instability, even if the two policies are both stabilizing. We then propose an adaptive $λ$-confident policy, with a coefficient $λ$ indicating the confidence in a black-box policy, and prove its stability. With bounded nonlinearity, in addition, we show that the adaptive $λ$-confident policy achieves a bounded competitive ratio when a black-box policy is near-optimal. Finally, we propose an online learning approach to implement the adaptive $λ$-confident policy and verify its efficacy in case studies about the CartPole problem and a real-world electric vehicle (EV) charging problem with data bias due to COVID-19.
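
The negative result in the abstract (a naive convex combination of two stabilizing policies can be unstable) can be illustrated with a small toy. The sketch below is not the paper's construction: it assumes purely linear dynamics $x_{t+1} = A x_t + B u_t$ with $A = 0$ and $B = I$, and the names `policy_a`, `policy_b`, `combined`, and `rollout` are hypothetical stand-ins for the black-box policy, the linear model-based policy, their fixed-$λ$ mixture, and a simulation loop. The adaptive choice of $λ$ proposed in the paper is not specified in the abstract and is not reproduced here.

```python
# Toy illustration only. Assumed dynamics: x_{t+1} = A x_t + B u_t with A = 0, B = I,
# so the next state simply equals the control input. None of these values come from the paper.

def policy_a(x):
    # Stand-in for the "black-box" policy: u = (3 * x2, 0).
    # The closed-loop matrix [[0, 3], [0, 0]] is nilpotent, hence stabilizing.
    return [3.0 * x[1], 0.0]

def policy_b(x):
    # Stand-in for the "model-based" policy: u = (0, 3 * x1).
    # The closed-loop matrix [[0, 0], [3, 0]] is also nilpotent, hence stabilizing.
    return [0.0, 3.0 * x[0]]

def combined(x, lam):
    # Naive convex combination with a fixed confidence coefficient lam in [0, 1].
    ua, ub = policy_a(x), policy_b(x)
    return [lam * a + (1.0 - lam) * b for a, b in zip(ua, ub)]

def rollout(policy, x0, steps):
    # Simulate the assumed dynamics: with A = 0 and B = I, x_{t+1} = u_t.
    x = list(x0)
    for _ in range(steps):
        x = policy(x)
    return x

def norm(x):
    # Infinity norm of the state.
    return max(abs(v) for v in x)

if __name__ == "__main__":
    x0 = [1.0, 1.0]
    print(norm(rollout(policy_a, x0, 10)))                       # each policy alone drives the state to zero
    print(norm(rollout(policy_b, x0, 10)))
    print(norm(rollout(lambda x: combined(x, 0.5), x0, 10)))     # the mixture grows geometrically
```

With `lam = 0.5` the closed-loop matrix is `[[0, 1.5], [1.5, 0]]`, whose eigenvalues are ±1.5, so the state norm grows by a factor of 1.5 per step even though each policy on its own reaches the origin in two steps. This is the kind of failure that motivates choosing $λ$ adaptively rather than fixing it in advance.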
