Paper Title
Reinforcement Learning for Low-Thrust Trajectory Design of Interplanetary Missions
Paper Authors
Paper Abstract
This paper investigates the use of Reinforcement Learning for the robust design of low-thrust interplanetary trajectories in the presence of severe disturbances, modeled alternatively as Gaussian additive process noise, observation noise, control actuation errors on thrust magnitude and direction, and possibly multiple missed thrust events. The optimal control problem is recast as a time-discrete Markov Decision Process to comply with the standard formulation of reinforcement learning. An open-source implementation of the state-of-the-art algorithm Proximal Policy Optimization is adopted to carry out the training process of a deep neural network, used to map the spacecraft (observed) states to the optimal control policy. The resulting Guidance and Control Network provides both a robust nominal trajectory and the associated closed-loop guidance law. Numerical results are presented for a typical Earth-Mars mission. First, in order to validate the proposed approach, the solution found in a (deterministic) unperturbed scenario is compared with the optimal one provided by an indirect technique. Then, the robustness and optimality of the obtained closed-loop guidance laws are assessed by means of Monte Carlo campaigns performed in the considered uncertain scenarios. These preliminary results open up new horizons for the use of reinforcement learning in the robust design of interplanetary missions.
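The disturbance model described in the abstract (Gaussian process and observation noise, thrust magnitude/direction actuation errors, and missed thrust events) can be illustrated as a single step of the time-discrete MDP. The sketch below is a minimal illustration, not the paper's implementation: a toy double-integrator stands in for the orbital dynamics, and all noise parameters (`SIGMA_PROC`, `SIGMA_OBS`, `SIGMA_MAG`, `SIGMA_DIR`, `P_MISSED`) are hypothetical values chosen for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical disturbance parameters (illustrative, not from the paper)
SIGMA_PROC = 1e-3   # std of Gaussian additive process noise
SIGMA_OBS = 1e-3    # std of Gaussian observation noise
SIGMA_MAG = 0.05    # relative error on thrust magnitude
SIGMA_DIR = 0.02    # per-axis perturbation of thrust direction
P_MISSED = 0.1      # probability of a missed thrust event per step

def perturb_control(u):
    """Apply actuation errors and a possible missed-thrust event to command u."""
    if rng.random() < P_MISSED:
        return np.zeros_like(u)  # missed thrust: no control is actually applied
    mag = np.linalg.norm(u)
    if mag == 0.0:
        return u
    mag *= 1.0 + rng.normal(0.0, SIGMA_MAG)                 # magnitude error
    direction = u / np.linalg.norm(u)
    direction += rng.normal(0.0, SIGMA_DIR, size=u.shape)   # direction error
    direction /= np.linalg.norm(direction)
    return mag * direction

def step(x, u, dt=1.0):
    """One MDP transition: toy double-integrator in place of orbital dynamics.

    Returns the true next state and the noisy observation the policy sees.
    """
    u_act = perturb_control(u)
    pos, vel = x[:3], x[3:]
    pos = pos + vel * dt
    vel = vel + u_act * dt
    x_next = np.concatenate([pos, vel])
    x_next += rng.normal(0.0, SIGMA_PROC, size=x_next.shape)  # process noise
    obs = x_next + rng.normal(0.0, SIGMA_OBS, size=x_next.shape)  # observation noise
    return x_next, obs
```

In this formulation the closed-loop guidance law maps `obs` (not the true state) to the next control, which is what allows a Monte Carlo campaign to probe robustness: the same trained policy is rolled out many times under independently sampled noise and missed-thrust realizations.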