One-shot, Offline and Production-Scalable PID Optimisation with Deep Reinforcement Learning

Paper title

One-shot, Offline and Production-Scalable PID Optimisation with Deep Reinforcement Learning

Authors

Zacharaya Shabka, Michael Enrico, Nick Parsons, Georgios Zervas

Abstract

Proportional-integral-derivative (PID) control underlies more than $97\%$ of automated industrial processes. Controlling these processes effectively with respect to some specified set of performance goals requires finding an optimal set of PID parameters to moderate the PID loop. Tuning these parameters is a long and exhaustive process. A method (patent pending) based on deep reinforcement learning is presented that learns a relationship between generic system properties (e.g. resonance frequency), a multi-objective performance goal and optimal PID parameter values. Performance is demonstrated in the context of a real optical switching product of the foremost manufacturer of such devices globally. Switching is handled by piezoelectric actuators, where switching time and optical loss are derived from the speed and stability of actuator-control processes respectively. The method achieves a $5\times$ improvement in the number of actuators that fall within the most challenging target switching speed, a $\geq 20\%$ improvement in mean switching speed at the same optical loss and a $\geq 75\%$ reduction in performance inconsistency when temperature varies between 5 and 73 degrees Celsius. Furthermore, once trained (which takes $\mathcal{O}(\text{hours})$), the model generates actuator-unique PID parameters in a one-shot inference process that takes $\mathcal{O}(\text{ms})$, in comparison to up to $\mathcal{O}(\text{week})$ required for conventional tuning methods, therefore accomplishing these performance improvements whilst achieving up to a $10^6\times$ speed-up. After training, the method can be applied entirely offline, incurring effectively zero optimisation overhead in production.
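For readers less familiar with what "PID parameters" means in practice, the following is a minimal sketch of a discrete PID loop and of how the choice of gains changes settling speed, the quantity the abstract relates to switching time. It is not the authors' method: the first-order plant, the time constant `tau`, the gain values and the helper names `PIDGains`, `simulate_step_response` and `settling_time` are all assumptions made for illustration, and the real piezoelectric actuators are resonant systems that this toy model does not capture.

```python
from dataclasses import dataclass


@dataclass
class PIDGains:
    """The three tunable PID parameters (illustrative container, not from the paper)."""
    kp: float  # proportional gain
    ki: float  # integral gain
    kd: float  # derivative gain


def simulate_step_response(gains, setpoint=1.0, tau=0.05, dt=0.001, steps=2000):
    """Closed-loop step response of a hypothetical first-order plant.

    Assumed plant: tau * dx/dt = u - x, integrated with forward Euler.
    """
    x = 0.0
    integral = 0.0
    prev_error = setpoint - x
    trajectory = []
    for _ in range(steps):
        error = setpoint - x
        integral += error * dt                  # accumulate the integral term
        derivative = (error - prev_error) / dt  # finite-difference derivative
        u = gains.kp * error + gains.ki * integral + gains.kd * derivative
        x += dt * (u - x) / tau                 # Euler step of the assumed plant
        prev_error = error
        trajectory.append(x)
    return trajectory


def settling_time(trajectory, setpoint=1.0, dt=0.001, band=0.02):
    """Time after which the output stays within +/- band of the setpoint.

    Returns None if the output is still outside the band at the final sample.
    """
    last_outside = -1
    for i, x in enumerate(trajectory):
        if abs(x - setpoint) > band * abs(setpoint):
            last_outside = i
    if last_outside == len(trajectory) - 1:
        return None
    return (last_outside + 1) * dt


if __name__ == "__main__":
    # Two hand-picked gain sets for the same assumed plant.
    for name, gains in [("slow", PIDGains(0.5, 5.0, 0.0)),
                        ("fast", PIDGains(2.0, 40.0, 0.0))]:
        response = simulate_step_response(gains)
        print(name, "settling time [s]:", settling_time(response))
```

In this sketch the two gain sets drive the same plant to the same setpoint, but one settles considerably sooner than the other. Conventional tuning searches for such gains per device; the abstract describes a model that instead predicts them in a single inference step from generic system properties and a performance goal.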
