Paper Title


PTDE: Personalized Training with Distilled Execution for Multi-Agent Reinforcement Learning

Authors

Yiqun Chen, Hangyu Mao, Jiaxin Mao, Shiguang Wu, Tianle Zhang, Bin Zhang, Wei Yang, Hongxing Chang

Abstract


Centralized Training with Decentralized Execution (CTDE) has emerged as a widely adopted paradigm in multi-agent reinforcement learning, emphasizing the utilization of global information for learning an enhanced joint $Q$-function or centralized critic. In contrast, our investigation delves into harnessing global information to directly enhance individual $Q$-functions or individual actors. Notably, we discover that applying identical global information universally across all agents proves insufficient for optimal performance. Consequently, we advocate for the customization of global information tailored to each agent, creating agent-personalized global information to bolster overall performance. Furthermore, we introduce a novel paradigm named Personalized Training with Distilled Execution (PTDE), wherein agent-personalized global information is distilled into the agent's local information. This distilled information is then utilized during decentralized execution, resulting in minimal performance degradation. PTDE can be seamlessly integrated with state-of-the-art algorithms, leading to notable performance enhancements across diverse benchmarks, including the SMAC benchmark, Google Research Football (GRF) benchmark, and Learning to Rank (LTR) task.
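The distillation step at the heart of PTDE can be sketched in a few lines. The sketch below is purely illustrative, not the paper's architecture: it uses hypothetical linear stand-ins for the learned networks, with made-up dimensions and names (`teacher`, `student`, `proj`). A per-agent teacher maps the global state to agent-personalized information, and a per-agent student is trained to reconstruct that information from the agent's local observation alone, so that execution needs no global state.

```python
# Minimal sketch of PTDE-style distillation (illustrative only; the paper
# uses learned deep networks, not the linear stand-ins below).
import numpy as np

rng = np.random.default_rng(0)
GLOBAL_DIM, LOCAL_DIM, INFO_DIM, N_AGENTS = 16, 8, 4, 3

# Hypothetical fixed maps: each agent's local view and its personalized
# global-information "teacher" (agent-specific, per the paper's finding
# that identical global information for all agents is insufficient).
proj = [rng.normal(size=(GLOBAL_DIM, LOCAL_DIM)) / 4 for _ in range(N_AGENTS)]
teacher = [rng.normal(size=(GLOBAL_DIM, INFO_DIM)) / 4 for _ in range(N_AGENTS)]
# Student networks start from scratch and only ever see local observations.
student = [np.zeros((LOCAL_DIM, INFO_DIM)) for _ in range(N_AGENTS)]

def distill_step(global_state, lr=0.05):
    """One SGD step pulling each agent's student toward its personalized target."""
    total = 0.0
    for i in range(N_AGENTS):
        local_obs = global_state @ proj[i]      # agent i's local observation
        target = global_state @ teacher[i]      # agent-personalized global info
        pred = local_obs @ student[i]           # student's local reconstruction
        err = pred - target
        total += float(np.mean(err ** 2))       # MSE distillation loss
        # Gradient of the MSE w.r.t. the student's weights.
        student[i] -= lr * np.outer(local_obs, 2 * err / INFO_DIM)
    return total / N_AGENTS

losses = [distill_step(rng.normal(size=GLOBAL_DIM)) for _ in range(300)]
```

After training, each agent conditions its policy on `local_obs @ student[i]` instead of the teacher's output, which is what allows decentralized execution with minimal performance degradation.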
