Paper Title
Multi-objective Pointer Network for Combinatorial Optimization
Paper Authors
Paper Abstract
Multi-objective combinatorial optimization problems (MOCOPs), a class of complex optimization problems, arise widely in real-world applications. Although meta-heuristics have been applied successfully to MOCOPs, their computation time is often long. Recently, a number of deep reinforcement learning (DRL) methods have been proposed to generate near-optimal solutions to combinatorial optimization problems; however, existing DRL studies have seldom focused on MOCOPs. This study proposes a single-model deep reinforcement learning framework, called the multi-objective Pointer Network (MOPN), in which the input structure of the Pointer Network (PN) is improved so that a single PN can solve MOCOPs. In addition, two training strategies, based on a representative model and on transfer learning, respectively, are proposed to further enhance the performance of MOPN in different application scenarios. Moreover, unlike classical meta-heuristics, MOPN requires only a forward propagation to obtain the Pareto front, which takes far less time. MOPN is also insensitive to problem scale, meaning that a trained MOPN can address MOCOPs of different scales. To verify its performance, extensive experiments are conducted on three multi-objective traveling salesman problems, comparing MOPN with a state-of-the-art DRL model, DRL-MOA, and three classical multi-objective meta-heuristics. Experimental results demonstrate that the proposed model outperforms all the comparative methods while requiring only 20\% to 40\% of the training time of DRL-MOA.
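The abstract does not spell out how the PN's input structure is modified. A common way to let a single network cover a MOCOP's Pareto front is to condition each input on a preference (weight) vector over the objectives and train against the corresponding scalarized reward. The sketch below illustrates that idea for a bi-objective TSP; the function names (`augment_input`, `scalarized_cost`), the feature layout, and the weighted-sum scalarization are illustrative assumptions, not the paper's confirmed design.

```python
import numpy as np

def augment_input(cities: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Append a preference weight vector to every city's feature row.

    cities:  (n, d) node features for a bi-objective TSP instance,
             e.g. two coordinate pairs, one per objective -> d = 4.
    weights: (m,) convex combination over the m objectives.
    Returns: (n, d + m) features, so a single pointer network can be
             conditioned on the trade-off it should optimize.
    """
    tiled = np.tile(weights, (cities.shape[0], 1))
    return np.concatenate([cities, tiled], axis=1)

def tour_lengths(cities: np.ndarray, tour: np.ndarray) -> np.ndarray:
    """Length of a closed tour under each objective.

    Assumes columns 0-1 are coordinates for objective 1 and
    columns 2-3 are coordinates for objective 2 (a hypothetical layout).
    """
    lengths = []
    for k in range(2):
        coords = cities[:, 2 * k : 2 * k + 2][tour]
        diffs = coords - np.roll(coords, -1, axis=0)  # includes return edge
        lengths.append(np.linalg.norm(diffs, axis=1).sum())
    return np.asarray(lengths)

def scalarized_cost(cities, tour, weights):
    """Weighted-sum scalarization usable as the RL cost signal."""
    return float(weights @ tour_lengths(cities, tour))

# Toy usage: one random instance, one random tour, one preference vector.
rng = np.random.default_rng(0)
cities = rng.random((20, 4))        # 20 cities, 2 objectives
weights = np.array([0.7, 0.3])      # favor objective 1
x = augment_input(cities, weights)  # (20, 6) network input
tour = rng.permutation(20)
print(x.shape, scalarized_cost(cities, tour, weights))
```

Under this scheme, sweeping the weight vector at inference time yields one solution per preference from the same trained model, which is consistent with the abstract's claim that a single PN produces the Pareto front via forward propagation alone.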