通过深度强化学习学习2-OPT启发式问题

论文标题

通过深度强化学习学习2-OPT启发式问题

Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning

论文作者

da Costa, Paulo R. de O., Rhuggenaath, Jason, Zhang, Yingqian, Akcay, Alp

论文摘要

最近使用深度学习来解决旅行推销员问题（TSP）的工作重点是学习构建启发式方法。这种方法找到了质量高质量的TSP解决方案，但需要其他程序，例如梁搜索和采样，以改善解决方案并实现最先进的性能。但是，很少有研究集中于改进启发式方法，在这种启发式方法上，在达到近乎最佳的溶液之前，给定的溶液得到了改进。在这项工作中，我们建议通过深入的强化学习来学习基于2-OPT操作员的本地搜索启发式。我们提出了一种策略梯度算法，以学习一个随机策略，以选择2-OPT操作给定当前解决方案。此外，我们引入了一个政策神经网络，该网络利用指向注意力的机制，与以前的作品不同，可以轻松地扩展到更通用的K-OPT动作。我们的结果表明，与以前的最先进的深度学习方法相比，学习策略甚至可以通过随机的初始解决方案改进，并以更快的速度使用近乎最佳的解决方案。

Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which unlike previous works, can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.

下载PDF全文

下载文献需遵守相关版权规定

论文标题