TY - GEN
T1 - Edge Encoded Attention Mechanism to Solve Capacitated Vehicle Routing Problem with Reinforcement Learning
AU - Fellek, G.
AU - Gebreyesus, G.
AU - Farid, A.
AU - Fujimura, S.
AU - Yoshie, O.
N1 - Funding Information:
This Research is supported by Japan International Cooperation Agency (JICA) under Kaizen PhD project.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The capacitated vehicle routing problem (CVRP), which is referred as NP-hard problem is a variant of Traveling Salesman Problem (TSP). CVRP constructs the route with the lowest cost without violating vehicle capacity constraints to meet demands of customer nodes. Following the advent of artificial intelligence and deep learning, the use of deep reinforcement learning (DRL) to solve CVRP is giving promising results. In this paper we proposed DRL model to solve CVRP. The transformer-based encoder of our proposed model fuses node and edge information to construct a rich graph embedding. The proposed architecture is trained using proximal policy optimization (PPO). Experiments using randomly generated test instances show that the proposed model gives rise to better results in comparison with the existing DRL methods. In addition, we also tested our model on locally generated real-world data to verify its performance. Accordingly, the results show that our model has a good generalization performance for both of random instance testing to real-world instance testing.
AB - The capacitated vehicle routing problem (CVRP), which is referred as NP-hard problem is a variant of Traveling Salesman Problem (TSP). CVRP constructs the route with the lowest cost without violating vehicle capacity constraints to meet demands of customer nodes. Following the advent of artificial intelligence and deep learning, the use of deep reinforcement learning (DRL) to solve CVRP is giving promising results. In this paper we proposed DRL model to solve CVRP. The transformer-based encoder of our proposed model fuses node and edge information to construct a rich graph embedding. The proposed architecture is trained using proximal policy optimization (PPO). Experiments using randomly generated test instances show that the proposed model gives rise to better results in comparison with the existing DRL methods. In addition, we also tested our model on locally generated real-world data to verify its performance. Accordingly, the results show that our model has a good generalization performance for both of random instance testing to real-world instance testing.
KW - Attention mechanism
KW - transformer
KW - vehicle routing problem
UR - http://www.scopus.com/inward/record.url?scp=85146356121&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146356121&partnerID=8YFLogxK
U2 - 10.1109/IEEM55944.2022.9989997
DO - 10.1109/IEEM55944.2022.9989997
M3 - Conference contribution
AN - SCOPUS:85146356121
T3 - IEEE International Conference on Industrial Engineering and Engineering Management
SP - 576
EP - 582
BT - IEEE International Conference on Industrial Engineering and Engineering Management, IEEM 2022
PB - IEEE Computer Society
T2 - 2022 IEEE International Conference on Industrial Engineering and Engineering Management, IEEM 2022
Y2 - 7 December 2022 through 10 December 2022
ER -