TY - GEN
T1 - Policy Advisory Module for Exploration Hindrance Problem in Multi-agent Deep Reinforcement Learning
AU - Peng, Jiahao
AU - Sugawara, Toshiharu
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - This paper proposes a method to improve policies trained with multi-agent deep reinforcement learning by adding a policy advisory module (PAM) in the testing phase to relax the exploration hindrance problem. Cooperation and coordination are central issues in the study of multi-agent systems, but agents’ policies learned in slightly different contexts may lead to ineffective behavior that reduces the quality of cooperation. For example, in a disaster rescue scenario, agents with different functions must work cooperatively while avoiding collisions. In the early stages, all agents work effectively, but as time passes and only a few tasks remain, agents tend to focus more on avoiding the negative rewards caused by collisions, and this avoidance behavior may hinder cooperative actions. To address this problem, we propose a PAM that navigates agents in the testing phase to improve performance. Using an example disaster rescue problem, we investigated whether the PAM could improve overall performance by comparing cases with and without it. Our experimental results show that the PAM overcame the exploration hindrance problem and improved overall performance by navigating the trained agents.
AB - This paper proposes a method to improve policies trained with multi-agent deep reinforcement learning by adding a policy advisory module (PAM) in the testing phase to relax the exploration hindrance problem. Cooperation and coordination are central issues in the study of multi-agent systems, but agents’ policies learned in slightly different contexts may lead to ineffective behavior that reduces the quality of cooperation. For example, in a disaster rescue scenario, agents with different functions must work cooperatively while avoiding collisions. In the early stages, all agents work effectively, but as time passes and only a few tasks remain, agents tend to focus more on avoiding the negative rewards caused by collisions, and this avoidance behavior may hinder cooperative actions. To address this problem, we propose a PAM that navigates agents in the testing phase to improve performance. Using an example disaster rescue problem, we investigated whether the PAM could improve overall performance by comparing cases with and without it. Our experimental results show that the PAM overcame the exploration hindrance problem and improved overall performance by navigating the trained agents.
KW - Cooperation
KW - Deep reinforcement learning
KW - Disaster rescue
KW - Multi-agent system
KW - Sequential cooperative task
KW - Social dilemma
UR - http://www.scopus.com/inward/record.url?scp=85102778943&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102778943&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-69322-0_9
DO - 10.1007/978-3-030-69322-0_9
M3 - Conference contribution
AN - SCOPUS:85102778943
SN - 9783030693213
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 133
EP - 149
BT - PRIMA 2020
A2 - Uchiya, Takahiro
A2 - Bai, Quan
A2 - Marsá Maestre, Iván
PB - Springer Science and Business Media Deutschland GmbH
T2 - 23rd International Conference on Principles and Practice of Multi-Agent Systems, PRIMA 2020
Y2 - 18 November 2020 through 20 November 2020
ER -