TY - GEN
T1 - Coordinated Behavior for Sequential Cooperative Task Using Two-Stage Reward Assignment with Decay
AU - Miyashita, Yuki
AU - Sugawara, Toshiharu
N1 - Funding Information:
This work was partly supported by JSPS KAKENHI Grant Numbers 17KT0044 and 20H04245.
Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Recently, multi-agent deep reinforcement learning (MADRL) has been studied to learn the actions required to achieve complicated tasks and to generate coordination structures among agents. Reward assignment in MADRL is a crucial factor in guiding agents, through their individual learning, to produce both the behaviors for their own tasks and coordinated behaviors. However, the effect of reward assignment on learned coordinated behavior has not been sufficiently clarified. To address this issue, using a sequential cooperative task, the coordinated delivery and execution problem with expiration time, we analyze the effect of various ratios between the reward given for the task an agent is responsible for and the reward given for the whole task. We then propose a two-stage reward assignment with decay so that agents learn both the actions for the tasks they are responsible for and the coordinated actions that facilitate other agents’ tasks. We experimentally show that the proposed method enabled agents to learn both types of actions in a balanced manner and thereby realize effective coordination by reducing the number of tasks ignored by other agents. We also analyze the mechanism behind the emergence of different coordinated behaviors.
AB - Recently, multi-agent deep reinforcement learning (MADRL) has been studied to learn the actions required to achieve complicated tasks and to generate coordination structures among agents. Reward assignment in MADRL is a crucial factor in guiding agents, through their individual learning, to produce both the behaviors for their own tasks and coordinated behaviors. However, the effect of reward assignment on learned coordinated behavior has not been sufficiently clarified. To address this issue, using a sequential cooperative task, the coordinated delivery and execution problem with expiration time, we analyze the effect of various ratios between the reward given for the task an agent is responsible for and the reward given for the whole task. We then propose a two-stage reward assignment with decay so that agents learn both the actions for the tasks they are responsible for and the coordinated actions that facilitate other agents’ tasks. We experimentally show that the proposed method enabled agents to learn both types of actions in a balanced manner and thereby realize effective coordination by reducing the number of tasks ignored by other agents. We also analyze the mechanism behind the emergence of different coordinated behaviors.
KW - Control and decision theory
KW - Cooperation
KW - Coordination
KW - Multi-agent deep reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85097395654&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097395654&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-63833-7_22
DO - 10.1007/978-3-030-63833-7_22
M3 - Conference contribution
AN - SCOPUS:85097395654
SN - 9783030638320
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 257
EP - 269
BT - Neural Information Processing - 27th International Conference, ICONIP 2020, Proceedings
A2 - Yang, Haiqin
A2 - Pasupa, Kitsuchart
A2 - Leung, Andrew Chi-Sing
A2 - Kwok, James T.
A2 - Chan, Jonathan H.
A2 - King, Irwin
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Neural Information Processing, ICONIP 2020
Y2 - 18 November 2020 through 22 November 2020
ER -