Coordinated Behavior for Sequential Cooperative Task Using Two-Stage Reward Assignment with Decay

Yuki Miyashita*, Toshiharu Sugawara

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Recently, multi-agent deep reinforcement learning (MADRL) has been studied as a means of learning actions that achieve complicated tasks and of generating coordination structures among agents. In MADRL, reward assignment is a crucial factor: through each agent's individual learning, it guides and produces both the behaviors for the agent's own tasks and the coordinated behaviors. However, the effect of reward assignment in MADRL on the learned coordinated behavior has not been sufficiently clarified. To address this issue, we use a sequential task, the coordinated delivery and execution problem with expiration time, to analyze the effect of various ratios between the reward given for the task an agent is responsible for and the reward given for the whole task. We then propose a two-stage reward assignment with decay that enables agents to learn both the actions for the tasks they are responsible for and coordinated actions that facilitate other agents' tasks. Our experiments showed that the proposed method enabled agents to learn both kinds of action in a balanced manner, so that they realized effective coordination by reducing the number of tasks ignored by other agents. We also analyzed the mechanism behind the emergence of different coordinated behaviors.

Original language: English
Title of host publication: Neural Information Processing - 27th International Conference, ICONIP 2020, Proceedings
Editors: Haiqin Yang, Kitsuchart Pasupa, Andrew Chi-Sing Leung, James T. Kwok, Jonathan H. Chan, Irwin King
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 257-269
Number of pages: 13
ISBN (Print): 9783030638320
Publication status: Published, 2020
Event: 27th International Conference on Neural Information Processing, ICONIP 2020, Bangkok, Thailand
Duration: 2020 Nov 18 to 2020 Nov 22

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12533 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Neural Information Processing, ICONIP 2020
Country/Territory: Thailand
City: Bangkok
Period: 20/11/18 to 20/11/22

Keywords

  • Control and decision theory
  • Cooperation
  • Coordination
  • Multi-agent deep reinforcement learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

