Policy Advisory Module for Exploration Hindrance Problem in Multi-agent Deep Reinforcement Learning

Jiahao Peng*, Toshiharu Sugawara

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper proposes a method to improve policies trained with multi-agent deep reinforcement learning by adding a policy advisory module (PAM) in the testing phase to mitigate the exploration hindrance problem. Cooperation and coordination are central issues in the study of multi-agent systems, but policies that agents learned in slightly different contexts may lead to ineffective behavior that reduces the quality of cooperation. For example, in a disaster rescue scenario, agents with different functions must work cooperatively while avoiding collisions. In the early stages, all agents work effectively. As time passes and only a few tasks remain, however, agents are likely to focus more on avoiding the negative rewards incurred by collisions, and this avoidance behavior may hinder cooperative actions. To address this problem, we propose a PAM that navigates agents in the testing phase to improve performance. Using an example disaster rescue problem, we investigated whether the PAM could improve overall performance by comparing cases with and without it. Our experimental results show that the PAM could overcome the exploration hindrance problem and improve overall performance by navigating the trained agents.

Original language: English
Title of host publication: PRIMA 2020
Subtitle of host publication: Principles and Practice of Multi-Agent Systems - 23rd International Conference, 2020, Proceedings
Editors: Takahiro Uchiya, Quan Bai, Iván Marsá Maestre
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 133-149
Number of pages: 17
ISBN (Print): 9783030693213
Publication status: Published - 2021
Event: 23rd International Conference on Principles and Practice of Multi-Agent Systems, PRIMA 2020 - Virtual, Online
Duration: 2020 Nov 18 - 2020 Nov 20

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12568 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Principles and Practice of Multi-Agent Systems, PRIMA 2020
City: Virtual, Online
Period: 20/11/18 - 20/11/20

Keywords

  • Cooperation
  • Deep reinforcement learning
  • Disaster rescue
  • Multi-agent system
  • Sequential cooperative task
  • Social dilemma

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
