TY - JOUR
T1 - Online Learning of Genetic Network Programming and its Application to Prisoner's Dilemma Game
AU - Mabu, Shingo
AU - Hu, Jinglu
AU - Murata, Junichi
AU - Hirasawa, Kotaro
N1 - Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2003/1
Y1 - 2003/1
N2 - A new evolutionary model with a network structure, named Genetic Network Programming (GNP), has been proposed recently. GNP, an extension of GA and GP, represents solutions as a network structure and evolves them by “offline learning” (selection, mutation, and crossover). GNP can memorize past action sequences in its network flow, so it can deal well with Partially Observable Markov Decision Processes (POMDPs). In this paper, to improve the ability of GNP, Q learning, a well-known off-policy TD control algorithm, is introduced for online learning of GNP. Q learning is suitable for GNP because (1) in reinforcement learning, the rewards an agent will receive in the future can be estimated, (2) TD control requires little memory and can learn quickly, and (3) an off-policy method can search for an optimal solution independently of the policy being followed. Finally, in simulations, online learning of GNP is applied to a player of the “Prisoner's dilemma game” and its ability for online adaptation is confirmed.
AB - A new evolutionary model with a network structure, named Genetic Network Programming (GNP), has been proposed recently. GNP, an extension of GA and GP, represents solutions as a network structure and evolves them by “offline learning” (selection, mutation, and crossover). GNP can memorize past action sequences in its network flow, so it can deal well with Partially Observable Markov Decision Processes (POMDPs). In this paper, to improve the ability of GNP, Q learning, a well-known off-policy TD control algorithm, is introduced for online learning of GNP. Q learning is suitable for GNP because (1) in reinforcement learning, the rewards an agent will receive in the future can be estimated, (2) TD control requires little memory and can learn quickly, and (3) an off-policy method can search for an optimal solution independently of the policy being followed. Finally, in simulations, online learning of GNP is applied to a player of the “Prisoner's dilemma game” and its ability for online adaptation is confirmed.
KW - Genetic Algorithm
KW - Genetic Programming
KW - Network Structure
KW - Online learning
KW - Prisoner's dilemma game
KW - Q learning
UR - http://www.scopus.com/inward/record.url?scp=34547270982&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547270982&partnerID=8YFLogxK
U2 - 10.1541/ieejeiss.123.535
DO - 10.1541/ieejeiss.123.535
M3 - Article
AN - SCOPUS:34547270982
SN - 0385-4221
VL - 123
SP - 535
EP - 543
JO - IEEJ Transactions on Electronics, Information and Systems
JF - IEEJ Transactions on Electronics, Information and Systems
IS - 3
ER -