TY - JOUR
T1 - A theoretical analysis of temporal difference learning in the iterated Prisoner's dilemma game
AU - Masuda, Naoki
AU - Ohtsuki, Hisashi
N1 - Funding Information:
N.M. acknowledges the support of the Grants-in-Aid for Scientific Research (No. 20760258) and the Grant-in-Aid for Scientific Research on Priority Areas: Integrative Brain Research (No. 20019012) from MEXT, Japan. H.O. acknowledges the support of the Grants-in-Aid for Scientific Research from JSPS, Japan.
PY - 2009/10
Y1 - 2009/10
N2 - Direct reciprocity is a chief mechanism of mutual cooperation in social dilemmas. Agents cooperate if future interactions with the same opponents are highly likely. Direct reciprocity has been explored mostly by evolutionary game theory based on natural selection. Our daily experience tells us, however, that real social agents, including humans, learn to cooperate based on experience. In this paper, we analyze a reinforcement learning model called temporal difference learning and study its performance in the iterated Prisoner's Dilemma game. Temporal difference learning is unique among a variety of learning models in that it inherently aims at increasing future payoffs, not immediate ones. It also has a neural basis. We analytically and numerically show that learners with only two internal states properly learn to cooperate with retaliatory players and to defect against unconditional cooperators and defectors. Four-state learners are more capable of achieving a high payoff against various opponents. Moreover, we numerically show that four-state learners can learn to establish mutual cooperation for sufficiently small learning rates.
KW - Cooperation
KW - Direct reciprocity
KW - Prisoner's dilemma
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=70350354956&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350354956&partnerID=8YFLogxK
U2 - 10.1007/s11538-009-9424-8
DO - 10.1007/s11538-009-9424-8
M3 - Article
C2 - 19479310
AN - SCOPUS:70350354956
SN - 0092-8240
VL - 71
SP - 1818
EP - 1850
JO - Bulletin of Mathematical Biology
JF - Bulletin of Mathematical Biology
IS - 8
ER -