Evolution of cooperation facilitated by reinforcement learning with adaptive aspiration levels

Shoma Tanabe, Naoki Masuda*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)


Repeated interaction between individuals is the main mechanism for maintaining cooperation in social dilemma situations. Variants of tit-for-tat (repeating the previous action of the opponent) and the win-stay lose-shift strategy are known as strong competitors in iterated social dilemma games. On the other hand, real repeated interaction generally allows plasticity (i.e., learning) of individuals based on the experience of the past. Although plasticity is relevant to various biological phenomena, its role in repeated social dilemma games is relatively unexplored. In particular, if experience-based learning plays a key role in promotion and maintenance of cooperation, learners should evolve in the contest with nonlearners under selection pressure. By modeling players using a simple reinforcement learning model, we numerically show that learning enables the evolution of cooperation. We also show that numerically estimated adaptive dynamics appositely predict the outcome of evolutionary simulations. The analysis of the adaptive dynamics enables us to capture the obtained results as an affirmative example of the Baldwin effect, where learning accelerates the evolution to optimality.
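The two benchmark strategies named in the abstract can be made concrete with a minimal sketch of an iterated prisoner's dilemma. This is not the paper's reinforcement learning model. The payoff values (5, 3, 1, 0), the function names, and the fixed "win" threshold used for win-stay lose-shift are illustrative assumptions that do not appear in the abstract.

```python
# Illustrative payoffs (an assumption; the abstract gives no values).
# Key: (my action, opponent's action) -> my payoff. "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Assumed fixed threshold separating a "win" from a "loss" for win-stay lose-shift.
WIN_THRESHOLD = 3


def tit_for_tat(my_prev, opp_prev):
    """Cooperate in the first round, then repeat the opponent's previous action."""
    return "C" if opp_prev is None else opp_prev


def win_stay_lose_shift(my_prev, opp_prev):
    """Cooperate first; keep the previous action after a win, otherwise switch."""
    if my_prev is None:
        return "C"
    if PAYOFF[(my_prev, opp_prev)] >= WIN_THRESHOLD:
        return my_prev
    return "D" if my_prev == "C" else "C"


def always_defect(my_prev, opp_prev):
    """Unconditional defector, used here only as a hypothetical opponent."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Return the list of (action_a, action_b) pairs over the given number of rounds."""
    a_prev = b_prev = None
    history = []
    for _ in range(rounds):
        a = strategy_a(a_prev, b_prev)
        b = strategy_b(b_prev, a_prev)
        history.append((a, b))
        a_prev, b_prev = a, b
    return history


if __name__ == "__main__":
    print(play(tit_for_tat, always_defect, 4))
    print(play(win_stay_lose_shift, always_defect, 4))
```

In this sketch the win threshold acts as a fixed aspiration level. The title indicates that the paper's learners adapt their aspiration levels instead, and that mechanism is not reproduced here.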

Original language: English
Pages (from-to): 151-160
Number of pages: 10
Journal: Journal of Theoretical Biology
Publication status: Published - 2012 Jan 21
Externally published: Yes


Keywords

  • Baldwin effect
  • Iterated prisoner's dilemma

ASJC Scopus subject areas

  • Statistics and Probability
  • Modelling and Simulation
  • General Biochemistry, Genetics and Molecular Biology
  • General Immunology and Microbiology
  • General Agricultural and Biological Sciences
  • Applied Mathematics

