A theoretical analysis of temporal difference learning in the iterated Prisoner's dilemma game

Naoki Masuda*, Hisashi Ohtsuki

*Corresponding author for this work

Research output: Article, peer-reviewed

15 citations (Scopus)

Abstract

Direct reciprocity is a chief mechanism of mutual cooperation in social dilemmas. Agents cooperate if future interactions with the same opponents are highly likely. Direct reciprocity has been explored mostly by evolutionary game theory based on natural selection. Everyday experience suggests, however, that real social agents, including humans, learn to cooperate based on experience. In this paper, we analyze a reinforcement learning model called temporal difference learning and study its performance in the iterated Prisoner's Dilemma game. Temporal difference learning is unique among a variety of learning models in that it inherently aims at increasing future payoffs, not immediate ones. It also has a neural basis. We analytically and numerically show that learners with only two internal states properly learn to cooperate with retaliatory players and to defect against unconditional cooperators and defectors. Four-state learners are more capable of achieving a high payoff against various opponents. Moreover, we numerically show that four-state learners can learn to establish mutual cooperation for sufficiently small learning rates.
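
To make the two-state result in the abstract concrete, the following is a minimal sketch, not the authors' model. It assumes tabular Q-learning (one common temporal difference method) with epsilon-greedy exploration, takes the learner's own previous move as its internal state, and uses the conventional payoff values T=5, R=3, P=1, S=0. The names `PAYOFF`, `OPPONENTS`, `train`, and `greedy_policy`, and all parameter values, are hypothetical choices made for illustration.

```python
import random

# Payoff to the learner for (own move, opponent move).
# Assumed conventional values T=5, R=3, P=1, S=0; not taken from the paper.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Hypothetical fixed opponents: each maps the learner's previous move to a reply.
OPPONENTS = {
    "TFT": lambda prev: prev,   # retaliatory: copies the learner's last move
    "ALLC": lambda prev: "C",   # unconditional cooperator
    "ALLD": lambda prev: "D",   # unconditional defector
}


def train(opponent, rounds=100_000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Q-learning against a fixed opponent and return the Q-table.

    The internal state is the learner's own previous move (an assumption),
    so there are two states, "C" and "D".
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in "CD" for a in "CD"}
    state = "C"  # treat the learner as having cooperated before round one
    for _ in range(rounds):
        # Epsilon-greedy choice: explore with probability epsilon.
        if rng.random() < epsilon:
            action = rng.choice("CD")
        else:
            action = max("CD", key=lambda a: q[(state, a)])
        reward = PAYOFF[(action, opponent(state))]
        next_state = action
        # TD update: move Q toward reward plus discounted best future value.
        best_next = max(q[(next_state, a)] for a in "CD")
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Return the move with the highest learned value in each state."""
    return {s: max("CD", key=lambda a: q[(s, a)]) for s in "CD"}
```

Calling `greedy_policy(train(OPPONENTS["TFT"]))` gives the learned move for each state. Under these assumptions, and with a discount factor above 2/3 so that future payoffs outweigh the one-shot temptation to defect, the sketch should settle on cooperation against the tit-for-tat opponent and on defection against the two unconditional opponents, in line with the behavior the abstract describes.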

Original language: English
Pages (from-to): 1818-1850
Number of pages: 33
Journal: Bulletin of Mathematical Biology
Volume: 71
Issue number: 8
Publication status: Published - Oct 2009
Externally published: Yes

ASJC Scopus subject areas

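  • General Neuroscience
  • Immunology
  • General Mathematics
  • General Biochemistry, Genetics and Molecular Biology
  • General Environmental Science
  • Pharmacology
  • General Agricultural and Biological Sciences
  • Computational Theory and Mathematics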
