ADCB: Adaptive Dynamic Clustering of Bandits for Online Recommendation System

Yufeng Wang*, Weidong Zhang, Jianhua Ma, Qun Jin

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


To deal with insufficient feedback and with the dynamics of individual arrivals and item popularity in online recommenders, collaborative multi-armed bandit (MAB) schemes intentionally exploit explicitly known or implicitly inferred social relationships among individuals to recommend collaboratively. In particular, without assuming the social relationships among individuals are given, dynamic clustering of bandits simultaneously infers those relationships and recommends items using the inferred relationships over multiple rounds of interaction. However, existing clustering bandit algorithms have two weaknesses. First, they either fix the number of clusters in advance, or assign two individuals to the same cluster whenever a path exists between the two users in the graph structure, which may lead to users being wrongly clustered. Second, they usually exploit only the cluster's accumulated parameters as the inferred preference of each individual in the cluster, which cannot accurately learn an individual's latent preference. To address the issues above, we propose new clustering-MAB-based online recommendation methods, ADCB and ADCB+, built on adaptively splitting and merging clusters, which incrementally enforce both user-level re-assignment and cluster-level re-adjustment across recommendation rounds to efficiently and effectively learn individuals' preferences and their clustering structure. In particular, the proposed ADCB+ method further exploits both the accumulated cluster preference parameters and each individual's personalized features, adaptively weighting the two influences according to the number of user interactions. Experiments on three real online rating datasets (MovieLens-2k, Delicious-2k, LastFM-2k) consistently show that, in terms of cumulative reward over recommendation rounds and average click-through rate, our proposed ADCB and ADCB+ schemes outperform some existing dynamic-clustering-based online recommendation methods.
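The ADCB+ idea of adaptively weighting cluster-level and user-level preference estimates by interaction count can be illustrated with a minimal sketch. The decay form `1 / (1 + alpha * n)`, the parameter `alpha`, and the LinUCB-style selection rule are illustrative assumptions, not the paper's actual formulas:

```python
import numpy as np

def blended_preference(theta_cluster, theta_user, n_interactions, alpha=1.0):
    """Blend cluster-level and user-level preference estimates.

    The weight on the cluster estimate decays as the user accumulates
    interactions, so cold-start users lean on the cluster's accumulated
    parameters while well-observed users rely on their own estimate.
    (The decay form and alpha are illustrative assumptions.)
    """
    w = 1.0 / (1.0 + alpha * n_interactions)  # weight on the cluster estimate
    return w * np.asarray(theta_cluster) + (1.0 - w) * np.asarray(theta_user)

def select_item(item_features, theta, a_inv, beta=0.5):
    """LinUCB-style arm selection: mean score plus an exploration bonus.

    item_features: (n_items, d) feature matrix; a_inv: (d, d) inverse of
    the regularized design matrix; beta scales the confidence width.
    """
    x = np.asarray(item_features)
    means = x @ theta
    # Per-item confidence width: sqrt(x_i^T A^{-1} x_i)
    bonus = beta * np.sqrt(np.einsum('ij,jk,ik->i', x, a_inv, x))
    return int(np.argmax(means + bonus))
```

For a brand-new user (`n_interactions = 0`) the blend returns the cluster estimate unchanged; as interactions accumulate, it converges to the user's own estimate.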

Original language: English
Pages (from-to): 1155-1172
Number of pages: 18
Journal: Neural Processing Letters
Issue number: 2
Publication status: Published - 2023 Apr


Keywords

  • Dynamic clustering
  • Multi-armed bandit (MAB)
  • Online learning
  • Recommender systems

ASJC Scopus subject areas

  • Software
  • Neuroscience (all)
  • Computer Networks and Communications
  • Artificial Intelligence


