TY - JOUR
T1 - ADCB: Adaptive Dynamic Clustering of Bandits for Online Recommendation System
AU - Wang, Yufeng
AU - Zhang, Weidong
AU - Ma, Jianhua
AU - Jin, Qun
N1 - Funding Information:
The work was sponsored by the QingLan Project of JiangSu Province and the JiangSu Provincial Key Research and Development Program (No. BE2020084-1).
Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/4
Y1 - 2023/4
N2 - To cope with sparse feedback and the dynamics of user arrival and item popularity in online recommendation, collaborative multi-armed bandit (MAB) schemes exploit explicitly known or implicitly inferred social relationships among users to recommend collaboratively. In particular, without assuming the social relationships are given, dynamic clustering of bandits simultaneously infers those relationships and uses them to recommend items over multiple interactive rounds. However, existing clustering bandit algorithms have two weaknesses. First, they either fix the number of clusters in advance or assign two users to the same cluster whenever a path connects them in the graph structure, which can cluster users incorrectly. Second, they usually exploit only the cluster’s accumulated parameters as the inferred preference of every user in the cluster, which cannot accurately capture an individual’s latent preference. To address these issues, we propose new clustering-MAB-based online recommendation methods, ADCB and ADCB+, which adaptively split and merge clusters, incrementally performing both user-level re-assignment and cluster-level re-adjustment over recommendation rounds to learn users’ preferences and their clustering structure efficiently and effectively. In particular, the proposed ADCB+ method further exploits both the accumulated cluster preference parameters and each user’s personalized features, adaptively weighting the two influences according to the number of user interactions. Experiments on three real online rating datasets (i.e., MovieLens-2k, Delicious-2k, LastFM-2k) consistently show that, in terms of cumulative reward over recommendation rounds and average click-through rate, the proposed ADCB and ADCB+ schemes outperform existing dynamic-clustering-based online recommendation methods.
AB - To cope with sparse feedback and the dynamics of user arrival and item popularity in online recommendation, collaborative multi-armed bandit (MAB) schemes exploit explicitly known or implicitly inferred social relationships among users to recommend collaboratively. In particular, without assuming the social relationships are given, dynamic clustering of bandits simultaneously infers those relationships and uses them to recommend items over multiple interactive rounds. However, existing clustering bandit algorithms have two weaknesses. First, they either fix the number of clusters in advance or assign two users to the same cluster whenever a path connects them in the graph structure, which can cluster users incorrectly. Second, they usually exploit only the cluster’s accumulated parameters as the inferred preference of every user in the cluster, which cannot accurately capture an individual’s latent preference. To address these issues, we propose new clustering-MAB-based online recommendation methods, ADCB and ADCB+, which adaptively split and merge clusters, incrementally performing both user-level re-assignment and cluster-level re-adjustment over recommendation rounds to learn users’ preferences and their clustering structure efficiently and effectively. In particular, the proposed ADCB+ method further exploits both the accumulated cluster preference parameters and each user’s personalized features, adaptively weighting the two influences according to the number of user interactions. Experiments on three real online rating datasets (i.e., MovieLens-2k, Delicious-2k, LastFM-2k) consistently show that, in terms of cumulative reward over recommendation rounds and average click-through rate, the proposed ADCB and ADCB+ schemes outperform existing dynamic-clustering-based online recommendation methods.
KW - Dynamic clustering
KW - Multi-armed bandit (MAB)
KW - Online learning
KW - Recommender systems
UR - http://www.scopus.com/inward/record.url?scp=85134665584&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134665584&partnerID=8YFLogxK
U2 - 10.1007/s11063-022-10931-5
DO - 10.1007/s11063-022-10931-5
M3 - Article
AN - SCOPUS:85134665584
SN - 1370-4621
VL - 55
SP - 1155
EP - 1172
JO - Neural Processing Letters
JF - Neural Processing Letters
IS - 2
ER -