TY - JOUR
T1 - Discovering novel mutation signatures by latent Dirichlet allocation with variational Bayes inference
AU - Matsutani, Taro
AU - Ueno, Yuki
AU - Fukunaga, Tsukasa
AU - Hamada, Michiaki
N1 - Funding Information:
This work was supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) [KAKENHI grant numbers JP18KT0016, JP17K20032, JP16H05879, JP16H01318 and JP16H02484 to M.H.], by JST CREST (grant number JPMJCR1881), Japan, and by a Waseda University Grant for Special Research Projects (project number 2017A-506).
Publisher Copyright:
© The Author(s) 2019. Published by Oxford University Press.
PY - 2019/11/1
Y1 - 2019/11/1
AB - A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. Each mutational process is known to produce characteristic mutations, and the pattern of mutations that a process preferentially generates is called a 'mutation signature.' Identification of mutation signatures is an important task for elucidation of carcinogenic mechanisms. In previous studies, analyses with statistical approaches (e.g. non-negative matrix factorization and latent Dirichlet allocation) revealed a number of mutation signatures. Nonetheless, strictly speaking, these existing approaches employ an ad hoc method or an incorrect approximation to estimate the number of mutation signatures, and the whole picture of mutation signatures remains unclear. Results: In this study, we present a novel method for estimating the number of mutation signatures, latent Dirichlet allocation with variational Bayes inference (VB-LDA), in which variational lower bounds are used to find a plausible number of mutation patterns. In addition, we performed cluster analyses of the estimated mutation signatures to extract novel mutation signatures that appear in multiple primary lesions. In a simulation with artificial data, we confirmed that our method estimated the correct number of mutation signatures. Furthermore, applying our method in combination with clustering procedures to real mutation data revealed many interesting mutation signatures that have not been previously reported.
UR - http://www.scopus.com/inward/record.url?scp=85074964056&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074964056&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btz266
DO - 10.1093/bioinformatics/btz266
M3 - Article
C2 - 30993319
AN - SCOPUS:85074964056
SN - 1367-4803
VL - 35
SP - 4543
EP - 4552
JO - Bioinformatics
JF - Bioinformatics
IS - 22
ER -