TY - JOUR
T1 - Beyond similarity assessment
T2 - Selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm
AU - Takeda, Taikai
AU - Hamada, Michiaki
N1 - Funding Information:
This work was supported in part by MEXT KAKENHI [grant numbers JP24680031, JP16H05879 and JP25240044 to M.H.].
Publisher Copyright:
© The Author 2017. Published by Oxford University Press.
PY - 2018/2/15
Y1 - 2018/2/15
N2 - Motivation: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. Results: We developed a novel method to select superior models (including the number of hidden states) for PHMMs. Our method selects models with the highest posterior probability using the Factorized Information Criterion, which is widely utilized in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies. Availability and implementation: The software is available at https://github.com/bigsea-t/fab-phmm. Contact: mhamada@waseda.jp. Supplementary information: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=85042537448&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85042537448&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx643
DO - 10.1093/bioinformatics/btx643
M3 - Article
C2 - 29040374
AN - SCOPUS:85042537448
SN - 1367-4803
VL - 34
SP - 576
EP - 584
JO - Bioinformatics
JF - Bioinformatics
IS - 4
ER -