TY - JOUR
T1 - Inverse Potts model improves accuracy of phylogenetic profiling
AU - Fukunaga, Tsukasa
AU - Iwasaki, Wataru
N1 - Publisher Copyright: © 2022 The Author(s) 2022. Published by Oxford University Press.
PY - 2022/4/1
Y1 - 2022/4/1
AB - Motivation: Phylogenetic profiling is a powerful computational method for revealing the functions of genes of unknown function. Although conventional similarity metrics in phylogenetic profiling achieve high prediction accuracy, they have two estimation biases: an evolutionary bias and a spurious correlation bias. While previous studies reduced the evolutionary bias by considering a phylogenetic tree, few studies have analyzed the spurious correlation bias. Results: To reduce the spurious correlation bias, we developed metrics based on the inverse Potts model (IPM) for phylogenetic profiling. We also developed a metric based on both the IPM and a phylogenetic tree. In an empirical dataset analysis, we demonstrated that these IPM-based metrics improved the prediction performance of phylogenetic profiling. In addition, we found that integrating several metrics, including the IPM-based metrics, outperformed any single metric.
UR - http://www.scopus.com/inward/record.url?scp=85128418312&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128418312&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btac034
DO - 10.1093/bioinformatics/btac034
M3 - Article
C2 - 35060594
AN - SCOPUS:85128418312
SN - 1367-4803
VL - 38
SP - 1794
EP - 1800
JO - Bioinformatics
JF - Bioinformatics
IS - 7
ER -