TY - GEN
T1 - MotiMul
T2 - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
AU - Mori, Koichi
AU - Ozaki, Haruka
AU - Fukunaga, Tsukasa
N1 - Funding Information:
This work was supported by Japan Society for the Promotion of Science KAKENHI, Grant Numbers 19K20395 and 20H05582 to T.F.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/12/16
Y1 - 2020/12/16
N2 - Sequence motifs play essential roles in intermolecular interactions such as DNA-protein interactions. The discovery of novel sequence motifs is therefore crucial for revealing gene functions. Various bioinformatics tools have been developed for finding sequence motifs, but until now there has been no software based on statistical hypothesis testing with statistically sound multiple testing correction. Existing software therefore could not control for the type-I error rates. This is because, in the sequence motif discovery problem, conventional multiple testing correction methods produce very low statistical power due to overly strict correction. We developed MotiMul, which comprehensively finds significant sequence motifs using statistically sound multiple testing correction. Our key idea is the application of Tarone's correction, which improves the statistical power of the hypothesis test by ignoring hypotheses that never become statistically significant. For the efficient enumeration of the significant sequence motifs, we integrated a variant of the PrefixSpan algorithm with Tarone's correction. Simulation and empirical dataset analysis showed that MotiMul is a powerful method for finding biologically meaningful sequence motifs. The source code of MotiMul is freely available at https://github.com/ko-ichimo-ri/MotiMul.
AB - Sequence motifs play essential roles in intermolecular interactions such as DNA-protein interactions. The discovery of novel sequence motifs is therefore crucial for revealing gene functions. Various bioinformatics tools have been developed for finding sequence motifs, but until now there has been no software based on statistical hypothesis testing with statistically sound multiple testing correction. Existing software therefore could not control for the type-I error rates. This is because, in the sequence motif discovery problem, conventional multiple testing correction methods produce very low statistical power due to overly strict correction. We developed MotiMul, which comprehensively finds significant sequence motifs using statistically sound multiple testing correction. Our key idea is the application of Tarone's correction, which improves the statistical power of the hypothesis test by ignoring hypotheses that never become statistically significant. For the efficient enumeration of the significant sequence motifs, we integrated a variant of the PrefixSpan algorithm with Tarone's correction. Simulation and empirical dataset analysis showed that MotiMul is a powerful method for finding biologically meaningful sequence motifs. The source code of MotiMul is freely available at https://github.com/ko-ichimo-ri/MotiMul.
KW - ChIP-seq data analysis
KW - frequent pattern mining
KW - multiple testing correction
KW - sequence motif
KW - statistical significance
UR - http://www.scopus.com/inward/record.url?scp=85100341040&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100341040&partnerID=8YFLogxK
U2 - 10.1109/BIBM49941.2020.9313598
DO - 10.1109/BIBM49941.2020.9313598
M3 - Conference contribution
AN - SCOPUS:85100341040
T3 - Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
SP - 186
EP - 193
BT - Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
A2 - Park, Taesung
A2 - Cho, Young-Rae
A2 - Hu, Xiaohua Tony
A2 - Yoo, Illhoi
A2 - Woo, Hyun Goo
A2 - Wang, Jianxin
A2 - Facelli, Julio
A2 - Nam, Seungyoon
A2 - Kang, Mingon
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 December 2020 through 19 December 2020
ER -