TY - JOUR
T1 - MCIC
T2 - Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence
AU - Foroozandeh Shahraki, Mehdi
AU - Ariaeenejad, Shohreh
AU - Fallah Atanaki, Fereshteh
AU - Zolfaghari, Behrouz
AU - Koshiba, Takeshi
AU - Kavousi, Kaveh
AU - Salekdeh, Ghasem Hosseini
N1 - Funding Information:
We thank the members of the Laboratory of Complex Biological Systems and Bioinformatics (CBB) who contributed to this study. Funding: This research was supported by grants from the Agricultural Biotechnology Research Institute of Iran (ABRII).
Publisher Copyright:
© 2020 Foroozandeh Shahraki, Ariaeenejad, Fallah Atanaki, Zolfaghari, Koshiba, Kavousi and Salekdeh.
PY - 2020/10/23
Y1 - 2020/10/23
AB - As the availability of high-throughput metagenomic data increases, agile and accurate tools are required to analyze and exploit this valuable and plentiful resource. Cellulose-degrading enzymes have various applications, and finding appropriate cellulases for different purposes is becoming increasingly challenging. An in silico screening method for high-throughput data can be of great assistance when combined with the characterization of thermal and pH dependence. By this means, various metagenomic sources with high cellulolytic potential can be explored. Using a sequence similarity-based annotation and an ensemble of supervised learning algorithms, this study aims to identify cellulolytic enzymes in given high-throughput metagenomic data and characterize them based on their optimum temperature and pH. The prediction performance of MCIC (metagenome cellulase identification and characterization) was evaluated through multiple iterations of sixfold cross-validation tests. The tool was also applied in a comparative analysis of four metagenomic sources to estimate their cellulolytic profiles and capabilities. For experimental validation of MCIC's screening and prediction abilities, two enzymes identified from cattle rumen were cloned, expressed, and characterized. To the best of our knowledge, this is the first time that a sequence similarity-based method has been used alongside an ensemble machine learning model to identify and characterize cellulase enzymes from extensive metagenomic data. This study highlights the strength of machine learning techniques in predicting enzymatic properties based solely on sequence. MCIC is freely available as a Python package and as a standalone toolkit for Windows and Linux-based operating systems, with several functions that facilitate the screening of cellulases and the prediction of their thermal and pH dependence.
KW - MCIC
KW - cellulase
KW - enzyme screening
KW - machine learning
KW - metagenomics
KW - optimum pH
KW - optimum temperature
UR - http://www.scopus.com/inward/record.url?scp=85095603196&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095603196&partnerID=8YFLogxK
U2 - 10.3389/fmicb.2020.567863
DO - 10.3389/fmicb.2020.567863
M3 - Article
AN - SCOPUS:85095603196
SN - 1664-302X
VL - 11
JO - Frontiers in Microbiology
JF - Frontiers in Microbiology
M1 - 567863
ER -