TY - JOUR
T1 - A semi-supervised learning approach for RNA secondary structure prediction
AU - Yonemoto, Haruka
AU - Asai, Kiyoshi
AU - Hamada, Michiaki
N1 - Funding Information:
This work was supported in part by MEXT KAKENHI (Grant-in-Aid for Young Scientists (A) Grant Number 24680031 for MH; Grant-in-Aid for Scientific Research (A) Grant Number 25240044 for MH and KA).
Publisher Copyright:
© 2015 Elsevier Ltd. All rights reserved.
PY - 2015/5/16
Y1 - 2015/5/16
N2 - RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited.
KW - Parameter learning
KW - RNA secondary structure
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=84939599043&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939599043&partnerID=8YFLogxK
U2 - 10.1016/j.compbiolchem.2015.02.002
DO - 10.1016/j.compbiolchem.2015.02.002
M3 - Article
C2 - 25748534
AN - SCOPUS:84939599043
SN - 1476-9271
VL - 57
SP - 72
EP - 79
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
ER -