Now that large-scale protein sequence data are available, deep neural networks can be applied to mine better features from the sequences. Eukaryotic protein subcellular localization prediction, which contributes to the understanding of many biological processes, relies on protein sequences in many automatic prediction methods. Moreover, gene ontology (GO) annotation has been shown to improve the accuracy of subcellular localization prediction. However, experimentally annotated proteins are not always available for the species of interest, whereas they are available for certain species such as human, mouse, and Arabidopsis thaliana. This motivates performing deep learning of GO annotations on the available experimentally annotated proteins and transferring the learned knowledge to subcellular localization prediction for other species. In this paper, we propose a deep protein subcellular localization predictor consisting of a linear classifier and a deep convolutional neural network (CNN) feature extractor. The deep CNN feature extractor is first pre-trained as the shared component of a deep GO annotation predictor and is then transferred to the subcellular localization predictor, where it is fine-tuned using protein localization samples. In this way, we obtain a deep protein subcellular localization predictor enhanced by transfer learning of GO annotation. The proposed method performs well on the Swiss-Prot datasets when the transfer learning uses protein samples from both within and outside the target species. Moreover, it outperforms state-of-the-art traditional methods on benchmark datasets.
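The two-stage design described in the abstract (a shared feature extractor pre-trained for GO annotation, then reused under a linear localization classifier) can be illustrated with a minimal structural sketch. Everything here is an assumption made for illustration: the class names, the single toy convolution layer, the filter count, and the numbers of GO terms and compartments are hypothetical and are not taken from the paper, and both training loops are omitted.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a protein sequence as a list of 20-dimensional one-hot rows."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


class ConvFeatureExtractor:
    """Toy stand-in for the deep CNN: one 1D convolution, ReLU, global max pooling."""

    def __init__(self, n_filters=4, width=3, seed=0):
        rng = random.Random(seed)
        self.width = width
        # filters[f][k][c]: weight of filter f at window offset k for residue channel c
        self.filters = [[[rng.uniform(-0.1, 0.1) for _ in AMINO_ACIDS]
                         for _ in range(width)] for _ in range(n_filters)]

    def __call__(self, seq):
        x = one_hot(seq)
        feats = []
        for filt in self.filters:
            best = 0.0  # starting at 0 applies ReLU before the max over positions
            for start in range(len(x) - self.width + 1):
                act = sum(filt[k][c] * x[start + k][c]
                          for k in range(self.width)
                          for c in range(len(AMINO_ACIDS)))
                best = max(best, act)
            feats.append(best)
        return feats


class LinearHead:
    """Linear classifier mapping extracted features to per-class scores."""

    def __init__(self, n_in, n_out, seed=1):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, feats):
        return [sum(wi * f for wi, f in zip(row, feats)) + b
                for row, b in zip(self.w, self.b)]


class Predictor:
    """A feature extractor followed by a task-specific head."""

    def __init__(self, extractor, head):
        self.extractor = extractor
        self.head = head

    def __call__(self, seq):
        return self.head(self.extractor(seq))


# Stage 1: GO annotation predictor (hypothetical 6 GO terms); pre-training omitted.
go_predictor = Predictor(ConvFeatureExtractor(n_filters=4), LinearHead(4, 6, seed=1))

# Stage 2: reuse the same extractor object under a new linear head
# (hypothetical 3 compartments); fine-tuning on localization samples omitted.
loc_predictor = Predictor(go_predictor.extractor, LinearHead(4, 3, seed=2))

scores = loc_predictor("MKTAYIAKQR")  # one score per hypothetical compartment
```

The point of the sketch is only the parameter sharing: the localization predictor starts from the extractor weights learned on the GO task rather than from a fresh initialization, and only its linear head is new.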
IEEJ Transactions on Electrical and Electronic Engineering
Published - April 2021