Publication Type
Journal Article
Version
publishedVersion
Publication Date
3-2004
Abstract
Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions. But insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on the knowledge of known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins was used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Testing results using this independent evaluation set show a prediction accuracy of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% and 99.9% for snRNA-binding and non-snRNA-binding proteins, indicating a need for a sufficient number of proteins to train SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein-RNA interactions.
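The abstract reports two accuracies per class (for example, one for rRNA-binding and one for non-rRNA-binding proteins) and describes prediction directly from primary sequence. The following is a minimal sketch of those two ideas, not the authors' implementation. The amino acid composition feature is an assumption chosen for illustration, since the abstract does not state which sequence descriptors were used, and the function names `composition` and `class_accuracies` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# sequence maps to a feature vector of the same length.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in a sequence.

    Assumption: a simple composition vector stands in for whatever
    descriptors the original SVM system derived from primary sequence.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def class_accuracies(true_labels, predicted_labels):
    """Return (accuracy on positives, accuracy on negatives).

    Labels are 1 for the binding class and 0 for the non-binding class.
    The first value is the fraction of binding proteins predicted as
    binding; the second is the fraction of non-binding proteins
    predicted as non-binding.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one example of each class")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical toy sequence and labels, not data from the study.
    print(composition("MKRRA")[:3])
    print(class_accuracies([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Reporting the two values separately matters when the classes are unbalanced, as in the independent evaluation set, where non-RNA-binding proteins far outnumber RNA-binding ones: a classifier can score well on the large negative class while missing many positives, which is the pattern the abstract describes for the small snRNA-binding class.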
Keywords
RNA-binding proteins, RNA-protein interactions, rRNA, mRNA, tRNA, snRNA, support vector machine
Discipline
Bioinformatics | Computer Sciences | Life Sciences
Research Areas
Data Science and Engineering
Publication
RNA
Volume
10
Issue
3
First Page
355
Last Page
368
ISSN
1355-8382
Identifier
10.1261/rna.5890304
Publisher
Cold Spring Harbor Laboratory Press
Citation
HAN, Lian Yi; CAI, Cong Zhong; LO, Siaw Ling; CHUNG, Maxey; and CHEN, Yu Zong.
Prediction of RNA-binding proteins from primary sequence by a support vector machine approach. (2004). RNA. 10, (3), 355-368.
Available at: https://ink.library.smu.edu.sg/sis_research/4876
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1261/rna.5890304