Research Collection School Of Computing and Information Systems

Effect of training datasets on support vector machine prediction of protein-protein interactions

Publication Type

Journal Article

Version

publishedVersion

Publication Date

3-2005

Abstract

Knowledge of protein-protein interactions is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, the Support Vector Machine (SVM), has recently been explored for predicting protein-protein interactions using artificial shuffled sequences as hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results obtained with real protein sequences against those obtained with shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated from an exclusion analysis combined with the subcellular localization information of interacting proteins found in the Database of Interacting Proteins. Prediction accuracy is 76.9% using real protein sequences, compared with 94.1% using artificial shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be more difficult than separating a set of real protein sequences from a set of artificial sequences. Training an SVM classification system on real protein sequences is expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful for developing SVM classification systems for protein-protein interaction prediction.
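The abstract contrasts two ways of building the noninteracting (negative) training set. The sketch below is a minimal, hypothetical illustration of both, not the authors' pipeline. It assumes that a shuffled negative pairs one protein's real sequence with a residue-shuffled copy of its partner's sequence, and it reads the exclusion analysis as pairing proteins that share no subcellular compartment while skipping any pair already recorded as interacting. The function names, the toy sequences, and the set-of-compartments representation are all assumptions; feature encoding and SVMlight training are not shown.

```python
import random
from itertools import combinations


def shuffled_negative(seq, rng):
    """Randomly permute the residues of seq: same amino-acid composition, no real order."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def shuffled_negative_pairs(positive_pairs, sequences, rng):
    """Shuffled-sequence negatives (assumption: shuffle the second partner of each
    known interacting pair and keep the first partner's real sequence)."""
    return [(sequences[a], shuffled_negative(sequences[b], rng))
            for a, b in positive_pairs]


def localization_negative_pairs(positive_pairs, localization):
    """Real-sequence negatives (assumed reading of the exclusion analysis): pair proteins
    whose compartment sets do not overlap, skipping any pair known to interact."""
    known = {frozenset(pair) for pair in positive_pairs}
    negatives = []
    for a, b in combinations(sorted(localization), 2):
        if localization[a].isdisjoint(localization[b]) and frozenset((a, b)) not in known:
            negatives.append((a, b))
    return negatives


if __name__ == "__main__":
    # Hypothetical toy data; not taken from the Database of Interacting Proteins.
    sequences = {"P1": "MKTAYIAK", "P2": "GSHMLEDP", "P3": "MQRSTNVW"}
    localization = {"P1": {"nucleus"}, "P2": {"nucleus"}, "P3": {"mitochondrion"}}
    positives = [("P1", "P2")]

    print(shuffled_negative_pairs(positives, sequences, random.Random(0)))
    print(localization_negative_pairs(positives, localization))
    # prints [('P1', 'P3'), ('P2', 'P3')]
```

Shuffling preserves amino-acid composition while destroying residue order, which may be one reason shuffled negatives are easier to separate from real sequences than real noninteracting proteins are, in line with the accuracy gap reported above.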

Keywords

Database of interacting proteins, Protein function prediction, Protein-protein interaction prediction, Shuffled sequence, Support vector machine, SVMlight

Discipline

Computer Engineering | Data Storage Systems

Research Areas

Data Science and Engineering

Publication

Proteomics

Volume

5

Issue

4

First Page

876

Last Page

884

ISSN

1615-9853

Identifier

10.1002/pmic.200401118

Publisher

Wiley: 12 months

Citation

LO, Siaw Ling; CAI, Cong Zhong; CHUNG, Maxey; and CHEN, Yu Zong. Effect of training datasets on support vector machine prediction of protein-protein interactions. (2005). Proteomics. 5, (4), 876-884.
Available at: https://ink.library.smu.edu.sg/sis_research/4874

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1002/pmic.200401118

Included in

Data Storage Systems Commons
