Publication Type
Book Chapter
Version
submittedVersion
Publication Date
7-2018
Abstract
This chapter exploits network-based representations of proteins, called metagraphs, in protein-protein interaction (PPI) networks to identify candidate disease-causing proteins. PPI networks are effective tools for studying the functional roles of proteins in the development of various diseases. However, they are insufficient without additional biological knowledge about proteins, such as their molecular functions and biological processes. To enhance PPI networks, we also utilize the biological properties of individual proteins. More specifically, we integrate keywords from the UniProt database that describe protein properties into the PPI network and construct a novel heterogeneous PPI-Keyword (PPIK) network consisting of both proteins and keywords. As proteins with similar functional duties or involved in the same metabolic pathway tend to have similar topological characteristics, we propose to represent them with metagraphs. Compared to a traditional network motif or subgraph, a metagraph can capture topological arrangements through not only protein-protein interactions but also protein-keyword associations. We feed these novel metagraph representations into classifiers for disease protein prediction and conduct our experiments on three different PPI databases. The experiments show that the proposed method consistently improves disease protein prediction performance across various classifiers, by 15.3% in AUC on average. It outperforms the diffusion-based (e.g., RWR) and module-based baselines by 13.8–32.9% in overall disease protein prediction. In breast cancer protein prediction, it outperforms RWR, PRINCE, and the module-based baselines by 6.6–14.2%. Finally, our predictions also correlate better with literature findings from the PubMed database.
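The abstract describes building a heterogeneous PPIK network and describing each protein by the metagraphs it participates in. The following is a minimal Python sketch under stated assumptions, not the chapter's implementation. The five metagraph shapes (M1 to M5), the functions `build_ppik` and `metagraph_features`, and the toy proteins and keywords are hypothetical illustrations. The chapter's actual metagraph set and matching procedure are not given here. Each protein's count vector is the kind of representation that could then be passed to a standard classifier.

```python
from itertools import combinations


def build_ppik(ppi_edges, keyword_annotations):
    """Build adjacency sets for a heterogeneous PPI-Keyword (PPIK) network.

    ppi_edges: iterable of (protein, protein) pairs (undirected interactions).
    keyword_annotations: dict mapping protein -> iterable of keywords.
    Returns (ppi, kw): ppi[p] is the set of interacting proteins,
    kw[p] is the set of keywords annotated to p.
    """
    ppi = {}
    for a, b in ppi_edges:
        if a == b:
            continue  # ignore self-interactions
        ppi.setdefault(a, set()).add(b)
        ppi.setdefault(b, set()).add(a)
    kw = {p: set(ks) for p, ks in keyword_annotations.items()}
    # Make sure every protein appears in both maps, even with no edges/keywords.
    for p in ppi:
        kw.setdefault(p, set())
    for p in kw:
        ppi.setdefault(p, set())
    return ppi, kw


def metagraph_features(protein, ppi, kw):
    """Count instances of a few hypothetical metagraphs anchored at `protein`."""
    nbrs = ppi[protein]
    my_kw = kw[protein]
    # M1: protein-protein edge (plain PPI degree)
    m1 = len(nbrs)
    # M2: protein-keyword edge (number of annotated keywords)
    m2 = len(my_kw)
    # M3: protein-protein-protein triangle (pairs of neighbors that interact)
    m3 = sum(1 for u, v in combinations(nbrs, 2) if v in ppi[u])
    # M4: P-P-K triangle (interacting partner that shares a keyword),
    #     counted once per (partner, shared keyword) pair
    m4 = sum(len(my_kw & kw[u]) for u in nbrs)
    # M5: P-K-P path without a P-P edge (non-interacting protein sharing a keyword)
    m5 = sum(len(my_kw & kw[u]) for u in kw if u != protein and u not in nbrs)
    return [m1, m2, m3, m4, m5]


# Toy example (hypothetical proteins and keywords).
ppi_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
keywords = {
    "P1": {"Kinase", "Apoptosis"},
    "P2": {"Kinase"},
    "P3": {"Apoptosis"},
    "P4": {"Kinase"},
}
ppi, kw = build_ppik(ppi_edges, keywords)
features = {p: metagraph_features(p, ppi, kw) for p in sorted(ppi)}
# features["P1"] -> [2, 2, 1, 2, 1]
```

Counting keyword-bearing shapes (M4, M5) alongside purely topological ones (M1, M3) is what lets such a representation reflect protein-keyword associations, which an ordinary PPI motif count would miss.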
Keywords
Disease protein prediction, Metagraph, Protein representations, Protein-protein interaction, UniProt keywords
Discipline
Databases and Information Systems | Medicine and Health Sciences
Research Areas
Data Science and Engineering
Publication
Data mining for systems biology: Methods and protocols
Editor
Hiroshi Mamitsuka
First Page
211
Last Page
224
ISBN
9781627031073
Identifier
10.1007/978-1-4939-8561-6_16
Edition
2nd ed
Publisher
Humana Press
City or Country
New York
Citation
KIRCALI ATA, Sezin; FANG, Yuan; WU, Min; LI, Xiao-Li; and XIAO, Xiaokui. Disease gene classification with metagraph representations. (2018). Data mining for systems biology: Methods and protocols. 211-224.
Available at: https://ink.library.smu.edu.sg/sis_research/4230
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/978-1-4939-8561-6_16