Publication Type
Journal Article
Version
publishedVersion
Publication Date
12-2017
Abstract
Protein-protein interaction (PPI) networks play an important role in studying the functional roles of proteins, including their association with diseases. However, protein interaction networks are not sufficient without the support of additional biological knowledge for proteins such as their molecular functions and biological processes. To complement and enrich PPI networks, we propose to exploit biological properties of individual proteins. More specifically, we integrate keywords describing protein properties into the PPI network, and construct a novel PPI-Keywords (PPIK) network consisting of both proteins and keywords as two different types of nodes. As disease proteins tend to have similar topological characteristics on the PPIK network, we further propose to represent proteins with metagraphs. Unlike a traditional network motif or subgraph, a metagraph can capture a particular topological arrangement involving the interactions/associations between both proteins and keywords. Based on the novel metagraph representations for proteins, we further build classifiers for disease protein classification through supervised learning. Our experiments on three different PPI databases demonstrate that the proposed method consistently improves disease protein prediction across various classifiers, by 15.3% in AUC on average. It outperforms the baselines including the diffusion-based methods (e.g., RWR) and the module-based methods by 13.8–32.9% for overall disease protein prediction. For predicting breast cancer genes, it outperforms RWR, PRINCE and the module-based baselines by 6.6–14.2%. Finally, our predictions also turn out to have better correlations with literature findings from PubMed.
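The PPIK construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_ppik`, `metagraph_features`), the toy proteins and keywords, and the two counted patterns are all assumptions chosen for illustration. The paper's actual metagraph set and classifiers are not specified in this record. The sketch builds a network with two node types and represents each protein by how often it occurs in two simple protein-keyword patterns.

```python
from collections import defaultdict


def build_ppik(ppi_edges, annotations):
    """Build a toy PPIK network from protein-protein edges and
    (protein, keyword) annotations. Returns three adjacency maps."""
    neighbors = defaultdict(set)      # protein -> interacting proteins
    for a, b in ppi_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    keywords_of = defaultdict(set)    # protein -> keywords
    proteins_of = defaultdict(set)    # keyword -> proteins
    for protein, keyword in annotations:
        keywords_of[protein].add(keyword)
        proteins_of[keyword].add(protein)
    return neighbors, keywords_of, proteins_of


def metagraph_features(protein, neighbors, keywords_of, proteins_of):
    """Count instances of two hypothetical patterns anchored at `protein`.

    M1: protein - keyword - other protein (a shared annotation).
    M2: protein interacts with q, and both are annotated with keyword k.
    """
    m1 = sum(len(proteins_of[k]) - 1 for k in keywords_of[protein])
    m2 = sum(len(keywords_of[protein] & keywords_of[q])
             for q in neighbors[protein])
    return [m1, m2]


# Toy data (hypothetical, not taken from any PPI database).
ppi_edges = [("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
annotations = [
    ("P1", "Kinase"), ("P2", "Kinase"), ("P3", "Kinase"),
    ("P3", "Apoptosis"), ("P4", "Apoptosis"),
]

neighbors, keywords_of, proteins_of = build_ppik(ppi_edges, annotations)
features = {
    p: metagraph_features(p, neighbors, keywords_of, proteins_of)
    for p in ["P1", "P2", "P3", "P4"]
}
print(features)
```

In a supervised setting, vectors like these would serve as the input features for a disease/non-disease classifier. That step is omitted here because the record does not state which classifiers or metagraphs were used.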
Keywords
Disease protein prediction, Metagraph, Protein representations, Protein-protein interaction, UniProt keywords
Discipline
Databases and Information Systems | Medicine and Health Sciences
Research Areas
Data Science and Engineering
Publication
Methods
Volume
131
First Page
83
Last Page
92
ISSN
1046-2023
Identifier
10.1016/j.ymeth.2017.06.036
Publisher
Elsevier
Citation
KIRCALI ATA, Sezin; FANG, Yuan; WU, Min; LI, Xiao-Li; and XIAO, Xiaokui.
Disease gene classification with metagraph representations. (2017). Methods. 131, 83-92.
Available at: https://ink.library.smu.edu.sg/sis_research/4068
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1016/j.ymeth.2017.06.036