Publication Type
Journal Article
Version
publishedVersion
Publication Date
6-2020
Abstract
Motivation: Identification of virulence factors (VFs) is critical to the elucidation of bacterial pathogenesis and prevention of related infectious diseases. Current computational methods for VF prediction focus on binary classification or involve only several class(es) of VFs with sufficient samples. However, thousands of VF classes are present in real-world scenarios, and many of them only have a very limited number of samples available. Results: We first construct a large VF dataset, covering 3446 VF classes with 160 495 sequences, and then propose deep convolutional neural network models for VF classification. We show that (i) for common VF classes with sufficient samples, our models can achieve state-of-the-art performance with an overall accuracy of 0.9831 and an F1-score of 0.9803; (ii) for uncommon VF classes with limited samples, our models can learn transferable features from auxiliary data and achieve good performance with accuracy ranging from 0.9277 to 0.9512 and F1-score ranging from 0.9168 to 0.9446 when combined with different predefined features, outperforming traditional classifiers by 1-13% in accuracy and by 1-16% in F1-score. Availability and implementation: All of our datasets are made publicly available at http://www.mgc.ac.cn/VFNet/, and the source code of our models is publicly available at https://github.com/zhengdd0422/VFNet. Supplementary information: Supplementary data are available at Bioinformatics online.
Discipline
Artificial Intelligence and Robotics | OS and Networks
Research Areas
Intelligent Systems and Optimization
Publication
Bioinformatics
Volume
36
Issue
12
First Page
3693
Last Page
3702
ISSN
1367-4803
Identifier
10.1093/bioinformatics/btaa230
Publisher
Oxford University Press (OUP): Policy B - Oxford Open Option B
Citation
ZHENG, Dandan; PANG, Guansong; LIU, Bo; CHEN, Lihong; and YANG, Jian.
Learning transferable deep convolutional neural networks for the classification of bacterial virulence factors. (2020). Bioinformatics. 36, (12), 3693-3702.
Available at: https://ink.library.smu.edu.sg/sis_research/7038
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.