Research Collection School Of Computing and Information Systems

Enhancing code vulnerability detection via vulnerability-preserving data augmentation

Shangqing LIU
Wei MA
Jian WANG
Xiaofei XIE, Singapore Management University
Ruitao FENG, Singapore Management University
Yang LIU

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2024

Abstract

Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics and simplify the problem into a binary (0-1) classification task, for example, determining whether a piece of code is vulnerable or not. This makes it difficult for a single deep-learning-based model to effectively learn the wide array of vulnerability characteristics. Furthermore, because large-scale vulnerability data is hard to collect, these detectors often overfit limited training datasets, which lowers their generalization performance. To address these challenges, we introduce a fine-grained vulnerability detector named FGVulDet. Unlike previous approaches, FGVulDet employs multiple classifiers to discern the characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability. Each classifier is designed to learn type-specific vulnerability semantics. Additionally, to address the scarcity of data for some vulnerability types and to enhance data diversity for learning better vulnerability semantics, we propose a novel vulnerability-preserving data augmentation technique that increases the number of vulnerability samples. Taking inspiration from recent advances in graph neural networks for learning program semantics, we incorporate a Gated Graph Neural Network (GGNN) and extend it to an edge-aware GGNN to capture edge-type information. FGVulDet is trained on a large-scale dataset from GitHub encompassing five different types of vulnerabilities. Extensive experiments comparing it with static-analysis-based and learning-based approaches demonstrate the effectiveness of FGVulDet.
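
The abstract does not specify how the outputs of the type-specific classifiers are combined. As a purely illustrative sketch, the Python snippet below assumes each classifier emits a probability that the code contains its vulnerability type, and that the combined prediction is the highest-scoring type above a threshold. The function name `combine_type_scores`, the type labels, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional


def combine_type_scores(scores: Dict[str, float],
                        threshold: float = 0.5) -> Optional[str]:
    """Combine per-type classifier scores into one fine-grained prediction.

    Assumption (not from the paper): each score is the probability reported
    by one type-specific binary classifier, and the combined answer is the
    best-scoring type if it reaches `threshold`, otherwise None
    (meaning "no vulnerability detected").
    """
    if not scores:
        return None
    # Pick the type whose classifier is most confident.
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] >= threshold else None


# Hypothetical scores from three type-specific classifiers.
example_scores = {"type_a": 0.12, "type_b": 0.81, "type_c": 0.40}
predicted = combine_type_scores(example_scores)
```

Other combination rules, such as reporting every type above the threshold, would fit the abstract's description equally well.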

Keywords

Graph Neural Networks, Vulnerability Detection

Discipline

Information Security

Research Areas

Cybersecurity

Areas of Excellence

Digital transformation

Publication

LCTES 2024: Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES ’24), June 24, Copenhagen

First Page

166

Last Page

177

ISBN

9798400706165

Identifier

10.1145/3652032.3657564

Publisher

ACM

City or Country

New York

Citation

LIU, Shangqing; MA, Wei; WANG, Jian; XIE, Xiaofei; FENG, Ruitao; and LIU, Yang. Enhancing code vulnerability detection via vulnerability-preserving data augmentation. (2024). LCTES 2024: Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES ’24), June 24, Copenhagen. 166-177.
Available at: https://ink.library.smu.edu.sg/sis_research/9038

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 License.

Additional URL

https://doi.org/10.1145/3652032.3657564

Included in

Information Security Commons
