Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
5-2023
Abstract
Identifying security patches via code commits to allow early warnings and timely fixes for Open Source Software (OSS) has received increasing attention. However, existing detection methods can only identify the presence of a patch (i.e., a binary classification) but fail to pinpoint the vulnerability type. In this work, we take the first step toward categorizing security patches into fine-grained vulnerability types. Specifically, we use the Common Weakness Enumeration (CWE) as the label and perform fine-grained classification using categories at the third level of the CWE tree. We first formulate the task as a Hierarchical Multi-label Classification (HMC) problem, i.e., inferring a path (a sequence of CWE nodes) from the root of the CWE tree to the node at the target depth. We then propose an approach named TreeVul with a hierarchical and chained architecture, which utilizes the structure information of the CWE tree as prior knowledge for the classification task. We further propose a tree-structure-aware, beam-search-based inference algorithm for retrieving the optimal path with the highest merged probability. We collect a large security patch dataset from NVD, consisting of 6,541 commits from 1,560 GitHub OSS repositories. Experimental results show that TreeVul significantly outperforms the best-performing baselines, with improvements of 5.9%, 25.0%, and 7.7% in terms of weighted F1-score, macro F1-score, and MCC, respectively. We further conduct a user study and a case study to verify the practical value of TreeVul in enriching binary patch detection results and in improving the data quality of NVD, respectively.
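The abstract describes inference as retrieving a root-to-target-depth path through the CWE tree with a tree-structure-aware beam search. The sketch below illustrates that general idea under stated assumptions; it is not TreeVul's implementation. The toy tree and probabilities are hypothetical, each depth is assumed to come with a fixed probability distribution over its nodes (TreeVul's chained architecture conditions levels on one another, which is not modeled here), and the "merged probability" is assumed to be the product of per-level probabilities (summed as log-probabilities). Tree awareness is modeled by expanding only the children of each beam's last node, so every returned path is valid in the tree.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy (parent -> children); NOT the real CWE tree.
TREE: Dict[str, List[str]] = {
    "ROOT": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": ["A1a", "A1b"],
    "A2": ["A2a"],
    "B1": ["B1a", "B1b"],
}

# Hypothetical per-depth probabilities, one dict per tree level below ROOT.
LEVEL_PROBS: List[Dict[str, float]] = [
    {"A": 0.6, "B": 0.4},
    {"A1": 0.2, "A2": 0.2, "B1": 0.6},
    {"A1a": 0.1, "A1b": 0.1, "A2a": 0.1, "B1a": 0.5, "B1b": 0.2},
]


def beam_search_path(
    level_probs: List[Dict[str, float]],
    tree: Dict[str, List[str]],
    beam_width: int = 2,
) -> Tuple[List[str], float]:
    """Return the best root-to-leaf path (without ROOT) and its merged probability.

    Assumption: merged score = sum of log-probabilities across levels.
    Only children of each beam's last node are expanded (tree-structure aware).
    """
    beams: List[Tuple[float, List[str]]] = [(0.0, ["ROOT"])]
    for probs in level_probs:
        candidates: List[Tuple[float, List[str]]] = []
        for score, path in beams:
            for child in tree.get(path[-1], []):
                p = probs.get(child, 0.0)
                if p > 0.0:  # skip zero-probability nodes (log undefined)
                    candidates.append((score + math.log(p), path + [child]))
        # Keep the top-k partial paths; sort is stable for ties.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            raise ValueError("no valid path reaches this depth")
    best_score, best_path = beams[0]
    return best_path[1:], math.exp(best_score)


if __name__ == "__main__":
    print(beam_search_path(LEVEL_PROBS, TREE, beam_width=2))
    print(beam_search_path(LEVEL_PROBS, TREE, beam_width=1))  # greedy
```

With these toy numbers, greedy decoding (beam_width=1) commits to "A" at the first level and ends at A → A1 → A1a with merged probability 0.012, whereas a beam of width 2 keeps "B" alive and finds B → B1 → B1a with merged probability 0.12. This shows why searching over whole paths, rather than picking the top node at each level independently, can matter for hierarchical labels.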
Keywords
Codes, Data integrity, Computer architecture, Inference algorithms, Classification algorithms, Software security, Task analysis, Common Weakness Enumeration
Discipline
Artificial Intelligence and Robotics | Information Security
Research Areas
Cybersecurity; Intelligent Systems and Optimization; Software and Cyber-Physical Systems
Publication
Proceedings of the 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20
First Page
957
Last Page
969
ISBN
9781665457026
Identifier
10.1109/ICSE48619.2023.00088
Publisher
IEEE Computer Society
City or Country
New York, NY, USA
Citation
PAN, Shengyi; BAO, Lingfeng; XIA, Xin; LO, David; and LI, Shanping.
Fine-grained commit-level vulnerability type prediction by CWE tree structure. (2023). Proceedings of the 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20. 957-969.
Available at: https://ink.library.smu.edu.sg/sis_research/8511
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/ICSE48619.2023.00088