Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
5-2023
Abstract
Tools that alert developers about library vulnerabilities depend on accurate, up-to-date vulnerability databases, which are maintained by security researchers. These databases record the libraries related to each vulnerability. However, vulnerability reports may not explicitly list every library, and human analysis is required to determine all the relevant libraries. Human analysis may be slow and expensive, which motivates the need for automated approaches. Researchers and practitioners have proposed to automatically identify libraries from vulnerability reports using extreme multi-label learning (XML). While state-of-the-art XML techniques showed promising performance, their experimental settings do not reflect what happens in practice. Previous studies randomly split the vulnerability report data for training and testing their models without considering the chronological order of the reports. As a result, the models may be trained on chronologically newer reports and tested on chronologically older ones. In practice, however, one often receives chronologically new reports, which may be related to previously unseen libraries. Under this practical setting, we observe that the performance of current XML techniques declines substantially, e.g., F1 decreased from 0.7 when the chronological order of vulnerability reports is ignored to 0.24 when it is taken into account. We propose a practical library identification approach, namely Chronos, based on zero-shot learning. The novelty of Chronos is three-fold. First, Chronos fits into the practical pipeline by considering the chronological order of vulnerability reports. Second, Chronos enriches the data of the vulnerability descriptions and labels using a carefully designed data enhancement step. Third, Chronos exploits the temporal ordering of the vulnerability reports using a cache to prioritize prediction of versions of libraries that recently had reports of vulnerabilities.
In our experiments, Chronos achieves an average F1-score of 0.75, 3x better than the best XML-based approach. Data enhancement and the time-aware adjustment improve Chronos over the vanilla zero-shot learning model by 27% in average F1.
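The difference between the random split and the chronological split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reports, the library names, and the helper functions `chronological_split`, `random_split`, and `unseen_labels` are all hypothetical and exist only to show why a time-ordered split can expose libraries that never appear in the training data.

```python
import random
from datetime import date

# Hypothetical toy reports: (publication date, description, affected libraries).
REPORTS = [
    (date(2019, 3, 1), "XSS in template rendering", {"lib-a"}),
    (date(2020, 6, 15), "Unsafe deserialization", {"lib-b"}),
    (date(2021, 1, 9), "Path traversal in archive extraction", {"lib-a", "lib-c"}),
    (date(2022, 8, 30), "SQL injection in query builder", {"lib-d"}),
    (date(2023, 2, 11), "Buffer overflow in image parser", {"lib-e"}),
]


def chronological_split(reports, train_size):
    """Train on the oldest `train_size` reports, test on the newer rest."""
    ordered = sorted(reports, key=lambda report: report[0])
    return ordered[:train_size], ordered[train_size:]


def random_split(reports, train_size, seed=0):
    """Shuffle first, so newer reports may end up in the training set."""
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


def unseen_labels(train, test):
    """Libraries that occur in the test reports but in no training report."""
    seen = set().union(*(report[2] for report in train))
    tested = set().union(*(report[2] for report in test))
    return tested - seen


train, test = chronological_split(REPORTS, train_size=3)
# Every training report predates every test report.
print(max(r[0] for r in train) < min(r[0] for r in test))
# Libraries a model trained on `train` has never seen as labels.
print(sorted(unseen_labels(train, test)))
```

In this toy data, both libraries in the chronological test set are absent from training, which is the situation a zero-shot approach is meant to handle. A random split of the same reports gives no such guarantee about ordering.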
Keywords
Extreme multi-label classification, Library identification, Unseen labels, Vulnerability reports, Zero-shot learning
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems | Graphics and Human Computer Interfaces
Research Areas
Data Science and Engineering; Information Systems and Management
Publication
45th IEEE/ACM International Conference on Software Engineering, ICSE 2023
ISBN
9781665457019
Identifier
10.1109/ICSE48619.2023.00094
City or Country
MELBOURNE, AUSTRALIA
Citation
LYU, Yunbo; CONG, Thanh Le; KANG, Hong Jin; WIDYASARI, Ratnadira; ZHAO, Zhipeng; LE, Xuan-Bach Dinh; LI, Ming; and David LO.
CHRONOS: Time-aware zero-shot identification of libraries from vulnerability reports. (2023). 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023.
Available at: https://ink.library.smu.edu.sg/sis_research/8512
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons