Shield broken: Black-box adversarial attacks on LLM-based vulnerability detectors
Publication Type
Journal Article
Publication Date
1-2026
Abstract
Vulnerability detection is critical for ensuring software security. Although deep learning (DL) methods, particularly those employing large language models (LLMs), have shown strong performance in automating vulnerability identification, they remain susceptible to adversarial examples, which are carefully crafted inputs with subtle perturbations designed to evade detection. Existing adversarial attack methods often require access to model architectures or confidence scores, making them impractical for real-world black-box systems. In this paper, we propose SVulAttack, a novel label-only adversarial attack framework targeting LLM-based vulnerability detectors. Our key innovation lies in a similarity-based strategy that estimates statement importance and model confidence, thereby enabling more effective selection of semantic-preserving code perturbations. SVulAttack combines this strategy with a transformation component and a search component, based on either greedy or genetic algorithms, to effectively identify and apply optimal combinations of transformations. We evaluate SVulAttack on open-source models (LineVul, StagedVulBERT, Code Llama, Deepseek-Coder) and closed-source models (GPT-5 nano, GPT-4o, GPT-4o-mini, Claude Sonnet 4). Results show that SVulAttack significantly outperforms existing label-only black-box attack methods. For example, against LineVul, our method with the genetic algorithm achieves an attack success rate of 49.0%, improving over DIP and CODA by 150.0% and 240.3%, respectively.
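To make the label-only setting concrete, the sketch below illustrates the general recipe the abstract describes: a genetic search over semantics-preserving code transformations against a detector that exposes only a binary label, with a crude token-overlap similarity standing in for the paper's similarity-based confidence estimate. The toy detector, the transformation set, and all helper names are illustrative assumptions, not the authors' implementation.

    import random

    # Toy label-only detector: stands in for an LLM-based black-box model
    # that returns only a binary vulnerable/safe label, no confidence scores.
    def detect(code: str) -> int:
        return 1 if "strcpy(" in code else 0

    # Illustrative semantics-preserving rewrites (assumed examples,
    # not the paper's transformation set).
    TRANSFORMS = [
        lambda c: c.replace("buf", "tmp_buf"),            # rename identifier
        lambda c: "int unused_flag = 0;\n" + c,           # insert dead code
        lambda c: "#define copy_str strcpy\n"             # hide the call
                  + c.replace("strcpy(", "copy_str("),    # behind a macro
    ]

    def similarity(a: str, b: str) -> float:
        """Crude token-overlap similarity, a stand-in for the paper's
        similarity-based confidence estimate."""
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / max(1, len(ta | tb))

    def fitness(candidate: str, original: str) -> float:
        # Reward flipping the label; break ties by staying close to the original.
        flipped = 1.0 if detect(candidate) != detect(original) else 0.0
        return flipped + similarity(candidate, original)

    def genetic_attack(original: str, pop_size: int = 8,
                       generations: int = 20, seed: int = 0):
        rng = random.Random(seed)
        population = [original] * pop_size
        for _ in range(generations):
            # Mutation: apply one random transformation to each individual.
            population = [rng.choice(TRANSFORMS)(ind) for ind in population]
            # Selection: keep the fitter half, clone it to refill the population.
            population.sort(key=lambda ind: fitness(ind, original), reverse=True)
            population = population[: pop_size // 2] * 2
            if detect(population[0]) != detect(original):
                return population[0]  # label flipped: adversarial example found
        return None

    vulnerable = "void f(char *s) { char buf[8]; strcpy(buf, s); }"
    print(genetic_attack(vulnerable))

In this toy setup the macro rewrite flips the detector's label while leaving program behavior unchanged, which is the kind of evasion the paper pursues at scale with a richer transformation set and similarity estimate.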
Keywords
Codes, Perturbation Methods, Detectors, Closed Box, Security, Predictive Models, Electronics Packaging, Robustness, Genetic Algorithms, Threat Modeling, Adversarial Examples, Vulnerability Detection, Black Box Attack, Optimization Algorithm, Adversarial Attacks, Black Box Adversarial Attack, Confidence Score, Language Model, Source Model, Real World Systems, Attack Success, Attack Methods, Attack Success Rate, Black Box Attacks, Prediction Model, Alternative Models, Search Algorithm, Search Method, Generative Adversarial Networks, Decision Boundary, Target Model, Output Model, Greedy Search Algorithm, Model Confidence, Monte Carlo Tree Search, Data For Model Training, Attack Performance, Number Of Modifications, Lexical Analysis, Artificial Intelligence Training, Code Representation, Random Search
Discipline
Artificial Intelligence and Robotics | Software Engineering
Publication
IEEE Transactions on Software Engineering
Volume
52
First Page
246
Last Page
265
ISSN
0098-5589
Identifier
10.1109/TSE.2025.3638998
Publisher
Institute of Electrical and Electronics Engineers
Citation
JIANG, Yuan; HUANG, Shan; TREUDE, Christoph; SU, Xiaohong; and WANG, Tiantian.
Shield broken: Black-box adversarial attacks on LLM-based vulnerability detectors. (2026). IEEE Transactions on Software Engineering. 52, 246-265.
Available at: https://ink.library.smu.edu.sg/sis_research/10815
Additional URL
https://www.researchgate.net/publication/398260447_Shield_Broken_Black-Box_Adversarial_Attacks_on_LLM-Based_Vulnerability_Detectors