Shield Broken: Black-Box Adversarial Attacks on LLM-Based Vulnerability Detectors

Publication Type

Journal Article

Publication Date

1-2026

Abstract

Vulnerability detection is critical for ensuring software security. Although deep learning (DL) methods, particularly those employing large language models (LLMs), have shown strong performance in automating vulnerability identification, they remain susceptible to adversarial examples, which are carefully crafted inputs with subtle perturbations designed to evade detection. Existing adversarial attack methods often require access to model architectures or confidence scores, making them impractical for real-world black-box systems. In this paper, we propose SVulAttack, a novel label-only adversarial attack framework targeting LLM-based vulnerability detectors. Our key innovation lies in a similarity-based strategy that estimates statement importance and model confidence, thereby enabling more effective selection of semantic-preserving code perturbations. SVulAttack combines this strategy with a transformation component and a search component, based on either greedy or genetic algorithms, to effectively identify and apply optimal combinations of transformations. We evaluate SVulAttack on open-source models (LineVul, StagedVulBERT, Code Llama, Deepseek-Coder) and closed-source models (GPT-5 nano, GPT-4o, GPT-4o-mini, Claude Sonnet 4). Results show that SVulAttack significantly outperforms existing label-only black-box attack methods. For example, against LineVul, our method with genetic algorithm achieves an attack success rate of 49.0%, improving over DIP and CODA by 150.0% and 240.3%, respectively.
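The abstract's greedy variant can be illustrated with a minimal, hypothetical sketch: a label-only loop that repeatedly applies the semantic-preserving transformation whose result is most dissimilar to the original code (a crude stand-in for the paper's similarity-based importance and confidence estimate). The transformations, the Jaccard similarity proxy, and the toy detector below are illustrative assumptions, not SVulAttack's actual implementation.

```python
# Hypothetical sketch of a label-only greedy attack loop in the spirit of
# SVulAttack. All names and heuristics here are assumptions for illustration.

def rename_var(code):
    # Semantic-preserving perturbation: rename one identifier.
    return code.replace("buf", "dst", 1)

def add_dead_code(code):
    # Semantic-preserving perturbation: append an unreachable statement.
    return code + "\nif (0) { int _unused = 0; }"

TRANSFORMS = [rename_var, add_dead_code]

def token_similarity(a, b):
    """Crude proxy for a similarity-based confidence signal:
    Jaccard overlap between the whitespace token sets of two snippets."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(1, len(ta | tb))

def greedy_attack(code, detector, max_steps=10):
    """Greedily apply the transformation whose output is least similar to
    the original code, until the label-only detector flips its verdict
    (0 = "not vulnerable") or the step budget runs out."""
    current = code
    for _ in range(max_steps):
        if detector(current) == 0:
            return current, True          # attack succeeded
        candidates = [t(current) for t in TRANSFORMS]
        # min similarity == most dissimilar candidate
        current = min(candidates, key=lambda c: token_similarity(c, code))
    return current, detector(current) == 0
```

A toy detector that labels any code mentioning `buf` as vulnerable is flipped by this loop within a few steps, while the applied perturbations leave program semantics unchanged.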

Keywords

Codes, Perturbation Methods, Detectors, Closed Box, Security, Predictive Models, Electronics Packaging, Robustness, Genetic Algorithms, Threat Modeling, Adversarial Examples, Vulnerability Detection, Black Box Attack, Optimization Algorithm, Adversarial Attacks, Black Box Adversarial Attack, Confidence Score, Language Model, Source Model, Real World Systems, Attack Success, Attack Methods, Attack Success Rate, Black Box Attacks, Prediction Model, Alternative Models, Search Algorithm, Search Method, Generative Adversarial Networks, Decision Boundary, Target Model, Output Model, Greedy Search Algorithm, Model Confidence, Monte Carlo Tree Search, Data For Model Training, Attack Performance, Number Of Modifications, Lexical Analysis, Artificial Intelligence Training, Code Representation, Random Search

Discipline

Artificial Intelligence and Robotics | Software Engineering

Publication

IEEE Transactions on Software Engineering

Volume

52

First Page

246

Last Page

265

ISSN

0098-5589

Identifier

10.1109/TSE.2025.3638998

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://www.researchgate.net/publication/398260447_Shield_Broken_Black-Box_Adversarial_Attacks_on_LLM-Based_Vulnerability_Detectors
