Cost-effective adversarial attacks against Code LLM with model attention

Publication Type

Journal Article

Publication Date

February 2026

Abstract

Code LLMs (CLLMs) are vulnerable to adversarial attacks, where semantically equivalent code mutations mislead models into incorrect predictions. To address this, adversarial training has been proposed, which retrains models on adversarial examples generated by attack methods. Among various attack approaches, black-box methods have attracted increasing attention due to their flexibility and applicability. However, existing black-box attack methods face two key challenges: 1) vast mutation spaces limit attack efficiency and effectiveness, and 2) resource-intensive model queries constrain scalability. These challenges hinder the practicality of black-box attacks, especially under resource constraints, prompting the critical question: Can we enhance the efficiency of existing attack methods without compromising their effectiveness? To answer this, we conduct an empirical study using Explainable AI (XAI) techniques to investigate differences between adversarial and non-adversarial (failed) examples. After analyzing state-of-the-art attack methods against two CLLMs, we introduce the concept of model attention deviation, which quantifies differences in the model’s focus between unmutated (original) and mutated code. Our findings reveal that adversarial examples exhibit significant attention deviations, with the direction of deviation critically affecting attack success. Building on these insights, we propose ADVSEL, an efficient adversarial attack framework comprising two proxy components: the Attention Proxy Model (APM), which quickly estimates attention deviations to filter unpromising mutations, and the Deviation Direction Proxy Model (DDPM), which assesses whether attention shifts lead toward incorrect predictions. By integrating these proxy models with existing attack methods, ADVSEL effectively prioritizes promising mutations, significantly improving attack efficiency. Experimental evaluations across five CLLMs, four downstream tasks, and three attack methods demonstrate that ADVSEL maintains comparable attack success rates (a slight ASR reduction of 0.62%–0.70%) while significantly reducing model queries (by 34.98%–42.91%) and runtime (by 20.84%–21.45%). Under resource constraints, ADVSEL consistently outperforms baselines, highlighting its practical advantage in cost-effective adversarial evaluation.
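The paper's exact formulation of attention deviation is not reproduced here. As a minimal illustrative sketch (assuming deviation is measured as an L1 distance between per-token attention distributions, averaging the last layer's self-attention, and using microsoft/codebert-base as a stand-in CLLM; the authors' metric and models may differ), one could estimate it as follows:

```python
# Hypothetical sketch of "model attention deviation" between an original
# code snippet and a mutated variant. Assumptions: last-layer attention,
# head/query averaging, and an L1 distance; all are stand-ins, not the
# paper's definition.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "microsoft/codebert-base"  # assumed stand-in CLLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()

def attention_profile(code: str) -> torch.Tensor:
    """Normalized per-token attention mass from the last layer."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Last-layer attentions: (batch, heads, seq, seq). Average over heads,
    # then over query positions, to get how much attention each token receives.
    attn = outputs.attentions[-1].mean(dim=1).mean(dim=1).squeeze(0)
    return attn / attn.sum()

def attention_deviation(original: str, mutated: str) -> float:
    """L1 distance between attention profiles, right-padded to equal length."""
    p, q = attention_profile(original), attention_profile(mutated)
    n = max(p.numel(), q.numel())
    p = F.pad(p, (0, n - p.numel()))
    q = F.pad(q, (0, n - q.numel()))
    return (p - q).abs().sum().item()

# Example: a mutation that preserves semantics but shifts attention.
orig = "def add(a, b):\n    return a + b"
mut = "def add(a, b):\n    tmp = a + b\n    return tmp"
print(attention_deviation(orig, mut))
```

In the framework the abstract describes, a cheap proxy such as APM would produce scores of this kind so that only high-deviation mutations consume queries against the actual target model.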

Keywords

Adversarial Attack, Code LLM, Explainability

Discipline

Artificial Intelligence and Robotics | Information Security | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

IEEE Transactions on Software Engineering

First Page

1

Last Page

19

ISSN

0098-5589

Identifier

10.1109/TSE.2026.3663143

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/TSE.2026.3663143
