Publication Type
Journal Article
Version
publishedVersion
Publication Date
7-2025
Abstract
Deep neural networks have achieved remarkable success across various applications; however, their vulnerability to backdoor attacks poses severe security risks—especially in situations where only a limited set of clean samples is available for defense. In this work, we address this critical challenge by proposing ULRL (UnLearn and ReLearn for backdoor removal), a novel two-phase approach for comprehensive backdoor removal. Our method first employs an unlearning phase, in which the network’s loss is intentionally maximized on a small clean dataset to expose neurons that are excessively sensitive to backdoor triggers. Subsequently, in the relearning phase, these suspicious neurons are recalibrated using targeted reinitialization and cosine similarity regularization, effectively neutralizing backdoor influences while preserving the model’s performance on benign data. Extensive experiments with 12 backdoor types on multiple datasets (CIFAR-10, CIFAR-100, GTSRB, and Tiny-ImageNet) and architectures (PreAct-ResNet18, VGG19-BN, and ViT-B-16) demonstrate that ULRL significantly reduces the attack success rate without compromising clean accuracy—even when only 1% of clean data is used for defense.
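Below is a minimal sketch of the unlearn-flag-relearn flow the abstract describes, assuming a PyTorch classifier and a small clean loader. The neuron-flagging heuristic (per-neuron weight shift during unlearning), the random reinitialization scheme, the sign of the cosine penalty, and all hyperparameters are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
# Sketch of the two-phase ULRL idea: unlearn on clean data to expose
# backdoor-sensitive neurons, then reinitialize and relearn them with a
# cosine-similarity regularizer. Details here are assumptions, not the
# authors' exact method.
import copy

import torch
import torch.nn.functional as F


def unlearn(model, clean_loader, steps=50, lr=1e-2):
    """Phase 1: gradient-ascend the loss on a small clean set so that
    backdoor-sensitive neurons drift the most."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for step, (x, y) in enumerate(clean_loader):
        if step == steps:
            break
        opt.zero_grad()
        (-F.cross_entropy(model(x), y)).backward()  # maximize clean loss
        opt.step()


def flag_suspicious(reference, unlearned, top_frac=0.05):
    """Flag the output neurons whose weights moved most during
    unlearning (an assumed proxy for trigger sensitivity)."""
    flags = {}
    ref = dict(reference.named_parameters())
    for name, w in unlearned.named_parameters():
        if w.dim() < 2:  # skip biases and norm layers
            continue
        shift = (w - ref[name]).flatten(1).norm(dim=1)  # per-neuron shift
        k = max(1, int(top_frac * shift.numel()))
        flags[name] = shift.topk(k).indices
    return flags


def relearn(model, reference, flags, clean_loader, epochs=5, lr=1e-3, lam=0.1):
    """Phase 2: reinitialize flagged neurons, then fine-tune on clean
    data with a cosine-similarity penalty that pushes their new weights
    away from the original (possibly backdoored) ones."""
    params = dict(model.named_parameters())
    ref = {n: p.detach() for n, p in reference.named_parameters()}
    with torch.no_grad():
        for name, idx in flags.items():
            w = params[name]
            w[idx] = torch.randn_like(w[idx]) * w.std()  # targeted reinit
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            for name, idx in flags.items():
                sim = F.cosine_similarity(params[name][idx].flatten(1),
                                          ref[name][idx].flatten(1), dim=1)
                loss = loss + lam * sim.mean()  # penalize staying similar
            loss.backward()
            opt.step()


# Usage: keep a frozen copy, unlearn, flag, roll back, then relearn.
# reference = copy.deepcopy(model)               # frozen pre-defense copy
# unlearn(model, clean_loader)
# flags = flag_suspicious(reference, model)
# model.load_state_dict(reference.state_dict())  # restore original weights
# relearn(model, reference, flags, clean_loader)
```

Note the rollback before relearning: the unlearned weights serve only to locate suspicious neurons, so the sketch restores the original model and recalibrates only the flagged neurons, which is one plausible reading of the abstract's "recalibrated using targeted reinitialization".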
Keywords
Deep Neural Network, Backdoor, Mitigation
Discipline
OS and Networks | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Areas of Excellence
Digital transformation
Publication
IEEE Transactions on Information Forensics and Security
First Page
6984
Last Page
6998
ISSN
1556-6013
Identifier
10.1109/TIFS.2025.3586499
Publisher
Institute of Electrical and Electronics Engineers
Citation
MIN, Nay Myat; PHAM, Long H.; and SUN, Jun.
Unified neural backdoor removal with only few clean samples through unlearning and relearning. (2025). IEEE Transactions on Information Forensics and Security. 6984-6998.
Available at: https://ink.library.smu.edu.sg/sis_research/10292
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/TIFS.2025.3586499