Publication Type

Journal Article

Version

publishedVersion

Publication Date

7-2025

Abstract

Deep neural networks have achieved remarkable success across various applications; however, their vulnerability to backdoor attacks poses severe security risks, especially when only a limited set of clean samples is available for defense. In this work, we address this critical challenge by proposing ULRL (UnLearn and ReLearn for backdoor removal), a novel two-phase approach for comprehensive backdoor removal. Our method first employs an unlearning phase, in which the network’s loss is intentionally maximized on a small clean dataset to expose neurons that are excessively sensitive to backdoor triggers. In the subsequent relearning phase, these suspicious neurons are recalibrated using targeted reinitialization and cosine similarity regularization, effectively neutralizing backdoor influences while preserving the model’s performance on benign data. Extensive experiments covering 12 backdoor types, four datasets (CIFAR-10, CIFAR-100, GTSRB, and Tiny-ImageNet), and three architectures (PreAct-ResNet18, VGG19-BN, and ViT-B-16) demonstrate that ULRL significantly reduces the attack success rate without compromising clean accuracy, even when only 1% of clean data is used for defense.
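
For readers who want a concrete picture of the two-phase procedure the abstract describes, a minimal PyTorch sketch follows. It is an illustration under stated assumptions, not the paper's exact algorithm: the neuron-selection rule (per-layer weight drift after unlearning), the hyper-parameters, and the function name unlearn_relearn are all hypothetical choices made for this sketch.

# Minimal sketch of the unlearn/relearn idea, assuming a standard
# PyTorch classifier and a small loader of clean samples. Selection
# criterion and hyper-parameters below are illustrative assumptions.
import copy
import torch
import torch.nn.functional as F

def unlearn_relearn(model, clean_loader, unlearn_steps=50,
                    relearn_epochs=10, lr=1e-2, lam=0.1, device="cpu"):
    model.to(device)
    reference = copy.deepcopy(model)  # snapshot before unlearning

    # Phase 1: unlearning -- maximize the clean-data loss (gradient
    # ascent) so that neurons the backdoor relies on move the most.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _, (x, y) in zip(range(unlearn_steps), clean_loader):
        x, y = x.to(device), y.to(device)
        loss = -F.cross_entropy(model(x), y)  # negated: ascend the loss
        opt.zero_grad(); loss.backward(); opt.step()

    # Flag suspicious neurons: here, the output channels whose weights
    # drifted the most during unlearning (one plausible criterion),
    # then reinitialize them with small random values.
    for (_, p), (_, p0) in zip(model.named_parameters(),
                               reference.named_parameters()):
        if p.dim() < 2:
            continue
        drift = (p - p0).flatten(1).norm(dim=1)   # per-channel drift
        k = max(1, int(0.05 * drift.numel()))     # e.g. top 5% per layer
        idx = drift.topk(k).indices
        with torch.no_grad():                     # targeted reinitialization
            p[idx] = torch.empty_like(p[idx]).normal_(std=0.01)

    # Phase 2: relearning on clean data, with a cosine-similarity
    # penalty that discourages recalibrated weights from re-aligning
    # with their (potentially backdoored) pre-unlearning directions.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(relearn_epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            for p, p0 in zip(model.parameters(), reference.parameters()):
                if p.dim() >= 2:
                    loss = loss + lam * F.cosine_similarity(
                        p.flatten(), p0.detach().flatten(), dim=0)
            opt.zero_grad(); loss.backward(); opt.step()
    return model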

Keywords

Deep Neural Network, Backdoor, Mitigation

Discipline

OS and Networks | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Areas of Excellence

Digital transformation

Publication

IEEE Transactions on Information Forensics and Security

First Page

6984

Last Page

6998

ISSN

1556-6013

Identifier

10.1109/TIFS.2025.3586499

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/TIFS.2025.3586499
