Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

4-2025

Abstract

Despite their advances and success, real-world deep neural networks are known to be vulnerable to adversarial attacks. Universal adversarial perturbation (UAP), an input-agnostic attack, poses a serious threat to their deployment in security-sensitive systems. In this case, a single universal adversarial perturbation deceives the model on a range of clean inputs without requiring input-specific optimization, which makes it particularly threatening. In this work, we observe that universal adversarial perturbations usually lead to an abnormal entropy spectrum in hidden layers, which suggests that the prediction is dominated by a small number of "features" in such cases (rather than democratically by many features). Inspired by this, we propose an efficient yet effective defense method for mitigating UAPs, called Democratic Training, which performs entropy-based model enhancement to suppress the effect of universal adversarial perturbations in a given model. Democratic Training is evaluated with 7 neural networks trained on 5 benchmark datasets and 5 types of state-of-the-art universal adversarial attack methods. The results show that it effectively reduces the attack success rate, improves model robustness, and preserves the model's accuracy on clean samples.

Discipline

Software Engineering

Areas of Excellence

Digital transformation

Publication

Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore, April 24-26

First Page

1

Last Page

22

Identifier

10.48550/arXiv.2502.05542

City or Country

Singapore

Additional URL

https://doi.org/10.48550/arXiv.2502.05542
