Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
12-2024
Abstract
The advancement of machine learning has enabled the widespread deployment of Machine Learning as a Service (MLaaS) applications. However, the untrustworthy nature of third-party ML services poses backdoor threats. Existing defenses in MLaaS are limited by their reliance on training samples or white-box model analysis, highlighting the need for a black-box backdoor purification method. In our paper, we attempt to use diffusion models for purification, introducing noise in a forward diffusion process to destroy backdoor triggers and recovering clean samples through a reverse generative process. However, because heavier noise also destroys the semantics of the original samples, this still results in low restoration performance. To investigate how effectively noise eliminates different types of backdoors, we conducted a preliminary study, which demonstrates that backdoors with low visibility are easily destroyed by lightweight noise, whereas backdoors with high visibility require intense noise to destroy but are easy to detect. Based on this study, we propose SampDetox, which strategically combines lightweight and intense noise. SampDetox first applies weak noise to eliminate low-visibility backdoors and compares the structural similarity between the recovered and original samples to localize high-visibility backdoors. Intense noise is then applied to the localized areas, destroying the high-visibility backdoors while preserving global semantic information. As a result, detoxified samples can be used for inference, even by poisoned models. Comprehensive experiments demonstrate the effectiveness of SampDetox in defending against various state-of-the-art backdoor attacks.
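The abstract describes a two-stage procedure: light noise plus reverse denoising to remove low-visibility triggers, an SSIM comparison between the recovered and original samples to localize high-visibility triggers, and intense noise applied only to the localized regions. The following is a minimal Python sketch of that idea, not the authors' implementation: the function names, noise schedule, timesteps, and SSIM threshold are illustrative assumptions, and the diffusion model's reverse process is replaced by a placeholder.

```python
# Hedged sketch of the two-stage detoxification idea from the abstract.
# The reverse generative process is a placeholder; real SampDetox relies on
# a pretrained diffusion model's forward/reverse processes.

import numpy as np
from skimage.metrics import structural_similarity


def forward_diffuse(x, t, betas):
    """Add Gaussian noise to x following a DDPM-style forward process at step t."""
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    eps = np.random.randn(*x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps


def reverse_denoise(x_noisy, t):
    """Placeholder for the diffusion model's reverse (generative) process.
    A real implementation would iteratively denoise with a trained score network."""
    return np.clip(x_noisy, 0.0, 1.0)  # identity stand-in for illustration only


def sampdetox_sketch(x, betas, t_light=50, t_heavy=400, ssim_thresh=0.5):
    """Two-stage purification sketch (hypothetical parameters):
    1) lightweight noise removes low-visibility triggers,
    2) a per-pixel SSIM map localizes high-visibility triggers,
    3) intense noise is applied only to the localized regions."""
    # Stage 1: lightweight noise + reverse process.
    x_light = reverse_denoise(forward_diffuse(x, t_light, betas), t_light)

    # Localize high-visibility triggers: low structural similarity between the
    # recovered and original sample marks suspected trigger regions.
    _, ssim_map = structural_similarity(x, x_light, data_range=1.0, full=True)
    trigger_mask = ssim_map < ssim_thresh

    # Stage 2: intense noise restricted to suspected regions, preserving the
    # lightly-denoised content (and hence global semantics) elsewhere.
    x_heavy = reverse_denoise(forward_diffuse(x, t_heavy, betas), t_heavy)
    return np.where(trigger_mask, x_heavy, x_light)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))           # toy grayscale sample in [0, 1]
    betas = np.linspace(1e-4, 0.02, 1000)  # standard DDPM noise schedule
    detoxified = sampdetox_sketch(image, betas)
    print(detoxified.shape)
```

Thresholding the SSIM map is what keeps the intense noise local in this sketch: only pixels whose similarity drops below the (assumed) threshold receive the heavy perturbation, while the rest of the image keeps the lightly-denoised content.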
Keywords
Machine learning, Backdoor threats, Backdoor defense
Discipline
Information Security
Research Areas
Cybersecurity
Publication
Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada, December 10-15
Publisher
NeurIPS
City or Country
Canada
Citation
YANG, Yanxin; JIA, Chentao; YAN, Dengke; HU, Ming; LI, Tianlin; XIE, Xiaofei; WEI, Xian; and CHEN, Mingsong.
SampDetox: Black-box backdoor defense via perturbation-based sample detoxification. (2024). Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada, December 10-15.
Available at: https://ink.library.smu.edu.sg/sis_research/9812
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.