Publication Type

Conference Proceeding Article

Version

Published version

Publication Date

7-2022

Abstract

It is known that neural networks are subject to attacks through adversarial perturbations. Worse yet, such attacks are impossible to eliminate, i.e., adversarial perturbations remain possible even after applying mitigation methods such as adversarial training. Multiple approaches have been developed to detect and reject such adversarial inputs. Rejecting suspicious inputs, however, may not always be feasible or ideal. First, normal inputs may be rejected due to false alarms raised by the detection algorithm. Second, denial-of-service attacks may be mounted by feeding such systems adversarial inputs. To address this, in this work, we focus on the text domain and propose an approach to automatically repair adversarial texts at runtime. Given a text suspected to be adversarial, we apply multiple adversarial perturbation methods in a novel, positive way to identify a repair, i.e., a slightly mutated but semantically equivalent text that the neural network classifies correctly. Experimental results show that our approach effectively repairs about 80% of adversarial texts. Furthermore, depending on the perturbation method applied, an adversarial text can be repaired in about one second on average.
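
The repair idea described in the abstract can be illustrated with a minimal sketch: generate slightly mutated, semantically equivalent variants of the suspect text (e.g., by synonym substitution) and take a majority vote over the classifier's predictions. The Python below is a hypothetical rendering, not the paper's actual implementation; classify and synonym_variants are assumed stand-ins for a trained model and a perturbation method.

```python
from collections import Counter

def repair(text, classify, synonym_variants, n_variants=20):
    """Return (repaired_text, label) by majority voting over perturbed variants.

    classify: callable mapping a text to a predicted label (assumed).
    synonym_variants: callable producing meaning-preserving mutations (assumed).
    """
    variants = synonym_variants(text, n_variants)
    votes = Counter(classify(v) for v in variants)
    majority_label, _ = votes.most_common(1)[0]
    # A "repair" is any slightly mutated variant that the network
    # assigns the (presumed correct) majority label.
    for v in variants:
        if classify(v) == majority_label:
            return v, majority_label
    return text, majority_label
```

The sketch assumes that adversarial perturbations are brittle: most semantically equivalent mutations of an adversarial text fall back to the correct class, so the majority vote recovers the intended label.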

Keywords

Adversarial text, Detection, Repair, Perturbation

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Areas of Excellence

Digital transformation

Publication

Proceedings of the 16th International Symposium on Theoretical Aspects of Software Engineering (TASE 2022), Cluj-Napoca, Romania, July 8-10, 2022

Volume

LNCS 13299

First Page

29

Last Page

48

ISBN

978-3-031-10362-9

Identifier

10.1007/978-3-031-10363-6_3

Publisher

Springer

City or Country

Cham

Comments

Cited by: 4

Additional URL

https://doi.org/10.1007/978-3-031-10363-6_3
