"Causality analysis for neural network security" by Bing SUN

Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

12-2024

Abstract

While neural networks demonstrate excellent performance in a wide range of applications, there has been growing concern about their reliability and dependability. Similar to traditional decision-making programs, neural networks inevitably contain defects that need to be identified and mitigated. Since neural networks are inherently black boxes that do not explain how and why decisions are made, these defects are more ``hidden'' and more challenging to eliminate. It is thus crucial to develop systematic approaches to identify and mitigate defects in a neural network in a rigorous way.

In this dissertation, we focus on three important properties of neural networks, namely fairness, backdoor-freeness and robustness, and develop systematic ways to mitigate the risks of discrimination, backdoors and adversarial perturbations.

In the first research work, we propose an approach to formally verify neural networks against fairness properties. Our method is built upon an approach for learning Markov Chains from a given neural network. The learned Markov Chain not only allows us to verify (with a Probably Approximately Correct guarantee) whether the neural network is fair, but also facilitates a lightweight causality analysis known as sensitivity analysis, which helps to understand why fairness is violated. We demonstrate that, with our analysis results, the neural weights can be optimized to improve fairness. Our approach has been evaluated with multiple models trained on benchmark datasets, and the experimental results show that it is effective and efficient.
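To give a flavour of this kind of sampling-based, PAC-style fairness check, the following is a minimal sketch only; it is not the dissertation's method (which learns a Markov Chain abstraction of the network). It estimates a demographic-parity-style fairness gap of a hypothetical stand-in classifier by Monte-Carlo sampling; the model, feature layout, sensitive attribute and threshold are all placeholder assumptions.

import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Hypothetical stand-in for a trained neural network's decision function.
    w = np.array([0.8, -0.3, 0.5])  # note: includes the sensitive feature
    return (x @ w > 0.2).astype(int)

def estimate_fairness_gap(model, n_samples=20_000, sensitive_idx=2):
    """Monte-Carlo estimate of |P(y=1 | s=0) - P(y=1 | s=1)|."""
    x = rng.uniform(-1, 1, size=(n_samples, 3))
    x[:, sensitive_idx] = rng.integers(0, 2, size=n_samples)  # binary sensitive attribute
    y = model(x)
    s = x[:, sensitive_idx]
    # With n samples per group, a Hoeffding-style bound says each estimated rate is
    # within epsilon of the true rate with probability >= 1 - 2*exp(-2*n*epsilon**2),
    # which is the PAC-style flavour of guarantee mentioned above.
    return abs(y[s == 0].mean() - y[s == 1].mean())

print("estimated fairness gap:", estimate_fairness_gap(model))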

In the second research work, we address the problem of repairing a neural network for desirable properties such as fairness and the absence of backdoors. The goal is to construct a neural network that satisfies the property by (minimally) adjusting the given neural network's parameters (i.e., weights). Specifically, we propose CARE (\textbf{CA}usality-based \textbf{RE}pair), a causality-based neural network repair technique that 1) performs causality-based fault localization to identify the `guilty' neurons and 2) optimizes the parameters of the identified neurons to reduce the misbehavior. We have empirically evaluated CARE on various tasks such as backdoor removal and neural network repair for fairness and safety properties. Our experimental results show that CARE repairs all the evaluated neural networks efficiently and effectively.
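The fault-localization step can be pictured as ranking hidden neurons by the causal effect of intervening on their activations. The sketch below is a hypothetical illustration of that idea, not CARE's implementation: a toy PyTorch model's hidden neurons are fixed ("do"-set) one at a time, and each neuron is scored by how often the intervention flips the prediction.

import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(256, 4)                      # stand-in evaluation inputs
baseline = model(x).argmax(dim=1)

def effect_of_neuron(neuron_idx, value=0.0):
    """Prediction flip rate when hidden neuron `neuron_idx` is forced to `value`."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, neuron_idx] = value        # intervention on a single hidden neuron
        return output
    handle = model[1].register_forward_hook(hook)   # hook after the ReLU
    try:
        intervened = model(x).argmax(dim=1)
    finally:
        handle.remove()
    return (intervened != baseline).float().mean().item()

effects = [(i, effect_of_neuron(i)) for i in range(8)]
# Neurons with the largest causal effect are candidate "guilty" neurons whose
# parameters would then be adjusted by optimization in a CARE-style repair.
print(sorted(effects, key=lambda t: -t[1])[:3])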

In the third research work, we propose SODA (\textbf{S}emantic Backd\textbf{O}or \textbf{D}etection and Mitig\textbf{A}tion), a causality-based approach to systematically detect and remove semantic backdoors. SODA conducts a lightweight causality analysis to identify potential semantic backdoors based on how hidden neurons contribute to the predictions. The identified backdoors are then removed by adjusting, through optimization, the responsible neurons' contributions towards the correct predictions. SODA is evaluated with 21 neural networks trained on 6 benchmark datasets. The results show that it effectively detects and removes semantic backdoors while preserving the accuracy of the neural networks.
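As a rough illustration of reasoning about how hidden neurons contribute to a prediction, the hypothetical sketch below scores each hidden neuron's average contribution to one class's logit and flags statistical outliers; SODA's actual analysis and mitigation are more involved, and the model, target class and outlier threshold here are placeholder assumptions.

import torch
import torch.nn as nn

torch.manual_seed(0)
hidden, classes = 16, 3
model = nn.Sequential(nn.Linear(8, hidden), nn.ReLU(), nn.Linear(hidden, classes))
x = torch.randn(512, 8)                                  # stand-in clean inputs

with torch.no_grad():
    h = model[1](model[0](x))                            # hidden activations, shape (N, hidden)
    W = model[2].weight                                  # output weights, shape (classes, hidden)
    # contribution[c, j]: average share of class-c's logit carried by hidden neuron j
    contribution = h.mean(dim=0).unsqueeze(0) * W        # shape (classes, hidden)

target_class = 0                                         # hypothetical suspicious class
scores = contribution[target_class]
z = (scores - scores.mean()) / scores.std()
suspects = torch.nonzero(z.abs() > 2.0).flatten().tolist()
# Outlier neurons are candidates whose contributions would be re-optimized
# towards the correct predictions in a SODA-style mitigation step.
print("candidate neurons for class", target_class, ":", suspects)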

In the fourth research work, we aim to protect neural networks against universal adversarial perturbations (UAPs). We empirically show that UAPs usually lead to an abnormal entropy spectrum in the hidden layers, which suggests that the prediction is dominated by a small number of ``features'' in such cases (rather than democratically by many features). Inspired by this, we propose \emph{Democratic Training}, an efficient yet effective defense against UAPs that performs entropy-based model enhancement to suppress their effect. \emph{Democratic Training} is evaluated with 7 neural networks trained on 5 benchmark datasets against 5 types of state-of-the-art universal adversarial attack methods. The results show that it effectively reduces the attack success rate, improves model robustness and preserves the model accuracy on clean samples.
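The entropy signal described above can be made concrete with a small, hypothetical sketch: treat a hidden layer's non-negative ReLU activations as a distribution over features and compute its Shannon entropy per sample, so that unusually low entropy indicates a prediction dominated by a few features. The toy model and inputs below are placeholders that only show how such a statistic is computed, not the dissertation's defense itself.

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def hidden_entropy(x, eps=1e-12):
    """Shannon entropy of the normalized hidden-layer activations, per sample."""
    with torch.no_grad():
        h = model[1](model[0](x))                   # non-negative ReLU activations
        p = h / (h.sum(dim=1, keepdim=True) + eps)  # normalize to a distribution
        return -(p * (p + eps).log()).sum(dim=1)

clean = torch.randn(128, 32)
perturbed = clean + 0.5 * torch.randn(1, 32)        # crude stand-in, not a real UAP
print("mean entropy (clean):    ", hidden_entropy(clean).mean().item())
print("mean entropy (perturbed):", hidden_entropy(perturbed).mean().item())
# Per the abstract, Democratic Training then fine-tunes the model with an
# entropy-based objective so that no small set of features dominates the prediction.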

Keywords

Neural Network Security, Neural Network Fairness, Neural Network Backdoor, Neural Network Adversarial Attack and Defense, Neural Network Causality Analysis

Degree Awarded

PhD in Computer Science

Discipline

Information Security | OS and Networks

Supervisor(s)

DENG, Huijie Robert; SUN, Jun

First Page

1

Last Page

162

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
