Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

2-2025

Abstract

AI-driven detectors have become pivotal in malware detection. However, existing AI-driven detectors face a range of challenges, including poisoning attacks, evasion attacks, and concept drift, which stem from the inherent characteristics of AI methodologies. While numerous solutions have been proposed to address these issues, they often concentrate on isolated problems and neglect the broader implications for other facets of malware detection. This paper diverges from the conventional approach: rather than targeting a single issue, it identifies one of the fundamental causes of these challenges, namely sparsity. Sparsity refers to a scenario where certain feature values occur with low frequency, appearing only a small number of times across the dataset. The authors elevate the significance of sparsity, link it to core challenges in malware detection, and aim to improve performance, robustness, and sustainability simultaneously by solving sparsity problems. To this end, a novel compression technique is designed to alleviate sparsity, and a density boosting training method is proposed to consistently fill sparse regions. The proposed strategies are applied to PE, Android, and PDF datasets. Empirical results demonstrate that the proposed methods not only bolster the model’s resilience against different attacks but also enhance performance and sustainability over time. For instance, on the EMBER (PE) dataset, the backdoor attack success rate decreased from 99.99% to 23.71% while the F1 score increased from 99.301% to 99.488%; on the SOREL-20M dataset (a larger PE dataset spanning a longer period), the AUT (a metric for evaluating sustainability) increased from 92.850% to 95.135%.
Moreover, the proposals are complementary to existing defensive technologies and yield practical classifiers with improved performance and robustness to attacks. Finally, these observations are verified to be consistent on the DREBIN (Android) and Contagio (PDF) datasets.

Discipline

Graphics and Human Computer Interfaces | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Areas of Excellence

Digital transformation

Publication

Proceedings of the 2025 Network and Distributed System Security (NDSS) Symposium, San Diego, CA, USA, February 24-28

First Page

1

Last Page

18

Identifier

10.14722/ndss.2025.240336

City or Country

San Diego, USA

Additional URL

https://doi.org/10.14722/ndss.2025.240336
