Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2025

Abstract

With the rapid advancement of cloud-native computing, securing cloud environments has become an important task. Log-based Anomaly Detection (LAD) is the most representative technique used in different systems for attack detection and safety guarantee, where multiple LAD methods and relevant datasets have been proposed. However, even though some of these datasets are specifically prepared for cloud systems, they only cover limited cloud behaviors and lack information from a whole-system perspective. Another critical issue to consider is normality shift, which implies that the test distribution could differ from the training distribution and highly affect the performance of LAD. Unfortunately, existing works only focus on simple shift types such as chronological changes, while other cloud-specific shift types are ignored, e.g., different deployed cloud architectures. Therefore, a dataset that captures diverse cloud system behaviors and various types of normality shifts is essential.To fill this gap, we construct a dataset CAShift to evaluate the performance of LAD in cloud, which considers different roles of software in cloud systems, supports three real-world normality shift types (application shift, version shift, and cloud architecture shift), and features 20 different attack scenarios in various cloud system components. Based on CAShift, we conduct a comprehensive empirical study to investigate the effectiveness of existing LAD methods in normality shift scenarios. Additionally, to explore the feasibility of shift adaptation, we further investigate three continuous learning approaches, which are the most common methods to mitigate the impact of distribution shift. Results demonstrated that 1) all LAD methods suffer from normality shift where the performance drops up to 34%, and 2) existing continuous learning methods are promising to address shift drawbacks, but the ratio of data used for model retraining and the selection of algorithms highly affect the shift adaptation, with an increase in the F1-Score of up to 27%. Based on our findings, we offer valuable implications for future research in designing more robust LAD models and methods for LAD shift adaptation.

Discipline

Artificial Intelligence and Robotics | Software Engineering

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the ACM on Software Engineering, Volume 2, Issue FSE, Trondheim, Norway, 2025 June 23-27

First Page

1687

Last Page

1709

Identifier

10.1145/3729346

Publisher

ACM

City or Country

New York

Additional URL

https://dl.acm.org/doi/10.1145/3729346

Share

COinS