Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

4-2025

Abstract

In the era of big data, data quality plays a critical role in computer vision, where the reliability and purity of training images are essential for optimal performance. When training models such as image classifiers and object detectors, the quality of the training data directly influences the success of the model. In other words, if the training dataset is contaminated with anomalous images, the model’s performance is likely to degrade.

To address this challenge, unsupervised anomaly detection (UAD) has become an attractive research area. By automatically removing anomalous data points, UAD can help improve the accuracy and robustness of machine learning models in real-world computer vision tasks. Before proposing our solutions, we outline the challenges that UAD faces in practice. First, image data lies in a high-dimensional space, where some properties that are familiar from low-dimensional (2-D/3-D) space no longer hold. Second, UAD must learn without any annotation. Third, contamination factors vary, and target datasets come from diverse domains.

We begin by introducing LVAD, a novel statistical embedding-based approach that leverages locally varying data projections to preserve the intrinsic variability within each cluster. Each projection encodes the distinguishing features of a local cluster relative to the global data distribution. By aggregating probabilistic cluster membership estimates from these local projections, LVAD defines a global affinity measure for each instance, enabling the emergence of anomalies as outliers with unexpectedly low affinity scores. This formulation offers a principled way to model multi-normality in high-dimensional space, establishing a strong baseline for unsupervised anomaly scoring. However, LVAD faces two major limitations: (i) the absence of a learnable thresholding mechanism, which is crucial for label prediction and real-world deployment; and (ii) reliance on K-Means for multi-normality embedding, which leads to inefficiencies in practice.
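The idea of aggregating soft cluster memberships into a single affinity score can be illustrated with a minimal sketch. This is not the LVAD algorithm: the local projections are omitted, the cluster centers are assumed to be given, and the Gaussian kernel, the `bandwidth` parameter, and the function name `affinity_scores` are all assumptions made for illustration.

```python
import math


def affinity_scores(points, centers, bandwidth=1.0):
    """Sum Gaussian-kernel memberships of each point over all centers.

    Illustrative only: a low total affinity suggests the point belongs to
    no cluster and is therefore a candidate anomaly. The kernel and the
    bandwidth are hypothetical choices, not taken from the dissertation.
    """
    scores = []
    for p in points:
        total = 0.0
        for c in centers:
            # Squared Euclidean distance between the point and the center.
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, c))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        scores.append(total)
    return scores


centers = [(0.0, 0.0), (5.0, 5.0)]            # two assumed "normal" modes
points = [(0.0, 0.0), (5.0, 5.0), (20.0, 20.0)]
scores = affinity_scores(points, centers)
```

In this toy setup, the point far from both centers receives a much lower affinity than the two points that coincide with a center.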

To bridge the gap between anomaly scoring and thresholding (label prediction), we propose Multi-T. Unlike previous methods that depend on a single global threshold, Multi-T is the first to introduce the concept of multiple thresholds. It generates two thresholds that isolate normal data and anomalies within an unlabeled target dataset. This separation enables the use of detected anomalies to enhance feature representations, while normal data helps preserve a clean normal manifold. In doing so, Multi-T transforms unlabeled datasets into a weakly supervised resource, allowing significant improvements to existing anomaly scoring methods. Extensive experiments show that Multi-T not only boosts the effectiveness of LVAD but also elevates a simple distance-based method to state-of-the-art performance.
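A two-threshold split of anomaly scores can be sketched as follows. The abstract does not describe how Multi-T derives its thresholds, so `low` and `high` are hypothetical, caller-supplied values, and the function name is an assumption. Higher scores are taken to mean "more anomalous".

```python
def split_by_two_thresholds(scores, low, high):
    """Partition sample indices into confident-normal, uncertain, and
    confident-anomalous groups.

    `low` and `high` are placeholder thresholds supplied by the caller;
    this sketch does not reproduce how Multi-T computes them.
    """
    if not low < high:
        raise ValueError("low must be strictly smaller than high")
    normal = [i for i, s in enumerate(scores) if s <= low]
    uncertain = [i for i, s in enumerate(scores) if low < s < high]
    anomalous = [i for i, s in enumerate(scores) if s >= high]
    return normal, uncertain, anomalous


scores = [0.1, 0.2, 0.5, 0.9, 0.15]
normal, uncertain, anomalous = split_by_two_thresholds(scores, low=0.2, high=0.8)
```

Samples in the two confident groups could then serve as weak labels, while the uncertain middle band is left unlabeled.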

Despite these advances, a key challenge remains: the instability of UAD methods under varying contamination levels (anomaly percentages) in target datasets. To address this, we introduce FlexUAD, a training-free, plug-and-play framework that incorporates a contamination factor estimator. FlexUAD adaptively selects the appropriate anomaly detector based on the estimated contamination factor, ensuring both high efficacy and stability across diverse settings.
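Selecting a detector from an estimated contamination factor can be sketched as below. The estimator here is a deliberately naive placeholder (the fraction of scores above a fixed threshold), and the `cutoff` value, the dictionary keys, and the function names are assumptions; none of them is taken from FlexUAD itself.

```python
def estimate_contamination(scores, threshold):
    """Placeholder estimator: fraction of samples scoring above `threshold`."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(s > threshold for s in scores) / len(scores)


def select_detector(contamination, detectors, cutoff=0.1):
    """Pick a detector by comparing the estimate with a hypothetical cutoff."""
    key = "low_contamination" if contamination < cutoff else "high_contamination"
    return detectors[key]


# Detector names are stand-ins; any callable or object could be stored here.
detectors = {
    "low_contamination": "detector_a",
    "high_contamination": "detector_b",
}
rate = estimate_contamination([0.1, 0.2, 0.9, 0.95], threshold=0.5)
chosen = select_detector(rate, detectors)
```

Because the selection step only reads scores and a lookup table, it needs no training, which matches the plug-and-play framing in the paragraph above.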

In summary, this dissertation contributes a cohesive suite of methods: LVAD for robust anomaly scoring, Multi-T for multiple thresholding, and FlexUAD for contamination factor estimation. Together, they push the boundaries of unsupervised image anomaly detection towards greater stability and efficiency. Additionally, we discuss industrial applications of UAD, such as industrial inspection.

Keywords

Image, Anomaly detection, Unsupervised learning

Degree Awarded

PhD in Computer Science

Discipline

Graphics and Human Computer Interfaces

Supervisor(s)

LIN, Wenyan, Daniel

First Page

1

Last Page

120

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
