Publication Type
Journal Article
Version
publishedVersion
Publication Date
12-2020
Abstract
In outlier detection, recent research has largely shifted from univariate to multivariate methods, driven by the rapid growth of multidimensional data. However, one typical issue with this paradigm shift is that many multidimensional datasets contain mainly univariate outliers, with many features actually being irrelevant. In such cases, multivariate methods are ineffective at identifying these outliers because of the potential biases and the curse of dimensionality introduced by irrelevant features. Such univariate outliers may be detected well by applying univariate outlier detectors to the individually relevant features. However, it is very challenging to choose the right univariate detector for each feature, since different features may follow very different probability distributions. To address this challenge, we introduce a novel Heterogeneous Univariate Outlier Ensembles (HUOE) framework and its instance ZDD, which synthesize a set of heterogeneous univariate outlier detectors as base learners to build heterogeneous ensembles optimized for each individual feature. Extensive results on 19 real-world datasets and a collection of synthetic datasets show that ZDD obtains a 5%–14% average AUC improvement over four state-of-the-art multivariate ensembles and performs substantially more robustly w.r.t. irrelevant features.
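The general idea in the abstract, scoring each feature separately with more than one univariate detector and then combining the results, can be illustrated with a minimal sketch. This is not the authors' HUOE framework or the ZDD instance, whose details are not given in this record. The two detectors (a z-score and a median-absolute-deviation score), the averaging step, the max-over-features aggregation, and all function names are assumptions chosen only for illustration.

```python
from statistics import mean, median, pstdev

def zscore_scores(values):
    """Absolute z-score of each value; all zeros if the feature is constant."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [abs(v - mu) / sigma for v in values]

def mad_scores(values):
    """Robust z-score based on the median absolute deviation (MAD).

    1.4826 rescales the MAD so it estimates the standard deviation
    under a normal distribution, keeping both detectors in similar units.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0] * len(values)
    return [abs(v - med) / (1.4826 * mad) for v in values]

def feature_ensemble(values, detectors):
    """Average the scores of several univariate detectors on one feature."""
    all_scores = [detector(values) for detector in detectors]
    return [mean(per_object) for per_object in zip(*all_scores)]

def score_rows(rows, detectors=(zscore_scores, mad_scores)):
    """Score each row by its most anomalous feature (illustrative choice)."""
    columns = [list(col) for col in zip(*rows)]
    per_feature = [feature_ensemble(col, detectors) for col in columns]
    return [max(per_object) for per_object in zip(*per_feature)]

# Hypothetical toy data: the last row is unusual in the first feature only.
rows = [[1.0, 5.0], [1.1, 5.2], [0.9, 4.9], [1.0, 5.1], [9.0, 5.0]]
scores = score_rows(rows)
```

Because each feature is scored on its own, a feature with no unusual values cannot dilute the score of a row that is extreme in a single feature, which is the motivation the abstract gives for univariate ensembles.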
Keywords
Outlier detection, outlier ensemble, anomaly detection, univariate outlier, multidimensional data, heterogeneous data
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems
Research Areas
Intelligent Systems and Optimization
Publication
ACM Transactions on Knowledge Discovery from Data
Volume
14
Issue
6
First Page
1
Last Page
27
ISSN
1556-4681
Identifier
10.1145/3403934
Publisher
Association for Computing Machinery (ACM)
Citation
PANG, Guansong and CAO, Longbing. Heterogeneous univariate outlier ensembles in multidimensional data. (2020). ACM Transactions on Knowledge Discovery from Data. 14, (6), 1-27.
Available at: https://ink.library.smu.edu.sg/sis_research/7039
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3403934