Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
8-2017
Abstract
This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search for feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach that iteratively optimizes feature subset selection and outlier scoring, using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors are not independent; they bond together) to construct a noise-resilient outlier scoring function that produces a reliable outlier ranking in each iteration. We show that HOUR (i) produces an outlier ranking that is a 2-approximation of the optimal one; and (ii) significantly outperforms five state-of-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https://sites.google.com/site/gspangsite/sourcecode.
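The abstract reports results in terms of AUC and P@n. As a rough illustration of what these two ranking measures capture, the following minimal sketch computes both from a list of outlier scores and ground-truth labels. It is not taken from the HOUR source code: the function names `precision_at_n` and `roc_auc`, the convention that higher scores mean "more outlying", and the default of setting n to the number of true outliers are all assumptions made here for illustration.

```python
from typing import Optional, Sequence


def precision_at_n(scores: Sequence[float], labels: Sequence[int],
                   n: Optional[int] = None) -> float:
    """Fraction of true outliers among the n highest-scored objects.

    Assumption: labels use 1 for outlier and 0 for inlier, and n defaults
    to the number of true outliers in the data.
    """
    if n is None:
        n = sum(labels)
    if n <= 0:
        raise ValueError("n must be positive")
    # Rank object indices by descending outlier score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:n]) / n


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random outlier is scored above a random inlier.

    Ties between an outlier and an inlier count as half a win. This
    pairwise form is quadratic in the data size; it is chosen for clarity.
    """
    outliers = [s for s, y in zip(scores, labels) if y == 1]
    inliers = [s for s, y in zip(scores, labels) if y == 0]
    if not outliers or not inliers:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for o in outliers:
        for i in inliers:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(outliers) * len(inliers))


# Toy data (made up for this sketch): five objects, two true outliers.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
toy_labels = [1, 0, 1, 0, 0]
print(precision_at_n(toy_scores, toy_labels))  # one of the top two is an outlier
print(roc_auc(toy_scores, toy_labels))         # five of six outlier/inlier pairs ordered correctly
```

Both measures depend only on the ordering induced by the scores, which is why they suit methods that output an outlier ranking rather than a hard decision.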
Keywords
Machine Learning: Data Mining, Machine Learning: Feature Selection/Construction
Discipline
Databases and Information Systems | Data Storage Systems
Research Areas
Data Science and Engineering
Publication
Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 19-25
First Page
2585
Last Page
2591
Identifier
10.24963/ijcai.2017/360
Publisher
IJCAI
City or Country
Melbourne, Australia
Citation
PANG, Guansong; CAO, Longbing; CHEN, Ling; and LIU, Huan. Learning homophily couplings from non-iid data for joint feature selection and noise-resilient outlier detection. (2017). Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 19-25. 2585-2591.
Available at: https://ink.library.smu.edu.sg/sis_research/7144
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.