Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

11-2017

Abstract

This paper introduces a novel framework, SelectVC, and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on the full data space or on feature subspaces that are identified independently of the subsequent outlier scoring. As a result, they are significantly challenged by the overwhelming number of irrelevant features in high-dimensional data, owing to the noise these features introduce and the huge search space they create. In contrast, SelectVC works on a clean and condensed data space spanned by selective value couplings, obtained by jointly optimizing outlying value selection and value outlierness scoring. Its instance POP defines a value outlierness scoring function by modeling a partial outlierness propagation process to capture the selective value couplings. POP further defines a top-k outlying value selection method to ensure its scalability to the huge search space. We show that POP (i) significantly outperforms five state-of-the-art full space- or subspace-based outlier detectors and their combinations with three feature selection methods on 12 real-world high-dimensional data sets with different levels of irrelevant features; and (ii) obtains good scalability, stable performance w.r.t. k, and a fast convergence rate.
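
The alternation described in the abstract (select the top-k outlying values, then rescore every value from those selected values only) can be illustrated with a small sketch. This is not the paper's formulation of POP: the frequency-based initial score, the conditional-probability coupling weight, the normalization step, and the names `selective_value_scores` and `object_scores` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def selective_value_scores(rows, k, max_iter=50, tol=1e-9):
    """Illustrative sketch only; NOT the scoring function defined in the paper.

    rows: list of equal-length tuples of categorical values (one per feature).
    Returns (score, selected): a score per (feature index, category) value and
    the k values selected as most outlying in the last iteration.
    """
    n = len(rows)
    # A "value" is a (feature index, category) pair.
    freq = Counter((j, v) for row in rows for j, v in enumerate(row))
    values = sorted(freq)

    # Co-occurrence counts between values of different features.
    co = Counter()
    for row in rows:
        for a, b in combinations(list(enumerate(row)), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1

    # Assumed initialization: rarer values start with higher outlierness.
    score = {v: 1.0 - freq[v] / n for v in values}
    selected = []
    for _ in range(max_iter):  # capped, since convergence is not guaranteed here
        # Top-k outlying value selection (ties broken by value order).
        selected = sorted(values, key=lambda v: (-score[v], v))[:k]
        # Partial propagation: a value receives outlierness only from the
        # selected values, weighted by the assumed coupling P(u | v).
        new = {
            v: sum(score[u] * co[(v, u)] / freq[v] for u in selected if u != v)
            for v in values
        }
        total = sum(new.values())
        if total == 0:
            break
        new = {v: s / total for v, s in new.items()}
        delta = max(abs(new[v] - score[v]) for v in values)
        score = new
        if delta < tol:
            break
    return score, selected


def object_scores(rows, score, selected):
    """Score each object by summing the scores of its selected values."""
    chosen = set(selected)
    return [
        sum(score[(j, v)] for j, v in enumerate(row) if (j, v) in chosen)
        for row in rows
    ]
```

Restricting both the propagation and the final object score to the selected values is what keeps values of irrelevant features (for example, a feature with a single constant category) from contributing to the result in this sketch.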

Keywords

Outlier Detection, High-Dimensional Data, Categorical Data, Feature Selection, Coupling Learning

Discipline

Databases and Information Systems | Data Storage Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the 26th ACM Conference on Information and Knowledge Management, Singapore, November 6-10

First Page

807

Last Page

816

ISBN

9781450349185

Identifier

10.1145/3132847.3132994

Publisher

ACM

City or Country

Singapore

Citation

PANG, Guansong; XU, Hongzuo; CAO, Longbing; and ZHAO, Wentao. Selective value coupling learning for detecting outliers in high-dimensional categorical data. (2017). Proceedings of the 26th ACM Conference on Information and Knowledge Management, Singapore, November 6-10. 807-816.
Available at: https://ink.library.smu.edu.sg/sis_research/7142

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Databases and Information Systems Commons, Data Storage Systems Commons

Research Collection School Of Computing and Information Systems

Selective value coupling learning for detecting outliers in high-dimensional categorical data
