Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

8-2017

Abstract

Learning the representation of categorical data with hierarchical value coupling relationships is very challenging but critical for the effective analysis and learning of such data. This paper proposes a novel coupled unsupervised categorical data representation (CURE) framework and its instantiation, i.e., a coupled data embedding (CDE) method, for representing categorical data by hierarchical value-to-value cluster coupling learning. Unlike existing embedding- and similarity-based representation methods, which can capture only some or none of these complex couplings, CDE explicitly incorporates the hierarchical couplings into its embedding representation. CDE first learns two complementary feature value couplings, which are then used to cluster values at different granularities. It then models the couplings among value clusters within the same granularity and across different granularities to embed feature values into a new numerical space with independent dimensions. Substantial experiments show that CDE significantly outperforms three popular unsupervised embedding methods and three state-of-the-art similarity-based representation methods.

Keywords

Machine Learning: Data Mining, Machine Learning: Unsupervised Learning

Discipline

Databases and Information Systems | Data Storage Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 19-25

First Page

1937

Last Page

1943

Identifier

10.24963/ijcai.2017/269

Publisher

IJCAI

City or Country

Melbourne, Australia
