Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
8-2017
Abstract
Learning the representation of categorical data with hierarchical value coupling relationships is very challenging but critical for the effective analysis and learning of such data. This paper proposes a novel coupled unsupervised categorical data representation (CURE) framework and its instantiation, i.e., a coupled data embedding (CDE) method, for representing categorical data by hierarchical value-to-value cluster coupling learning. Unlike existing embedding- and similarity-based representation methods which can capture only a part or none of these complex couplings, CDE explicitly incorporates the hierarchical couplings into its embedding representation. CDE first learns two complementary feature value couplings which are then used to cluster values with different granularities. It further models the couplings in value clusters within the same granularity and with different granularities to embed feature values into a new numerical space with independent dimensions. Substantial experiments show that CDE significantly outperforms three popular unsupervised embedding methods and three state-of-the-art similarity-based representation methods.
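The abstract does not state how the two complementary feature value couplings are computed, so the following is only an illustrative sketch of the general idea. It assumes an occurrence-based coupling defined as a ratio of value frequencies, and a co-occurrence-based coupling defined as a conditional probability. The function name `value_couplings`, the toy data, and both formulas are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def value_couplings(rows):
    """Return two hypothetical value-to-value coupling tables.

    rows: list of tuples, one categorical value per feature.
    Each feature value is identified by a (feature_index, value) pair.
    """
    single = Counter()  # how often each feature value occurs
    pair = Counter()    # how often two feature values occur in the same row
    for row in rows:
        items = list(enumerate(row))
        single.update(items)
        for a, b in combinations(items, 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1

    values = sorted(single)

    # Assumed occurrence-based coupling: frequency of b relative to a.
    occ = {(a, b): single[b] / single[a] for a in values for b in values}

    # Assumed co-occurrence-based coupling: estimate of p(b | a).
    co = {}
    for a in values:
        for b in values:
            joint = single[a] if a == b else pair[(a, b)]
            co[(a, b)] = joint / single[a]
    return values, occ, co


# Toy data set with two categorical features (colour, size).
rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]
values, occ, co = value_couplings(rows)
```

Under these assumptions, each feature value gets one row per coupling table. Those rows could then be clustered at several granularities and reduced to a low-dimensional embedding, which is the overall pipeline the abstract outlines.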
Keywords
Machine Learning: Data Mining, Machine Learning: Unsupervised Learning
Discipline
Databases and Information Systems | Data Storage Systems
Research Areas
Data Science and Engineering
Publication
Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 19-25, 2017
First Page
1937
Last Page
1943
Identifier
10.24963/ijcai.2017/269
Publisher
IJCAI
City or Country
Melbourne, Australia
Citation
JIAN, Songlei; CAO, Longbing; PANG, Guansong; LU, Kai; and GAO, Hang. Embedding-based representation of categorical data by hierarchical value coupling learning. (2017). Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 19-25. 1937-1943.
Available at: https://ink.library.smu.edu.sg/sis_research/7143
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.