Publication Type

Book Chapter

Version

acceptedVersion

Publication Date

5-2019

Abstract

Heterogeneous data co-clustering is a commonly used technique for tapping the rich meta-information of multimedia web documents, including category, annotation, and description, for associative discovery. However, most co-clustering methods proposed for heterogeneous data do not address the problem of representing short and noisy text, and their performance is limited by the empirical weighting of the multimodal features. This chapter explains how to use Generalized Heterogeneous Fusion Adaptive Resonance Theory (GHF-ART) for clustering large-scale web multimedia documents. Specifically, GHF-ART is designed to handle multimedia data with an arbitrarily rich level of meta-information. To handle short and noisy text, GHF-ART employs the representation and learning methods of PF-ART described in Sect. 3.5, which identify key tags for cluster prototype modeling by learning the probabilistic distribution of tag occurrences within clusters. More importantly, GHF-ART incorporates an adaptive method for fusing the multimodal features, which weights the features of multiple data sources by incrementally measuring the importance of each feature modality through its intra-cluster scatter. Extensive experiments on two web image datasets and one text document set show that GHF-ART achieves significantly better clustering performance and is much faster than many existing state-of-the-art algorithms. The content of this chapter is summarized and extended from IEEE Trans Knowl Data Eng 26(9): 2293-2306, and the Python code for GHF-ART is available at https://github.com/Lei-Meng/GHF-ART.

Discipline

Databases and Information Systems | Social Media | Theory and Algorithms

Research Areas

Data Science and Engineering

Publication

Adaptive resonance theory in social media data clustering: Roles, methodologies, and applications

First Page

111

Last Page

135

ISBN

9783030029845

Identifier

10.1007/978-3-030-02985-2_5

Publisher

Springer

City or Country

Cham

Embargo Period

8-17-2021

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1007/978-3-030-02985-2_5
