Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

2-2018

Abstract

Text data co-clustering is the process of partitioning documents and words simultaneously. This approach has proven more useful than traditional one-sided clustering when dealing with sparsity. Among the wide range of co-clustering approaches, Non-negative Matrix Tri-Factorization (NMTF) is recognized for its high performance, flexibility, and theoretical foundations. One important aspect of dealing with text data is capturing the semantic relationships between words, since documents about the same topic may not use exactly the same vocabulary. However, this aspect has been overlooked by previous co-clustering models, including NMTF. To address this issue, we rely on the distributional hypothesis, which states that words that co-occur frequently within the same context, e.g., a document or sentence, are likely to have similar meanings. We then propose a new NMTF model that maps frequently co-occurring words to roughly the same direction in the latent space, so as to reflect the relationships between them. To infer the factor matrices, we derive a scalable alternating optimization algorithm with guaranteed convergence. Extensive experiments on several real-world datasets provide strong evidence for the effectiveness of the proposed approach in terms of co-clustering.
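To make the NMTF setup concrete, below is a minimal sketch of the plain tri-factorization X ≈ F S Gᵀ with standard multiplicative updates for the unregularized Frobenius objective. This is generic NMTF only; the paper's word co-occurrence regularizer and its specific update rules are not reproduced here, and all function and variable names are illustrative assumptions.

```python
import numpy as np

def nmtf(X, k_docs, k_words, n_iter=100, eps=1e-9, seed=0):
    """Minimal NMTF sketch: X ~= F @ S @ G.T with multiplicative updates.

    F (n x k_docs) clusters the rows (documents), G (m x k_words) clusters
    the columns (words), and S captures the association between the two.
    These are the standard updates for ||X - F S G^T||_F^2; the co-occurrence
    regularization proposed in the paper is intentionally omitted.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_docs))        # document factor
    S = rng.random((k_docs, k_words))  # block/association matrix
    G = rng.random((m, k_words))       # word factor
    for _ in range(n_iter):
        # Multiplicative updates keep all factors non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
    return F, S, G

# Usage on a toy non-negative document-word matrix.
X = np.random.default_rng(1).random((20, 30))
F, S, G = nmtf(X, k_docs=3, k_words=4)
err = np.linalg.norm(X - F @ S @ G.T)
```

Document cluster assignments can then be read off as `F.argmax(axis=1)` and word clusters as `G.argmax(axis=1)`.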

Discipline

Artificial Intelligence and Robotics | Computer Sciences

Publication

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, USA, February 2-7, 2018

Volume

32

First Page

3992

Last Page

3999

ISSN

2374-3468

Identifier

10.1609/aaai.v32i1.11659

Publisher

AAAI

Embargo Period

3-14-2025

Additional URL

https://doi.org/10.1609/aaai.v32i1.11659