Modeling Syntactic Structures of Topics with a Nested HMM-LDA
Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2009
Abstract
Latent Dirichlet allocation (LDA) is a commonly used topic modeling method for text analysis and mining. Standard LDA treats documents as bags of words, ignoring the syntactic structures of sentences. In this paper, we propose a hybrid model that embeds hidden Markov models (HMMs) within LDA topics to jointly model both the topics and the syntactic structures within each topic. Our model is general and subsumes standard LDA and the HMM as special cases. Compared with standard LDA and the HMM, our model can simultaneously discover topic-specific content words and background functional words shared among topics. It can also automatically separate content words that play different roles within a topic. Using perplexity as the evaluation metric, our model achieves lower perplexity on unseen test documents than standard LDA, demonstrating its stronger generalization power.
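The abstract describes the model only at a high level; the sketch below illustrates one plausible reading of its generative process, assuming a sentence-level topic draw and a single background state shared across all topics. All sizes, variable names, hyperparameter values, and the state-0 convention are illustrative assumptions, not the paper's specification.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes only; the paper does not prescribe these values.
    V, K, S = 1000, 10, 4    # vocabulary size, topics, HMM states per topic
    alpha, beta = 0.1, 0.01  # Dirichlet hyperparameters (assumed)

    # Standard LDA part: per-document topic proportions.
    theta = rng.dirichlet([alpha] * K)

    # Nested HMM part: each topic k has its own transition matrix over S states.
    # State 0 is treated here as a "background" state whose emission distribution
    # is shared by all topics (functional words); states 1..S-1 emit topic-specific
    # content words, capturing the different roles words play within a topic.
    trans = rng.dirichlet([1.0] * S, size=(K, S))   # trans[k, s] -> next-state dist
    background = rng.dirichlet([beta] * V)          # shared functional-word dist
    emit = rng.dirichlet([beta] * V, size=(K, S))   # emit[k, s] -> word dist

    def generate_sentence(length):
        """Draw a topic LDA-style, then emit words by walking that topic's HMM."""
        k = rng.choice(K, p=theta)   # assumed: one topic per sentence
        s = 0                        # assumed: chains start in the background state
        words = []
        for _ in range(length):
            s = rng.choice(S, p=trans[k, s])
            dist = background if s == 0 else emit[k, s]
            words.append(rng.choice(V, p=dist))
        return k, words

    topic, words = generate_sentence(8)
    print(topic, words)

In this reading, collapsing the per-topic HMM to a single content state recovers something like standard LDA, while using a single topic recovers a plain HMM, roughly matching the abstract's claim that both are special cases of the model.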
Keywords
background functional words, hidden Markov models, latent Dirichlet allocation, syntactic structure modeling, text analysis, text mining, topic modeling method, topic-specific content words
Discipline
Computer Sciences | Numerical Analysis and Scientific Computing
Research Areas
Information Systems and Management
Publication
9th IEEE International Conference on Data Mining (ICDM 2009)
First Page
824
Last Page
829
ISBN
9780769538952
Identifier
10.1109/ICDM.2009.144
Publisher
IEEE
City or Country
Miami, FL
Citation
JIANG, Jing. Modeling Syntactic Structures of Topics with a Nested HMM-LDA. (2009). 9th IEEE International Conference on Data Mining, 824-829.
Available at: https://ink.library.smu.edu.sg/sis_research/351
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
http://dx.doi.org/10.1109/ICDM.2009.144