Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

7-2014

Abstract

As the rapid growth of online social media attracts a large number of Internet users, the large volume of content generated by these users also provides us with an opportunity to study the lexical variation of people of different ages. In this paper, we present a latent variable model that jointly models the lexical content of tweets and Twitter users’ ages. Our model inherently assumes that a topic has not only a word distribution but also an age distribution. We propose a Gibbs-EM algorithm to perform inference on our model. Empirical evaluation shows that our model can learn meaningful age-specific topics such as “school” for teenagers and “health” for older people. Our model can also be used for age prediction and performs better than a number of baseline methods.

Keywords

Age topic model, Gibbs-EM, Lexical variation

Discipline

Computer Sciences | Databases and Information Systems | Social Media

Publication

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence: 27-31 July 2014, Québec

First Page

1643

Last Page

1649

ISBN

9781577356615

Publisher

AAAI Press

City or Country

Palo Alto, CA

Copyright Owner and License

LARC

Additional URL

http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8381
