Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

8-2015

Abstract

With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for posting short, instant messages. Because of this wide adoption, events such as breaking news and the release of popular videos can easily capture people’s attention and spread rapidly on Twitter. The popularity and importance of an event can therefore be approximately gauged by the volume of tweets covering it. Moreover, the relevant tweets reflect the public’s opinions on and reactions to events. It is therefore important to identify and analyze events on Twitter. In this dissertation, we introduce our work, which aims to (1) identify events from the Twitter stream, (2) analyze personal topics, events and users on Twitter, and (3) summarize the events identified from Twitter.
First, we focus on event identification on Twitter. We observe that the textual content of tweets, coupled with their temporal patterns, provides important insight into the general public’s attention and interests. A sudden increase in topically similar tweets usually indicates a burst of attention to some event that has happened offline (such as a product launch or a natural disaster) or online (such as the spread of a viral video). Based on these observations, we propose two models to identify events on Twitter: one extended from LDA and one extended from a non-parametric model. The two models share two assumptions: (1) similar tweets that emerge around the same time are more likely to be about an event, and (2) similar tweets published by the same user over a long period are more likely to be about the user’s personal background and interests. These assumptions help separate event-driven tweets from the large proportion of tweets driven by personal interests. The first model requires the number of events to be predefined, a limitation of topic models. However, events emerge and die out quickly along the timeline, and their number can be countably infinite. Our non-parametric model overcomes this challenge.
In the first task described above, we aim to identify events underlying the Twitter stream, and we do not consider the relation between events and users’ personal interest topics. However, the concepts of events and users’ personal interest topics are orthogonal in that many events fall under certain topics. For example, concerts fall under the topic of music. Furthermore, because Twitter is a social medium, its users play important roles in forming topics and events. Each user has her own topic interests, which influence the content of her tweets. Whether a user publishes a tweet related to an event also largely depends on whether her topic interests match the nature of the event. Modeling the interplay between topics, events and users can deepen our understanding of Twitter content and potentially aid many prediction and recommendation tasks. For the second task, we aim to construct a unified model of topics, events and users on Twitter. The unified model combines a topic model, a dynamic non-parametric model and matrix factorization. The topic model learns users’ personal interest topics, the dynamic non-parametric model identifies events from the tweet stream, and matrix factorization models the interaction between topics and events.
Finally, we aim to summarize the events identified on Twitter. In the previous two tasks, we use topic models and a dynamic non-parametric model to identify events from the tweet stream. In both methods, events are learnt as clusters of tweets characterized by multinomial word distributions. To interpret an event, users must therefore read either the cluster of tweets or the word distribution. However, the former is time-consuming and the latter cannot accurately represent the event. We therefore propose a novel graph-based summarization method that generates concise abstractive summaries of the events.
Overall, this dissertation first presents our work on event identification. We then analyze events, users and personal interest topics on Twitter, which helps to better understand users’ tweeting behavior around events. Finally, we propose a summarization method that generates abstractive summaries of the events on Twitter.

Keywords

event identification, event detection, twitter analysis, topic model, event summarization, bursty topic detection

Degree Awarded

PhD in Information Systems

Discipline

Databases and Information Systems | Social Media

Supervisor(s)

JIANG, Jing

First Page

1

Last Page

90

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
