Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2007
Abstract
We consider the problem of analyzing word trajectories in both the time and frequency domains, with the specific goal of identifying important and less-reported, periodic and aperiodic words. A set of words with identical trends can be grouped together to reconstruct an event in a completely unsupervised manner. The document frequency of each word across time is treated as a time series, where each element is the document frequency - inverse document frequency (DFIDF) score at one time point. In this paper, we 1) applied spectral analysis to categorize features by event characteristics: important and less-reported, periodic and aperiodic; 2) modeled aperiodic features with a Gaussian density and periodic features with Gaussian mixture densities, and subsequently detected each feature's burst using a truncated Gaussian approach; 3) proposed an unsupervised greedy event detection algorithm to detect both aperiodic and periodic events. All of the above methods can be applied to time series data in general. We extensively evaluated our methods on the 1-year Reuters News Corpus [3] and showed that they were able to uncover meaningful aperiodic and periodic events.
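The first step described in the abstract, treating a word's DFIDF scores as a time series and using spectral analysis to separate periodic from aperiodic features, can be illustrated with a minimal sketch. This is not the authors' implementation. The exact DFIDF formula, the function names `dfidf_trajectory`, `dominant_period`, and `is_periodic`, and the half-length cutoff used to label a word periodic are all assumptions made for illustration.

```python
import numpy as np

def dfidf_trajectory(df_per_day, docs_per_day):
    """Assumed DFIDF form: (DF(t) / N(t)) * log(N_total / DF_total).

    df_per_day[t]   : number of documents on day t containing the word
    docs_per_day[t] : total number of documents on day t (assumed > 0)
    """
    df = np.asarray(df_per_day, dtype=float)
    n = np.asarray(docs_per_day, dtype=float)
    total_df = df.sum()
    if total_df == 0:
        return np.zeros_like(df)  # word never appears: flat trajectory
    idf = np.log(n.sum() / total_df)
    return (df / n) * idf

def dominant_period(series):
    """Return (period, power) of the strongest non-DC periodogram peak."""
    y = np.asarray(series, dtype=float)
    T = len(y)
    spectrum = np.abs(np.fft.rfft(y)) ** 2 / T  # periodogram estimate
    k = 1 + int(np.argmax(spectrum[1:]))        # skip the DC term at k = 0
    return T / k, float(spectrum[k])

def is_periodic(series):
    """Hypothetical rule: periodic if the dominant period is at most T/2."""
    period, _ = dominant_period(series)
    return period <= len(series) / 2

# Synthetic trajectories over 56 days (not data from the paper).
t = np.arange(56)
weekly = 1.0 + np.cos(2 * np.pi * t / 7)       # repeats every 7 days
burst = np.exp(-0.5 * ((t - 28) / 3.0) ** 2)   # one isolated burst

print(dominant_period(weekly)[0], is_periodic(weekly))
print(dominant_period(burst)[0], is_periodic(burst))
```

In this sketch the weekly signal has its strongest component at a 7-day period, while the single burst is dominated by the lowest frequency, so its dominant period equals the full series length and it is labeled aperiodic. The power returned alongside the period could serve as a rough proxy for how heavily a word is reported, though any threshold on it would need to be chosen from the data.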
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
30th ACM SIGIR Conference on Research & Development on Information Retrieval (SIGIR2007)
Identifier
10.1145/1277741.1277779
Publisher
ACM
City or Country
Amsterdam
Citation
HE, Qi; CHANG, Kuiyu; and LIM, Ee Peng.
Analyzing feature trajectories for event detection. (2007). 30th ACM SIGIR Conference on Research & Development on Information Retrieval (SIGIR2007).
Available at: https://ink.library.smu.edu.sg/sis_research/1269
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1145/1277741.1277779
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons