Publication Type
Journal Article
Version
publishedVersion
Publication Date
6-2015
Abstract
Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and the even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill this gap, this article compares samples obtained from two of Twitter's streaming APIs with a more complete Twitter dataset to gain an in-depth understanding of the nature of Twitter data samples and their potential for use in various data mining tasks.
Keywords
Twitter API, sample, data mining
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing | Social Media
Research Areas
Data Science and Engineering
Publication
ACM Transactions on the Web
Volume
9
Issue
3
First Page
13:1
Last Page
23
ISSN
1559-1131
Identifier
10.1145/2746366
Publisher
Association for Computing Machinery (ACM)
Citation
WANG, Yazhe; CALLAN, Jamie; and ZHENG, Baihua. Should We Use the Sample? Analyzing Datasets Sampled from Twitter's Stream API. (2015). ACM Transactions on the Web. 9, (3), 13:1-23.
Available at: https://ink.library.smu.edu.sg/sis_research/2866
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/2746366