Research Collection School Of Computing and Information Systems

Should We Use the Sample? Analyzing Datasets Sampled from Twitter's Stream API

Publication Type

Journal Article

Version

publishedVersion

Publication Date

June 2015

Abstract

Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill this gap, this article performs a comparative analysis on samples obtained from two of Twitter’s streaming APIs with a more complete Twitter dataset to gain an in-depth understanding of the nature of Twitter data samples and their potential for use in various data mining tasks.

Keywords

Twitter API, sample, data mining

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing | Social Media

Research Areas

Data Science and Engineering

Publication

ACM Transactions on the Web

Volume

9

Issue

3

First Page

13:1

Last Page

23

ISSN

1559-1131

Identifier

10.1145/2746366

Publisher

Association for Computing Machinery (ACM)

Citation

WANG, Yazhe; CALLAN, Jamie; and ZHENG, Baihua. Should We Use the Sample? Analyzing Datasets Sampled from Twitter's Stream API. (2015). ACM Transactions on the Web. 9, (3), 13:1-23.
Available at: https://ink.library.smu.edu.sg/sis_research/2866

Copyright Owner and License

Publisher

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://doi.org/10.1145/2746366

Included in

Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons, Social Media Commons
