Research Collection School Of Computing and Information Systems

NIRMAL: Automatic Identification of Software Relevant Tweets Leveraging Language Model

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

3-2015

Abstract

Twitter is one of the most widely used social media platforms today. It enables users to share and view short 140-character messages called 'tweets'. About 284 million active users generate close to 500 million tweets per day. Such rapid generation of user-generated content at this scale results in information overload. Users interested in a particular domain have limited means to filter out irrelevant tweets and tend to get lost in the volume of data they encounter. A recent study by Singer et al. found that software developers use Twitter to stay aware of industry trends, to learn from others, and to network with other developers. However, Singer et al. also reported that developers often find Twitter streams to contain too much noise, which is a barrier to the adoption of Twitter. In this paper, to help developers cope with noise, we propose a novel approach named NIRMAL, which automatically identifies software-relevant tweets from a collection or stream of tweets. Our approach is based on language modeling, which learns a statistical model from a training corpus (i.e., a set of documents). We use a subset of posts from StackOverflow, a programming question and answer site, as the training corpus to learn a language model. A corpus of tweets was then used to test the effectiveness of the trained language model. The tweets were sorted by the rank the model assigned to each tweet. The top 200 tweets were then manually analyzed to verify whether they are software related, and an accuracy score was calculated. The results show that decent accuracy scores can be achieved by various variants of NIRMAL, which indicates that NIRMAL can effectively identify software-related tweets from a huge corpus of tweets.
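The pipeline described in the abstract (train a language model on programming posts, score each tweet, sort, inspect the top of the list) can be illustrated with a minimal sketch. The abstract does not state the n-gram order, the smoothing method, or the exact scoring function, so the choices here are assumptions: a bigram model with add-one smoothing, scored by perplexity, where lower perplexity is taken to mean "more like the training corpus". The names `tokenize`, `BigramModel`, and `rank_tweets` are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Assumption: a simple lowercase tokenizer; the paper's preprocessing is not given here.
TOKEN_RE = re.compile(r"[a-z0-9#+]+")


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return TOKEN_RE.findall(text.lower())


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing (an assumed choice)."""

    def __init__(self, documents):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for doc in documents:
            tokens = ["<s>"] + tokenize(doc) + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Reserve one extra vocabulary slot for words never seen in training.
        self.vocab_size = len(self.unigrams) + 1

    def log_prob(self, prev, word):
        """Smoothed log P(word | prev)."""
        numerator = self.bigrams[(prev, word)] + 1
        denominator = self.unigrams[prev] + self.vocab_size
        return math.log(numerator / denominator)

    def perplexity(self, text):
        """Per-token perplexity of the text; lower means closer to the training corpus."""
        tokens = ["<s>"] + tokenize(text) + ["</s>"]
        pairs = list(zip(tokens, tokens[1:]))  # never empty: at least (<s>, </s>)
        total = sum(self.log_prob(prev, word) for prev, word in pairs)
        return math.exp(-total / len(pairs))


def rank_tweets(model, tweets, top_k=200):
    """Sort tweets by ascending perplexity and keep the top_k best-scoring ones."""
    return sorted(tweets, key=model.perplexity)[:top_k]
```

In use, `BigramModel` would be built from a list of StackOverflow post texts and `rank_tweets(model, tweets)` would return the 200 tweets the model finds least surprising, which could then be labeled by hand to compute an accuracy score as the abstract describes.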

Discipline

Computer Sciences | Databases and Information Systems | Social Media

Research Areas

Software and Cyber-Physical Systems

Publication

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering, SANER: Proceedings: March 2-6, Montréal

First Page

449

Last Page

458

ISBN

9781479984695

Identifier

10.1109/SANER.2015.7081855

Publisher

IEEE

City or Country

Piscataway, NJ

Citation

SHARMA, Abishek; TIAN, Yuan; and LO, David. NIRMAL: Automatic Identification of Software Relevant Tweets Leveraging Language Model. (2015). 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering, SANER: Proceedings: March 2-6, Montréal. 449-458.
Available at: https://ink.library.smu.edu.sg/sis_research/3194

Copyright Owner and License

LARC

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://doi.org/10.1109/SANER.2015.7081855

Included in

Databases and Information Systems Commons, Social Media Commons
