It takes two to tango: Deleted Stack Overflow question prediction with text and meta features
Publication Type
Conference Proceeding Article
Publication Date
6-2016
Abstract
Stack Overflow is a popular community-based Q&A website that caters to the technical needs of software developers. As of February 2015, Stack Overflow has more than 3.9M registered users, 8.8M questions, and 41M comments. Stack Overflow provides explicit and detailed guidelines on how to post questions, but some questions are still very poor in quality. Such questions are deleted by experienced community members and moderators. Deleted questions increase maintenance cost and have an adverse impact on the user experience. Therefore, predicting deleted questions is an important task. In this study, we propose a two-stage hybrid approach, DelPredictor, which combines text processing and classification techniques to predict deleted questions. In the first stage, DelPredictor converts the text in the title, body, and tag fields of questions into numerical textual features via text processing and classification techniques. In the second stage, it extracts meta features that can be categorized into profile, community, content, and syntactic features. Next, it learns and combines two independent classifiers built on the textual and meta features. We evaluate DelPredictor on 5 years (2008-2013) of deleted questions from Stack Overflow. Our experimental results show that DelPredictor improves the F1-scores over baseline prediction, a prior approach [12], and a text-based approach by 29.50%, 9.34%, and 28.11%, respectively.
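The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how DelPredictor merges its two classifiers, so the weighted average below, the weight `alpha`, the 0.5 threshold, and the toy scores are all assumptions made for illustration, not the authors' method. The sketch takes a per-question deletion score from a text classifier and another from a meta-feature classifier, blends them, and picks the blend weight that maximizes F1 on labeled data.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = deleted question)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def combine(text_score, meta_score, alpha):
    """Assumed combination rule: weighted average of the two scores."""
    return alpha * text_score + (1 - alpha) * meta_score


def predict(text_scores, meta_scores, alpha, threshold=0.5):
    """Label a question as deleted (1) when the blended score reaches the threshold."""
    return [
        1 if combine(t, m, alpha) >= threshold else 0
        for t, m in zip(text_scores, meta_scores)
    ]


def tune_alpha(text_scores, meta_scores, y_true, steps=10):
    """Grid-search alpha in [0, 1]; return the first weight with the best F1."""
    best_alpha, best_f1 = 0.0, -1.0
    for i in range(steps + 1):
        alpha = i / steps
        score = f1_score(y_true, predict(text_scores, meta_scores, alpha))
        if score > best_f1:
            best_alpha, best_f1 = alpha, score
    return best_alpha, best_f1


# Hypothetical toy data: 1 = deleted, 0 = kept.
y_true = [1, 1, 0, 0, 1, 0]
text_scores = [0.9, 0.3, 0.6, 0.1, 0.8, 0.2]  # from a text classifier (made up)
meta_scores = [0.6, 0.9, 0.3, 0.2, 0.3, 0.7]  # from a meta classifier (made up)

text_only_f1 = f1_score(y_true, predict(text_scores, meta_scores, alpha=1.0))
meta_only_f1 = f1_score(y_true, predict(text_scores, meta_scores, alpha=0.0))
best_alpha, best_f1 = tune_alpha(text_scores, meta_scores, y_true)
```

On this toy data each classifier alone misclassifies two questions, while a blend near the middle of the range classifies all six correctly. In practice the weight would be tuned on held-out data rather than on the evaluation set.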
Keywords
Classification, Deleted Question, Stack Overflow, Text Processing
Discipline
Computer Sciences | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
COMPSAC 2016: Proceedings of the 40th IEEE Annual International Computers, Software and Applications Conference: Atlanta, Georgia, 10-14 June 2016
First Page
73
Last Page
82
ISBN
9781467388450
Identifier
10.1109/COMPSAC.2016.145
Publisher
IEEE Computer Society
City or Country
Los Alamitos, CA
Citation
XIA, Xin; LO, David; CORREA, Denzil; SUREKA, Ashish; and SHIHAB, Emad.
It takes two to tango: Deleted Stack Overflow question prediction with text and meta features. (2016). COMPSAC 2016: Proceedings of the 40th IEEE Annual International Computers, Software and Applications Conference: Atlanta, Georgia, 10-14 June 2016. 73-82.
Available at: https://ink.library.smu.edu.sg/sis_research/3568
Additional URL
http://doi.org/10.1109/COMPSAC.2016.145