Automated Configuration Bug Report Prediction Using Text Mining
Publication Type
Conference Proceeding Article
Publication Date
7-2014
Abstract
Configuration bugs are one of the dominant causes of software failures. Previous studies show that a configuration bug can cause huge financial losses in a software system. The importance of configuration bugs has attracted various research studies, e.g., to detect, diagnose, and fix configuration bugs. Given a bug report, an approach that can identify whether the bug is a configuration bug could help developers reduce debugging effort. We refer to this problem as configuration bug report prediction. To address this problem, we develop a new automated framework that applies text mining techniques to the natural-language descriptions of bug reports. The framework trains a statistical model on historical bug reports with known labels (i.e., configuration or non-configuration), and the model is then used to predict a label for a new bug report. Developers could apply our model to automatically predict labels of bug reports to improve their productivity. Our tool first applies feature selection techniques (e.g., information gain and chi-square) to pre-process the textual information in bug reports, and then applies various text mining techniques (e.g., naive Bayes, SVM, and naive Bayes multinomial) to build statistical models. We evaluate our solution on 5 bug report datasets: Accumulo, ActiveMQ, Camel, Flume, and Wicket. We show that naive Bayes multinomial with information gain achieves the best performance. On average across the 5 projects, its accuracy, configuration F-measure, and non-configuration F-measure are 0.811, 0.450, and 0.880, respectively. We also compare our solution with the method proposed by Arshad et al. The results show that our proposed approach, which uses naive Bayes multinomial with information gain, on average improves the accuracy, configuration F-measure, and non-configuration F-measure scores of Arshad et al.'s method by 8.34%, 103.7%, and 4.24%, respectively.
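The two stages named in the abstract (information-gain feature selection followed by a multinomial naive Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' tool: the whitespace tokenizer, the six toy bug reports, the choice of five features, the Laplace smoothing, and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels):
    """Information gain of each term's presence/absence w.r.t. the label."""
    n = len(docs)
    base = entropy(list(Counter(labels).values()))
    doc_sets = [set(d) for d in docs]
    gains = {}
    for term in set().union(*doc_sets):
        present = Counter(l for s, l in zip(doc_sets, labels) if term in s)
        absent = Counter(l for s, l in zip(doc_sets, labels) if term not in s)
        n_present = sum(present.values())
        gains[term] = (base
                       - n_present / n * entropy(list(present.values()))
                       - (n - n_present) / n * entropy(list(absent.values())))
    return gains

def train_multinomial_nb(docs, labels, features):
    """Multinomial naive Bayes over the selected features (Laplace smoothing)."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    term_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        term_counts[label].update(t for t in doc if t in features)
    log_likelihood = {}
    for c in classes:
        total = sum(term_counts[c].values()) + len(features)
        log_likelihood[c] = {t: math.log((term_counts[c][t] + 1) / total)
                             for t in features}
    return log_prior, log_likelihood

def predict(model, doc):
    """Return the class with the highest log posterior for a tokenized report."""
    log_prior, log_likelihood = model
    scores = {c: log_prior[c] + sum(log_likelihood[c][t]
                                    for t in doc if t in log_likelihood[c])
              for c in log_prior}
    return max(scores, key=scores.get)

# Toy labelled bug reports (invented for this sketch, not from the paper's datasets).
reports = [
    ("server fails to start when config file property is missing", "configuration"),
    ("wrong default value for timeout property in config xml", "configuration"),
    ("setting port option in properties file is ignored", "configuration"),
    ("null pointer exception when parsing empty message", "non-configuration"),
    ("memory leak in connection pool after many requests", "non-configuration"),
    ("race condition causes deadlock in message consumer", "non-configuration"),
]
docs = [text.lower().split() for text, _ in reports]
labels = [label for _, label in reports]

# Keep the 5 terms with the highest information gain (ties broken alphabetically).
gains = information_gain(docs, labels)
features = set(sorted(gains, key=lambda t: (-gains[t], t))[:5])
model = train_multinomial_nb(docs, labels, features)

print(predict(model, "timeout property in config file ignored".split()))
print(predict(model, "null pointer exception in message consumer".split()))
```

On real data the number of retained features and the tokenization would need tuning, and the sketch does not cover the paper's evaluation or its comparison with Arshad et al.'s method.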
Keywords
data mining, program debugging, statistical analysis, text analysis
Discipline
Information Security | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
2014 IEEE 38th Annual Computer Software and Applications Conference (COMPSAC 2014): Vasteras, Sweden, 21-25 July 2014
First Page
107
Last Page
116
ISBN
9781479935765
Identifier
10.1109/COMPSAC.2014.17
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
Xia, Xin; LO, David; Qiu, Weiwei; Wang, Xingen; and Zhou, Bo.
Automated Configuration Bug Report Prediction Using Text Mining. (2014). 2014 IEEE 38th Annual Computer Software and Applications Conference (COMPSAC 2014): Vasteras, Sweden, 21-25 July 2014. 107-116.
Available at: https://ink.library.smu.edu.sg/sis_research/2418
Additional URL
http://dx.doi.org/10.1109/COMPSAC.2014.17