Research Collection School Of Computing and Information Systems

Prediction of relatedness in stack overflow: Deep learning vs. SVM: A reproducibility study

Bowen XU, Singapore Management UniversityFollow
Amirreza SHIRANI, University of Houston
David LO, Singapore Management UniversityFollow
Mohammad Amin ALIPOUR, University of Houston

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2018

Abstract

Background Xu et al. used a deep neural network (DNN) technique to classify the degree of relatedness between two knowledge units (question-answer threads) on Stack Overflow. More recently, extending Xu et al.'s work, Fu and Menzies proposed a simpler classification technique based on a fine-tuned support vector machine (SVM) that achieves similar performance but in a much shorter time. Thus, they suggested that researchers need to compare their sophisticated methods against simpler alternatives.Aim The aim of this work is to replicate the previous studies and further investigate the validity of Fu and Menzies' claim by evaluating the DNN- and SVM-based approaches on a larger dataset. We also compare the effectiveness of these two approaches against SimBow, a lightweight SVM-based method that was previously used for general community question-answering.Method We (1) collect a large dataset containing knowledge units from Stack Overflow, (2) show the value of the new dataset addressing shortcomings of the original one, (3) re-evaluate both the DNN-and SVM-based approaches on the new dataset, and (4) compare the performance of the two approaches against that of SimBow.Results We find that: (1) there are several limitations in the original dataset used in the previous studies, (2) effectiveness of both Xu et al.'s and Fu and Menzies' approaches (as measured using F1-score) drop sharply on the new dataset, (3) similar to the previous finding, performance of SVM-based approaches (Fu and Menzies' approach and SimBow) are slightly better than the DNN-based approach, (4) contrary to the previous findings, Fu and Menzies' approach runs much slower than DNN-based approach on the larger dataset - its runtime grows sharply with increase in dataset size, and (5) SimBow outperforms both Xu et al. and Fu and Menzies' approaches in terms of runtime.Conclusion We conclude that, for this task, simpler approaches based on SVM performs adequately well. We also illustrate the challenges brought by the increased size of the dataset and show the benefit of a lightweight SVM-based approach for this task.

Keywords

Relatedness Prediction, Deep Learning, Support Vector Machine

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

ESEM '18: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, Oulu, Finland, October 11-12

First Page

21:1

Last Page

ISBN

9781450358231

Identifier

10.1145/3239235.3240503

Publisher

ACM

City or Country

New York

Citation

XU, Bowen; SHIRANI, Amirreza; LO, David; and ALIPOUR, Mohammad Amin. Prediction of relatedness in stack overflow: Deep learning vs. SVM: A reproducibility study. (2018). ESEM '18: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, Oulu, Finland, October 11-12. 21:1-10.
Available at: https://ink.library.smu.edu.sg/sis_research/4293

Copyright Owner and License

Publisher

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1145/3239235.3240503

Download

Find it in your library

Included in

Databases and Information Systems Commons

COinS

Research Collection School Of Computing and Information Systems

Prediction of relatedness in stack overflow: Deep learning vs. SVM: A reproducibility study

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

First Page

Last Page

ISBN

Identifier

Publisher

City or Country

Citation

Copyright Owner and License

Creative Commons License

Additional URL

Included in

Search

Links

Browse

Links

Research Collection School Of Computing and Information Systems

Prediction of relatedness in stack overflow: Deep learning vs. SVM: A reproducibility study

Author

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

First Page

Last Page

ISBN

Identifier

Publisher

City or Country

Citation

Copyright Owner and License

Creative Commons License

Additional URL

Included in

Share

Search

Links

Browse

Links