Publication Type
Journal Article
Version
publishedVersion
Publication Date
1-2019
Abstract
Traditional code search engines (e.g., Krugle) often do not perform well with natural language queries. They mostly apply keyword matching between the query and the source code. Hence, they need carefully designed queries containing references to relevant APIs for the code search. Unfortunately, according to existing studies, preparing an effective search query is not only challenging but also time-consuming for developers. In this article, we propose a novel query reformulation technique, RACK, that suggests a list of relevant API classes for a natural language query intended for code search. Our technique offers such suggestions by exploiting keyword-API associations from the questions and answers of Stack Overflow (i.e., crowdsourced knowledge). We first motivate our idea using an exploratory study with 19 standard Java API packages and 344K Java-related posts from Stack Overflow. Experiments using 175 code search queries randomly chosen from three Java tutorial sites show that our technique recommends correct API classes within the Top-10 results for 83% of the queries, with 46% mean average precision and 54% recall, which are 66%, 79% and 87% higher, respectively, than those of the state of the art. Reformulations using our suggested API classes improve 64% of the natural language queries, and their overall accuracy improves by 19%. Comparisons with three state-of-the-art techniques demonstrate that RACK outperforms them in query reformulation by a statistically significant margin. Investigation using three web/code search engines shows that our technique can significantly improve their results in the context of code search.
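The idea of keyword-API association described above can be illustrated with a minimal sketch. It is not the RACK algorithm itself, whose details the abstract does not give: it assumes a plain co-occurrence count as the association score, and the thread data, function names, and `top_k` parameter are all hypothetical.

```python
from collections import Counter, defaultdict

# Toy, made-up (question keywords, API classes in the answer) pairs.
# They stand in for mined Stack Overflow threads and are NOT real data.
TOY_THREADS = [
    (["parse", "html", "page"], ["Document", "Element"]),
    (["read", "file", "lines"], ["BufferedReader", "FileReader"]),
    (["download", "html", "page"], ["URL", "Document"]),
    (["write", "file"], ["FileWriter", "BufferedWriter"]),
]


def build_associations(threads):
    """Count how often each keyword co-occurs with each API class."""
    assoc = defaultdict(Counter)
    for keywords, api_classes in threads:
        for kw in set(keywords):
            for api in set(api_classes):
                assoc[kw][api] += 1
    return assoc


def suggest_api_classes(query, assoc, top_k=10):
    """Rank API classes by summed co-occurrence with the query keywords.

    Assumption: a simple additive score; ties are broken alphabetically.
    """
    scores = Counter()
    for kw in query.lower().split():
        scores.update(assoc.get(kw, {}))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:top_k]]


def reformulate(query, assoc, top_k=3):
    """Append the top suggested API classes to the original query."""
    return " ".join([query] + suggest_api_classes(query, assoc, top_k))


if __name__ == "__main__":
    associations = build_associations(TOY_THREADS)
    print(suggest_api_classes("parse html page", associations))
    print(reformulate("write file", associations, top_k=2))
```

In this sketch the reformulated query is simply the original text with the suggested class names appended, which gives a keyword-matching search engine API terms to match against source code.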
Keywords
Code search, Keyword-API association, Crowdsourced knowledge, Stack Overflow, Query reformulation
Discipline
Programming Languages and Compilers | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
Empirical Software Engineering
Volume
24
Issue
4
First Page
1869
Last Page
1924
ISSN
1382-3256
Identifier
10.1007/s10664-018-9671-0
Publisher
Springer Verlag (Germany)
Citation
RAHMAN, Mohammad M.; ROY, Chanchal K.; and LO, David.
Automatic query reformulation for code search using crowdsourced knowledge. (2019). Empirical Software Engineering. 24, (4), 1869-1924.
Available at: https://ink.library.smu.edu.sg/sis_research/4374
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/s10664-018-9671-0