Publication Type
Journal Article
Version
acceptedVersion
Publication Date
10-2018
Abstract
Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present COde voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and Stack Overflow Q&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for Stack Overflow questions.
Keywords
Code search, GitHub, Free-form search, Query augmentation, StackOverflow, Vocabulary mismatch
Discipline
Computer Engineering | Programming Languages and Compilers | Software Engineering
Research Areas
Data Science and Engineering
Publication
Empirical Software Engineering
Volume
23
Issue
5
First Page
2622
Last Page
2654
ISSN
1382-3256
Identifier
10.1007/s10664-017-9544-y
Publisher
Springer Verlag (Germany)
Citation
SIRRES, Raphael; BISSYANDE, Tegawendé F.; KIM, Dongsun; LO, David; KLEIN, Jacques; KIM, Kisub; and LE TRAON, Yves.
Augmenting and structuring user queries to support efficient free-form code search. (2018). Empirical Software Engineering. 23, (5), 2622-2654.
Available at: https://ink.library.smu.edu.sg/sis_research/4129
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1007/s10664-017-9544-y