Research Collection School Of Computing and Information Systems

Are the code snippets what we are searching for? A benchmark and an empirical study on code search with natural-language queries

Shuhan YAN, Shanghai Jiaotong University
Hang YU, Shanghai Jiaotong University
Yuting CHEN, Shanghai Jiaotong University
Beijun SHEN, Shanghai Jiaotong University

Publication Type

Edited Conference Proceeding

Version

acceptedVersion

Publication Date

2-2020

Abstract

Code search methods, especially those that allow programmers to raise queries in natural language, play an important role in software development. They help improve programmers' productivity by returning sample code snippets from the Internet and/or source-code repositories in response to natural-language queries. Many code search methods in the literature support natural-language queries, yet it is difficult to recognize the strengths and weaknesses of each method and to choose the right one for a given usage scenario, because (1) the implementations of those methods and the datasets used to evaluate them are usually not publicly available, and (2) some methods leverage different training datasets or auxiliary data sources, so their effectiveness cannot be fairly measured and may suffer in practical use. To build a common ground for measuring code search methods, this paper builds CosBench, a dataset that consists of 1000 projects, 52 code-independent natural-language queries with ground truths, and a set of scripts for calculating four metrics on code search results. We have evaluated four IR (Information Retrieval)-based and two DL (Deep Learning)-based code search methods on CosBench. The empirical evaluation results clearly show the usefulness of the CosBench dataset and the various strengths of each code search method. We found that DL-based methods are more suitable for queries on reusing code, and IR-based ones for queries on resolving bugs and learning API usage.

Keywords

natural-language code search, benchmarking, empirical study, information retrieval, machine learning, deep learning, word embedding

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER): Ontario, Canada, February 18-21: Proceedings

First Page

344

Last Page

354

ISBN

9781728151434

Identifier

10.1109/SANER48275.2020.9054840

Publisher

IEEE

City or Country

Piscataway, NJ

Embargo Period

5-31-2021

Citation

YAN, Shuhan; YU, Hang; CHEN, Yuting; and SHEN, Beijun. Are the code snippets what we are searching for? A benchmark and an empirical study on code search with natural-language queries. (2020). 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER): Ontario, Canada, February 18-21: Proceedings. 344-354.
Available at: https://ink.library.smu.edu.sg/sis_research/5975

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/SANER48275.2020.9054840

Included in

Software Engineering Commons
