Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

2015

Abstract

Today’s software is large and complex, consisting of millions of lines of code. New developers on a software project face significant challenges in finding code related to their development or maintenance tasks (e.g., implementing new features and fixing bugs). In fact, research has shown that developers typically spend more time locating and understanding code than modifying it. Thus, we can significantly reduce the cost of software development and maintenance by reducing the time needed to search for and understand the code relevant to a task. To this end, many code search techniques have been proposed. The best form of input (i.e., query) that users can provide to search for a piece of code of interest differs with the circumstances. During development, developers often search for code that implements certain functionality for reuse, expressing their queries in free-form text (i.e., natural language). After deployment, users might report bugs to an issue tracking system. For these bug reports, developers would benefit from an automated tool that can identify buggy code from the descriptions of the bugs’ symptoms. During maintenance, developers may notice that pieces of code with a particular structure are potentially buggy. In that case, a code search technique that allows users to specify the code structure using a query language may be the best choice. In another scenario, developers may have found some buggy code examples and would like to locate other similar code snippets containing the same problem across the entire system. In this case, a code search technique that takes known buggy code examples as input is the best choice. During testing, developers who have execution traces of a suite of test cases might want to use these traces as input to search for the buggy code.
Developers may also like to provide feedback to the code search engine to improve its results. From the above examples, we can see that there is a need for multimodal code search, which allows users to express their needs in multiple input forms and processes different inputs with different strategies, making search more convenient and effective. In this dissertation, we propose a multimodal code search engine that employs novel techniques to let developers effectively find code elements of interest by processing their inputs in various forms, including free-form text, an SQL-like domain-specific language, code examples, execution traces, and user feedback. In the multimodal code search engine, we use program analysis, data mining, and machine learning techniques to improve code search accuracy. Our evaluations show that our approaches improve significantly over state-of-the-art approaches.

Keywords

code search, software engineering, bug localization, fault localization, software repository mining, software maintenance

Degree Awarded

PhD in Information Systems

Discipline

Software Engineering

Supervisor(s)

LO, David

First Page

1

Last Page

170

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
