Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

6-2014

Abstract

A fault is the root cause of a program failure, in which a program behaves differently from its intended behavior. Finding or localizing faults is often laborious, especially for complex programs, yet it is an important task in the software lifecycle. An automated technique that can accurately and quickly identify faulty code is much needed to reduce the cost of software debugging. Many fault localization techniques assume that faults are localizable, i.e., that each fault manifests in only one or a few lines of code that are close to one another. To verify this assumption, we study how faults spread across program elements. We find that most faults are localizable within a few lines of code or a few methods, and that around 30% of the faults manifest in a single line of code.
Spectrum-based fault localization is a lightweight approach that analyzes execution traces to highlight the most suspicious program elements (e.g., statements or blocks) for inspection by developers. Our fault localization technique belongs to this spectrum-based category: it localizes faults by measuring the strength of the relationship between the execution of a program element and the occurrence of a program failure. Various association measures have been proposed in statistics and data mining to quantify the strength of the relationship between two variables of interest; however, their effectiveness in localizing faults has not been well studied. We investigate the effectiveness of 40 association measures in localizing faults in single-bug and multiple-bug programs. On average, some of the measures require a smaller percentage of code to be inspected than two well-known spectrum-based techniques, Ochiai and Tarantula, while a number of other measures are comparable to them. The effectiveness of different fault localization techniques also varies across buggy programs.
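To make the spectrum-based setting concrete, the sketch below computes suspiciousness scores from a test coverage matrix using the commonly published formulas for Tarantula and Ochiai, plus the phi coefficient as one example of an association measure computed from the same 2x2 counts (element executed or not, run failed or passed). It is a minimal illustration under stated assumptions, not the dissertation's implementation: the function names, the boolean coverage-matrix input format, and the toy data are hypothetical, and the abstract does not list which 40 measures were studied.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical input format (an assumption for illustration):
# coverage[t][e] is True if test t executed program element e,
# failed[t] is True if test t failed.
Counts = Tuple[int, int, int, int]  # (n_ef, n_ep, n_nf, n_np)


def counts(coverage: List[List[bool]], failed: List[bool], e: int) -> Counts:
    """2x2 contingency counts for element e: executed/not executed x failed/passed."""
    n_ef = n_ep = n_nf = n_np = 0
    for row, is_failure in zip(coverage, failed):
        if row[e] and is_failure:
            n_ef += 1  # executed by a failing run
        elif row[e]:
            n_ep += 1  # executed by a passing run
        elif is_failure:
            n_nf += 1  # not executed by a failing run
        else:
            n_np += 1  # not executed by a passing run
    return n_ef, n_ep, n_nf, n_np


def tarantula(n_ef: int, n_ep: int, n_nf: int, n_np: int) -> float:
    # Fraction of failing runs that execute e, divided by the sum of
    # that fraction and the fraction of passing runs that execute e.
    fail_ratio = n_ef / (n_ef + n_nf) if n_ef + n_nf else 0.0
    pass_ratio = n_ep / (n_ep + n_np) if n_ep + n_np else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


def ochiai(n_ef: int, n_ep: int, n_nf: int, n_np: int) -> float:
    denom = math.sqrt((n_ef + n_nf) * (n_ef + n_ep))
    return n_ef / denom if denom else 0.0


def phi(n_ef: int, n_ep: int, n_nf: int, n_np: int) -> float:
    # Phi coefficient: correlation between "e executed" and "run failed".
    denom = math.sqrt((n_ef + n_ep) * (n_nf + n_np) * (n_ef + n_nf) * (n_ep + n_np))
    return (n_ef * n_np - n_ep * n_nf) / denom if denom else 0.0


Measure = Callable[[int, int, int, int], float]


def rank(coverage: List[List[bool]], failed: List[bool], measure: Measure) -> List[Tuple[int, float]]:
    """Elements sorted by descending suspiciousness; ties keep element order."""
    scores = [(e, measure(*counts(coverage, failed, e))) for e in range(len(coverage[0]))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: 4 tests (rows) x 3 elements (columns); tests 0 and 2 fail.
coverage = [
    [True, True, False],
    [True, False, True],
    [True, True, True],
    [True, False, False],
]
failed = [True, False, True, False]

for name, measure in [("Tarantula", tarantula), ("Ochiai", ochiai), ("phi", phi)]:
    print(name, rank(coverage, failed, measure))
```

With this toy data, element 1 is executed by both failing tests and by no passing test, so all three measures rank it first. Ochiai then places element 0 (executed by every test) ahead of element 2, whereas Tarantula and phi score those two elements equally.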
We propose an approach called Fusion Localizer that leverages these differences to boost the effectiveness of fault localization. Our approach uses data fusion methods studied in the domain of information retrieval to combine the scores or rankings produced by existing spectrum-based fault localization techniques (in particular, the 40 association measures, Ochiai, and Tarantula) and thereby inexpensively rank the faulty program elements. Our evaluation demonstrates that our approach can significantly improve on the effectiveness of existing state-of-the-art fault localization techniques.
The approaches above localize potentially faulty elements using execution traces. At times, however, full execution traces are not available for debugging. Code clones (i.e., pieces of similar code) have been shown to be useful for detecting bugs, because inconsistent changes among the clones in a clone group may indicate potential bugs. However, clone-based bug detection techniques suffer from an excessive number of false positives. Our technique reorders the clone-based anomaly reports so that reports containing bugs appear earlier in the list than they do in the original list. By actively and incrementally incorporating user feedback to iteratively refine our classification model and reorder the anomaly reports, our technique can successfully reduce the false positive rate.
In summary, this dissertation has empirically demonstrated the need for novel ranking-based approaches to localizing faults and has proposed several such approaches, which advance the previous state of the art.
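The Fusion Localizer described above combines the outputs of several techniques using data fusion methods from information retrieval. The abstract does not say which fusion methods are used, so the sketch below illustrates two classic ones as assumptions: CombSUM, which sums min-max normalized scores, and Borda count, which sums rank-based points. The element ids and score values are hypothetical.

```python
from typing import Dict, List

Scores = Dict[str, float]  # program element id -> suspiciousness score


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one technique's scores to [0, 1] so different scales are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {e: 0.0 for e in scores}
    return {e: (s - lo) / (hi - lo) for e, s in scores.items()}


def comb_sum(runs: List[Scores]) -> Scores:
    """CombSUM (assumed fusion method): sum of normalized scores across techniques."""
    fused: Scores = {}
    for run in runs:
        for e, s in min_max_normalize(run).items():
            fused[e] = fused.get(e, 0.0) + s
    return fused


def borda(runs: List[Scores]) -> Scores:
    """Borda count (assumed fusion method): the top of n elements gets n points, the last gets 1."""
    fused: Scores = {}
    for run in runs:
        ordered = sorted(run, key=run.get, reverse=True)
        n = len(ordered)
        for position, e in enumerate(ordered):
            fused[e] = fused.get(e, 0.0) + (n - position)
    return fused


def ranking(fused: Scores) -> List[str]:
    """Most suspicious first; ties broken by element id."""
    return sorted(fused, key=lambda e: (-fused[e], e))


# Hypothetical scores from two techniques that disagree on the top element.
ochiai_scores = {"s1": 0.71, "s2": 1.00, "s3": 0.50}
tarantula_scores = {"s1": 0.50, "s2": 0.90, "s3": 0.95}

print(ranking(comb_sum([ochiai_scores, tarantula_scores])))
print(ranking(borda([ochiai_scores, tarantula_scores])))
```

Both methods rank s2, which sits near the top of both input lists, first. CombSUM uses score magnitudes, whereas Borda uses only the order of the scores, which makes it insensitive to differences in score scale between techniques.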

Keywords

automatic debugging, software maintenance, suspicious code, localizing faults, data mining, ranking approaches

Degree Awarded

PhD in Information Systems

Discipline

Software Engineering

Supervisor(s)

LO, David

First Page

1

Last Page

157

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
