Network-clustered multi-modal bug localization

Van Duc Thong HOANG, Singapore Management University
Richard Jayadi OENTARYO, Singapore Management University
Bui Tien Duy LE, Singapore Management University
David LO, Singapore Management University

Abstract

Developers often spend considerable effort and resources debugging a program. To help developers debug, numerous information retrieval (IR)-based and spectrum-based bug localization techniques have been devised. IR-based techniques process textual information in bug reports, while spectrum-based techniques process program spectra (i.e., a record of which program elements are executed for each test case). While both kinds of techniques ultimately generate a ranked list of program elements that likely contain a bug, each considers only one source of information (either bug reports or program spectra), which is not optimal. In light of this deficiency, this paper presents a new approach dubbed Network-clustered Multi-modal Bug Localization (NetML), which utilizes multi-modal information from both bug reports and program spectra to localize bugs. NetML facilitates effective bug localization by carrying out a joint optimization of bug localization error and clustering of both bug reports and program elements (i.e., methods). The clustering is achieved through the incorporation of network Lasso regularization, which incentivizes the model parameters of similar bug reports and similar program elements to be close together. To estimate the model parameters of both bug reports and methods, NetML employs an adaptive learning procedure based on Newton's method that updates the parameters on a per-feature basis. Extensive experiments on 355 real bugs from seven software systems have been conducted to benchmark NetML against various state-of-the-art localization methods. The results show that NetML surpasses the best-performing baseline by 31.82%, 22.35%, 19.72%, and 19.24% in terms of the number of bugs successfully localized when a developer inspects the top 1, 5, and 10 methods, and Mean Average Precision (MAP), respectively.
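
To illustrate how a network Lasso regularizer pulls the parameters of similar items together, the following minimal sketch computes the penalty in its generic form, i.e., a weighted sum of Euclidean distances between the parameter vectors of connected nodes. It is not NetML's actual objective: the function name, the graph, the edge weights, and the parameter values are all hypothetical and chosen only for illustration.

```python
import math

def network_lasso_penalty(params, edges, lam=1.0):
    """Generic network Lasso penalty: lam * sum of w * ||params[j] - params[k]||_2.

    params: dict mapping a node name to its parameter vector (list of floats)
    edges:  list of (node_j, node_k, weight) tuples; a larger weight is
            assumed to mean the two nodes are more similar
    """
    total = 0.0
    for j, k, w in edges:
        diff = [a - b for a, b in zip(params[j], params[k])]
        total += w * math.sqrt(sum(d * d for d in diff))
    return lam * total

# Hypothetical parameter vectors for three methods (illustrative values only).
params = {"m1": [1.0, 0.0], "m2": [1.0, 0.0], "m3": [4.0, 4.0]}
# Hypothetical similarity graph: m1 and m2 are linked, as are m2 and m3.
edges = [("m1", "m2", 1.0), ("m2", "m3", 0.5)]

penalty = network_lasso_penalty(params, edges, lam=2.0)
print(penalty)
```

Identical parameter vectors on an edge (m1 and m2 here) contribute nothing to the penalty, while differing vectors (m2 and m3) are charged in proportion to their distance. Minimizing such a term alongside a localization loss therefore encourages connected nodes to share parameters, which is the clustering effect the abstract describes.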