Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
11-2012
Abstract
Many machine learning systems that include various data mining, information retrieval, and natural language processing code and libraries have been used in real-world applications. Search engines, Internet advertising systems, and product recommendation systems are examples of users of such algorithm-intensive code and libraries. Machine learning code and toolkits have also been used in many recent studies on software mining and analytics that aim to automate various software engineering tasks. With the increasing number of important applications of machine learning systems, the reliability of such systems is also becoming increasingly important. A necessary step toward ensuring the reliability of such systems is to understand the features and characteristics of the bugs that occur in them. A number of studies have investigated bugs and fixes in various software systems, but none focuses on machine learning systems. Machine learning systems are unique due to their algorithm-intensive nature and their application to potentially large-scale data, and thus deserve special consideration. In this study, we fill this research gap by performing an empirical study on the bugs that appear in machine learning systems. We analyze three systems, namely Apache Mahout, Lucene, and OpenNLP, which are data mining, information retrieval, and natural language processing tools, respectively. We look into their bug databases and code repositories, analyze existing bugs and their corresponding fixes, and label the bugs into various categories. Our study finds that 22.6% of the bugs belong to the algorithm/method category, 15.6% belong to the non-functional category, and 13% belong to the assignment/initialization category. We also report the relationship between the categories of bugs and their severity, the time and effort needed to fix the bugs, and their impact. We highlight several categories of bugs that deserve attention in future research.
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
ISSRE 2012: Proceedings of the 23rd IEEE International Symposium on Software Reliability Engineering, Dallas, 27-30 November 2012
First Page
271
Last Page
280
ISBN
9781467346382
Identifier
10.1109/ISSRE.2012.22
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
THUNG, Ferdian; WANG, Shaowei; LO, David; and JIANG, Lingxiao.
An empirical study of bugs in machine learning systems. (2012). ISSRE 2012: Proceedings of the 23rd IEEE International Symposium on Software Reliability Engineering, Dallas, 27-30 November 2012. 271-280.
Available at: https://ink.library.smu.edu.sg/sis_research/1587
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1109/ISSRE.2012.22