Publication Type

Journal Article

Version

publishedVersion

Publication Date

8-2025

Abstract

Regularly testing deep learning-powered systems on newly collected data is critical to ensure their reliability, robustness, and efficacy in real-world applications. This process is demanding due to the significant time and human effort required to label new data. Test selection methods alleviate this manual labor by labeling and evaluating only a subset of the data while still meeting testing criteria; however, we observe that such methods, despite their reported promising results, are evaluated only in simple settings, e.g., on the original test data. The question arises: are they always reliable? In this article, we explore when and to what extent test selection methods fail. First, we identify potential pitfalls of 11 selection methods based on how they are constructed. Second, we conduct a study to empirically confirm the existence of these pitfalls. Furthermore, we demonstrate how these pitfalls can break the reliability of the methods. Concretely, methods for fault detection suffer from data that are (1) correctly classified but uncertain or (2) misclassified but confident. Remarkably, the test relative coverage achieved by such methods drops by up to 86.85%. In addition, methods for performance estimation are sensitive to the choice of intermediate-layer output; their effectiveness can be even worse than that of random selection when an inappropriate layer is used.
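The fault-detection pitfalls summarized in the abstract can be illustrated with a minimal sketch. The snippet below is not the article's implementation; it assumes a generic Gini-impurity-style uncertainty score (one common choice in uncertainty-based test prioritization) and an illustrative uncertainty threshold of 0.5, purely to show how "correctly classified but uncertain" inputs consume the labeling budget while "misclassified but confident" inputs escape selection.

```python
import numpy as np

def gini_uncertainty(probs):
    """Gini-impurity-style uncertainty score over softmax outputs (N, C).
    Higher value = more uncertain prediction."""
    return 1.0 - np.sum(probs ** 2, axis=1)

def prioritize_by_uncertainty(probs, budget):
    """Pick the `budget` most uncertain inputs for labeling, as
    uncertainty-based fault-detection selection methods typically do."""
    scores = gini_uncertainty(probs)
    return np.argsort(-scores)[:budget]

def pitfall_categories(probs, labels, threshold=0.5):
    """Flag the two data categories identified as pitfalls:
    (1) correctly classified but uncertain -> selected, yet reveal no fault
    (2) misclassified but confident        -> real faults, yet never selected
    The threshold on the uncertainty score is an illustrative choice."""
    preds = probs.argmax(axis=1)
    unc = gini_uncertainty(probs)
    correct_but_uncertain = (preds == labels) & (unc >= threshold)
    wrong_but_confident = (preds != labels) & (unc < threshold)
    return correct_but_uncertain, wrong_but_confident

# Toy usage: 4 inputs, 3 classes, labeling budget of 2.
probs = np.array([
    [0.40, 0.35, 0.25],   # correct but uncertain -> wastes labeling budget
    [0.97, 0.02, 0.01],   # wrong but confident   -> fault that is never picked
    [0.95, 0.03, 0.02],   # correct and confident
    [0.34, 0.33, 0.33],   # wrong and uncertain   -> the case these methods target
])
labels = np.array([0, 1, 0, 2])

selected = prioritize_by_uncertainty(probs, budget=2)
cbu, wbc = pitfall_categories(probs, labels)
print("selected for labeling:", selected)          # inputs 3 and 0
print("correct-but-uncertain:", np.where(cbu)[0])  # pitfall (1): input 0
print("wrong-but-confident:  ", np.where(wbc)[0])  # pitfall (2): input 1
```

In this toy run, the selector spends half of its budget on input 0, which is uncertain but correctly classified, while the confidently misclassified input 1 is never chosen; this mirrors the coverage degradation the article reports for such data.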

Keywords

deep learning testing, test selection, empirical study, fault detection, performance estimation

Discipline

Digital Communications and Networking | Software Engineering

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

ACM Transactions on Software Engineering and Methodology

Volume

34

Issue

7

First Page

1

Last Page

26

ISSN

1049-331X

Identifier

10.1145/3715693

Publisher

Association for Computing Machinery (ACM)

Additional URL

https://doi.org/10.1145/3715693
