Research Collection School Of Computing and Information Systems

An empirical study on data distribution-aware test selection for deep learning enhancement

Publication Type

Journal Article

Version

publishedVersion

Publication Date

7-2022

Abstract

Like traditional software, which evolves constantly, deep neural networks need to evolve as test data grows rapidly, so that they can be continuously enhanced (e.g., adapted to the distribution shift of a new deployment environment). However, manually labeling all of the collected test data is labor intensive. Test selection addresses this problem by strategically choosing a small set to label; retraining with the selected set then lets deep neural networks achieve competitive accuracy. Unfortunately, existing work on selection metrics has three main limitations: (1) using different retraining processes, (2) ignoring data distribution shifts, and (3) being insufficiently evaluated. To fill this gap, we first conduct a systematic empirical study to reveal the impact of the retraining process and data distribution on model enhancement. Then, based on our findings, we propose DAT, a novel distribution-aware test selection metric. Experimental results reveal that retraining using both the training and selected data outperforms using only the selected data, and that no single selection metric performs best across the various data distributions. By contrast, DAT effectively alleviates the impact of distribution shifts and outperforms the compared metrics in accuracy improvement for model enhancement by up to five times on simulated distribution shift scenarios and by up to 30.09% on in-the-wild ones.
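The select-then-retrain workflow summarized in the abstract can be illustrated with a minimal sketch. This is not DAT, whose definition is not given on this page; it assumes a generic least-confidence uncertainty score as a stand-in selection metric, and all function names (`least_confidence_scores`, `select_for_labeling`, `build_retraining_set`) are hypothetical. It shows only the two ideas the abstract states: pick a small budget of unlabeled test inputs to label, then retrain on the original training data together with the newly labeled selection.

```python
from typing import Dict, List, Sequence, Tuple


def least_confidence_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Uncertainty per sample: 1 minus the highest predicted class probability.

    Assumption: a generic stand-in metric, not the DAT metric from the paper.
    """
    return [1.0 - max(p) for p in probs]


def select_for_labeling(probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` most uncertain unlabeled samples."""
    scores = least_confidence_scores(probs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


def build_retraining_set(
    train_set: Sequence[Tuple[str, int]],
    candidates: Sequence[str],
    selected: Sequence[int],
    new_labels: Dict[int, int],
) -> List[Tuple[str, int]]:
    """Combine the original training data with the newly labeled selection.

    The abstract reports that retraining on both sets outperforms
    retraining on the selected data alone.
    """
    newly_labeled = [(candidates[i], new_labels[i]) for i in selected]
    return list(train_set) + newly_labeled


# Hypothetical model outputs (softmax probabilities) on three unlabeled inputs.
probs = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40]]
selected = select_for_labeling(probs, budget=2)

# Labels that a human annotator would supply for the selected inputs only.
new_labels = {1: 1, 2: 0}
retraining_set = build_retraining_set([("a", 0)], ["x", "y", "z"], selected, new_labels)
```

Only the selected inputs need manual labels, which is where the labeling cost is saved; the retraining step itself is left out because it depends on the model and framework in use.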

Keywords

Deep learning testing, test selection, data distribution

Discipline

Artificial Intelligence and Robotics | Software Engineering

Research Areas

Information Systems and Management

Publication

ACM Transactions on Software Engineering and Methodology

Volume

31

Issue

4

First Page

78:1

Last Page

78:30

ISSN

1049-331X

Identifier

10.1145/3511598

Publisher

Association for Computing Machinery (ACM)

Citation

HU, Qiang; GUO, Yuejun; CORDY, Maxime; XIE, Xiaofei; MA, Lei; PAPADAKIS, Mike; and LE TRAON, Yves. An empirical study on data distribution-aware test selection for deep learning enhancement. (2022). ACM Transactions on Software Engineering and Methodology. 31, (4), 78:1-78:30.
Available at: https://ink.library.smu.edu.sg/sis_research/7195

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons, Software Engineering Commons
