Research Collection School Of Computing and Information Systems

Enhanced sample selection with confidence tracking: Identifying correctly labeled yet hard-to-learn samples in noisy data

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

3-2025

Abstract

We propose a novel sample selection method for image classification in the presence of noisy labels. Existing methods typically consider small-loss samples as correctly labeled. However, some correctly labeled samples are inherently difficult for the model to learn and can exhibit high loss similar to mislabeled samples in the early stages of training. Consequently, setting a threshold on per-sample loss to select correct labels results in a trade-off between precision and recall in sample selection: a lower threshold may miss many correctly labeled hard-to-learn samples (low recall), while a higher threshold may include many mislabeled samples (low precision). To address this issue, our goal is to accurately distinguish correctly labeled yet hard-to-learn samples from mislabeled ones, thus alleviating the trade-off dilemma. We achieve this by considering the trends in model prediction confidence rather than relying solely on loss values. Empirical observations show that only for correctly labeled samples does the model's prediction confidence for the annotated label typically increase faster than for any other class. Based on this insight, we propose tracking the confidence gaps between the annotated labels and other classes during training and evaluating their trends using the Mann-Kendall Test. A sample is considered potentially correctly labeled if all its confidence gaps tend to increase. Our method functions as a plug-and-play component that can be seamlessly integrated into existing sample selection techniques. Experiments on several standard benchmarks and real-world datasets demonstrate that our method enhances the performance of existing methods for learning with noisy labels.
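The selection rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the model's softmax probabilities for each training sample are recorded once per epoch, defines the confidence gap to class k as p(annotated label) minus p(k), and applies a one-sided Mann-Kendall test (normal approximation, no tie correction) at a hypothetical significance level alpha = 0.05. The function names `mann_kendall_z`, `has_increasing_trend`, `confidence_gaps`, and `likely_clean`, and the toy probability histories, are hypothetical.

```python
import math
from typing import List, Sequence


def mann_kendall_z(series: Sequence[float]) -> float:
    """Mann-Kendall Z statistic (normal approximation, no tie correction)."""
    n = len(series)
    # S = (number of later-minus-earlier increases) - (number of decreases).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected standardization; S == 0 (including n < 2) gives Z = 0.
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def has_increasing_trend(series: Sequence[float], alpha: float = 0.05) -> bool:
    """One-sided test: reject 'no trend' in favour of 'increasing trend'."""
    z = mann_kendall_z(series)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail standard normal probability
    return p_value < alpha


def confidence_gaps(history: Sequence[Sequence[float]], label: int) -> List[List[float]]:
    """Per-epoch gaps p[label] - p[k], one series for every class k != label (assumed definition)."""
    num_classes = len(history[0])
    return [
        [probs[label] - probs[k] for probs in history]
        for k in range(num_classes)
        if k != label
    ]


def likely_clean(history: Sequence[Sequence[float]], label: int, alpha: float = 0.05) -> bool:
    """Treat the sample as potentially correctly labeled only if every gap tends to increase."""
    return all(has_increasing_trend(g, alpha) for g in confidence_gaps(history, label))


# Toy per-epoch softmax outputs over 3 classes; the annotated label is 0 in both cases.
# Hard but correctly labeled: starts with low confidence in its label (high early loss).
history_clean = [
    [0.20, 0.50, 0.30], [0.25, 0.47, 0.28], [0.30, 0.44, 0.26], [0.35, 0.40, 0.25],
    [0.40, 0.36, 0.24], [0.45, 0.33, 0.22], [0.50, 0.30, 0.20], [0.55, 0.27, 0.18],
]
# Mislabeled: confidence in class 1 rises faster than in the annotated label.
history_noisy = [
    [0.30, 0.40, 0.30], [0.31, 0.43, 0.26], [0.32, 0.46, 0.22], [0.32, 0.50, 0.18],
    [0.31, 0.54, 0.15], [0.30, 0.58, 0.12], [0.29, 0.62, 0.09], [0.28, 0.66, 0.06],
]

print(likely_clean(history_clean, label=0))  # True
print(likely_clean(history_noisy, label=0))  # False
```

In the second toy history the gap to class 2 still increases, but the gap to class 1 shrinks, so the "all gaps increase" rule rejects the sample. The first sample begins with only 0.20 confidence in its label, so a loss threshold applied early would likely discard it, yet every one of its gaps trends upward and it is kept.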

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

AAAI'25/IAAI'25/EAAI'25: Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence and Fifteenth Symposium on Educational Advances in Artificial Intelligence, Philadelphia, Pennsylvania, February 25 - March 4

Volume

39

First Page

19795

Last Page

19803

Identifier

10.1609/aaai.v39i19.34180

Publisher

ACM

City or Country

New York

Citation

PAN, Weiran; WEI, Wei; ZHU, Feida; and DENG, Yong. Enhanced sample selection with confidence tracking: Identifying correctly labeled yet hard-to-learn samples in noisy data. (2025). AAAI'25/IAAI'25/EAAI'25: Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence and Fifteenth Symposium on Educational Advances in Artificial Intelligence, Philadelphia, Pennsylvania, February 25 - March 4. 39, 19795-19803.
Available at: https://ink.library.smu.edu.sg/sis_research/10963

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1609/aaai.v39i19.34180

Included in

Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons