Research Collection School Of Computing and Information Systems

Real: A representative error-driven approach for active learning

Cheng CHEN
Yong WANG, Singapore Management University
Lizi LIAO, Singapore Management University
Yueguo CHEN
Xiaoyong DU

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

9-2023

Abstract

Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose Real, a novel approach to select data instances with Representative Errors for Active Learning. It identifies minority predictions as pseudo errors within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that Real consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. Our analysis also shows that Real selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.
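
The selection idea described in the abstract (minority predictions within a cluster treated as pseudo errors, with a per-cluster budget driven by estimated error density) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. The function name `select_pseudo_errors`, the proportional allocation rule, and the take-the-first-k choice within a cluster are all assumptions made for illustration, and cluster assignments and model predictions are assumed to be given.

```python
from collections import Counter, defaultdict


def select_pseudo_errors(cluster_ids, predictions, budget):
    """Pick up to `budget` indices, favouring clusters dense in pseudo errors."""
    # Group instance indices by their (precomputed) cluster assignment.
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)

    pseudo_errors = {}  # cluster id -> indices whose prediction is in the minority
    density = {}        # cluster id -> fraction of the cluster that is pseudo error
    for cid, members in clusters.items():
        majority_label, _ = Counter(predictions[i] for i in members).most_common(1)[0]
        errs = [i for i in members if predictions[i] != majority_label]
        pseudo_errors[cid] = errs
        density[cid] = len(errs) / len(members)

    total = sum(density.values())
    if total == 0:
        return []  # every cluster is unanimous, so there is nothing to select

    # Assumed allocation rule: budget share proportional to error density,
    # capped by the number of pseudo errors the cluster actually contains.
    quota = {cid: min(int(budget * d / total), len(pseudo_errors[cid]))
             for cid, d in density.items()}

    # Hand out any remaining budget, densest clusters first.
    leftover = budget - sum(quota.values())
    for cid in sorted(density, key=density.get, reverse=True):
        extra = min(len(pseudo_errors[cid]) - quota[cid], leftover)
        quota[cid] += extra
        leftover -= extra

    # Assumed within-cluster choice: take the first `quota` pseudo errors.
    selected = []
    for cid, errs in pseudo_errors.items():
        selected.extend(errs[:quota[cid]])
    return sorted(selected)


# Toy usage: two clusters, the second with a higher share of minority predictions.
toy_clusters = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
toy_predictions = [0, 0, 0, 1, 1, 1, 1, 0, 0, 2]
picked = select_pseudo_errors(toy_clusters, toy_predictions, budget=3)
```

In the toy data the second cluster has the higher pseudo-error density, so it receives the larger share of the budget. A real pipeline would obtain the clusters from embeddings of the unlabeled pool and the predictions from the current model.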

Keywords

Active Learning, Error density, Error-driven, Informativeness, Labelings, Model training, Neighbourhood, Pseudo errors, Text classification

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Turin, Italy, 2023 September 18-22

First Page

20

Last Page

37

ISBN

9783031434112

Identifier

10.1007/978-3-031-43412-9_2

Publisher

Springer

City or Country

New York

Citation

CHEN, Cheng; WANG, Yong; LIAO, Lizi; CHEN, Yueguo; and DU, Xiaoyong. Real: A representative error-driven approach for active learning. (2023). Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Turin, Italy, 2023 September 18-22. 20-37.
Available at: https://ink.library.smu.edu.sg/sis_research/8586

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1007/978-3-031-43412-9_2

Included in

Databases and Information Systems Commons
