Research Collection School Of Computing and Information Systems

Reliable-Data-Split (RDS): Maximizing model potential with reinforced selection strategy

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2025

Abstract

The nexus between data characteristics and parametric models is fundamental for developing effective and reliable artificial intelligence (AI) systems. Mismatches in data properties during model development may have deleterious effects on AI model performance in machine learning practice. This paper proposes a Reliable Data Split (RDS) procedure that learns how to select data points that will generalise adequately to the target domain by employing prior knowledge of the data generative process. We introduce a reinforced selection strategy that uses deep reinforcement learning with diverse black-box predictors to maximise ensemble rewards as a proxy for model performance potential, while maintaining an appropriate proportionate allocation and the independent and identically distributed (i.i.d.) assumption. A comprehensive evaluation of the RDS procedure is conducted on four real-world datasets (Madelon, Drug Reviews, MNIST, and Kalapa Credit Scoring Challenge), covering machine learning tasks such as binary classification, multi-class classification, and regression on multivariate, textual, and visual data. The experimental results demonstrate consistent performance improvements of trainable data samples over classical or prior data selection. Hence, we advocate the use of RDS for data splitting in the early stage of machine learning tasks for parameter tuning, model selection, and overfitting prevention, as well as for sampling in large-scale AI competitions to search for the best possible and shift-stable solutions.

Keywords

Artificial intelligence systems; Characteristic model; Data characteristics; Data properties; Learning tasks; Machine-learning; Model development; Model potential; Modeling performance; Parametric models

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems

Publication

Proceedings of Machine Learning Research: Reliable and Trustworthy Artificial Intelligence Workshop at 17th Asian Conference on Machine Learning, ACML 2025, Taipei, December 12

Volume

310

First Page

73

Last Page

89

Publisher

ML Research Press

City or Country

Taipei

Citation

Nguyen, Hoang D.; Vu, Xuan-Son; TRUONG, Quoc Tuan; and Le, Duc-Trong. Reliable-Data-Split (RDS): Maximizing model potential with reinforced selection strategy. (2025). Proceedings of Machine Learning Research: Reliable and Trustworthy Artificial Intelligence Workshop at 17th Asian Conference on Machine Learning, ACML 2025, Taipei, December 12. 310, 73-89.
Available at: https://ink.library.smu.edu.sg/sis_research/11029

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://proceedings.mlr.press/v310/nguyen25c.html

Included in

Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons
