Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
7-2020
Abstract
An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Existing methods such as feature selection remove features, and instance selection eliminates instances. Value selection instead eliminates values (with respect to each feature) in the dataset, with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on an information-theoretic metric are proposed: PVS and P + VS. Extensive experiments on benchmark datasets of various sizes are presented. The results are compared with those of existing preprocessing methods such as feature selection, feature transformation, and instance selection. The experimental results show that value selection can achieve a balance between accuracy and model size reduction.
Keywords
preprocessing, data mining, value selection, model size reduction, entropy, information theory
Discipline
Databases and Information Systems | Theory and Algorithms
Research Areas
Data Science and Engineering
Publication
Proceedings of the 21st IEEE International Conference on Mobile Data Management
Identifier
10.1109/MDM48529.2020.00037
Publisher
IEEE
City or Country
Versailles, France
Citation
NJOO, Gunarto Sindoro; ZHENG, Baihua; HSU, Kuo-Wei; and PENG, Wen-Chih.
Probabilistic Value Selection for Space Efficient Model. (2020). Proceedings of the 21st IEEE International Conference on Mobile Data Management.
Available at: https://ink.library.smu.edu.sg/sis_research/5264
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/MDM48529.2020.00037