A Model Driven Approach to Imbalanced Data Sampling in Medical Decision Making
Publication Type
Conference Proceeding Article
Publication Date
12-2010
Abstract
Classification is an important medical decision support function that can be seriously affected by disproportionate class distribution in the training data. In medical decision making, the rate of misclassification and the cost of misclassifying a minority (positive) class as a majority (negative) class are especially high. In this paper, we propose a new model-driven sampling approach to balancing data samples. Most existing data sampling methods produce new data points based on local, deterministic information. Our approach extends the idea of generative sampling to produce new data points based on an induced probabilistic graphical model. We present the motivation and the design of the proposed algorithm, and compare it with two representative imbalanced data sampling approaches on four medical data sets that vary in size, imbalance ratio, and dimension. The empirical study helped identify the challenges of imbalanced data problems in medicine, and highlighted the strengths and limitations of the relevant sampling approaches. The performance of the model-driven approach is shown to be comparable to that of existing approaches; further improvements could be achieved by incorporating domain knowledge. © 2010 IMIA and SAHIA. All rights reserved.
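The abstract contrasts local sampling methods with drawing new minority records from an induced probabilistic graphical model. As an illustration only, the sketch below assumes categorical features and a fixed chain-structured Bayesian network (X0 → X1 → ... → Xd-1) whose conditional probability tables are estimated from the minority class with Laplace smoothing. It is not the algorithm from the paper: structure learning is omitted, and all function names and parameters (`fit_chain_bn`, `sample_chain_bn`, `oversample_minority`, `alpha`) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_chain_bn(records, alpha=1.0):
    """Fit a chain-structured discrete Bayesian network X0 -> X1 -> ... to
    categorical records (tuples), using Laplace smoothing with pseudo-count alpha.
    The chain structure is an assumption made for this sketch."""
    d = len(records[0])
    # Value domain of each variable, in first-seen order (no ordering required).
    domains = [list(dict.fromkeys(r[j] for r in records)) for j in range(d)]
    root_counts = Counter(r[0] for r in records)
    root = {v: (root_counts[v] + alpha) / (len(records) + alpha * len(domains[0]))
            for v in domains[0]}
    # cpts[j - 1][parent_value][value] = P(X_j = value | X_{j-1} = parent_value)
    cpts = []
    for j in range(1, d):
        pair_counts = defaultdict(Counter)
        for r in records:
            pair_counts[r[j - 1]][r[j]] += 1
        table = {}
        for pv in domains[j - 1]:
            total = sum(pair_counts[pv].values())
            table[pv] = {v: (pair_counts[pv][v] + alpha) / (total + alpha * len(domains[j]))
                         for v in domains[j]}
        cpts.append(table)
    return domains, root, cpts


def draw(dist, rng):
    """Draw one value from a {value: probability} dict."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


def sample_chain_bn(model, n, rng):
    """Ancestral sampling: draw X0 from its marginal, then each X_j given X_{j-1}."""
    _domains, root, cpts = model
    samples = []
    for _ in range(n):
        row = [draw(root, rng)]
        for table in cpts:
            row.append(draw(table[row[-1]], rng))
        samples.append(tuple(row))
    return samples


def oversample_minority(X, y, minority_label, rng, alpha=1.0):
    """Binary case: append synthetic minority records, sampled from a model
    fitted to the minority class, until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    if not minority:
        raise ValueError("no minority records to model")
    n_new = len(X) - 2 * len(minority)  # majority count minus minority count
    if n_new <= 0:
        return list(X), list(y)
    model = fit_chain_bn(minority, alpha)
    synthetic = sample_chain_bn(model, n_new, rng)
    return list(X) + synthetic, list(y) + [minority_label] * n_new
```

Unlike interpolation between neighbouring minority points, every synthetic record here is drawn from a joint distribution fitted to the whole minority class, so it can combine attribute values that never co-occurred in a single training record while still respecting the fitted dependencies.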
Keywords
Imbalanced data learning, Model-driven sampling, Random sampling, Synthetic Minority Over-sampling Technique (SMOTE)
Discipline
Databases and Information Systems | Health Information Technology
Publication
13th World Congress on Medical and Health Informatics, Medinfo 2010
Volume
160
First Page
856
Last Page
860
ISBN
9781607505877
Identifier
10.3233/978-1-60750-588-4-856
City or Country
Cape Town, South Africa
Citation
YIN, H. and LEONG, Tze-Yun. A Model Driven Approach to Imbalanced Data Sampling in Medical Decision Making. (2010). 13th World Congress on Medical and Health Informatics, Medinfo 2010. 160, 856-860.
Available at: https://ink.library.smu.edu.sg/sis_research/2988