Publication Type
Journal Article
Version
publishedVersion
Publication Date
3-2014
Abstract
Feature selection is an important technique for data mining. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of online feature selection (OFS), in which an online learner is only allowed to maintain a classifier involving a small and fixed number of features. The key challenge of online feature selection is how to make accurate predictions for an instance using a small number of active features. This is in contrast to the classical setup of online learning, where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: 1) learning with full input, where a learner is allowed to access all the features to decide the subset of active features, and 2) learning with partial input, where the learner is allowed to access only a limited number of features for each instance. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public data sets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.
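As a rough illustration of the "learning with full input" setting described in the abstract, the following minimal sketch combines a perceptron-style online update with truncation to a fixed number of active features. It is not the authors' algorithm: the function names (truncate, online_feature_selection), the learning rate eta, the budget b, and the synthetic data stream are all assumptions made for this example, and the sparsity regularization and partial-input variants mentioned in the abstract are omitted.

```python
import random


def truncate(w, b):
    """Keep the b largest-magnitude weights of w and zero out the rest."""
    if b >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:b])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


def online_feature_selection(stream, dim, b, eta=0.2):
    """Perceptron-style learner that never holds more than b nonzero weights.

    stream yields (x, y) pairs with x a list of floats and y in {-1, +1}.
    """
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:  # mistake or zero margin: update, then truncate
            mistakes += 1
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            w = truncate(w, b)
    return w, mistakes


# Hypothetical toy stream: only features 0 and 1 determine the label.
rng = random.Random(0)


def make_example(dim=20):
    x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    y = 1 if x[0] + x[1] > 0 else -1
    return x, y


stream = [make_example() for _ in range(500)]
w, mistakes = online_feature_selection(stream, dim=20, b=3)
active = [i for i, wi in enumerate(w) if wi != 0.0]
print("active features:", active, "mistakes:", mistakes)
```

In this sketch the learner sees every feature of each instance but keeps a classifier over at most b of them, which corresponds to the first of the two tasks listed in the abstract.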
Keywords
Feature selection, online learning, large-scale data mining, classification, big data analytics
Discipline
Computer Sciences | Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
IEEE Transactions on Knowledge and Data Engineering
Volume
26
Issue
3
First Page
698
Last Page
710
ISSN
1041-4347
Identifier
10.1109/TKDE.2013.32
Publisher
IEEE
Citation
WANG, Jialei; ZHAO, Peilin; HOI, Steven C. H.; and JIN, Rong.
Online feature selection and its applications. (2014). IEEE Transactions on Knowledge and Data Engineering. 26, (3), 698-710.
Available at: https://ink.library.smu.edu.sg/sis_research/2277
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/TKDE.2013.32
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons