Publication Type
Journal Article
Version
acceptedVersion
Publication Date
8-2020
Abstract
The amount of data in our society has been exploding in the era of big data. This article aims to address several open challenges in big data stream classification. Many existing studies in the data mining literature follow the batch learning setting, which suffers from low efficiency and poor scalability. To tackle these challenges, we investigate a unified online learning framework for the big data stream classification task. In contrast to existing online data stream classification techniques, we propose a unified Sparse Online Classification (SOC) framework. Based on SOC, we derive a second-order online learning algorithm and a cost-sensitive sparse online learning algorithm, which can handle online anomaly detection tasks with extremely imbalanced class distributions. To evaluate performance, we analyze the theoretical bounds of the proposed algorithms and conduct an extensive set of experiments. The encouraging experimental results demonstrate the efficacy of the proposed algorithms over state-of-the-art techniques on multiple data stream classification tasks.
Keywords
Online learning, sparse learning, classification, cost-sensitive learning
Discipline
Databases and Information Systems | Data Science | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
ACM Transactions on Knowledge Discovery from Data
Volume
14
Issue
5
First Page
1
Last Page
18
ISSN
1556-4681
Identifier
10.1145/3361559
Publisher
ACM
Embargo Period
5-23-2021
Citation
ZHAO, Peilin; WANG, Dayong; WU, Pengcheng; and HOI, Steven C. H.
A unified framework for sparse online learning. (2020). ACM Transactions on Knowledge Discovery from Data. 14, (5), 1-18.
Available at: https://ink.library.smu.edu.sg/sis_research/5957
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3361559
Included in
Databases and Information Systems Commons, Data Science Commons, Numerical Analysis and Scientific Computing Commons