Publication Type

Journal Article

Version

acceptedVersion

Publication Date

8-2020

Abstract

The amount of data in our society has been exploding in the era of big data. This article addresses several open challenges in big data stream classification. Many existing studies in the data mining literature follow the batch learning setting, which suffers from low efficiency and poor scalability. To tackle these challenges, we investigate a unified online learning framework for the big data stream classification task. In contrast to existing online data stream classification techniques, we propose a unified Sparse Online Classification (SOC) framework. Based on SOC, we derive a second-order online learning algorithm and a cost-sensitive sparse online learning algorithm, which can handle online anomaly detection tasks with extremely unbalanced class distributions. To evaluate performance, we analyze the theoretical bounds of the proposed algorithms and conduct an extensive set of experiments. The encouraging experimental results demonstrate the efficacy of the proposed algorithms over state-of-the-art techniques on multiple data stream classification tasks.

Keywords

Online learning, sparse learning, classification, cost-sensitive learning

Discipline

Databases and Information Systems | Data Science | Numerical Analysis and Scientific Computing

Research Areas

Data Science and Engineering

Publication

ACM Transactions on Knowledge Discovery from Data

Volume

14

Issue

5

First Page

1

Last Page

18

ISSN

1556-4681

Identifier

10.1145/3361559

Publisher

ACM

Embargo Period

5-23-2021

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1145/3361559
