Publication Type
Journal Article
Version
publishedVersion
Publication Date
11-2017
Abstract
The goal of online active learning is to learn predictive models from a sequence of unlabeled data given a limited label query budget. Unlike conventional online learning tasks, online active learning is considerably more challenging for two reasons. Firstly, it is difficult to design an effective query strategy that decides when it is appropriate to query the label of an incoming instance given a limited query budget. Secondly, it is also challenging to decide how to update the predictive models effectively whenever the true label of an instance is queried. Most existing approaches to online active learning are based on a family of first-order online learning algorithms, which are simple and efficient but fall short because of slow convergence and sub-optimal use of the labeled training data. To solve these issues, this paper presents a novel framework of Second-order Online Active Learning (SOAL) that fully exploits both first-order and second-order information. The proposed algorithms are able to learn effectively online, maximizing predictive accuracy while minimizing labeling cost. To make SOAL more practical for real-world applications, especially for class-imbalanced online classification tasks (e.g., malicious web detection), we extend the SOAL framework by proposing the Cost-sensitive Second-order Online Active Learning algorithm named “SOALCS”, which is devised by maximizing the sum of weighted sensitivity and specificity or minimizing the cost of weighted mistakes of different classes. We conducted both theoretical analysis and empirical studies, including an extensive set of experiments on a variety of large-scale real-world datasets, in which the promising empirical results validate the efficacy and scalability of the proposed algorithms for large-scale online learning tasks.
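The two decisions named in the abstract (when to query a label, and how to update the model once a label arrives) can be illustrated with a small sketch. This is not the SOAL algorithm from the paper, whose details are not given in this record. It assumes a randomized margin-based query rule and a diagonal AROW-style second-order update, and every name in it (`DiagonalSecondOrderActiveLearner`, `budget_param`, `r`, `weighted_sum`, `eta_p`) is hypothetical. The `weighted_sum` helper shows one plausible reading of "the sum of weighted sensitivity and specificity", assuming the two weights sum to 1.

```python
import random


class DiagonalSecondOrderActiveLearner:
    """Illustrative sketch only; NOT the SOAL algorithm from the paper."""

    def __init__(self, dim, budget_param=1.0, r=1.0, seed=0):
        self.w = [0.0] * dim       # mean weights (first-order information)
        self.sigma = [1.0] * dim   # diagonal covariance (second-order information)
        self.b = budget_param      # assumed > 0; larger values query more often
        self.r = r                 # assumed regularization constant, > 0
        self.rng = random.Random(seed)
        self.queries = 0

    def margin(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def should_query(self, x):
        # Assumed rule: small-margin (uncertain) instances are queried more often.
        p = self.b / (self.b + abs(self.margin(x)))
        return self.rng.random() < p

    def update(self, x, y):
        # y is the queried label in {-1, +1}.
        loss = max(0.0, 1.0 - y * self.margin(x))
        if loss == 0.0:
            return
        v = sum(s * xi * xi for s, xi in zip(self.sigma, x))  # x^T Sigma x
        beta = 1.0 / (v + self.r)
        alpha = loss * beta
        for i, xi in enumerate(x):
            self.w[i] += alpha * y * self.sigma[i] * xi         # uses the old sigma[i]
            self.sigma[i] -= beta * (self.sigma[i] * xi) ** 2   # shrink uncertainty

    def process(self, x, oracle):
        # Predict first, then spend a label query only if the rule fires.
        pred = 1 if self.margin(x) >= 0 else -1
        if self.should_query(x):
            self.queries += 1
            self.update(x, oracle(x))
        return pred


def weighted_sum(y_true, y_pred, eta_p=0.5):
    """eta_p * sensitivity + (1 - eta_p) * specificity, labels in {-1, +1}."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return eta_p * sensitivity + (1.0 - eta_p) * specificity
```

In this sketch, raising `eta_p` above 0.5 weights mistakes on the positive (typically rare) class more heavily, which is the kind of trade-off a cost-sensitive variant is meant to control on class-imbalanced streams.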
Keywords
Algorithm design and analysis, Labeling, Active Learning, Online Learning, Prediction algorithms, Machine learning algorithms, Malicious websites detection, Predictive models, Training
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing | Theory and Algorithms
Research Areas
Data Science and Engineering
Publication
IEEE Transactions on Knowledge and Data Engineering
Volume
30
Issue
7
First Page
1338
Last Page
1351
ISSN
1041-4347
Identifier
10.1109/TKDE.2017.2778097
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
HAO, Shuji; LU, Jing; ZHAO, Peilin; ZHANG, Chi; HOI, Steven C. H.; and MIAO, Chunyan.
Second-order online active learning and its applications. (2017). IEEE Transactions on Knowledge and Data Engineering. 30, (7), 1338-1351.
Available at: https://ink.library.smu.edu.sg/sis_research/4132
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/TKDE.2017.2778097
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons, Theory and Algorithms Commons