Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
4-2018
Abstract
Representative subset selection (RSS) is an important tool for users to draw insights from massive datasets. A common approach is to model RSS as the submodular maximization problem because the utility of extracted representatives often satisfies the "diminishing returns" property. To capture the data recency issue and support different types of constraints in real-world problems, we formulate RSS as maximizing a submodular function subject to a d-knapsack constraint (SMDK) over sliding windows. Then, we propose a novel KnapWindow framework for SMDK. Theoretically, KnapWindow is 1-ε/1+d - approximate for SMDK and achieves sublinear complexity. Finally, we evaluate the efficiency and effectiveness of KnapWindow on real-world datasets. The results show that it achieves up to 120x speedups over the batch baseline with at least 94% utility assurance.
Keywords
Data summarization, submodular maximization, data stream, sliding window, approximation algorithm
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing | Theory and Algorithms
Research Areas
Data Science and Engineering
Publication
2018 IEEE 34th International Conference on Data Engineering (ICDE): Paris, April 16-19: Proceedings
First Page
1268
Last Page
1271
ISBN
9781538655207
Identifier
10.1109/ICDE.2018.00127
Publisher
IEEE Computer Society
City or Country
Los Alamitos, CA
Citation
WANG, Yanhao; LI, Yuchen; and TAN, Kian-Lee.
A sliding-window framework for representative subset selection. (2018). 2018 IEEE 34th International Conference on Data Engineering (ICDE): Paris, April 16-19: Proceedings. 1268-1271.
Available at: https://ink.library.smu.edu.sg/sis_research/7123
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/ICDE.2018.00127
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons, Theory and Algorithms Commons