Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
2-2023
Abstract
The k-center clustering algorithm, introduced over 35 years ago, is known to be robust to class imbalance prevalent in many clustering problems and has various applications such as data summarization, document clustering, and facility location determination. Unfortunately, existing k-center algorithms provide highly suboptimal solutions that can limit their practical application, reproducibility, and clustering quality. In this paper, we provide a novel scalable and globally optimal solution to a popular variant of the k-center problem known as generalized L1 k-center clustering that uses L1 distance and allows the selection of arbitrary vectors as cluster centers. We show that this clustering objective can be reduced to a mixed-integer linear program (MILP) that facilitates globally optimal clustering solutions. However, solving such a MILP may be intractable for large datasets; to remedy this, we present a scalable algorithm that leverages constraint generation to efficiently and provably converge to its global optimum. We further enhance outlier handling through a simple but elegant extension to our MILP objective. We first evaluate our algorithm on a variety of synthetic datasets to better understand its properties and then validate on 20 real benchmark datasets where we compare its performance to both traditional L1 distance k-center and k-medians baselines. Our results demonstrate significant suboptimality of existing algorithms in comparison to our approach and further demonstrate that we can find optimal generalized L1 k-center clustering solutions up to an unprecedented 1,000,000 data points.
Keywords
Class imbalance, Clustering problems, Clustering solutions, Clusterings, Constraints generation, Integer Linear Programming, K-center, Mixed integer linear, Mixed integer linear program, Mixed integer-linear programmes
Discipline
Databases and Information Systems | Theory and Algorithms
Research Areas
Data Science and Engineering; Intelligent Systems and Optimization
Publication
Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023 February 7-14
First Page
7015
Last Page
7023
ISBN
9781577358800
Identifier
10.1609/aaai.v37i6.25857
Publisher
AAAI
City or Country
Washington
Citation
CHEMBU, Aravinth; SANNER, Scott; KHURRAM, Hassan; and KUMAR, Akshat.
Scalable and globally optimal generalized L1 K-center clustering via constraint generation in mixed integer linear programming. (2023). Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023 February 7-14. 7015-7023.
Available at: https://ink.library.smu.edu.sg/sis_research/8518
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1609/aaai.v37i6.25857