Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2018
Abstract
Many high-dimensional vector distances tend to a constant. This is typically considered a negative “contrast-loss” phenomenon that hinders clustering and other machine learning techniques. We reinterpret “contrast-loss” as a blessing. Re-deriving “contrast-loss” using the law of large numbers, we show it results in a distribution’s instances concentrating on a thin “hyper-shell”. The hollow center means apparently chaotically overlapping distributions are actually intrinsically separable. We use this to develop distribution-clustering, an elegant algorithm for grouping data points by their (unknown) underlying distribution. Distribution-clustering creates notably clean clusters from raw unlabeled data, estimates the number of clusters for itself and is inherently robust to “outliers”, which form their own clusters. This enables trawling for patterns in unorganized data and may be the key to enabling machine intelligence.
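The “hyper-shell” concentration described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's distribution-clustering algorithm. It assumes an isotropic standard Gaussian as the underlying distribution, and the helper name `shell_statistics` is hypothetical. It measures each sample's distance from the distribution mean (the origin), divided by the square root of the dimension, and reports how the spread of those distances shrinks as the dimension grows.

```python
import math
import random


def shell_statistics(dim, n_points=200, seed=0):
    """Return (mean, std) of sample distances from the origin, scaled by sqrt(dim).

    Assumption: samples come from an isotropic standard Gaussian, chosen only
    for illustration; the abstract does not specify a distribution.
    """
    rng = random.Random(seed)
    radii = []
    for _ in range(n_points):
        # Squared distance is a sum of `dim` independent terms, so the law of
        # large numbers drives (squared distance / dim) toward a constant.
        squared = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        radii.append(math.sqrt(squared / dim))
    mean = sum(radii) / n_points
    variance = sum((r - mean) ** 2 for r in radii) / n_points
    return mean, math.sqrt(variance)


for dim in (2, 20, 200, 2000):
    mean, std = shell_statistics(dim)
    print(f"dim={dim:5d}  normalized radius mean={mean:.3f}  spread={std:.3f}")
```

Under this assumption the normalized radius approaches 1 while its spread shrinks with increasing dimension, so samples sit on a thin shell around a largely empty center.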
Discipline
Computer and Systems Architecture | Graphics and Human Computer Interfaces
Research Areas
Data Science and Engineering
Publication
Proceedings of the 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, USA, June 18-23
First Page
5784
Last Page
5793
ISBN
9781538664209
Identifier
10.1109/CVPR.2018.00606
Publisher
IEEE Computer Society
City or Country
Salt Lake City
Citation
LIN, Wen-yan; LAI, Jian-Huang; LIU, Siying; and MATSUSHITA, Yasuyuki.
Dimensionality's blessing: Clustering images by underlying distribution. (2018). Proceedings of the 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, USA, June 18-23. 5784-5793.
Available at: https://ink.library.smu.edu.sg/sis_research/4810
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/CVPR.2018.00606
Included in
Computer and Systems Architecture Commons, Graphics and Human Computer Interfaces Commons