Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2018

Abstract

Many high dimensional vector distances tend to a constant. This is typically considered a negative “contrast-loss” phenomenon that hinders clustering and other machine learning techniques. We reinterpret “contrast-loss” as a blessing. Re-deriving “contrast-loss” using the law of large numbers, we show it results in a distribution’s instances concentrating on a thin “hyper-shell”. The hollow center means apparently chaotically overlapping distributions are actually intrinsically separable. We use this to develop distribution-clustering, an elegant algorithm for grouping data points by their (unknown) underlying distribution. Distribution-clustering creates notably clean clusters from raw unlabeled data, estimates the number of clusters for itself, and is inherently robust to “outliers”, which form their own clusters. This enables trawling for patterns in unorganized data and may be the key to enabling machine intelligence.
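
The "hyper-shell" concentration described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's distribution-clustering algorithm. It only assumes a standard normal distribution (an illustrative choice, not taken from the paper) and uses a hypothetical helper, `shell_stats`, to measure how tightly samples sit around a common distance from the distribution's mean as the dimension grows.

```python
import math
import random


def shell_stats(dim, n_points=200, seed=0):
    """Return (mean, std) of the distances from the origin for n_points
    samples of a standard normal distribution in `dim` dimensions.

    Hypothetical helper for illustration; not part of the paper's method.
    """
    rng = random.Random(seed)
    norms = []
    for _ in range(n_points):
        # Squared distance is a sum of `dim` independent terms, so the law
        # of large numbers makes it concentrate around `dim`.
        squared = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        norms.append(math.sqrt(squared))
    mean = sum(norms) / n_points
    variance = sum((r - mean) ** 2 for r in norms) / n_points
    return mean, math.sqrt(variance)


for dim in (2, 20, 2000):
    mean, std = shell_stats(dim)
    # Columns: dimension, mean radius relative to sqrt(dim), relative spread.
    print(dim, round(mean / math.sqrt(dim), 3), round(std / mean, 3))
```

Under these assumptions the relative spread of the radii shrinks as the dimension increases, so almost all samples lie near a shell of radius about sqrt(dim) and very few fall near the center.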

Discipline

Computer and Systems Architecture | Graphics and Human Computer Interfaces

Research Areas

Data Science and Engineering

Publication

Proceedings of the 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, USA, June 18-23

First Page

5784

Last Page

5793

ISBN

9781538664209

Identifier

10.1109/CVPR.2018.00606

Publisher

IEEE Computer Society

City or Country

Salt Lake City

Additional URL

https://doi.org/10.1109/CVPR.2018.00606
