K-means: A Complete Introduction

Alan Jeffares
Towards Data Science
9 min readNov 19, 2019

K-means is an unsupervised clustering algorithm designed to partition unlabelled data into a certain number (thats the “ K”) of distinct groupings. In other words, k-means finds observations that share important characteristics and classifies them together into clusters. A good clustering solution is one that finds clusters such that the observations within each cluster are more similar than the clusters themselves.

There are countless examples of where this automated grouping of data can be extremely useful. For example, consider the case of creating an online advertising campaign for a brand new range of products being released to the market. While we could display a single generic advertisement to the entire population, a far better approach would be to divide the population into clusters of people who hold shared characteristics and interests displaying customised advertisements to each group. K-means is an algorithm that finds these groupings in big datasets where it is not feasible to be done by hand.

The intuition behind the algorithm is actually pretty straight forward. To begin, we choose a value for k (the number of clusters) and randomly choose an initial centroid (centre coordinates) for each cluster. We then apply a two step process:

  1. Assignment step — Assign each observation to it’s nearest centre.

Create an account to read the full story.

Or, continue in mobile web

Already have an account? Sign in

Published in Towards Data Science

Your home for data science and AI. The world’s leading publication for data science, data analytics, data engineering, machine learning, and artificial intelligence professionals.

Written by Alan Jeffares

Machine Learning PhD student in University of Cambridge

No responses yet

What are your thoughts?