K-means: A Complete Introduction

Published in Towards Data Science

9 min read · Nov 19, 2019

K-means is an unsupervised clustering algorithm designed to partition unlabelled data into a certain number (that's the “K”) of distinct groupings. In other words, k-means finds observations that share important characteristics and groups them together into clusters. A good clustering solution is one in which the observations within each cluster are more similar to one another than they are to observations in other clusters.
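One common way to make "more similar" precise is the within-cluster sum of squares: add up each observation's squared Euclidean distance to the mean of its own cluster. Using this particular measure is an assumption of this sketch, not something the article specifies. The example compares two hypothetical groupings of the same six made-up points. The grouping that keeps nearby points together scores lower.

```python
def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centre = mean(members)
        total += sum(squared_distance(p, centre) for p in members)
    return total


# Hypothetical data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good = [0, 0, 0, 1, 1, 1]  # keeps nearby points together
bad = [0, 1, 0, 1, 0, 1]   # mixes the two groups

print(within_cluster_ss(points, good))  # small: clusters are tight
print(within_cluster_ss(points, bad))   # much larger: clusters are spread out
```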

There are countless examples of where this automated grouping of data can be extremely useful. For example, consider creating an online advertising campaign for a brand new range of products being released to the market. While we could display a single generic advertisement to the entire population, a far better approach would be to divide the population into clusters of people who share characteristics and interests, and then display a customised advertisement to each group. K-means is an algorithm that finds these groupings in large datasets where doing so by hand is not feasible.

The intuition behind the algorithm is actually pretty straightforward. To begin, we choose a value for k (the number of clusters) and randomly choose an initial centroid (centre coordinates) for each cluster. We then apply a two-step process:

Assignment step: assign each observation to its nearest centre.
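The loop described above can be sketched in plain Python. This sketch makes several assumptions:

- Distance is Euclidean.
- The initial centres are k distinct observations picked at random.
- The second step is the update step used in Lloyd's version of k-means: each centre moves to the mean of the observations assigned to it.
- The loop stops once no observation changes cluster, or after `max_iter` rounds.

The function names `assign`, `update` and `kmeans` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each observation."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step (assumed): move each centroid to the mean of its members."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # keep an empty cluster's centre where it was
    return new_centroids


def kmeans(points, k, max_iter=100, seed=None):
    """Alternate assignment and update steps until assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centres (assumed scheme)
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # converged: no observation changed cluster
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids


# Hypothetical data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(points, k=2, seed=1)
print(labels, centroids)
```

On these six points, the loop separates the group near (1, 1) from the group near (8, 8). The random start decides which group receives label 0. On less tidy data, different starting centres can lead to different final clusterings.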

Written by Alan Jeffares