An Introduction to K-Nearest Neighbors Algorithm

What is KNN?

Indhumathy Chelliah
Towards Data Science
8 min read · Nov 23, 2020


Photo by Asad Photo Maldives from Pexels

KNN

The K-Nearest Neighbors (KNN) algorithm is one of the simplest supervised machine learning algorithms, and it can be used to solve both classification and regression problems.
KNN is also known as an instance-based model or a lazy learner because it doesn’t construct an internal model.

For classification problems, it will find the k nearest neighbors and predict the class by the majority vote of the nearest neighbors.

For regression problems, it will find the k nearest neighbors and predict the value by calculating the mean value of the nearest neighbors.
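The two prediction rules above (majority vote for classification, mean for regression) can be sketched with scikit-learn. This is a minimal sketch, and the one-feature toy data is invented purely for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Toy training data (made up for illustration): one feature per sample.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])                  # class labels
y_value = np.array([1.2, 1.9, 3.1, 9.8, 11.2, 12.1])   # continuous targets

# Classification: majority vote among the k=3 nearest neighbors.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(clf.predict([[2.5]]))   # neighbors 1.0, 2.0, 3.0 -> class 0

# Regression: mean of the k=3 nearest neighbors' target values.
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_value)
print(reg.predict([[11.0]]))  # mean of 9.8, 11.2, 12.1 -> ~11.03
```

Both estimators share the same neighbor search; only the final aggregation step (vote vs. mean) differs.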

The main idea behind KNN is the old saying:

Birds of a feather flock together.

Topics covered in this story

  1. KNN Classification
  2. How to find the optimum k value?
  3. How to find the k nearest neighbors?
  4. Euclidean distance
  5. Why KNN is known as an instance-based method or Lazy learner
  6. Example dataset, dataset representation
  7. Calculating Euclidean distance
  8. Finding nearest neighbors
  9. Python Implementation of KNN Using sklearn
  10. How to find the best k value? Plot error vs k values.

KNN Classification

Let’s learn how to classify data using the KNN algorithm. Suppose we have two classes: circle and triangle.

Below is the representation of data points in the feature space.

Data points representation [Image by Author]

Now we have to predict the class of a new query point (the star shape shown in the figure): does it belong to the circle class or the triangle class?
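This two-class setup can be sketched in scikit-learn. The 2-D coordinates for the circles, triangles, and the star query point below are invented for illustration, not taken from the figure:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Invented 2-D feature-space coordinates for the two classes.
circles = [[1, 1], [2, 1], [1, 2], [2, 2]]      # class "circle"
triangles = [[6, 5], [7, 5], [6, 6], [7, 6]]    # class "triangle"

X = np.array(circles + triangles)
y = np.array(["circle"] * 4 + ["triangle"] * 4)

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3
knn.fit(X, y)

# The new query point (the "star" in the figure): its class is decided
# by a majority vote among its 3 nearest neighbors.
star = [[2.5, 2.0]]
print(knn.predict(star))     # -> ['circle']
print(knn.kneighbors(star))  # distances and indices of the 3 nearest neighbors
```

Since all three of the star's nearest neighbors are circles, the majority vote assigns it to the circle class.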
