
Nearest-neighbor missing visuals revealed

How to analyze and interpret KNN results with cutting-edge visuals

KNN Visuals (image by author)

The unsupervised K-Nearest Neighbor (KNN) algorithm is perhaps the most straightforward machine learning algorithm. However, a simple algorithm does not mean that its results are equally simple to analyze. As far as I can tell, there are few documented approaches to analyzing the results of the KNN algorithm. In this article, I will show you how to analyze and understand the results of the unsupervised KNN algorithm.

I will be using a dataset on cars, a sample of which is shown below. The data contains the make of each car, technical characteristics such as fuel type, length, width, and number of doors, and the price of the car. It has about 25 fields, of which about 15 are numeric.

Cars sample data (image by author).

The data is split into two parts: a training dataset and a scoring dataset. The training dataset is used to train the KNN model, which is then used to find the nearest neighbors of each record in the scoring dataset.
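The split-and-search step can be sketched in plain Python. This is a minimal sketch, assuming the numeric fields have already been pulled into lists of floats. The helper names, the 20% scoring fraction, `k`, and Euclidean distance on standardized features are my assumptions for illustration, not necessarily what the author's tool uses. In practice, scikit-learn's `NearestNeighbors` class does the same job with faster index structures.

```python
import math
import random


def train_score_split(rows, score_fraction=0.2, seed=42):
    """Shuffle rows and split them into (training, scoring) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_score = max(1, int(len(shuffled) * score_fraction))
    return shuffled[n_score:], shuffled[:n_score]


def standardize(train, score):
    """Scale each column with the training set's mean and standard deviation."""
    n_cols = len(train[0])
    means = [sum(row[j] for row in train) / len(train) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((row[j] - means[j]) ** 2 for row in train) / len(train)
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero on constant columns

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_cols)] for row in rows]

    return scale(train), scale(score)


def nearest_neighbors(train, score, k=5):
    """For each scoring row, return its k closest training rows as (index, distance) pairs."""
    results = []
    for query in score:
        distances = [(i, math.dist(query, row)) for i, row in enumerate(train)]
        distances.sort(key=lambda pair: pair[1])  # brute force: fine for a few hundred cars
        results.append(distances[:k])
    return results


if __name__ == "__main__":
    # Hypothetical toy rows: [length, width, price]; not taken from the UCI data.
    cars = [[170.0, 64.0, 13000.0], [172.0, 65.0, 16000.0], [177.0, 66.0, 14000.0],
            [177.0, 66.5, 17500.0], [193.0, 71.0, 41000.0], [167.0, 64.0, 5200.0],
            [158.0, 63.5, 6500.0], [175.0, 66.0, 9500.0], [186.0, 68.0, 22000.0],
            [190.0, 70.0, 32000.0]]
    train, score = train_score_split(cars)
    train_std, score_std = standardize(train, score)
    for s, neighbors in enumerate(nearest_neighbors(train_std, score_std, k=3)):
        print(f"scoring record {s}: {neighbors}")
```

Standardizing with the training statistics only keeps the scoring data from influencing the scale. It also prevents a large-valued field such as price from dominating the distance.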

Here are the visuals that help in understanding the KNN results.

Network diagram to visualize the nearest neighbors

One elegant way to visualize nearest neighbors is a network diagram. Each record in the scoring dataset is a central node, linked to its nearest neighbors.

network diagram (image by author)
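Under the hood, a network diagram like this is just a list of nodes and edges. The sketch below is minimal and assumes the KNN output is a dict that maps each scoring-record id to its (training-record id, distance) pairs. The function name, the `score-`/`train-` id prefixes, and the attribute names are all hypothetical. The resulting structures can be handed to a network-plotting library of your choice.

```python
def build_neighbor_network(neighbors_by_score):
    """Build node and edge lists for a KNN network diagram.

    neighbors_by_score maps a scoring-record id to a list of
    (training-record id, distance) pairs. Each scoring record becomes a
    central node with one edge to each of its nearest neighbors.
    """
    nodes = {}
    edges = []
    for score_id, neighbors in neighbors_by_score.items():
        center = f"score-{score_id}"
        nodes[center] = {"group": "score"}
        for train_id, distance in neighbors:
            leaf = f"train-{train_id}"
            # setdefault: a training record shared by several scoring records appears once
            nodes.setdefault(leaf, {"group": "train"})
            edges.append({"source": center, "target": leaf, "distance": distance})
    return nodes, edges


if __name__ == "__main__":
    # Hypothetical KNN output for two scoring records (ids and distances are made up).
    knn_output = {2: [(10, 0.4), (11, 0.5), (12, 0.7)], 6: [(12, 1.9), (20, 2.3), (21, 2.8)]}
    nodes, edges = build_neighbor_network(knn_output)
    print(len(nodes), "nodes,", len(edges), "edges")
```

The prefixes keep a scoring record and a training record with the same numeric id from collapsing into a single node.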

In addition, you can add a hover tooltip to see the details behind each node. This gives a good understanding of the nearest neighbors of a particular record in the scoring data.

hover tooltip (image by author)
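One way to prepare the tooltips is to store a short text summary on every node. This is a minimal sketch, assuming nodes are kept in a dict of attribute dicts and each node id can be looked up in a dict of car records. The field names loosely follow the sample data's columns and are hypothetical. Storing the text under a `title` key is also an assumption. vis.js-based network tools, for example, show a node's `title` on hover.

```python
def tooltip_text(record, fields=("make", "fuel_type", "length", "width", "num_doors", "price")):
    """Format the chosen fields of a record as multi-line hover text, skipping missing ones."""
    return "\n".join(f"{field}: {record[field]}" for field in fields if field in record)


def attach_tooltips(nodes, records):
    """Store hover text under each node's 'title' key.

    nodes maps node id -> attribute dict; records maps node id -> a dict of car fields.
    Nodes without a matching record are left unchanged.
    """
    for node_id, attrs in nodes.items():
        if node_id in records:
            attrs["title"] = tooltip_text(records[node_id])
    return nodes


if __name__ == "__main__":
    # Hypothetical node and record data.
    nodes = {"train-10": {"group": "train"}}
    records = {"train-10": {"make": "nissan", "fuel_type": "gas", "num_doors": "four"}}
    print(attach_tooltips(nodes, records)["train-10"]["title"])
```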

Using the power of graph analytics

Because network diagrams are graphs, you can also use graph analytics to analyze how the neighbors are connected to each other. This helps in finding communities of neighbors as well as isolated neighbors.

Graph analytics on nearest neighbor output (image by author)

Combining the nearest neighbor algorithm with graph analytics is a powerful way to understand the overall results.
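The article does not say which graph algorithms produce the figure above, so the sketch below swaps in two simple ones as assumptions. The first is connected components, found with union-find, as a stand-in for community detection. The second counts how many scoring records each neighbor is linked to, and treats a neighbor linked to only one scoring record as "isolated". Both the function name and that definition of "isolated" are hypothetical. Dedicated libraries such as NetworkX offer richer community-detection algorithms.

```python
from collections import defaultdict


def analyze_neighbor_graph(edges):
    """Find connected groups of records and neighbors linked to only one scoring record.

    edges: iterable of (scoring_node, training_node) pairs taken from the KNN output.
    Returns (components sorted largest first, sorted list of single-link neighbors).
    """
    parent = {}

    def find(node):
        # Union-find root lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    links_per_neighbor = defaultdict(int)
    for score_node, train_node in edges:
        parent[find(score_node)] = find(train_node)  # merge the two groups
        links_per_neighbor[train_node] += 1

    components = defaultdict(set)
    for node in parent:
        components[find(node)].add(node)

    single_link = sorted(n for n, count in links_per_neighbor.items() if count == 1)
    return sorted(components.values(), key=len, reverse=True), single_link


if __name__ == "__main__":
    # Hypothetical edges: s1 and s2 share neighbor t2; s3 stands alone.
    edges = [("s1", "t1"), ("s1", "t2"), ("s2", "t2"), ("s2", "t3"), ("s3", "t4")]
    groups, single_link = analyze_neighbor_graph(edges)
    print(groups)
    print(single_link)
```

In this framing, the neighbors shared by several scoring records tie those records together into one community.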

Using PCA and spotlighting to understand neighbor compatibility

In real life, one can have good neighbors or bad neighbors! Similarly, KNN can identify the nearest neighbors, but that does not mean they are always similar to, or compatible with, each other.

We can verify this "neighborhood compatibility" using PCA and the spotlighting technique, as shown below. We apply PCA to all the training and scoring data to reduce it to two dimensions, and plot the reduced data as a scatter plot. We can then use the spotlighting technique to highlight the nearest neighbors of a particular record in the scoring dataset. For more information on the spotlighting technique, please see my article here.
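Both steps can be sketched in plain Python. This is a minimal sketch, assuming the training and scoring rows have been stacked into one list of already-standardized numeric features; without scaling, a large-valued field such as price would dominate the components. To keep the example dependency-free, the PCA here uses power iteration rather than a library call. `sklearn.decomposition.PCA(n_components=2)` is the usual choice. The `spotlight` helper fades every point except the highlighted ones. It is my hypothetical reading of the technique, not the author's code.

```python
import math


def pca_2d(rows, iterations=200):
    """Project rows onto their first two principal components.

    Power iteration on the covariance matrix, keeping the second component
    orthogonal to the first. Assumes at least two rows and two columns, and
    that the fixed start vector is not degenerate for the data.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    tol = 1e-10 * (sum(cov[i][i] for i in range(d)) or 1.0)

    def orthogonalize(vec, basis):
        for b in basis:
            dot = sum(x * y for x, y in zip(vec, b))
            vec = [x - dot * y for x, y in zip(vec, b)]
        return vec

    components = []
    for _ in range(2):
        # Deterministic start vector, made orthogonal to components already found.
        v = orthogonalize([1.0 + 0.1 * j for j in range(d)], components)
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _step in range(iterations):
            w = orthogonalize([sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)], components)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < tol:  # no variance left in this direction; keep the current v
                break
            v = [x / norm for x in w]
        components.append(v)

    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components] for c in centered]


def spotlight(n_points, highlighted, dim_alpha=0.1):
    """Opacity per point: fully opaque for highlighted indices, faded for everything else."""
    chosen = set(highlighted)
    return [1.0 if i in chosen else dim_alpha for i in range(n_points)]


if __name__ == "__main__":
    # Hypothetical standardized rows: 0-4 are training records, 5 is a scoring record.
    rows = [[0.1, 0.2, -0.3], [0.0, 0.1, -0.2], [1.5, 1.2, 0.9],
            [-1.2, -0.8, 1.1], [0.9, -1.4, 0.2], [0.05, 0.15, -0.25]]
    coords = pca_2d(rows)
    alphas = spotlight(len(rows), highlighted=[5, 0, 1])  # scoring record plus its neighbors
    for (x, y), a in zip(coords, alphas):
        print(f"x={x:.3f} y={y:.3f} alpha={a}")
```

The coordinates and opacities can then be passed to any scatter-plot library.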

Nearest neighbor analysis for score record 2 (image by author)

Shown above are all the nearest neighbors of scoring record 2. All the points are relatively close to each other, which means the nearest neighbors are fairly compatible with each other: they have broadly similar features. Further inspection shows that most of the cars are Nissans, which supports this observation.
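The closeness judgment above is made visually. To compare many scoring records at once, one option is to reduce it to a single number. This is a minimal sketch, assuming each neighborhood is given as the 2-D PCA coordinates of its points. The metric (mean pairwise distance) and the function name are my choices, not from the article. A 2-D projection can distort distances, so computing the same metric on the full standardized features is a reasonable cross-check.

```python
import itertools
import math


def neighbor_spread(points):
    """Mean pairwise Euclidean distance between points; lower means a tighter neighborhood."""
    pairs = list(itertools.combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical 2-D PCA coordinates of two neighborhoods.
    tight = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25)]
    loose = [(-1.5, 0.8), (1.2, -0.4), (0.3, 2.1)]
    print(neighbor_spread(tight), neighbor_spread(loose))
```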

Now let us do the same analysis for scoring record 6, shown below.

Nearest neighbor analysis for score record 6 (image by author)

Here the neighbors are situated relatively far from each other, which means they are not very compatible with each other. Looking at the cars behind the dots, we can see a mix of Audi, Volvo, and Volkswagen. So even though these cars are identified as nearest neighbors, they are quite different from each other.
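The make-based check can also be expressed as a number: find the most common make among a record's neighbors and the share of neighbors that have it. This is a minimal sketch, and the function name is hypothetical. A mostly-Nissan neighborhood scores close to 1, while a mix of several makes scores low.

```python
from collections import Counter


def make_purity(makes):
    """Return (most common make, share of neighbors with that make)."""
    if not makes:
        return None, 0.0
    make, count = Counter(makes).most_common(1)[0]  # ties go to the make seen first
    return make, count / len(makes)


if __name__ == "__main__":
    # Hypothetical neighbor makes for two scoring records.
    print(make_purity(["nissan", "nissan", "nissan", "toyota", "nissan"]))
    print(make_purity(["audi", "volvo", "volkswagen", "volvo", "audi"]))
```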

Conclusion

In summary

  • The network diagram and graph analytics are an excellent way to visualize the results of the unsupervised KNN algorithm
  • Using PCA and the spotlighting technique, you can analyze the compatibility of the nearest neighbors

Watch a demo and Try-It-Yourself

You can visit my website to perform KNN analysis, as well as other analytics, with no coding: https://experiencedatascience.com

Here is a step-by-step tutorial and demo on my YouTube channel. You will be able to customize the demo to your data with zero coding.

Please subscribe in order to stay informed whenever I release a new story.

Get an email whenever Pranay Dave publishes.

You can also join Medium with my referral link. Thank you.


Data source citation

The data is from https://archive.ics.uci.edu/ml/datasets/automobile.

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

