Similarity Search, Part 4: Hierarchical Navigable Small World (HNSW)

Discover how to construct efficient multi-layered graphs to boost search speed in massive volumes of data

Vyacheslav Efimov
Towards Data Science

Similarity search is a problem where given a query the goal is to find the most similar documents to it among all the database documents.

Introduction

In data science, similarity search often appears in the NLP domain, search engines or recommender systems where the most relevant documents or items need to be retrieved for a query. There exists a large variety of different ways to improve search performance in massive volumes of data.

Hierarchical Navigable Small World (HNSW) is a state-of-the-art algorithm used for an approximate search of nearest neighbours. Under the hood, HNSW constructs optimized graph structures making it very different from other approaches that were discussed in previous parts of this article series.

The main idea of HNSW is to construct such a graph where a path between any pair of vertices could be traversed in a small number of steps.

Create an account to read the full story.

The author made this story available to Medium members only.
If you’re new to Medium, create a new account to read this story on us.

Or, continue in mobile web

Already have an account? Sign in

Responses (7)

What are your thoughts?

Hi Vyacheslav, really appreciate the way you explained HNSW with such simplicity and great detail.
One doubt: You mentioned that the algorithm descends to the next layer and each of found efConstruction nodes acts as an entry point; but in the…

--

Hi Vyacheslav, this is such a great blog. I tried to do research on FAISS but I couldn't get as deep as you, and explain as simple as you! Thank you for writing!
A small detail regarding FAISS is that, they fix all the random seeds, so that everytime you do search, its deterministic :)

--

greater

Hi Vyacheslav, thank you for your excellent article. Should this be "smaller" instead of "greater"? Since in your example, we are trying to find the closest node to X compared to the distance from neighbour node B to other vertices.

--