
My first machine learning algorithm was a K-nearest-neighbors (KNN) model. It makes sense for beginners – intuitive, easy to understand, and you can even implement it without using dedicated packages.
Because it's so approachable for beginners, it's also easy to explain to anyone unfamiliar with machine learning. I can't put into words how much easier it is to get a room full of skeptical people on board with the KNN approach than with a black-box random forest.
It's an unsung hero of modeling approaches: it serves as an excellent benchmark before moving on to more complex algorithms, and for many use cases you may find that the extra time and cost of those algorithms simply aren't worth it.
To get your modeling inspiration going, here are three example applications of KNN where real-world results are often much better than you'd expect.
Marketing Mix Modeling (MMM)
I work in marketing, and my work with MMM systems typically involves identifying marketing channels that will improve campaign performance and/or scale the campaign up to reach more people. At a high level, this is known as marketing (or media) mix modeling.
The goal of any kind of MMM modeling is to understand the effectiveness of each marketing input, both in isolation and in combination with the others, and then to optimize the marketing mix for maximum effectiveness.
The most basic approach is predicting the impact of different marketing strategies based on historical data. A KNN model would consider each marketing strategy as a point in a multi-dimensional space, where the dimensions could be various marketing inputs such as advertising spend, promotional activities, pricing strategy, and so on.
When a new marketing strategy is proposed or an existing strategy needs optimizing, the model can predict the strategy’s results by looking at the ‘k’ most similar historical strategies, i.e., the ‘k’ nearest neighbors in the multi-dimensional space.

The outcome of the new strategy is predicted as a weighted average of the outcomes of these 'k' nearest neighbors, i.e., known strategies and their results. We might weight by the distance of each neighbor from the new strategy, so that closer neighbors have more influence on the prediction.
This approach allows for a nuanced understanding of the potential impact of different marketing strategies, and for quantifying how the marketing mix is working as a whole.
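Here's a minimal sketch of that idea with scikit-learn. The channel names (tv_spend, promo_spend, price_index), the figures, and k=3 are illustrative assumptions rather than a recommended setup:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Historical strategies: each row is one campaign's marketing inputs.
history = pd.DataFrame({
    "tv_spend":    [120, 80, 200, 150, 60],
    "promo_spend": [30, 45, 10, 25, 50],
    "price_index": [1.00, 0.95, 1.10, 1.05, 0.90],
})
# Observed outcome for each historical strategy (e.g. incremental sales).
outcome = np.array([310, 290, 355, 330, 275])

# Scale the inputs so no single channel dominates the distance calculation,
# and weight each neighbor's contribution by its distance to the query point.
model = make_pipeline(
    StandardScaler(),
    KNeighborsRegressor(n_neighbors=3, weights="distance"),
)
model.fit(history, outcome)

# Predict the outcome of a proposed strategy from its 3 nearest historical neighbors.
proposed = pd.DataFrame({"tv_spend": [140], "promo_spend": [35], "price_index": [1.02]})
print(model.predict(proposed))
```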
Ad Targeting
Ad targeting is the process of serving ads to a specific group of consumers based on their attributes. Digital advertising platforms like Instagram and YouTube use incredibly precise targeting algorithms based on thousands of attributes; however, the strategy also works well for much less precise media like TV.
Distance-based models like KNN and clustering algorithms can be used to predict outcomes such as the likelihood of a user responding to an ad based on the behavior of similar users, or to find new users to target who are demographically similar to the audiences we already reach.
For example, if a group of users who have similar browsing habits and demographics have responded positively to a particular ad, a KNN model can predict that a new user with geometrically similar attributes would also respond positively to that ad.
There are several different modeling approaches we can use here. Probably the "easiest" and most intuitive approach is to predict the likelihood of a known user responding to an ad based on other known users. There are more powerful uses for a model trained this way, however.
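As a rough sketch of that first approach, assuming scikit-learn and a tiny hand-made dataset of known users (the features and labels are invented purely for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Known users: [age, sessions_per_week, pages_per_session] and whether they clicked the ad.
X = np.array([[25, 12, 6.0], [31, 9, 4.5], [45, 3, 2.0],
              [22, 15, 7.5], [52, 2, 1.5], [38, 7, 3.0]])
y = np.array([1, 1, 0, 1, 0, 0])

# Scale the features so age doesn't dominate the distance, then fit the classifier.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)).fit(X, y)

# The predicted probability is simply the share of the k nearest users who responded,
# which is easy to explain to stakeholders.
new_user = np.array([[29, 10, 5.0]])
print(clf.predict_proba(new_user))  # [[P(no response), P(response)]]
```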
Instead of running predictions on known users, we can create a simulated dataset with as many user attribute combinations as computationally reasonable, then look at which attribute combinations produce the best results. We can then not only find the users who best match the optimal results, but prospect for new users who have not interacted with our product previously.
A bonus outcome is that this can surface potentially high-performing audiences that weren't obvious beforehand.
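Sketching that simulation idea on top of the toy classifier from the previous snippet (the attribute ranges and the top-10 cutoff are assumptions; a real grid would be built from your actual attribute space):

```python
import itertools
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy classifier as in the previous sketch: known users and whether they clicked.
X = np.array([[25, 12, 6.0], [31, 9, 4.5], [45, 3, 2.0],
              [22, 15, 7.5], [52, 2, 1.5], [38, 7, 3.0]])
y = np.array([1, 1, 0, 1, 0, 0])
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)).fit(X, y)

# Simulated dataset: every combination of attribute values across plausible ranges.
ages = np.arange(20, 61, 5)            # age
sessions = np.arange(1, 16, 2)         # sessions per week
pages = np.arange(1.0, 8.1, 1.0)       # pages per session
grid = np.array(list(itertools.product(ages, sessions, pages)))

# Score every combination and keep the most promising profiles for prospecting.
scores = clf.predict_proba(grid)[:, 1]
top_profiles = grid[np.argsort(scores)[::-1][:10]]
print(top_profiles)
```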

KNN is rarely the most precise model option available, but that can work to our advantage in simulation scenarios like the one above. A positive tradeoff of KNN's lower precision is that reducing overfitting is particularly easy: in many cases, simply increasing the k-value is enough to smooth the predictions and bring it under control.
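If you'd rather not nudge k by hand, cross-validation does the same job. A quick sketch on synthetic data (the candidate k values are arbitrary):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Try progressively larger k values and keep the one with the best cross-validated score.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
search = GridSearchCV(pipe, {"knn__n_neighbors": [3, 5, 11, 21, 41]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```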
Since KNN predictions are weighted averages of existing outcomes, the model is almost certainly not going to produce errant results far outside of the range of what has already been observed.
Additionally, ad targeting is a direct product of understanding who your target audience is. A model that we can be confident isn't overfitting, and that can be explained in simple terms to critical stakeholders, quickly becomes preferable to other model types.
Influencer Identification
While ads can be targeted based on quantifiable data, having a relatable person you trust giving product advice is one of the most effective ways to build brand reputation. This is particularly effective with video-first social media like TikTok.
Influencer marketing typically has outsized engagement and conversion rates compared with other marketing channels, so having a network of influencers to build brand exposure is extremely valuable.
We can characterize each influencer account by a set of features, such as the number of followers, the engagement rate, the type of content they produce, and so on.
A KNN model could then find mathematically similar accounts in the same way our previous examples did, or we could go a different route and categorize influencers into groups that match certain criteria.
Since we’re familiar with the "find similar things by distance" approach now, let’s look at the classification option.
A KNN model used as a classifier is an intuitive and generally performant model option, particularly for applications where there’s no pressing need to classify things with attributes that might be quite different from what we’ve seen before.
For this use case, there’s a novel way we can exploit multiple properties of a KNN model to expand our product reach without falling into the trap of reaching the same people over and over again (duplicated reach) through accounts with similar followings.
We start by building a KNN classifier to place influencers into groups that might fit products similar to ours, or that may already be working with our competitors.
For example’s sake, let’s say we have a shortlist of 500 accounts that our digital marketing team thinks might work well, and our classifier finds a sub-group of 100 accounts that are good candidates for expanding our influencer program.
We want to avoid the most similar accounts to prevent duplicated reach, so we can use the actual distances calculated by the model and keep only the accounts outside the closest k points. That cutoff could instead be tuned as a minimum distance threshold that makes practical sense, but let's say k=20 here.
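A hedged sketch of how those two steps might look in scikit-learn. The features, the synthetic labels and shortlist, and the choice to measure distance against accounts we already know are all assumptions layered on top of the approach described above:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Influencers we already know about: [followers, engagement_rate, videos_per_week],
# labelled 1 if they were a good fit for products like ours (synthetic here).
X_known = np.column_stack([rng.uniform(5_000, 200_000, 200),
                           rng.uniform(0.01, 0.10, 200),
                           rng.uniform(1, 7, 200)])
y_known = rng.integers(0, 2, 200)

scaler = StandardScaler().fit(X_known)
clf = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_known), y_known)

# Step 1: classify the shortlist of candidate accounts and keep the good fits.
X_shortlist = np.column_stack([rng.uniform(5_000, 250_000, 500),
                               rng.uniform(0.005, 0.12, 500),
                               rng.uniform(1, 7, 500)])
X_shortlist_scaled = scaler.transform(X_shortlist)
candidates = np.where(clf.predict(X_shortlist_scaled) == 1)[0]

# Step 2: reuse the model's distance calculations. Candidates sitting closest to
# accounts we already know carry the highest duplicated-reach risk, so drop the
# 20 nearest ones (or swap the cutoff for a minimum distance threshold).
dist_to_known, _ = clf.kneighbors(X_shortlist_scaled[candidates], n_neighbors=1)
too_similar = np.argsort(dist_to_known.ravel())[:20]
keep = np.delete(candidates, too_similar)

print(f"{len(candidates)} good-fit candidates, {len(keep)} kept after removing the 20 most similar")
```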

At this point we have 100 influencers who match the criteria we're looking for, 20 of whom we want to ignore because the likelihood of duplicative reach is too high. We can then pass the remaining 80 back to the digital marketing team to do their thing, and we've efficiently expanded who our product is being presented to.
The approach here is somewhat unique to KNN, since we're able to use both the classifier output of the model and the underlying distance information it computes.
There are some drawbacks worth noting with KNN. Because the model stores the entire training set and computes distances at prediction time, prediction speed is heavily impacted by data size and the number of features, making it prohibitively slow for very large datasets. The model is also highly sensitive to irrelevant and redundant features, so issues like collinearity matter. Features on different scales or with very skewed distributions can also cause problems, since they distort the distance calculations.
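The scale and skew issues are usually straightforward to mitigate by transforming the features inside the same pipeline as the model; a minimal sketch with synthetic data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(7)
# One small, symmetric feature and one large, heavily skewed feature (e.g. spend).
X = np.column_stack([rng.normal(size=500),
                     rng.lognormal(mean=8, sigma=1, size=500)])
y = 10 * X[:, 0] + np.log(X[:, 1])

# PowerTransformer (Yeo-Johnson) both rescales and de-skews the features before
# the distance calculation, so neither feature dominates the neighbor search.
model = make_pipeline(PowerTransformer(), KNeighborsRegressor(n_neighbors=5))
model.fit(X, y)
print(model.predict(X[:3]))
```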
When built and applied correctly, however, KNN is a really solid option. Next time you're working through your model selection process, remember to give this trusty algorithm a good look.