Optimizing YouTube thumbnails with deep learning
I recently had the opportunity to visit Michael Petherick, the Chateau YouTuber, and his family at Chateau de la Basmaignee in France. That visit led me to develop an AI tool for optimizing YouTube thumbnails, with the goal of getting an edge with the YouTube algorithm.
For some context on how this happened: Michael runs a very successful YouTube channel called Doing It Ourselves, which follows his family and friends as they make progress on renovations at the chateau and the gardener’s cottage. Michael invited me down to help with some of the renovations, and in return they put me up in a room at the chateau. It was an incredible experience. I was so grateful to Michael for inviting me, and to his family for being such great hosts, that I wanted to help out in any way I could. That is what inspired me to apply my data science background to Michael’s YouTube channel. He was kind enough to share his analytics data¹ with me, and I ended up doing a deep dive into his channel’s reach, engagement, and audience statistics.
This process involved a lot of research into the YouTube algorithm: how it works, and how it interacts with Michael’s content and his subscriber base. YouTube has built a complicated ecosystem of content creators and an advanced recommendation system. So I knew it would be difficult to add value to Michael’s channel, because he’s already an expert in his own content and in what interests his quarter of a million subscribers.

Still, I wanted to try to offer some recommendations. I won’t cover every detail I discovered in this process, but I do want to go over the major points. These are general features of the YouTube algorithm, plus some of the finer details I learned by evaluating Michael’s YouTube analytics. Finally, I’ll talk about how this led to the development of the AI tool I mentioned.
The headline metric of success for a YouTube video is view count. As a data scientist, I love single-value metrics to optimize, because they make it much easier to focus on one piece of a very complex problem. All we have to do is explore the different variables that contribute to view count.
To start, I wanted to look at traffic source data: where do viewers come from? My own small channel, MindOfData, has one relatively popular video on making music with AI. That video has the name of a well-known piece of open source software in its title, and roughly 66% of its views came from people searching for that software, i.e. YouTube search. For content creators like Michael with an established subscriber base, however, the majority of views come from the "Browse Features" category, where new videos pop up on people’s homepage. This mainly includes subscribers, but it can also include others who have enjoyed his content, or similar content, in the past.
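The traffic-source comparison above boils down to a share-of-views calculation. The following is a minimal Python sketch that turns per-source view counts into percentages. It assumes you have already pulled per-source view totals, for example from the traffic source report in YouTube Analytics; the export format and the function names are hypothetical.

```python
def traffic_source_shares(views_by_source: dict[str, int]) -> dict[str, float]:
    """Return each traffic source's share of total views, as a percentage."""
    total = sum(views_by_source.values())
    if total == 0:
        # No views at all: report 0% for every source instead of dividing by zero
        return {source: 0.0 for source in views_by_source}
    return {source: 100.0 * views / total for source, views in views_by_source.items()}


def dominant_source(views_by_source: dict[str, int]) -> str:
    """Return the traffic source that contributed the most views."""
    return max(views_by_source, key=views_by_source.get)
```

For a channel like Michael's, `dominant_source` would be expected to return the Browse Features category. For a search-driven video like the MindOfData example, it would return YouTube search.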

Because this is the largest category, it’s where I wanted to focus most of my effort. It also answers our first question about how to get more views: have YouTube show your video to more people on the homepage. As long as you have a moderately interesting thumbnail and title, the sheer number of people on YouTube means some of them will end up clicking on it. So this lets us narrow our research question to:
How do we convince the YouTube algorithm to show a video to more people?
This complicates things slightly, because it’s equivalent to asking how the YouTube algorithm works, and a lot of people have already devoted a lot of time to figuring that out. Entire books could be written on the intricacies of the algorithm. However, I want to make a contribution to this area that, to my knowledge, hasn’t been explored yet. So I’m going to narrow the research question even further and focus on one particular aspect of the algorithm: impressions click-through rate. In other words, what convinces a person to click on a particular video? This is an essential element of YouTube success. A high click-through rate is one of the signals the algorithm uses to decide whether a video is good and worth pushing to more people (the quality of the content, measured by average view duration, is obviously another important one). Click-through rate also has the benefit of depending almost entirely on six factors:
- The thumbnail
- The title
- The video length
- The number of views
- How recently the video was published
- People’s existing perception of the channel’s content and quality
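Impressions click-through rate is the fraction of thumbnail impressions that turn into views. In the relation below, I is the number of impressions (how many times YouTube showed the thumbnail) and V_imp is the number of views that started from one of those impressions. The symbols are my own shorthand, and the relation deliberately ignores views that have no impression, such as external links.

```latex
\[
\mathrm{CTR} = \frac{V_{\mathrm{imp}}}{I}
\qquad\Longrightarrow\qquad
V_{\mathrm{imp}} = I \times \mathrm{CTR}
\]
```

At a fixed click-through rate, doubling impressions roughly doubles these views. That is why getting YouTube to show a video to more people matters. And because click-through rate is one of the signals that decides how widely a video is shown, the two terms are not independent.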
Taken together, these features form a complex set of variables that would be impractical to optimize all at once. We can, however, focus on one of them, and this is what gave me the idea for my AI project. It’s fairly common knowledge at this point that the thumbnail is a primary factor in convincing someone to click on a video. Features such as big red arrows, circles, people’s faces, overlaid text, and visually intriguing depictions of the content all have a huge impact on whether people decide to click. It also occurred to me that neural networks can identify all of these features. I had both the thumbnail images and the impressions click-through rate data from Michael’s YouTube analytics, so why not build a tool that takes Michael’s thumbnails as input and learns to predict click-through rate? The result would be a model that could distinguish thumbnails with high "clickability" from those with low "clickability".
So I started working on the model. I scraped the thumbnails from the Doing It Ourselves channel page and made sure each image file was labeled with its video, so that I could map it to that video’s click-through rate in the YouTube analytics.
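The image-to-label mapping is the fiddly part of this step. In the sketch below, each thumbnail is keyed by its YouTube video ID rather than by its title. That is a swapped-in approach, not the original workflow, and it avoids titles with characters that are invalid in filenames. The `i.ytimg.com` URL pattern is an assumption based on where YouTube commonly serves thumbnails, and `ThumbnailRecord` and `build_thumbnail_table` are hypothetical names. Downloading the images is left out.

```python
from dataclasses import dataclass

# Assumption: YouTube serves thumbnails at this public URL pattern; "hqdefault.jpg"
# is the variant I'd expect to exist for nearly every video.
THUMBNAIL_URL = "https://i.ytimg.com/vi/{video_id}/hqdefault.jpg"


@dataclass
class ThumbnailRecord:
    video_id: str
    url: str        # where to download the thumbnail from
    filename: str   # local file name used as the image's label key
    peak_ctr: float # training target, in percent


def build_thumbnail_table(peak_ctr_by_video: dict[str, float]) -> list[ThumbnailRecord]:
    """Pair each video ID with its thumbnail URL, a local filename, and its CTR label."""
    records = []
    for video_id, peak_ctr in sorted(peak_ctr_by_video.items()):
        records.append(
            ThumbnailRecord(
                video_id=video_id,
                url=THUMBNAIL_URL.format(video_id=video_id),
                # Naming files by video ID keeps the image-to-label mapping unambiguous
                filename=f"{video_id}.jpg",
                peak_ctr=peak_ctr,
            )
        )
    return records
```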
One of the first things I discovered came from using the average-over-time click-through rate from the main analytics page: a lot of Michael’s most popular videos actually had lower click-through rates than expected. When I looked into it, it turned out that a video with a high click-through rate gets pushed more heavily by the YouTube algorithm. That continues until the video reaches a saturation point where it has been shown to all the people who are going to be interested in it, and the click-through rate starts to drop off. At that point the algorithm decides it has more relevant videos to recommend. So I pulled up the click-through rate time series for each video and isolated the peak rate, which usually occurs in the first day or first couple of days after publication.
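Isolating the peak from a daily time series is a small windowed maximum. The sketch below assumes the series is a list of daily impressions click-through rates in percent, starting on the publish day. The three-day default window is my reading of "the first day or first couple of days", not a value from the original analysis. Restricting the search to that window is also a guessed safeguard: it keeps a late spike on a handful of impressions from being mistaken for the peak.

```python
def peak_ctr(daily_ctr: list[float], window_days: int = 3) -> tuple[int, float]:
    """Return (day_index, ctr) for the highest click-through rate in the early window.

    daily_ctr holds one impressions CTR value (in percent) per day, starting
    on the publish day. Only the first `window_days` values are considered.
    """
    if not daily_ctr:
        raise ValueError("daily_ctr must contain at least one value")
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    window = daily_ctr[:window_days]
    # max() returns the first index on ties, i.e. the earliest peak day
    best_day = max(range(len(window)), key=window.__getitem__)
    return best_day, window[best_day]
```

For the series `[11.0, 13.5, 12.0, 8.0, 15.0]`, this returns day 1 with 13.5, ignoring the late spike on day 4 unless the window is widened.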

Using FastAI, I coded up a model to predict peak click-through rate from the thumbnail alone. I didn’t incorporate any of the other variables, partly because I wanted to keep it a simple proof of concept, but also because training a model on title "clickability" with limited data would be difficult. Even with thumbnails, I had to train for 20 epochs before the model began to pick up on "clickable" features and the error got to a reasonable level. And because it was trained only on Doing It Ourselves thumbnails, the model as built is heavily adapted to that channel’s style of thumbnails.
I got the error down to 2.90%. That seems pretty decent, given that the average peak click-through rate was around 12% with a standard deviation of about 5%, and given that I’m not incorporating title data or other factors. The error comes from the validation set, since data was limited and I wasn’t able to reserve a significant test set. I have, however, tested the model on a couple of Michael’s recent thumbnails, and so far it seems to give pretty solid predictions of the peak click-through rate. After getting the model to a decent point, I built a widget app around it and launched a public version on Heroku, so you can check out the application here if you’re interested in testing it: http://thumbnailpredict.herokuapp.com/
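Whether 2.90% counts as "decent" depends on what a trivial model would score. Suppose the reported error is a mean absolute error in percentage points, and peak click-through rate is roughly normal with a standard deviation of about 5 points; both are assumptions. Then always predicting the mean would give an expected error of 5 × √(2/π) ≈ 4.0 points, so 2.90 is a real improvement over that baseline. The helpers below sketch that comparison.

```python
import math
import statistics


def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute gap between predicted and actual CTR (percentage points)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_baseline_mae(actual: list[float]) -> float:
    """MAE of a naive model that always predicts the mean CTR of the data."""
    mean_ctr = statistics.fmean(actual)
    return mean_absolute_error([mean_ctr] * len(actual), actual)


def expected_baseline_mae_normal(std_dev: float) -> float:
    """Expected MAE of the mean predictor if CTR were normally distributed."""
    # For a normal distribution, E|X - mu| = sigma * sqrt(2 / pi), about 0.80 * sigma
    return std_dev * math.sqrt(2 / math.pi)
```

In practice, running `mean_baseline_mae` on the actual validation labels is more trustworthy than the normal approximation, since real click-through rates are bounded below by zero and may be skewed.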
At best, this tool can give someone a slight edge and some insight into how to create "clickable" thumbnails. It can be especially helpful if you’re debating between two or more thumbnails for a video. It definitely won’t magically make a great thumbnail for you, or a great title for that matter. I believe an AI application that can watch your content and generate a thumbnail and title that will get a lot of attention is entirely feasible. But it would take several months to develop and would require access to significantly more YouTube data.
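For the "debating between two or more thumbnails" use case, the model only needs to rank candidates. In the sketch below, `predict_ctr` is a hypothetical stand-in for any function that maps an image path to a predicted click-through rate, such as a thin wrapper around a trained model's prediction call. How that wrapper reads the model output is left as an assumption.

```python
from typing import Callable, Sequence


def rank_thumbnails(
    candidates: Sequence[str], predict_ctr: Callable[[str], float]
) -> list[tuple[str, float]]:
    """Score each candidate thumbnail and return (path, predicted CTR), best first."""
    scored = [(path, predict_ctr(path)) for path in candidates]
    # sorted() is stable, so candidates with equal scores keep their input order
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Keeping the scoring function as a parameter means the ranking logic doesn't depend on any particular model library, and it can be tested with a plain dictionary lookup.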
Throughout this process, though, I learned something very important. Thumbnails and titles really matter, but by my rough estimate the difference between a good thumbnail and an average one (assuming both represent the content accurately) accounts for about a 25% difference in views. That value varies greatly and is only a rough estimate. Still, while a 25% difference in views is significant, it shows that the secret to growth on YouTube is not as simple as a good thumbnail and title. In fact, my research suggests that the most critical element may be people’s perception of the channel and the quality of its content. That is by no means a simple metric to understand and predict. It requires a non-trivial understanding of the YouTube ecosystem: marketing, media, engagement with fans, and the various other elements that contribute to growth on YouTube.
[1]: Michael Petherick (Nov 2021). Doing It Ourselves YouTube Analytics.