
Intro
Since the early days of AI, we’ve seen multiple attempts to handle the jigsaw puzzle problem. However, enthusiasts mainly focus on specific aspects: only square tiles, only non-scanned images, only monochrome puzzles, etc.
Here we take a look at a comprehensive approach that handles the problem in its entirety while staying very simple. The idea of this work is to demonstrate how we can engage the power of today’s basic AI tools to treat all aspects of jigsaw puzzle solving.
We’ll do the job in Google Colab with Python, so that anyone can understand and reproduce it easily without any special software.

Overview
We will solve a 15-tile puzzle. For a computer it is a complicated one, as the complexity comes not from the number of tiles but from their geometry: our puzzle is unevenly cut, and its tiles have distorted, curved borders.
The point of the proposed approach is that we need not limit ourselves to any special type of tiles; on the contrary, the idea is to solve puzzles with tiles of any geometry.
We’ll go through 300 lines of Python code. Still, it covers only 3 principal steps:
- Image Processing. It deals with how to extract tiles from the scan.
- Matching. It’s the core part: finding which tiles match and in what way.
- Assembly. It’s the final algorithm that puts tiles together into a complete image.
A surprisingly small number of basic modules will be enough to solve our colored Jigsaw Puzzle.
Image processing
We take a scanned image of scattered tiles as an input. It has to be a good scan without evident scanner artefacts, such as colored stripes or black edges. My scan is A4 format, resampled to 727 x 1000 pixels, which roughly equals 90 dpi.
First, we wrap image plotting into functions to simplify further code. We want to turn off axes, switch to a ‘gray’ color map for single-channel images, etc.
Then we load the scan and make it RGBA, so that we can use the transparency channel later.
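A minimal sketch of these two steps (the show() and to_rgba() helper names are my own, and a random array stands in for the real scan, which the article loads from file):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

def show(img, title=None):
    """Plot helper: axes off, 'gray' colormap for single-channel images."""
    plt.figure(figsize=(6, 8))
    plt.axis("off")
    if title:
        plt.title(title)
    plt.imshow(img, cmap="gray" if img.ndim == 2 else None)

def to_rgba(img):
    """Append a fully opaque alpha channel to an RGB image."""
    alpha = np.full(img.shape[:2] + (1,), 255, dtype=img.dtype)
    return np.concatenate([img, alpha], axis=2)

# stand-in for the real 727 x 1000 px RGB scan
scan = (np.random.rand(1000, 727, 3) * 255).astype(np.uint8)
scan = to_rgba(scan)
show(scan)
```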

We can see colored tiles on a sky-blue background. To do anything with the tiles we have to detect them first: it’s a single image, which we have to turn into 15 separate tiles. One possible way to detect objects on a monochrome background is adaptive thresholding, so we apply the adaptiveThreshold() tool to our image to separate the tiles from the background. GaussianBlur() is optional in general, but it is necessary here, as some tiles have white edges that blend into the background and create rips we have to fill.
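The idea behind these two calls can be sketched without OpenCV; below is a toy mean-adaptive threshold built on SciPy (a hypothetical reimplementation, not the article's code, which uses cv2.GaussianBlur() and cv2.adaptiveThreshold()):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def adaptive_threshold(gray, block=31, c=5):
    """Mark a pixel as foreground when it is brighter than its local
    neighbourhood mean by more than c (a mean-adaptive threshold)."""
    local_mean = uniform_filter(gray.astype(float), size=block)
    return np.where(gray > local_mean + c, 255, 0).astype(np.uint8)

# toy scene: one bright "tile" on a mid-grey background
scene = np.full((60, 60), 120, dtype=np.uint8)
scene[10:30, 10:30] = 200
blurred = gaussian_filter(scene, sigma=1)   # optional blur that closes small rips
mask = adaptive_threshold(blurred)
```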

This looks good. Still, we can’t do anything with it yet. What we need is a clean binary image, where each pixel is either tile or background. Therefore, we’ll do a simple trick using OpenCV contour detection and filling. To suppress the extra contours formed by the paintings on the tiles, we sort the detected contours by length in descending order and take the 15 biggest (the count is set manually, as automating it is omitted to keep the code simple).
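As an illustration of the fill-and-keep-the-biggest trick, here is an equivalent sketch with SciPy labelling instead of OpenCV contours (the clean_binary() helper is hypothetical):

```python
import numpy as np
from scipy import ndimage

def clean_binary(mask, n_keep):
    """Fill holes inside each blob, then keep only the n_keep largest blobs
    (the analogue of sorting OpenCV contours by size and filling the biggest)."""
    filled = ndimage.binary_fill_holes(mask)
    labels, n = ndimage.label(filled)
    if n <= n_keep:
        return filled.astype(np.uint8) * 255
    sizes = ndimage.sum(filled, labels, index=range(1, n + 1))
    keep = np.argsort(sizes)[::-1][:n_keep] + 1   # labels of the biggest blobs
    return np.isin(labels, keep).astype(np.uint8) * 255

# toy mask: one tile-like blob with a hole, plus a small speck of noise
mask = np.zeros((40, 40), dtype=bool)
mask[5:25, 5:25] = True
mask[12:16, 12:16] = False    # a "rip" inside the tile
mask[35, 35] = True           # noise, e.g. a painting contour
clean = clean_binary(mask, n_keep=1)
```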

Not good yet. We can see that the tiles have ragged borders and protruding spikes formed by shadows in the original scanned image. A typical way to suppress spikes is median filtering, which replaces each pixel with the median of its neighborhood. In other words, if all of your mates are short and you are tall, you’ll be made short too :). After median_filter() we trim our shapes a little bit by drawing a black contour over their borders. This gets rid of shadows and the extra pixels created by the blur operation above.
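A toy demonstration of both operations, with binary_erosion() standing in for the black-contour trim (an assumption on my part; the article draws the contour with OpenCV):

```python
import numpy as np
from scipy.ndimage import binary_erosion, median_filter

# toy binary tile with a one-pixel "spike" produced by a shadow
mask = np.zeros((30, 30), dtype=np.uint8)
mask[5:25, 5:25] = 255
mask[4, 10] = 255                       # the spike sticking out of the tile

smoothed = median_filter(mask, size=5)  # the lonely spike is voted away by its neighbours
# shave one pixel off the border, analogous to drawing a black contour over the shape
trimmed = np.where(binary_erosion(smoothed > 0), smoothed, 0)
```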

Well, that’s good! Now the image is binary: tiles vs. background, with no halftones. We can superimpose it on the original image to extract the colored tiles one by one. Another clever tool, boundingRect(), will help us cut the desired tile out of the big image. Its role is to detect the minimal bounding box that embraces a shape.
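What this step amounts to can be sketched in plain NumPy; the extract_tile() helper below is hypothetical and assumes an RGBA scan plus a per-tile mask:

```python
import numpy as np

def extract_tile(image_rgba, mask, size=300):
    """Crop the tight bounding box of one tile (what cv2.boundingRect() finds)
    and centre it on a size x size transparent canvas."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image_rgba[y0:y1, x0:x1].copy()
    crop[mask[y0:y1, x0:x1] == 0] = 0               # transparent outside the tile
    canvas = np.zeros((size, size, 4), dtype=image_rgba.dtype)
    oy = (size - crop.shape[0]) // 2
    ox = (size - crop.shape[1]) // 2
    canvas[oy:oy + crop.shape[0], ox:ox + crop.shape[1]] = crop
    return canvas

scan = np.full((100, 100, 4), 255, dtype=np.uint8)   # stand-in for the RGBA scan
tile_mask = np.zeros((100, 100), dtype=np.uint8)
tile_mask[10:50, 20:60] = 1                          # one detected tile
tile = extract_tile(scan, tile_mask)
```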

We finally have them! 15 colored tiles, positioned in the centers of 300 x 300 images with transparent background. The prelude is over, it’s time for the main act!
Matching
The key idea of matching is to take a pair of tiles, look for similar parts in their contours, compare colors along those parts and then try to lock those parts without loss of pixels.
We start by rescaling our tiles: we put them on a 1400 x 1400 canvas, the same canvas we will use to assemble the whole puzzle. This is a purely technical operation, with no secret meaning.
Four auxiliary functions will be used in the matching algorithm. All of them are grounded in basic 2D geometry:
- getColors() is designed to take color pixels along a subcontour of the image. Every 3 points of the subcontour we take two points orthogonal to it, 3 pixels deep inside and outside (as we don’t know exactly where the tile border is), convert their colors to HSV, and add them to a list.
- putOnAnvil() is just a technical helper that takes an input image as a NumPy array and offsets/rotates it using PIL methods. We use it to rotate and reposition a canvas tile, putting its subcontour center into the image center for a fuse with the matching tile.
- rotatePoint() helps track a tile’s center when we move and rotate the puzzle and tiles during assembly or fitting.
- reScale() is once again a technical helper, used to translate point coordinates from the (300, 300) tile image to the (1400, 1400) puzzle canvas. Working in two spaces saves time by not processing extra zero pixels.
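Possible implementations of the two pure-geometry helpers might look as follows (counter-clockwise angles and centered placement on the big canvas are my assumptions):

```python
import math

def rotate_point(pivot, point, angle_deg):
    """Rotate point around pivot by angle_deg, counter-clockwise."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + dx * math.cos(a) - dy * math.sin(a),
            pivot[1] + dx * math.sin(a) + dy * math.cos(a))

def re_scale(point, src=300, dst=1400):
    """Translate a point from the 300x300 tile image to the same position
    relative to the centre of the 1400x1400 puzzle canvas."""
    off = (dst - src) // 2
    return (point[0] + off, point[1] + off)
```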
Throughout the rest of the code, please keep in mind that OpenCV works in the (x, y) coordinate domain, while the NumPy-based modules index as (y, x). Therefore we will have to constantly swap coordinates back and forth.
Before we dive into the matching algorithm, let’s take a look at some major concepts we will use. When we match, we talk about tiles A and B; subcontours and subcontour centers (pointA and pointB); the bounding rectangle and tile center; minimum-area rectangles with their centers (cA and cB) and angles (angleA, angleB); typepointA and typepointB, which say whether a subcontour is a tab or a blank; the canvas center (the “anvil”); and near-subcontour points we use for color matching.

Then we come to the matching algorithm itself. It looks scary, but the structure is very simple. We take two tiles and extract their contours with OpenCV findContours(). Then we take small subcontours of tile A and compare them against small subcontours of tile B. Subcontours are derived by rolling and slicing the main contour.
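The rolling-and-slicing of subcontours can be sketched like this (toy contour; the real ones come from findContours()):

```python
import numpy as np

def subcontours(contour, length, step):
    """Yield fixed-length windows of a closed contour by rolling it
    step points at a time and slicing off the first length points."""
    for roll in range(0, len(contour), step):
        yield roll, np.roll(contour, -roll, axis=0)[:length]

# toy closed contour: 12 points walking around a 4x4 square
contour = np.array(
    [[i, 0] for i in range(4)] + [[3, j] for j in range(1, 4)]
    + [[i, 3] for i in (2, 1, 0)] + [[0, j] for j in (2, 1)]
)
pieces = list(subcontours(contour, length=5, step=3))
```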
One after another, we go through 3 matching loops (contour matching, color matching, and pre-fitting). The names are self-explanatory; the idea is to continuously reduce the number of matches, filtering bad ones out by certain criteria. For instance, two contours that are similar in form may be absolutely incompatible by color, so they won’t even reach the fitting room. Besides, fitting is time-consuming, so we want as few matches as possible to undergo its filter.
Let’s look at what happens in each one:
- Contour matching. It is done with OpenCV matchShapes(). While it’s not 100% good for curves (I prefer to treat those my own way), we use it here to simplify the story. Before matching we detect subcontour types (tab or blank) and their general look (using minAreaRect()) to save time and avoid evident failures. Two auxiliary flags (‘colinear’ and ‘codirect’) are necessary for the correct detection of the rotation angle. This is because minAreaRect() returns only a (0, -90) angle in quadrant III and matchShapes() doesn’t return any angle at all. Still, there is a way out.
- Color matching. It runs through the filtered form matches, takes color points along each of the subcontours, and compares them with the fastdtw() distance metric (explained here). The crucial thing is that we have to convert colors into HSV format. The original RGB of the tiles may mislead: 130 is as far from 160 as 160 is from 190, and swapping two channels may give you the same metric but a totally different color. HSV does better.
- Pre-fitting. It is a direct attempt to fuse tiles, using the matches that reached this level. We take two canvas tiles, superimpose their subcontour centers in the center of the canvas, rotate by the due angle, and calculate metrics. If the tiles fit well, we’ll have a minimal loss of pixels (tiles don’t overlap) and a minimal length of the resulting pair contour (joint sides stick to each other and hide from measuring). A more robust version includes superposition of subcontours with calculation of the intersection ratio (dropped here).
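To illustrate the color-matching step, here is a tiny exact DTW over the HSV hue channel (the article uses the fastdtw library; this hypothetical dtw() only shows what the metric measures):

```python
import colorsys
import numpy as np

def dtw(a, b):
    """Tiny exact DTW distance between two 1-D sequences (the article uses
    fastdtw, an approximate but much faster variant of the same metric)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def to_hue(rgb_seq):
    """Hue channel of HSV for a sequence of RGB triples in 0..255."""
    return [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0] for r, g, b in rgb_seq]

edge_a = [(200, 40, 40), (190, 50, 45), (180, 60, 50)]  # reddish border samples
edge_b = [(195, 45, 42), (185, 55, 48)]                 # same border, sampled differently
score = dtw(to_hue(edge_a), to_hue(edge_b))             # small: the edges agree in hue
```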
Parametrization of the matching algorithm will require manual tuning from puzzle to puzzle:
- LENGTH. Here the subcontour length is 160, which is roughly the minimum joint length. It has to be made variable for puzzles whose tiles vary significantly in size.
- PRECISION. It’s only a rough filter to get rid of evident non-matches: the allowed difference in the dimensions of subcontour bounding rectangles. It has to be non-zero, as image processing is not 100% accurate.
- STEP_A, STEP_B. Steps are the shifts we use to take the next subcontour. A value of 1 is a dream, but it would loop forever; you have to look for a compromise.
- MAX_*. These parameters determine the upper limits for the corresponding metrics. They mainly depend on resolution and subcontour length.
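For reference, a possible parameter block; only LENGTH = 160 comes from the article, the other values are illustrative placeholders:

```python
# Matching parameters; LENGTH = 160 is the article's value, the remaining
# numbers are placeholders that must be re-tuned for each puzzle.
LENGTH = 160               # subcontour length, roughly the minimum joint length
PRECISION = 8              # allowed difference (px) between subcontour bounding boxes
STEP_A, STEP_B = 20, 7     # shifts between consecutive subcontours of tiles A and B
MAX_FORM_METRIC = 0.1      # upper limits for the corresponding metrics; they
MAX_COLOR_METRIC = 500.0   # mainly depend on resolution and subcontour length
```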
Now we run the matching algorithm for all possible pairs in a loop. It takes roughly 0.5 s for each of the 105 pairs. That’s less than a minute here, but for a 128-tile puzzle with its 8128 pairs it would take over an hour. Therefore, to be fast with big puzzles, we need optimizations (numba, parallel threads) as well as algorithmic tricks such as early grouping of tiles, pre-assemblies, etc.
The resulting list of matches contains tile numbers, roll values, subcontour center coordinates, rotation angles, and metrics. If no matches satisfy our limitations, the result will be empty. We may also miss some true matches; such tiles may still lock during assembly anyway.
As our matching algorithm goes from tile 0 to tile 14, it records only ascending pairs such as (1,5), (2,6), etc. However, if (1,5) is a match, then (5,1) is too. That’s why we unfold the matches into a full list by flipping pairs. In the end, we sort the list of matches by pair and fit metric, and get something like this:
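The unfolding and sorting can be sketched on simplified records (real ones also carry roll values, subcontour centers, and angles):

```python
# A simplified stand-in for the real match records.
matches = [
    (1, 5, 0.12),   # (tileA, tileB, fit metric)
    (2, 6, 0.30),
    (1, 5, 0.25),
]

# unfold: if (A, B) is a match, (B, A) is a match too
full = matches + [(b, a, metric) for a, b, metric in matches]
# sort by pair first, then by fit metric, best match first
full.sort(key=lambda rec: (rec[0], rec[1], rec[2]))
```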

Assembly
The key idea of assembly is to go through the matches found and, having tile A on the canvas, try to lock tile B to it, moving and rotating both the puzzle and the added tile according to the match information.
Namely, we take the matched tiles by their matching subcontours’ centers, superimpose one point upon the other in the canvas center, and rotate the tiles by the due angle. To keep control over rotations and positioning, we always do this trick in the center of the canvas. We can think of it as putting every new pair of centers onto the canvas center as on an anvil, to fuse them there. This allows us to rotate correctly and keep track of contour and tile center coordinates throughout the assembly process.
We will use one helpful function to simplify the assembly algorithm. Its role is to keep track of tile centers and angles on the canvas while we rotate the puzzle and tiles.
Our puzzle has 22 joints, but the matching algorithm returned 37 matches. This means we have to filter out the right ones during assembly. We do this by controlling pixel loss: if we add a new tile to the puzzle and it goes wrong and partially overlaps, we lose color pixels. When the pixel loss is below some ratio (e.g. 10% of the added tile in my case), we accept the tile; if it’s higher, our match is probably wrong and we have to try another one.
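The acceptance test can be sketched on alpha channels alone (the accept() helper is hypothetical):

```python
import numpy as np

MAX_LOSS = 0.10   # accept a tile when under 10% of its pixels are lost

def accept(puzzle_alpha, tile_alpha):
    """A placed tile is accepted when its overlap with the already assembled
    puzzle destroys less than MAX_LOSS of its colour pixels."""
    tile_px = np.count_nonzero(tile_alpha)
    lost = np.count_nonzero((puzzle_alpha > 0) & (tile_alpha > 0))
    return lost / tile_px < MAX_LOSS

puzzle = np.zeros((20, 20), dtype=np.uint8)
puzzle[:, :10] = 255                                  # left half already assembled
good = np.zeros_like(puzzle); good[:, 10:] = 255      # lands on the empty right half
bad = np.zeros_like(puzzle); bad[:, 5:15] = 255       # overlaps the assembly by half
```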
For simplicity, we drop the replacement of tiles in this algorithm and just make a maximum of 10 attempts to assemble the puzzle, leaving the loop once 15 tiles are on the canvas. This means that if we put in a wrong tile, or place a right tile the wrong way, we won’t be able to go back and redo it. However, for this puzzle that’s enough.

Well, it works fine! Though… the image looks a little bit worn out. This results from the distortion of information during multiple rotations. Also, don’t forget that we’ve been working at a low resolution from the very beginning. At least we can see the underwater guys are smiling 🙂
Now that we’ve found the solution, we can mark the original tiles with the matches found. For each pair, we draw circles of a specific color at the lock positions and put the match number inside both.

As you can see, not all of the tabs and blanks are marked. This is because 14 locks are enough to assemble the puzzle; the others will lock automatically.
Conclusion
In this article, we’ve gone through the full cycle of jigsaw puzzle solving. It clearly demonstrates that we can do this without deep learning, simply by combining algorithmic logic with the powerful basic AI tools of today.
Many explanations and some algorithms (such as detecting the number of tiles, tile replacement during assembly, etc.) were dropped to keep the story short and the focus on the general line.
Though this is not the ultimate approach, it is comprehensive and shows the way we can treat real-life problems using simple affordable tools of today.
School geometry and basic Python AI turn out to be enough to solve a problem that was a headache just yesterday.
Find the source files at GitHub: Jigsaw-Puzzle-AI