How to read a label on a wine bottle using computer vision (Part 3)

Antonin Leroy
Towards Data Science
6 min read · Mar 28, 2022


Welcome back to this series of articles. As a reminder, in the last two articles (Part 1, Part 2) we explored two different methods to detect the position of the label on a wine bottle.

At the current stage of the project, we are able to feed a photo of a wine bottle to our trained U-Net type neural network and obtain as output a black and white mask that locates the estimated position of the label:

Example of a prediction

The next challenging step is to remove all curvature from the label, so that we can apply OCR to our flattened label and successfully detect the words and sentences written on it.

During my research, I had the chance to stumble upon an awesome GitHub project by Alexey Zankevich that does exactly that. Before using his repository and writing this article, I asked him for formal permission to use his code and mention his work here.

I will not explain in detail how his library works, because I did not write it and I do not fully master the geometric concepts it uses. In simple words, the library "unwraps" the wine label from 6 points positioned manually around the edges of the label cylinder. The code then generates a mesh of points that follows the estimated geometry of the cylinder, and interpolates them with the scipy library onto a destination map that is geometrically flat. By doing this we get a visual approximation of a flat label.
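To give an intuition of the geometry (this is my own illustration, not the library's actual code), consider a front-facing, roughly orthographic view of a cylinder of radius R: a point at arc length s along the label projects onto the photo at x = R·sin(s/R). So the columns of the flat destination image must sample the photographed label non-uniformly, more densely near the edges where the curvature compresses it:

```python
import numpy as np

def flatten_columns(width, radius):
    """For each column of the flat (unwrapped) label, find the column to
    sample in the photo. A front-facing cylinder of radius R projects a
    point at arc length s onto x = R * sin(s / R), so columns near the
    label edges are compressed in the photo and must be stretched back.
    Illustration only: assumes an orthographic, perfectly frontal view."""
    half = width / 2.0
    s = np.linspace(-half, half, width)      # arc positions on the flat label
    x = radius * np.sin(np.clip(s / radius, -np.pi / 2, np.pi / 2))
    return x + half                          # back to source pixel coordinates
```

The real library generalizes this idea to a full 2-D mesh (handling perspective, not just a frontal view) and lets scipy do the interpolation onto the flat destination map.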

The catch is that we need a way to position those 6 points automatically on our U-Net prediction. Let's take one example photo that we will use until the end of this article:

Raw 256×256 U-Net prediction

Once we get our U-Net prediction, we need to resize it back to the original photo size, so that once we find our 6 point coordinates we can place them correctly in the original photo. We also want the label to be vertically aligned, and we only want binary colors (black or white pixels). To do so, we use a bit of code from the readLabels method of the package.
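The actual code lives in the package's readLabels method; as a stand-in illustration, the resize + binarize part could be sketched in plain numpy like this (my own sketch, not the package code; the alignment step is discussed just below):

```python
import numpy as np

def prepare_mask(pred, out_h, out_w, thresh=0.5):
    """Resize a U-Net probability mask back to the source photo size
    (nearest-neighbour) and binarize it to pure black/white.
    pred: 2-D float array in [0, 1], e.g. the raw 256x256 network output."""
    h, w = pred.shape
    rows = np.arange(out_h) * h // out_h   # nearest source row per output row
    cols = np.arange(out_w) * w // out_w   # nearest source col per output col
    resized = pred[rows[:, None], cols]
    return np.where(resized >= thresh, 255, 0).astype(np.uint8)
```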

Resized, binarized & aligned U-Net prediction

A few words about the vertical alignment algorithm I implemented: it simply tries to maximize the number of fully black columns. You can see that the alignment here is not perfect, because the detected cylinder shape itself is imperfect, but I deliberately chose a non-perfect example to show that the method still works in imperfect scenarios.

Now let's talk about how I managed to estimate and position the 6 points needed by the label unwrapping code. My first idea was to detect the corners iteratively. As with the vertical alignment algorithm, I used the "fully black column" logic as the condition to keep iterating: scan from one side to the other and stop as soon as a column contains at least one white pixel. But this logic alone could not find all the corners, because the cylinder usually has perspective, so either the top or the bottom part is wider than the other…

And then I remembered matrix diagonals! What if, instead of iterating over columns and rows, I could iterate over diagonals, like so:

Corner / edge detection logic

This way I can find the A, C, D and F coordinate points with diagonals, and find the B coordinate with a simple midpoint calculation (half the distance between A and C):

x_B = (x_A + x_C) / 2

where x_B is the X coordinate of point B. We can now select the column vector (λ) of the image corresponding to this x_B position:

Yes, I know, this is wildly unorthodox math notation

We iterate from top to bottom in this vector until we find the first white pixel, which gives us the Y coordinate of point B.

The logic is the same for point E: we take the column vector in the middle of points D and F, and this time we iterate from bottom to top until we find the first white pixel.
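Putting the diagonal scans and the two column scans together, the whole six-point detection could be sketched like this (my own illustration of the idea described above, not the package's implementation):

```python
import numpy as np

def find_cylinder_points(mask):
    """Estimate the six unwrap points on a binary label mask (white = label).
    A/C are the top corners and D/F the bottom corners, found by scanning
    anti-diagonals inward from each image corner; B/E sit in the middle of
    the top/bottom edges, found with a column scan."""
    h, w = mask.shape

    def corner(flip_v, flip_h):
        m = mask[::-1] if flip_v else mask
        m = m[:, ::-1] if flip_h else m
        # scan anti-diagonals i + j = d, moving away from the (0, 0) corner
        for d in range(h + w - 1):
            i = np.arange(max(0, d - w + 1), min(d, h - 1) + 1)
            j = d - i
            hit = np.nonzero(m[i, j])[0]
            if hit.size:                  # first diagonal touching the label
                y, x = i[hit[0]], j[hit[0]]
                return (h - 1 - y if flip_v else y,
                        w - 1 - x if flip_h else x)

    A = corner(False, False)   # top-left
    C = corner(False, True)    # top-right
    D = corner(True, False)    # bottom-left
    F = corner(True, True)     # bottom-right

    def edge_point(x, from_top):
        col = np.nonzero(mask[:, x])[0]   # white pixels in that column
        return (col[0] if from_top else col[-1], x)

    B = edge_point((A[1] + C[1]) // 2, True)   # middle of the top edge
    E = edge_point((D[1] + F[1]) // 2, False)  # middle of the bottom edge
    return A, B, C, D, E, F
```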

For the details of the implementation, please check the getCylinderPoints method of the package.

Now that we have our 6 coordinates, we can use the “unwrap label” code.

Mesh cylindrical projection
Flattened label

Ta-da! We finally have an almost perfectly flat label on which to try the pytesseract OCR library. To compare the original (curved) image with the flattened one, I put both OCR transcripts below, to demonstrate that we did not do all this work for nothing!

Original image OCR:

PPELLATION SAUCE CONTROLEE

DE RESERVE
_ftes Lo
2020

BBOUTEILLE A LA PROPRIETE
p PRODUIT DE FRANCE

Flattened label image OCR:

GHATEAY DE SAUMUR, PROPRIETE DE LA VILLE DE SAUMUR

AUMUR

LLATION SAUMUR CONTROLEE

DE RESERVE

2020

BOUTEILLE A LA PROPRIETE |
PRODUIT DE FRANCE

So it is absolutely not perfect, but there is definitely an improvement. The photo itself has some lighting differences that seem to affect the OCR. Here are a few other examples I tested:

Original image OCR:

R
ORDEAUX SUPER!

ERIEUR

APDE
LLAT
1ON BORDEAUX SUP

Flattened label image OCR:

GEREAN D VI N DE 5 O8R De EeARURX

Day

DEMEURE DU BORDELAIS

MOULIN DE JAURE

+S a

(co)

BORDEAUX SUPERIEUR

APPELLATION BORDEAUX SUPERIEUR CONTROLEE

Original image OCR:

Pav
| i YS dD’ :
= UE e :
: SANS SUL
: EITES AJOUTES
ARD BERTRAND at

Fence
oL0 oul

poo

:

p
R
O
D
U
IT
DE F
RA

NCE

Flattened label image OCR:

MS22338

2020

SYRAH ROSE

PAYS D’OCc

VIN BIOLOGIQUE e SANS SULFITES AJOUTES

— — —
CEeRTiFre

GERARD BERTRAND

AGRICULTURE

PRODUIT DE FRANCE = sos:

And I think we are done with this series of articles on my first computer vision project. As you may have spotted, it is far from perfect, and some of the steps can be optimized:

  • The corner / edge cylinder detection is one way I proposed to solve this algorithmic task, but I'm sure a better solution can be found.
  • The OCR part itself is pretty lazy work on my end; it could be optimized by using something other than the default configuration, and maybe by adding one more image processing step to reduce shadows, lighting differences or background objects that create noise.
  • The label shape detection U-Net is far from perfect in its predictions and often fails to find a geometrically correct cylinder; I'm sure experienced computer vision engineers could improve it with more deep learning work or image processing techniques.
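As one example of such an extra preprocessing step, a cheap shadow/lighting normalization before OCR could be a flat-field style divide-by-blur (my own suggestion, not something the project currently does):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def remove_shading(gray, size=31):
    """Reduce smooth lighting gradients before OCR by dividing the image
    by a heavily blurred copy of itself, then rescaling to 0-255.
    gray: 2-D uint8 grayscale image; size: blur window in pixels."""
    background = uniform_filter(gray.astype(np.float32), size=size)
    norm = gray / np.maximum(background, 1.0)     # ~1.0 where lighting is even
    norm = 255.0 * norm / max(float(norm.max()), 1e-6)
    return np.clip(norm, 0, 255).astype(np.uint8)
```

The blurred copy approximates the slowly varying illumination, so dividing by it leaves mostly the text and fine detail, which is what the OCR engine actually needs.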

If you have suggestions to improve this solution, feel free to comment or contact me. You can test the application at https://plural.run/wineReader with your own wine bottle photos (keep in mind that they should have good resolution, as little perspective as possible, and a background that contrasts strongly with the label).

All images are by the author.
