Marathon Bib Identification and Recognition

Using deep learning and image processing to recognize numbers from a marathon bib.

Kapil Varshney
Towards Data Science


Me after finishing the Mumbai Marathon 2019, and, of course, the bib number recognition (using AWS Rekognition API)

Inception

I recently participated in a marathon. A few days later, I received an email with a link where I could check out and download my race-day pictures. I just had to enter my bib number on the webpage, and it would pull up all the photos I appeared in. This got me thinking: how was this made possible?

For those who are not familiar with running events, a bib is a sheet of paper with an e-tag attached to it. This tag records the runner's accurate timing over the course of the marathon. The bib also carries a unique bib number, printed in a large font, along with the runner's name and some other text. (See the photo above for reference.)

I started thinking of possible ways the tagging of photos could have been achieved. One obvious way is manual tagging: a team of humans looks at the photos, reads the bib numbers, and tags the photos with them. This is a tedious task, given that each marathon can produce upward of five thousand photos. The other way is Computer Vision.

Computer Vision

I have this philosophy that any task that requires a human to look at an image and follow it with an action, almost mechanically, can and should be automated using Computer Vision. We have state-of-the-art algorithms to solve such problems and the computational power to deploy them, so why not utilize them? That's how this project started.

I started to explore different Computer Vision techniques that could be used to tag the photos with the bib numbers. I’ll briefly describe each method that I could think of:

  1. EAST text detector + Tesseract text recognition: The idea was to first detect the text regions in the image and then recognize the text to identify the bib numbers. I used the EAST text detector model to find the regions of the image that contain text and then passed those regions to the Tesseract model for recognition.
    Pros: easy and quick to implement.
    Cons: not very accurate, and will detect a lot of other text in the photo.
  2. Image processing using OpenCV to identify the bib region and then further process that region to extract the digits. The extracted digits can be passed to a pre-trained ML model to recognize the number.
    Pros: low requirements in terms of computation power.
    Cons: very difficult to generalize across different bib designs.
  3. Segmentation using a deep learning model like Mask R-CNN to segment out the bib from the image. Apply image processing on the bib to extract the digits. Pass the extracted digits to a pre-trained CNN to recognize them.
    Pros: highly accurate in segmenting the bib and identifying the digits.
    Cons: computationally heavy, thus slower, and difficult to generalize the image processing steps across different bib designs.
  4. Object detection using a deep learning model to directly identify the bib number regions instead of the whole bib. This saves a lot of image processing steps compared to the previous method. Apply image processing to extract the digits and pass them to a pre-trained CNN to recognize them.
    Pros: like the previous method, deep learning models can be highly accurate.
    Cons: heavy on computation requirements.
  5. Face recognition: This method has a major advantage: it can identify a runner in a photo even when the bib is obscured, which happens a lot. There are multiple ways to implement facial recognition, and I won't go into the details, as they could fill a book in themselves. I'll mention a couple of approaches I can think of right away. One could be to match faces (probably using a Siamese network with triplet loss) against an ID photo provided during registration. The other could be a hybrid of a couple of the methods mentioned above: cluster the photos of each runner using face recognition and then read the bib number from one of the photos where it is clearly legible.
  6. Cloud-based APIs from Google (Vision), AWS (Rekognition) or Microsoft Azure: Use these APIs to detect and recognize text in the images and then filter out the bib numbers (possibly by matching against a database of all registered bib numbers).

Project

First, I tried out method 1 to get an idea of how a generically trained model would perform on this problem. As expected, it didn't perform very well: there was no guarantee of identifying the bib number correctly, and, moreover, there were a lot of false-positive text detections in each image. I'll write more about it in a later post. I then moved on to the third method, which uses instance segmentation; it forms the core of this project.
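One cheap way to tame those false positives is to keep only detections that look like bib numbers and, when a registration list is available, actually appear in it. A sketch, where the 4–6 digit length and the `registered_bibs` set are both assumptions for illustration:

```python
import re

def filter_bib_candidates(ocr_strings, registered_bibs=None):
    """Keep only OCR outputs that plausibly are bib numbers.

    ocr_strings: raw strings from the recognizer (EAST + Tesseract,
    a cloud API, etc.). registered_bibs: optional set of valid numbers.
    """
    candidates = []
    for text in ocr_strings:
        digits = re.sub(r"\D", "", text)   # strip non-digit noise
        if not 4 <= len(digits) <= 6:      # drops sponsor text, times, etc.
            continue
        if registered_bibs is None or digits in registered_bibs:
            candidates.append(digits)
    return candidates
```

For example, `filter_bib_candidates(["TCS", "21044", "26.2 km"], {"21044"})` keeps only `["21044"]`.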

When I first started working on this problem, it didn't seem like a big task. Only when I dove into the project and the nuances started to surface did I realize how challenging it is. Just to give you an idea: the only way I could get the number on the photo at the top read correctly was with the AWS Rekognition API. While that bib might be easy for a human to read, it wasn't that simple to train a computer to read it. The best my custom image-processing pipeline could produce was "1044" instead of "21044". There are reasons for this, such as the difficulty of building a general heuristic for different bib designs and color schemes, which I'll discuss in later posts.
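One mitigation for such partial reads: when the registration list is available, a truncated read like "1044" can often be resolved to the unique registered bib that contains it. A sketch, with `registered_bibs` again a hypothetical set of all issued numbers:

```python
def resolve_partial_read(partial, registered_bibs):
    """Map a possibly truncated OCR read to a registered bib number.

    Returns a match only when it is unambiguous; otherwise None,
    so a wrong guess never gets tagged onto a photo.
    """
    if partial in registered_bibs:   # exact read, nothing to fix
        return partial
    matches = [bib for bib in registered_bibs if partial in bib]
    return matches[0] if len(matches) == 1 else None
```

With registered bibs `{"21044", "31877"}`, the read "1044" resolves to "21044"; if "10440" had also been issued, the function would return `None` rather than guess.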

The solutions I have come up with are probably not the best yet; I've realized there is no end to the improvements you can make to a solution. The end-to-end execution of this solution is what this series of write-ups is about. The major reason to pick it up was to get my hands dirty with every aspect of building a computer vision project: gathering a dataset, annotating the images, implementing a deep learning model for segmentation, doing the image processing, creating a CNN model for OCR, custom-training it on a given dataset, stitching these different parts together, and so on.

I’ll write about each of the above-mentioned parts and share the code. You are free to take the code and improve it or use it for your own application. The learning from this project has been enormous, and I want the community to benefit from it.

Please feel free to leave any suggestions/comments/critique. I’ll try to be as prompt as possible.

Here’s the next part in the series:


Data Scientist (Computer Vision) @ Esri R&D New Delhi. Here to share what I learn and do. Connect with me at https://www.linkedin.com/in/kapilvarshney14/