How I use machine learning to save time

Published in Towards Data Science · 4 min read · Dec 7, 2017

Things have changed. Certain types of problems used to be really hard, or required a lot of repetitive work to solve. I’m sure we’ve all been there, facing down something we knew would take a long time to complete, not because it was challenging but because it was repetitive and time-consuming.

Tools like Apple’s Automator make certain computer-related tasks more doable, especially if you’re not handy with scripting languages. They can solve problems like renaming thousands of files in a folder or deleting duplicates. But there are some things they can’t solve.

Many years ago I produced a web series that had quite an extensive cast. One task that I needed to complete was making a website for the show. I remember enjoying the work until I got to the cast page. I suddenly realized that I would need to crop and position tons of headshots of different sizes and shapes to all look uniform on the site (this was before I knew about grid layouts). This process took me days to complete. I never really felt like I got everything quite right since I was doing it all manually.

Today, this problem can be solved quite easily with machine learning. Here’s how I’d approach it:

Download and install Facebox from Machine Box (a startup I helped found).
Write a script that posts every photo you want cropped to Facebox and gets back the position of the face.
Use the dimensions Facebox returns to crop the photo (with some padding) around the face.
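
The cropping step can be sketched roughly as below. This is a minimal sketch, not the profilecropper code. It assumes the face position is already available as a rectangle with `top`, `left`, `width`, and `height` in pixels (an assumed shape, so check the actual Facebox response). The function name `padded_crop_box` and the `padding` ratio are hypothetical.

```python
def padded_crop_box(face, image_width, image_height, padding=0.5):
    """Return (left, top, right, bottom) for a square crop centred on the face.

    `face` is assumed to be a dict with pixel values for "top", "left",
    "width" and "height" (an assumption about the detector's output).
    `padding` is the extra margin on each side, as a fraction of the face size.
    """
    # Centre of the detected face.
    cx = face["left"] + face["width"] / 2
    cy = face["top"] + face["height"] / 2

    # Side of the square: the larger face dimension plus padding on both sides.
    side = max(face["width"], face["height"]) * (1 + 2 * padding)

    # The crop cannot be larger than the image itself.
    side = min(side, image_width, image_height)

    # Place the square around the centre, then shift it back inside the image.
    left = min(max(cx - side / 2, 0), image_width - side)
    top = min(max(cy - side / 2, 0), image_height - side)

    return (round(left), round(top), round(left + side), round(top + side))
```

The tuple is in (left, top, right, bottom) order, which is the box format an imaging library such as Pillow accepts for cropping. Using a square box for every headshot is what makes the results look uniform once they are resized to the same dimensions.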

I’ve open sourced some code that does just this here: profilecropper

You can create and auto-crop millions of profile pictures this way without any manual labor. All it takes is knowing where the face is, and that’s something machine learning can solve.

Let’s take another example. Let’s say you’ve got yourself quite a library of photos that you’ve scanned from the last 50 years. Scanning each photo was a pain, but at least now they’re all digital files. The problem is, you can’t search them. Fortunately, machine learning can come to the rescue once again. Here’s how I’d approach it:

Download and install Tagbox from Machine Box.
Write a script that posts every photo you have to Tagbox and gets back a list of tags.
At this point you can decide whether to filter out tags with low confidence, depending on the results.
Store the tags alongside the file name in a text file that your OS will later index. Or write the data into a simple database, like an Excel spreadsheet or a SQL table.
You can now search your photos by what’s in them.
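
The filtering, storing, and searching steps above might look something like this. It is a sketch under assumptions, not the tagroll code. It assumes each tag arrives as a dict with `tag` and `confidence` keys (check the actual Tagbox response) and uses SQLite as the simple database. The function names, the `photo_tags` table, and the 0.6 threshold are all made up for illustration.

```python
import sqlite3


def filter_tags(tags, min_confidence=0.6):
    """Keep only tags at or above the confidence threshold.

    Each tag is assumed to look like {"tag": "dog", "confidence": 0.92}.
    """
    return [t for t in tags if t["confidence"] >= min_confidence]


def open_index(path=":memory:"):
    """Open (or create) a SQLite database with a hypothetical photo_tags table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photo_tags ("
        "filename TEXT, tag TEXT, confidence REAL)"
    )
    return conn


def store_tags(conn, filename, tags):
    """Save one row per (photo, tag) pair, with tags lower-cased for searching."""
    conn.executemany(
        "INSERT INTO photo_tags (filename, tag, confidence) VALUES (?, ?, ?)",
        [(filename, t["tag"].lower(), t["confidence"]) for t in tags],
    )
    conn.commit()


def search(conn, tag):
    """Return filenames of photos carrying the tag, most confident first."""
    rows = conn.execute(
        "SELECT filename FROM photo_tags WHERE tag = ? ORDER BY confidence DESC",
        (tag.lower(),),
    )
    return [row[0] for row in rows]
```

Passing a file path instead of `:memory:` to `open_index` would keep the index on disk between runs. Raising or lowering `min_confidence` is the knob for trading missed photos against noisy matches.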

To get you started, there’s also some open source code that solves this here: tagroll

Solving the chihuahua/muffin problem with Tagbox

Fun little side note: let’s say you’ve got specific types of objects you’d like to search for, like a special kind of car or house. You can teach Tagbox those things with one or two sample photos, and it will then tag every photo it sees them in from then on. Read more about how to do that here.

You can also do really clever things like visual similarity search on all your photos. Upload a photo of something, and use Tagbox to show you other photos in your collection that are similar.

As you can imagine, these solutions have applications just about everywhere. For example, your e-commerce site can provide customers with powerful tools to find the clothing they’d be more likely to buy based on visual similarity to something they like.

In this day and age, we’re all dealing with a tremendous amount of data. Whether it’s personal photos or product catalogues, we need to start regularly considering machine learning as a powerful method for solving the problems that arise from having so much data.

Written by Aaron Edell