Machine Learning Rules in a Nutshell

Egor Dezhic
Towards Data Science
May 29, 2018 · 5 min read


A few days ago, Google engineers published a huge manual on how to build great ML products (and, by the way, a new TensorFlow on GCP specialization on Coursera). These engineers have a lot of experience developing large-scale projects in production, and they have written down their best practices. The only problem: the doc is really huge, so I’ve decided to summarize the key points here. There are 43 rules in total; they go roughly in the order of the development stages and are divided into three phases. Let’s begin.

0. Do not talk about Artificial General Intelligence. ML engineers may take you the wrong way and decide that you’re schizophrenic. Just kidding.

Phase 1: First Pipeline

  1. Do not hesitate to launch a product without ML. If a simple heuristic or manual work can get the job done, put ML aside until you have enough data.
  2. Add metrics. Then add more metrics. Track everything to see how each part of your app does its job. When you have a problem or are simply not sure, add a metric.
  3. Do not invest too much time in a heuristic. Once it becomes complex, switch to an ML model; it will be easier to develop and maintain later.
  4. Focus on infrastructure first. Build a pipeline with a simple model and make sure the whole thing works well.
  5. Add tests to every piece of your pipeline. Test your infrastructure, test your data, test your models.
  6. When you’re copying an existing pipeline for a new project, make sure that you are not throwing away potentially useful data.
  7. Do not simply throw your heuristics away. They might generate useful features for your ML model.
  8. Keep the model fresh. Track how much your model’s performance degrades over time and make sure it doesn’t go stale in production.
  9. Check your model’s performance before rolling it out to production, especially when it’s a user-facing model.
  10. Test your data, again. Make sure that all features are actually being gathered and the data looks sane.
  11. Document your features: where they come from and why they are useful.
  12. Choose the right objective. It must reflect the performance of the main goal of your project.
  13. Picking the right objective might be hard, so choose a metric that is easy to measure and is a proxy for the “true” objective.
  14. A simple model is easier to debug. Linear or logistic regression might be a good choice (a minimal sketch follows this list).
  15. Separate spam filtering and quality ranking. Such models are often used in different ways.
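
To make rules 7 and 14 a little more concrete, here is a minimal sketch of a first-pipeline baseline. This is my own illustration with made-up data, not code from the Google guide: a logistic regression that is easy to inspect, with the output of a hypothetical old heuristic fed in as just another feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy data: two raw features plus the score of a hypothetical old heuristic.
n = 1_000
raw = rng.normal(size=(n, 2))
heuristic_score = (raw[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(float)
X = np.column_stack([raw, heuristic_score])        # rule 7: the heuristic becomes a feature
y = (raw[:, 0] + 0.3 * raw[:, 1] > 0).astype(int)  # synthetic label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Rule 14: start with a simple, debuggable model.
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
print("learned weights:", model.coef_)             # easy to inspect feature by feature
```

A nice side effect of this setup is that the weight on the heuristic feature tells you at a glance how much the model still relies on the old logic.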

Phase 2: Feature Engineering

  1. Plan to iterate. Your features and your model will evolve over time, so make sure you’re ready to update them.
  2. Start with directly observed features. And leave learned features for future updates.
  3. Try out features that generalize across contexts. Metrics like # of likes may be useful in recommendations, quality ranking and other models.
  4. Don’t forget to add very specific features. Even when they don’t generalize well, they still have value. If they apply only to a small # of examples, regularization will filter them out.
  5. Hand-craft new features from existing ones in human-understandable ways, and try not to overthink them.
  6. The number of feature weights you can learn in a linear model is roughly proportional to the amount of data you have. With only about 1,000 examples, use a few dozen features; with more than a billion, around 10 million features is enough.
  7. Remove unused features from your pipeline. If you found that some feature does not improve your model — get rid of it.
  8. Don’t let engineers do the UX testing. Do it through a crowdsourcing platform, or through a live experiment on real users.
  9. Track how much models differ. Before testing a new model further, calculate how much its outputs differ from the previous one’s.
  10. Focus on the utility of your model, not its predictive power. A new model may have higher accuracy, but how useful it is in practice matters more.
  11. Look at your model’s errors and create new features to fix them. Try to find patterns in errors and add respective features.
  12. Create metrics to track these errors. Optimization becomes easier when you have metrics to optimize.
  13. A model may behave very differently in the long term, even when it looks exactly as you expected in the short term.
  14. Gather data live and use it to test the model during training. This way you can be sure the model will perform well after deployment.
  15. Importance-weight sampled data instead of arbitrarily dropping it (see the sketch after this list).
  16. Don’t forget that data itself may change between training and serving. Sometimes, however, you can deal with it using hourly/daily/weekly snapshots.
  17. Share the code between training and production pipelines. As much as possible.
  18. Use earlier data for training and later data for testing. This will help you roughly estimate how your model will perform in the future.
  19. Hold out a small % of examples in binary classification tasks to get clean training data. Show that 1% or 0.1% to users and use the results as new training data.
  20. Regularize general features harder and allow only positive weights in ranking tasks. Don’t let your ranking system become too biased.
  21. Avoid feedback loops with positional features. Separate them during training and avoid them during serving.
  22. Measure how your model performs on training, validation, testing, and live data. Track how much they differ and try to regularize the model more to reduce the difference.
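
Rules 15 and 18 are easy to get wrong in practice, so here is a small sketch of both. Again, this is my own illustration with synthetic data, not code from the guide: train on earlier data and test on later data, and if you downsample one class, importance-weight the kept examples instead of pretending the dropped ones never existed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic examples with timestamps (in days).
n = 10_000
timestamps = np.sort(rng.uniform(0, 30, size=n))
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Rule 18: train on earlier data, evaluate on later data.
train_mask = timestamps < 25
X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[~train_mask], y[~train_mask]

# Rule 15: keep only ~10% of negatives, but give each kept negative a weight of 10
# so the training distribution still matches reality.
keep = (y_train == 1) | (rng.uniform(size=y_train.size) < 0.1)
weights = np.where(y_train[keep] == 1, 1.0, 10.0)

model = LogisticRegression().fit(X_train[keep], y_train[keep], sample_weight=weights)
print("accuracy on later data:", model.score(X_test, y_test))
```

The time-based split gives a rough estimate of how the model will behave on future traffic, which a purely random split tends to overestimate.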

Phase 3: Refinement and Complex Models

  1. Revisit your objective. Make sure that it is aligned with your product goals.
  2. Launch decisions are a proxy for long-term product goals. Usually a single decision affects multiple metrics, so don’t rush into quick decisions.
  3. Keep your base and ensemble models separate. Base models should only take raw inputs, and the ensemble should only take the outputs of the base models (see the sketch after this list).
  4. When performance stops growing, look for new sources of features. Don’t focus on existing ones for too long.
  5. Diversity, personalization, and relevance are important. But not as much as popularity.
  6. Your friends tend to be the same across different products, but your interests tend not to be: we keep the same connections, while our goals change.
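
For rule 3 of this phase, here is what “keep base and ensemble models separate” can look like in code. This is my own sketch with synthetic data, not the guide’s implementation: the base models see only raw inputs, and the ensemble sees only the base models’ scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(5_000, 4))
y = (X[:, 0] * X[:, 1] + X[:, 2] > 0).astype(int)

# Base models: raw features in, scores out.
base_a = LogisticRegression().fit(X, y)
base_b = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Ensemble: only the base models' scores as inputs, never the raw features.
# (In practice you would fit it on held-out predictions to avoid leakage.)
stacked = np.column_stack([
    base_a.predict_proba(X)[:, 1],
    base_b.predict_proba(X)[:, 1],
])
ensemble = LogisticRegression().fit(stacked, y)
print("ensemble weights over base-model scores:", ensemble.coef_)
```

Keeping the two layers separate makes each base model easy to retrain or replace without touching the ensemble, and keeps the ensemble itself simple enough to reason about.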

I think it’s nice to have these guidelines at hand while working on an ML project, because if you don’t, there’s a good chance you’ll run into technical debt. So don’t forget to bookmark them and revisit them from time to time.
