22 nuggets of wisdom to structure your machine learning project

Completing a machine learning project is hard. This summary helps you structure your machine learning project.

5 min read · Nov 21, 2017


This post examines the third course of the Coursera Deep Learning Specialization. It gives a graphical overview of the course's key concept and summarizes every lecture in roughly one sentence.

Why this summary? I am convinced that the lessons taught are helpful for all machine learning practitioners. After binge-watching the lectures, I believe the reader benefits from a tight lecture summary and a general overview of the material. If this post stimulates your curiosity, please feel encouraged to sign up and explore the course yourself.

So without further ado — let’s dive into the world of knowledge.

Iterate quickly

At its core, every machine learning project is about having an idea, implementing it, and then evaluating it. The faster you iterate through these steps, the sooner you will reach your goal. This structure resonates with Eric Ries’ lean startup cycle of “Build, Measure, Learn”, which is so popular in startups, organizations, and corporations all over the world.

The tools to iterate quickly over new ML ideas

At every point of this process, you have to make important decisions. Should you gather more data? Should you reduce bias or variance? How should you compare different algorithms? The course offers advice on choosing the most efficient next step. The image above summarizes this key concept along with advice for every iteration step.

Next, you will find a one-sentence summary of every lecture video in the third course of the Coursera Deep Learning Specialization. Voilà, your 22 nuggets of wisdom.

Lecture summaries

  1. Why ML strategy — Machine learning strategy is useful to iterate through ideas quickly and to efficiently reach the project outcome.
  2. Orthogonalization — Refers to the concept of picking tuning parameters that each adjust only one outcome of the machine learning model, e.g. regularization is a knob that reduces variance.
  3. Single number evaluation metric — Pick one evaluation metric, e.g. the F1 score, to instantly judge the performance of multiple models (see the first sketch after this list).
  4. Satisficing and Optimizing metrics — A machine learning model generally has one metric to optimize for, e.g. achieve maximum accuracy, and certain constraints that should be upheld, e.g. calculate predictions in less than 1 s or fit the model into local memory. In this case, accuracy is the optimizing metric, while prediction time and memory usage are satisficing metrics (a selection sketch follows the list).
  5. Train/dev/test set distributions — Make sure that the development and test sets come from the same distribution and that they accurately represent the target that the team is optimizing for.
  6. Size of the dev and test sets — Use as much data as possible for the training set and use 1%/1% for the development and test sets, given that your data set is in the millions.
  7. When to change dev/test sets and metrics — If the ranking produced by your evaluation metric no longer reflects which of your models actually performs best, restate the metric, e.g. by adding a weighting term that heavily penalizes the classifier for misclassifying really important examples.
  8. Why human-level performance? — Bayes error is the lowest error any classifier can possibly achieve and is, by definition, at least as good as human-level performance. Bayes and human-level error are important baselines for evaluating whether your model suffers from avoidable bias.
  9. Avoidable bias — Describes the gap between the training set error and human-level performance (used as a proxy for Bayes error).
  10. Understanding human-level performance — If a group of experts achieves an error rate of 0.7% and a single human achieves a 1% error rate, choose 0.7% as human-level performance and assume the Bayes error is some value ≤ 0.7% when evaluating model performance.
  11. Surpassing human-level performance — If your algorithm surpasses human-level performance, it becomes very hard to judge the avoidable bias because you generally don’t know how small the Bayes error is.
  12. Improving your model performance — Evaluate the difference between Bayes error and training set error to estimate the avoidable bias; estimate variance by comparing the training and dev set errors; then apply techniques that target whichever form of error dominates (a worked error-decomposition sketch follows the list).
  13. Carrying out error analysis — Analyze roughly 100 misclassified examples and group them by the reason for misclassification. To improve your model, it might make sense to eliminate the most common reason first, e.g. by feeding the network more foggy pictures (a tally sketch follows the list).
  14. Cleaning up incorrectly labeled data — Neural networks are fairly robust to random label errors in the training set; if you correct mislabeled examples in the dev set, correct them in the test set too, so that both keep coming from the same distribution.
  15. Build your first system quickly, then iterate — Quickly prototype a first version of the classifier and then improve it iteratively following the strategic guidelines.
  16. Training and testing on different distributions — If the data in your training set comes from mixed sources, build the dev and test sets from the data you actually want to optimize for, e.g. if you want to classify sneaker images from a phone, use a dev and test set consisting only of sneaker photos from mobile phones, but feel free to also train the network on sneaker images from the web.
  17. Bias and variance with mismatched data — When your dev and test sets come from a different distribution than the training set, carve out a training-dev set with the same distribution as the training set. This step lets you tell apart variance, bias, and data mismatch problems (see the error-decomposition sketch after this list).
  18. Addressing data mismatch — If you have a data mismatch problem, carry out manual error analysis to understand the difference between the training and dev/test sets. Be careful when synthesizing artificial training data: you might cover only a small subset of all possible noise, and the network can overfit to it.
  19. Transfer learning — If another task could benefit from the lower-level features your classifier has learned, use transfer learning: cut off the last layer of the existing neural network and retrain it on the new task (a Keras sketch follows the list).
  20. Multitask learning — Use a single neural network to detect multiple classes in an image, e.g. traffic lights and pedestrians for an autonomous car. Again, it is useful when the tasks share helpful lower-level features and when you have a similar amount of data per task.
  21. What is end-to-end deep learning — Instead of using many different steps and manual feature engineering to generate a prediction, use one neural network to figure out the underlying pattern.
  22. Whether to use end-to-end deep learning — End-to-end deep learning has advantages, like letting the network figure out important features itself, and disadvantages, like requiring lots of data, so its use has to be judged case by case based on the complexity of the task you are solving.
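To make a few of these nuggets concrete, here are some minimal Python sketches. They are my own illustrations with made-up numbers, not course material. First, lecture 3's single number evaluation metric: collapsing precision and recall into one F1 score lets you rank competing models instantly (here via scikit-learn's `f1_score`, one common choice).

```python
from sklearn.metrics import f1_score

# Toy labels and predictions for two hypothetical models A and B.
y_true = [1, 0, 1, 1, 0, 1]
predictions = {
    "model A": [1, 0, 0, 1, 0, 1],
    "model B": [1, 1, 1, 1, 0, 0],
}

# One number per model makes the comparison immediate.
for name, y_pred in predictions.items():
    print(f"{name}: F1 = {f1_score(y_true, y_pred):.2f}")
```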
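Second, lecture 4's split into satisficing and optimizing metrics maps onto a constrained selection: filter by the constraints, then maximize the single optimizing metric. The model stats below are invented for illustration.

```python
# Hypothetical candidate models with an optimizing metric (accuracy)
# and a satisficing metric (prediction latency in milliseconds).
models = [
    {"name": "A", "accuracy": 0.90, "latency_ms": 80},
    {"name": "B", "accuracy": 0.92, "latency_ms": 950},
    {"name": "C", "accuracy": 0.95, "latency_ms": 1500},  # most accurate, but too slow
]

# Keep only models that satisfy the constraint (predictions in < 1 s) ...
feasible = [m for m in models if m["latency_ms"] < 1000]
# ... and pick the most accurate among them.
best = max(feasible, key=lambda m: m["accuracy"])
print(best["name"])  # -> B
```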
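Third, the error decomposition behind lectures 9, 12, and 17 boils down to a few subtractions; the largest gap tells you whether to work on bias, variance, or data mismatch. All error rates below are hypothetical.

```python
# Hypothetical error rates for one classifier.
human_level_error  = 0.007  # best expert group, proxy for Bayes error (lecture 10)
train_error        = 0.020
training_dev_error = 0.025  # held out, same distribution as the training set (lecture 17)
dev_error          = 0.060  # target distribution

avoidable_bias = train_error - human_level_error   # 0.013
variance       = training_dev_error - train_error  # 0.005
data_mismatch  = dev_error - training_dev_error    # 0.035 -> the biggest problem here

for name, gap in [("avoidable bias", avoidable_bias),
                  ("variance", variance),
                  ("data mismatch", data_mismatch)]:
    print(f"{name}: {gap:.3f}")
```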
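Fourth, lecture 13's error analysis is little more than a tally: label each misclassified example with the presumed reason and count. The reason tags here are invented.

```python
from collections import Counter

# One (hypothetical) reason tag per misclassified dev set example;
# in practice you would label ~100 of them by hand, e.g. in a spreadsheet.
reasons = ["foggy", "blurry", "foggy", "mislabeled", "foggy", "blurry", "foggy"]

# The dominant category shows where extra work (e.g. more foggy images) pays off.
for reason, count in Counter(reasons).most_common():
    print(f"{reason}: {count / len(reasons):.0%} of errors")
```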
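Finally, a transfer learning sketch for lecture 19, here in Keras (my choice of framework, not the course's): freeze a pretrained feature extractor and train only a new output layer on the new task.

```python
import tensorflow as tf

# Reuse the lower-level features of a pretrained image network ...
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, pooling="avg",
    weights="imagenet")
base.trainable = False  # freeze the pretrained layers

# ... and "cut off the last layer": attach and train a fresh binary head.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(new_task_images, new_task_labels, epochs=5)  # your new, smaller dataset
```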

Thank you for reading this post and I hope it helped you recap your course experience. All credit belongs to deeplearning.ai, Coursera and Andrew Ng. If I missed an important topic, please feel free to add it in the comments section below.

If you’re interested in more updates on the Coursera Deep Learning Specialization and machine learning in general, check out my previous two posts, follow me on medium and feel free to shoot me a message on LinkedIn if you have any additional questions.

