What is Machine Learning System Design Interview and How to Prepare for It

A detailed guide to preparing for the ML system design interview, with a template

Aqeel Anwar

Published in

Towards Data Science

7 min read · Apr 4, 2022

What is ML System Design?

Machine learning interviews cover a wide range of skills such as coding, machine learning, probability/statistics, research, case studies, presentations, etc. One of the important machine learning interviews is the system design interview.

The ML system design interview assesses the candidate’s ability to design an end-to-end machine learning system for a given use case.

This is done to gauge the candidate’s ability to see the bigger picture of developing a complete ML system while taking most of the necessary details into account. Most ML candidates are good at understanding the technical details of individual ML topics. But when it comes to connecting them, many fail to discern the complexities and interdependencies of designing a complete ML system, from data collection all the way to model evaluation and deployment, and hence perform poorly in such interviews.

The essential thing in such an interview is an organized thought process, and that requires preparation. A common template for such problems can come in handy during the limited interview time. It helps you keep your focus on the important aspects, so that you neither talk about one thing for too long nor miss important topics entirely.

📓 The Template:

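Most ML system design interview questions can be answered by following a six-step template: understand the problem and ask clarifying questions, collect or generate data, perform exploratory data analysis, select the model and KPIs, train the model, and evaluate it.

In this article, we will go through the ML design interview in an organized way, following these six steps and mentioning key resources for each one.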

1. Understand the problem and ask clarifying questions

Before you even begin working on the problem, you have to make sure you have enough information. ML design problems are open-ended most of the time, and the interviewer will usually present the problem with the bare minimum of information. The key to designing an efficient model is gathering as much information as possible. When presented with the problem, make sure you have understood it correctly and ask clarifying questions about corner cases, data size, data/memory/energy constraints, latency requirements, etc.

2. Data collection/generation

ML models learn directly from data, hence the source of the data and the collection strategy matter a lot. There are multiple ways to collect data for your ML system. A few of them are

User’s interaction with the existing system (if any)
Human labelers/Crowdsourcing
Specialized labelers
Synthetic data

It helps to discuss these options with the interviewer. Another important thing is to analyze what kind of data is available to you and argue whether it has enough variety. You should be aware of the implications of an imbalanced dataset in ML and address it if need be. Where possible, balance the positive and negative samples, or compensate with resampling or class weights, so that the model does not become biased toward the majority class. Also, there shouldn't be any bias in the data collection process. Ask yourself whether the data is sampled from a large enough population for the model to generalize well.
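One way to make the imbalance point concrete is to compute per-class weights that are inversely proportional to class frequency, so that rare classes count more in the loss. The snippet below is a minimal sketch of that heuristic, not a prescribed method; the function name and the 90/10 label split are made up for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label set: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
print(weights)  # the rare class (1) receives the larger weight
```

These weights could then be passed to a weighted loss during training. Resampling (oversampling the minority class or undersampling the majority class) is an alternative worth mentioning to the interviewer.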

A walk through imbalanced classes in machine learning through a visual cheat sheet

What imbalanced training data is and how to address it through precision, recall, and f1 score

towardsdatascience.com

3. Exploratory data analysis

Once you have the raw data, you can’t feed it directly into the ML system. You first have to analyze and prune it. Such pre-processing includes data cleaning, filtering, and getting rid of redundant parameters. A few ways of doing that are by

Studying the features (mean, median, histogram, moments, etc.)
Exploring the relationship between features (covariance, correlation, etc)
Applying dimensionality reduction (such as PCA) to get rid of redundant parameters

The end goal is to explore which features are important and get rid of the redundant ones. Unnecessary features tend to create issues in model training, a problem commonly known as the curse of dimensionality.
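As a rough illustration of the first two bullets above, the sketch below prints summary statistics per feature and flags feature pairs that are almost perfectly correlated, which makes one of them a candidate for removal. The toy feature values, the helper names, and the 0.95 threshold are all assumptions chosen for the example. Dimensionality reduction such as PCA is not shown here.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def redundant_pairs(features, threshold=0.95):
    """Return feature-name pairs whose absolute correlation exceeds the threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) > threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical toy data: height_in is height_cm expressed in inches.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [41.0, 23.0, 35.0, 29.0, 52.0],
}

for name, values in features.items():
    print(name, "mean:", statistics.fmean(values), "median:", statistics.median(values))

print(redundant_pairs(features))  # the two height columns are flagged as redundant
```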

4. Model and KPI Selection

Instead of selecting complex models, always begin with the simpler ones. Analyze the model for the given problem and data, and then keep on improving it. The interviewer is interested in your thought process and whether you can realize your mistakes and improve on them. When discussing the model, make sure you talk about

Type of model (regression, trees, ANN, random forest, etc.)
If you opted for a DNN, discuss the structure, number of layers, type of layers, etc.
Whether you prefer one type of network/block over another, such as AlexNet, VGGNet, ResNet, and Inception
Make sure to talk about the memory and computation usage of your network
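For the memory point in the list above, a back-of-the-envelope parameter count is often enough in an interview. The sketch below counts the weights and biases of a fully connected network and converts the count to megabytes, assuming 32-bit floats. The layer sizes are hypothetical, and the estimate ignores activations, gradients, and optimizer state, which add to the footprint during training.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first and output last.
    """
    return sum(
        n_in * n_out + n_out  # weight matrix plus bias vector of one layer
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def param_memory_mb(n_params, bytes_per_param=4):
    """Memory for the parameters alone, assuming 32-bit floats by default."""
    return n_params * bytes_per_param / 1e6


# Hypothetical network: 784 inputs, two hidden layers, 10 outputs.
layers = [784, 256, 128, 10]
n_params = dense_param_count(layers)
print(n_params, "parameters,", round(param_memory_mb(n_params), 3), "MB")
```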

The selection of your model depends on both the available data and your performance metric. Make sure to talk about the different KPIs and how they compare with each other. Such KPIs include but are not limited to

Classification Problem: Accuracy, Precision, Recall, F1 Score, Area Under ROC (AUROC)
Regression: MSE, MAE, R-squared/Adjusted R-squared
Object Detection/Localization: Intersection over Union (IoU), Average Precision (AP)
Reinforcement Learning: Cumulative reward, Return, Q-values, Success rate
System/hardware: Latency, Energy, Power
Business-related KPIs: User retention, Daily/Monthly Active Users (DAU, MAU), New users
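To see how the classification KPIs above compare with each other, it helps to compute them on a small imbalanced example. The sketch below derives accuracy, precision, recall, and F1 from the true/false positive counts. The labels and predictions are invented to show that accuracy can look good while recall is poor.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against division by zero when a count is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 5 positives and 95 negatives.
# The model finds only 1 of the 5 positives and raises 1 false alarm.
y_true = [1] * 5 + [0] * 95
y_pred = [1, 0, 0, 0, 0] + [1] + [0] * 94
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here the accuracy is 0.95 even though the model misses four of the five positives (recall of 0.2), which is why accuracy alone is a weak KPI on imbalanced data.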

The following articles will help you understand the different blocks of an ML model

A visualization of the basic elements of a Convolutional Neural Network

Animated visualizations of different CNN elements

towardsdatascience.com

A Beginner’s Guide to Regression Analysis in Machine Learning

Regression analysis explained with examples, illustrations, animations, and cheat sheets.

towardsdatascience.com

Difference between AlexNet, VGGNet, ResNet, and Inception

AlexNet, VGGNet, ResNet and Inception explained

towardsdatascience.com

5. Model training

This is where your technical knowledge will be evaluated. Make sure you are familiar with the different aspects of ML training and are comfortable talking about them in depth. The interviewer might ask how you would combat, say, overfitting, why you didn't use regularization, or, if you did, which kind you used and why. Topics include but are not limited to

Loss function selection: CrossEntropy, MSE, MAE, Huber loss, Hinge loss
Regularization and validation: L1, L2, Entropy Regularization, dropout, K-fold CV
Optimization (with gradients computed via backpropagation): SGD, AdaGrad, Momentum, RMSProp
Vanishing gradient and how to address it
Activation functions: Linear, ELU, ReLU, Tanh, Sigmoid
Other issues: Imbalanced data, Overfitting, Normalization, etc
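When discussing loss function selection, a quick numeric comparison can support your reasoning. The sketch below implements MSE, MAE, and Huber loss and evaluates them on a made-up set of predictions with a single large error. The data and the choice of delta = 1.0 are assumptions for illustration only.

```python
def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: penalizes all errors linearly."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for errors up to delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


# Hypothetical targets and predictions; the last prediction is off by 10.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 14.0]

print("MSE:  ", mse(y_true, y_pred))
print("MAE:  ", mae(y_true, y_pred))
print("Huber:", huber(y_true, y_pred))
```

The single outlier dominates MSE, while MAE and Huber grow only linearly with it. That is the usual argument for preferring Huber or MAE when the targets are noisy.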

The following articles will help you create an in-depth understanding of some of the topics mentioned above

Types of Regularization in Machine Learning

A beginner's guide to regularization in machine learning.

towardsdatascience.com

Difference between Local Response Normalization and Batch Normalization

A short tutorial on different normalization techniques used in Deep Neural Networks.

towardsdatascience.com

6. Evaluation

The end goal of the trained model is to perform well in real-world scenarios of the problem at hand. How does it perform on unseen data? Are the corner cases covered? To analyze this, one needs to do both offline and online evaluations.

Offline evaluation: The performance of the model on the hold-out samples of the dataset. When the dataset is prepared, it is divided into training, validation, and test subsets. The idea is to analyze how well the model generalizes to unseen data. You can also carry out K-fold cross-validation to measure the performance across different subsets of the data. The model that performs best on the selected KPI is chosen for deployment.
Online evaluation: The first step of deploying the trained model in real-world scenarios (after it has been evaluated offline) is to carry out A/B testing. The trained model is not immediately exposed to real-world data at large, as that is far too risky. Instead, it is deployed on a small subset of scenarios. For example, say the model was designed to match an Uber driver with a rider. In A/B testing, the model would be deployed only in a smaller geographical region instead of across the entire globe. This beta version of the model is then compared with the existing model over a longer period of time, and if it improves the business-related KPIs (such as more DAU/MAU for the Uber app, better user retention, and eventually improved Uber revenue for that area), it is rolled out on a larger scale.
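Both kinds of evaluation above can be sketched in a few lines. The first helper generates the train/validation index splits for K-fold cross-validation. The second runs a two-proportion z-test, one common way to check whether the difference between the control group and the A/B test group is statistically significant. The function names and the conversion numbers are hypothetical, and the z-test assumes independent, randomly assigned users, which a rollout by geographical region may not satisfy.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds (no shuffling)."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates, B minus A."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return z, p_value


# Offline: 3-fold split of 10 samples.
folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("validate on", val_idx)

# Online: hypothetical counts of retained users out of 10,000 per group.
z, p_value = two_proportion_z_test(1000, 10000, 1100, 10000)
print("z =", round(z, 3), "p =", round(p_value, 4))
```

In practice you would shuffle the data before splitting and agree on the significance level and the KPI before the experiment starts.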

Wrapping up:

If at any step you are headed in the wrong direction, the interviewer will jump in and try to steer you in the desired direction. Make sure you take the hints provided by the interviewer. ML system design is supposed to be a discussion, so whenever you state something, ask the interviewer what their thoughts are on it, or whether they think it is an acceptable design step. Once you are done talking about these six steps (with improvements along the way), make sure to recap the final system design in a few sentences, mentioning the key takeaways.

Summary:

In this article, we looked at an organized way of answering an ML System Design question. There is no one correct answer, and the purpose of this interview is to analyze the candidate’s thought process for designing an end-to-end system. Having said that, an in-depth understanding of various ML topics is necessary to succeed in this interview. The following cheatsheets will help refresh those topics.

Cheat Sheets for Machine Learning Interview Topics

A visual cheat sheet for ML interviews (www.cheatsheets.aqeel-anwar.com)

medium.com

If this article was helpful to you or you want to learn more about Machine Learning and Data Science, follow Aqeel Anwar, or connect with me on LinkedIn or Twitter. You can also subscribe to my mailing list.