Mathematics for Data Science

Are you overwhelmed by the search for resources to understand the math behind data science and machine learning? We've got you covered.

Ibrahim Sharaf
Towards Data Science


Motivation

Learning the theoretical background for data science or machine learning can be a daunting experience, as it involves multiple fields of mathematics and a long list of online resources.

In this piece, my goal is to suggest resources for building the mathematical background needed to get up and running in practical or research work in data science. These suggestions come from my own experience in the field and from keeping up with the latest resources recommended by the community.

However, if you are a beginner in machine learning looking to get a job in the industry, I don't recommend studying all the math before starting to do actual practical work. You'll probably get discouraged, because you'd be starting with the theory (dull?) before the practice (fun!). This bottom-up approach is counter-productive.

My advice is to do it the other way around (a top-down approach): learn how to code, use the PyData stack (Pandas, sklearn, Keras, etc.), and get your hands dirty building real-world projects with the help of library documentation and YouTube/Medium tutorials. THEN you'll start to see the bigger picture and notice the gaps in your theoretical background that keep you from understanding how those algorithms work; at that moment, studying math will make much more sense to you!

Here's an article by the fantastic fast.ai team supporting the top-down learning approach.

And another one by Jason Brownlee on his gold mine of a blog, “Machine Learning Mastery”.

Resources

I will divide the resources into three sections (Linear Algebra, Calculus, and Statistics & Probability), plus some bonus materials; within each section, the resources are listed in no particular order. They range across video tutorials, books, blogs, and online courses.

Linear Algebra

Linear algebra is the language in which most machine learning algorithms are expressed, so it is what lets you understand how they work under the hood. It's all about vector, matrix, and tensor operations; no black magic involved!
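To make the "no black magic" point concrete, here is a minimal pure-Python sketch, assuming a fully connected (dense) neural-network layer as the example: its forward pass is just a matrix-vector product plus a bias, y = Wx + b. The weights, inputs, and helper names (`matvec`, `dense_layer`) are hypothetical and chosen only for illustration; in NumPy the same computation is written `W @ x + b`.

```python
# A dense (fully connected) layer computes y = W x + b: plain linear algebra.
# Pure-Python sketch with made-up numbers; NumPy would write this as W @ x + b.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    assert all(len(row) == len(x) for row in W), "shape mismatch"
    # Each output entry is the dot product of one row of W with x.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def dense_layer(W, x, b):
    """Compute W x + b, the core operation of a linear layer."""
    return [wx_i + b_i for wx_i, b_i in zip(matvec(W, x), b)]

W = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]   # 2 outputs, 3 inputs (hypothetical weights)
x = [3.0, 1.0, 2.0]      # one input example
b = [0.5, -0.5]          # one bias per output

print(dense_layer(W, x, b))
```

Running it prints [5.5, -1.5]. Stacking layers like this, with nonlinear functions in between, is the basic structure of a feed-forward neural network.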

  1. Khan Academy Linear Algebra series (beginner-friendly).
  2. Coding the Matrix course (and book).
  3. 3Blue1Brown Linear Algebra series.
  4. fast.ai Linear Algebra for coders course, closely tied to modern ML workflows.
  5. The first course in the Coursera Mathematics for Machine Learning specialization.
  6. “Introduction to Applied Linear Algebra — Vectors, Matrices, and Least Squares” book.
  7. MIT Linear Algebra course, highly comprehensive.
  8. Stanford CS229 Linear Algebra review.

Calculus

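Calculus is used in machine learning to work with the functions that measure how far an algorithm is from its objective, known as loss, cost, or objective functions. Derivatives (gradients) of these functions tell the training procedure which way to adjust the model's parameters.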
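As an illustration of that link, here is a minimal sketch, assuming a toy one-parameter model y_hat = w * x and a mean-squared-error loss: calculus gives the derivative of the loss with respect to w (via the chain rule), and gradient descent repeatedly steps against that derivative. The data, learning rate, step count, and function names are hypothetical choices made for this example.

```python
# Minimize a mean-squared-error loss with gradient descent.
# Toy model (hypothetical, for illustration): y_hat = w * x, one parameter w.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # toy data generated with w = 2

def loss(w):
    """Mean squared error: L(w) = mean((w*x - y)^2)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    """Derivative from the chain rule: dL/dw = mean(2 * x * (w*x - y))."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(w0=0.0, lr=0.05, steps=100):
    """Repeatedly move w a small step against the slope of the loss."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w = gradient_descent()
print(round(w, 4), round(loss(w), 8))
```

Because the toy data were generated with w = 2, the returned weight approaches 2.0 and the loss approaches 0. Libraries such as Keras compute these derivatives automatically (automatic differentiation) rather than by hand, but the idea is the same.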

  1. Khan Academy Calculus series (beginner-friendly).
  2. 3Blue1Brown Calculus series.
  3. The second course in the Coursera Mathematics for Machine Learning specialization.
  4. The Matrix Calculus You Need For Deep Learning paper.
  5. MIT Single Variable Calculus.
  6. MIT Multivariable Calculus.
  7. Stanford CS224n Differential Calculus review.

Statistics & Probability

Both are used in machine learning and data science to analyze and understand data, and to discover and infer valuable insights and hidden patterns.
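For a small taste of both sides, here is a sketch using only Python's standard `statistics` module. Descriptive statistics summarize the sample you have, while inferential statistics use that sample to say something about the wider population, here through an approximate 95% confidence interval for the mean. The sample values are invented, and the normal (z) approximation is an assumption made for simplicity; with only 10 data points a t-interval would be more appropriate.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample, e.g. page-load times in seconds (made-up numbers)
sample = [2.1, 2.5, 1.9, 2.8, 2.2, 2.6, 2.4, 2.0, 2.3, 2.7]

# Descriptive statistics: summarize the data we actually observed
mean = statistics.mean(sample)
sd = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean with a ~95% interval,
# using a normal approximation for the sampling distribution of the mean
z = NormalDist().inv_cdf(0.975)          # about 1.96
half_width = z * sd / len(sample) ** 0.5
ci = (mean - half_width, mean + half_width)

print(f"mean={mean:.3f}  sd={sd:.3f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval's width shrinks with the square root of the sample size, which is one reason collecting more data makes estimates more reliable.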

  1. Khan Academy Statistics and probability series (beginner-friendly).
  2. Seeing Theory: A visual introduction to probability and statistics.
  3. Intro to Descriptive Statistics from Udacity.
  4. Intro to Inferential Statistics from Udacity.
  5. Statistics with R Specialization from Coursera.
  6. Stanford CS229 Probability Theory review.

Bonus Materials

  1. Part one of the Deep Learning book.
  2. CMU Math Background for ML course.
  3. Mathematics for Machine Learning book.

So, that was me giving away my carefully curated math bookmarks folder for the common good! I hope it helps you expand your machine learning knowledge and overcome your fear of discovering what's happening behind the scenes of your sklearn/Keras/pandas import statements.

Contributions are very welcome, whether by reviewing one of the listed resources or by suggesting remarkable new ones.
