Why use Machine Learning Instead of Traditional Statistics?

Wendy Teboul
Towards Data Science
6 min read · Jul 20, 2018


Image by Matan Segev on pexels.com

This question loomed over me as I prepared for a phone interview for an analyst position at a major financial services firm. My curiosity turned to distress when the engineer interviewing me asked: “Why did you choose to use learning methods in your projects rather than just statistics with regressions and correlations?”

I didn’t understand the question at first. It seemed obvious to me that an algorithm which recommends a university to a student cannot work without Machine Learning. How could I identify movies that will increase the popcorn sales of a movie theatre chain without applying AI to the hundreds of thousands of transactions in our dataset? Following my interview, I began pondering this question.
Why can’t simple regressions be applied to identify impactful factors and their correlation coefficients?
As I thought about it and did some research, I realized that the boundary between the two disciplines is blurred for many people, so much so that one may wonder whether Machine Learning isn’t just glorified statistics.
Is ML really different from Statistics? Does this difference have a meaningful technical impact? Or is it just nice to say that you are doing ML and AI?

In this article, I will mainly focus on supervised learning methods. First, because that is what my projects relied on, so that is the category I had to answer for. And second, because that is generally what people think of when they hear ‘Machine Learning’.

Indeed, people generally associate ML with inferential statistics, i.e. the discipline which aims to understand the underlying probability distribution of a phenomenon within a specific population. This is also what we want to do in ML in order to generate a prediction for a new element of the population.
Inferential statistics relies on assumptions: the first step of the statistical method is to choose a model, with unknown parameters, for the underlying law governing the observed property. Correlations and other statistical tools then help us estimate the values of those parameters. Hence, if your assumptions about the data are wrong, the estimated parameters will be meaningless and your model will never fit your data accurately enough.

Trying to fit a Gaussian model by visualizing a histogram
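To make this concrete, here is a minimal sketch of that parametric workflow in Python (toy data I made up; scipy and matplotlib assumed): assume a Gaussian family, estimate its parameters from the data, and eyeball the fit against a histogram.

```python
# A minimal sketch of the parametric workflow (toy data; scipy and
# matplotlib assumed): assume a Gaussian family, estimate its parameters,
# and check the fit against a histogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=500)   # stand-in for real observations

# Step 1: choose a model family with unknown parameters (here, a Gaussian).
# Step 2: estimate those parameters from the data (maximum likelihood).
mu, sigma = stats.norm.fit(x)

# Step 3: check the fitted density against the histogram of the data.
plt.hist(x, bins=30, density=True, alpha=0.5)
grid = np.linspace(x.min(), x.max(), 200)
plt.plot(grid, stats.norm.pdf(grid, mu, sigma))
plt.title(f"Gaussian fit: mu={mu:.2f}, sigma={sigma:.2f}")
plt.show()
```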

“So let’s choose better hypotheses and make sure to pick the right model!” you might think. Right, but there are infinitely many possible families of distributions and no recipe for finding the right one. Usually, we conduct a descriptive analysis to identify the shape of the distribution of our data. But what if the data has more than two features? How do we visualize it to propose a model? What if we cannot identify the specific shape of the model? What if the subtle difference between two families of models cannot be distinguished by the human eye? In fact, model specification is the most difficult part of the inferential statistics methodology.

You might object that this is also what we do in Machine Learning when we decide that the relationship in our data is linear and then run a linear regression. And that is right! But ML doesn’t stop there. Let’s not forget that learning methods, and this is even more true for deep learning, find their roots in nature and the human process of learning.
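For instance, here is a hedged sketch with scikit-learn, on made-up data where the relationship really is linear: we assume linearity, and the ‘learning’ reduces to estimating two coefficients.

```python
# A hedged sketch with scikit-learn, on made-up data where the relationship
# really is linear: we assume linearity, and fitting the model reduces to
# estimating a slope and an intercept.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(scale=1.0, size=200)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # recovers roughly 3.0 and 5.0
```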

Photo by Derek Thomson on Unsplash

Let’s look at an example. Take a physician who wants to retrain as a real estate broker. At first, she will have no understanding of the real estate market and how it works. But as she observes deals being closed on properties she has visited, she will gain a better understanding of the market and will soon be able to produce good estimates of property values. Of course, we all have a broad intuition that the smaller the house, the lower the price, but that doesn’t mean a house two times smaller will cost two times less. Moreover, our new broker hasn’t been given a formula combining all the features of a house to compute the right price. That would be too easy! Instead, she has ‘fed’ her brain with many examples and built a more accurate judgment over time: she has learned!

That is exactly the point of Machine Learning, Deep Learning, and AI overall. Such learning methods enable us to identify tricky correlations in data sets for which exploratory analysis could not pin down the shape of the underlying model. We do not want to give an explicit formula for the distribution of our data; rather, we want the algorithm to figure out the pattern on its own, directly from the data. Learning methods free us from the assumptions attached to the statistical methodology.
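Here is a toy version of the broker’s situation in Python (all numbers are invented; scikit-learn assumed): the model only ever sees examples of past deals, never the pricing rule itself.

```python
# A toy version of the broker's situation (all numbers invented,
# scikit-learn assumed): the model only ever sees (size, rooms, age) -> price
# examples, never the pricing rule itself.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
size = rng.uniform(30, 200, 1000)              # living area in m^2
rooms = rng.integers(1, 7, 1000)               # number of rooms
age = rng.uniform(0, 80, 1000)                 # age of the property in years

# A hidden, non-linear pricing rule standing in for the real market:
price = 3000 * size - 20 * size * np.sqrt(age) + 10000 * rooms \
        + rng.normal(scale=5000, size=1000)

X = np.column_stack([size, rooms, age])
model = GradientBoostingRegressor().fit(X, price)

# The 'broker' now estimates an unseen property purely from past deals.
print(model.predict([[90, 3, 15]]))
```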

Of course, for some real-world applications, regressions and correlations are sufficient. Sometimes we just want to know the general trend of one variable against another. In that case, a simple correlation will give us the coefficient describing this trend. But how would one determine a model to classify data that is not linearly separable, like this:

There is no straight line to separate these data points in 2D

No simple regression can do it. Machine Learning came up with the SVM (Support Vector Machine) and the kernel trick, which maps the data into a higher-dimensional space where it becomes linearly separable:

SVM algorithm maps the points into 3D where they are separable by a plane (linear hyperplane in 3D)
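A small sketch with scikit-learn on the classic concentric-circles toy dataset shows the contrast between a linear kernel and an RBF kernel (exact scores will vary):

```python
# A small sketch on the concentric-circles toy dataset: a linear kernel
# cannot separate the two rings, while an RBF kernel implicitly lifts the
# points into a space where they are separable.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))  # around chance level
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))     # close to 1.0
```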

Here is another example where ML seems inevitable. Let’s say our data has five variables X1, X2, X3, X4 and Y, and the underlying relationship between Y (the output we want to be able to predict) and the four input features is as crazy as:
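Purely as an illustration of the kind of relationship I mean (an invented formula, not one taken from any real project), imagine something like:

```latex
Y = \log\left(X_1 + X_2^{3}\right) + \sqrt{X_3}\, e^{X_4} + X_1^{2} X_4 + \varepsilon
```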

Here epsilon is the noise inherent to most real-world phenomena. Of course, a regression is theoretically doable. We can build new variables out of the four inputs by applying the logarithm, the square root, the exponential, raising the features to the power of 2, 3, 4…, or taking the logarithm of sums of those variables. We would then use these new features as input variables for a linear regression. But coming up with the right model this way would require trying a huge number of candidate variables, and it is very unlikely we would hit on the right ones, and that is before counting the different combinations of variables that could be chosen by subset selection.
Once again, a simple polynomial regression may give us a model that is accurate enough for our goal. But if we want to minimize the error, learning methods are the key.
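Here is a hedged sketch of that contrast in Python, using data generated from the invented relationship above (scikit-learn assumed; the exact numbers will vary):

```python
# A hedged sketch (scikit-learn assumed): data generated from the invented
# relationship above, fit once with plain linear regression on the raw
# features and once with a random forest that is never told the formula.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(1, 5, size=(2000, 4))
noise = rng.normal(scale=1.0, size=2000)
y = (np.log(X[:, 0] + X[:, 1] ** 3)
     + np.sqrt(X[:, 2]) * np.exp(X[:, 3])
     + X[:, 0] ** 2 * X[:, 3]
     + noise)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

print("linear regression MSE:", mean_squared_error(y_test, linear.predict(X_test)))
print("random forest MSE:    ", mean_squared_error(y_test, forest.predict(X_test)))
# On data like this, the forest usually gets much closer, without any
# hand-built log/sqrt/exp features.
```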
Don’t get me wrong, no ML method will output the ‘real’ underlying model. Instead, it will provide a model that matches the behavior of the data as closely as possible. Sometimes this is a global function; sometimes it is a local model, as with random forests or KNN for example. These latter algorithms don’t even provide a function into which we plug a newly observed data point; rather, they assign the new observation to a group of ‘old’ (training) data points and decide that its output should be similar to that of the group.
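A minimal sketch of that ‘local’ behavior with scikit-learn’s KNN regressor, on invented one-dimensional data:

```python
# A minimal sketch of the 'local' idea with scikit-learn: KNN predicts a new
# point's value from its closest training points, with no global formula.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(7)
X_train = rng.uniform(0, 10, size=(300, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.1, size=300)

knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)

# Under the hood: find the 5 training points nearest to 2.0 and average
# their observed outputs.
print(knn.predict([[2.0]]))   # roughly sin(2.0) ≈ 0.91
```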

We should also mention that Machine Learning is rooted in computer science and optimization, which allow fast learning on huge datasets, something that is not a primary concern of statistics. The goal of ML is not to come up with knowledge about the data (‘this is the real phenomenon, this is how it works’) but rather with a workable and reproducible model whose error tolerance is determined by the project we are undertaking.

Learning methods are in fact necessary to deal with plenty of problems. They work around our lack of knowledge about the data by not forcing us to choose a model up front. Certainly, you will still have to decide on an algorithm to work with, but that is a different story. And there is no doubt that they will remain essential, as confirmed by the crazy applications powered by deep learning and NLP.

What about you? Why are learning methods essential to your projects? Let me know in the comments section 😄


