The statistical foundations of machine learning
A look beyond function fitting
Developing machine learning algorithms is easier than ever. There are several high-level libraries to build upon, such as TensorFlow, PyTorch, and scikit-learn, and thanks to the amazing effort of many talented developers, these are really easy to use and require only a superficial familiarity with the underlying algorithms. However, this ease comes at the cost of deep understanding: without proper theoretical foundations, one quickly gets overwhelmed by the complex technical details.
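To see just how little code a working model takes nowadays, here is a minimal sketch using scikit-learn. The synthetic data and the choice of linear regression are my own illustrative assumptions, not part of the discussion that follows; the point is only that the entire "learning" step is a single method call.

```python
# A minimal sketch: fitting a model in a few lines with scikit-learn.
# The data below is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=(100, 1))          # 100 one-dimensional samples
y = 2.0 * x.ravel() + rng.normal(0, 0.1, 100)  # a noisy linear relation

model = LinearRegression()
model.fit(x, y)  # the entire training step is this one call
print(model.coef_, model.intercept_)  # roughly [2.0] and 0.0
```

Convenient as this is, nothing in these few lines tells us *why* the fit works or when it will fail, which is exactly the gap this article aims to fill.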
My aim is to demonstrate how seemingly obscure and ad-hoc methods in machine learning can be explained with really simple and natural ideas, once we adopt the appropriate perspective. Our mathematical tools for that are going to be probability theory and statistics, which lie at the foundation of predictive models. Using the theory of probability is not just an academic exercise: it provides very deep insight into how machine learning works, giving you the tools to improve on the state of the art.
Before we begin our journey into the theoretical foundations of machine learning, let's look at a toy problem!
Fitting models
Suppose that we have two numerical quantities, say x and y, that are related to each other. For…