EDUCATION

The 2021 Kaggle survey just revealed that, like the previous year, the most commonly used algorithms in the Data Science community were linear and logistic regression. I am not surprised at all. A weighted summation of different features is, after all, the most intuitive thing to do. However, what is less obvious to most beginners is the distinction between linear regression and other kinds of regression.
After you finish reading this post, you will have a very clear idea of what distinguishes linear regression from logistic and Poisson regression. You will also understand why we still call it logistic "regression" when it is used for classification tasks.
If you have ever been confused about these different types of regression, you have come to the right place. However, I ask you to set aside your prior knowledge on the subject for the next few minutes. If you do that, I promise you will walk away with crystal clarity. Let’s dive in.
What is a Generalized Linear Model?
In a nutshell, a Generalized Linear Model (GLM) is a mathematical model that relates an output (a function of the response variable; more on this later) to one or more input variables (also called the explanatory variables). The equation below shows how the output is related to a linear summation of n predictor variables. There are n+1 corresponding coefficient terms: one for each of the n predictor variables, plus one additional term to model any offset.
$$\text{output} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
Regardless of whether you do linear, logistic, or Poisson regression, the right-hand side of the above equation (weighted combination of input features) stays the same.
Let’s talk about the left-hand side of the equation, the output. It involves the random component of the model: it is a function of the expected value of the response variable. Let’s call the expected value Y for simplicity.
$$\text{output} = g(Y), \quad \text{where } Y = E(y)$$
The function, g(.), is called the link function. It is this link function that makes the distribution of Y compatible with the right-hand side (the linear combination of inputs).
When the function g(.) is the identity function, the GLM equation reduces to the regular linear regression equation.
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
In other words, regular linear regression is a special case of the generalized linear model in which the link function is the identity.
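To see this equivalence concretely, here is a minimal sketch using statsmodels (the data are synthetic, and the coefficients are my own choosing for illustration): fitting a Gaussian-family GLM with its default identity link recovers the same coefficients as ordinary linear regression.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: y = 2 + 3x + Gaussian noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = sm.add_constant(x)  # adds the intercept column (the offset term)
y = 2 + 3 * x + rng.normal(scale=0.5, size=200)

# Regular linear regression (ordinary least squares)
ols_fit = sm.OLS(y, X).fit()

# The same model expressed as a GLM: Gaussian family, identity link
glm_fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()

print(ols_fit.params)  # approximately [2, 3]
print(glm_fit.params)  # the same coefficients
```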
Key differences between Regular Linear Regression and GLM
As I mentioned earlier, regular linear regression is a special case of GLM. However, before proceeding further, let’s take a quick detour to cover some key differences between the two.
The key assumptions of regular linear regression are that each value of the output Y is independent, that the output is normally distributed, and that the mean of Y is related to the predictor variables by a linear combination. In a GLM, the output is not confined to a normal distribution; it can instead follow any member of the exponential family of distributions.
To solve the regular linear regression problem, you can use either least-squares or maximum-likelihood estimation; both give the same results. A GLM, however, can only be solved with a maximum-likelihood approach.
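As a quick sanity check of that equivalence, the sketch below (again my own illustration) minimizes the Gaussian negative log-likelihood directly with scipy and compares the result with the closed-form least-squares solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(size=100)
X = np.column_stack([np.ones_like(x), x])  # columns for beta_0 and beta_1
y = 1.5 - 2.0 * x + rng.normal(scale=0.3, size=100)

# Least-squares solution (closed form)
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Maximum likelihood under a normal error model: up to constants that do
# not depend on beta, the negative log-likelihood is the sum of squares
def neg_log_likelihood(beta):
    residuals = y - X @ beta
    return 0.5 * np.sum(residuals**2)

beta_ml = minimize(neg_log_likelihood, x0=np.zeros(2)).x

print(beta_ls)  # approximately [1.5, -2.0]
print(beta_ml)  # the same values, to optimizer tolerance
```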
Lastly, the regular linear regression model is also sometimes called the standard least-squares model, developed by Gauss around 1809, while the GLM was formulated by Nelder and Wedderburn in 1972.
How does GLM relate to Linear, Logistic and Poisson Regression?
Now that you can view linear regression as a special case of GLM, we can proceed to identify the other cases of GLM.
When the link function is the logit (the natural log of the odds, Y/(1 − Y)), we end up with a logistic regression equation.
$$\ln\left(\frac{Y}{1-Y}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
Logistic regression is most suitable when the outcome is binary (e.g. success/failure, has a disease/does not have a disease). In such applications, the ratio Y/(1 − Y) is the ratio of the probability of success to the probability of failure, also called the ‘odds’.
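Here is a minimal sketch (synthetic binary outcome; the coefficients are my own choosing) showing that a Binomial-family GLM with the logit link and statsmodels’ dedicated Logit class fit exactly the same model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=300)
X = sm.add_constant(x)

# Binary outcome drawn from a true logistic model with coefficients (0.5, 1.2)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
y = rng.binomial(1, p)

# Logistic regression as a GLM: Binomial family, logit link (the default)
glm_fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# The same model via the dedicated logistic regression class
logit_fit = sm.Logit(y, X).fit()

print(glm_fit.params)    # approximately [0.5, 1.2]
print(logit_fit.params)  # the same coefficients
```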
When the link function is the natural log of the rate, we end up with a Poisson regression equation.
$$\ln(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
Poisson regression is most suitable when the outcome is a count, such as the number of events that occur in a given time interval.
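The Poisson case works the same way. A minimal sketch with synthetic count data (again, coefficients of my own choosing): a Poisson-family GLM with its default log link.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=300)
X = sm.add_constant(x)

# Counts drawn from a true Poisson model: log(rate) = 0.3 + 0.8x
rate = np.exp(0.3 + 0.8 * x)
y = rng.poisson(rate)

# Poisson regression as a GLM: Poisson family, log link (the default)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(poisson_fit.params)  # approximately [0.3, 0.8]
```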
Relationship Between Link Function and Activation Function
The link function g(.) is an invertible function that transforms the expectation of the output to make it compatible with the linear predictor part (the right-hand side of the equation in GLM). However, in the Machine Learning community, we are often first introduced to the inverse of the link function. This is called the activation function.
$$Y = g^{-1}(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)$$
That is, the activation function is simply the inverse of the link function. The term "link function" is common in the statistics literature, while the term "activation function" is more common in the machine learning literature.
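A small numerical check (my own illustration) makes the relationship concrete: the sigmoid activation is exactly the inverse of the logit link.

```python
import numpy as np

def logit(y):
    """Link function: maps a probability in (0, 1) to the real line."""
    return np.log(y / (1 - y))

def sigmoid(eta):
    """Activation function: the inverse of the logit link."""
    return 1 / (1 + np.exp(-eta))

eta = np.linspace(-5, 5, 11)  # values of the linear predictor
print(np.allclose(logit(sigmoid(eta)), eta))  # True: g(g^{-1}(eta)) = eta
```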
For example, if you take the exponential of both sides of the GLM equation (shown earlier) for logistic regression and apply simple algebraic manipulations, you will end up with the following equation for logistic regression, the form more commonly seen in the machine learning literature.
$$Y = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)}}$$
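The algebraic steps, for reference, writing $\eta = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$ for the linear predictor:

$$\ln\left(\frac{Y}{1-Y}\right) = \eta \;\Rightarrow\; \frac{Y}{1-Y} = e^{\eta} \;\Rightarrow\; Y = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}}$$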
Final Thoughts
People often confuse and mix these concepts. Worse still, they point to the apparent anomaly of calling it logistic "regression" when it is used for classification. This is partly because we, in the machine learning community, have divided supervised learning into classification (when the output is discrete) and regression (when the output is continuous).
Of course, logistic regression is used for classification, but it is still a regression technique. This only starts making sense once you understand Generalized Linear Models, the more overarching concept.
At the end of the day, regardless of whether you use linear, logistic, or Poisson regression, you combine different input variables into a weighted sum with unknown coefficients that need to be determined.