What makes a GLM a GLM?

Generalized linear models (GLMs) are a family of models that share three common attributes:
- The distribution of the response variable (i.e. the label), given an input x, is a member of the exponential family of distributions.
- The natural parameter of the exponential family distribution is a linear combination of theta (i.e. the model parameters) and input data.
- At prediction time, the output of the model for a given x is the expected value of the distribution for that x.
If a model has these 3 characteristics, it is a generalized linear model.
Before diving in further, note that this article assumes you know what linear and logistic regression are.
Now, onto what each of these 3 characteristics mean.
I. The distribution of the response variable given x is a member of the exponential family of distributions.
When doing linear regression or logistic regression, we assume that the distribution of the response variable, given the input, belongs to the exponential family. Concretely: given a set of features x, the outcome y is a random quantity whose probability distribution is a member of the exponential family. In the case of linear regression, we assume the distribution of y given x is a Gaussian distribution with some mean μ and variance σ².

As we shall see in a bit, the Gaussian distribution is part of the exponential family of distributions.
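To make the assumption concrete, here is a small sketch (using NumPy, with made-up parameters θ): for a fixed x, the outcome y is a random draw from a Gaussian whose mean is determined by x.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([2.0, -1.0])   # hypothetical model parameters
x = np.array([1.0, 3.0])        # a single input vector

mu = theta @ x                  # mean of the Gaussian for this x
sigma = 1.0                     # assumed known standard deviation

# Draw several possible outcomes y for the same x: they scatter
# around the mean mu = theta @ x, per the Gaussian assumption.
ys = rng.normal(loc=mu, scale=sigma, size=5)
print(mu)   # -1.0
print(ys)
```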
For further reading, check out my article on probabilistic interpretation of linear regression.
What is the exponential family of distributions?
A distribution belongs to the exponential family if its probability density function (PDF) can be written in the form:
p(y; η) = b(y) · exp(η T(y) − a(η))

Here T(y) is the sufficient statistic, a(η) the log-partition function, and b(y) the base measure.
…and the PDF integrates to 1.
In the PDF equation above, eta (η) is called the "natural parameter" of the distribution.
The Gaussian (or Normal) distribution is in the exponential family.

The PDF for Gaussian distribution with variance 1 is:
p(y; μ) = (1/√(2π)) · exp(−(y − μ)²/2)
The Greek letter μ is the mean of the distribution. Here, μ is called the "canonical parameter" of the distribution. We will see below how the mean μ is connected to the natural parameter eta (η).
From the equation above, after some algebraic arrangement, we get:
p(y; μ) = [(1/√(2π)) · exp(−y²/2)] · exp(μy − μ²/2)

with b(y) = (1/√(2π)) · exp(−y²/2), T(y) = y, η = μ, and a(η) = η²/2.
Thus, the Gaussian distribution has the required form to be in the exponential family.
For Gaussian distribution, the canonical parameter μ is equal to the natural parameter η.
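As a quick numeric sanity check of this rearrangement (a sketch; the choice of b(y), T(y), and a(η) follows the derivation above), the exponential-family form reproduces the N(μ, 1) density exactly when η = μ:

```python
import math

def gaussian_pdf(y, mu):
    # N(mu, 1) density
    return math.exp(-(y - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def exp_family_form(y, eta):
    # b(y) * exp(eta * T(y) - a(eta)) with
    # b(y) = exp(-y^2/2)/sqrt(2*pi), T(y) = y, a(eta) = eta^2/2
    b = math.exp(-y ** 2 / 2) / math.sqrt(2 * math.pi)
    return b * math.exp(eta * y - eta ** 2 / 2)

# For the Gaussian, eta = mu: both expressions agree.
for y, mu in [(0.3, 1.2), (-2.0, 0.5)]:
    assert math.isclose(gaussian_pdf(y, mu), exp_family_form(y, mu))
```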
The Bernoulli distribution is also in the exponential family.

The PDF of Bernoulli distribution is:
p(y; φ) = φ^y · (1 − φ)^(1−y), for y ∈ {0, 1}
Here, the canonical parameter phi (φ) is the probability that y equals 1. We now need to write this PDF in the exponential-family form:
p(y; φ) = exp( y · log(φ/(1 − φ)) + log(1 − φ) )

with b(y) = 1, T(y) = y, η = log(φ/(1 − φ)), and a(η) = log(1 + e^η).
For Bernoulli distribution, the natural parameter eta is the so-called "log-odds".
η = log(φ / (1 − φ)), which can be inverted to give φ = 1/(1 + e^(−η))
The natural parameter can be expressed as a function of the canonical parameter. This function is called the canonical link function. Similarly, the canonical parameter can be expressed as a function of the natural parameter. This function is called the canonical response function. We shall see in a bit the role of the natural parameter and canonical parameter in linear and logistic regression.
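A minimal sketch of this link/response pair for the Bernoulli case — the logit (canonical link) and the sigmoid (canonical response) are inverses of each other:

```python
import math

def logit(phi):
    # Canonical link for Bernoulli: eta = log(phi / (1 - phi)), the log-odds.
    return math.log(phi / (1 - phi))

def sigmoid(eta):
    # Canonical response for Bernoulli: phi = 1 / (1 + exp(-eta)).
    return 1 / (1 + math.exp(-eta))

# Link and response functions invert each other.
for phi in (0.1, 0.5, 0.9):
    assert math.isclose(sigmoid(logit(phi)), phi)
```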
Now, onto attribute number 2.
II. The natural parameter of the exponential family distribution is a linear combination of theta and input data
This means that eta can be written as the dot product of the parameter vector theta and the input vector x.
η = θᵀx
When doing linear or logistic regression, we’re training a model to find the "best" theta given a set of data.
In other words, for a GLM to be a useful learning algorithm, we assume a linear relationship between the input data and the natural parameter of the distribution of the outcome at that data.
III. At prediction time, the output of the model for a given x is the expected value of the distribution for that x
When doing linear or logistic regression, we’re training a model to predict an outcome for a given data point.
In the case of linear regression, we’re training a model to predict the mean of a Gaussian distribution for a given x. The mean of a Gaussian distribution is the expected value of the Gaussian distribution. Thus, the output of the model for a given x is the expected value of the distribution for that x.
In the case of logistic regression, the expected outcome for a datapoint x is a probability. This probability is the expected value of a Bernoulli distribution.
We can use the natural parameter to find the expected value of a distribution.
In the case of linear regression:
h(x) = E[y | x; θ] = μ = η = θᵀx
In the case of logistic regression:
h(x) = E[y | x; θ] = φ = 1/(1 + e^(−η)) = 1/(1 + e^(−θᵀx))
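These two prediction rules can be sketched in a few lines (θ and x here are made-up values for illustration):

```python
import math

theta = [0.5, -1.5]  # hypothetical model parameters

def eta(theta, x):
    # Natural parameter: a linear combination of theta and x.
    return sum(t * xi for t, xi in zip(theta, x))

def predict_linear(theta, x):
    # Gaussian case: E[y|x] = mu = eta.
    return eta(theta, x)

def predict_logistic(theta, x):
    # Bernoulli case: E[y|x] = phi = sigmoid(eta).
    return 1 / (1 + math.exp(-eta(theta, x)))

x = [2.0, 1.0]
print(predict_linear(theta, x))    # -0.5
print(predict_logistic(theta, x))  # a probability between 0 and 1
```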
A generalized linear model is just a model with the aforementioned 3 attributes. The "linear" part comes from the fact that the natural parameter (eta) is a linear combination of the model parameter (theta) and input data.
Why GLMs are useful
Being in the exponential family of distributions comes with perks.
- The loss function (the negative log-likelihood) is convex. In other words, there is one global minimum (as opposed to several local minima). Thus, gradient descent will converge to the global minimum.
- All GLMs use the same formula to update theta in gradient descent.
θⱼ := θⱼ + α · (y − h(x)) · xⱼ, for each training example (x, y)
Just plug in the right h(x) to do gradient descent. As we have seen earlier, h(x) is a function of the natural parameter, and the natural parameter is theta times x.
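Here is a sketch of that shared update rule (stochastic gradient steps on single examples, with hypothetical data and learning rate): swapping h between the linear and logistic response functions is the only change.

```python
import math

def sgd_step(theta, x, y, h, alpha=0.1):
    # Shared GLM update: theta_j += alpha * (y - h(theta, x)) * x_j
    pred = h(theta, x)
    return [t + alpha * (y - pred) * xj for t, xj in zip(theta, x)]

def eta(theta, x):
    # Natural parameter: theta dot x.
    return sum(t * xi for t, xi in zip(theta, x))

# Linear regression: h(x) = eta. Logistic regression: h(x) = sigmoid(eta).
h_linear = eta
h_logistic = lambda th, x: 1 / (1 + math.exp(-eta(th, x)))

# Same update loop; only h changes between the two models.
theta = [0.0, 0.0]
for x, y in [([1.0, 2.0], 1.0), ([1.0, -1.0], 0.0)]:
    theta = sgd_step(theta, x, y, h_logistic)
print(theta)
```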
