We have used Linear Regression in many use cases to model the linear relationship between a scalar response variable and one or more explanatory variables. Linear regression makes some key assumptions, such as normality and constant variance of the response variable. So, what happens if the response variable does not follow these "usual" assumptions of normality and constant variance? The Generalized Linear Model (GLM) is one of the commonly used approaches to tackle that issue. The problem is that GLM involves a lot of terms, notation and components, so it can be a little confusing to grasp at first. But don’t worry, I’m here to help you understand all the concepts clearly.
The three basic statistical procedures of GLM are:
- Regression
- Analysis of Variance, aka ANOVA
- Analysis of Covariance, aka ANCOVA
In this article, I’d like to concentrate mainly on the regression part of GLM, talk about the exponential family, and show how to transform the response variable when it follows one of the distributions belonging to the exponential family.
Exponential family
Given a single parameter θ, we define the exponential family as the set of probability distributions whose probability density function (or probability mass function, for a discrete distribution) can be expressed as
P(x| θ) = B(x) exp[η(θ) T(x) – A(θ)]
A and η are functions which map θ to ℝ.
By considering η(θ) as the parameter, we can represent the above equation in canonical form.
P(x| η ) = B(x) exp[η T(x) – A(η)]
Where,
η : natural parameter
T(x) : sufficient statistic
B(x) : base measure
A(η) : log partition
Now, let’s see how the Bernoulli distribution belongs to the exponential family. Let us consider a Bernoulli distribution with parameter Φ (Φ is the probability of success). So the p.m.f. of x is of the form

P(x| Φ) = Φ^x (1-Φ)^(1-x) = exp[x log(Φ/(1-Φ)) + log(1-Φ)]
So, comparing with the canonical form we can get,
B(x)=1
η = log(Φ/(1-Φ)) => Φ = 1/(1+exp(-η))
T(x)=x
A(η) = -log(1-Φ)
or, A(η) = -log(1 - 1/(1+exp(-η))) [replacing the value of Φ]
or, A(η) = log(1+exp(η))
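This algebra is easy to verify numerically. A minimal sketch (the value of Φ is an arbitrary choice; only the standard library is needed):

```python
import math

# Check that the Bernoulli p.m.f. matches the canonical exponential-family
# form B(x) * exp(eta * T(x) - A(eta)) with B(x) = 1 and T(x) = x.
phi = 0.3                                  # arbitrary success probability
eta = math.log(phi / (1 - phi))            # natural parameter
A = math.log(1 + math.exp(eta))            # log partition

for x in (0, 1):
    pmf = phi**x * (1 - phi)**(1 - x)      # standard Bernoulli p.m.f.
    canonical = math.exp(eta * x - A)      # canonical form, B(x) = 1
    assert abs(pmf - canonical) < 1e-12
```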
Similarly, for a Normal distribution with unknown mean µ and known variance σ², the p.d.f. is as follows

P(x| µ) = (1/(√(2π) σ)) exp[-(x-µ)²/2σ²] = (1/(√(2π) σ)) exp(-x²/2σ²) exp[(µ/σ)(x/σ) - µ²/2σ²]

Here B(x) = (1/(√(2π) σ)) exp(-x²/2σ²) and T(x) = x/σ.
So, η = µ/ σ => µ = η σ
and A(η)= µ²/ 2σ² = η²/2
Similarly, for a Poisson distribution with parameter λ, the p.m.f. is of the form

P(x| λ) = (λ^x exp(-λ))/x! = (1/x!) exp[x log λ - λ]

Here B(x) = 1/x! and T(x) = x.
So, η = log λ => λ =exp(η)
and A(η) = λ = exp(η)
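The same kind of numerical sanity check works for the Poisson case (λ = 2.5 is an arbitrary choice):

```python
import math

# Check that the Poisson p.m.f. matches B(x) * exp(eta * T(x) - A(eta))
# with B(x) = 1/x!, T(x) = x, eta = log(lambda) and A(eta) = exp(eta).
lam = 2.5
eta = math.log(lam)
for x in range(10):
    pmf = lam**x * math.exp(-lam) / math.factorial(x)
    canonical = math.exp(eta * x - math.exp(eta)) / math.factorial(x)
    assert abs(pmf - canonical) < 1e-12
```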
Some of the other probability distributions that belong to exponential family are Binomial, Gamma, Exponential, Beta, Chi-squared distribution etc.
The exponential family has some useful properties:
- The log-likelihood of an exponential family distribution is concave in η, so maximum likelihood estimation is well-behaved.
- E[x; η] = ∂A(η)/ ∂η
- Var[x; η] = ∂²A(η)/∂η²
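The last two properties can be checked numerically with finite differences. Here is a minimal sketch for the Bernoulli family, where A(η) = log(1+exp(η)), the mean is Φ and the variance is Φ(1-Φ):

```python
import math

def A(eta):
    # Log-partition function of the Bernoulli family.
    return math.log(1 + math.exp(eta))

eta, h = 0.7, 1e-4                     # arbitrary eta, finite-difference step
phi = 1 / (1 + math.exp(-eta))         # mean implied by eta

dA = (A(eta + h) - A(eta - h)) / (2 * h)              # ~ E[x; eta]
d2A = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2   # ~ Var[x; eta]

assert abs(dA - phi) < 1e-6                # E[x] = first derivative of A
assert abs(d2A - phi * (1 - phi)) < 1e-5   # Var[x] = second derivative of A
```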
Now let’s see how some of the commonly used regression methods are designed from the GLM family using the concept of exponential family.
Generalized Linear Model
The basic idea of GLM is to fit a linear model for an "appropriate" function of the expected value of the response variable:

g(E(Y)) = βᵗx
So, a GLM consists of mainly three components:
- Probability distribution of the response variable
- Linear predictor
- Link function
The response variable follows a certain probability distribution with some parameter.
Linear predictor is the linear combination of model parameter β and explanatory variable X.
The link function g is the function that links the linear predictor with the parameter of the probability distribution of the response variable Y.
Let’s see how we can fit some of the commonly used regression models using the concept of GLM.
Linear Regression
The response variable Y follows a Normal distribution with constant variance. The Normal distribution belongs to the exponential family, and A(η) = η²/2 for a Normal distribution with known variance. So, the expected value of the response variable is
E(Y) = ∂A(η)/ ∂η = η
With known σ =1,
we can write, Identity(µ) = Identity(η) = η = βᵗ x
So, the response variable Y follows the Normal distribution with parameter µ and link function g should be Identity function.
Logistic Regression
This is a situation where the response variable has only two possible outcomes (dichotomous), generally called "Success" and "Failure". So, the response variable does not follow a normal distribution with constant variance. In this case we can say the response variable Y is a Bernoulli random variable.
So, expected value of the response variable
E(Y) = ∂A(η)/ ∂η = 1/[1+exp(- η)]
we can write, logit (Φ) = logit(1/[1+exp(- η)]) = η = βᵗx
So, the response variable Y follows the Bernoulli distribution with parameter Φ = 1/[1+exp(- η)] and the link function g should be Logit function.
Poisson Regression
Let us now consider another case, where the response variable is not normally distributed and represents the count of some event. Such a response variable can follow a Poisson distribution or a Negative Binomial distribution. Let’s see what happens if the response variable Y follows the Poisson distribution.
So, expected value of the response variable
E(Y) = ∂A(η)/ ∂η = exp(η)
and we can easily write ,
log(λ) = log[exp(η)] = η = βᵗx
So, the response variable Y follows the Poisson distribution with parameter λ =exp(η) and the link function g should be Log function.
Alright! Alright! Enough theory. Let’s focus on the implementation of GLM in Python.
Let’s first see how we can fit a linear regression model using the concept of GLM. I fitted all the regression models using the statsmodels library in Python.
The data I used for linear regression has 2 explanatory variables (horsepower and acceleration of cars) and 1 response variable (mpg of cars); its scatter plot is in the left-hand panel of Figure 2. It is visible that there is a linear relationship between the explanatory and response variables, and the variance of the response variable is almost constant. So, it is a decent idea to fit a linear model for this dataset.
[Figure 2: scatter plot of the cars data (left) and the fitted linear regression surface (right)]
The code for fitting the linear regression is very simple. I defined the explanatory variables as x and the response variable as y, and added a constant term to the explanatory variables.
There is no need to pass any link function, as the default link for the Gaussian family is the identity function.
Now, let’s focus on some non-linear data as in real life scenarios fitting a linear model is not always feasible.
For example, suppose we need to predict whether a candidate will be admitted to a particular course based on his/her GPA and years of work experience. Here the response variable has 2 outcomes, either "yes" or "no".
From the data in the left-hand panel of Figure 3 we can see there is no linear relationship between the x and y variables. In fact, one of the most crucial points is that linear regression is unbounded. So, we need something other than linear regression that outputs values only between 0 and 1.
[Figure 3: scatter plot of the admission data (left) and the fitted S-shaped logistic surface (right)]
The code for logistic regression is also pretty simple and almost the same as for linear regression, except for the probability distribution and the link function. The distribution used for logistic regression is Binomial/Bernoulli, and the link function is logit, which is the default for the Binomial family.
In fact, we can also use the probit function as the link function. Our main goal is to have output between 0 and 1, and both the logit and probit links give similar inference. But because of its explainability, the logit function is mostly used.
The right-hand panel of Figure 3 shows that the flat surface is replaced by an S-shaped curved surface whose values are bounded between 0 and 1. It also allows different rates of change at the low and high ends of the x variables, and hence properly deals with heteroskedasticity.
Finally, let’s see what we can do if we need to predict the number of awards received by students from their math and science scores. The scatter plot looks like the one in the left-hand panel of Figure 4.
We cannot fit linear regression to this data, as the variance of the response variable is not constant with respect to the explanatory variables. Also, the response variable is a non-negative integer, which is discrete, whereas the normal distribution used for linear regression models continuous variables, which can be negative as well.
[Figure 4: scatter plot of the awards data (left) and the fitted exponential surface (right)]
We use the Poisson distribution for Poisson regression, with log as the link function.
We can see from the right-hand panel of Figure 4 that the predicted surface is exponential.
For more details like the summary of the regression models and how I fitted the 3-D plots in python please visit my GitHub profile.
Conclusion
To summarize, we have seen that GLM can be used to fit both linear and non-linear data. In fact, there are several other non-canonical link functions we can use with these probability distributions to fit the data more effectively. For example, we can use the probit function for logistic regression, or the log function for the Normal distribution.
So, finally we can say that Generalized Linear Model is the "General" approach for fitting a regression model with any kind of data.
That’s it folks. Thank you for reading and happy learning!!