
The mathematics and concepts behind most cointegration methods are not always straightforward. Often, complex mathematical tools obscure the intent and the steps of the methods. Such complexity can also discourage further exploration of the subject.
The Engle-Granger approach to cointegration does not suffer from this. It may not be the most reliable method, nor the most stable, but it is simple and intuitive. For more robust cointegration methods, see BTCD or Dickey-Fuller direct optimization.
In this story, we will present the Engle-Granger approach with simple mathematics, and we will code a mini-library to use this method in future projects.
Story Structure
- Cointegration, order 1
- Building 1D time series from vectors and matrices
- Ordinary Least Squares residuals
- Determination of the most suitable relationship
- Examples with synthetic data
- Code summary
- Final words
Cointegration, order 1
When referring to cointegration, we usually mean cointegration of time series integrated of order one. That means we seek a linear combination of the original time series (not their differences) that is stationary. Sometimes we find it; sometimes it is not there to be found. Mathematically, we seek coefficients α and possibly a constant c such that

$$u_t = \sum_{j=0}^{n} \alpha_j x_{j,t} + c$$

is stationary.
It is essential to notice that if u is stationary, it remains stationary regardless of the value of the constant c. The constant merely re-centers the time series values.
Building 1D time series from vectors and matrices
Before we go deeper into the Engle-Granger approach, we need to develop tools that will simplify our future endeavors.
Given a vector of coefficients a and a matrix of processes (time series), where each column is one process and the column index matches the index of a, the function "get_u_t" in the following code creates that linear combination. Furthermore, the function "get_umat" creates a matrix of u time series (one in each column) from a matrix of coefficients (also indexed by columns), i.e., a collection of a vectors.
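The original gist is embedded in the Medium post and not reproduced here; a minimal sketch of what these two helpers might look like, assuming NumPy arrays with one time series per column, is:

```python
import numpy as np


def get_u_t(a: np.ndarray, x_mat: np.ndarray) -> np.ndarray:
    """Linear combination u_t = sum_j a[j] * x_mat[:, j].

    a     : 1D array of coefficients, one per column of x_mat.
    x_mat : 2D array, each column is one time series.
    Returns a 1D array with the combined time series.
    """
    return x_mat @ a


def get_umat(a_mat: np.ndarray, x_mat: np.ndarray) -> np.ndarray:
    """Matrix of u time series, one per column.

    a_mat : 2D array whose i-th column holds the coefficients
            of the i-th linear combination.
    x_mat : 2D array, each column is one time series.
    Returns a 2D array whose i-th column is get_u_t(a_mat[:, i], x_mat).
    """
    return x_mat @ a_mat
```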
You can see that the convention we will use in our NumPy arrays is that each column is a time series, very similar to how a pandas DataFrame works.
Ordinary Least Squares residuals
Let us return to the equation for u and rewrite it slightly differently. Since a cointegrating vector is only defined up to scale, we can set the coefficient of the first series to one and write

$$u_t = y_t + \sum_{j=1}^{n} \alpha_j x_{j,t} + c$$

Here x_0 has become y, and the summation starts at one instead of zero.
Now comes the essence of the Engle-Granger approach. The conjecture is that if we fit the OLS regression specification

$$y_t = \sum_{j=1}^{n} \beta_j x_{j,t} + c + \varepsilon_t,$$

where y is one of the multiple time series and the x's are all the others, then, because ε is i.i.d. normal and therefore stationary, the residuals u of the actual regression should approximate stationarity even if they are not i.i.d. normal.
Our series u is then

$$u_t = y_t - \sum_{j=1}^{n} \hat{\beta}_j x_{j,t} - \hat{c},$$

where the coefficients and constant are estimated using OLS.
Notice that if we have n+1 time series, there are n+1 ways to choose y and the x‘s. Hence, from the Engle-Granger approach using OLS, we find a matrix of coefficients, not a single vector.
In other words, given a matrix where each column is a time series x_j, we apply OLS to each of the following specifications,

$$x_{i,t} = \sum_{j \neq i} \beta^{(i)}_j x_{j,t} + c^{(i)} + \varepsilon_{i,t}, \qquad i = 0, 1, \dots, n,$$
then each time series u_i is

$$u_{i,t} = x_{i,t} - \sum_{j \neq i} \hat{\beta}^{(i)}_j x_{j,t} - \hat{c}^{(i)}.$$
The matrix of coefficients used to build the matrix of u_i's is defined column by column: its i-th column a^{(i)} has entries

$$a^{(i)}_i = 1, \qquad a^{(i)}_j = -\hat{\beta}^{(i)}_j \ \text{ for } j \neq i;$$
notice the negative sign; this is because in the regression specification, the linear combination is on the other side of the equality sign, and we need to account for that.
The following code finds such a matrix, where each column corresponds to a vector of coefficients. We have defined an [interface](https://towardsdatascience.com/python-interfaces-why-should-a-data-scientist-care-2ed7ff80f225) for what the linear regression model should be like (all the scikit-learn regressions implement this interface) so that we can test other models beyond the original OLS. Using an interface, we inject the dependency of the regression model instead of hard-coding it, thereby being able to switch models without rewriting our code.
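The original gist with that code is not reproduced here; a minimal sketch of what such a function could look like, assuming the injected model follows scikit-learn's fit/coef_/intercept_ interface and that the function also returns the negated intercepts (the exact signature is an assumption), is:

```python
import numpy as np
from sklearn.base import clone


def get_engle_granger_coint_mat(x_mat: np.ndarray, model) -> tuple:
    """Regress each column on all the others and build, per column,
    the coefficient vector (1 for the regressed column, minus the
    fitted coefficients elsewhere).

    x_mat : 2D array, each column is one time series.
    model : unfitted regressor with the scikit-learn interface
            (fit, coef_, intercept_), e.g. LinearRegression.
    Returns (coint_mat, consts): coint_mat[:, i] is the coefficient
    vector of the i-th regression, consts[i] is minus its intercept.
    """
    n_series = x_mat.shape[1]
    coint_mat = np.zeros((n_series, n_series))
    consts = np.zeros(n_series)
    for i in range(n_series):
        others = [j for j in range(n_series) if j != i]
        reg = clone(model).fit(x_mat[:, others], x_mat[:, i])
        coint_mat[i, i] = 1.0
        coint_mat[others, i] = -reg.coef_
        consts[i] = -reg.intercept_
    return coint_mat, consts
```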
Everything is great up to this point; however, we need to address one issue: we were supposed to find the a vector (the vector with the linear combination coefficients), not n+1 a vectors (the matrix of coefficients). We need to determine how to pick among all the possible choices.
Determination of the most suitable relationship
We use the Dickey-Fuller test for stationarity to find the relationship that is most likely to be stationary. We estimate the test statistic for all relationships and pick the one with the smallest value. Simple.
For speed and simplicity, we use the direct way of estimating the Dickey-Fuller test statistic,

$$DF = \frac{\sqrt{T}\,\rho}{\sqrt{1-\rho^{2}}},$$

where the time series sample size is T+1 and ρ is the correlation between the lagged series and the differenced series.
If you want to check out the full mathematical proof of this result, I recommend reading this story.
The code to implement it:
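The gist with the implementation is not shown here; a minimal sketch, with the function name dickey_fuller_stat chosen for illustration, could be:

```python
import numpy as np


def dickey_fuller_stat(u: np.ndarray) -> float:
    """Direct estimate of the Dickey-Fuller test statistic.

    u : 1D array, the candidate stationary series.
    Uses the correlation between the lagged series u[:-1] and the
    differenced series diff(u); more negative values indicate
    stronger evidence of stationarity.
    """
    lagged = u[:-1]
    diffed = np.diff(u)
    t = len(diffed)  # sample size is t + 1
    rho = np.corrcoef(lagged, diffed)[0, 1]
    return np.sqrt(t) * rho / np.sqrt(1.0 - rho ** 2)
```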
We wrap the function "get_engle_granger_coint_mat", coded in the previous section, inside another function that uses the same interface for the linear model and selects the vector with the lowest Dickey-Fuller statistic value.
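Again, the original gist is not reproduced; a sketch of such a wrapper, under the hypothetical name get_engle_granger_coint_vec and building on the earlier sketches, might look like this:

```python
import numpy as np


def get_engle_granger_coint_vec(x_mat: np.ndarray, model) -> tuple:
    """Pick the coefficient vector whose u series has the lowest
    (most negative) Dickey-Fuller statistic.

    Assumes get_engle_granger_coint_mat, get_umat and
    dickey_fuller_stat are defined as in the sketches above.
    Returns (coefficient vector, constant) of the best relationship.
    """
    coint_mat, consts = get_engle_granger_coint_mat(x_mat, model)
    u_mat = get_umat(coint_mat, x_mat)
    stats = [dickey_fuller_stat(u_mat[:, i]) for i in range(u_mat.shape[1])]
    best = int(np.argmin(stats))
    return coint_mat[:, best], consts[best]
```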
That wrapper is the function we call when we want a simple solution to the problem of finding the a vector whose coefficients produce a potentially stationary time series u. The key word here is potentially. There are cases where the Engle-Granger approach fails to find a stationary time series, so use it with caution. Once again, for more robust, albeit more complicated, methods, check out BTCD or direct Dickey-Fuller optimization.
Examples with synthetic data
To test the code we just wrote, we will use unit root processes in the form of discretely sampled correlated Brownian motions. Check out the previous story about generating such processes, as we will use the code discussed there to generate correlated Brownian paths. Save that code as "brownian_motion.py" and place it in the same folder where you run the following code.
Finally, we test the Engle-Granger approach, using OLS, injecting a "LinearRegression" model from sklearn:
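The original snippet imports the correlated paths from "brownian_motion.py"; since that module is not shown here, the sketch below builds correlated random walks directly with NumPy as a stand-in before injecting sklearn's LinearRegression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Three correlated random-walk (unit root) paths, one per column.
n_steps, corr = 1000, 0.8
cov = np.array([[1.0, corr, corr],
                [corr, 1.0, corr],
                [corr, corr, 1.0]])
increments = rng.multivariate_normal(np.zeros(3), cov, size=n_steps)
paths = increments.cumsum(axis=0)

# Inject an OLS model and inspect the selected relationship.
coint_vec, const = get_engle_granger_coint_vec(paths, LinearRegression())
u = get_u_t(coint_vec, paths) + const
print("cointegrating vector:", coint_vec)
print("Dickey-Fuller statistic:", dickey_fuller_stat(u))
```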

We coded the regression with an interface because we wanted the freedom to change the model without rewriting our code. So if we now wish to use LASSO, we inject a Lasso model from sklearn.
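With the hypothetical helpers above, swapping models is a one-line change; a sketch of the LASSO variant (the alpha value is an arbitrary choice) could be:

```python
from sklearn.linear_model import Lasso

# Same call as before; only the injected model changes.
coint_vec_lasso, const_lasso = get_engle_granger_coint_vec(paths, Lasso(alpha=0.1))
u_lasso = get_u_t(coint_vec_lasso, paths) + const_lasso
print("LASSO cointegrating vector:", coint_vec_lasso)
print("Dickey-Fuller statistic:", dickey_fuller_stat(u_lasso))
```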

It’s nice to be able to change models so easily. Although OLS seems to produce a better result in this particular example, do not underestimate the power of penalized regressions. Since LASSO reduces model complexity, it can lead to more robust results. It is up to you to determine which regression model works better for you.
Code summary
To wrap up, and for completeness, here is all the code from the previous sections. The code is not that extensive this time; it contains more docstrings than code.
Final words
The purpose of this story was purely didactic: to explore cointegration intuitively. The Engle-Granger method can be helpful for exploring relationships between time series. Every relationship produces a different time series; some are more likely to be stationary than others. It is interesting to investigate why this happens. Why are some time series fitted better by the regression model? These are questions that can create valuable data insights.
Nevertheless, the Engle-Granger method is rarely used in practice to compute the cointegrating vector. I would suggest either BTCD or Dickey-Fuller direct optimization for such purposes.
References
[1] R. F. Engle and C. W. J. Granger, Co-Integration and Error Correction: Representation, Estimation and Testing (1987), Econometrica, Vol. 55, No. 2, pp. 251–276.
I hope this story was useful for you. Stay connected for part 2 of the Cointegration Popular Methods mini-series. Subscribe to get notified as soon as part 2 comes out.