A Simple Linear Regression Model

Exploring a relationship between Electricity Price and Carbon Emissions Data

Carter Bouley
Towards Data Science

--

Regression analysis is a powerful statistical method for examining the relationship between variables of interest. It can be used to assess the strength of that relationship and to model how it may behave in the future.

Economists are obsessed with regressions. This is likely why they were such an integral and formative chunk of my university years. This kind of quantitative estimation is one of the most important and frequently used tools for understanding economic theories.

In economics, correlations are common, but identifying whether the correlation between two or more variables is causal is rarely as easy. Regression analysis enables us to estimate the direction and size of a change, and the statistical significance of the relationship between variables.

It is also one of the best-known and best-understood algorithms in statistics, Data Science and Machine Learning. Its simplicity means it’s a good place to start for those looking into the field.

My previous role had me involved with a tech start-up within the energy sector, specifically electricity. Given my non-energy background, I had much to learn about the workings of electricity markets.

A snippet of information I picked up through exposure to this sector — simple once understood but not known by everyone — is that electricity price and carbon emission intensity from the grid are correlated.

These variables form the basis for my Linear Regression.

Finding Data

Carbon Intensity

The first part of any Data Science project is finding data. National Grid provide an interactive website and host an easy-to-use API. I selected a week's worth of hourly carbon intensity data to analyse recent electricity grid emissions.

They also provide a forecast for the next 48 hours. They do this to encourage companies to use their API to automate the electricity consumption of IoT devices based on their prediction. Think electric vehicles charging on wind power rather than coal generators.

I’ve selected solely the dates from 20th to the 27th of October 2019, to have a look at last week.

(www.carbonintensity.org.uk)
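As a rough sketch of how a week of readings could be pulled from the API, the snippet below builds a request URL and parses the response. The endpoint pattern, the JSON field names (`data`, `from`, `intensity`, `actual`, `forecast`) and the helper names are my assumptions about the public API rather than anything stated in this article, so check them against the API documentation before relying on them. The sample payload is hand-written for illustration and is not real data.

```python
import json
from urllib.request import urlopen

# Assumed endpoint pattern for the public Carbon Intensity API.
API_URL = "https://api.carbonintensity.org.uk/intensity/{start}/{end}"


def build_url(start, end):
    """Return the request URL for a period given as ISO 8601 strings."""
    return API_URL.format(start=start, end=end)


def parse_intensity(payload):
    """Turn a decoded JSON response into (period start, gCO2/kWh) pairs."""
    rows = []
    for entry in payload["data"]:
        intensity = entry["intensity"]
        # Prefer the measured value; fall back to the forecast if it is missing.
        value = intensity.get("actual")
        if value is None:
            value = intensity.get("forecast")
        rows.append((entry["from"], value))
    return rows


def fetch_intensity(start, end):
    """Download and parse one period (requires network access)."""
    with urlopen(build_url(start, end)) as response:
        return parse_intensity(json.load(response))


# Hand-written payload in the assumed response shape (not real readings).
sample = {
    "data": [
        {"from": "2019-10-20T00:00Z", "to": "2019-10-20T00:30Z",
         "intensity": {"forecast": 180, "actual": 175, "index": "moderate"}},
        {"from": "2019-10-20T00:30Z", "to": "2019-10-20T01:00Z",
         "intensity": {"forecast": 170, "actual": None, "index": "moderate"}},
    ]
}

print(build_url("2019-10-20T00:00Z", "2019-10-27T00:00Z"))
print(parse_intensity(sample))
```

Calling `fetch_intensity("2019-10-20T00:00Z", "2019-10-27T00:00Z")` would then return the readings for the week analysed here, assuming the endpoint behaves as described.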

Electricity Price

When we consume electricity as households, we are charged a fixed rate per unit, regardless of the time of day. Utilities, however, have some exposure to the wholesale market. In order to minimise this, large companies ensure the vast majority of UK electricity is sold in bilateral agreements, away from the open market, limiting their price risk.

However, because electricity cannot be stored at a large scale, supply must equal demand at all times. This means some market mechanisms must be created to ensure balance. One of these, the day-ahead market, is exactly what it sounds like: buyers and sellers bidding 24 hours ahead of time. There are various shorter-term market mechanisms, which are more volatile, that ensure the market clears and is balanced. However, I used the day-ahead prices published by Nord Pool Group because that data is relatively easy to work with.

(www.nordpoolgroup.com)

Visualisations

Price per MWh (£) in the Day Ahead market over the course of a week.

Average Carbon Intensity of the Electricity Grid (gCO2/kWh)

The above graphs are a visual representation of the results of our varying electricity demand (and supply) over the course of a day. The overall picture is a ramp-up in price and carbon intensity in the mornings, peaking at 8am, a dip during the day, and then another rise in the evening, peaking at 8pm. This pattern is broadly similar across both carbon intensity and price.

Plotting below, we can see this relationship more clearly.

These visualisations can prove a useful tool for getting a feel for the data, but they are limited as a statistical analysis of the interactions between the variables.

While there looks to be some sort of correlation, a Linear Regression allows us to further examine this relationship.
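Before fitting a regression, the strength of a linear association can be summarised in a single number, the Pearson correlation coefficient. The sketch below computes it by hand. The function name is my own, and the two short lists are made-up illustrative values, not the week of data used in this article.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up values for illustration only.
carbon_intensity = [120, 150, 180, 210, 240]  # gCO2/kWh
price = [32, 35, 41, 40, 47]                  # £/MWh

print(round(pearson_r(carbon_intensity, price), 3))
```

A value near +1 or -1 indicates a strong linear association and a value near 0 a weak one. It says nothing about causation.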

Assumptions

Linear Regression requires five key assumptions to be met:

  • Linear relationship
  • Multivariate normality
  • No or little multicollinearity
  • No auto-correlation
  • Homoscedasticity

Ordinary Least Squares

I imported and ran statsmodels, a Python library, to perform the ordinary least squares (OLS) regression.

OLS estimates the average change in Y (Electricity Price) associated with a one-unit change in X (Carbon Intensity). Least Squares refers to how it finds the line of best fit. It takes the vertical distance between each actual data point and the line's prediction (the error, or residual), squares it so that positive and negative errors cannot cancel out, and sums the squares. The line of best fit is the one whose intercept and slope make this sum of squared errors as small as possible.
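For a single explanatory variable, the least-squares line has a closed-form solution, so the idea can be sketched without any library. The function names and the four data points below are illustrative assumptions, not the data or code used in this article.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up values for illustration only.
carbon_intensity = [100, 150, 200, 250]  # X, gCO2/kWh
price = [30, 38, 41, 51]                 # Y, £/MWh

intercept, slope = fit_line(carbon_intensity, price)
best = sum_squared_residuals(carbon_intensity, price, intercept, slope)
# Nudging the slope away from the fitted value can only increase the error.
nudged = sum_squared_residuals(carbon_intensity, price, intercept, slope + 0.01)

print(intercept, slope)
print(best < nudged)
```

The slope is the estimated change in price for each extra gCO2/kWh, and the final comparison shows what "least" means: any other line has a larger sum of squared errors.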

Statsmodels provides an awesome summary of our model, using its summary() superpower, which is exactly why I chose to use it over other libraries that can also perform regression analysis.

A visualisation can again help to picture the model.

The model shows the average relationship between the variables. If we know which electricity generators are currently running, their carbon intensity and current output, we should be able to refer to this line and estimate the price. However, the accuracy of this depends on passing various statistical tests to assess its significance.

In this case, the model fails the assumption of no autocorrelation, as shown by its low Durbin-Watson statistic. In addition, its Jarque-Bera statistic is much higher than acceptable, suggesting the residuals are not normally distributed. This may invalidate our model.
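Both statistics appear in the statsmodels summary, but they are simple enough to compute by hand from the residuals, which may help in reading them. The sketch below uses the standard textbook formulas. The function names are my own and the residuals in the example are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 means no first-order autocorrelation,
    values towards 0 suggest positive and towards 4 negative autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4), where S is the
    sample skewness and K the sample kurtosis. Large values suggest the
    residuals are not normally distributed."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


# Made-up residuals that drift slowly, as hourly data often does.
drifting = [3, 2.5, 2, 1, 0.5, -0.5, -1, -2, -2.5, -3]

print(durbin_watson(drifting))
print(jarque_bera(drifting))
```

Residuals that drift slowly from positive to negative, as consecutive hours of a time series tend to, give a Durbin-Watson value well below 2, which is the pattern described above.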

Distribution of errors:

Check for homoscedasticity:

Firstly, while price is normally distributed:

Carbon intensity is not:

Results Discussion

Therefore, our model is statistically dubious at best, but that does not mean that no relationship exists. With more time on my hands, or help from others, this relationship could definitely be interesting to look at further. On top of this, a data set this small is quite likely to yield insignificant results.

Wildly simplified, high carbon intensity fuels and high prices are correlated because renewable generation does not require fuel or any of the costs that come along with it: sourcing it, buying land, drilling or digging it up, refining it, transporting it and dealing with the waste.

Coal is the least efficient fuel source, and the dirtiest on our grid, losing 70% of its energy when converted into electricity. Wind, on the other hand, ends up being the most efficient source of energy, creating 1126% more electricity than the energy it took to create.

(https://blogs.wsj.com/numbers/what-is-the-most-efficient-source-of-electricity-1754/?ns=prod/accounts-wsj)

Continuation to Further Modelling

To further analyse the indicators of electricity price, there are various methods we could use to strengthen our model:

Include more data:

This could mean including data over a longer timeframe, or from other countries, provided we could also access or calculate the carbon intensity of their grids (this is often difficult to calculate).

Include more variables:

Other variables, such as global oil prices, natural gas prices and coal prices, would be obvious contenders to help predict price. On top of that, we could look into solar panel and wind turbine costs, and even venture into the granularity of localised weather data and predictions around electricity demand (large sporting events can often yield interesting results, although the Rugby World Cup didn’t).

Using Linear Regression as a basis to analyse electricity prices can prove a useful tool for the energy transition. If we want to move away from fossil fuel reliance, it is vital to shift our energy demand onto renewable and low carbon generation.

Analysis and prediction are the first step in getting us there, and there are companies like Nest that help to control vast numbers of houses' energy consumption to limit our carbon footprint.

This is especially important as we electrify our transport, and electricity consumption soars. If all this extra demand is supplied by coal rather than wind then the key benefits will be lost.
