Can We Use Machine Learning To Forecast Oil Prices During The 2020 Collapse?

Using the brute force of machine learning to predict the recovery of oil prices after the COVID-19 outbreak

Noah Mukhtar
Towards Data Science

A Sensitive Commodity

Photo by Science in HD on Unsplash

Fundamentals of Oil Pricing

Oil is a commodity notorious for reversing direction completely after a single market event.

This is because the fundamentals of oil prices are rarely based on real-time data. Instead, they are driven by externalities, which makes any attempt to forecast them all the more challenging.

2020: A Year Of Ups & Downs

COVID-19

In 2020, COVID-19’s repercussions acted as a reminder of how unpredictable and sensitive oil prices are relative to external shocks.

Early 2020

At the beginning of the year, oil prices were soaring because of the OPEC-led supply cuts, U.S. sanctions on multiple major oil exporters, and escalating tensions in Libya.

Mid-2020

However, all of that took a major turn when COVID-19 called the health of the global economy into question. To make matters worse, industry experts now believe it is “virtually impossible” to forecast the price of oil with confidence.

What is more confusing is that presidents have been preaching the virtues of cheap oil for decades, including President Trump himself just a month ago.

Donald Trump Tweet Praising Low Oil Prices — Mar 9, 2020
Donald Trump Praising High Oil Prices — Apr 2, 2020

However, Trump is now doing whatever it takes to push prices back up, including posting a tweet that caused oil prices to temporarily soar by 25%, the biggest one-day gain in recorded history.

OPEC Deal — April 9th, 2020

The historic OPEC deal to cut production by 10% has only worked to stem the damage that is still being done to the market. Oil and gas producers are still cutting their dividends and capital spending in efforts to protect their balance sheets in the face of escalating financial losses.

Why Should We Care About Oil Prices?

We are building this model because the health of the economy is so closely linked to oil prices. Whenever oil prices deviate even slightly from the norm, the economy is affected drastically, as the parallel movements on Google Trends show.

(Closely Intertwined Economy & Oil Prices — Google Trends, 2020)

Time Series Analysis

Time series analysis is an insightful way to look at how a commodity’s price changes over time. However, we need to go a step further and build a forecasting model using ARIMA, a statistical forecasting technique widely used in machine learning workflows.

What is ARIMA?

An autoregressive integrated moving average (ARIMA) model is a form of regression analysis that predicts future values from the differences between consecutive values in the series, rather than from the actual values themselves.
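
For reference, the textbook form of an ARIMA(p, d, q) model can be written as below. This is the standard notation rather than anything taken from the article's code: L is the lag operator, y'_t is the series after differencing d times, the φ terms are the autoregressive coefficients, the θ terms are the moving average coefficients, c is a constant, and ε_t is white noise.

```latex
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t,
\qquad y'_t = (1 - L)^d \, y_t
```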

It is a good time to apply this algorithm: given how recent the OPEC deal is, we do not expect any more major historic deals anytime soon.

Forecast Period

Timeframe 1: 20th April 2020–1st October 2020

(COVID-19 Statistics — Google News Apr 18, 2020)

The first timeframe we are forecasting runs from the 20th of April 2020 to the 1st of October 2020, making up almost half of the year.

The rationale is that we have not yet reached the global peak of COVID-19. Based on the timelines of previous pandemics, this gives us reasonable assurance that a fully COVID-19-free market cannot exist within a mere five and a half months, which makes this our more accurate forecast.

Timeframe 2: 20th April 2020–1st January 2025

This timeframe acts as our prediction of the recovery time: how long it will take for oil prices to return to the high $50s they traded at before the 2020 crash.

Photo by Erik Mclean on Unsplash

Dataset

Our dataset is sourced from the U.S. Energy Information Administration and contains roughly 33 years’ worth of daily historical Brent oil prices, from the 17th of May 1987 to the 17th of April 2020, meaning it includes a week of oil price movements after the recent OPEC deal.

Training

After preprocessing the data, we found that training the model only on data from 2000 onwards yields a higher level of accuracy.

Training Dataset — Oil Prices [2000–2020]
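
Restricting the training window is a one-line slice once the prices sit in a date-indexed pandas Series. The sketch below is only illustrative: the column names `Date` and `Price` and the four rows of made-up values are assumptions standing in for the real EIA file.

```python
import io

import pandas as pd

# Tiny stand-in for the EIA Brent file. The column names and the
# values are made up for illustration; the real file may differ.
raw_csv = io.StringIO(
    "Date,Price\n"
    "1999-12-30,25.1\n"
    "2000-01-04,23.9\n"
    "2010-06-15,75.2\n"
    "2020-04-17,28.1\n"
)

prices = pd.read_csv(raw_csv, parse_dates=["Date"], index_col="Date")["Price"]

# Keep only observations from 1 January 2000 onwards.
train = prices.loc["2000-01-01":]
print(train)
```

Slicing by a date string works here because the index is a sorted `DatetimeIndex`.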

Statistical Fine-Tuning of Model

Differencing in ARIMA

The entire point of differencing is to make the time series stationary. Each round of differencing replaces the series with the change between today’s value and yesterday’s, and we repeat it until the statistical properties of the series (such as its mean and variance) are constant over time.
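
As a small illustration of what differencing does, the sketch below applies first and second differences to a short, made-up price array with NumPy. The numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical prices with an accelerating upward trend (not real data).
prices = np.array([20.0, 22.0, 25.0, 29.0, 34.0])

# First difference (d = 1): today's price minus yesterday's.
first_diff = np.diff(prices, n=1)

# Second difference (d = 2): the difference of the first differences.
second_diff = np.diff(prices, n=2)

print(first_diff)   # [2. 3. 4. 5.]
print(second_diff)  # [1. 1. 1.]
```

In this toy series the first difference still trends upward, while the second difference is constant. Each round of differencing shortens the series by one observation.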

Testing If We Have a Stationary Time Series

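We run an Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series is non-stationary. If the p-value is greater than 0.05, which was true in our case at 0.297299, we cannot reject that hypothesis, so we go ahead with differencing.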

Visual Representation of Non-Stationary Time Series — Oil Prices [2019/05–2020/05]

After running three tests, we are reassured that a differencing order (d) of 1 is the most appropriate.

Next, we determine whether the model requires AR terms by inspecting the Partial Autocorrelation Function (PACF) plot, which shows the correlation between the series and each of its lags after removing the effect of the shorter lags in between.

PACF: 1st and 2nd Differencing Autocorrelation

Then, we find the order of the Moving Average term q, which captures the error of the lagged forecast, by looking at the Autocorrelation Function (ACF) plot to see how many MA terms are required to remove any autocorrelation in the stationarized series.

Autocorrelation 1st and 2nd differencing

After testing, we decide to set q to 1. Together with p = 1 and the differencing order d = 1 chosen above, this gives the three parameters (p, d, q) as (1, 1, 1).

Model’s Forecasts:

20th April 2020–1st October 2020 Forecast

Timeframe: Expected Peak of COVID-19 [± Next 5.5 months]

Forecasted Average Price of Brent Oil: $27.79*

However, the US election falls about a month after the end of our forecast period. We have therefore decided to provide two more scenarios based on the volatility an election can bring to the economy and oil prices.

The Wildcard: Texas

Photo by Matthew T Rader on Unsplash

Texas produces more oil than every OPEC nation apart from Saudi Arabia, but right now it is getting crushed by cheap oil.

This means two things:

  1. Trump needs high oil prices to win Texas’s vote and secure re-election
  2. There is the possibility of Texas limiting its output for the first time in more than 40 years to further raise oil prices

In essence, both points give us reason to expect more spikes in oil prices in the foreseeable future, so our forecast may see more ups and downs than we anticipated. With the way things are going now, however, prices may lean towards the optimistic estimate, even if only temporarily.

Conservative Estimate — Average Price of Brent Oil: $23.62

Optimistic Estimate — Average Price of Brent Oil: $34.74*

2020–2024 Forecast

Regardless of the spikes, our model does a good job of predicting the general movement, giving us a decent indicator of the expected time-frame of the recovery of oil prices.

ARIMA Forecast — Oil Prices [2020–2024]
ARIMA Forecast — Oil Prices Per Period [1: Q1/Q2 vs. 2: Q3/Q4]

Breakdown of Results

Oil prices used to have somewhat predictable seasonal swings, with a spike in the spring, and then a drop in the fall and winter.

However, there are 4 main factors that may cause future oil prices to further deviate from our forecasts:

1) Slowing Global Demand

Global oil demand in April 2020 is around 17 million barrels per day lower than the 2019 annual average, which is the largest drop in history.

2) Rising U.S. Oil Production

In 2018, the US became the world’s largest oil producer, and a year later it exported more oil than it imported for the first time since 1948.

3) Diminished OPEC Clout

OPEC has not cut output enough to put a defined floor under prices.

4) Rising Dollar Value

Foreign exchange traders have been driving up the value of the dollar since 2014. Because oil transactions are paid in U.S. dollars, a 25% rise in the dollar largely offsets a 25% drop in oil prices. Global economic uncertainty keeps the U.S. dollar strong, which is where we are headed.
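
The offsetting effect can be checked with simple arithmetic. The sketch below assumes the price is viewed in a non-dollar currency, so the two percentage changes compound multiplicatively. The function name `local_price_change` is hypothetical and introduced only for this example.

```python
def local_price_change(oil_change_pct, dollar_change_pct):
    """Percent change in the oil price expressed in a non-dollar currency.

    The dollar price of oil and the value of the dollar compound
    multiplicatively, so the two percentages do not simply add up.
    """
    oil_factor = 1 + oil_change_pct / 100
    dollar_factor = 1 + dollar_change_pct / 100
    return (oil_factor * dollar_factor - 1) * 100


net = local_price_change(oil_change_pct=-25, dollar_change_pct=25)
print(f"{net:.2f}%")  # -6.25%
```

Under these assumptions the offset is close but not exact: a 25% drop in the dollar price of oil combined with a 25% stronger dollar leaves the price about 6% lower in the other currency.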

Recovery Time

Photo by M. B. M. on Unsplash

In conclusion, we believe that it will take around 4 years, give or take a year based on externalities, for oil prices to recover to what they were before the 2020 crash.

LinkedIn

GitHub Code

Senior Consultant, Data Science at Deloitte | Masters in Analytics — McGill University | CFA Level 2 Candidate | https://www.linkedin.com/in/nmukhtar/