Structural Time-Series Forecasting with TensorFlow Probability: Iron Ore Mine Production

Bayesian Structural Time Series Forecasting with TensorFlow Probability

Chris Price
Towards Data Science


Image: https://unsplash.com/photos/-subrrYxv8A

Introduction

Iron ore is one of the most heavily traded commodities in the world. As the primary input for the production of steel, it provides the foundation upon which the world’s largest metal market trades, and commands one of the largest shares of the global dry bulk trade.

Iron ore production, unsurprisingly, starts at the mine. For a trader, whether physical or financial, an understanding of the fundamental supply and demand dynamics of the iron ore market is essential. Variance in iron ore grade (quality) has a notable impact not only on spot and forward contract pricing, but also on the penalty charges mills levy for impurities. Imbalances in the fundamental supply/demand relationship can cause dramatic rises in the price of iron ore. Forecasting iron ore output from the largest exporting countries in order to predict global iron ore supply can therefore be very helpful when speculating on spot, futures and contaminant penalty price movements.

Problem

In this article, we are going to develop a model using TensorFlow Probability’s Structural Time Series (STS) framework to forecast the aggregate output of major iron ore mines in Brazil.

Brazil is the second largest exporter of iron ore globally. Major changes in supply from Brazil can have an effect on the price of iron ore, as noted above. Furthermore, Brazilian iron ore is typically very high-grade and low in impurities. As a result, the relative supply of ore from Brazil can also affect the penalty pricing of impurities charged by steel mills: if global supply is dominated by high-contaminant iron ore as a result of a shortage from Brazilian mines, the price penalty for contaminants can rise dramatically. Forecasting Brazilian output can therefore help us understand these dynamics.

The code used in this article follows similar logic to that outlined in the Structural Time Series modeling in TensorFlow Probability tutorial (Copyright 2019 The TensorFlow Authors).

Why TensorFlow Probability STS?

When approaching a time series forecast problem, investing time into understanding the complexity of the variable you wish to forecast is paramount. Stationarity, seasonality, distributions and exogenous feature relationships are but a handful of the many considerations to bear in mind before designing any model’s architecture.

Structural time series models (sometimes referred to as Bayesian Structural Time Series) are expressed as a sum of components such as trend, seasonal patterns, cycles and residuals:

f(t) = f1(t) + f2(t) + ... + fK(t) + eps,  eps ~ Normal(0, sigma^2)

These individual components are themselves time series, each defined by a structural assumption. The ability to configure each component makes TFP’s STS library particularly relevant to our forecasting problem, as it enables us to encode domain-specific knowledge, such as trader and mine operator expertise and known events, into our model.

How this helps with our problem

Mine production outputs typically exhibit systematic behaviour that can be modelled as a structural time series. In the context of our problem, we know that many mines close down for a two-week period in December for scheduled maintenance. This repeatable pattern can be added as a seasonal component to a structural time series model. We also know that iron ore and other open-cast mines exhibit distinct seasonal patterns that correlate strongly with precipitation: during periods of heavy rainfall, output falls as drainage pumps become overwhelmed.

What is particularly useful about structural time series models in the context of our problem is that they take a probabilistic approach to modelling a time series: they return a posterior predictive distribution from which we can sample, providing not only a forecast but also a means of quantifying model uncertainty.

Another exciting and highly useful feature of STS models is that the resulting model can be decomposed into a collection of its separate components. These components can then be plotted, giving us valuable insight into their respective effects on the dependent variable (y-hat), and a deeper understanding of the overall time series problem:

Example time series decomposition taken from https://blog.tensorflow.org/2019/03/structural-time-series-modeling-in.html

Prerequisites

Before we start, it is worth noting that TensorFlow Probability has a specific set of dependencies. Moreover, TensorFlow is not included as a dependency of the TensorFlow Probability library and will need to be installed separately:

pip install --upgrade tensorflow
pip install --upgrade tensorflow-probability

Alternatively, you can use Google’s Colaboratory (Colab), which kindly provides hosted runtimes (CPU, GPU and even TPU!) completely free of charge, subject to memory limits.

Structural Time Series Model

Let’s get started.

Data

We start by examining our mine loadings (output) data, whose observations are the total weekly output, in millions of metric tonnes, aggregated over all major Brazilian iron ore producers:

# Imports
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import tensorflow_probability as tfp
import tensorflow as tf
from statsmodels.tsa.seasonal import seasonal_decompose

# Load the weekly output data
df = pd.read_excel(
    '/content/bloomberg_weekly_io_output_brazil.xlsx',
    header=1,
    index_col=0,
    parse_dates=True)

# Plot loadings
df.plot(figsize=(16, 8))
plt.title('Weekly Iron Ore Output, Metric Tonnes')
plt.ylabel('Loadings, Mt')
Mine Loadings in Millions of Tonnes, Weekly

If we examine the observed output time series carefully, we can vaguely see some of the structural components mentioned earlier in the article that we can attempt to encode in an STS model:

  • A clear seasonal pattern. Judging by the amplitude and frequency of each cycle, it is reasonable to suggest that this is additive seasonality (defined below).
  • With the exception of late 2018/early 2019, a linear trend.
  • Around the Christmas period, a distinct drop in output when mines typically close for scheduled maintenance.
  • A degree of noise, possibly correlated with weather, strikes, equipment failures etc.

Time Series Decomposition

We can verify some of the aforementioned components by attempting to decompose the time series into its constituent parts. We will use statsmodels’ time series analysis library to perform the decomposition, choosing an ‘additive’ model for the seasonal component on the basis of the behaviour observed in our time series plot. For reference, an additive model estimates the series as:

y(t) = Trend(t) + Seasonality(t) + Residual(t)

Passing the ‘iron_ore_brazil’ Pandas Series to the seasonal_decompose function we imported from statsmodels yields the following plot:

tsa = seasonal_decompose(
    df['iron_ore_brazil'], model='additive'
).plot()
Iron Ore Output Tonnes — Seasonal Decomposition

Upon inspection, we can immediately identify the (mostly) linear trend and the seasonal component. Looking closely at the frequency and amplitude, we can see the mine output diminishing each year during the winter period.

Upon inspection of the residuals, it is immediately apparent that there are other relationships within this time series that cannot be explained by the seasonal and trend elements alone. The variance of the residuals, however, remains fairly constant and within a defined bound. This is useful information to bear in mind when defining our structural components.

“Aha!” I hear you cry at this point. “Why decompose the time series if that is the point of the STS model?” Well, for two reasons:

  1. It serves as a starting point for identifying and understanding the fundamental components of a time series: trend, seasonality, cycles and unexplained variance. This, in turn, informs how we configure the input parameters (priors) of the STS model components.
  2. TFP’s STS models are trained on data through Variational Inference (VI) or Hamiltonian Monte Carlo (HMC) methods:
# Fit model to observed data with HMC
tfp.sts.fit_with_hmc(
    model, observed_time_series, num_results=100, num_warmup_steps=50,
    num_leapfrog_steps=15, initial_state=None, initial_step_size=None,
    chain_batch_shape=(), num_variational_steps=150, variational_optimizer=None,
    variational_sample_size=5, seed=None, name=None
)

# Or, fit model to data with VI
tfp.sts.build_factored_surrogate_posterior(
    model, batch_shape=(), seed=None, name=None
)

Both of these methods are, generally speaking, computationally intensive on high-dimensional problems (particularly in the case of HMC) and quite sensitive to tuning. An informed choice of input parameters when configuring the STS components can therefore save time and resources and improve the accuracy of the posterior distribution.

Explaining the mathematics behind HMC and VI is beyond the scope of this article, but you can find out more on HMC here and VI here.

Define Our Structural Components

We can now define the various components to our STS model and configure them based on what we know about the data generating process. Let’s start with the seasonal component:

Seasonal:

Iron Ore Output (Tonnes): Seasonal Component

We define the seasonal inputs to our structural time series model as follows:

# Create train dataset
_train = df['iron_ore_brazil'][df.index < '2018-01-01']
_dates = _train.index

# Test data
test = df['iron_ore_brazil'][df.index >= '2018-01-01']

# TensorFlow requires an (N, 1) float tensor
train = _train.to_numpy().astype(np.float32).reshape(-1, 1)

# Seasonal effect 1: weekly cycle as identified in decomp.
weekly_cycle = tfp.sts.Seasonal(
    num_seasons=52,  # 52 weeks in a year
    observed_time_series=train,
    allow_drift=True,
    name='weekly_effect')

# Seasonal effect 2: month of year to capture winter drop in output.
monthly_effect = tfp.sts.Seasonal(
    num_seasons=12,  # 12 months in a year
    num_steps_per_season=4,  # assume 4 weeks in every month
    observed_time_series=train,
    name='month_of_year_effect')

A valuable feature offered by the tfp.sts.Seasonal model is the ability to add ‘drift’ to the seasonal effect. This parameter allows the effect of each season to evolve, or ‘drift’, from one occurrence to the next following a Gaussian random walk:

effect[occurrence] = effect[occurrence - 1] + Normal(0., drift_scale)

If we were confident of the mean and variance of the prior distribution of the seasonal component, we could configure it ourselves within the model:

tfd = tfp.distributions

monthly_effect = tfp.sts.Seasonal(
    num_seasons=12,  # 12 months in a year
    num_steps_per_season=4,  # assume 4 weeks in every month
    observed_time_series=train,
    drift_scale_prior=tfd.Normal(loc=1., scale=0.1),  # define priors
    initial_effect_prior=tfd.Normal(loc=0., scale=5.),
    name='month_of_year_effect')

For now, by setting allow_drift=True, we can let the model handle this for us.

Trend Component

Iron Ore Output (Tonnes) Trend Component

A visual inspection of our iron ore mine outputs shows (with the exception of late 2018/early 2019) a consistent linear trend. We can model this behaviour with a LocalLinearTrend model.

The local linear trend model represents a time series trend as a combination of some magnitude (level) and slope. Each of these elements evolve through time via a Gaussian random-walk:

level[t] = level[t-1] + slope[t-1] + Normal(0., level_scale)
slope[t] = slope[t-1] + Normal(0., slope_scale)

Implementation of the local linear trend component in our problem is quite straightforward:

# Add trend
trend = tfp.sts.LocalLinearTrend(
    observed_time_series=train,
    name='trend')

An important consideration when modelling the trend component is that the appropriate choice of model depends on the nature of the problem. In the case of our time series, the observed trend is relatively stable over time and evolves gradually, i.e. it does not display any strong nonlinear behaviour. Our choice of model is therefore a reasonable one; however, a local linear trend model can produce forecasts with very high uncertainty over longer forecast periods.

What is the alternative?

It is well known that most time series have an inherent temporal structure, whereby successive observations depend on the previous n observations in time, i.e. autocorrelation. A wise choice, therefore, might be to model the trend using TFP STS’s SemiLocalLinearTrend model. In a semi-local linear trend model, the slope component evolves according to a first-order autoregressive (AR) process. The AR process can therefore account for the autocorrelation (of order n) in the time series, and typically leads to forecasts with greater certainty over longer horizons. A minimal sketch follows.
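As a minimal sketch (under the same variable names as above; we do not swap it into the final model here), the change amounts to a single component:

# Alternative trend: the slope evolves as a first-order
# autoregressive process rather than a pure random walk.
trend_semi_local = tfp.sts.SemiLocalLinearTrend(
    observed_time_series=train,
    name='semi_local_trend')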

Residuals:

Iron Ore Output (Tonnes) Residual Component

As previously mentioned during our inspection of the seasonal decomposition plot, the residuals in our time series look relatively consistent, suggesting that they are potentially stationary, i.e. they maintain a constant variance over time and do not exhibit bias or heteroskedasticity. We can therefore represent the residual behaviour with an sts.Autoregressive model:

# Residuals
residuals = tfp.sts.Autoregressive(
    order=1,
    observed_time_series=train,
    coefficients_prior=None,
    level_scale_prior=None,
    initial_state_prior=None,
    name='residuals_autoregressive')

As with the other components, for a fully Bayesian approach one should specify the priors coefficients_prior, level_scale_prior and initial_state_prior. As we have not specified any, a TensorFlow Distributions (tfd) MultivariateNormalDiag instance is used as the default prior for the coefficients, and heuristic priors are constructed for the level scale and initial state based on the observed time series.
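For illustration, a hypothetical fully-specified version of this component might look like the following. The prior families and values here are illustrative assumptions only, not recommendations:

# Hypothetical informed priors (illustrative values only).
residuals_informed = tfp.sts.Autoregressive(
    order=1,
    observed_time_series=train,
    coefficients_prior=tfd.MultivariateNormalDiag(
        loc=[0.], scale_diag=[0.5]),  # prior over the AR(1) coefficient
    level_scale_prior=tfd.LogNormal(loc=-1., scale=1.),  # positive scale
    initial_state_prior=tfd.MultivariateNormalDiag(
        loc=[0.], scale_diag=[1.]),
    name='residuals_autoregressive_informed')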

Define Model

We can now define our structural time series model using the tfp.sts.Sum class. This class enables us to define the compositional specification of our structural time series model from the components we have defined above:

# Compose the full model from its components
loadings_model = tfp.sts.Sum(
    components=[
        trend,
        weekly_cycle,
        monthly_effect,
        residuals],
    observed_time_series=train)

Fit Model

We will now fit our model to the observed time series, namely our iron ore output. Unlike traditional forecasting architectures, such as linear regression models that estimate coefficients via maximum likelihood estimation, or, on the more powerful end of the scale, an LSTM that learns a function mapping a sequence of past observations to an output, an STS model learns a distribution, namely the posterior.

We are going to fit the model to the data and build a posterior predictive distribution using Variational Inference. Simply put, VI fits a set of approximate posterior distributions for the parameters of each model component we have defined, and optimises them by minimising a variational loss function, the negative Evidence Lower Bound (ELBO):

# VI posterior
variational_posteriors = tfp.sts.build_factored_surrogate_posterior(
    model=loadings_model)

# Build and optimise the variational loss function (ELBO).
@tf.function()
def train_sts_model():
    elbo_loss_curve = tfp.vi.fit_surrogate_posterior(
        target_log_prob_fn=loadings_model.joint_log_prob(
            observed_time_series=train),
        surrogate_posterior=variational_posteriors,
        optimizer=tf.optimizers.Adam(learning_rate=.1),
        num_steps=200)
    return elbo_loss_curve

# Plot the ELBO loss curve
elbo_loss_curve = train_sts_model()
plt.plot(elbo_loss_curve)
plt.show()
Plot of Evidence Lower Bound Optimisation (maximising the ELBO minimises the Kullback-Leibler divergence)
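For reference, this is standard variational inference rather than anything TFP-specific. The identity connecting the two quantities is:

log p(y) = ELBO(q) + KL( q(θ) || p(θ | y) )

Since log p(y) is fixed by the data, maximising the ELBO (i.e. minimising the negative ELBO loss plotted above) is equivalent to minimising the KL divergence between the surrogate posterior q(θ) and the true posterior.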

The astute among us might have noticed the decorator @tf.function(). This accepts a Python function, in this instance our model training function, as an argument and compiles it into a callable TensorFlow graph. An interesting introduction to how TensorFlow works and handles operations through the use of graphs can be found here.
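As a toy illustration (not from this article’s code), the decorated function is traced once per input signature, and subsequent calls execute the compiled graph:

@tf.function
def scaled_sum(x, scale):
    # Traced into a graph on first call; reused thereafter.
    return tf.reduce_sum(x) * scale

print(scaled_sum(tf.constant([1., 2., 3.]), tf.constant(2.)))
# tf.Tensor(12.0, shape=(), dtype=float32)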

Forecast

Now for the fun part. After checking that the loss function has converged, we can perform a forecast. We draw traces (samples) from the variational posterior and construct a forecast by passing them as an argument to tfp.sts.forecast(). Given our model, the observed time series and the sampled parameters, forecast() returns the predictive distribution over future observations for the desired number of forecast steps:

# Draw traces (parameter samples) from the variational posterior
traces = variational_posteriors.sample(50)

# Number of timesteps to forecast
n_forecast_steps = len(test)

# Build forecast distribution over future timesteps
forecast_distribution = tfp.sts.forecast(
    loadings_model,
    observed_time_series=train,
    parameter_samples=traces,
    num_steps_forecast=n_forecast_steps)

# Number of forecast samples to draw
num_samples = 50

# Assign vars corresponding to the forecast distribution
fcast_mu, fcast_scale, fcast_samples = (
    forecast_distribution.mean().numpy()[..., 0],
    forecast_distribution.stddev().numpy()[..., 0],
    forecast_distribution.sample(num_samples).numpy()[..., 0])

We can then visualise our forecast:
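The plotting code itself is not shown here; a minimal sketch, assuming the variables defined above, might look like this:

plt.figure(figsize=(16, 8))
plt.plot(_dates, _train, label='observed (train)')
plt.plot(test.index, test, label='observed (test)')
plt.plot(test.index, fcast_mu, label='forecast mean')
# Shade an approximate 95% interval from the forecast stddev
plt.fill_between(
    test.index,
    fcast_mu - 2 * fcast_scale,
    fcast_mu + 2 * fcast_scale,
    alpha=0.2, label='+/- 2 stddev')
plt.legend()
plt.show()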

Iron Ore Output (Tonnes) Forecast vs Observed

Validate Model Performance

At first glance, we can observe that the model (with the exception of the 2019 force majeure) has performed quite reasonably in forecasting our mine outputs. We can validate the accuracy of our forecast against the observed data points by calculating the mean absolute error. From a business perspective, however, given the scale of the label (millions of tonnes), it makes more sense to calculate and present the Mean Absolute Percentage Error (MAPE), as this is more interpretable and meaningful to non-technical stakeholders:

# Remember to check for division-by-zero errors first
print(np.mean(np.abs((test - fcast_mu) / test)) * 100)
>>> 13.92

At a MAPE of 13.92%, it is clear that our iron ore output model has done a reasonable job of modelling the data generating distribution, particularly given that we have not fully tuned the model’s parameters.

Validate Structural Component Contributions

We can further understand our model’s fit and verify the contribution of each structural component by decomposing the model into its respective component time series (again). We use tfp.sts.decompose_by_component to perform this operation, which returns a collections.OrderedDict mapping each component of the observation model to its posterior distribution:

# Dist. over individual component outputs from posterior (train)
component_dists = tfp.sts.decompose_by_component(
    loadings_model,
    observed_time_series=train,
    parameter_samples=traces)

# Same for the forecast.
forecast_component_dists = tfp.sts.decompose_forecast_by_component(
    loadings_model,
    forecast_dist=forecast_distribution,
    parameter_samples=traces)

After a bit of manipulation, we can plot the output of the decomposed time series to yield the following chart:
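That manipulation is not shown here; a hypothetical sketch (assuming the component distributions expose per-timestep means, as in the TFP tutorial) might be:

# One panel per structural component: train and forecast means.
fig, axes = plt.subplots(
    len(component_dists), 1, figsize=(16, 10), sharex=True)
for ax, (component, dist) in zip(axes, component_dists.items()):
    fcast_dist = forecast_component_dists[component]
    ax.plot(_dates, dist.mean().numpy(), label='train')
    ax.plot(test.index, fcast_dist.mean().numpy(), label='forecast')
    ax.set_title(component.name)
    ax.legend()
plt.tight_layout()
plt.show()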

Iron Ore Structural Time Series Decomposition by Component, Training & Forecast Distributions

Observations, Criticisms & Further Analysis

Examination of our decomposed model raises some points for further analysis:

  1. The mean and standard deviation of our forecast posterior distributions provide us with the marginal uncertainty at each time step. With this in mind, we can observe that our model is confident of the trend component’s contribution to the time series, exhibiting a stable profile up until 2018, where model uncertainty begins to compound. To counter this, we could improve the accuracy of our trend component by modelling it as a SemiLocalLinearTrend model, which allows the slope component to evolve as an autoregressive process.
  2. Our model is confident of the weekly and monthly seasonal components’ contributions, exhibiting a consistent output at each time step, which suggests our configuration is correct. Whilst consistent, the uncertainty for our monthly component is significantly higher. We could potentially improve these components with the addition of informed priors, e.g. initial_state_prior.
  3. Interestingly, our autoregressive component contributes very little. Our model is also reasonably confident about this, suggesting that the residuals are not well explained by an AR process. We could explore adding external features, such as local weather/precipitation patterns and iron ore spot prices, as linear covariates using sts.LinearRegression (see the sketch after this list).
  4. We might achieve a more accurate posterior by fitting the model to the data with a Markov chain Monte Carlo (MCMC) method, i.e. tfp.sts.fit_with_hmc(), as opposed to using VI.
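Expanding on point 3, a hypothetical sketch of adding an exogenous covariate as a linear regression component follows. The precipitation series here is stand-in random data, and a real design matrix would need to span both the training and forecast windows:

# Hypothetical exogenous feature: weekly precipitation (random
# stand-in data; replace with a real series aligned to the index).
precip = np.random.randn(
    len(train) + n_forecast_steps, 1).astype(np.float32)

precip_effect = tfp.sts.LinearRegression(
    design_matrix=precip,
    name='precipitation_effect')

# Recompose the model with the extra component.
loadings_model_ext = tfp.sts.Sum(
    components=[trend, weekly_cycle, monthly_effect,
                precip_effect, residuals],
    observed_time_series=train)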

Conclusion

In this article, we have seen how we can develop a Bayesian Structural Time Series model using TensorFlow Probability’s Structural Time Series library. Additionally, we explored:

  • How to define and configure structural time series model components.
  • How to train an STS model to build a posterior predictive distribution.
  • How to sample from the posterior distribution in order to perform a time series forecast.
  • How to decompose a structural time series model to examine the individual contributions of its respective components.
  • A brief look at some alternative ways of improving our STS model with some of the other awesome models offered by TensorFlow Probability STS.

Thank you for reading!
