Forecast Model Tuning with Additional Regressors in Prophet

How to add regressors to a Prophet model to improve forecast accuracy

Andrej Baranovskij
Towards Data Science


Use case — bike rental (picture source: Pixabay)

I’m going to share the results of my experiment with additional regressors in Prophet. My goal was to check how much weight an extra regressor carries in the forecast calculated by Prophet.

I used the Bike Sharing in Washington D.C. dataset from Kaggle. The data contains the number of bike rentals per day along with the weather conditions. I created and compared three models:

  1. A time series Prophet model with the date and the number of bike rentals
  2. A model with one additional regressor: weather temperature
  3. A model with two additional regressors: weather temperature and weather condition (raining, sunny, etc.)

Comparing these three models should show the effect of each regressor.

The forecast is calculated for ten future days. The last day available in the dataset is 2012-12-31, so the forecast starts on 2013-01-01.

A. Model without additional regressors

Forecast values for bike rentals, starting from 2013-01-01:

Bike rentals forecast for the next ten days

B. Model with one additional regressor: weather temperature

A regressor value must be known for both the past and the future; this is how it helps Prophet adjust the forecast. The future value must either be predefined and known (for example, a specific event happening on certain dates) or be forecasted elsewhere. In our case, we use the weather temperature forecast, which we can get from a weather channel.

Regressor values must be in the same data frame as the time series data. I copied the date column and set the copy as the index:

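import pandas as pd

d_df = pd.read_csv('weather_day.csv')
d_df = d_df[['dteday', 'cnt', 'temp']].dropna()
# Keep a datetime copy of the date column as the index for date-based lookups
d_df['date_index'] = pd.to_datetime(d_df['dteday'])
d_df = d_df.set_index('date_index')
# Prophet expects the date column to be named 'ds' and the target column 'y'
d_df = d_df.rename(columns={'dteday': 'ds', 'cnt': 'y'})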

We need to construct a future data frame for ten days: a Pandas data frame covering ten days from 2013-01-01, with each element initialized to the temperature forecast (normalized value):

# Min-max normalization of a forecast temperature (min_t and max_t are the scaling bounds)
t = 13
min_t = -8
max_t = 39
n_t = (t - min_t) / (max_t - min_t)
print(n_t)

# Normalized temperature forecasts for the ten future days, indexed by date
future_temps = [0.319148, 0.255319, 0.234042, 0.319148, 0.340425,
                0.404255, 0.361702, 0.404255, 0.425531, 0.446808]
future_range = pd.date_range('2013-01-01', periods=10, freq='D', name='future_date')
future_temp_df = pd.DataFrame({'future_temp': future_temps}, index=future_range)
future_temp_df.tail(10)
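
The normalization above can be wrapped in a small helper so that a raw forecast in degrees Celsius is converted in one step. This is only a sketch: the function names normalize_temp and denormalize_temp are introduced here for illustration, and the bounds of -8 and 39 are taken from the snippet above. The Celsius values in the list are the ones that reproduce the ten normalized values above, up to rounding in the last digit.

```python
def normalize_temp(t_celsius, min_t=-8.0, max_t=39.0):
    """Scale a temperature in degrees Celsius to the 0..1 range (min-max scaling)."""
    return (t_celsius - min_t) / (max_t - min_t)


def denormalize_temp(n_t, min_t=-8.0, max_t=39.0):
    """Inverse of normalize_temp: map a 0..1 value back to degrees Celsius."""
    return n_t * (max_t - min_t) + min_t


# Forecast temperatures for 2013-01-01 .. 2013-01-10, in degrees Celsius
forecast_celsius = [7, 4, 3, 7, 8, 11, 9, 11, 12, 13]
normalized = [round(normalize_temp(t), 6) for t in forecast_celsius]
print(normalized)
```

Keeping the conversion in one place avoids typing normalized values by hand when the weather forecast changes.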

With the code below, I add a regressor for weather temperature. If a date falls into the training set, the function returns the temperature from the training set; otherwise it returns the value from the future forecast data frame (the one constructed above). The forecast period is set to ten days into the future. The Prophet model is fitted with the fit function, and the predict function is called to calculate the forecast:

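from prophet import Prophet  # the package was named 'fbprophet' before release 1.0

def weather_temp(ds):
    # Look up the temperature for a date: training data first, forecast otherwise
    date = pd.to_datetime(ds)

    if d_df.loc[date:].empty:
        # The date lies beyond the training data, so use the forecast temperature
        return future_temp_df.loc[date:]['future_temp'].values[0]
    else:
        return d_df.loc[date:]['temp'].values[0]

m = Prophet()
m.add_regressor('temp')
m.fit(d_df)  # d_df needs Prophet's 'ds' (date) and 'y' (target) columns
future = m.make_future_dataframe(periods=10)
future['temp'] = future['ds'].apply(weather_temp)
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(15)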
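
The lookup rule used above (training value if the date is in the history, forecast value otherwise) can be shown in isolation with plain dictionaries. This is a simplified sketch and not the code from the experiment: the function name regressor_value is introduced here, the two history values are placeholders, and only the two future values come from the forecast table above. Unlike the slicing approach, this version raises an error when a date is missing from both sources, which makes gaps in the regressor visible before they reach the model.

```python
from datetime import date


def regressor_value(day, history, future):
    """Return the regressor value for `day`, preferring history over the forecast."""
    if day in history:
        return history[day]
    if day in future:
        return future[day]
    raise KeyError(f"no regressor value available for {day}")


# Placeholder training values (not taken from the dataset)
history_temp = {date(2012, 12, 30): 0.25, date(2012, 12, 31): 0.22}
# First two forecast values from the future data frame
future_temp = {date(2013, 1, 1): 0.319148, date(2013, 1, 2): 0.255319}

for day in (date(2012, 12, 31), date(2013, 1, 1)):
    print(day, regressor_value(day, history_temp, future_temp))
```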

Forecast values for bike rentals with the additional temperature regressor, starting from 2013-01-01. The temperature forecast is favorable: warm weather for January is expected, which pushes the bike rental numbers higher (naturally, there should be more rentals when the weather is warm):

Bike rentals forecast for the next ten days, with temperature regressor

C. Model with two additional regressors: weather temperature and condition

The dataset contains quite a few attributes that describe the weather; using more of them could help calculate a more accurate forecast for bike rentals. I will show how adding one more regressor changes the forecast.

# Assumes d_df was loaded with the 'weathersit' column in addition to 'temp'.
# Test scenario: bad weather (weathersit = 4) on all ten future days
future_temp_df['future_weathersit'] = 4

def weather_temp(ds):
    # Temperature for a date: training data first, forecast otherwise
    date = pd.to_datetime(ds)

    if d_df.loc[date:].empty:
        return future_temp_df.loc[date:]['future_temp'].values[0]
    else:
        return d_df.loc[date:]['temp'].values[0]

def weather_condition(ds):
    # Weather condition for a date: training data first, forecast otherwise
    date = pd.to_datetime(ds)

    if d_df.loc[date:].empty:
        return future_temp_df.loc[date:]['future_weathersit'].values[0]
    else:
        return d_df.loc[date:]['weathersit'].values[0]

m = Prophet()
m.add_regressor('temp')
m.add_regressor('weathersit')
m.fit(d_df)
future = m.make_future_dataframe(periods=10)
future['temp'] = future['ds'].apply(weather_temp)
future['weathersit'] = future['ds'].apply(weather_condition)
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(15)

The code above adds a second regressor, weather condition (see the weathersit attribute in the dataset), along with the weather temperature regressor. For test purposes, I set the weather condition to 4 (which means bad weather) for all ten future days. Even though the January temperature is increasing (which is good for bike rentals), the overall weather is bad (which should decrease the number of bike rentals).

With the second regressor (pointing to expected bad weather), Prophet returns smaller expected bike rental numbers:

Bike rentals forecast for the next ten days, with temperature regressor and weather condition
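
To see why a second regressor can pull the forecast down while the first one pushes it up, it helps to remember that Prophet adds each extra regressor as an additional linear term by default. The toy calculation below only illustrates that additive idea. The baseline and both coefficients are made-up numbers, not values fitted by Prophet or taken from this experiment, and the sketch ignores the standardization Prophet applies to regressors internally.

```python
def additive_forecast(baseline, regressor_values, coefficients):
    """Baseline forecast plus one linear term per extra regressor."""
    return baseline + sum(
        coefficients[name] * value for name, value in regressor_values.items()
    )


# Hypothetical numbers, chosen only to illustrate the direction of each effect
baseline = 2000.0
coefficients = {"temp": 3000.0, "weathersit": -500.0}

# Same mild temperature on both days; only the weather condition differs
clear_day = additive_forecast(baseline, {"temp": 0.45, "weathersit": 1}, coefficients)
bad_day = additive_forecast(baseline, {"temp": 0.45, "weathersit": 4}, coefficients)
print(clear_day, bad_day)
```

With a negative coefficient on the weather condition, a value of 4 lowers the result even though the temperature term is unchanged, which is the pattern seen in the forecast above.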

Summary: The additional regressors feature is very important for accurate forecast calculation in Prophet. It helps tune how the forecast is constructed and makes the prediction process more transparent. A regressor must be a variable that was known in the past and is known (or separately forecasted) for the future.
