
When I first saw a time series forecasting problem I was very confused. Until that moment, I had only done supervised learning on tabular data, so I didn’t know how to produce forecasts when I didn’t have the target values. Many of you may have faced the same problem, so in this post I want to introduce a very powerful way of solving time series forecasting problems using supervised machine learning models instead of statistical models such as ARIMA, ARMA, MA, AR…
I decided to write about the machine learning approach to solving time series problems because I believe these models are very versatile and powerful, and much more beginner-friendly than other statistical approaches.
The code is available in the link below:
towards-data-science-posts-notebooks/Bike Sharing Demand Prediction.ipynb at master ·…
Dataset
We are going to use Kaggle’s Bike Sharing Demand competition dataset because it suits this tutorial perfectly. You can download the data and read about it in the link below:
Time Series Analysis
Before using any model, it’s important to do some time series analysis to understand the data. In this step we will check the variable types, look for seasonalities, check whether the series is autoregressive, etc.
First of all, let’s visualize the data:
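The exact code from the notebook isn’t embedded here, so the snippet below is a minimal sketch, assuming the competition’s train.csv has been downloaded to the working directory:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the Kaggle Bike Sharing Demand training set
# (assumes the competition's train.csv is in the working directory)
df = pd.read_csv("train.csv")

print(df.shape)  # (10886, 12)
print(df.head())

# Plot the hourly demand to get a first feel for the series
df["count"].plot(figsize=(15, 5), title="Hourly bike sharing demand")
plt.show()
```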

If we look at the screenshot above, we can see that the dataframe is 10886 rows long and 12 columns wide. The time series has an hourly frequency and our target variable will be the count column. This column is the sum of the casual and registered columns, but to keep the tutorial simple we’ll remove those two columns later and predict only count. If you want to understand the different variables of the data better, you can check Kaggle’s link above and read about the bike sharing demand competition’s dataset.
Now let’s check the dataframe’s variable types:
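A quick way to do that with pandas (a sketch; the notebook may use a different call):

```python
# Check the type of every column in the dataframe
print(df.dtypes)

# df.info() also shows non-null counts alongside the dtypes
df.info()
```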

All the dataframe’s variable types are correct except for the datetime column. Its type should be pandas’ datetime instead of object; we’ll change it later.
Let’s see some statistical data about our dataframe’s columns:
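These summary statistics come from pandas’ describe method, along these lines:

```python
# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
df.describe()
```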

The statistics shown in the screenshot can be useful for extracting some insights from our data. Let’s continue the analysis by looking for seasonalities and trends.
We can easily look for seasonalities using statsmodels’ seasonal_decompose function. This function will decompose our time series into trend, seasonality and noise:
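The notebook wraps seasonal_decompose in a small helper; below is a sketch of what such a function could look like (the name decompose_series and the plotting details are assumptions):

```python
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

def decompose_series(series, period):
    """Decompose a series into trend, seasonality and residual (noise) and plot the result."""
    decomposition = seasonal_decompose(series, model="additive", period=period)
    fig = decomposition.plot()
    fig.set_size_inches(15, 8)
    plt.show()
    return decomposition
```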
Now, we are going to use the custom function above to decompose 1000 hours of our time series with a daily seasonality (period=24 hours):
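With the sketch above, the call could look like this:

```python
# Decompose the first 1000 hours with a daily seasonality (period = 24 hours)
decompose_series(df["count"][:1000], period=24)
```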

There we go! We can extract a lot of insights from the graphs above. If we look closely, we can see a clear daily seasonal pattern with two peaks and a valley between them. Despite this pattern, there is still a lot of noise that is not explained by the daily seasonality, so we will try to model this noise using other variables in the dataset and some feature engineering. But before that, let’s see whether our data is autoregressive:
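One common way to check this is statsmodels’ plot_acf; a sketch (the number of lags shown here is an assumption):

```python
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

# Autocorrelation of the hourly demand up to 3 days back (72 lags)
plot_acf(df["count"], lags=72)
plt.show()
```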

After plotting this autocorrelation graph, we can say with high confidence that our data is autoregressive and that we can improve our model’s performance using lags. In other words, the bike sharing demand can be explained by the values from previous hours and days.
Time Series Forecasting
After understanding the data and getting some insights, we’re ready to start modelling and forecasting the hourly bike sharing demand. In this post, we are going to forecast one week of bike sharing demand. Since a week has 7 days and each day has 24 hours, we are going to predict the bike sharing demand for the next 168 hours.
We’re going to use Microsoft’s Light Gradient Boosting Machine (LightGBM) model, which often beats standard Extreme Gradient Boosting (XGBoost) in training speed and sometimes in accuracy. Even though I use this model here, you can use whatever model you want, whether a scikit-learn regressor or something else.
In this approach, we will extract new features from the timestamp and use these new features as regressors for predicting the demand over the forecasting horizon:
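A sketch of this step, assuming simple calendar features extracted with pandas’ dt accessor (the exact feature set in the notebook may differ):

```python
# Convert the datetime column from object to pandas datetime, as noted earlier
df["datetime"] = pd.to_datetime(df["datetime"])

# Drop casual and registered: count = casual + registered, and count is our only target
df = df.drop(columns=["casual", "registered"])

# Extract calendar features from the timestamp
df["hour"] = df["datetime"].dt.hour
df["day"] = df["datetime"].dt.day
df["month"] = df["datetime"].dt.month
df["year"] = df["datetime"].dt.year
df["dayofweek"] = df["datetime"].dt.dayofweek
```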
Let’s see the result of this feature engineering process:

Once we have our dataset with the regressors we are going to use, let’s build a custom function for predicting our horizon:
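Below is a minimal sketch of such a function, assuming a single LightGBM regressor trained on everything except the last 168 hours; the function name, parameters and defaults are assumptions, not the notebook’s exact code:

```python
import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

def train_and_evaluate(data, target="count", horizon=168):
    """Train a LightGBM regressor on all but the last `horizon` hours
    and evaluate the forecast on that held-out horizon."""
    features = [c for c in data.columns if c not in (target, "datetime")]

    train, test = data.iloc[:-horizon], data.iloc[-horizon:]

    model = lgb.LGBMRegressor()
    model.fit(train[features], train[target])

    predictions = model.predict(test[features])
    mae = mean_absolute_error(test[target], predictions)
    print(f"MAE over the last {horizon} hours: {mae:.1f}")
    return model, predictions
```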
Without Lags
Now that we have defined our custom function, let’s use it and check the model’s results:
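With the sketch above, that’s just:

```python
# Train on everything except the last week and evaluate on the last 168 hours
model, predictions = train_and_evaluate(df)
```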

Without using any lagged variables, we got an MAE of 53. That’s not bad at all!

If we check the importance of the variables according to our model, the hour and the day seem to be quite important, so we can say that the features created by our feature engineering process are very helpful.
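The importance plot in the screenshot can be reproduced with LightGBM’s built-in helper, roughly like this (using the model returned by the sketch above):

```python
import lightgbm as lgb

# Plot how often each feature is used for splits in the trained model
lgb.plot_importance(model, figsize=(10, 6))
```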
But, can we create more features and improve our performance even more?
With Lags
As we said before, the data seems to be strongly autocorrelated, so let’s try adding lags and see whether this new feature improves the model’s performance:
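A sketch of the lag creation, using the same-hour-last-week feature that shows up in the importances later (count_prev_week_same_hour):

```python
# Demand observed at the same hour one week earlier (168 hours back)
df["count_prev_week_same_hour"] = df["count"].shift(168)

# The first week has no previous-week value, so drop those rows
df = df.dropna(subset=["count_prev_week_same_hour"])
```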
If we look at the code above, we can see how pandas’ shift function helps a lot when creating lagged features. Let’s see whether this new feature improves our model’s performance:
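Re-running the evaluation sketch on the augmented dataframe:

```python
# Same evaluation as before, now with the lagged feature included as a regressor
model_lags, predictions_lags = train_and_evaluate(df)
```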

Wow! The model’s accuracy has improved a lot just by adding a new lag feature. The mean absolute error went from 53 to 32. That’s about a 40% improvement compared with the model without lagged features!

If we look closely at the variable importances, the lagged feature (count_prev_week_same_hour) seems to be very useful for predicting our target. Feature engineering is great!
Conclusion
As we saw in this post, supervised machine learning models are very versatile and, in some cases, can even outperform other statistical approaches for time series forecasting. That said, they require some feature engineering to work well; otherwise, their performance will be poor.
If you like my content you can check my other posts too:
Unleash the Power of Scikit-learn’s Pipelines
References
Welcome to LightGBM’s documentation! – LightGBM 3.3.1.99 documentation