Implementing Facebook Prophet efficiently

Ruan van der Merwe
Towards Data Science
12 min read · Nov 14, 2018


If you have ever worked with time series predictions, I am quite sure you are well aware of the strains and pains that come with them. One moment you think you have cracked the stock market, the next you are lying in the bath crying and cursing your inaccurate models (I really don’t recommend that you try to predict the stock market; you will most likely not reap the benefits you think you will). What I am trying to say is that time series predictions are difficult and have always required a very specialized data scientist to implement them.


However, our blue overlords, namely Facebook, released an amazing model called Facebook Prophet. Prophet makes it possible for almost anyone to predict time series values, even with very little to no experience in this field. In most cases it works fine out of the box, and your data analyst will be able to tell quite accurate stories with the output. But when you want to start using the model in production, you have to understand more deeply what is going on, as well as know which parameters require tuning and why.

To show you how one can tune parameters and why they should be tuned, we will experiment on a simple dataframe with y and ds columns (the format Prophet uses). The data spans two years at an hourly resolution, which already requires some extra tweaking beyond the out-of-the-box model.

[Figure: the time series we want to predict]
[Figure: zooming in on the time series]

Before we get started, it is assumed that you have a relatively good understanding of Facebook Prophet. If you don’t, I highly recommend you follow the link below to the Prophet quick start. If you do not, some of the code will not make sense, as I will not be explaining how to import or fit data to Prophet, only how to optimize it.

Prophet Quick Start

Before we get to the fun part, a note on the goal of this blog: I only want to show which hyperparameters to be on the lookout for, as well as give some tips I have picked up along the way. Nothing I say is law, and I invite you to share comments and recommendations on my work.

How can we measure the performance of our model?

One of the many hidden gems that is not so obvious when first using Prophet is its built-in cross-validation function. This function takes your data and trains the model on a period you specify, then predicts a horizon that you also specify. Prophet then trains on a bigger period, predicts again, and repeats this until the end of the data is reached.

As an example, let’s take the data shown above and fit the out-of-the-box Prophet model to it. In the example, I have written my own mean absolute percentage error (MAPE) function and applied it to the test results, so the model’s performance is summarized in a single number.
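Since the original snippet is not reproduced here, below is a minimal sketch of what it might look like. The initial, period and horizon values are illustrative assumptions, and df is assumed to be the hourly ds/y dataframe (at the time of writing the package was named fbprophet; newer releases use prophet).

```python
import numpy as np
from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation

# df: our hourly dataframe with 'ds' (timestamp) and 'y' (value) columns
m = Prophet()
m.fit(df)

# Train on an initial window, predict a horizon, slide the cutoff
# forward, and repeat until the end of the data is reached.
df_cv = cross_validation(m, initial='365 days', period='60 days',
                         horizon='30 days')

def mape(df_cv):
    """Mean absolute percentage error over all predicted points."""
    return np.mean(np.abs((df_cv['y'] - df_cv['yhat']) / df_cv['y'])) * 100

print('MAPE: {:.2f}%'.format(mape(df_cv)))
```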

As you can see from the code above, we import the cross validation function from Prophet and apply it to our data. This returns a dataframe of the predicted and true values for all our predicted points, which can then be used to calculate the error.

So after running the code above, we get a MAPE of 15.32%. This indicates that, averaged over all the points predicted, we are off by 15.32% from the true value.

For the rest of the blog, I will discuss each hyperparameter and what you should keep an eye on when deciding its value.

Optimizing the model

Don’t do anything blindly

The first tip for optimizing a model is not to go crazy changing values, creating huge for loops, or throwing your model into a grid search to optimize hyperparameters. This is especially true for time series models: the hyperparameters have a great effect on the predictions, but their values don’t have to be perfect. Rather, getting them into the correct zone is what will bring the greatest reward. You accomplish this by looking at the data and understanding what your model will do with it when you change each parameter.

Growth

This parameter is the easiest to understand and implement, as you only have to plot your data to know what it should be. If you plot your data and see a trend that keeps on growing with no real saturation in sight (or if your domain expert tells you there is no saturation to worry about), you will set this parameter to “linear”.

If you plot it and you see a curve showing promise of saturation (or if you are working with values that you know must saturate, for example CPU usage), then you will set it to “logistic”.

The difficult part of this parameter comes when you choose logistic growth, as you then have to provide the cap (maximum value your data will reach) and floor (minimum value your data will reach) for your predictions as well as your historic data. These cap and floor values can change over time or be a single value that you set for all time.

This is one of those cases where talking with a domain expert will help you the most. They have a very good idea of what can be expected for the following year and what would be impossible values for a time period. After speaking with them you can provide much more accurate caps and floors over time. However, if you have no domain experts nearby, I have found that 65% of your first value works well as a floor. You can then make your cap a relatively high amount, but within common sense. Thus if you are predicting the number of visits to a site and at the moment it sits at 100 000, then don’t make your cap 200 000 000, as you most likely will not reach that limit within the time you are predicting. Rather make it something more reasonable like 500 000 and let the cap grow slowly over time, as in the sketch below.
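Here is a minimal sketch of logistic growth under those assumptions; the cap of 500 000 and the 65%-of-first-value floor follow the heuristics above, and df is the same assumed ds/y dataframe.

```python
from fbprophet import Prophet

# Cap and floor must be present in the history...
df['cap'] = 500000                    # reasonable ceiling, not 200 000 000
df['floor'] = 0.65 * df['y'].iloc[0]  # the 65%-of-first-value heuristic

m = Prophet(growth='logistic')
m.fit(df)

# ...and in the future frame (here: 30 days ahead of hourly data).
future = m.make_future_dataframe(periods=30 * 24, freq='H')
future['cap'] = 500000
future['floor'] = 0.65 * df['y'].iloc[0]
forecast = m.predict(future)
```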

Holidays

Holidays are periods of time where the days have the same sort of effect each year. For example, if you want to model the number of subscribers in a city where people migrate over the festive periods, you can put in the dates of the festive period in your model using the holidays parameter.

The tricky part comes when you are modelling non-daily data. A mistake I made was to put in my holidays as daily data when I tried to model hourly data. This caused my model to perform worse as it struggled to adapt the holiday data to the correct form. It is important to ensure that your holiday data has the same form as your target data. You should also be able to provide the holiday dates of the period you are predicting. This means that holidays can only be times/dates/days that you know beforehand.

The other parameter that deals with holidays is holidays_prior_scale. This parameter determines how much of an effect holidays should have on your predictions. So, for instance, when you are dealing with population predictions and you know holidays will have a big effect, try big values. Normally values between 20 and 40 will work; otherwise the default value of 10 usually works quite well. As a last resort, you can lower it to see the effect, but I have never found that to improve my MAPE. A sketch of both holiday parameters follows below.
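A minimal sketch of wiring holidays into Prophet; the festive-season dates and the two-week window are hypothetical, and per the advice above you should match the holiday timestamps to the resolution of your target data.

```python
import pandas as pd
from fbprophet import Prophet

# Hypothetical festive-season holidays, in Prophet's expected schema.
festive = pd.DataFrame({
    'holiday': 'festive_season',
    'ds': pd.to_datetime(['2017-12-16', '2018-12-16']),
    'lower_window': 0,
    'upper_window': 14,  # assume the effect lingers ~two weeks after each date
})

# holidays_prior_scale: 10 is the default; try 20-40 when holidays matter a lot.
m = Prophet(holidays=festive, holidays_prior_scale=20)
```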

Changepoints

Changepoints are the points in your data where sudden and abrupt changes in the trend occur. An example would be a campaign that suddenly brings a constant 50 000 more visitors to your website. The changepoint would be the timeslot where this big change occurred.

There are four hyperparameters for changepoints: changepoints, n_changepoints, changepoint_range and changepoint_prior_scale.

The changepoints parameter is used when you supply the changepoint dates yourself instead of having Prophet determine them. Once you have provided your own changepoints, Prophet will not estimate any further changepoints, so it is important that you know what you are doing. From my experience, letting Prophet discover them on its own and adjusting the number of changepoints (with n_changepoints) gave the best results. In terms of how many changepoints to choose, I recommend at least one changepoint a month for hourly data. This will change with each use case, but it is a good start.

The changepoint_range usually does not have much of an effect on performance. I have had the best results by keeping it at the default value, but I would love to hear in the comments if someone has found a case where changing it from 0.8 made a difference. The other parameter, changepoint_prior_scale, indicates how flexible the changepoints are allowed to be; in other words, how closely the changepoints can fit the data. If you make it high, the trend will be more flexible, but you can end up overfitting. I have found that values between 10 and 30 work for me, depending on how volatile the data is. These settings are sketched below.
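A sketch of the changepoint settings using the heuristics above, assuming roughly two years of hourly data; the exact values are illustrative.

```python
from fbprophet import Prophet

m = Prophet(
    n_changepoints=24,           # roughly one changepoint per month
    changepoint_range=0.8,       # the default; rarely worth changing
    changepoint_prior_scale=10,  # more flexible trend; watch for overfitting
)
```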

Seasonalities

These parameters are where Prophet shines as you can make big improvements and gain great insights by changing only a few values.

The first big parameter is seasonality_mode. This parameter indicates how your seasonality components should be integrated with the predictions. The default value is additive, with multiplicative being the other option. At first I struggled with this parameter, but after working with it for a bit, it began to make sense to me. You use additive when your seasonal effect should be “constant” over the entire period, for example when you want your yearly seasonal impact in 2010 to be the same as in 2018. This applies to data where the trend change stays roughly constant, for example the number of people living in a small town: we don’t expect the growth to suddenly increase by millions, because there is no infrastructure for that.

On the other hand, when we want to predict the number of people living in a growing city, the yearly seasonal effect might matter much more in the final years, once the infrastructure is there. The rate of population growth can then be much quicker than it would have been in the early years. In a case like that, you use multiplicative to let the importance of the seasonalities grow over time.

As is the case everywhere, there is also a seasonality_prior_scale parameter. This parameter again allows your seasonalities to be more flexible. I have found values between 10 and 25 to work well here, depending on how much seasonality you notice in the components plot. Both settings are sketched below.
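A one-line sketch of those two settings; the prior scale of 15 is an assumption from the 10-25 range suggested above.

```python
from fbprophet import Prophet

# Seasonal effects scale with the trend; the seasonal prior is loosened.
m = Prophet(seasonality_mode='multiplicative', seasonality_prior_scale=15)
```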

A trick I found to work best for me (and I am open to discussion about this) is to set yearly_seasonality, weekly_seasonality and daily_seasonality all to false and then add in my own seasonalities, as shown in the code snippet below.

Adding your own seasonalities
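The embedded snippet is not reproduced here, so below is a sketch of what it might look like; every period (in days), fourier_order and prior_scale value is an illustrative assumption.

```python
from fbprophet import Prophet

# Turn off the built-in seasonalities and add our own versions,
# each with its own period, Fourier order and (optionally) prior scale.
m = Prophet(
    yearly_seasonality=False,
    weekly_seasonality=False,
    daily_seasonality=False,
)
m.add_seasonality(name='daily', period=1, fourier_order=15)
m.add_seasonality(name='weekly', period=7, fourier_order=20, prior_scale=10)
m.add_seasonality(name='monthly', period=30.5, fourier_order=12)
m.add_seasonality(name='quarterly', period=91.3125, fourier_order=10,
                  prior_scale=15)
```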

By doing this you get more power and control over seasonality. You can specify the exact period of each season, which means you can create “new” seasons. For example, with a period of 91.3125 days (a quarter of a year) you can add quarterly seasonality to your model. You can also specify a prior scale for each seasonality instead of having them all share one scale. Now I wish I could tell you which seasonalities to add and what their periods should be, but each use case is completely different, and a domain expert should be consulted for recommendations and insights.

Just for clarity: if the period is set to 35, you are telling the model that what happened at a certain point is likely to happen again 35 days later.

The other parameter you can tweak with this technique is the number of Fourier components (fourier_order) each seasonality is composed of. For those who know a bit of mathematics: any signal can be represented by a sum of sine and cosine waves, and this is how Prophet generates its seasonality signals. With fourier_order you control how closely the seasonal curve can follow the data, in other words how many wiggles the curve is allowed to have. Shown below is the daily seasonality of some data using a fourier_order of 5 and 20 respectively.

[Figure: daily seasonality with fourier_order = 5]
[Figure: daily seasonality with fourier_order = 20]

As you can see, the curve’s outline stays the same in both plots, but there are more bumps with 20 components. These bumps could be picking up noise, or they could be genuinely more accurate; it comes down to your own interpretation and the input of a domain expert. I have found that higher values do give better results, and I highly recommend investigating their effect. Try values ranging from 10 to 25. A quick way to compare orders is sketched below.
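A small sketch for eyeballing the effect of the Fourier order: fit the same model twice, varying only the daily fourier_order, and compare the component plots (again assuming df holds the ds/y data).

```python
from fbprophet import Prophet

for k in (5, 20):
    m = Prophet(daily_seasonality=False)
    m.add_seasonality(name='daily', period=1, fourier_order=k)
    m.fit(df)
    # Inspect how wiggly the daily curve becomes at each order.
    m.plot_components(m.predict(df))
```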

MCMC Samples

This is a tricky one. This parameter determines whether the model uses maximum a posteriori (MAP) estimation or full Bayesian inference with the specified number of Markov chain Monte Carlo (MCMC) samples to train and predict.

So if you set mcmc_samples to zero, Prophet will do MAP estimation; otherwise you need to specify the number of MCMC samples to use. I will be honest and say I have never found Bayesian inference to work better, and I always use MAP estimation. Bayesian inference also takes extremely long to run, as you need at least 1000 samples to get satisfactory results. This is the one parameter that I always just leave at the default value.
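For reference, the two modes look like this; the 1000-sample figure is the minimum suggested above.

```python
from fbprophet import Prophet

m_map = Prophet(mcmc_samples=0)      # MAP estimation (the default, fast)
m_mcmc = Prophet(mcmc_samples=1000)  # full Bayesian inference, much slower
```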

As a result of keeping mcmc_samples at 0, I have not yet had a chance to explore interval_width, since the uncertainty intervals are generated automatically when using MAP estimation.

Uncertainty samples

This is another parameter (uncertainty_samples, the number of simulated draws used to estimate the uncertainty intervals) that does not affect the outcome much and should be explored for each use case.

Improving on our original model

So how much will the MAPE improve if we give more thought to our parameters and concentrate on the ones discussed above? Shown below is the final model.

Final Prophet model
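The final snippet is also not reproduced here, so this is a hedged reconstruction from the choices described below (linear growth, multiplicative seasonality, custom monthly and quarterly seasonalities); every numeric value is an assumption for illustration.

```python
from fbprophet import Prophet

m = Prophet(
    growth='linear',
    seasonality_mode='multiplicative',
    yearly_seasonality=False,
    weekly_seasonality=False,
    daily_seasonality=False,
    n_changepoints=24,
    changepoint_prior_scale=10,
)
m.add_seasonality(name='daily', period=1, fourier_order=15)
m.add_seasonality(name='weekly', period=7, fourier_order=20)
m.add_seasonality(name='monthly', period=30.5, fourier_order=12)
m.add_seasonality(name='quarterly', period=91.3125, fourier_order=10)
m.fit(df)
```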

If we look at my final model, you can see that I chose linear growth. This is because I spoke to the domain expert I work with, and she mentioned that what we are predicting here has no limit. One can also see that the series grows fast, but not exponentially, so I don’t have to worry about my predictions picking up something that causes the values to explode.

My seasonality mode changed to multiplicative because I found that the model struggled to predict the higher values in the latter part of the time series. This indicated that the seasonal effects might be more important in the latter stages.

You can also see that I added monthly and quarterly seasonalities, as I noticed these patterns when doing some EDA on my data (it is very important to always do EDA before building models).

Through all these changes, I managed to get my MAPE down to 6.35% with the baseline model having a MAPE of 15.32%.

[Figure: predicted values (blue) vs real values (black dots)]

This just goes to show that Prophet works amazingly out of the box and if you take a bit of care with your model, you can increase the performance substantially!

Thank you for the read and I look forward to discussions on how you have implemented and optimized Prophet.
