One simple way for CFOs to manage forecast uncertainties (using Monte Carlo and R)

Tristan Ganry
Towards Data Science
5 min readMay 9, 2018

--

There are countless ways for Finance professionals to forecast sales and the intent is not to cover them. The goal of this article is to help finance professionals better manage forecast uncertainties using using Monte Carlo and R by being able to answer one of the following questions for the business:

“What’s the likelihood your organization will reach the target/goal?”

“What’s your organization’s forecast risk profile?”

It’s no longer just about estimating targets (for example: reforecasting Q4 based on what we know happened in the past). It’s about evaluating the level of risk within a target and having a data-driven business conversation about what we do about it to achieve the desired outcomes. It will also give Finance professionals the opportunity to be much more strategic in the way they manage their forecast (e.g. is it time to consider including a new opportunity coming down the pipe? should we wait? did we over or under promise?).

While it’s relatively easy to do Monte Carlo simulations in Excel, I personally prefer using R given the breath of libraries that can be leveraged for fitting distribution and for generating sample data for the Monte Carlo Simulation.

For this exercise, we will use a simplified case and it requires few libraries. With a few lines of codes, we will be able to get back to the business with a clear answer of:

“The risk is high. As things are today, it’s unlikely that we will reach target (e.g. less than 33% chances). Let’s come up with a plan to bridge that gap.”

What we need for this example:

  • A business question / context — “We closed Q3 at 141.7m and we have an annual target of 186.0m; what’s our probability to achieve it?”
  • An observed time series — quantity sold of a certain product ABC by month.
  • An annual target — at least 186.0m products ABC sold by end of year.

The approach we will use to answer the question is as follows:

  • Step 1: Prep the data.
  • Step 2 A: Fit a traditional distribution to the observed time series.
  • Step 2 B: Fit a “supplemental” distribution to the observed time series.
  • Step 3: Generate a data table with 10,000 samples for October, November, and December.
  • Step 4: Generate probabilities and compare both options (Option A = 26.75% / Option B = 33.2%). A big takeaway, in that example, is that being more precise by leveraging a better fitted distribution doesn’t significantly change the business outcome (e.g. 5% risk difference between the 2 approaches).
  • Step 5: Report to the business that the risk is high and, as things are today, it’s unlikely that they will reach target (e.g. less than 33% chances). Then, come up with a plan to bridge that gap.

Step 1: Prep the data

| library(RCurl)# Read data
mydata <- read.csv(text = getURL("https://raw.githubusercontent.com/tristanga/MonteCarlo_ForecastRisk/master/TS.csv"))
# Create time serie
tseries <- ts(mydata$x, frequency = 12, start = c(2000, 1))
# Check the time serie
start(tseries)
end(tseries)
frequency(tseries)
# Remove Q4
tseries_sub <- window(tseries, start=c(2000, 1), end=c(2015,9))
# Define your target
mytarget = 186.0000
# Calculate actuals
actualYTD <- sum(window(tseries, start=c(2015, 1), end=c(2015,9)))
# Check the distribution of your time serie
hist(tseries_sub)
boxplot(tseries_sub)
tseries_df = as.data.frame(tseries_sub)

Step 2 A: Fit a traditional distribution to the observed time series.

|library(fitdistrplus)
fit.norm <- fitdist(as.numeric(tseries_df$x), "norm")
fit.exp <- fitdist(as.numeric(tseries_df$x), "exp")
fit.weibull <- fitdist(as.numeric(tseries_df$x), "weibull")
fit.lnorm <- fitdist(as.numeric(tseries_df$x), "lnorm")
fit.gamma <- fitdist(as.numeric(tseries_df$x), "gamma")
fit.logistic <- fitdist(as.numeric(tseries_df$x), "logis")
fit.cauchy <- fitdist(as.numeric(tseries_df$x), "cauchy")
# Compare Goodness-of-fit statistics
gofstat(list(fit.norm, fit.exp, fit.weibull,fit.lnorm,fit.gamma,fit.logistic,fit.cauchy),
fitnames = c("fit.norm", "fit.exp", "fit.weibull","fit.lnorm","fit.gamma","fit.logistic","fit.cauchy"))
# the best Goodness-of-fit statistics is for the normal distribution
option1 = fit.norm
summary(option1)

Step 2 B: Fit a “supplemental” distribution to the observed time series.

|# Using Supplementary Distributions to fit the second option
library(SuppDists)
parms<-JohnsonFit(as.numeric(tseries_df$x), moment="quant")
# plot the distribution
hist( as.numeric(tseries_df$x) , freq=FALSE)
plot(function(x)dJohnson(x,parms), 0, 20, add=TRUE, col="red")
# let's create samples for october, november and december
option2 <- function(x)qJohnson(x,parms)

Step 3: Generate a data table with 10,000 samples for October, November, and December.

|#Option 1
library(truncnorm)
fit.coef <- coef(fit.norm)
final_df1 <- as.data.frame(rtruncnorm(n=10^4, a=min(tseries_df$x), b=max(tseries_df$x), mean=fit.coef["mean"], sd=fit.coef["sd"]))
colnames(final_df1) <- 'Oct'
final_df1$Nov <- rtruncnorm(n=10^4, a=min(tseries_df$x), b=max(tseries_df$x), mean=fit.coef["mean"], sd=fit.coef["sd"])
final_df1$Dec <- rtruncnorm(n=10^4, a=min(tseries_df$x), b=max(tseries_df$x), mean=fit.coef["mean"], sd=fit.coef["sd"])
final_df1$Forecast <- actualYTD + final_df1$Oct + final_df1$Nov +final_df1$Dec
#plot histogram of forecasted quantities
hist(final_df1$Forecast)
#Option 2
option2 <- function(x)qJohnson(x,parms)
option2sample <- option2(runif(10000))
hist(option2sample)
boxplot(option2sample,as.numeric(tseries_df$x) )
final_df2 <- as.data.frame(option2(runif(10000)))
colnames(final_df2) <- 'Oct'
final_df2$Nov <- option2(runif(10000))
final_df2$Dec <- option2(runif(10000))
final_df2$Forecast <- actualYTD + final_df2$Oct + final_df2$Nov +final_df2$Dec
# Plot Option 2
hist(final_df2$Forecast)

Step 4: Generate probabilities and compare both options (Option A = 26.75% / Option B = 33.2%).

|boxplot(final_df1$Forecast,final_df2$Forecast)
myproba1 <- sum( final_df1$Forecast >= 186 ) / 100
myproba2 <- sum( final_df2$Forecast >= 186 ) / 100

Step 5: Report to the business that the risk is high and, as things are today, it’s unlikely that they will reach target (e.g. less than 33% chances). Then, come up with a plan to bridge that gap.

Next steps:

  • Increase complexity (distribution of each month vs. entire time series, profit vs. quantity sold, include growth/CAGR in the model, external variables, etc.).
  • Write about other relevant statistical methods for forecasting quantitative time series for Finance professionals.

Please note that there are some risks/considerations to keep in mind with the Monte Carlo approach: 1) We are using the past to predict the future without accounting for any external factors beyond previous quantity sold distribution (population, price, etc.), and 2) The observed time series distribution will change over time; while the model in option 2 should follow the changes it will be important to check from time to time that it still follows closely the selected distribution.

The code is available on Github. Thank you for reading my post. It is my first post on Medium. Feel free to contact me if you have faced similar challenges regarding how you solved them or if you want help implementing it in your organization.

--

--