A Bayesian Approach to Estimating Revenue Growth

A practical example of using the normal-normal model

Vinai Oddiraju
Towards Data Science



Maybe you’re an investor trying to decide whether a stock is worth investing in. Maybe you’ve only recently heard of Bayesian inference and want to get a sense of how it can be applied in the real world. Maybe you’re a seasoned analyst who stumbled upon this article and found the title interesting. Regardless of where you come from, I thank you for giving this piece a read. I’m going to talk about the normal-normal model, one of the foundational models in Bayesian statistics, and how it can be used to estimate the growth rate of a company’s revenue. That estimate can then be used to decide whether or not the company is a worthwhile investment.

The first objective of this piece is to demonstrate how the normal-normal model can be used to incorporate a subjective overlay into data analysis. The second is to provide some intuition behind the normal-normal model and Bayesian inference in general without getting too bogged down in the mechanics. I’ll say it here and again at the end of the article, but this piece does not constitute investment advice. It is meant to be educational. All information and opinions expressed in this article are published in the author’s individual capacity and are independent of any employment arrangement the author may have with RobustWealth, Inc.

With that disclaimer out of the way, let’s get to it!

The Task at Hand

Financial modeling generally refers to projecting fundamental values for a company in order to arrive at a fair price estimate for the company’s stock. Some of the most common metrics used to arrive at valuations are revenue, earnings, and cash flow. The company we’re going to look at is MongoDB, a software services company. It began trading publicly back in 2017, and its revenue growth has been tremendous.

All plots and tables in this piece were created in RStudio.

Given how young the company is and how it’s in a growth-oriented phase of its existence, it’s reasonable to focus on revenue in order to value the company. Data in the company’s 10-K filings, the annual financial reports, shows revenue numbers on a quarterly basis starting in fiscal 2016. Annual numbers are present from the year 2014. To use a little more data than the six annual numbers (which translate into five growth numbers), I’ve computed rolling one-year revenue growth on a quarterly basis. That data is shown below.
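
To make that computation concrete, here is a minimal base-R sketch of one plausible way to compute rolling one-year growth from quarterly revenue: sum the trailing four quarters, then compare each trailing-twelve-month figure with the one from four quarters earlier. The revenue figures below are placeholders, not MongoDB’s actual numbers, and the exact construction used for the article’s data may differ.

```r
# Hypothetical quarterly revenue figures (placeholders, not MongoDB's actual data)
quarterly_rev <- c(20, 22, 25, 29, 34, 40, 47, 55, 64, 74, 85, 97)

# Trailing-twelve-month (TTM) revenue at each quarter-end (base R, no packages needed)
ttm_rev <- sapply(4:length(quarterly_rev), function(i) sum(quarterly_rev[(i - 3):i]))

# Rolling one-year growth: each TTM figure vs. the TTM figure four quarters earlier
rolling_growth <- ttm_rev[-(1:4)] / head(ttm_rev, -4) - 1
round(rolling_growth, 3)
```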

Closer to the end of this piece, I’ll compare the results of the analysis using year-end data versus quarterly data. (Although I haven’t run a formal analysis, I assume there’s a degree of serial correlation in the quarterly data. This won’t matter in terms of explaining the concepts of the normal-normal model, but it is certainly something to be mindful of in practice.)

A common way to project revenue for a company is to use the average historical revenue growth rate over a certain period of time. For companies with many years of data, this isn’t necessarily a bad practice, especially if the growth rates follow a normal distribution. Given how little sample data is present here, and given the shape of the histogram I’ll plot below, using the sample mean may feel unwise in this case.

Bayesian inference is particularly useful in situations where the sample size is small and where one holds a subjective belief that the sample data does not appropriately represent what a larger sample would look like.

To conduct Bayesian inference, one needs a prior distribution and a sampling model. Before defining those distributions in this context, I’ll go over some of the basics of Bayesian inference and how the prior distribution and sampling model come into play. Feel free to skip this section if you’re familiar with Bayes’ theorem and how it applies to distributions.

Bayes’ Theorem and Distributions

In its simplest form, Bayes’ theorem is defined as

P(A|B) = P(A ∩ B) / P(B)

which is equivalent to

P(A|B) = P(B|A)P(A) / P(B)

This is all well and good if one is given neatly defined probabilities to use, but distributions complicate the process a little.

First, I’ll substitute A with θ and B with Y. In this case, Y refers to the points in the sample data, and θ refers to the true average growth rate in revenue for MongoDB. Re-writing the second form of the formula with the above substitutions leads to

P(θ|Y) = P(Y|θ)P(θ) / P(Y)

In words, the distribution I’m trying to model is the distribution of the average revenue growth rate GIVEN the sample growth rates. To get there, I need a sampling distribution P(Y|θ), which I will define using the sample data and a little bit of judgement, a prior distribution P(θ) for the average growth rate, and the marginal distribution of the data P(Y). The onus is on the analyst to define the sampling distribution as well as the prior distribution for θ. Once they have a sampling distribution P(Y|θ), the correct way to obtain P(Y) would be to solve for the integral below:

P(Y) = ∫ P(Y|θ) P(θ) dθ

In practice, this integral may be difficult to evaluate, but a shortcut is available. P(Y) does not depend on θ; it is simply the normalizing constant that makes the posterior distribution integrate to 1 (the sum of all probabilities for an event equals 1). Rather than solve for this normalizing constant, the relationship can be expressed more simply as

P(θ|Y)∝P(Y|θ)P(θ)

where ∝ stands for “is proportional to.” In other words, one doesn’t need to worry about P(Y). With one task eliminated, the next step is to define a sampling distribution and a prior distribution.

(Note: technically, Y is also conditional on the variance of the sampling model. In this case, the variance is assumed to be known and constant, so it can be omitted from the notation.)

Defining a Sampling Model and Prior Distribution

I’m going to use a normal model for the sampling distribution. Having looked at the histogram for our data, one may think that there are distributions available that better represent the data. I like the normal distribution in this case because it is continuous and has support along all real numbers (revenue growth could theoretically be negative or positive).

To define this sampling model, I’ll compute the mean and variance for this data set and use these as the parameters for the sampling model. The form this will take is

Y₁, …, Yₙ | θ ~ Normal(θ, σ²)

where the first parameter, θ, represents the unknown true average growth rate for MongoDB’s revenue and the second, σ², represents the variance of the growth rates; this variance is assumed to be known. One could just as easily assume that the mean is known while the variance is unknown, or that neither is known; all three classes of situations are well-documented, and there is substantial literature on how to work with them. The normal-normal model applies to the situation with known variance and unknown mean, hence the current assumptions.
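
As a sketch, continuing with the hypothetical `rolling_growth` series from earlier (the article’s actual values would come from the real MongoDB series), the sampling-model parameters are simply the sample mean and variance of the growth observations:

```r
# Sample statistics used to parameterize the sampling model
ybar   <- mean(rolling_growth)   # sample mean of the growth observations
sigma2 <- var(rolling_growth)    # sample variance, treated as the known variance
n_obs  <- length(rolling_growth) # number of growth observations
c(ybar = ybar, sigma2 = sigma2, n = n_obs)
```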

Next, a prior distribution for θ must be defined. For the same reasons that I’m using a normal distribution for the sampling model (continuous, support along positive and negative values), I’m going to use a normal distribution as my prior. A mean and a variance must be defined for the variable θ. I’ll define this distribution as

θ ~ Normal(μ₀, τ₀²)

where the first parameter, μ₀, is the prior mean and the second, τ₀², is the prior variance. There is significant literature dedicated to selecting priors; the main focus of this piece is how to apply the normal-normal model, so I didn’t put extensive effort into defining my prior distribution.

To select a value for the prior mean, I looked at the average revenue growth rate of sales of the S&P 500 index over the last 19 years (multpl.com) and then multiplied it by the β of MongoDB. In the world of equities, β refers to the covariance of an individual stock’s returns with the return of broader basket of stocks (often called an index) divided by the variance of the index returns. MongoDB has a β of about 1.26 according to Seeking Alpha, a research site with news, data, and analyses of many stocks. Whenever a β > 1 is observed, one can assume that the stock is more volatile than the index it is being compared to; for this reason, I multiply the revenue growth of the index by β. Other approaches could involve looking at slightly older companies in the software service industries or similar age companies across industries. No method is perfect, and all are viable with their own merits.

The next parameter that must be assigned is the prior variance. To be clear, this is not the presumed variance in growth rates, but the presumed variance of the AVERAGE growth rate; the prior variance reflects how certain I am about the prior mean. If one had full confidence that this was the correct mean to use, the variance could be set very close to 0 (for computational purposes, exactly 0 can’t be used, but a very small number such as .00001 would suffice). On the other hand, if one had very little confidence in the estimate, a large variance could be used to express that uncertainty. In this case, where the prior mean is about 4.5%, I don’t have a strong view on how confident I am in the estimate. To define my distribution, I’ll use a standard deviation of 10%. With this, I’m effectively stating that I’m 95% confident that the true value of theta lies between -15.5% and 24.5% (4.5% +/- 2 standard deviations). This may seem highly conservative given that MongoDB’s sample average growth rate has been about 61%, but this is exactly why Bayesian inference is powerful. MongoDB has spent the majority of its time trading in a bull market that was particularly favorable for software names, whereas the prior reflects data from multiple market cycles and consequently multiple phases of growth and contraction. Between the possibility of economic contraction, the chance MongoDB doesn’t execute its strategy effectively, and revenue growth slowing simply due to scale, I hold the subjective belief that MongoDB’s true average growth rate is less than what the sample data suggests. The prior distribution I’ve selected represents that belief. Now, I can study the output of the analysis.
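
As a rough sketch of how these prior choices could be written down in R, with the index growth figure and β as stand-ins for the multpl.com and Seeking Alpha values cited above:

```r
# Prior mean: long-run S&P 500 sales growth scaled by MongoDB's beta
# (both inputs are approximations of the figures cited in the text)
index_sales_growth <- 0.036   # assumed average S&P 500 revenue growth
beta_mdb           <- 1.26    # beta reported by Seeking Alpha
mu0  <- index_sales_growth * beta_mdb   # roughly 0.045

# Prior standard deviation of 10% encodes fairly low confidence in mu0
tau0 <- 0.10

# Implied rough 95% prior range for theta (mu0 +/- 2 prior standard deviations)
mu0 + c(-2, 2) * tau0   # roughly -15.5% to 24.5%
```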

To recap, here are the forms for the two models:

Sampling model: Y₁, …, Yₙ | θ ~ Normal(θ, σ²)
Prior: θ ~ Normal(μ₀, τ₀²)

Great, let’s move on to the analysis!

Posterior Analysis and Intuition

I’ll focus more on the intuition offered by these forms rather than walk through a derivation by hand. Anyone truly interested in using the normal-normal model should study the derivation of the above parameters. Wikipedia has some good documentation, and most introductory textbooks to Bayesian statistics cover the derivations in detail.

When a normal distribution is used for the sampling model in conjunction with a normal prior distribution on the mean, the resulting posterior distribution is proportional to the product of two normal densities. The power of the normal-normal model is that this product is itself a normal distribution, albeit with updated parameters. In Bayesian jargon, the normal prior is a conjugate prior for this sampling model, meaning that the prior and its resulting posterior distribution have the same form. The fact that the posterior distribution is normal may not seem like that big of a deal, but depending on the data to be modeled and the parameters to be estimated, there are many instances where the posterior does not take such a familiar form. Because this posterior distribution is well-defined, one can sample from it directly and consequently compute summary statistics on it easily.

The notations and re-parametrizations below are from Chapter 5 of Peter Hoff’s textbook, “A First Course in Bayesian Statistical Methods,” the book I used in my first undergraduate Bayesian statistics course and the book I’ve been studying recently.

The posterior distribution takes the form

θ | Y₁, …, Yₙ ~ Normal(μₙ, τₙ²)

where the first parameter, μₙ, refers to the posterior mean and the second, τₙ², refers to the posterior variance. The formulas to calculate these updated parameters, with n denoting the number of growth observations and ȳ their sample mean, are

τₙ² = 1 / (1/τ₀² + n/σ²)

and

μₙ = (μ₀/τ₀² + n·ȳ/σ²) / (1/τ₀² + n/σ²)

These formulas may look somewhat intimidating, but hopefully you see some similarities between them. A common practice and a particularly helpful one for gaining intuition about these formulas is to look at the formulas in terms of precision rather than variance. Precision is the inverse of variance.

In this case, we have three relevant precisions to observe: the prior precision κ₀ = 1/τ₀², the sampling precision κ = 1/σ², and the posterior precision κₙ = 1/τₙ².

If the posterior variance formula is inverted to calculate the posterior precision, one can see that, in terms of the variances,

1/τₙ² = 1/τ₀² + n/σ²

This can be written in terms of precisions as

κₙ = κ₀ + n·κ

In this form it’s clear that the posterior precision is the sum of the prior precision and the sample precision multiplied by the sample size. The posterior mean can also be re-written in terms of precisions:

Here, it’s clear that the posterior mean is a weighted average of the prior mean and sample mean.
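
These two formulas translate directly into a few lines of R. Below is a minimal sketch of the normal-normal update applied to illustrative inputs; the sample statistics here are placeholders rather than the exact figures from the article’s data set, so the output will only approximate the numbers reported below.

```r
# Normal-normal posterior update with known sampling variance sigma2
# and prior theta ~ Normal(mu0, tau0_sq)
normal_normal_posterior <- function(mu0, tau0_sq, ybar, sigma2, n) {
  post_precision <- 1 / tau0_sq + n / sigma2                      # kappa_n = kappa_0 + n * kappa
  post_mean      <- (mu0 / tau0_sq + n * ybar / sigma2) / post_precision
  post_var       <- 1 / post_precision
  list(mean = post_mean, var = post_var, sd = sqrt(post_var))
}

# Illustrative inputs (placeholders, not the article's exact sample statistics)
post <- normal_normal_posterior(mu0 = 0.045, tau0_sq = 0.10^2,
                                ybar = 0.61, sigma2 = 0.03, n = 17)
post$mean   # weighted average of the prior mean and the sample mean
post$sd     # posterior standard deviation
```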

For the data, the posterior parameters work out to a posterior mean of about .527 and a posterior standard deviation of about .0391 (a posterior variance of about .0015).

And there they are — the updated parameters. The posterior estimate for the average growth rate is about 52.7% — a decent bit lower than the sample average, but not overwhelmingly lower. I’ve taken a subjective belief, represented that belief with a distribution, and used that distribution to augment the analysis. Hooray! This is the power of Bayesian inference. As long as beliefs can be defined, they can be incorporated in a rigorous way in the analysis. Let’s talk a little more about what I have and also what I don’t have.

With the posterior standard deviation, I can compute a credible interval for the estimate. For those new to Bayesian statistics, a credible interval is not the same thing as a confidence interval even though they are computed in a similar manner. The 95% credible interval for the average growth rate is .527 +/- 2(.0391), which leads to endpoints of 44.88% and 60.52%. With this credible interval, I’m making the statement that I’m 95% sure that the true value of θ falls within the interval. Even at this point, I don’t treat this updated mean as a known entity. Furthermore, I’m not saying that 52.7% is my forecast for the revenue growth rate over the next rolling one-year period. If I wanted to make a forecast within this framework, I’d use the posterior predictive distribution. Since that is a separate topic, I won’t touch on it here, but the process of deriving that distribution is similar to deriving the posterior distribution.
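
For reference, the interval above follows directly from the stated posterior parameters:

```r
# 95% credible interval as posterior mean +/- 2 posterior standard deviations
0.527 + c(-2, 2) * 0.0391   # roughly 0.4488 and 0.6052
```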

Two key implications should be noted from this analysis. The first is that as the sample size grows larger, the posterior mean and posterior variance are more and more determined by the sample data. I’m not going to state that there’s an explicit cutoff, but at some amount of data, adding a prior doesn’t move the needle much, all else being equal. Intuitively, this is reasonable. If you have rich enough sampling data, the sampling data likely represents the actual structure in the data, and you may not see the need to utilize a prior distribution.

To emphasize the first point, I re-ran the analysis using strictly the year-end data, which provides a sample size of five growth observations. Using the same prior distribution, the new sampling mean and variance are about 59.8% and .012 (or an 11.1% standard deviation), and the posterior mean and variance are 23% and .0019 (or a 4.45% standard deviation). This posterior estimate for the mean is much lower than what was observed in the first iteration; with the sample size cut significantly, the prior plays a much heavier role in the output. The standard deviation didn’t change as much, but I can see that it’s larger even though the sampling standard deviation was smaller the second time around. I have a much lower estimate, and I have slightly less confidence in the estimate (a wider credible interval).

The second implication of the analysis is that the smaller the prior variance, the greater the prior precision and the greater impact it has on both the posterior mean and posterior variance. The more confidence one has in the prior, the more it will affect the posterior estimates. To illustrate this point, I re-ran the original analysis with different values for the prior variance. The values for the prior mean are all .045, and the sampling mean and variance come from the rolling revenue data. The table below shows the results of this experiment.

I’ll also plot the distributions.

Notice how much closer to the prior mean the posterior distribution sits when the prior variance is set to .05. As I increase the prior variance (effectively signifying less confidence in the prior mean), the center of the posterior distribution moves closer to the sample mean. Also, while the magnitude of the changes in the posterior variances may not appear that great in the table, the distribution plots above show the distributions getting progressively wider; in other words, the credible interval for the true value of the average growth rate widens.
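
Here is a sketch of this sensitivity experiment, reusing the hypothetical `normal_normal_posterior` function and placeholder sample statistics from earlier; the grid of prior variances is illustrative and may not match the exact values behind the table and plots.

```r
# Posterior mean and SD for several prior variances (prior mean fixed at 0.045)
prior_vars <- c(0.001, 0.01, 0.05, 0.5)   # illustrative grid of prior variances

sensitivity <- t(sapply(prior_vars, function(tv) {
  post <- normal_normal_posterior(mu0 = 0.045, tau0_sq = tv,
                                  ybar = 0.61, sigma2 = 0.03, n = 17)
  c(prior_var = tv, post_mean = post$mean, post_sd = post$sd)
}))
sensitivity

# Overlay the resulting posterior densities
x <- seq(0, 0.8, length.out = 400)
plot(x, dnorm(x, sensitivity[1, "post_mean"], sensitivity[1, "post_sd"]),
     type = "l", xlab = "average growth rate", ylab = "density")
for (i in 2:nrow(sensitivity)) {
  lines(x, dnorm(x, sensitivity[i, "post_mean"], sensitivity[i, "post_sd"]), lty = i)
}
```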

Summary

Just to recap, I analyzed a young company and wanted to estimate the true growth rate of its revenue. Given the small amount of sample data I had and a subjective belief that the average growth rate will be less than what the sample data suggests, I used Bayesian inference to augment the analysis. I defined a sampling model for the data, defined a prior for the average growth rate that reflected a subjective view, and utilized the normal-normal model to arrive at a posterior estimate and interval for the company’s average growth rate. I hope you found this brief introduction to Bayesian inference as well as the analysis of the results useful. I don’t recommend using the specific numbers in this piece for any valuation of MongoDB, but hopefully you can apply the concepts to your own analysis. I’m attaching a link to the GitHub repository for the code; nothing is particularly complicated, but I’ll share it in the spirit of transparency and reproducibility.

https://github.com/vinai-oddiraju/TDS_Blog_Post1.git

Lastly, I want to thank the friends and family members who took time to read my drafts and provide feedback throughout the process. As this is my first time writing about a project in this manner, their support is especially appreciated. Thanks, and take care!

Disclaimer

The thoughts and views expressed in this report are mine alone and do not necessarily reflect the views of my firm. This report is intended to be educational in nature and should not be construed as individual investment advice nor as a recommendation to buy, sell, or hold any security or to adopt any investment strategy.

Sources

[1] Hoff, Peter D. A First Course in Bayesian Statistical Methods (2009). Springer. Print.
