
Bayesian A/B Testing Explained

Explaining Bayesian A/B Testing with Python Implementation

Image by Alessandro Crosato from Unsplash

A/B testing has applications across many industries, from identifying the best market segments to target to testing medical drugs, and it lets businesses base decisions on experimental results. There are two common ways to approach A/B testing, the frequentist approach and the Bayesian approach, both stemming from the foundations of hypothesis testing. In this article, I’ll cover the explanation and implementation of the Bayesian approach to A/B testing. This article presumes you have a decent understanding of what A/B testing is in practice. If not, you can learn more about it and the frequentist approach here.

Frequentist A/B Testing Explained

Table of Contents

  • The Bayesian Approach
  • Bayesian Machine Learning
  • Bayesian A/B Testing
  • Explore Exploit Dilemma
  • Problem Statement
  • Bayesian Bandits / Thompson Sampling
  • Bayes Theorem
  • Beta Distribution
  • Implementation
  • Concluding Remarks
  • Resources

The Bayesian Approach

The Bayesian approach stems from one main rule: everything is a random variable. For example, if you were given a dataset and asked to find the mean and variance of the data, the frequentist output would simply be one number for the mean and one number for the variance. In the Bayesian approach, however, you’re no longer looking for a number but for a distribution.

When trying to identify the mean, the difference between the two approaches looks like this:
Frequentist : ῦ
Bayesian    : p(ῦ | data)
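
To make the contrast concrete, here is a minimal sketch using only the Python standard library. The measurements, the known noise level `sigma` and the flat prior are all assumptions I picked for illustration. Under those assumptions the posterior over the mean is a normal distribution centred on the sample mean.

```python
from statistics import NormalDist, mean

# Hypothetical measurements; the known noise level sigma is an assumption.
data = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7]
sigma = 0.25

# Frequentist: the answer is a single number.
point_estimate = mean(data)

# Bayesian: the answer is a distribution over the mean.
# With a flat prior and known sigma, the posterior is
# Normal(sample mean, sigma / sqrt(n)).
posterior = NormalDist(mu=point_estimate, sigma=sigma / len(data) ** 0.5)

# A 95% credible interval read directly off the posterior.
low, high = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)

print(f"frequentist estimate: {point_estimate:.3f}")
print(f"posterior: mean={posterior.mean:.3f}, sd={posterior.stdev:.3f}")
print(f"95% credible interval: ({low:.3f}, {high:.3f})")
```

The point estimate and the posterior mean coincide here only because of the flat prior. An informative prior would pull the posterior towards it.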

Bayesian Machine Learning

If you think about this from a machine learning perspective, A/B testing is intuitively very similar to reinforcement learning. Reinforcement learning refers to letting agents take actions in an environment in order to maximize a reward. A/B testing can be viewed as a set of randomized experiments on randomly partitioned users, run to maximize some reward.

For example, if we were to model the click-through rate of two buttons on our website with an A/B test, each layout with its button can be defined as an action and an increased click-through rate can act as the reward. We want to pick the layout that maximizes the click-through rate.

Bayesian A/B Testing

Given some data, the Bayesian procedure can be outlined with the following steps [1]:

  1. Identify our prior distribution (Gaussian, Poisson, beta, etc.). This expresses our initial understanding of a parameter (ῦ, for example) before seeing any data.
  2. Choose a statistical model (Markov chains, Bayesian bandits, etc.) which reflects our beliefs about x given ῦ.
  3. After observing some data, update our beliefs and calculate the posterior distribution p(ῦ | x). The posterior distribution is a probability distribution which portrays our updated beliefs about the parameter after observing the data.
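
The three steps above can be sketched numerically with a simple grid approximation. This is only an illustration under my own assumptions: a uniform prior, a Bernoulli (click / no click) model, and made-up counts of 12 clicks in 40 visits. The names `grid`, `likelihood` and `posterior` are hypothetical.

```python
# Step 1: prior over the click-through rate, evaluated on a grid.
# A uniform prior is assumed here.
grid = [i / 100 for i in range(101)]
prior = [1.0 for _ in grid]


# Step 2: model for the data given the parameter.
# Each visit is treated as an independent Bernoulli trial.
def likelihood(theta, clicks, visits):
    return theta ** clicks * (1 - theta) ** (visits - clicks)


# Step 3: posterior is proportional to likelihood * prior,
# then normalised so the grid values sum to 1.
clicks, visits = 12, 40  # hypothetical observations
unnormalised = [likelihood(t, clicks, visits) * p for t, p in zip(grid, prior)]
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

best = grid[posterior.index(max(posterior))]
print(f"most probable rate on the grid: {best:.2f}")
```

With a uniform prior the most probable value is just the observed rate, 12 / 40. The useful part is that `posterior` also tells you how plausible every other rate is.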

As you can imagine, the larger the number of observations (N), the better the approximation of your posterior distribution. However, if your number of observations is too large, you’re losing a lot of impressions which could generate revenue for your website. For example, if you were running an A/B test to identify which of two landing pages yields the higher click-through rate, then the more samples you collect, the more people you expose to the worse landing page, which reduces the number of clicks you could have gotten. Thus, a sample size which is neither too large nor too small is ideal.

Explore Exploit Dilemma

In reinforcement learning, exploration is when an agent gathers information by trying out actions whose outcomes are still uncertain. Exploitation is when the agent uses the information it currently has to make the decision with the highest expected outcome. It’s best to balance exploration and exploitation.
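
One simple way to see this balance in code is the epsilon-greedy strategy. Note that this is not the method used later in this article. I’m including it only as a rough sketch of the trade-off, with hypothetical click rates and a hypothetical exploration rate `epsilon`.

```python
import random


def epsilon_greedy(true_rates, n_rounds=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on simulated arms and return (pulls, wins) per arm."""
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(n_rounds):
        if rng.random() < epsilon or 0 in pulls:
            # Explore: pick an arm at random to learn more about it.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the arm with the best observed rate so far.
            estimates = [w / p for w, p in zip(wins, pulls)]
            arm = estimates.index(max(estimates))
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls, wins


pulls, wins = epsilon_greedy([0.2, 0.6])  # hypothetical click rates
print("times each arm was shown:", pulls)
```

A larger `epsilon` means more exploration and more impressions spent on the weaker option. A smaller one risks settling on the wrong option too early.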

Problem Statement

Suppose you want to test two different positions for the sign-up button on your platform. Position 1 is in the top left corner of your website, whereas position 2 is in the top right corner. The only difference between the two landing pages is the location of the sign-up button. Everything else is the same, and the observations in our experiment are assumed to be _iid_.

Bayesian Bandits / Thompson Sampling

Before we explore the Bayesian bandits algorithm, we need to do a bit of review on Bayes’ theorem and the beta distribution.

Bayes Theorem

Image provided by Tirthajyoti Sarkar from here

Essentially, posterior ∝ likelihood × prior
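
Written out in symbols, the standard statement of Bayes’ theorem is shown below. Here θ is my notation for the unknown parameter (the quantity written as ῦ earlier) and x stands for the observed data.

```latex
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}
\quad\Longrightarrow\quad
p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)
```

The denominator p(x) does not depend on θ, which is why it can be dropped when we only care about the shape of the posterior.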

Beta Distribution

This is a continuous probability distribution which is defined on the interval [0, 1] and depends on two parameters, α and β. Both α and β must be positive. Without going into the derivation, the PDF of the beta distribution can be modelled by the following equation:

Image provided by author
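
For reference, the standard textbook form of the beta density, written in LaTeX with Γ denoting the gamma function, is:

```latex
f(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1},
\qquad 0 \le x \le 1,\quad \alpha > 0,\quad \beta > 0
```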

For a more thorough explanation and derivation, visit the Wikipedia page for the beta distribution here.

This animation shows how the beta distribution changes for various values of its parameters alpha and beta. Source : Pabloparsil from Wikipedia
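
If you’d like to reproduce the effect of changing α and β yourself, here is a small sketch that evaluates the standard beta density with the Python standard library. The helper names `beta_pdf` and `beta_mean` and the parameter pairs are my own choices for illustration.

```python
from math import exp, lgamma, log


def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x, for 0 < x < 1."""
    # Work in log space so large alpha and beta do not overflow.
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(x) + (beta - 1) * log(1 - x))


def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)


for alpha, beta in [(1, 1), (2, 5), (5, 2), (50, 50)]:
    densities = [round(beta_pdf(x, alpha, beta), 3) for x in (0.1, 0.5, 0.9)]
    print(f"Beta({alpha}, {beta}): mean={beta_mean(alpha, beta):.2f}, "
          f"pdf at 0.1 / 0.5 / 0.9 = {densities}")
```

Beta(1, 1) is flat, while larger values of α and β concentrate the density around the mean α / (α + β).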

Now we can explore the Bayesian bandits / Thompson sampling algorithm. For the purposes of this experiment, let’s assume we know the click rate probabilities of both position 1 and position 2. Of course, in real-world examples this won’t be the case, but to evaluate how well our algorithm performs in this scenario, let’s say p(pos1) = 0.1 and p(pos2) = 0.55.
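
A simulated environment with these two click rates might look like the sketch below. The names `TRUE_RATES` and `simulate_visit`, the seed and the number of visits are assumptions made for illustration.

```python
import random

# Click probabilities assumed in the text for the two button positions.
TRUE_RATES = {"pos1": 0.1, "pos2": 0.55}


def simulate_visit(position, rng):
    """Return 1 if a simulated visitor clicks the button, else 0."""
    return 1 if rng.random() < TRUE_RATES[position] else 0


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
n_visits = 10_000
for position in TRUE_RATES:
    clicks = sum(simulate_visit(position, rng) for _ in range(n_visits))
    print(f"{position}: observed click rate {clicks / n_visits:.3f}")
```

The algorithm itself never gets to read `TRUE_RATES`. It only sees the 0 / 1 outcomes, just like it would on a real website.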

Since we don’t have any existing observations, we have no informed prior beliefs. To model our prior probabilities, we can use the beta distribution with α = 1 and β = 1. This is the uniform distribution over the domain [0, 1]. We choose the uniform distribution because we have no clue what the result may be, so we give equal probability to every possible value. Note that in industry scenarios, if you have prior knowledge available, you should use that prior knowledge in your implementation.

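For this method, the posterior in one step becomes the prior in the following step. Because the beta distribution is the conjugate prior for click / no-click (Bernoulli) data, our posterior and prior can both be modeled with a beta distribution.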

                          Beta * Data = Beta
                         |____|        |____|
                         prior         posterior
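
In code, this update is just a matter of incrementing one of the two parameters after each observation. Here is a minimal sketch, assuming each observation is a single click (1) or no click (0). The function name `update_beta` and the sequence of outcomes are made up for illustration.

```python
def update_beta(alpha, beta, clicked):
    """Return the posterior Beta parameters after one click / no-click outcome."""
    # A click adds one to alpha, a miss adds one to beta.
    return (alpha + 1, beta) if clicked else (alpha, beta + 1)


alpha, beta = 1, 1  # uniform Beta(1, 1) prior
for clicked in [1, 0, 0, 1, 1]:  # hypothetical sequence of outcomes
    # The posterior from this step is the prior for the next one.
    alpha, beta = update_beta(alpha, beta, clicked)
    print(f"Beta({alpha}, {beta}), posterior mean {alpha / (alpha + beta):.3f}")
```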

Implementation

Note that your implementation might have slightly varying results due to random sampling from the distributions.
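
Below is a minimal sketch of a Thompson sampling loop for the two positions, using only the Python standard library. It assumes the click rates from above (0.1 and 0.55), a Beta(1, 1) prior for each position and simulated visitors. The function name, number of trials and seed are my own choices, so treat it as one possible way to set this up rather than a definitive implementation.

```python
import random


def thompson_sampling(true_rates, n_trials=2000, seed=0):
    """Run Thompson sampling on simulated click data.

    Returns the final [alpha, beta] posterior parameters for each position.
    """
    rng = random.Random(seed)
    # Start every position with a uniform Beta(1, 1) prior.
    params = [[1, 1] for _ in true_rates]
    for _ in range(n_trials):
        # Draw one plausible click rate per position from its current posterior.
        samples = [rng.betavariate(a, b) for a, b in params]
        # Show the position whose sampled rate is highest.
        chosen = samples.index(max(samples))
        # Simulate whether this visitor clicks.
        clicked = rng.random() < true_rates[chosen]
        # Conjugate update: the posterior becomes the next prior.
        if clicked:
            params[chosen][0] += 1
        else:
            params[chosen][1] += 1
    return params


params = thompson_sampling([0.1, 0.55])
for name, (a, b) in zip(("position 1", "position 2"), params):
    shown = a + b - 2  # subtract the two pseudo-counts from the prior
    print(f"{name}: shown {shown} times, posterior mean {a / (a + b):.3f}")
```

Early on, both posteriors are wide, so both positions get shown. As evidence accumulates, the samples for the weaker position rarely come out on top and most of the traffic goes to the stronger one.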

As you can see, the algorithm converges quickly to the optimal distribution. Based on the results of this experiment, it is evident that position 2 outperforms position 1 and should be the location of the sign-up button on your website.

Concluding Remarks

The main difference between the frequentist and Bayesian approaches is that the Bayesian approach treats parameters as random variables. The steps to conduct an A/B test the Bayesian way are to identify your prior distribution, choose a statistical model, and calculate and update your posterior distribution. Generally, the Bayesian approach to A/B testing converges quicker than traditional A/B tests, which implies that a smaller sample is needed to reach a conclusion.

Resources


If you liked this article, these might interest you as well :

Monte Carlo Method Explained

Markov Chain Explained

Random Walks with Restart Explained

