How to accelerate decision-making and improve accuracy when dealing with limited data

What is A/B Testing and Why is it Hard?
A/B testing reduces uncertainty in decision-making by providing a data-driven way to determine which version of a product is more effective. The concept itself is simple.
- Imagine you are at a friend’s birthday party. You’ve been painstakingly working on perfecting your cookie recipe. You think you’ve perfected it, but you don’t know if people will prefer the cookie with or without oats. In your opinion, oats give the cookie a nice chewy texture. However, you’re not sure if this is a mass opinion or just your individual preference.
- You end up showing up to the party with two different versions of the cookie: cookie A has oats and cookie B doesn’t. You randomly give half of your friends cookie A, and the other half gets cookie B.
- You decide that the cookie that gets more "yums" is the better cookie.
- Once everyone has tasted the cookies, you find that cookie B got more "yums" and conclude that it is the better cookie.
This process of randomly distributing cookies to party guests and monitoring their feedback is an example of an A/B test.
In the world of technology, A/B testing works the same way. By randomly routing users to different versions of an experience, you can empirically measure the impact of each version on key performance metrics. This allows you to validate changes and iteratively optimize product offerings.
In my role as a senior Data Science manager, we most commonly use A/B testing to test different pricing models to see which model leads to the most purchases. Consider two pricing strategies – one where the product is priced at $19.99 and the other where it is priced at $24.99 with a 20% discount. These two pricing strategies lead to effectively the same price, but are customers more likely to purchase if they see a 20% discount? We can test this using A/B testing!
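If you’re curious what "randomly routing" users looks like in practice, here’s a toy sketch. The function name and user IDs below are made up for illustration; the idea is simply to hash each customer into one of the two pricing displays so their assignment is stable across visits.

```python
import hashlib

def assign_pricing_variant(user_id: str) -> str:
    # Hash the user ID into one of two stable, effectively random buckets.
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 2
    return "A: $19.99" if bucket == 0 else "B: $24.99 with a 20% discount"

for uid in ["user_001", "user_002", "user_003"]:
    print(uid, "->", assign_pricing_variant(uid))
```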
Traditional A/B tests typically require a certain number of samples before you can conclude that one version of the product or model is better than another. In other words, traditional A/B tests require enough samples for the result to be considered statistically significant. The number of samples required to achieve statistical significance is set before the experiment begins, and then you wait. This is referred to as fixed sample size A/B testing.
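To give a sense of how that fixed sample size gets set, here’s a minimal sketch using the standard normal-approximation formula for comparing two proportions. The conversion rates and error targets below are illustrative assumptions, not values from any particular experiment.

```python
from scipy.stats import norm

p1, p2 = 0.05, 0.07          # baseline and hoped-for conversion rates (assumed)
alpha, power = 0.05, 0.80    # significance level and desired power

z_alpha = norm.ppf(1 - alpha / 2)   # two-sided critical value (~1.96)
z_beta = norm.ppf(power)            # critical value for the desired power (~0.84)

# Normal-approximation sample size per group for comparing two proportions
n_per_group = ((z_alpha + z_beta) ** 2
               * (p1 * (1 - p1) + p2 * (1 - p2))
               / (p1 - p2) ** 2)
print(f"Roughly {n_per_group:,.0f} samples needed per group")  # on the order of 2,200
```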
Fixed sample size A/B testing is problematic for a plethora of reasons.
- Time Intensive: In large companies with huge volumes, you may reach your desired sample size quickly. However, if you’re like me, and work in a small startup where volume isn’t as large – waiting for the test to finish can be time intensive. Recently, my team designed an A/B test only to realize that it would take us 2 years to reach the desired sample size!
- Inflexibility: Once you’ve established the required sample size for your A/B test, you’re locked into that decision. If external factors change, you can’t easily adjust the test without compromising its validity.
What is Sequential Testing and Why is it (Maybe) Easier?
Sequential testing is a version of A/B testing that allows for continuous monitoring of data as it is collected, enabling decisions to be made earlier than in traditional fixed-sample tests. By using predefined stopping rules, you can stop the test as soon as sufficient evidence is gathered.
Sequential testing is an alternative to fixed sample size testing. It’s commonly used in situations where there are:
- Low volumes: When you have limited data coming in and need to make decisions quickly, sequential testing allows you to draw conclusions without waiting for a large sample size.
- Cost or time constraints: If the cost or time to collect data is high, sequential testing can help reduce the number of samples needed by allowing the test to stop as soon as a clear result is observed.
- Adaptive factors: When conditions or user behavior might change over time, sequential testing allows for more flexible decision-making and adaptation as new data is collected.
How does Sequential Testing Work?
Implementing sequential testing relies on the Sequential Probability Ratio Test ("SPRT"). The test compares two competing hypotheses:
- Null Hypothesis (H₀): The parameter of interest (like a conversion rate) is equal to a specified value, often the status quo or baseline: 𝑝 = 𝑝₀
- Alternative Hypothesis (H₁): The parameter of interest has shifted by the desired change: 𝑝 = 𝑝₀ + Δ
Once you have defined the null and alternative hypotheses, you need to set up decision boundaries.
- SPRT uses two boundaries (upper and lower) to decide whether to accept H₀, accept H₁, or continue collecting data.
These boundaries are determined based on the desired error rates:
- Type 1 error (⍺): A type 1 error occurs when you conclude that there is a meaningful difference between the A and B groups, when in reality there isn’t a difference. This is also known as a false positive.
- Type 2 error (β): A type 2 error occurs when you conclude there is no meaningful difference between the A and B groups, when in reality there is a difference. This is also known as a false negative.
In sequential testing, ⍺ and β are commonly set at 0.05 and 0.20. However, these need to be set appropriately to reflect your experiment. Once the desired error rates have been set, you use them to set the relevant boundaries.
- Upper boundary (U) = (1- β)/⍺
- Lower boundary (L) = β / (1 -⍺)
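In code, these boundaries take only a couple of lines. A minimal sketch, assuming the commonly used ⍺ = 0.05 and β = 0.20 mentioned above:

```python
alpha = 0.05   # tolerated Type 1 error rate (false positive)
beta = 0.20    # tolerated Type 2 error rate (false negative)

upper_boundary = (1 - beta) / alpha   # accept H1 once the likelihood ratio exceeds this (16.0)
lower_boundary = beta / (1 - alpha)   # accept H0 once it falls below this (~0.211)
```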
For each new observation that comes in, we update the likelihood ratio as LRₙ = LRₙ₋₁ × P(dataₙ | H₁) / P(dataₙ | H₀).
Each time this likelihood ratio is updated, it’s compared against the boundaries we set previously:
- If LR > U, reject H₀ and accept H₁
- If LR < L, reject H₁ and accept H₀
- If L ≤ LR ≤ U, continue the test and collect more data
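Putting the update rule and the decision rules together, a single SPRT step for a binary outcome might look like the sketch below. The function names and the success/failure setup are assumptions made for illustration.

```python
def update_likelihood_ratio(lr: float, converted: bool, p0: float, p1: float) -> float:
    """Multiply the running likelihood ratio by P(observation | H1) / P(observation | H0)."""
    return lr * (p1 / p0 if converted else (1 - p1) / (1 - p0))

def sprt_decision(lr: float, lower: float, upper: float) -> str:
    """Compare the running likelihood ratio against the SPRT boundaries."""
    if lr > upper:
        return "reject H0, accept H1"
    if lr < lower:
        return "reject H1, accept H0"
    return "continue collecting data"
```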
In the section below, we will walk through an illustrative example.
Sequential Testing Example
Imagine you are a data scientist responsible for figuring out if the model you’ve developed recently results in more conversions than the current production model. You decide that you will randomly route a portion of potential customers to your new model, while the remainder will continue to use the model currently in production.
The existing production model has an associated conversion rate of 5%. We hope that our new model will increase this conversion rate to 7%, but we’re not sure. For that reason, we develop a sequential A/B test to test this.
First, we define our hypotheses.
- H₀: 𝑝 = 0.05; baseline conversion rate
- H₁: 𝑝 = 0.07; desired conversion rate
Next, we will set up our decision boundaries. We will use commonly used error rates to set our boundaries (⍺ = 0.05, β = 0.2).
- Upper boundary (U) = (1 - β)/⍺ = (1 - 0.2)/0.05 = 0.8/0.05 = 16
- Lower boundary (L) = β/(1 - ⍺) = 0.2/(1 - 0.05) = 0.2/0.95 ≅ 0.211
At this point, we will collect some data. For each observation, whether it is a success (conversion) or a failure (no conversion), we will update the likelihood ratio the same way:
- In the case of a success, we will always multiply the current likelihood ratio by P(success|H₁)/P(success|H₀) = 0.07/0.05 = 1.4.
- In the case of a failure, we will always multiply the current likelihood ratio by P(failure|H₁)/P(failure|H₀) = (1–0.07)/(1–0.05) = 0.93/0.95 ≅ 0.98.
Below, I’ve simulated some observations and the associated changes to our likelihood ratio. As a disclaimer, I made sure we got a lot of successes early so the table wasn’t thousands of observations long (based on our actual conversion rate, this is unlikely).
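Here’s a minimal sketch of how a run like this could be simulated. The random seed and the artificially high "true" conversion rate are assumptions chosen so the test stops within a handful of observations, mirroring the disclaimer above; a run with a realistic 5–7% rate would typically take far longer, and the exact stopping point will vary from run to run.

```python
import random

random.seed(7)                    # arbitrary seed; your exact run will differ

p0, p1 = 0.05, 0.07               # conversion rates under H0 and H1
alpha, beta = 0.05, 0.20
upper = (1 - beta) / alpha        # 16.0
lower = beta / (1 - alpha)        # ~0.211

true_rate = 0.80                  # artificially high so successes arrive early
lr = 1.0

for n in range(1, 10_001):
    converted = random.random() < true_rate
    lr *= (p1 / p0) if converted else ((1 - p1) / (1 - p0))   # x1.4 or x~0.98
    print(f"obs {n:>2}: {'success' if converted else 'failure'}, LR = {lr:.2f}")
    if lr > upper:
        print("LR crossed the upper boundary: reject H0, accept H1")
        break
    if lr < lower:
        print("LR crossed the lower boundary: reject H1, accept H0")
        break
```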

After 11 observations (9 of them conversions), our likelihood ratio has climbed to roughly 1.4⁹ × 0.98² ≈ 19.8, crossing the upper boundary of 16. Our model isn’t just good, it’s great! We’re converting nearly everything. We reject H₀ and accept H₁. Obviously, this is a simplified example, but this is generally how sequential testing works.
Is Sequential Testing Risk Free?
In this article, we’ve explored the idea of sequential testing as an alternative to fixed sample size A/B testing. Sequential testing offers advantages, such as the potential for faster decision-making and greater adaptability to evolving market conditions. These benefits can lead to more efficient experimentation, particularly in low-volume environments where you might not have the time to wait to accumulate a certain sample size. I’ve painted a rather blissful picture of sequential testing, but it isn’t without its own set of risks.
- Early Data Can Lead You Astray: Since the test is checked as each new observation comes in, the chance of incorrectly rejecting the null hypothesis increases, especially on early data. Early data might show a strong effect that diminishes as more data is collected, leading to premature conclusions. In my example above, we had 9 conversions in 11 observations, despite a baseline conversion rate of 5%. These results could be an outlier, leading us to reject the null hypothesis too quickly only to revert to the baseline at a later date.
- Complexity in Interpretation: Statistical interpretation of sequential tests can be more complex. Although we don’t cover it in this article, to maintain the validity of the results, sequential tests often require the use of advanced statistical methods, such as alpha spending functions or other corrections. These methods ensure that the overall type I error rate remains controlled throughout the multiple testing stages. However, the added complexity can make it more challenging to correctly interpret the results, potentially leading to misinformed decisions if not properly understood.
Hopefully by now, you know what sequential testing is and have an idea of its pros and cons. Although sequential testing can offer many benefits, there are scenarios where the predictability and simplicity of fixed sample size A/B testing may be more appropriate. The choice between sequential and fixed sample size testing should be guided by the specific goals, constraints, and context of your experiment.
Help me grow my page!
All claps and comments are appreciated. It’s how Medium knows if I’m doing a good job!
- Follow me on Medium too
- Subscribe to my newsletter below