Notes from Industry

The Three Most Important Statistical Tests in Business Analytics

Academics Invent New Ones All the Time but in Business Just Three Carry Most of the Load

Elvis
Towards Data Science
6 min read · Mar 17, 2021


Image by Author

There are more statistical tests than one can count; all of them have a reason to exist, and mathematicians are paid to invent new ones all the time. But in my experience, practically speaking, business analysts only need to know three basic tests:

  1. t-Test
  2. chi-square Test
  3. Kolmogorov-Smirnov Test (more commonly called the K-S Test)

In business analytics, the majority of the time you are working with averages, counts, or distributions. Let’s look at a few examples and my recommendations for which test to apply (a short Python sketch after the list shows how to run each of these in practice):

  1. Comparing the average order value between two months to determine if statistically they are the same. In this case, use a two-sided t-test to compare the two averages.
  2. A/B test comparing the number of sales from the test experience versus the control experience to determine which version generates more sales. In this case, use a 2x2 chi-square test to compare the count of sales (and non-sales) from each experience.
  3. A/B/C test comparing the number of sales from a 3-way test to determine which version generates more sales. In this case, use a 3x2 chi-square test to compare the count of sales (and non-sales) from each experience. This can be generalized to any number of versions, so if you have N versions to test, use an Nx2 chi-square test. This is also what you do if you are running a multi-variate test (well, specifically a full-factorial version of the test, in which you explicitly test every possible combination of experiences… typically very difficult in practice).
  4. A/B test comparing the number of errors generated from process A versus process B (e.g., maybe you added an extra QA step in your production process and want to determine if that extra QA step is reducing errors by at least 1% to make it have a positive ROI for the extra cost you are incurring). Again use a 2x2 chi-square test.
  5. Comparing your distribution of sales on a Monday against the distribution on a Tuesday. By distribution I mean the probability that you get 1 sale, 2 sales, 3 sales, and so on up to the maximum number of sales you get for that day. Most businesses have something that looks Gaussian or Poisson for this type of distribution. In this case, use a two-sided K-S test to compare the two distributions.
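To make this concrete, here is a minimal sketch of all three tests in Python using SciPy (`ttest_ind`, `chi2_contingency`, and `ks_2samp` are the standard SciPy routines for these tests). All the numbers are made up purely for illustration; substitute your own order values, conversion counts, and daily sales.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# 1. Two-sided t-test: average order value in two months (simulated data).
aov_month_1 = rng.normal(loc=52.0, scale=8.0, size=400)
aov_month_2 = rng.normal(loc=53.5, scale=8.0, size=380)
t_stat, t_p = stats.ttest_ind(aov_month_1, aov_month_2)
print(f"t-test:     statistic={t_stat:.3f}, p-value={t_p:.3f}")

# 2. 2x2 chi-square test: sales vs. non-sales for test vs. control.
#                        sales  non-sales
contingency = np.array([[120,   880],    # test experience
                        [ 95,   905]])   # control experience
chi2, chi_p, dof, expected = stats.chi2_contingency(contingency)
print(f"chi-square: statistic={chi2:.3f}, p-value={chi_p:.3f}")

# 3. Two-sided K-S test: distribution of daily sales, Monday vs. Tuesday.
# (K-S is designed for continuous data; with counts the p-value is approximate.)
monday_sales  = rng.poisson(lam=30, size=52)    # 52 Mondays of sales counts
tuesday_sales = rng.poisson(lam=33, size=52)    # 52 Tuesdays of sales counts
ks_stat, ks_p = stats.ks_2samp(monday_sales, tuesday_sales)
print(f"K-S test:   statistic={ks_stat:.3f}, p-value={ks_p:.3f}")
```

In each case a small p-value is evidence that the two things being compared differ, while a large p-value means the test found no evidence of a difference.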

Practically speaking, the t-test and chi-square test are used much more often in business than the K-S test. There is no mysterious reason for this: most problems in business reduce to looking at averages or counts. It is much rarer for a business problem to require an analyst to look at a distribution… although my personal opinion is that businesses should look at distributions much more often than they do. Too often business users assume that just because two averages are the same, the underlying processes that generate those averages are the same. But in reality they could be completely different. It would be much more revelatory to examine the distributions directly rather than just the averages. However, I admit that in practice it’s difficult to collect sufficient data to construct a reliable distribution (e.g., you can only have 365 daily sales numbers in a year, and a business can change so much over the course of a year that many of those 365 samples could be coming from different distributions).
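Here is a quick illustration of that point with simulated data. The two samples below are constructed to have the same average but very different shapes; the t-test will typically see nothing, while the K-S test will typically flag the difference.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two simulated "daily sales" processes with the same average (~100)...
steady  = rng.normal(loc=100, scale=5, size=365)    # tight, symmetric process
erratic = rng.exponential(scale=100, size=365)      # heavy-tailed, skewed process

# ...the t-test compares only the averages (Welch's version, since the
# variances are clearly unequal), while the K-S test compares the full shapes.
t_stat, t_p = stats.ttest_ind(steady, erratic, equal_var=False)
ks_stat, ks_p = stats.ks_2samp(steady, erratic)

# The t-test will typically NOT flag a difference (the averages match),
# while the K-S test will flag that the distributions are very different.
print(f"t-test p-value:   {t_p:.3f}")
print(f"K-S test p-value: {ks_p:.3g}")
```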

Of those three tests, in my experience the t-test tends to be the most familiar to analysts because it’s the test taught in practically every Statistics 101 course, and it is relatively easy to apply. However, the chi-square test is just as important as the t-test but much less known to analysts. I’ve seen situations where the t-test is used in place of the chi-square test, and this can lead to incorrect conclusions. A simple example makes it easy to see when one is appropriate versus the other. Imagine that we perform the following experiment: a large group of people arrives at a room with two doors, one on the left and one on the right. One by one, each person is randomly given a yellow card or a red card. Once a person receives their card, they are asked to choose either the left door or the right door and to enter the room they selected. Once everyone has received a card and chosen a door, we can ask several questions about the experiment.

  • Did the color of the card, a categorical variable, influence how many people selected the left or the right door? Use the chi-square test to answer that question.
  • Is the average height of the people in the left room different from the average height of the people in the right room? Use the t-test to answer that question.

Can we use the result of the second question to determine whether the color of the card influenced the choice people made? For example, if we apply the t-test to the average height of the people in each room and it finds no significant difference at the 99% confidence level, can we conclude the color of the card had no influence? No. Consider the situation where everyone who received the red card chose the left door and everyone who received the yellow card chose the right door. Everything else being equal, and with a large enough population, the average height of the people in the left room will be statistically the same as the average height of the people in the right room, yet the color of the card clearly determined which door people selected. The t-test would imply the color of the card had negligible influence, if any, on the choice people made, but the chi-square test would clearly state that the color of the card had an overwhelming influence. The t-test is not wrong; it is merely misinterpreted because it is being applied to the wrong question. Such are the dangers of statistical analysis!
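For the skeptical reader, here is a small simulation sketch of that extreme scenario (all numbers assumed for illustration): heights are drawn from one common population, every red card goes left, and every yellow card goes right.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 10_000

# Randomly hand out cards, then force the extreme door choice.
cards = rng.choice(["red", "yellow"], size=n)
doors = np.where(cards == "red", "left", "right")

# Heights are drawn from the same population regardless of card color.
heights = rng.normal(loc=170, scale=10, size=n)

# t-test on heights: left room vs. right room.
t_stat, t_p = stats.ttest_ind(heights[doors == "left"],
                              heights[doors == "right"])

# 2x2 chi-square test: card color vs. door choice.
table = np.array([
    [np.sum((cards == "red") & (doors == "left")),
     np.sum((cards == "red") & (doors == "right"))],
    [np.sum((cards == "yellow") & (doors == "left")),
     np.sum((cards == "yellow") & (doors == "right"))],
])
chi2, chi_p, dof, _ = stats.chi2_contingency(table)

# The t-test typically finds no height difference (large p-value), while the
# chi-square test finds an overwhelming association (p-value near zero).
print(f"t-test on heights:    p-value = {t_p:.3f}")
print(f"chi-square on counts: p-value = {chi_p:.3g}")
```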

By the way, people very knowledgeable in the theory of statistical analysis might object that the t-test and chi-square test may not be appropriate in some cases because they are parametric tests (i.e., they assume a specific underlying distribution for the process that generated the data): the t-test assumes a Gaussian distribution, and the chi-square test relies on its test statistic approximately following a chi-square distribution. That is a valid objection. However, it has been my experience that for business problems it doesn’t matter too much, for the same reasons I mentioned in an earlier posting: business problems are inherently “noisy,” and it’s not always possible to collect sufficient data to estimate a reliable underlying distribution that would inform a more nuanced choice of statistical test. Not to mention the underlying distribution might be changing with time, making the whole endeavor a fool’s errand. So in practice I have found it sufficiently reliable not to worry about the exact underlying distribution and to use these two parametric tests when dealing with business data (although I will admit this was not true in my past life, when I was an engineer/scientist in the defense industry dealing with technical data… but that is a discussion for another time). For the more technically minded reader I will mention that, specific to the t-test, the central limit theorem guarantees that many real-world processes with non-Gaussian underlying distributions will generate metrics (such as sample averages) that tend toward a Gaussian distribution, in which case the t-test can be reliably used. And the K-S test is a non-parametric test that makes no assumption about the underlying distributions, so none of these issues apply to it.
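If you want to see the central limit theorem point in action, here is a small, purely illustrative check: the raw “order values” below come from a heavily skewed exponential distribution, yet the averages of repeated samples come out close to symmetric.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A heavily skewed, clearly non-Gaussian "order value" process.
raw_orders = rng.exponential(scale=50, size=100_000)

# Averages of 500 independent samples of 1,000 orders each.
sample_means = rng.exponential(scale=50, size=(500, 1_000)).mean(axis=1)

# The raw data is strongly skewed; the sample averages are nearly symmetric,
# which is the central limit theorem at work.
print("skewness of raw order values:", round(stats.skew(raw_orders), 2))   # roughly 2
print("skewness of sample averages: ", round(stats.skew(sample_means), 2)) # near 0
```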

There are more sources online than I can count that explain how to implement these three tests. However, if there is any interest in my posting an article on how to calculate each one, please comment below, since I think too many sources make it more complicated than it has to be.



An Amazonian academically trained in Physics and Electrical Engineering, experienced in Data Science, Data Engineering, Analytics, and Business Intelligence.