How to think about combinatorics like a data scientist

Outlier AI
Towards Data Science
11 min readJun 19, 2017

--

Gain a solid grasp of these fundamental concepts to improve your decision making process and drive your company forward.

We’ve recently talked about decision trees, game theory, and customer personas. For each topic, we’ve used examples that have a small number of cases so that we could show you something interesting, but manageable in size. Last week, we talked about how you can reduce the size of the problems you are trying to solve using clustering. This week, we will take one step even further back and talk about how to count the number of cases you have to start with.

To do this, we will talk about the mathematical field of combinatorics, in particular, permutations and combinations. A permutation counts the number outcomes where the order of what you are counting does matter. A combination, on the other hand, counts the number of outcomes where the order of what you are counting does not matter. Both permutations and combinations are further broken down to consider whether the options of what you are choosing from is allowed to be repeated, i.e., replaced in the set of available options after each choice, or not (you’ll see this concept referred to as both “repetition” and “replacement” — I’ll use “replacement” this week). We will talk about each scenario this week:

Let’s start counting!

Outlier monitors your business data and notifies you when unexpected changes occur. We help Marketing/Growth & Product teams drive more value from their business data.

Schedule a demo today.

- Outlier was the Strata+Hadoop World 2017 Audience Award Winner.

Permutations with Replacement

Back when we were talking about game theory, one of the scenarios we talked about was that your company (e.g., Doug’s Desserts) needs to make a decision as whether to raise your price, lower your price, or keep your price the same. Suppose you have 3 other competitors (let’s call them A, B, and C) that sell the same product as you, and they face the same three decisions. Also, you know from your past experience, that your company is the leader in the space and then each of the other companies always react to your pricing decisions in the same order, A then B and then C.

A permutation counts the number outcomes where the order of what you are counting does matter. A combination, on the other hand, counts the number of outcomes where the order of what you are counting does not matter.

So how many different scenarios could happen? We could count them all out by hand of course…

…but that gets harder and harder as the number of options grows. This problem is an example of a permutation with replacement. Let’s start at the beginning to think through the number of outcomes. Doug’s Desserts first makes a choice between 3 options, then Company A makes a choice between the same 3 options. In other words, for each of the 3 options Doug’s Desserts could choose, Company A could choose any of the same 3 options, meaning there are 3 * 3 = 9 possible outcomes. Next, Company B makes a choice between the same 3 options. In other words, for each of the 9 outcomes we computed for Doug’s Desserts and Company A, Company B could choose 3 options, meaning there are 9 * 3 = 27 outcomes. Finally, Company C makes a choice between the same 3 options. In other words, for each of the 27 outcomes we computed for Doug’s Desserts, Company A, Company B, Company C could choose 3 options, meaning there are 27 * 3 = 81 outcomes.

You can now see the pattern — each time a new choice is made, we multiply the previous result by the number of options. Let’s denote n as the number of options, n = 3 in this example, and k as the number of times the choice is being made, k = 4 in this example. Then we can generalize the concept of permutations with replacement: when there are n options being chosen k times, each time with replacement of the options, there are n^k different outcomes. In this example, that means, 3⁴ = 3 * 3 * 3 * 3 = 81.

Next, I’ll talk about the scenario of permutations where the options are not allowed to be replaced.

Permutations without Replacement

So that’s how to count permutations when the same set of options is available at each step of the process. Now, I’ll talk about how to count the outcomes when each choice is only allowed to be made once.

Suppose you are ready to launch a new product and are trying to figure out the best strategy on how to announce it to the general public. You’ve identified four different marketing strategies that you are considering to undertake:

  • Send a press release and schedule interviews with the news media
  • Run an online marketing campaign
  • Run a TV / radio marketing campaign
  • Sponsor and speak at an industry conference

From your experiences in marketing, you believe that the order in which you undertake these initiatives matters. In other words, you think that the outcome that would come about by running an online marketing campaign before sponsoring and speaking at an industry conference would be different than if the conference happened before the online marketing campaign.

So how many different outcomes do you have to consider? This is an example of permutations without replacement. There are four different strategies to consider, which we will denote as n. When thinking about what to do first, you have four choices. When thinking about what to do second, you only have three choices left (since we aren’t going to do the same marketing strategy twice). At the time you are making the choice of what to do third, you have two strategies remaining. Finally, the strategy to do last is determined by your previous choices as there is only one option left. Putting this together, you have 4 * 3 * 2 * 1 = 24 different outcomes to consider.

The calculation of the product of integers that we just made happens so frequently that it has its own special name and mathematical notation. It is called a factorial and denoted with as an “!” after a number. It is shorthand to mean, take the product of each integer less than or equal to the number. For example, 4! = 4 * 3 * 2 * 1 = 24, exactly what we computed above. More generally then, if we are going to undertake all n of our marketing strategies, then we have to consider n! different outcomes.

But what if you can’t do all of the marketing strategies because you just don’t have the time or budget. Instead, you can only choose 2 of the 4 options. Let’s denote the number of selections you’ll make as k, k = 2 in this example. You can choose from 4 strategies for the first step and 3 strategies for the second step, but then you stop there. So you have 4 * 3 = 12 outcomes to consider. Notice how the only difference between this outcome and the one earlier is that 2 * 1 is no longer in the formula. We can rewrite the formula (and it will be clear why soon) as 4 * 3 * 2 * 1 / 2 * 1 = 4 * 3 = 12. In other words, we have a total of n! outcomes, but need to reduce that value by the number of choices we are able to make, which is (n — k)!

Putting both of these examples, we can write one, general formula how to compute permutations with replacement: n! / (n — k)!, commonly denoted P(n, k). In the first example we talked about, there are 4 options (n = 4) and 4 choices (k = 4), so 4! / (4–4)! = 4! / 0! = 24 [1]. In the second example, there are 4 options (n = 4) and 2 choices (k = 2), so 4! / (4–2)! = 4! / 2! = 12.

Now that we know about how to count when order does, matter, tomorrow, I’ll talk about combinations, where order does not matter.

Combinations without Replacement

When you think of combinations, the first thing that comes to mind might be a “combination lock”. Unfortunately, that is actually a very bad example of what a combination really is! As we talked about the past few days, in permutations, the order does matter, but in combinations, the order does not matter. As you know though, the order in which you enter numbers in a combination lock makes all the difference, so if we wanted to be more accurate (at least from a mathematician’s perspective), we should rename “combination locks” to be called “permutation locks”!

Let’s reconsider our example from yesterday, where you have a four different marketing campaign strategies that you are considering for an upcoming product launch. In the context of permutations, the order of those decisions matter. But today, let’s assume that you are going to make a big marketing blast and do all of those strategies at the same time. Then how many different outcomes are there? Simple, one — you do everything at the same time!

Now let’s assume that you want to make a big splash, but don’t have the budget or time to actually do all four strategies at the same time. Instead, you can only do two. The easiest way to think about combinations without replacement is to first compute the number of permutations without replacement and then reduce that value by the number of ways in which the choices could be ordered. Yesterday we calculated that there are 12 different permutations without replacement when I had four options and two choices. So how many ways could our two choices be made, with replacement? We already know the answer from yesterday, it is P(2, 2) = 2! / (2–2)! = 2! = 2 * 1 / 1 = 2. That means that the number of combinations without replacement is 12 / 2 = 6.

We can put all of this together into a general form, again denoting n as the number of options and k the number of choices. We calculated the outcomes today as P(n, k) / P(k, k). Because P(k, k) is the same as k!, we can write this as P(n, k) / k! = n! / (n — k)! * k!, which is commonly denoted C(n, k). In the first example we talked about today, there are 4 options (n = 4) and 4 choices (k = 4), so 4! / (4–4)! * 4! = 4! / 0! * 4! = 1. In the second example, there are 4 options (n = 4) and 2 choices (k = 4), so 4! / (4–2)! * 2! = 4! / 2! * 2! = 6.

Next, I’ll wrap up our conversation on combinatorics by discussing combinations with replacement.

Combinations with Replacement

The last type of combination we will talk about is combinations with replacement. One example of this type of counting problem is buying products in a store. For example, you are standing at the counter of my hypothetical bakery, Doug’s Desserts, and want to buy four donuts (i.e., the number of choices you make, where k = 4). My bakery sells three types of donuts: vanilla frosted, glazed, and jelly-filled (i.e., the number of options you have, where n = 3). The question now is, how many ways can you buy four donuts from the three options, given that you can buy as many as you like of each type of donut?

Similar logic applies to the example I discussed when talking about permutations with replacement — four companies each choosing between three options of what to do with their product’s price. In that context of permutations, we made the assumption that the order of the choices matters. For combinations, though, let’s assume that the order doesn’t matter. The question now is, how many different ways can the four companies can make their pricing decision, assuming that they are all setting their price at the same time?

In both of these examples, there are n = 3 options to choose from, and a decision is being made k = 4 times, where all of the options are available each time the decision is being made.

Let’s use the donut example to think about how to solve this problem, as it is easier to visualize. Imagine that you are at the bakery and and that all of the donuts of each type are all on their own tray. So there are a dozen[2] vanilla frosted on one tray (represented by “O” in the table), a dozen jelly-filled on the tray next to it, and a dozen glazed on its own tray as well.

You like vanilla frosted and really like jelly-filled, but don’t like glazed, so your plan is to choose one vanilla frosted and three jelly-filled. You could imagine the purchasing process being something as follows.

First you stand in front of the tray of vanilla frosted donuts (the tray you are looking at is colored blue)…

and ask for one of those donuts to be put in your bag. Then you move to the next tray of donuts, glazed.

Since you don’t want any of these donuts, none are taken from the tray. You move to the last tray of donuts, jelly-filled…

and ask for one of these donuts three times.

If a bystander had been recording the steps you took to buy your donuts, it would have looked like this: (1) donut (2) move to next tray (3) move to next tray (4) donut (5) donut (6) donut. The actual donut chosen doesn’t really matter because this is a combination and order of choices do not matter.

Looking at the purchase this way, no matter which donuts you choose, you are always going to have 4 (i.e., k) “donut” steps and 2 (i.e., n — 1) “move to next tray” steps. This is like saying, we have k + (n -1), 6 in our example, different options of choosing a donut or moving to the next tray, and we want to choose k, 4 in our example, donuts. In other words, this is the same as a combination without replacement, but with slightly modified inputs to the formula we discussed yesterday! The total number of outcomes is C(k + (n — 1), k) = (k + (n — 1))! / (k + (n — 1) — k)! * k! = (k + (n — 1))! / (n — 1)! * k!. Plugging our values in from the example, there are 6! / 2! * 4! = 15 outcomes of how you could select your donuts (or the companies could make their pricing decisions).

Well, that’s it for permutations and combinations. I hope you’ve enjoyed this week on counting the outcomes you could face at your business!

Outlier monitors your business data and notifies you when unexpected changes occur. We help Marketing/Growth & Product teams drive more value from their business data. Schedule a demo today.

- Outlier was the Strata+Hadoop World 2017 Audience Award Winner.

Here’s Outlier in 30 seconds.

[1] 0! = 1, by convention.

[2] That there are a dozen donuts doesn’t really matter, as long as there are at least four on each tray — because that is how many choices you are making and we are assuming all of the options are available for each choice — the logic is the same.

--

--

Outlier discovers unexpected changes and patterns in your data automatically.