
If you have ever played the game of Catan you’ll quickly realise that when rolling two dice the number 7 comes up very often! This brought up a question:
What is the true probability of rolling a sum of 7 with two 6-sided dice? Moreover, what is the probability of rolling a sum of any number with n 6-sided dice?
Here is what I did to find the answers.
Experiments (rolling dice)
Let’s first run a few dice experiments (i.e. simply roll some dice). Luckily for us, we can use computers to produce millions of dice rolling simulations in a few minutes instead of rolling them ourselves for a few years.
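A simulation like this can be sketched in a few lines of Python (a minimal version for illustration; the function name is my own, and the actual notebook linked at the end may differ):

```python
import random
from collections import Counter

def simulate_rolls(n_dice, n_trials=100_000, sides=6):
    """Roll `n_dice` fair dice `n_trials` times and return the
    empirical probability of each total sum."""
    counts = Counter(
        sum(random.randint(1, sides) for _ in range(n_dice))
        for _ in range(n_trials)
    )
    return {total: c / n_trials for total, c in sorted(counts.items())}

probs = simulate_rolls(2)
# With two dice the sum 7 should come out near 6/36 ≈ 0.167
```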


Just by eye-balling the experimental data in Figure 1 we can see a familiar shape emerging as the number of dice increases, a bell curve, also known as a normal or Gaussian distribution. We can work with this, and try to fit a Gaussian to the experimental data. A Gaussian distribution is mathematically expressed as
P(x) = (1/(σ√(2π))) · exp(−(x − μ)²/(2σ²)).

The two parameters μ and σ correspond to the mean and the standard deviation of the probability distribution; they define the central position and the width of the distribution respectively. But what are the best values of these parameters to fit our n-dice experimental data? Well, we can infer the most likely parameter values via statistical analysis. We can see from Figure 1 that the probability distributions are symmetric. Using this symmetry we can define the means of the experimental data by simply locating the maximum positions of each distribution. Figure 2 below is a plot of these maximum positions for an increasing number of dice. It can be seen from the figure that a linear correlation exists between the mean μ and the number of dice n, with a line of best fit of μ = 3.5n.
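The μ = 3.5n fit is easy to sanity-check directly: the average of a single fair die is (1+2+…+6)/6 = 3.5, so the sample mean of many n-dice rolls should sit near 3.5n. A quick sketch (function name my own):

```python
import random

def sample_mean(n_dice, n_trials=200_000, sides=6):
    """Average total of `n_dice` fair dice over many simulated rolls."""
    total = sum(
        sum(random.randint(1, sides) for _ in range(n_dice))
        for _ in range(n_trials)
    )
    return total / n_trials

# e.g. sample_mean(4) should land very close to 3.5 * 4 = 14
```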

Now with values for the means we can use the method of least squares to find the values for the standard deviation σ that correspond to the best fitting Gaussians to the experimental data.
Method of least squares (identifying the most likely σ values)
The method of least squares defines a metric to compare how similar two sets of data are. This metric is known as the mean squared error (MSE), which is mathematically expressed as

MSE = (1/n) Σᵢ₌₁ⁿ (Xᵢ − P(xᵢ))².

The MSE is the mean squared difference between the experimental data Xᵢ and the Gaussian fitting P(xᵢ) for a given σ value, where n denotes the number of bins in the histogram data and i denotes each bin. So finding the σ value that minimises the MSE corresponds to minimising the difference between the Gaussian fitting and the experimental data, i.e. the best fit. Figure 3 shows the minimisation process for identifying the most likely parameter values (which were μ = 14 and σ = 3.36) for four dice.
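The σ search can be sketched as a simple grid search (function names here are my own, not necessarily those of the article’s notebook): for a fixed μ, evaluate the MSE between the empirical probabilities and the Gaussian for each candidate σ, and keep the minimiser.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian probability density at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def best_sigma(data, mu, candidates):
    """Return the candidate sigma whose Gaussian minimises the MSE
    against `data`, a dict mapping each total sum to its probability."""
    def mse(sigma):
        return sum((p - gaussian(x, mu, sigma)) ** 2 for x, p in data.items()) / len(data)
    return min(candidates, key=mse)

# For four dice (mu = 14), searching sigma over a fine grid should
# recover a value near the fitted 3.36
```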

Using the mean squared error to infer the most likely σ values for a range of dice, we can plot σ as a function of the number of dice n. Figure 4 shows the plot of σ(n); we can see that the experimental data traces out a √n-looking curve. Again using the method of least squares, we can find the best fitting √n curve to the experimental data, which resulted in σ(n) = 1.75√n.

So, given n dice we can now use μ(n) = 3.5n and σ(n) = 1.75√n to predict the full probability distribution for any arbitrary number of dice n. Figures 5 and 6 below show these fittings for n=1 to n=17.
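Putting the two fits together gives a closed-form approximation for any n (a sketch; `approx_pmf` is my own name):

```python
import math

def approx_pmf(total, n_dice):
    """Gaussian approximation to P(total) for n fair 6-sided dice,
    using the fitted parameters mu = 3.5n and sigma = 1.75*sqrt(n)."""
    mu = 3.5 * n_dice
    sigma = 1.75 * math.sqrt(n_dice)
    return math.exp(-((total - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# For two dice this gives roughly 0.16 at a total of 7, close to the
# exact 6/36 ≈ 0.167
```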


We have so far shown that with a very quick empirical effort we can predict the data amazingly well using Gaussians. However, despite the success in fitting the larger n cases, our Gaussian fit is still only an approximation. This is why our fitting is worse for smaller n, and the first two cases (1 and 2 dice) are not captured very well by our Gaussian approximation. Let’s see if we can derive the exact solution for the probability of a given total sum for n dice!
Warning: This will involve some maths…
Deriving an exact solution
Let us first look at an example of a simpler version of this problem. Let’s consider a two-sided die, which is essentially what a coin is. So with a coin we have two possible outcomes, heads or tails. Now let’s say we have three coins and ask what the possible outcomes are if we flip them. The possible outcomes are shown in Table 1 below.

There are 8 possible outcomes when the coins are distinguishable and 4 possible outcomes when we consider the coins to be indistinguishable. This indistinguishable distribution is known to be captured by the binomial

(H + T)³ = H³ + 3H²T + 3HT² + T³.

Here the term H³ represents (H,H,H) and H²T represents the case where two heads and one tail are observed; the coefficients represent the number of combinations that give rise to that observation, i.e. 1 for H³ and T³, and 3 for H²T and HT². Each coefficient of the binomial can be determined by a neat little equation,
C(n, k) = n!/(k!(n − k)!),

where k denotes the number of heads that get flipped and (n − k) is the number of tails. For example, for 2 heads and 1 tail (n=3, k=2) we have
C(3, 2) = 3!/(2!·1!) = 3.

Combining the binomial coefficient with the probability of flipping each combination gives us the probability of that observation,
P(2 heads, 1 tail) = C(3, 2)·H²·T = 3 · (1/2)² · (1/2) = 3/8,

where H represents the probability of flipping heads (H = 1/2) and T represents the probability of flipping tails (T = 1/2). The general binomial probability of flipping k heads and (n − k) tails with n coins is given by
P(k) = C(n, k)·Hᵏ·Tⁿ⁻ᵏ.

The full binomial for n coins, (H + T)ⁿ, can be expressed as a series
(H + T)ⁿ = Σₖ₌₀ⁿ C(n, k)·Hᵏ·Tⁿ⁻ᵏ.
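These coin formulas are easy to verify in code; Python’s `math.comb` gives the binomial coefficient directly (a quick sketch, with my own function name):

```python
from math import comb

def coin_prob(n, k, p_heads=0.5):
    """Probability of exactly k heads in n coin flips."""
    return comb(n, k) * p_heads ** k * (1 - p_heads) ** (n - k)

# Three fair coins: P(2 heads) = 3 * (1/2)^3 = 3/8, and the
# probabilities over k = 0..3 sum to 1
```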
So, with this knowledge, now let’s get back to considering 6-sided dice, where we label each side as xⁱ (i corresponds to the number on that side). Similar to the 2-sided coins, n lots of 6-sided dice can be fully described by the multinomial
f(x, n) = (x¹ + x² + x³ + x⁴ + x⁵ + x⁶)ⁿ.

Note, we can use the exponents to express the total sum of each possible outcome, because multiplying terms adds their exponents: for example, x¹x¹ = x². This is useful for us because now we have a way of identifying the total sum of each combination. For example, x¹x⁶ = x⁶x¹ = x²x⁵ = x⁵x² = x⁴x³ = x³x⁴ = x⁷. These all represent a total sum of 7, and therefore the probability of rolling a 7 with 2 dice is a combination of the probability of rolling any of these: (1,6),(6,1),(2,5),(5,2),(4,3),(3,4).
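This exponent bookkeeping is easy to check by brute force: counting the ordered outcomes of two dice reproduces the coefficients of (x¹ + … + x⁶)². A sketch:

```python
from collections import Counter
from itertools import product

# Coefficients of (x^1 + ... + x^6)^2: for each ordered pair of faces
# the exponents add, so counting pair sums counts the x^T coefficients
coeffs = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# coeffs[7] is 6, one for each of (1,6),(6,1),(2,5),(5,2),(4,3),(3,4)
```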
As in the coin scenario, the coefficient of each possible factor of the multinomial f(x, n), multiplied by the probability of getting that factor (i.e. (1/s)ⁿ, where s is the number of dice sides), tells us the probability of rolling that factor. We now want to use this to tell us the probability of getting any given total T as a function of the number of dice n. We can express f(x, n) as
f(x, n) = (Σᵢ₌₁ˢ xⁱ)ⁿ,

where s denotes the number of sides the dice have (s = 6). Notice that the sum here is a geometric series, and a geometric series can be alternatively expressed in closed form as
Σᵢ₌₁ˢ xⁱ = x(1 − xˢ)/(1 − x).
So let’s replace the sum with this alternative expression
f(x, n) = xⁿ(1 − xˢ)ⁿ(1 − x)⁻ⁿ.

The resulting two terms are binomials and can be expanded as binomial series. We saw earlier when considering the coin that a binomial can be expanded to a series form as
(a + b)ⁿ = Σₖ₌₀ⁿ C(n, k)·aᵏ·bⁿ⁻ᵏ.
So (1 − xˢ)ⁿ in series form is

(1 − xˢ)ⁿ = Σₖ₌₀ⁿ C(n, k)·(−1)ᵏ·xˢᵏ.

Now (1 − x)⁻ⁿ is a little less intuitive to expand, as the exponent is negative. We can expand this by considering the Maclaurin series expansion of the function, which is a Taylor expansion around the point x = 0. The Taylor expansion of a general function g(x) around a point a is
g(x) = Σₗ₌₀^∞ [g⁽ˡ⁾(a)/l!]·(x − a)ˡ.

Using the chain rule we can find the derivatives for the case g(x) = (1 − x)⁻ⁿ:
g′(x) = n(1 − x)⁻⁽ⁿ⁺¹⁾,  g″(x) = n(n+1)(1 − x)⁻⁽ⁿ⁺²⁾,  g‴(x) = n(n+1)(n+2)(1 − x)⁻⁽ⁿ⁺³⁾, …

Following the pattern of the series we can generalise for the lᵗʰ derivative of g evaluated at x = 0,
g⁽ˡ⁾(0) = n(n+1)⋯(n+l−1) = (n+l−1)!/(n−1)!.

So, the full Taylor series of g(x) is
g(x) = Σₗ₌₀^∞ [g⁽ˡ⁾(0)/l!]·xˡ,
and expanded around x=0 we have
(1 − x)⁻ⁿ = Σₗ₌₀^∞ [(n+l−1)!/(l!(n−1)!)]·xˡ = Σₗ₌₀^∞ C(n+l−1, l)·xˡ.

Putting the series expressions derived for (1 − xˢ)ⁿ and (1 − x)⁻ⁿ together, we have an expanded series expression for the full multinomial
f(x, n) = xⁿ · [Σₖ₌₀ⁿ C(n, k)·(−1)ᵏ·xˢᵏ] · [Σₗ₌₀^∞ C(n+l−1, l)·xˡ].

Recall that the total sum of the n dice is the exponent of x, so collecting all the x terms here, the total sum T is given by T = n + sk + l. Let’s substitute T into our series expression for f(x, n), with the aim of substituting out l for l = T − n − sk. The coefficient of x^T becomes
Σₖ (−1)ᵏ·C(n, k)·[(n + (T − n − sk) − 1)! / ((T − n − sk)!(n − 1)!)].

Cleaning up the numerator, this becomes

Σₖ (−1)ᵏ·C(n, k)·C(T − sk − 1, n − 1).

However, the condition T − n − sk ≥ 0 must always hold (l cannot be negative), which means T − n ≥ sk and therefore k ≤ (T − n)/s; this is the upper limit of the summation, i.e.
k runs from 0 up to (T − n)/s.

This substitution means the second summation, the Σₗ over l, is now redundant, as the only varying parameter left is k, which is already taken into account by the first summation.
Remember dice are discrete, so the summation index must only take integer values; we therefore apply the floor function ⌊·⌋, which rounds (T − n)/s down to the greatest integer less than or equal to it. The equation below allows us to determine the collection of coefficients that produce a total sum of T with n s-sided dice:
N(n, s, T) = Σₖ₌₀^⌊(T−n)/s⌋ (−1)ᵏ·C(n, k)·C(T − sk − 1, n − 1).
Assuming the dice are fair, we can express the probability of rolling the total sum T as the product of this coefficient and the probability of each individual outcome, (1/s)ⁿ:

P(n, s, T) = (1/sⁿ) · Σₖ₌₀^⌊(T−n)/s⌋ (−1)ᵏ·C(n, k)·C(T − sk − 1, n − 1).

This is the function we have all been waiting for! Using this function P(n, s, T) we can plot the exact distributions for n dice; these are shown below in Figures 7 and 8 as the green lines.
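The final formula translates directly into code (a minimal sketch; `p_total` is my own name):

```python
from math import comb

def p_total(n, s, T):
    """Exact probability of rolling a total T with n fair s-sided dice,
    via P(n, s, T) = (1/s^n) * sum_k (-1)^k C(n,k) C(T-sk-1, n-1)."""
    if not n <= T <= n * s:
        return 0.0  # impossible totals
    k_max = (T - n) // s  # the floor function in the upper limit
    ways = sum(
        (-1) ** k * comb(n, k) * comb(T - s * k - 1, n - 1)
        for k in range(k_max + 1)
    )
    return ways / s ** n

# Two six-sided dice: p_total(2, 6, 7) gives 6/36, the Catan answer
```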


It can be seen from Figures 7, 8 and 9 that our analytical solution fits the data better than our previous Gaussian approximation. We can now capture the exact probability distributions, even for the n=1 and n=2 cases!

Conclusions
Some might argue that dice probabilities are mundane mathematical problems, and they might well be right! However, what we have covered here contains three important lessons at the heart of all mathematical modelling and data science problems. First, we have demonstrated the power of computers to produce and process experimental data. For example, we produced millions of dice rolling simulations in a few minutes. Second, we demonstrated the effectiveness of statistical inference and regression analysis. We showed this by modelling the experimental data as Gaussian distributions (a proxy model). This approximation allowed us to use a simple form of regression (the method of least squares) to find the parameters for the best fitting Gaussians. We then extended and abstracted our proxy model to find how the parameters change as we increase the number of dice. With this information we could extrapolate to predict the distributions for very large numbers of dice without running simulations. Third, and finally, we dived deep into the mathematics of the problem and derived an exact and general solution for the probability distributions. The similarity between these exact probability distributions and our simpler Gaussian approximation further highlights the usefulness of statistical inference and regression analysis in modelling data.
Thanks for reading, now back to the game of Catan!
The probability of dice python notebook: