Keen soccer fans enjoy watching a game of the English Premier League, arguably the best domestic soccer (or for British readers, Football) league in the world. But a neutral spectator might only enjoy watching games between two of the well-known, elite teams. How often can one expect to watch one of these blockbuster games?
To answer this question, a brief overview of the fixture system of the Premier League is needed. Each weekend, or matchday, ten fixtures are slated between the twenty clubs participating in the Premier League. While some matchdays occur midweek, especially during peak season, we will use matchday and weekend as interchangeable terms in this article, for simplicity.
During my childhood (the 2000s), there were four big teams in the Premier League out of twenty teams in total: Manchester United, Chelsea, Liverpool and Arsenal. Matchdays which included a "big" game seemed few and far between.
Nowadays, there are six top clubs: the four listed above, with the addition of Tottenham and Manchester City. With only two additional elite teams, there seem to be many more weekends which include at least one blockbuster game. Is this accurate? Exactly how much more common is this now than formerly?
Under one mild assumption, we can use probability theory to assess the likelihood of a weekend including a big game.
The assumption is that all fixture combinations (defined below) are equally likely. While it is conceivable that the governing body that makes the fixture lists ensures that as many weekends include a big game as possible, by "spreading out" the big games, I believe that this is reasonably unlikely given the many other considerations the schedulers must already keep in mind. There are many games in a season and each club plays in multiple competitions, which must all be coordinated so as not to coincide, plus the players must be made available to their national teams when necessary.
With our reasonable underlying assumption, we are ready to move on to the probability. In what follows, a "fixture combination", or "fixture list" refers to a set of 10 matches among the 20 teams, comprising the entire matchday.
Our strategy will be to assess how many fixture combinations are possible, and how many of those combinations include no blockbuster games. Dividing the second count by the first will give us the probability that a weekend contains no big games. The complement of this (this probability subtracted from 1) will give us the probability of a matchday including at least one elite game.
How will we calculate the number of possible fixture combinations? We imagine a list of all 20 teams and rearrange the order in which the teams appear on that list. Then each successive pair of teams listed is taken to be playing each other. This yields 20! = 20 × 19 × … × 2 × 1 orderings, since there are twenty options for the team in the first position, then 19 options for the second position, etc.
But the above calculation overcounts in two ways:
1) It considers two fixture lists with the same 10 pairings but in a different order as different fixture lists, which is not accurate for our purpose. For any fixture list, there are 10! ways in which to arrange the order of the pairings. Since each fixture list appears 10! different ways among the 20! orderings, we must divide 20! by 10!.
2) We can arrange each pairing in two ways: Team 1 vs Team 2, or Team 2 vs Team 1. The 20! calculation counts these two fixtures as distinct, which is not accurate. Thus, each of the 10 pairings doubles the count of fixture combinations. Therefore, we must divide by 2 ten times, that is, by 2¹⁰.
In summary, our total number of possible matchday fixture lists is:
$$\frac{20!}{10!\,2^{10}} = 654{,}729{,}075$$
In general, for t total teams, the number of possible fixture lists is:
$$\frac{t!}{(t/2)!\,2^{t/2}}$$
The above formula assumes that t is an even number, as is the case with most leagues around the world.
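As a sanity check on the counting argument, the formula can be compared against a brute-force enumeration for small leagues. The following is only a minimal sketch: the helper names `count_fixture_lists` and `formula_fixture_lists` are hypothetical and do not appear in the article's own code. The enumeration pairs the first remaining team with each possible opponent in turn and then recurses on the teams left over.

```r
# Count the ways to split `teams` into unordered pairs by direct enumeration.
# (Hypothetical helper for illustration; only practical for small leagues.)
count_fixture_lists <- function(teams) {
  if (length(teams) == 0) return(1)  # nothing left to pair: one (empty) list
  rest <- teams[-1]                  # the first team must play someone in `rest`
  total <- 0
  for (j in seq_along(rest)) {
    # Pair the first team with rest[j], then pair up everyone else
    total <- total + count_fixture_lists(rest[-j])
  }
  total
}

# Closed-form count from the text: t! / ((t/2)! * 2^(t/2)), for even t
formula_fixture_lists <- function(t) {
  factorial(t) / (factorial(t / 2) * 2^(t / 2))
}

# Compare the two counts for a few small, even league sizes
for (t in c(2, 4, 6, 8)) {
  cat(t, "teams:", count_fixture_lists(seq_len(t)), "by enumeration,",
      formula_fixture_lists(t), "by formula\n")
}
```

The enumeration grows very quickly with the number of teams, so it is only meant for checking small cases, not for the full 20-team league.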
If there are 4 big teams, how many of the above possible fixture combinations include no big games?
To calculate this, we imagine 4 spots, "opposite" the 4 top clubs, indicating which 4 out of the 16 smaller teams will play the big teams. (If two big teams play each other, then such a fixture list contains a big game, and we are not counting such fixture lists right now.) There are 16 choose 4 ways to choose the 4 teams who will play the big teams. (16 choose 4 is mathematical terminology for 16!/(4!*12!).) We must also remember that for each choice of 4 smaller teams to play the big teams, the smaller teams can be arranged in 4! ways, determining which big team each smaller team plays. So we multiply by 4!. We then multiply by the number of ways in which we can arrange the 6 matches to be played among the remaining 12 smaller teams.
This last calculation follows the template of the above 20 team calculation, yielding: 12!/(6!*2⁶).
Putting it all together, the number of possible fixture combinations containing no big games is:
$$\binom{16}{4}\cdot 4!\cdot\frac{12!}{6!\,2^{6}} = 454{,}053{,}600$$
In general, for t teams and g good teams, the number of possible fixture lists containing no big games is:
$$\binom{t-g}{g}\cdot g!\cdot\frac{(t-2g)!}{\left(\frac{t-2g}{2}\right)!\,2^{(t-2g)/2}}$$
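After cancellation (the g! and the (t-2g)! both drop out), we obtain the following formula for total possible fixture lists containing no big games:
$$\frac{(t-g)!}{\left(\frac{t-2g}{2}\right)!\,2^{(t-2g)/2}}$$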
The above formula assumes that g is less than or equal to t/2. If g is greater than t/2, this formula will not yield a defined answer. This corresponds to the fact that if g is greater than t/2, then a fixture list containing no big games is impossible.
To explain why this is true, consider 11 (or more) big teams in a 20-team league. A big game on any given matchday is certain, as there are then only 20–11=9 smaller teams, not enough to play all 11 big teams. At least 2 big teams will have to play each other each matchday.
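The count of fixture lists with no big games can also be checked by enumeration in a small league. This is a sketch under stated assumptions: `count_no_big_game` and `formula_no_big_game` are hypothetical helper names, and the league of 8 teams with 3 big teams is an arbitrary small example chosen so that the enumeration finishes instantly.

```r
# Count fixture lists in which no two big teams meet, by direct enumeration.
# `is_big` is a logical vector with one entry per team still unpaired.
# (Hypothetical helper for illustration only.)
count_no_big_game <- function(is_big) {
  if (length(is_big) == 0) return(1)  # everyone is paired: one valid list
  first <- is_big[1]
  rest <- is_big[-1]
  total <- 0
  for (j in seq_along(rest)) {
    if (first && rest[j]) next        # big vs big would be a big game: skip
    total <- total + count_no_big_game(rest[-j])
  }
  total
}

# Closed form from the text: (t-g)! / (((t-2g)/2)! * 2^((t-2g)/2)), for g <= t/2
formula_no_big_game <- function(t, g) {
  m <- (t - 2 * g) / 2                # matches played among the leftover small teams
  factorial(t - g) / (factorial(m) * 2^m)
}

t <- 8; g <- 3                        # small example league (an assumption)
is_big <- c(rep(TRUE, g), rep(FALSE, t - g))
count_no_big_game(is_big)             # by enumeration
formula_no_big_game(t, g)             # by formula
```

Passing a vector with more big teams than small teams makes the enumeration return 0, which mirrors the observation that a matchday without a big game is then impossible.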
Using our above formula for 4 big teams in a league of 20 teams, we calculate that the probability of a weekend with no big games is 454,053,600/654,729,075 = 0.6934985, or a 69.35% chance.
This leaves a 1 − 0.6934985 = 0.3065015 = 30.65% probability of having at least one big game on a weekend.
We obtain our results using the following R code:
# Calculate the probability of having at least one big game on a matchday.
prob_of_good = function(t, g) { # t total teams, g good teams
  stopifnot(t %% 2 == 0) # t must be even
  if (g > t/2) {
    return(1) # a matchday with no big game is impossible
  } else {
    # fixture lists with no big games
    negative = factorial(t - g) / (factorial((t - 2*g)/2) * 2^((t - 2*g)/2))
    # total fixture lists
    total = factorial(t) / (factorial(t/2) * 2^(t/2))
    # probability of at least one big game
    pos = 1 - (negative/total)
    return(pos)
  }
}
prob_of_good(20, 4)
Thus, for g = 4 good teams in a league of t = 20 total teams, as in the Premier League during the 2000s, the probability of a matchday including at least one match between two of the elite teams was 30.65%.
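The same number can be cross-checked with a different argument, which is my own addition rather than the method used in this article. Pair off the big teams one at a time: once i big teams have each been given a smaller opponent, the next big team has t − 1 − 2i possible opponents left, of which t − g − i are smaller teams. Multiplying these chances gives the probability of no big game. The function name `prob_no_big_game_seq` is hypothetical.

```r
# Probability that no two big teams meet, built up one big team at a time.
# (Alternative derivation for cross-checking; not the article's own formula.)
prob_no_big_game_seq <- function(t, g) {
  if (g > t / 2) return(0)          # too many big teams: a big game is certain
  i <- seq_len(g) - 1               # i = 0, 1, ..., g - 1 big teams already paired
  prod((t - g - i) / (t - 1 - 2 * i))
}

1 - prob_no_big_game_seq(20, 4)     # chance of at least one big game, 4 big teams
1 - prob_no_big_game_seq(20, 6)     # the same with 6 big teams
```

For t = 20 and g = 4 the product is (16/19)(15/17)(14/15)(13/13) = 224/323, which agrees with the 0.6934985 obtained from the counting formula.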
To gain confidence in our results, we will also simulate the problem, by creating a list of ones and zeros, ones to denote the well-known teams, and zeros to denote smaller teams. We then permute the list, to produce a fixture list and consider each successive pairing to be a match between those two teams. We then count how many permutations contain at least one pairing of two ones, indicating a match between two big teams.
# Simulation
simulate = function(t, g) { # t total teams, g good teams
  ones = rep(1, g)
  zeros = rep(0, t - g)
  teams = c(ones, zeros) # g ones for big teams, t-g zeros for smaller teams
  perm = sample(teams, length(teams)) # permute the list
  for (i in 1:(t/2)) {
    # If a pairing contains two ones (two big teams), a big match occurs
    if (sum(perm[(2*i - 1):(2*i)]) == 2) {
      return(1)
    }
  }
  return(0) # no pairing contains two ones: no big match occurs
}
# many simulations
mult_sim = function(t, g, k) { # k simulations
  results = rep(0, k)
  for (i in 1:k) {
    results[i] = simulate(t, g)
  }
  return(mean(results)) # proportion of fixture lists with at least one big game
}
set.seed(41)
simulated_results = rep(0, 20)
for (i in 1:20) {
  simulated_results[i] = mult_sim(20, i, 1000000)
}
Simulating 1,000,000 times, we obtain a probability of 30.70%, which is very close to our theoretical probability of 30.65%.
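To put "very close" on a scale, one can compare the gap with the sampling noise expected from 1,000,000 independent simulated matchdays. The sketch below assumes the standard binomial standard error sqrt(p(1 − p)/k) is an appropriate yardstick; the variable names are hypothetical, and the simulated figure is the rounded 30.70% quoted in the text.

```r
p     <- 0.3065015   # theoretical probability for t = 20, g = 4 (from the text)
p_hat <- 0.3070      # simulated proportion as reported (rounded) in the text
k     <- 1e6         # number of simulated matchdays

se <- sqrt(p * (1 - p) / k)   # standard error of a proportion over k draws
z  <- (p_hat - p) / se        # gap measured in standard errors

se
z
```

The gap comes out at roughly one standard error, which is the size of discrepancy one would expect from random variation alone.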
How about the probability of a matchday including a good game nowadays, when there are 6 top sides, as is the case currently in the Premier League?
Rerunning our formula for g = 6, we obtain a probability of 65.33%, more than double the previous result. Running 1,000,000 simulations yields a probability of 65.32%, almost identical to our theoretical probability.
To assess whether our simulated results (for 1,000,000 simulations each) match our theoretical results, we now plot them side by side, across various inputs for the number of good teams in a league of twenty teams.

We can see that our simulated results match our theoretical results extremely well, giving us confidence that our calculations are accurate.
We also see, as expected, that with only 1 big team, it is impossible to have a big game, which we defined as a game between 2 big teams. We reiterate that with 11 (or more) big teams, a big game on any given matchday is certain.
Having confirmed the accuracy of our results, we now provide a more detailed plot of the theoretical probabilities of seeing a "big" game on any given matchday, for various numbers of good teams in a league of twenty teams.
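A plot of this kind could be drawn in base R roughly as follows. This is a sketch only: the function name `theoretical_prob`, the range of g shown, and the axis labels and title are my own choices and are not taken from the original figure.

```r
# Theoretical probability of at least one big game, as derived in the text.
# (Redefined here under a hypothetical name so the snippet runs on its own.)
theoretical_prob <- function(t, g) {
  if (g > t / 2) return(1)              # a big game is certain
  m <- (t - 2 * g) / 2
  no_big <- factorial(t - g) / (factorial(m) * 2^m)
  total  <- factorial(t) / (factorial(t / 2) * 2^(t / 2))
  1 - no_big / total
}

g_values <- 1:12                        # numbers of big teams to display (assumed range)
probs <- sapply(g_values, function(g) theoretical_prob(20, g))

# One bar per number of big teams in a 20-team league
barplot(probs, names.arg = g_values, ylim = c(0, 1),
        xlab = "Number of big teams (g)",
        ylab = "P(at least one big game)",
        main = "20-team league")
```

A bar chart is used because g only takes whole-number values; a line plot through the same points would serve equally well.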

Perhaps surprisingly, adding only two additional elite teams has more than doubled the probability of seeing at least one good game on a given matchday, from 30.65% to 65.33%. Adding one more elite team (perhaps Leicester City?) would raise the number of big teams to 7 and raise the above probability even further to 80.19%.
While people may or may not like the improvement and rise to prominence of new teams, it is undeniably excellent news for the neutral spectator.
Next time, (with G-d’s help), we will assess whether this probabilistic analysis, and therefore our underlying model of randomized fixtures, is an accurate model of the fixtures of the Premier League, by assessing real-life fixtures from the history of the Premier League.