Part #1: A statistical analysis on Serie A

How many points do you need to survive in the Italian league?

Photo by Liam McKay on Unsplash

Football. The most popular sport in the world. A concentration of passion, hope and romanticism. Every year thousands and thousands of teams compete in their leagues with different goals. Some of them are built to win the title. Others just want to avoid relegation.

But the answer to their hopes always relies on the same thing: numbers.

One point more and you succeed. One point less and it's a failure. One goal more and you are the champion. One goal less and you throw away the whole season. It's a matter of details. Every football fan knows that. But can we quantify these "details"?

In this series of articles, I'll try to extract these magical numbers by looking at the history of the Italian league: Serie A.

Navigation

  • Intro
  • Data Explanation
  • Data Normalization
  • SVM
  • Results

Intro

Individual performance is not that important. We win and lose as a team.

These words were spoken by Zinedine Zidane, current manager of Real Madrid. The reason why I added this quote to the article is simple: let's put ourselves in Zidane's shoes for a moment.

We are 48. In a 5-year career as a manager, we have collected 2 LaLiga titles, 2 Supercopas de España, 3 Champions Leagues, 2 UEFA Super Cups and 2 FIFA Club World Cups. We are the manager of the best players in the world. In every game we can admire the plays of Ramos, Benzema and Hazard. But still we believe that a single player is not important. In other words, if these players don't compete as a team, leaving their personal interests off the pitch, they will be doomed to failure.

This statement is particularly strong if we consider that any one of these players is literally capable of winning a match at any moment with a single play. A long-distance shot, a tremendous header. In a matter of seconds, the match is settled.

So, why did Zidane say that?

Because, in football, results are achieved with a broader vision. Winning by converting your only chance of the entire match can't be the norm. You can't always rely on a single player's invention. You need the whole team to work as a well-oiled system in which every single gear contributes to the common goal. And the goal is not to win a single match but to reach a precise threshold that guarantees the final victory.

That’s why we use numbers. We want to know the value of that threshold.

Data Explanation

Historically speaking, experts tend to set the relegation threshold at 40 points. But where does this number come from?

To answer this question, I collected Serie A final tables from 1955 to 2020. The tables are stored in Excel files like the following:

Final table - Season 1963/64

Columns represent: final position (Pos.), team name (Squadra), number of points (Pt), number of played games (G), number of victories (V), number of draws (N), number of defeats (P), number of goals scored (GF) and number of goals against (GS). To these attributes, I also added the difference between number of goals scored and against, called goal difference (DR).

I also divided the tables into 5 groups: relegated teams, survived teams, Europa League qualified teams, Champions League qualified teams and champions. In particular:

  • Champion → 1st position
  • Champions League → 2nd, 3rd, 4th positions
  • Europa League → 5th, 6th positions
  • Relegated → last three positions
  • Survived → all the others

Note: Champions League and Europa League positions have changed over the years. The number of available positions for these competitions depends on the coefficient set by UEFA for each European country, which is updated every year based on the results obtained in the last 5 years. This coefficient depends on the number of victories and good placings of teams belonging to that country in any European competition. Anyway, I decided to apply this division because, in most cases, Italy competed with 4 teams in the Champions League. The champion is obviously included in the 4.
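The grouping above can be written as a small function. This is only a sketch of the rule as stated in the list: the function name and the `n_teams` argument are my own, not part of the original analysis.

```python
def classify_position(position, n_teams):
    """Return the group for a final league position (1 = first place)."""
    if not 1 <= position <= n_teams:
        raise ValueError("position must be between 1 and n_teams")
    if position == 1:
        return "Champion"
    if position <= 4:
        return "Champions League"
    if position <= 6:
        return "Europa League"
    if position > n_teams - 3:  # last three positions
        return "Relegated"
    return "Survived"


# Example: an 18-team season
for pos in (1, 4, 6, 15, 16):
    print(pos, classify_position(pos, 18))
```

Passing the number of teams explicitly matters because the last three positions are 14 to 16 in a 16-team season but 18 to 20 in a 20-team one.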

Data Normalization

Over the years, the number of points awarded for a victory has changed. Since the 1994/95 season, a win has been worth 3 points; before that, it was worth just 2. A draw has always been worth 1 point. So, I normalized the data to reflect this change by recomputing the total in every season as:

total_points = number_of_victories * 3 + number_of_draws

so, basically ignoring the number of points reported in the tables.

A trickier normalization was required because not all Serie A seasons had the same number of participants: they varied from 16 to 20 teams.

To solve this discrepancy, I rescaled the total points by multiplying them by a coefficient based on the number of teams:

# seasons with 16 teams
coeff = 1.27
# seasons with 18 teams
coeff = 1.12
# seasons with 20 teams
coeff = 1.0

With these two normalization procedures, we can think of Bologna's 54 points in 1963/64 as 85.12 points.
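To make the two steps concrete, here is a minimal sketch that reproduces the Bologna figure. The function names are mine. Two things are assumptions rather than statements from the tables shown here: that Bologna's 54 points came from 22 wins and 10 draws (which matches 2 * 22 + 10 = 54 under the old rule), and that the coefficients are simply the ratio of games played per team compared with a 20-team season.

```python
# Coefficients quoted in the article, keyed by number of teams
COEFF = {16: 1.27, 18: 1.12, 20: 1.0}


def normalized_points(wins, draws, n_teams):
    """Three points per win, one per draw, rescaled to a 20-team season."""
    return (3 * wins + draws) * COEFF[n_teams]


def games_ratio(n_teams, reference=20):
    """Assumed origin of the coefficients: ratio of games played per team."""
    return (2 * (reference - 1)) / (2 * (n_teams - 1))


# Bologna 1963/64, 18-team league, assuming 22 wins and 10 draws
print(round(normalized_points(22, 10, 18), 2))
# Ratios for 16 and 18 teams, to compare with 1.27 and 1.12
print(round(games_ratio(16), 2), round(games_ratio(18), 2))
```

Counting one point per draw is what makes the numbers agree: 3 * 22 + 10 = 76, and 76 * 1.12 = 85.12.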

SVM

SVM (Support Vector Machine) is a supervised Machine Learning algorithm that is widely used for classification problems, which is how I use it here. It finds the boundary that best separates the possible outputs, and it can rely on a technique called the kernel trick to implicitly transform the data when a straight boundary is not enough.

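import numpy as np
import pandas as pd
from sklearn import svm

# x1: normalized points, x2: goal difference, y: class label of each team
inputs = pd.DataFrame({
    'Pt': x1,
    'Dr': x2,
    'Rel': y
})
# Linear kernel with a large C; probability=True enables predict_proba
clf = svm.SVC(kernel='linear', C=1000, probability=True)
clf.fit(np.transpose([x1, x2]), y)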

I decided to use a high penalization factor C in order to narrow the margin around the boundary. Moreover, I used a linear kernel: this means that I will get a straight line separating my two classes.
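With a linear kernel, the fitted boundary is the line w1 * points + w2 * goal_difference + b = 0, so the points threshold for any goal difference can be read directly from the model. The sketch below shows this on synthetic, made-up data (not the Serie A tables). The 0/1 label encoding and the `points_threshold` helper are assumptions of mine. It mirrors the linear kernel and C=1000 used above, and leaves out the probability flag because only the boundary is inspected.

```python
import numpy as np
from sklearn import svm

# Synthetic seasons, for illustration only: (normalized points, goal difference)
rng = np.random.default_rng(0)
relegated = np.column_stack([rng.normal(30, 2, 40), rng.normal(-25, 5, 40)])
survived = np.column_stack([rng.normal(45, 2, 40), rng.normal(-5, 5, 40)])
X = np.vstack([relegated, survived])
y = np.array([0] * 40 + [1] * 40)  # assumed encoding: 0 = relegated, 1 = survived

clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)

# Coefficients of the separating line (available for linear kernels only)
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]


def points_threshold(goal_diff):
    """Points at which the boundary is crossed for a given goal difference."""
    return -(w2 * goal_diff + b) / w1


print(points_threshold(-15))
print(clf.predict([[50, 0], [25, -30]]))
```

Reading the threshold this way also shows why it is not a single number: it shifts with the second feature according to the slope -w2 / w1.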

Results

In this first article, I will focus on the lowest positions in the tables: the relegation area.

The question that I will try to answer is: how many points does a team need to avoid relegation?

Let’s choose two parameters to compare first: number of points and goals scored.

I used blue to identify the relegated teams and green to identify the teams that survived. In magenta we have the decision boundaries computed by the SVM.

First of all, we can immediately notice that the threshold of 40 points is a rather generous approximation of the threshold set by our SVM, which is slightly lower: 35/36 points. Moreover, this threshold actually decreases as the number of goals scored grows.

Looking at the scatter plot above, we can see that it is not rare for a team with a good number of goals scored to be relegated anyway.

Considering that the average relegated team scores 28.49 goals per season and collects 31.72 points, the probability that such a team remains in the league is just 15.97%. According to the parameters found by our analysis, a combination of 37 points and 38 goals scored raises that probability to 54.25%.
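The article does not show how these percentages are computed. One plausible reading, given probability=True in the SVC call, is that they come from predict_proba, which in scikit-learn relies on Platt scaling: a sigmoid fitted on the signed distance from the boundary. The sketch below illustrates only that mapping. The parameters `A` and `B` are hypothetical values chosen for illustration, not fitted to the Serie A data.

```python
import math


def platt_probability(decision_value, a, b):
    """Map a signed SVM decision value to a probability through a sigmoid."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


# Hypothetical sigmoid parameters, for illustration only
A, B = -1.5, 0.0

for f in (-2.0, 0.0, 2.0):
    print(f, round(platt_probability(f, A, B), 4))
```

Under this reading, a team sitting close to the boundary gets a probability close to 50%, which would fit the 54.25% quoted for 37 points and 38 goals, while teams far below it drop quickly towards zero.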

Just to give you an idea of these numbers, take Crotone in the 2016/17 season as an example. After a memorable late run, Crotone avoided relegation on the very last matchday, with 34 points and 34 goals scored. A truly unlikely outcome. The probability of surviving with such a combination? 25.02%. Therefore, our prediction is consistent with reality.

But the number of goals scored alone can't be considered a good metric. Indeed, most of the time the causes of relegation are on the defensive side. So, I compared the number of points with the goal difference (DR). Results are shown below:

As we can see, the number of total points required to avoid relegation is higher than before. This reflects the greater weight the defense has on the team's results. As we are now considering the overall performance of the team, both on the defensive and offensive side, the two quantities trade off along the boundary: the better the goal difference, the fewer points are needed to stay up.

Going back to Crotone, we can recalculate the probability of remaining in the league: with a total of 34 points and a goal difference of -24, Crotone actually had a 23.26% chance of doing what they did. This is slightly lower than the 25.02% computed above and better underlines the fantastic result obtained by the team.

Generally speaking, a team with 36 points and a goal difference of -15 reaches the survival area 53.37% of the time. Again, the threshold of 40 points is a generous approximation of what reality suggests. It does guarantee a success rate of about 90%, but in most cases fewer points are required.


How many points does a team need to qualify for the Champions League? And to win the title?

In the next articles, we will focus on the other groups of the table. In particular, I’ll analyze the European Cup stats. Thank you for reading!
