The world’s leading publication for data science, AI, and ML professionals.

Analyse Milan Derby data from 1993 with Pandas and Plotly

The next Milan derby will kick off next week, to provide a bit more information and visualisation to my football fan husband as well as…

The next Milan derby will kick off next week, to provide a bit more information and visualisation to my football fan husband as well as building my python learning, I grabbed the data from Kaggle regarding Serie A and Wikipedia, and tried to develop an evidence-based visualisation.

Photo by Izuddin Helmi Adnan on Unsplash
Photo by Izuddin Helmi Adnan on Unsplash

Prerequisite:

Python packages for this analysis:

import Pandas as pd
import glob
import matplotlib.pyplot as plt
import numpy as np
import Plotly.express as px

Step 1: Get the Data

The data used in the current analysis was from Kaggle (https://www.kaggle.com/massibelloni/serie-a-19932017), from where I downloaded all Serie A data from 1993 to 2017 and saved to local folders.

Step 2: Merge the Data

The Serie A data downloaded from Kaggle were presented by year. To get an overview of Serie A, I merged all data first before filtering AC Milan / Inter Milan results.

Here I used the function glob:

##set up the path
path = r'.....SerieA'
all_files = glob.glob(path + "/*.csv")
##read all files
tb = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    tb.append(df)
##combine all files
frame = pd.concat(tb, axis=0, ignore_index=True)

Step 3: Filter out all matches between AC Milan and Inter Milan

According to the dataframe, it had HomeTeam and Away Team (see Figure 1 below). We need to get AC Milan as the home team and Inter Milan as the away team, and also the other way round.

Figure 1: frame.head(10)
Figure 1: frame.head(10)

First, I was trying to get the matches where AC Milan was the home team and Inter Milan was the away team:

df1= frame[(frame['HomeTeam'] == 'Milan') & (frame['AwayTeam'] == 'Inter')]

Then, Inter Milan was the home team and AC Milan was the away team:

df2 = frame[(frame['HomeTeam'] == 'Inter') & (frame['AwayTeam'] == 'Milan')]

To combine these two dataframes into one:

ddff = [df2,df1]
result = pd.concat(ddff)

We are nearly there with the data clean! As I only wanted to know the first few columns of the data, I dropped out the rest columns:

data = result.iloc[:, 0:7]

Step 4: Complete the dataset for visualisation

A few more things I did to make the visualisation easier: 1) add the match data from 1997 to 2020 as those data were missing from the downloaded file; 2) get the "year" of the matches date; 3) add two columns – one as the "winner team", and the other as the "total goals" for each game:

## add data from 1997 to 2020 - a bit messy method was used here:
new_row = {'Date': '15/10/17', 'HomeTeam':'Inter', 'AwayTeam':'Milan', 'FTHG':'3','FTAG':'2', 'FTR': 'H'} 
data = data.append(new_row, ignore_index = True)
new_row = {'Date': '27/12/17', 'HomeTeam':'Milan', 'AwayTeam':'Inter', 'FTHG':'1','FTAG':'0', 'FTR': 'H'} 
data = data.append(new_row, ignore_index = True)
new_row = {'Date': '04/04/18', 'HomeTeam':'Milan', 'AwayTeam':'Inter', 'FTHG':'0','FTAG':'0', 'FTR': 'D'} 
data = data.append(new_row, ignore_index = True)
new_row = {'Date': '21/10/18', 'HomeTeam':'Inter', 'AwayTeam':'Milan', 'FTHG':'1','FTAG':'0', 'FTR': 'H'} 
data = data.append(new_row, ignore_index = True)
new_row = {'Date': '17/03/19', 'HomeTeam':'Milan', 'AwayTeam':'Inter', 'FTHG':'2','FTAG':'3', 'FTR': 'A'} 
data = data.append(new_row, ignore_index = True)
new_row = {'Date': '09/02/20', 'HomeTeam':'Inter', 'AwayTeam':'Milan', 'FTHG':'4','FTAG':'2', 'FTR': 'H'} 
data = data.append(new_row, ignore_index = True)
new_row = {'Date': '18/10/20', 'HomeTeam':'Inter', 'AwayTeam':'Milan', 'FTHG':'1','FTAG':'2', 'FTR': 'A'} 
data = data.append(new_row, ignore_index = True)

Prepare the columns for visualisation:

## Add the column - "Year" of each match
data['year'] = pd.DatetimeIndex(data['Date']).year
data = data.sort_values('year')
## Add the column - "Winner" of each match
data['Winner'] = np.where(data['FTR'] == 'A', data['AwayTeam'], (np.where(data['FTR'] == 'H', data['HomeTeam'], 'Draw')))
## Add the column - "TotalGoal" of each match
data["FTHG"] = pd.to_numeric(data["FTHG"])
data["FTAG"] = pd.to_numeric(data["FTAG"])
data['TotalGoal'] = data['FTHG'] + data['FTAG']

Step 5: Visualisation with Plotly

Here I created two charts:

Bubble Chart:

The chart showing the relationship between "match date", "total goals in each match", and the "winner of the game":

fig = px.scatter(data, x="year", y="TotalGoal",size="TotalGoal", color="Winner",
                 color_discrete_map={'Milan':'red',
                                 'Inter':'royalblue',
                                 'Draw':'lightgreen'},
                 hover_name="Winner", log_x=True, size_max=20)
fig.show()
Figure 2: Bubble chart
Figure 2: Bubble chart

Pie Chart:

The other chart is about the proportion of the winning in the past years. To get the pie chart, the dataframe needed to be re-structured with groupby:

## Restructure the dataframe with groupby
counts = data['TotalGoal'].groupby(data['Winner']).count().reset_index(name="count")
## Pie chart
fig1 = px.pie(counts, values ='count', names ='Winner', title='Milan Derby', color = 'Winner',
             color_discrete_map={'Milan':'red',
                                 'Inter':'royalblue',
                                 'Draw':'lightgreen'})
fig1.show()
Figure 3: Pie chart
Figure 3: Pie chart

Thoughts

By reading these two charts, here are some thoughts on Milan Derby.

  1. It is clear that there aren’t big differences between AC Milan and Inter Milan in winning the matches. Inter Milan won slightly more games than AC Milan;
  2. AC Milan was more likely to win in the matches before 2010 than Inter Milan, while Inter Milan got more chances to win after 2010;
  3. When the total goals of the matches over 4, Inter Milan seems more likely to win the match.

Those thoughts seem aligned with the teams’ performance in those years. I cannot wait to watch the next derby!


Related Articles