Visualizing Football Game Data

With Python, JSON, Matplotlib and Streamlit

Andrii Gozhulovskyi
Towards Data Science

Photo by Fabricio Trujillo on Pexels

Explore Data

I have always been fascinated by football players’ heatmaps that show their performance on the field. Let’s build an application where we can select a player and one of his actions and visualize it. The data comes from the StatsBomb Open Data repository [1]. StatsBomb is a British company that collects and stores football game data. For my app, I took 15 JSON files representing the 15 games of the knockout phase of UEFA Euro 2020, which took place between 26 June 2021 and 11 July 2021 [4].

Let’s open and explore the final game between Italy and England.

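import json
import pandas as pd

# Read JSON file
with open('3795506.json', 'r', errors="ignore") as f:
    game = json.load(f)

# Flatten nested fields into columns such as player_name and type_name
df = pd.json_normalize(game, sep='_')
4796 rows × 122 columns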
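To see what the flattening step does, here is a minimal sketch on a single made-up event. The record below is an assumption for illustration, not a row from the actual file; it only mimics the nested shape that leads to column names such as player_name and type_name.

```python
import pandas as pd

# One made-up event in a nested, StatsBomb-like shape (illustrative values only).
raw_events = [
    {
        "timestamp": "00:00:01.500",
        "type": {"id": 30, "name": "Pass"},
        "player": {"id": 1, "name": "Player A"},
        "location": [60.0, 40.0],
    }
]

# Nested dictionaries become columns joined with the separator;
# list values such as location stay as lists in a single column.
flat = pd.json_normalize(raw_events, sep="_")
print(sorted(flat.columns))
# ['location', 'player_id', 'player_name', 'timestamp', 'type_id', 'type_name']
```

This is why the plotting code later works with location as a list of coordinates rather than as separate x and y columns.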

Let’s take a first look at the DataFrame. It consists of 4796 rows and 122 columns, including timestamp, player_name, type_name, location, and so on. It tells the whole story of the game from the starting whistle to the end: what happened, who did it, what kind of action it was, where it took place, and so on. With a game of about 120 minutes and a DataFrame of 4796 rows, we get on average one recorded action roughly every 1.5 seconds.
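The 1.5-second figure is an average, not a fixed sampling interval. A quick check of the arithmetic, using only the numbers quoted above:

```python
# Average spacing between recorded events, using the figures quoted in the text.
game_minutes = 120   # approximate game length, including extra time
n_events = 4796      # rows in the DataFrame

seconds_per_event = game_minutes * 60 / n_events
print(f"{seconds_per_event:.2f} seconds per event")  # 1.50 seconds per event
```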

How many actions did the players perform in that game, and what kinds of actions were they? With the pandas value_counts method we can count the unique values in the player_name and type_name columns and understand what happened there.
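A minimal sketch of that counting step, run on a toy DataFrame. The player names and rows are made up for illustration; with the real data, the same two calls would be applied to df.

```python
import pandas as pd

# Toy stand-in for the event DataFrame; the rows are made up for illustration.
events = pd.DataFrame({
    "player_name": ["Player A", "Player A", "Player B", "Player A", "Player C"],
    "type_name": ["Pass", "Carry", "Pass", "Pass", "Ball Receipt"],
})

# Count how often each value occurs, most frequent first.
actions_per_player = events["player_name"].value_counts()
actions_per_type = events["type_name"].value_counts()

print(actions_per_player.head())
print(actions_per_type.head())
```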

Number of actions by Player Name and Type Name

Marco Verratti was the most active player of the game with 382 actions. The most frequent actions were Pass, Ball Receipt, and Carry. For my application, I decided to take the four most frequent actions plus the Shot action. The idea of the app is to provide drop-down menus where we can select a game, a player, and one of the actions, and then draw a visualization of that action on the pitch.

Create Pitch

The pitch-plotting function comes from the Friends of Tracking repository [2]. It takes length, width, units, and linecolor arguments and returns a ready-to-use football pitch as a Matplotlib figure and axes. We simply use it and do not dwell on this part.

from FCPython import createPitch
# Create pitch plot
pitch_width = 120
pitch_height = 80
fig, ax = createPitch(pitch_width, pitch_height, 'yards', 'gray')

Pass Plot function

We start the Pass Plot function by selecting the rows with a given type_name and player_name. Here type_name equals ‘Pass’, and player_name is chosen by the user from a drop-down menu (covered in the Streamlit App part). Next, we take the location and pass_end_location series and turn them into two lists of coordinates: (x1, y1), where the pass started, and (x2, y2), where the pass was received. With the pyplot.quiver(x1, y1, u, v) method, where u = x2 - x1 and v = y2 - y1 are the components of each pass, we can now draw arrows representing passes. We also use blue for the home team and red for the away team.
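The steps above can be sketched roughly as follows. This is not the original function from the app: the names pass_arrows and pass_plot, their signatures, and the toy events are assumptions, and a plain Matplotlib Axes stands in for the pitch returned by createPitch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def pass_arrows(df, player_name):
    """Return start points (x1, y1) and displacements (u, v) of a player's passes."""
    passes = df[(df["type_name"] == "Pass") & (df["player_name"] == player_name)]
    x1 = [loc[0] for loc in passes["location"]]
    y1 = [loc[1] for loc in passes["location"]]
    x2 = [loc[0] for loc in passes["pass_end_location"]]
    y2 = [loc[1] for loc in passes["pass_end_location"]]
    u = [end - start for start, end in zip(x1, x2)]
    v = [end - start for start, end in zip(y1, y2)]
    return x1, y1, u, v


def pass_plot(ax, df, player_name, color="blue"):
    """Draw one arrow per pass on an existing Axes (hypothetical helper)."""
    x1, y1, u, v = pass_arrows(df, player_name)
    # angles/scale_units/scale=1 make each arrow end exactly at (x2, y2).
    ax.quiver(x1, y1, u, v, color=color, angles="xy", scale_units="xy", scale=1)
    return ax


# Made-up events for illustration only.
events = pd.DataFrame({
    "player_name": ["Player A", "Player A", "Player B"],
    "type_name": ["Pass", "Pass", "Pass"],
    "location": [[60.0, 40.0], [70.0, 30.0], [20.0, 10.0]],
    "pass_end_location": [[75.0, 45.0], [65.0, 50.0], [30.0, 10.0]],
})

fig, ax = plt.subplots()
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)
pass_plot(ax, events, "Player A", color="blue")  # blue for the home team
```

Without angles="xy", scale_units="xy" and scale=1, quiver autoscales the arrow lengths, so the arrows would not necessarily end at the reception point.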

Pass map for Marco Verratti (Italy). Image by author

This gives us a function that draws all passes made by the selected player during the game. The Carry Plot and Shot Plot functions have the same structure, so they are not described in this article.

Ball Receipt Plot

For the Ball Receipt Plot function, we select the rows where type_name == ‘Ball Receipt’. Here we get only one list of (x, y) coordinates and draw them as dots with plt.Circle((x, y), radius). The Pressure Plot function follows the same logic.
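A minimal sketch of this dot plot, again under assumptions: the function name ball_receipt_plot, its parameters, the radius, and the toy events are illustrative, and a plain Axes replaces the pitch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def ball_receipt_plot(ax, df, player_name, color="blue", radius=1.0):
    """Add one circle per ball receipt of the player and return the patches."""
    rows = df[(df["type_name"] == "Ball Receipt") & (df["player_name"] == player_name)]
    circles = []
    for x, y in rows["location"]:
        # plt.Circle takes the centre as a single (x, y) tuple, then the radius.
        circle = plt.Circle((x, y), radius, color=color, alpha=0.6)
        ax.add_patch(circle)
        circles.append(circle)
    return circles


# Made-up events for illustration only.
events = pd.DataFrame({
    "player_name": ["Player A", "Player B", "Player A"],
    "type_name": ["Ball Receipt", "Ball Receipt", "Ball Receipt"],
    "location": [[30.0, 20.0], [90.0, 60.0], [45.0, 35.0]],
})

fig, ax = plt.subplots()
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)
circles = ball_receipt_plot(ax, events, "Player A")
```

Circles are patches, so they must be attached with ax.add_patch; creating them alone does not draw anything.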

Ball Receipt map for Harry Maguire (England). Image by author

Streamlit App

Streamlit is a powerful tool that every data scientist should know. It is extremely simple, yet it lets you visualize and share your data insights through a web app. First, we create a sidebar with four drop-down menus and some text in just a few lines of code:

import streamlit as st

# games_list, team_1, team_2, player_names_1, player_names_2 and activities
# are prepared earlier in the app from the game data.

# Drop-down menu 'Select Football Game'
st.sidebar.markdown('## Select Football Game')
menu_game = st.sidebar.selectbox('Select Game', games_list, index=14)

# Drop-down menus 'Select Team, Player and Activity'
st.sidebar.markdown('## Select Player and Activity')
menu_team = st.sidebar.selectbox('Select Team', (team_1, team_2))
if menu_team == team_1:
    menu_player = st.sidebar.selectbox('Select Player', player_names_1)
else:
    menu_player = st.sidebar.selectbox('Select Player', player_names_2)
menu_activity = st.sidebar.selectbox('Select Activity', activities)
st.sidebar.markdown('Select a player and activity. Statistics plot will appear on the pitch.')

Here we let the user specify player_name and type_name through the corresponding ‘Select Player’ and ‘Select Activity’ drop-down menus. These arguments define which specific player and activity are shown. The activity plot is then rendered with the st.pyplot function. With the st.header, st.write and st.markdown methods we fill the main area with some text.
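One possible way to connect the selected activity to the right plot function is a small lookup table. The sketch below is an assumption about how this could be wired, not the app's actual code: the stand-in plot functions only set a title, and draw_activity and PLOT_FUNCTIONS are hypothetical names. Streamlit itself is left out so that the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


# Hypothetical stand-ins for the article's plot functions; each draws on `ax`.
def pass_plot(ax, df, player_name):
    ax.set_title(f"Passes: {player_name}")


def ball_receipt_plot(ax, df, player_name):
    ax.set_title(f"Ball receipts: {player_name}")


# Map the value chosen in 'Select Activity' to the function that draws it.
PLOT_FUNCTIONS = {
    "Pass": pass_plot,
    "Ball Receipt": ball_receipt_plot,
}


def draw_activity(df, player_name, activity):
    """Create a figure and let the matching plot function draw on it."""
    fig, ax = plt.subplots()
    PLOT_FUNCTIONS[activity](ax, df, player_name)
    return fig


# In the app, the arguments would come from menu_player and menu_activity.
fig = draw_activity(None, "Player A", "Pass")
# The figure would then be rendered in the main area with: st.pyplot(fig)
```

A dictionary keeps the mapping in one place, so adding Carry, Pressure, or Shot means adding one entry rather than another if/else branch.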

Application on Streamlit

To create a Streamlit application, you need to install the Streamlit library for Python. To share it, register at streamlit.io, create a new app, and connect it to the project folder in your GitHub repository. A ready-to-use web app will then be created automatically.

References

[1] StatsBomb Open Data: https://github.com/statsbomb/open-data

[2] Create Pitch repository: https://github.com/Friends-of-Tracking-Data-FoTD/SoccermaticsForPython/blob/master/FCPython.py

[3] Irfan Alghani Khalid, How to Analyze Football Event Data Using Python: https://towardsdatascience.com/how-to-analyze-football-event-data-using-python-2f4070d551ff

[4] UEFA Euro 2020 knockout phase: https://en.wikipedia.org/wiki/UEFA_Euro_2020_knockout_phase
