Sports analytics is a major subfield of data science. Advances in data collection and data analysis have made it more appealing for teams to adapt their strategies based on data.
Data analytics provides valuable insight into both team performance and player performance. Used wisely and systematically, it can put a team ahead of its competitors.
Some clubs have an entire team dedicated to data analytics. Liverpool is a pioneer in using data analytics, which I think is an important part of their success. They won the Champions League in 2019 and the Premier League in the 2019–20 season.
In this post, we will use Pandas to draw meaningful results from German Bundesliga matches in the 2017–18 season. The datasets can be downloaded from the link. We will use a part of the datasets introduced in the paper "A public data set of spatio-temporal match events in soccer competitions".
The datasets are saved in JSON format, which can easily be read into pandas dataframes.
import numpy as np
import pandas as pd
events = pd.read_json("/content/events_Germany.json")
matches = pd.read_json("/content/matches_Germany.json")
teams = pd.read_json("/content/teams.json")
players = pd.read_json("/content/players.json")
events.head()

The events dataframe contains details of the events that occurred in matches. For instance, the first row tells us that player 15231 made a "simple pass" from the location (50,50) to (50,48) in the third second of match 2516739.
The events dataframe includes player and team IDs but not the player and team names. We will add them from the teams and players dataframes using the merge function.

The IDs are stored in the "wyId" column in the teams and players dataframes.
# merge with teams
events = pd.merge(
    events, teams[['name', 'wyId']], left_on='teamId', right_on='wyId'
)
events.rename(columns={'name': 'teamName'}, inplace=True)
events.drop('wyId', axis=1, inplace=True)
# merge with players
events = pd.merge(
    events, players[['wyId', 'shortName', 'firstName']],
    left_on='playerId', right_on='wyId'
)
events.rename(columns={'shortName': 'playerName', 'firstName': 'playerFName'}, inplace=True)
events.drop('wyId', axis=1, inplace=True)
We merged the dataframes on the columns that contain the IDs and then renamed the new columns. Finally, we dropped the "wyId" column because the IDs are already stored in the events dataframe.
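One thing worth checking after a merge like the ones above is whether any rows were lost. pd.merge performs an inner join by default, so an event whose ID has no match in the lookup table is silently dropped. The sketch below uses made-up toy data (the names events_demo and players_demo and the unmatched ID 0 are assumptions for illustration, not values taken from the real dataset) to compare the default join with how="left".

```python
import pandas as pd

# Toy stand-ins for the real tables (all values are made up for illustration).
events_demo = pd.DataFrame({
    "id": [1, 2, 3],
    "playerId": [15231, 0, 14786],  # 0 is a hypothetical ID with no player record
})
players_demo = pd.DataFrame({
    "wyId": [15231, 14786],
    "shortName": ["Player A", "Player B"],
})

# Default inner join: only events with a matching player survive.
inner = pd.merge(events_demo, players_demo, left_on="playerId", right_on="wyId")

# Left join: every event is kept; unmatched rows get NaN in the player columns.
left = pd.merge(
    events_demo, players_demo, how="left", left_on="playerId", right_on="wyId"
)

print(len(events_demo), len(inner), len(left))
```

Comparing the row counts before and after the merge is a quick way to see whether the choice of join matters for a given dataset.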

Average Number of Passes per Match
The teams that dominate the game usually make more passes. In general, they are more likely to win the match. There are, of course, some exceptions.
Let’s check the average number of passes per match for each team. We will first create a dataframe that contains the team name, match ID, and the number of passes made in that match.
pass_per_match = (
    events[events.eventName == 'Pass'][['teamName', 'matchId', 'eventName']]
    .groupby(['teamName', 'matchId']).count()
    .reset_index().rename(columns={'eventName': 'numberofPasses'})
)

Augsburg made 471 passes in match 2516745. Here is the list of top 5 teams in terms of the number of passes per match.
(
    pass_per_match[['teamName', 'numberofPasses']]
    .groupby('teamName').mean()
    .sort_values(by='numberofPasses', ascending=False).round(1)[:5]
)

It is not a surprise that Bayern Munich has the highest number of passes. They have been dominating the Bundesliga in recent years.
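The two-step aggregation used in this section (count the passes per team and match, then average those counts per team) can be sketched on a small toy table. The team names and events below are invented for illustration, and the sketch uses size() with reset_index(name=...) as a compact variation on the count-and-rename approach.

```python
import pandas as pd

# Toy event log: (team, match, event type). All values are made up.
toy_events = pd.DataFrame(
    [
        ("Team A", 1, "Pass"), ("Team A", 1, "Pass"), ("Team A", 1, "Pass"),
        ("Team A", 1, "Shot"), ("Team A", 2, "Pass"),
        ("Team B", 1, "Pass"), ("Team B", 1, "Duel"), ("Team B", 2, "Pass"),
    ],
    columns=["teamName", "matchId", "eventName"],
)

# Step 1: number of passes per team and match.
toy_pass_per_match = (
    toy_events[toy_events.eventName == "Pass"]
    .groupby(["teamName", "matchId"])
    .size()
    .reset_index(name="numberofPasses")
)

# Step 2: average of those per-match counts for each team.
toy_avg = (
    toy_pass_per_match.groupby("teamName")["numberofPasses"]
    .mean()
    .sort_values(ascending=False)
)
print(toy_avg)
```

Team A makes 3 passes in one match and 1 in the other, so its average is 2.0, while Team B averages 1.0.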
Average Pass Length of Players
A pass can be evaluated based on many things. Some passes are so successful that they make it extremely easy to score.
We will focus on one quantifiable property of a pass: its length. Some players are very good at long passes.
The positions column contains the initial and final location of the ball in terms of x and y coordinates. We can calculate the length based on these coordinates. Let’s first create a dataframe that only contains the passes.
passes = events[events.eventName=='Pass'].reset_index(drop=True)
We can now calculate the length.
pass_length = []
for i in range(len(passes)):
    # positions[i] holds the start point [0] and the end point [1] of the pass
    start, end = passes.positions[i][0], passes.positions[i][1]
    length = np.sqrt((start['x'] - end['x'])**2 + (start['y'] - end['y'])**2)
    pass_length.append(length)
passes['pass_length'] = pass_length
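The same length can also be computed without an explicit Python loop. The sketch below is one possible vectorized variation, not the approach used in this post. It runs on a made-up toy dataframe and assumes that every pass has exactly two position entries (start and end), which the toy data satisfies but which should be checked on the real data.

```python
import numpy as np
import pandas as pd

# Toy passes with the same "positions" layout: [start point, end point].
toy_passes = pd.DataFrame({
    "positions": [
        [{"x": 50, "y": 50}, {"x": 50, "y": 48}],
        [{"x": 0, "y": 0}, {"x": 3, "y": 4}],
    ]
})

# Unpack the start and end points into two small dataframes with x and y columns.
start = pd.DataFrame([p[0] for p in toy_passes["positions"]])
end = pd.DataFrame([p[1] for p in toy_passes["positions"]])

# np.hypot(dx, dy) is the Euclidean distance sqrt(dx**2 + dy**2), element-wise.
toy_passes["pass_length"] = np.hypot(
    (end["x"] - start["x"]).to_numpy(),
    (end["y"] - start["y"]).to_numpy(),
)
print(toy_passes["pass_length"].tolist())
```

On a dataframe with hundreds of thousands of passes, avoiding the row-by-row loop is usually noticeably faster.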
The groupby function can be used to calculate the average pass length for each player.
(
    passes[['playerName', 'pass_length']].groupby('playerName')
    .agg(['mean', 'count'])
    .sort_values(by=('pass_length', 'mean'), ascending=False).round(1)[:5]
)

We have listed the top 5 players in terms of average pass length, along with the number of passes they made. The number of passes matters because an average over only 3 passes does not mean much. Thus, we can filter out the players who made fewer than a certain number of passes.
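A minimal sketch of such a filter is shown below on toy data. The threshold MIN_PASSES, the player names, and the pass lengths are all arbitrary assumptions chosen for illustration. A sensible cutoff for a full season would be much higher.

```python
import pandas as pd

# Toy pass lengths per player (made-up values).
toy_passes = pd.DataFrame({
    "playerName": ["A", "A", "A", "A", "B", "C", "C", "C"],
    "pass_length": [10.0, 20.0, 30.0, 20.0, 60.0, 25.0, 35.0, 30.0],
})

MIN_PASSES = 3  # hypothetical threshold

# Mean pass length and number of passes for each player.
stats = toy_passes.groupby("playerName")["pass_length"].agg(["mean", "count"])

# Keep only players with enough passes, then rank by average length.
qualified = (
    stats[stats["count"] >= MIN_PASSES]
    .sort_values(by="mean", ascending=False)
    .round(1)
)
print(qualified)
```

Player B has the longest average but made only one pass, so B is excluded and C ranks first.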
Average Number of Passes for Win and Not-Win
Let’s compare the average number of passes between win and not-win matches. I will use the matches of B. Leverkusen as an example.
We first need to add the winner of the match from the "matches" dataframe.
events = pd.merge(events, matches[['wyId','winner']], left_on='matchId', right_on='wyId')
events.drop('wyId', axis=1, inplace=True)
We can now create a dataframe that only contains events whose team ID is 2446 (the ID of B. Leverkusen).
leverkusen = events[events.teamId == 2446]
The winner is B. Leverkusen if the value in the "winner" column is equal to 2446. In order to calculate the average number of passes in the matches that B. Leverkusen won, we need to filter the dataframe based on the winner and eventName columns. We will then apply groupby and count to see the number of passes per match.
passes_in_win = leverkusen[(leverkusen.winner == 2446) & (leverkusen.eventName == 'Pass')][['matchId','eventName']].groupby('matchId').count()
passes_in_notwin = leverkusen[(leverkusen.winner != 2446) & (leverkusen.eventName == 'Pass')][['matchId','eventName']].groupby('matchId').count()

We can easily get the average number of passes by applying the mean function.
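The sketch below illustrates this last step on made-up toy data. It is a variation rather than the exact code of this post: instead of building two separate dataframes, it labels each match as "win" or "not-win" and takes the mean per label. The opponent ID 9999 and all event rows are invented for illustration.

```python
import pandas as pd

TEAM_ID = 2446  # ID of B. Leverkusen, as used in this post

# Toy events of one team across three matches (made-up values).
toy = pd.DataFrame({
    "matchId": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "winner": [2446, 2446, 2446, 9999, 9999, 2446, 2446, 2446, 2446],
    "eventName": ["Pass", "Pass", "Shot", "Pass", "Duel",
                  "Pass", "Pass", "Pass", "Pass"],
})

# Number of passes per match, keeping the winner alongside the match ID.
per_match = (
    toy[toy.eventName == "Pass"]
    .groupby(["matchId", "winner"])
    .size()
    .reset_index(name="numberofPasses")
)

# Label each match and average the pass counts per label.
per_match["result"] = (per_match["winner"] == TEAM_ID).map(
    {True: "win", False: "not-win"}
)
avg_passes = per_match.groupby("result")["numberofPasses"].mean()
print(avg_passes)
```

In this toy example the team makes 2 and 4 passes in the matches it wins and 1 pass in the match it does not win, so the averages are 3.0 and 1.0.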

Although making more passes does not mean a certain win, it will help you in dominating the game and increasing your chances to score.
The scope of sports analytics extends far beyond what we have done in this post. However, without getting familiar with the basics, it is harder to grasp more advanced techniques.
Data visualization is also fundamental in sports analytics. How teams and players manage the pitch, the locations of shots and passes, and areas of the pitch that are covered the most provide valuable insight.
I will also write posts about how certain events can be visualized on the pitch. Thank you for reading. Please let me know if you have any feedback.
References
[1] Pappalardo et al. (2019). A public data set of spatio-temporal match events in soccer competitions. Scientific Data 6, 236. https://www.nature.com/articles/s41597-019-0247-7