MONTHLY EDITION

Data science is a vast field that continuously expands into new industries, offering a multitude of valuable products and services. The sports industry, a very profitable one, could hardly miss out on that opportunity.
The move to extract meaningful insights from sports data is not that recent. Data analytics entered the sports field back in the early 1990s and, ever since, everyone from individual athletes to major leagues has been using it in the service of performance, marketing, and other goals. These days, when cutting-edge technology and state-of-the-art machine and deep learning models are extensively leveraged, sports data analytics sometimes promises too much. Still, by the end of 2022 it is expected to grow into a $4-billion industry!
To draw a high-level picture of the ways data science is used in sports, we can divide the field into two main areas:
- Data Analytics – the broad field of manipulating vast datasets to generate meaningful insights around stats like match results and player transfers. It is mainly conducted by clubs, independent analysts, or even universities, and serves either the clubs themselves or fans (in the context of betting).
- Sports Science – a special field which brings together a number of applications that help improve clubs’ and players’ efficiency. Here we observe techniques for optimizing a team’s play (player position, etc.), players’ capabilities (e.g. free kicks), and health or fitness (e.g. injury prediction). All of these target the same goal: cost reduction or payoff maximization.
On the practical side of the spectrum (dear TDS reader: here’s some food for thought for your next project 😉 ), some typical cases where Machine Learning algorithms can be used include:
- Classification
Feed a dataset of match-related data into a classifier (potentially a neural network) to predict the results of future games (win, lose, draw). With three mutually exclusive outcomes, it’s essentially a multi-class classification problem.
- Regression
Take a player-based dataset that describes one of their capabilities (e.g. acceleration), and predict its level during a match. These kinds of algorithms are often used by a team’s medical and training staff to better monitor players and design their training regimens.
- Clustering
Aggregate a dataset of player performance during games (e.g. in football, this might mean passing accuracy, distance covered, etc.), segment it by keeping only the winning instances, and cluster it in such a way that the best samples are contained in one or two clusters. On the now-clustered dataset, train a classifier to predict the cluster labels for the past games of any players of interest. Those whose performance assigns them to the best cluster could be considered a prime target for a transaction!
- Computer Vision
It’s all about making computers capable of gaining high-level understanding from digital images or videos. As a fairly new interdisciplinary scientific field, it had little existing literature until recently, but plenty of new papers now deal with complex event recognition of this kind. In the context of sports, we could see algorithms being used to detect players’ positions and manoeuvring on the pitch, and to infer insights on how to build offensive and defensive strategies.
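To make the classification case above a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two features (recent points per game for the home and away sides), the toy records in `PAST_MATCHES`, and the `predict_outcome` helper are all hypothetical. It uses a simple k-nearest-neighbours vote rather than a neural network; a real project would more likely rely on a library classifier trained on far richer match data.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for illustration only.
# Features: (home points per game, away points per game) over recent matches.
# Label: the result from the home side's point of view.
PAST_MATCHES = [
    ((2.6, 0.8), "win"),
    ((2.4, 1.0), "win"),
    ((2.2, 0.6), "win"),
    ((1.4, 1.5), "draw"),
    ((1.6, 1.6), "draw"),
    ((1.2, 1.3), "draw"),
    ((0.6, 2.4), "lose"),
    ((0.8, 2.6), "lose"),
    ((1.0, 2.2), "lose"),
]


def predict_outcome(features, matches=PAST_MATCHES, k=3):
    """Return the majority label among the k nearest past matches."""
    # Rank past matches by Euclidean distance to the new fixture.
    nearest = sorted(matches, key=lambda m: math.dist(m[0], features))[:k]
    # Each neighbour votes for its own label; the most common label wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for fixture in [(2.5, 0.9), (1.5, 1.5), (0.7, 2.5)]:
        print(fixture, predict_outcome(fixture))
```

Because each fixture receives exactly one of three labels, this is a multi-class setup. The same shape of problem (features in, one label out) is what a neural network or any other classifier would be solving on a real dataset.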
All in all, data and sports go hand-in-hand: players, managers, coaches, and fans make decisions based on analytics. That’s why plenty of sport-oriented data jobs have emerged in recent years (e.g. Sports Analyst, Data Scout) with one shared goal: to demystify this lucrative industry, one dataset at a time…
Here are some of the best sports-and-data-related posts you can find in the TDS archive.
Gerasimos Plegas, Volunteer Editorial Associate at TDS
Can AI Make You a Better Athlete?
Using Machine Learning to Analyze Tennis Serves and Penalty Kicks
By Dale Markowitz – 11 min
Embedding the Language of Football Using NLP
Using state-of-the-art NLP algorithms to build a representation for future machine learning solutions in the sports analytics domain
By Ofir Magdaci – 13 min
Studying up for the Tokyo 2021 Olympics with SQL
Querying a PostgreSQL Database with Python, displaying results with Pandas, and visualizing with Matplotlib
By Sejal Dua – 18 min
An end-to-end machine learning project with Python Pandas, Keras, Flask, Docker and Heroku
Predicting sport scores, from data wrangling to model deployment
By Ryan Lamb – 7 min
Can a Data Scientist Replace an NBA Scout? ML App Development for Best Transfer Suggestion
Using the NBA API to create your own ML models and predict the best player transaction
By Gerasimos Plegas – 12 min
Stats for Baseball Fans: Pitching Edition
A data scientist shows that ERA is the most important stat to look at as a casual fan
By Courtney Perigo – 10 min
How to Visualize Hidden Relationships in Data with Python – Analysing NBA Assists
Manipulating & visualising data with interactive shot, bubble & Sankey charts for insights with Plotly (code & data in my GitLab repo)
By JP Hwang – 12 min
Understanding the Importance of First Serve in Tennis with Data Analysis
Can we judge the performance of a tennis player based on his first serve?
By Andrea Cazzaro – 9 min
Sports Analytics: an exploratory analysis of international football matches-Part 1
Having the possibility to study the market of sports via powerful analytics tools is a great added value.
By Valentina Alto – 9 min
Service, Point Lead, and Consecutive Points in Badminton Games
How specific metrics affect players’ mentality and performance across badminton’s five categories
By Xiaoxiang Ma – 8 min
The Beautiful Game
Predicting the Premier League with a random model
By Tuan Nguyen Doan – 7 min
Pedaling Through the Peloton with Python
A Breakdown of the Tour de France Using Foundational Data Analysis
By Will Crowley – 12 min
Finally, this is a great moment to welcome all the fantastic writers who joined us in the past month – it’s so good to have you on board here at TDS! They include Divya Gopinath, Joyita Bhattacharya, Nina Sweeney, Quoc Tien Au, Milan Leonard, David Ndukwu, Ph.D., Kheirie Elhariri, Ashok Chilakapati, Daniel Herkert, Daniel Guzmán, Marcello Politi, Field Cady, Hugo Tessier, Shahab Mohaghegh, Briti Gangopadhay, Jessica Dafflon, Chris Beckett, Emil Rijcken, Christophe Blefari, Clive Siviour, Anouk Dutrée, Nifesimi Ademoye, Lily Wu, Thomas Baumgartner, Raveena Jayadev, Heiko Onnen, Amy Forza, Margaux Masson-Forsythe, Bryce Murray, PhD, Honghan Wu, to name just a few. We invite you to take a look at their profiles and check out their work.