MONTHLY EDITION

October Edition: Data Science Meets Sports

Exploring the areas within sports that are the most receptive to data science solutions

TDS Editors
Towards Data Science
5 min read · Oct 1, 2021

Data science is a vast field that continuously expands into new industries, offering a multitude of valuable products and services. The sports industry, a very profitable one, could not miss out on that opportunity.

The move to extract meaningful insights from sports data is not that recent. Data analytics entered the sports field back in the early 1990s, and ever since, everyone from individual athletes to major leagues has been using it in the service of performance, marketing, and other goals. These days, as cutting-edge technology and state-of-the-art machine learning and deep learning models are leveraged extensively, sports data analytics sometimes promises too much. Still, by the end of 2022 it is expected to grow into a $4-billion industry!

To draw a high-level picture of the ways data science is used in sports, we can divide the field into two main areas:

  1. Data Analytics — the broad field of manipulating vast datasets to generate meaningful insights around stats like match results and player transfers. It is mainly conducted by clubs, independent analysts, or even universities, and serves either the clubs themselves or fans (in the context of betting).
  2. Sports Science — a special field which brings together a number of applications that help improve clubs’ and players’ efficiency. Here we observe techniques for optimizing a team’s play (player position, etc.), players’ capabilities (e.g. free kicks), and health or fitness (e.g. injury prediction). All of these target the same goal: cost reduction or payoff maximization.

On the practical side of the spectrum (dear TDS reader: here’s some food for thought for your next project 😉), some typical cases where machine learning algorithms can be used include:

1. Classification

Load a dataset of match-related data into a classifier (potentially a neural network) to predict future games’ results (win, lose, draw). Since each game ends in exactly one of these three outcomes, it’s essentially a multi-class classification problem.
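To make the multi-class framing concrete, here is a minimal sketch of softmax (multinomial logistic) regression, which is effectively a single-layer neural network, trained with plain batch gradient descent. The two features (rating difference and recent-form difference between the home and away teams) and the nine matches are hypothetical toy values, not real data; a real project would more likely use a library such as scikit-learn or Keras and far more matches.

```python
import math

# Hypothetical toy data, NOT real matches: each match is described by
# (home rating minus away rating, home recent form minus away recent form)
# and labelled with the home team's result.
MATCHES = [
    ((2.0, 2.0), "win"), ((1.5, 2.5), "win"), ((2.5, 1.5), "win"),
    ((0.2, -0.2), "draw"), ((-0.2, 0.2), "draw"), ((0.0, 0.0), "draw"),
    ((-2.0, -2.0), "lose"), ((-1.5, -2.5), "lose"), ((-2.5, -1.5), "lose"),
]
CLASSES = ["win", "draw", "lose"]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """One linear score per class, followed by a softmax."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)


def train(data, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the cross-entropy loss."""
    n_features = len(data[0][0])
    weights = [[0.0] * n_features for _ in CLASSES]
    biases = [0.0] * len(CLASSES)
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in CLASSES]
        grad_b = [0.0] * len(CLASSES)
        for x, label in data:
            probs = predict_proba(weights, biases, x)
            for k, cls in enumerate(CLASSES):
                # d(loss)/d(score_k) = predicted probability - one-hot target
                err = probs[k] - (1.0 if cls == label else 0.0)
                grad_b[k] += err
                for j, x_j in enumerate(x):
                    grad_w[k][j] += err * x_j
        for k in range(len(CLASSES)):
            biases[k] -= lr * grad_b[k] / len(data)
            for j in range(n_features):
                weights[k][j] -= lr * grad_w[k][j] / len(data)
    return weights, biases


def predict(weights, biases, x):
    """Return the single most likely outcome: exactly one class per match."""
    probs = predict_proba(weights, biases, x)
    return CLASSES[probs.index(max(probs))]


weights, biases = train(MATCHES)
print(predict(weights, biases, (1.8, 1.2)))
```

Using one softmax over the three outcomes, rather than three independent yes/no outputs, is what makes this multi-class rather than multi-label: the probabilities compete and always sum to 1.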

2. Regression

Manipulate a player-based dataset that describes any of their capabilities (e.g. acceleration), and predict its level during a match. These kinds of algorithms are often used by a team’s medical and training staff to better monitor players and design their training regimens.
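As a hedged illustration of the regression case, the sketch below fits an ordinary least-squares line relating how many minutes a player has been on the pitch to their peak acceleration, then uses it to estimate acceleration later in the match. The single feature and the five readings are hypothetical; a real model would likely combine many more inputs (for example GPS tracking data) and use a library such as scikit-learn.

```python
# Hypothetical readings for one player, NOT real data:
# minutes played so far -> peak acceleration (m/s^2) in the following interval.
MINUTES = [10, 30, 50, 70, 90]
ACCELERATION = [4.0, 3.8, 3.6, 3.4, 3.2]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, minute):
    """Estimated peak acceleration at a given minute of the match."""
    return intercept + slope * minute


slope, intercept = fit_line(MINUTES, ACCELERATION)
print(f"slope={slope:.3f} m/s^2 per minute, intercept={intercept:.2f} m/s^2")
print(f"estimated acceleration at minute 60: {predict(slope, intercept, 60):.2f} m/s^2")
```

With these made-up numbers the fitted slope is -0.01 m/s² per minute, i.e. the model expects the player to lose a little explosiveness as the match goes on, which is the kind of signal a training staff might monitor.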

3. Clustering

Aggregate a dataset of player performance during games (e.g. in football, this might mean passing accuracy, distance covered, etc.), segment it by keeping only the instances from games that were won, and cluster it so that the best-performing samples end up in one or two clusters. On the now-clustered dataset, train a classifier that uses the cluster assignments as labels, then apply it to the past games of any players of interest. Those whose performance places them in the best cluster could be considered prime targets for a transaction!
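The clustering recipe can be sketched as follows, under several assumptions: the two features (passing accuracy and duel success rate, both in percent) and all values are hypothetical; k-means with a deterministic farthest-first initialisation stands in for "cluster it"; a nearest-centroid classifier stands in for "train a classifier"; and the "best" cluster is simply the one whose centroid has the highest combined stats.

```python
# Hypothetical per-game stats from WON games only, NOT real data:
# (passing accuracy %, duel success %).
WINNING_GAMES = [
    (88, 70), (90, 72), (86, 68),
    (70, 45), (72, 50), (68, 48),
]


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a list of feature vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def init_centroids(points, k):
    """Farthest-first initialisation, so the result is reproducible."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=100):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return labels


def fit_nearest_centroid(points, labels):
    """'Train' a classifier on the clustered data: one mean vector per label."""
    return {lab: mean([p for p, other in zip(points, labels) if other == lab])
            for lab in set(labels)}


def classify(model, point):
    """Assign a player to the label whose centroid is nearest."""
    return min(model, key=lambda lab: dist2(point, model[lab]))


labels = kmeans(WINNING_GAMES, k=2)
model = fit_nearest_centroid(WINNING_GAMES, labels)
best = max(model, key=lambda lab: sum(model[lab]))  # highest combined stats

# Hypothetical players of interest, summarised by their average past-game stats.
candidates = {"Player A": (87, 69), "Player B": (71, 47)}
for name, stats in candidates.items():
    verdict = "transfer target" if classify(model, stats) == best else "not a priority"
    print(f"{name}: {verdict}")
```

With these made-up numbers, Player A lands in the high-performing cluster and Player B does not. Any classifier could replace the nearest-centroid step; it is used here only because it needs no external library.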

4. Computer Vision

It’s all about making computers capable of gaining a high-level understanding of digital images or videos. Applied to sports, it is a fairly new interdisciplinary area that had little literature until recently, but plenty of new papers now deal with complex event recognition in sports footage. For example, algorithms can detect players’ positions and manoeuvring on the pitch, and help infer insights on how to build offensive and defensive strategies.
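One common building block for turning detections into pitch positions is a planar homography that maps image pixels to pitch coordinates. The sketch below assumes an object detector has already produced bounding boxes and that the 3x3 homography matrix was estimated beforehand (for example from pitch-line correspondences); the matrix, the boxes, and the player names are made-up values, and the team centroid is just one simple illustrative metric.

```python
# Hypothetical 3x3 homography from image pixels to pitch metres (made-up values:
# a pure scaling of 0.1 m per pixel, so the maths is easy to check by hand).
H = [
    [0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical detector output: (x_min, y_min, x_max, y_max) in pixels per player.
DETECTIONS = {
    "defender_1": (180, 200, 220, 300),
    "defender_2": (380, 500, 420, 600),
    "midfielder": (580, 350, 620, 450),
}


def foot_point(box):
    """Use the bottom-centre of the box, where the player touches the pitch."""
    x_min, _, x_max, y_max = box
    return ((x_min + x_max) / 2, y_max)


def image_to_pitch(h, point):
    """Apply the homography in homogeneous coordinates, then divide by w."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


positions = {name: image_to_pitch(H, foot_point(box)) for name, box in DETECTIONS.items()}

# A simple tactical summary: the team's average position (centroid) on the pitch.
centroid = tuple(sum(coord) / len(positions) for coord in zip(*positions.values()))
print(positions)
print(centroid)
```

In practice the homography would typically be estimated with a routine such as OpenCV's `cv2.findHomography` and recomputed as the camera pans; tracking these positions frame by frame is what would let an analyst study manoeuvring and team shape over time.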

All in all, data and sports go hand in hand: players, managers, coaches, and fans make decisions based on analytics. That’s why plenty of sports-oriented data jobs have emerged in recent years (e.g. Sports Analyst, Data Scout, etc.), with one shared goal: to demystify this lucrative industry, one dataset at a time…

Here are some of the best sports-and-data-related posts you can find in the TDS archive.

Gerasimos Plegas, Volunteer Editorial Associate at TDS

Can AI Make You a Better Athlete?

Using Machine Learning to Analyze Tennis Serves and Penalty Kicks

By Dale Markowitz — 11 min

Embedding the Language of Football Using NLP

Using state-of-the-art NLP algorithms to build a representation for future machine learning solutions in the sports analytics domain

By Ofir Magdaci — 13 min

Studying up for the Tokyo 2021 Olympics with SQL

Querying a PostgreSQL Database with Python, displaying results with Pandas, and visualizing with Matplotlib

By Sejal Dua — 18 min

An end-to-end machine learning project with Python Pandas, Keras, Flask, Docker and Heroku

Predicting sport scores, from data wrangling to model deployment

By Ryan Lamb — 7 min

Can a Data Scientist Replace an NBA Scout? ML App Development for Best Transfer Suggestion

Using the NBA API to create your own ML models and predict the best player transaction

By Gerasimos Plegas — 12 min

Stats for Baseball Fans: Pitching Edition

A data scientist shows that ERA is the most important stat to look at as a casual fan

By Courtney Perigo — 10 min

How to Visualize Hidden Relationships in Data with Python — Analysing NBA Assists

Manipulating & visualising data with interactive shot, bubble & Sankey charts for insights with Plotly (code & data in my GitLab repo)

By JP Hwang — 12 min

Understanding the Importance of First Serve in Tennis with Data Analysis

Can we judge the performance of a tennis player based on his first serve?

By Andrea Cazzaro — 9 min

Sports Analytics: an exploratory analysis of international football matches-Part 1

Having the possibility to study the market of sports via powerful analytics tools is a great added value.

By Valentina Alto — 9 min

Service, Point Lead, and Consecutive Points in Badminton Games

How specific metrics affect players’ mentality and performance across badminton’s five categories

By Xiaoxiang Ma — 8 min

The Beautiful Game

Predicting the Premier League with a random model

By Tuan Nguyen Doan — 7 min

Pedaling Through the Peloton with Python

A Breakdown of the Tour de France Using Foundational Data Analysis

By Will Crowley — 12 min

Finally, this is a great moment to welcome all the fantastic writers who joined us in the past month — it’s so good to have you on board here at TDS! They include Divya Gopinath, Joyita Bhattacharya, Nina Sweeney, Quoc Tien Au, Milan Leonard, David Ndukwu, Ph.D., Kheirie Elhariri, Ashok Chilakapati, Daniel Herkert, Daniel Guzmán, Marcello Politi, Field Cady, Hugo Tessier, Shahab Mohaghegh, Briti Gangopadhay, Jessica Dafflon, Chris Beckett, Emil Rijcken, Christophe Blefari, Clive Siviour, Anouk Dutrée, Nifesimi Ademoye, Lily Wu, Thomas Baumgartner, Raveena Jayadev, Heiko Onnen, Amy Forza, Margaux Masson-Forsythe, Bryce Murray, PhD, Honghan Wu, to name just a few. We invite you to take a look at their profiles and check out their work.

Building a vibrant data science and machine learning community. Share your insights and projects with our global audience: bit.ly/write-for-tds