The world’s leading publication for data science, AI, and ML professionals.

An introduction to SQL using FPL data

A guide to some SQL basics using data from the Fantasy Premier League

Photo by Nathan Rogers on Unsplash
Photo by Nathan Rogers on Unsplash

SQL is a very useful skill to know for anyone who works with Data. This is a tutorial for those who want to learn some of the basics and are interested in doing so by working with Fantasy Premier League data.

The data

In this tutorial I use data from https://github.com/vaastav/Fantasy-Premier-League. This is an FPL library for getting gameweek data. Previous gameweeks are available in the repository. I use a file called merged_gw. This has a row for each game for each player.

Tools

In this tutorial I use DB Browser for SQLite ( https://sqlitebrowser.org/). I chose this, and Sql Lite generally, because the setup is very straightforward. DB Browser for SQLite will let us create our DB, add the data, and then query the data.

Creating the database and tables

The first thing we need to do is create a database. In DB Browser:

  1. Go to File > New Database.
  2. Save the file – I name it FPL.db

Next, you’ll be presented with an option to create a table. Here we just click cancel as we will take advantage of DB Browser’s ability to define the table based on the CSV we want to import and then import the data.

  1. Go to File > Import > Table from CSV.
  2. Choose the file and click Open
  3. Give the table a name – I name it _PlayersGWs
  4. And check the box to say that the first line in the CSV represents column names (these will become our table name columns).

Doing this creates the table. The application also shows the queries used to create the table. CREATE TABLE is followed by the name of the table we are creating and then in parentheses the name of each column followed by the data type – with each column name separated by a comma:

CREATE TABLE "Players_GWs" (
 "name" TEXT,
 "position" TEXT,
 "team" TEXT,
 "xP" REAL,
 "assists" INTEGER,
 "bonus" INTEGER,
 "bps" INTEGER,
.........

Exploring the data

To execute SQL, we go to the Execute SQL tab:

All the queries in the following examples are executed here by entering the query and clicking the run button in the toolbar (or CTRL + Enter):

SELECT BASICS

We use SELECT to query the database. Here is a basic SELECT statement to get data from the database. We say which columns we want to select and from which table. The * means we select all columns.

SELECT * FROM Players_GWs

LIMIT

This returns all our rows and all columns. Not very useful as there are 20,700 rows. If we wanted to take a quick look at the kind of data we have, we could run the above with LIMIT to only return a specified number of rows. Here we return 5 rows:

SELECT * FROM Players_GWs LIMIT 5

SELECT SPECIFIC COLUMNS

So we have less rows, but by using we still see all the columns, which makes it hard to see the most interesting columns. We can replace with the specific columns we want to view. Let’s add _name, position, team, GW, totalpoints, and value.

SELECT name, position, team, GW, total_points, value FROM Players_GWs LIMIT 5

It is also worth noting that the columns can be in any order we choose.

SELECT position, GW, name, team, total_points, value FROM Players_GWs LIMIT 5

ORDER BY

So far we have a list of records in alphabetical order by name (because that’s how the data in the CSV was and how it was saved to the bale). It would be more interesting to see the top 5 rows based on some metric. We can add the total_points column to the query to sort by it. This needs to go before LIMIT.

SELECT name, position, team, GW, total_points, value FROM Players_GWs ORDER BY total_points LIMIT 5

ORDER BY DESC

So as we can see, the data is from smallest to largest. These are the five players with the lowest points in a single game. We can see the top five highest by adding DESC to our ORDER BY.

WHERE

If we wanted to see this for a particular team, we can filter on team using WHERE. Let’s see for Spurs. So it’s WHERE [the column we want to match]=[value we want to match]:

SELECT name, position, team, GW, total_points, value FROM Players_GWs WHERE team='Spurs' ORDER BY total_points DESC LIMIT 5

LIKE

The WHERE team=’Spurs’ looks for an exact match. We can also use LIKE to match a pattern. For example, say we wanted to match both Man City and Man Utd, we can replace = with LIKE and put ‘Man%’ as what we want to match. This will look for all teams that start with ‘Man’. The % is a wildcard that matches any number of characters after that.

SELECT name, position, team, GW, total_points, value FROM Players_GWs WHERE team LIKE 'Man%' ORDER BY total_points DESC LIMIT 5

GROUP BY

Up until now we have been returning single rows (each representing a game). If we want to see the total of some metric for a particular team or player across all gameweeks, we can use GROUP BY with the SUM function performed on total_points.

SELECT name, team, SUM(total_points) FROM Players_GWs GROUP BY name

We can also GROUP BY multiple columns. Say, for example, we wanted a breakdown by team and then by player.

SELECT team, name, SUM(total_points) AS 'overall points' FROM Players_GWs GROUP BY team, name

In the first GROUP BY example, if the player played for different teams throughout the season, the team column will just show one randomly. But the second GROUP BY would allow us to see their points by each team played for – as we are grouping by both team and the name of the player.

AS

The column name for the column representing points for each player across all gameweeks isn’t the most descriptive. It displays as the function we are performing on that total_points data in the DB. We can use AS to give it a better name.

SELECT name, team, SUM(total_points) AS 'overall points' FROM Players_GWs GROUP BY name

These are some essential query types for getting started with SQL. Here, we only worked with data in one table, but we could, for example, have detailed team information in another table and could then use queries to get data from both tables. I will look at this and more complex queries in a future tutorial.

For further reading, check out the Sqlite docs at https://sqlite.org/docs.html


All screenshots taken by the author. Introductory image by Nathan Rogers on Unsplash.


Related Articles