Fantasy Premier League x Data Analysis: Being Among the Top 2%

A brief overview of the application I built, in which I have employed data analysis to power my FPL team up the charts

Kunj Mehta
Towards Data Science


Edit: App relaunched for the 2023–24 season here.

Fantasy Premier League: A Phenomenon

If you are not a football fan, or not a sports fan at all, you may well ask: what exactly is Fantasy Premier League (FPL)? Let’s start with what FPL is, then go through the rules of the game, before diving into the code behind the data analysis.

Photo by Jack Monach on Unsplash

Overview of the Game

According to Wikipedia, fantasy football (and fantasy sports in general) is a game in which participants assemble an imaginary team of real-life footballers (sportsmen) and score points based on those players’ actual statistical performance or their perceived contribution on the field of play. Usually, in a particular fantasy game, players are selected from one specific division in a particular country.

So, FPL is the fantasy football game for the English Premier League. The original version of what is now Fantasy Premier League was created in England by Bernie Donnelly on Saturday 14 August 1971. The author has been playing the current online version of FPL for five years now, with mixed results. The ‘results’ for the game itself, however, have been anything but mixed: it has grown from around 3.5 million players in the 2014/15 season to 8 million now, adding around three-quarters of a million players each year.

An understandable question here would be: what makes this game so attractive and addictive? First of all, there is the inherent allure of football, the English Premier League and the fantasy of managing your own team, all combined. Add to this the mini-leagues that you can play with your friends, and the bragging rights that come with winning them or FPL itself (if you can manage it, that is). To top it all off, given the amount of money that revolves around the English Premier League, the organisers can afford to invite the winner on a fully funded 3-day trip to England, complete with a live match of their favourite team. No wonder every mad English football fan plays it!

Rules of the Game

Now that we have established the significance of the game, let’s take a look at the rules. These will come in handy when we deep-dive into the code as the data analysis has been done keeping in mind the rules of the game.

The premise: each FPL manager (the user) is given a budget of 100 million to buy a squad of 15 players, consisting of 2 goalkeepers, 5 defenders, 5 midfielders and 3 forwards, with the added rule that at most 3 players can be selected from any one team. The cost of a player is predetermined by the game developers, based on the player’s popularity and performance in the last football season.
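To make these constraints concrete, here is a minimal sketch of a squad checker in plain Python. The dictionary keys ('player_type', 'team_code', 'cost') and the function name are illustrative choices, not taken from the FPL-TeamMaker code.

```python
from collections import Counter

# Squad rules as described in the article.
BUDGET = 100.0  # in millions
POSITION_QUOTA = {"GKP": 2, "DEF": 5, "MID": 5, "FWD": 3}
MAX_PER_TEAM = 3


def squad_violations(squad):
    """Return a list of rule violations for a squad (empty list means valid).

    `squad` is a list of dicts with hypothetical keys
    'player_type', 'team_code' and 'cost' (cost in millions).
    """
    problems = []

    # 2 + 5 + 5 + 3 = 15 players in total
    if len(squad) != sum(POSITION_QUOTA.values()):
        problems.append("squad must contain exactly 15 players")

    position_counts = Counter(p["player_type"] for p in squad)
    for position, quota in POSITION_QUOTA.items():
        if position_counts.get(position, 0) != quota:
            problems.append(f"need exactly {quota} {position}")

    team_counts = Counter(p["team_code"] for p in squad)
    for team, count in team_counts.items():
        if count > MAX_PER_TEAM:
            problems.append(f"too many players from team {team}")

    if sum(p["cost"] for p in squad) > BUDGET:
        problems.append("squad exceeds the budget")

    return problems
```

An empty return value means the squad satisfies every rule listed above; anything else names the rule that was broken.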

On top of this, after every round of games (called a Gameweek), the user can transfer one player out of their team and bring in another in the same position for free. Each additional transfer incurs a 4-point penalty. For more nuances about the rules and the game’s scoring system, you can visit this page.
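The transfer rule boils down to a one-line calculation. The sketch below assumes exactly what is stated above (one free transfer per Gameweek, 4 points for each extra one); the function and parameter names are made up for illustration.

```python
def transfer_penalty(transfers_made, free_transfers=1, points_per_extra=4):
    """Points deducted for a Gameweek's transfers under the rule described above."""
    extra = max(0, transfers_made - free_transfers)
    return extra * points_per_extra


# Three transfers in one Gameweek: one is free, two cost 4 points each.
print(transfer_penalty(3))
```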

The objective of the game: to have the best-scoring team possible every week, given the budget and other constraints, so that in the long term you accumulate more points than the other players.

The Data

The makers of the game maintain an API that powers their website. This API is publicly accessible here. It contains data about each Gameweek, each team, and statistics on each player in the league. This is why it is a gold mine for data analysis!

We performed some preliminary EDA on this data to test a few hypotheses. One of them was whether a team’s position in the league table has a bearing on how its players perform in FPL; in other words, whether players from teams in the top half of the table score more FPL points than players from bottom-half teams. Had this been the case, we would have needed the teams data from the API. However, the hypothesis did not hold up: although true at a very high level, some outliers were too high-scoring to be ignored.
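A check of this kind could look roughly like the sketch below. It is not the author’s EDA code: the data structures are hypothetical stand-ins for the API data, and the numbers are made up purely to show the mechanics.

```python
from statistics import mean


def points_by_table_half(players, league_positions):
    """Split players' total points by whether their team is in the top half.

    `players` is a list of (team_code, total_points) pairs and
    `league_positions` maps team_code -> league position (1 = top).
    Both structures are hypothetical stand-ins for the API data.
    """
    cutoff = len(league_positions) / 2
    top = [pts for team, pts in players if league_positions[team] <= cutoff]
    bottom = [pts for team, pts in players if league_positions[team] > cutoff]
    return top, bottom


# Made-up numbers purely for illustration, not real FPL data.
league_positions = {"A": 1, "B": 2, "C": 3, "D": 4}
players = [("A", 120), ("A", 90), ("B", 100), ("C", 60), ("C", 150), ("D", 50)]

top, bottom = points_by_table_half(players, league_positions)
print("mean points:", mean(top), "vs", mean(bottom))
# Outliers: bottom-half players who beat the top-half average
print("bottom-half players above the top-half mean:",
      [pts for pts in bottom if pts > mean(top)])
```

Comparing the two means captures the high-level trend, while the last line surfaces the kind of high-scoring outlier that makes a team-based filter too blunt.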

So, having established that team position has more or less no bearing on player performance in the FPL, we just decided to extract the player statistics data for the whole player roster. This included the following information:

['id', 'code', 'first_name', 'second_name', 'web_name', 'team_code', 'player_type', 'status', 'form', 'total_points', 'points_per_game', 'minutes_played', 'goals_scored', 'assists', 'clean_sheets', 'goals_conceded', 'own_goals', 'penalties_saved', 'penalties_missed', 'yellow_cards','red_cards', 'saves', 'bonus', 'now_cost']

Most of the attributes above are self-explanatory. web_name is the name of the player used on the website. player_type is whether the player is a forward, midfielder, defender or goalkeeper. status refers to the injury status of the player. form is an integer that represents the form of the player — the higher the better.
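As a rough sketch of the extraction step, the snippet below downloads a JSON document and keeps only the wanted attributes for each player. The endpoint URL is deliberately left as a parameter, and the 'elements' key is an assumption about where the per-player records sit in the response; the function names are illustrative.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download and decode a JSON document from the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def extract_players(payload, columns, key="elements"):
    """Keep only the requested columns for each player record.

    `key` is the (assumed) name of the list holding per-player records.
    Columns missing from a record come back as None, which makes
    renamed or absent fields easy to spot.
    """
    return [{col: record.get(col) for col in columns} for record in payload[key]]
```

The resulting list of dictionaries can be handed straight to `pandas.DataFrame(...)` to get a table like the `player_df` used in the analysis.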

A side note: at the start of a season there is, obviously, no data on player performances for that season. So we used the previous season’s final-week data to choose players and build a team for the Gameweeks before transfer deadline day. Once the teams were finalized and enough current-season data became available, we switched to that data to decide whom to include in the team.

FPL-TeamMaker: The Application

The application, which can be found here, has two modes: the team-building mode and the transfer mode. The team-building mode lets the user build a 15-player team, with the maximum number of players per team and the importance given to form and total points left up to the user. The transfer mode lets the user transfer a specified number of players into an existing team, which the user inputs. Let’s dive into the details below.

Pandas and PuLP: The Analysis Backend

The intuition behind the analysis was simple: It is best to have (normally expensive) players who are top in the scoring charts but it is also good to have value players in best form at a given time to offset some bad performances by the top-scoring players.

As per the intuition, the two most important attributes for data manipulation become total_points and form. First and foremost though, before performing the analysis and using it to prepare a 15 player team, we remove injured players from the player roster.

# Collect every player whose status is not 'a' (available) so they can be excluded from selection
injured_players_df = player_df[player_df['status'] != 'a']

Next, following the intuition of having the top-scoring players, we first try to put in the top scorer for each position. Then we go on to fill the remaining 11 places.

# One top scorer is forced in for each position before the remaining places are filled
top_players_positions_to_be_filled = {'GKP': 1, 'DEF': 1, 'MID': 1, 'FWD': 1}
top_players_positions_filled = {'GKP': 0, 'DEF': 0, 'MID': 0, 'FWD': 0}
i = 0
# top_players_point_sort is assumed to be the roster sorted by total_points, highest first
while top_players_positions_filled != top_players_positions_to_be_filled and i < len(top_players_point_sort):
    player = top_players_point_sort.iloc[i]
    # Pick the player only if their team still has room and their position has no top scorer yet
    if teams_filled[player['team_code']] > 0 and top_players_positions_filled[player['player_type']] < 1:
        team_cost += player['now_cost']
        team_points += player['total_points']
        positions_filled[player['player_type']] -= 1
        top_players_positions_filled[player['player_type']] += 1
        teams_filled[player['team_code']] -= 1
        players_in_team.append(player['second_name'])
    # Move on to the next candidate whether or not this one was picked
    i += 1
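The snippet above relies on variables defined elsewhere in the application. For readers who want to experiment, here is a self-contained sketch of the same greedy idea using plain lists and dictionaries instead of a DataFrame. The function name and the in-place bookkeeping are my own choices, not the author’s.

```python
def pick_top_scorers(players, team_slots, position_slots):
    """Greedily pick the highest scorer for each position.

    `players`: list of dicts with 'second_name', 'player_type',
    'team_code', 'total_points' and 'now_cost'.
    `team_slots`: team_code -> how many more players that team may supply.
    `position_slots`: position -> how many squad places are still open.
    Both slot dicts are updated in place.
    """
    picked, cost, points = [], 0, 0
    needed = set(position_slots)  # positions that still lack a top scorer
    for player in sorted(players, key=lambda p: p["total_points"], reverse=True):
        if not needed:
            break
        position, team = player["player_type"], player["team_code"]
        if position in needed and team_slots[team] > 0:
            picked.append(player["second_name"])
            cost += player["now_cost"]
            points += player["total_points"]
            team_slots[team] -= 1
            position_slots[position] -= 1
            needed.remove(position)
    return picked, cost, points
```

If a position’s best scorer cannot be taken because their team is already full, the loop simply moves on to the next-best player in that position.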

We fill the remaining 11 places using linear programming, leveraging the Python library PuLP. The LP problem maximizes a metric (based on past data) subject to constraints on the budget, the number of players to be filled in each position and the maximum number of players allowed from a team (the last of which is a user-customizable input). The metric calculated for each player is a combination of player form and total points, and the importance of each is again decided by the user.
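Written out in symbols, the problem described above looks roughly as follows. The notation is mine rather than the author’s: x_i is 1 if player i is selected and 0 otherwise, m_i is the player’s metric, c_i the cost, B the remaining budget, P_p the set of players in position p with n_p places still open, and T_t the set of players from team t with k_t places still allowed.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{N} m_i x_i \\
\text{s.t.} \quad & \sum_{i=1}^{N} c_i x_i \le B, \\
& \sum_{i \in P_p} x_i = n_p \quad \text{for each position } p \in \{\text{GKP}, \text{DEF}, \text{MID}, \text{FWD}\}, \\
& \sum_{i \in T_t} x_i \le k_t \quad \text{for each team } t, \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, N.
\end{aligned}
```

Because every x_i is restricted to 0 or 1, this is strictly speaking an integer linear program, which PuLP hands to its underlying solver.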

import numpy as np
import pulp
import streamlit as st


def add_players_using_lp(metric, costs, player_type, team, budget, team_counts, positions_filled):
    num_players = len(metric)
    model = pulp.LpProblem("Constrained_value_maximisation", pulp.LpMaximize)
    # One 0/1 decision variable per player: 1 means the player is selected
    decisions = [
        pulp.LpVariable("x{}".format(i), lowBound=0, upBound=1, cat='Integer')
        for i in range(num_players)
    ]

    # objective function: maximise the total metric of the selected players
    model += sum(decisions[i] * metric[i] for i in range(num_players)), "Objective"
    # cost constraint
    model += sum(decisions[i] * costs[i] for i in range(num_players)) <= budget
    # position constraints: fill exactly the places still open in each position
    for position in ('GKP', 'DEF', 'MID', 'FWD'):
        model += sum(decisions[i] for i in range(num_players) if player_type[i] == position) == positions_filled[position]
    # club constraint
    for team_code in np.unique(team):
        model += sum(decisions[i] for i in range(num_players) if team[i] == team_code) <= \
            team_counts[team_code]
    model += sum(decisions) == 11  # total team size
    try:
        model.solve()
    except Exception:
        st.info('Player roster has not been updated yet. Please be patient')
    return decisions
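The article does not spell out how form and total points are combined into the `metric` argument, so the following is only one plausible sketch: scale both attributes to the same range and blend them with a user-chosen weight. The min-max scaling, the single `form_weight` parameter and the function name are all assumptions.

```python
def combined_metric(forms, total_points, form_weight):
    """Blend form and total points into one score per player.

    Each attribute is min-max scaled to [0, 1] so that neither dominates
    purely because of its units, then mixed with `form_weight` in [0, 1]
    (the remaining weight goes to total points).
    """
    if not 0.0 <= form_weight <= 1.0:
        raise ValueError("form_weight must be between 0 and 1")

    def scale(values):
        lo, hi = min(values), max(values)
        if hi == lo:  # all values identical: nothing to rank on
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]

    scaled_form = scale(forms)
    scaled_points = scale(total_points)
    return [
        form_weight * f + (1.0 - form_weight) * p
        for f, p in zip(scaled_form, scaled_points)
    ]
```

Scaling first matters because total points run into the hundreds over a season while form stays in single digits; without it, the weight would have very little practical effect.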

We follow similar logic for the transfer mode of the FPL-TeamMaker. The only difference is that we take in the user’s existing team. We prioritize transferring out injured players. If there are no injured players in the existing team, we calculate the aforementioned metric and transfer out the player(s) with the lowest value. In place of the removed players, we try to transfer in the top scorer for that position; if that is not possible due to budget constraints, we use LP to find the best-value player, exactly as in the team-building mode.
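The choice of who to transfer out can be sketched in a few lines. This version makes one assumption beyond the description above: if there are fewer unavailable players than requested transfers, the remaining slots go to the players with the lowest metric. The dictionary keys and function name are illustrative.

```python
def choose_transfers_out(team, n_transfers):
    """Pick which players to sell: unavailable players first, then lowest metric.

    `team` is a list of dicts with hypothetical keys 'name', 'status'
    ('a' meaning available, as in the status filter earlier) and 'metric'.
    """
    # False sorts before True, so unavailable players come first;
    # within each group, lower metric values come first.
    ranked = sorted(team, key=lambda p: (p["status"] == "a", p["metric"]))
    return [p["name"] for p in ranked[:n_transfers]]
```

Sorting on a (status, metric) pair keeps the priority rule and the tie-breaker in a single pass, so no separate branch is needed for the case with no injured players.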

Streamlit: The Frontend

We have used Streamlit, a Python library for creating small front-end applications in pure Python. This makes it pretty simple and fast to build a proof-of-concept website, hostable on Heroku via GitHub.

Application Homepage (Image by Author)

As you will see when you visit the page, the first question is the Gameweek for which the team has to be changed. This determines which of the two datasets mentioned earlier is used. The user is then asked to select the mode and enter the inputs for it: for team building, the leftover budget, the importance to be given to form and total points, and the maximum number of players from each team; for the transfer mode, the number of transfers.

The Results

FPL-TeamMaker was built before the 2020/21 season started and the author has been using it every week since to build the team or transfer players. Judgment of the results as good or bad is left to the reader.

Progress of Team Points vs Average Points on a Weekly Basis (Image by Author)
Team Points vs Average Points per Week. There are only 3 weeks where team points are less than average points (Image by Author)
Overall Rank and Gameweek Rank of my Team over the Weeks (Image by Author)

As you can see from the images above: (i) the team built with FPL-TeamMaker has consistently outperformed the average from the start; (ii) there were only three Gameweeks in which the team’s points were below the average; (iii) the team has stayed in and around the top 2% of players for 10 weeks now, even after a disastrous last week; and (iv) 5 of the 15 players currently in the team make the cut for the overall Dream Team.

Conclusion

You can use the FPL-TeamMaker for the benefit of your own team. The author will be regularly updating the point progress of his team via the graphs on the page.

A bonus tip about FPL-TeamMaker: Use the team-building mode when you use your wildcards.

You can find the code for FPL-TeamMaker here and view the points of the FPL-TeamMaker team on FPL here. The team sits at a rank of 134,956 at the time of writing.
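As a quick sanity check on the headline claim, the rank can be turned into a percentile. The total of 8 million players is the approximate figure quoted earlier in this article, so the result is only a rough estimate.

```python
def rank_percentile(rank, total_players):
    """Return the rank as a percentage of all players (lower is better)."""
    return 100.0 * rank / total_players


# Rank quoted above against roughly 8 million players
print(f"{rank_percentile(134_956, 8_000_000):.2f}%")
```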

I would love to connect with you on LinkedIn!
