Exploring FIFA

SalRite
Towards Data Science
5 min read · Mar 24, 2019

‘The thing about football — the important thing about football — is that it is not just about football.’ ~Sir Terry Pratchett.

Soccer, or association football, is not just a game; it is an emotion for many. People follow their favorite clubs with a devotion that rivals religion, and great players are celebrated all over the world. But not everyone knows how much these players are paid or what contributes to their market value. Let's try to answer these and similar questions together. So, without further ado, let's dive in!

The Fédération Internationale de Football Association (FIFA) describes itself as an international governing body of association football, futsal, beach soccer, and eFootball. Using the FIFA 19 player dataset, we will try to answer some interesting questions.

The Kaggle Notebook for this analysis can be found here, and here is the link to the GitHub repository for reference (to replicate the steps below). To follow along completely, we recommend forking the Kaggle Notebook or the GitHub repo. We have also linked to the documentation of the functions used, for readers who want the technical details.

Look & Feel of Data

It is important to get a look and feel for your data before proceeding with any fancy analysis. An overview of this dataset can be found here. After importing the necessary libraries, we called `df.head()` and got the following output (a static view only; it does not scroll).

Next, we checked all the attributes (fields) available in our dataset:
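A first look of this kind can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the tiny in-memory CSV, the placeholder player names, and the file name `data.csv` are assumptions made so the snippet runs on its own.

```python
import io

import pandas as pd

# Toy stand-in for the real CSV. The rows are made up, and the real
# dataset has many more columns than the handful shown here.
toy_csv = io.StringIO(
    "Name,Age,Club,Value,Wage,Preferred Foot,Position\n"
    "Player A,31,Club X,€110.5M,€565K,Left,RF\n"
    "Player B,33,Club Y,€77M,€405K,Right,ST\n"
    "Player C,21,,€0,€1K,Right,GK\n"
)
df = pd.read_csv(toy_csv)  # with the real file: pd.read_csv("data.csv") (assumed name)

print(df.head())            # first few rows
print(df.columns.tolist())  # every attribute (field) in the dataset
print(df.isna().sum())      # number of missing values per column
```

The per-column count of missing values is a quick way to decide which fields will need cleaning or dropping later.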

From the above it is clear that, depending on our needs, we will have to drop some columns (fields) and probably clean the data. We will leave that for now and first define the questions we want to answer with this dataset.

Many questions can be asked of this dataset; which ones depend on your needs, your perspective, and the goal you want to accomplish. Below are the questions we came up with.

Is there a relationship between Market Value and Wages of Players?

After looking at the data, we did some data cleaning and formatting. Once the preparation was done, we plotted Market Value against Wages (see below).
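The cleaning step is not spelled out here, so the following is only a sketch of one plausible version. It assumes that values and wages are stored as strings such as `€110.5M` or `€565K`; the helper `parse_money` and the columns `ValueEUR` and `WageEUR` are hypothetical names introduced for illustration.

```python
import pandas as pd


def parse_money(text):
    """Convert strings such as '€110.5M' or '€565K' to a float amount in euros."""
    multipliers = {"K": 1e3, "M": 1e6}
    cleaned = str(text).replace("€", "").strip()
    if cleaned and cleaned[-1] in multipliers:
        return float(cleaned[:-1]) * multipliers[cleaned[-1]]
    return float(cleaned)


# Made-up rows in the assumed string format
toy = pd.DataFrame(
    {"Value": ["€110.5M", "€77M", "€0"], "Wage": ["€565K", "€405K", "€1K"]}
)
toy["ValueEUR"] = toy["Value"].map(parse_money)
toy["WageEUR"] = toy["Wage"].map(parse_money)
print(toy[["ValueEUR", "WageEUR"]])
```

With numeric columns in place, a scatter plot like the one described can be drawn with `toy.plot.scatter(x="ValueEUR", y="WageEUR")`, which requires matplotlib.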

The plot shows that wages usually increase as Market Value increases, but there are also players with high wages and a Market Value close to zero! Verdict: Market Value is related to a player's Wage, but only to an extent.

What is the preferred Foot among the players and how does it affect their positioning?

We found that fewer than 25% of the players in our dataset are left-footed (shown in the bar chart below).

To check whether a player's preferred foot has any impact on their position, we took the proportion of each preferred foot grouped by the player's Position.
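One way to compute such proportions is a normalized cross-tabulation. The sketch below reflects our reading of the step (the share of each position within each foot), not necessarily the notebook's exact computation, and the rows are made up for illustration.

```python
import pandas as pd

# Made-up players; the column names are assumed to match the dataset
toy = pd.DataFrame(
    {
        "Position": ["ST", "ST", "ST", "GK", "GK", "CB", "CB", "CB", "CB"],
        "Preferred Foot": [
            "Right", "Left", "Right", "Right", "Right",
            "Left", "Right", "Right", "Left",
        ],
    }
)

# normalize="columns" makes each foot's column sum to 1, so every cell is
# the share of that position among players with that preferred foot.
share = pd.crosstab(toy["Position"], toy["Preferred Foot"], normalize="columns")
print(share)
```

If the "Left" and "Right" columns look alike, the preferred foot has little influence on where a player is positioned.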

It can be observed from the above that the proportions are roughly the same for left- and right-footed players, with a few exceptions. In other words, whether you are a lefty or a righty hardly matters: the distribution of positions, and so the demand for one position over another, is roughly the same.

Exploring further, we looked at the top 5 positions for each foot (see below). CB (Center Back) is the third most preferred spot, and ST (Striker) is in the top 5, for both feet. There are some striking differences, though: goalkeepers, for example, are mostly right-footed! Verdict: yes, preferred foot does have an impact, but only a small one.

Furthermore, Striker, Goalkeeper, and Center Back are the top three positions by number of players (see the screenshot below)!

Can we predict the Value of a player based on their attributes (such as accuracy, shot power, reactions, and dribbling)?

Features such as 'Preferred Foot', 'Position', 'Crossing', 'Finishing', 'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration', 'SprintSpeed', and 'Agility' were chosen.

The Position and Preferred Foot columns were encoded using one-hot encoding. After formatting the data and removing NaNs (null values, where no data exists), we split the data and fit a RandomForestRegressor (an ensemble learner), tuned with GridSearch (an exhaustive search over a grid of candidate hyper-parameter values). We achieved an R-squared score of 0.42 (model evaluation).
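The pipeline described above could look roughly like the sketch below. The data is synthetic (random attributes and a made-up target), and the parameter grid, split ratio, and the `ValueEUR` column name are assumptions, so the score it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cleaned player table
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame(
    {
        "Preferred Foot": rng.choice(["Left", "Right"], size=n),
        "Position": rng.choice(["ST", "CB", "GK"], size=n),
        "Reactions": rng.integers(30, 95, size=n),
        "Dribbling": rng.integers(30, 95, size=n),
    }
)
# Made-up target, NOT real market values
toy["ValueEUR"] = (
    1e5 * toy["Reactions"] + 5e4 * toy["Dribbling"] + rng.normal(0, 1e5, size=n)
)

toy = toy.dropna()  # drop rows with missing values
X = pd.get_dummies(  # one-hot encode the categorical columns
    toy.drop(columns="ValueEUR"), columns=["Preferred Foot", "Position"]
)
y = toy["ValueEUR"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Every combination in this (assumed) grid is tried with 3-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3, scoring="r2"
)
search.fit(X_train, y_train)

r2 = search.score(X_test, y_test)  # R-squared on the held-out split
print(search.best_params_, r2)
```

Scoring on a held-out split, rather than on the training data, is what makes the reported R-squared a fair estimate of how the model generalizes.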

Note: the model can be improved further by using different algorithms and/or feature engineering.

Using the Mutual Info Regressor, we also found the top 5 most important features in determining a player's Value: Reactions, BallControl, Composure, Dribbling, and ShortPassing. Verdict: not all features are equally useful, and Market Value can be predicted, to a degree, given enough data and player attributes.
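A feature ranking of this kind can be obtained with scikit-learn's `mutual_info_regression`. The sketch below uses synthetic data in which one feature drives the target and the other is pure noise; the `Noise` column and the target are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame(
    {
        "Reactions": rng.uniform(30, 95, size=n),  # drives the target below
        "Noise": rng.uniform(30, 95, size=n),      # unrelated to the target
    }
)
# Made-up target, NOT real market values
y = 1e5 * X["Reactions"] + rng.normal(0, 1e5, size=n)

# Higher mutual information means the feature tells us more about the target
mi = mutual_info_regression(X, y, random_state=0)
ranking = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(ranking)
```

On the real data, taking `ranking.head(5)` over all candidate attributes would give a top-5 list like the one reported above.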

We can also ask fairly straightforward questions of the data (given that we have the right amount of data). Below are two such questions we attempted, followed by the conclusion.

Clubs with the highest median wages (Top 11)?
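A ranking like this comes down to a group-by and a median. The sketch below uses made-up clubs and wages, and assumes the wage column has already been converted to numbers (the `WageEUR` name is hypothetical).

```python
import pandas as pd

# Made-up clubs and wages in euros
toy = pd.DataFrame(
    {
        "Club": ["X", "X", "X", "Y", "Y", "Z"],
        "WageEUR": [100e3, 200e3, 300e3, 50e3, 150e3, 400e3],
    }
)

# Median wage per club, highest first; head(11) keeps at most 11 clubs
top_clubs = (
    toy.groupby("Club")["WageEUR"].median().sort_values(ascending=False).head(11)
)
print(top_clubs)
```

The median is less sensitive than the mean to one or two very highly paid stars, which is presumably why it was chosen for comparing clubs.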

Players with largest release clause (Top 11)?
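For a top-N list on a single numeric column, `DataFrame.nlargest` is enough. The players and amounts below are made up, and `ReleaseEUR` is a hypothetical name for the release clause after it has been converted to a number.

```python
import pandas as pd

# Made-up players and release clauses in euros
toy = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C", "Player D"],
        "ReleaseEUR": [50e6, 200e6, 120e6, 5e6],
    }
)

# nlargest returns the rows sorted by the column, largest first
top_release = toy.nlargest(11, "ReleaseEUR")[["Name", "ReleaseEUR"]]
print(top_release)
```

When the table has fewer than 11 rows, as here, `nlargest` simply returns all of them in descending order.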

Conclusion

Given the FIFA 19 player dataset, many questions can be asked. Above are 5 questions that we tried to answer; you may not be interested in the same questions, or you may want to explore the dataset from a different perspective. The exploration covered in this blog post is just the tip of the iceberg: with advanced machine learning techniques and the right set of questions, a lot can be achieved and understood! Feel free to use the Notebook here or here and play around with the dataset.

We hope you enjoyed reading this blog post, and that the EDA above gave you an overview of how a dataset can be approached and some of the techniques that can be used.


Sr. Data Scientist || Finance Enthusiast || Creativist, Novice Blogger, Leisure Writer, Moonly Photographer Personal Finance Site- theritefinance.com