Data Analysis Series: An Overview of FIFA 19 Players Dataset
Finding the most favourable FIFA 19 player
Not Ronaldo, not Messi!

Intro
Not Messi, not Ronaldo! You may laugh at this like the people in the picture 😀, but the statement is true! Of course, it is valid only under some conditions 😀. We will look at those conditions in the next sections of this article.
Data analysis with Python is widespread in the data space, alongside other popular languages such as R. In this article, we are going to analyse the player dataset of a popular football game, FIFA 19, with Python. You can find the dataset from this link. By the way, let me ask you this:
Who is your favourite player in FIFA 19?
Maybe you will change your mind after reading this article 😀. This article is part of our "Data Analysis Series – An Overview of FIFA 19 Players Dataset" series. If you haven’t read our previous article, you can do so from this link.
Objective
You probably wondered about the rest of the phrase when you saw the "The most…" part of the heading. It is "the most favourable player". This is still vague, but it will become clearer as we continue. To accomplish this objective, we are going to use the pandas, NumPy and Matplotlib libraries in Python. These libraries are staples of data analysis in Python, and they make your life easier.
Steps in This Analysis
I want to share all the steps with you before going any further. Afterwards, I will explain some of them in more detail. I cannot explain every step here because the article would become very long and hard to follow. However, you can check all these steps in more detail in my Kaggle notebook about this analysis. Here is the link.
OK, let’s check the steps:
1. Load the necessary libraries, such as pandas, NumPy and Matplotlib.
2. Load and read the dataset.
3. Check your dataset’s general structure.
3.1. You can do this with the pandas info() method.
4. Narrow your data frame (dataset) by selecting the columns you want to use.
5. Clean your data frame and make corrections according to the goals of your analysis.
6. Filter your data frame and create a new data frame from it. Again, do this according to the goals of your analysis.
7. Visualise your final data frame according to those goals.
8. Finally, make your decisions. 😀
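The first three steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the three sample rows are made up, and the column names (Name, Nationality, Position and so on) are assumptions about how the dataset is laid out. In practice you would pass the path of the downloaded CSV file to read_csv instead of the in-memory sample.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file;
# the names and numbers are made up for illustration.
sample_csv = io.StringIO(
    "Name,Age,Nationality,Overall,Potential,Value,Wage,International Reputation,Position\n"
    "Player A,27,Spain,91,93,€72M,€260K,4,GK\n"
    "Player B,29,Germany,90,90,€46M,€160K,4,LCB\n"
    "Player C,21,Brazil,75,86,€10.5M,€28K,1,ST\n"
)

# Step 2: load the dataset (replace sample_csv with the path to your CSV file).
df = pd.read_csv(sample_csv)

# Step 3: check the general structure (column names, dtypes, non-null counts).
df.info()
print(df.head())
```

Note that Value and Wage are read as text here because of the € sign and the M/K suffixes, which is why a cleaning step is needed later.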
Details for the Steps
As mentioned above in step 4, most of the time you don’t work with the whole dataset you have. You need to select the attributes you care about and work on them specifically. Here is how I do this for our dataset:
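A minimal sketch of this kind of column selection is shown here. The sample rows are invented, and the column names, including the unwanted "Photo" and "Jersey Number" columns, are assumptions for illustration rather than the exact selection used in the Kaggle notebook.

```python
import pandas as pd

# Hypothetical rows; the real dataset has many more columns and players.
df = pd.DataFrame(
    {
        "Name": ["Player A", "Player B"],
        "Age": [27, 21],
        "Photo": ["a.png", "b.png"],  # example of a column we do not need
        "Nationality": ["Spain", "Brazil"],
        "Overall": [91, 75],
        "Potential": [93, 86],
        "Value": ["€72M", "€10.5M"],
        "Wage": ["€260K", "€28K"],
        "International Reputation": [4, 1],
        "Position": ["GK", "ST"],
        "Jersey Number": [1, 9],  # another column we do not need
    }
)

# Assumed list of attributes relevant to the analysis.
columns_of_interest = [
    "Name", "Age", "Nationality", "Overall", "Potential",
    "Value", "Wage", "International Reputation", "Position",
]

# .copy() gives an independent data frame, so later edits do not touch df.
players = df[columns_of_interest].copy()
print(players)
```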

After applying the above code, we get a more focused data frame to work with. The next step is to clean and correct the data frame. When you look at the above result table, you can see that the Value and Wage columns contain the strings €, M and K. These columns should be numerical, so we need to get rid of these symbols and replace them with the numbers they stand for: 1,000,000 for M and 1,000 for K. Here’s how I do this:
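One possible way to do this conversion is sketched here. The helper name parse_money and the sample values are hypothetical, and the sketch assumes every entry looks like "€72M", "€260K" or "€0"; an empty or malformed entry would raise a ValueError.

```python
import pandas as pd

# Numeric meaning of the suffixes found in the Value and Wage columns.
MULTIPLIERS = {"M": 1_000_000, "K": 1_000}


def parse_money(text):
    """Convert strings such as '€10.5M' or '€260K' to a number of euros."""
    cleaned = str(text).replace("€", "").strip()
    suffix = cleaned[-1:]  # last character, or "" if nothing is left
    if suffix in MULTIPLIERS:
        return float(cleaned[:-1]) * MULTIPLIERS[suffix]
    return float(cleaned)  # no suffix, e.g. "0"


# Hypothetical sample values for illustration.
players = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C"],
        "Value": ["€72M", "€10.5M", "€0"],
        "Wage": ["€260K", "€28K", "€0"],
    }
)

for column in ["Value", "Wage"]:
    players[column] = players[column].apply(parse_money)

print(players.dtypes)
print(players)
```

After this step both columns hold floating-point numbers, so they can be compared, sorted and plotted like any other numeric column.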

After this step, we can filter our data frame to come closer to our goal! Let’s first apply the filters, and then I will comment on them. Here is the code and the result:
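A sketch of such a filter with boolean masks could look like this. The only threshold taken from the article is the age limit of 30; the other cut-off values, the sample rows and the assumption that International Reputation runs from 1 to 5 are hypothetical and chosen only to illustrate the technique.

```python
import pandas as pd

# Hypothetical players for illustration.
players = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C", "Player D"],
        "Age": [27, 33, 21, 26],
        "Overall": [91, 94, 75, 90],
        "Potential": [93, 94, 86, 91],
        "International Reputation": [4, 5, 1, 3],
    }
)

MAX_AGE = 30          # from the article: not older than 30
MAX_REPUTATION = 4    # assumption: below the top rating on a 1-5 scale
MIN_POTENTIAL = 90    # assumption
MIN_OVERALL = 90      # assumption

# Each comparison yields a boolean Series; & combines them row by row.
mask = (
    (players["Age"] <= MAX_AGE)
    & (players["International Reputation"] <= MAX_REPUTATION)
    & (players["Potential"] >= MIN_POTENTIAL)
    & (players["Overall"] >= MIN_OVERALL)
)

shortlist = players[mask].reset_index(drop=True)
print(shortlist)
```

Keeping the thresholds in named constants makes it easy to tighten or relax the conditions and rerun the analysis.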

As you can see, my filters are on the Age, International Reputation, Potential and Overall attributes. I want players who have high Potential and Overall scores but who do not have the highest international reputation, because that reputation brings a much higher cost with it. I also only consider players who are no older than 30.
Lastly, we will create some visualisations from our final data frame so that we can make our decisions. Our first visualisation concerns the players’ nationalities. I would like to know what percentage of the final dataset each nationality makes up. Here is how we can do this:
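As a rough sketch, the percentages can be computed with value_counts(normalize=True) and drawn as a pie chart. The sample data and the "Nationality" column name are assumptions, and the choice of a pie chart is mine; the Agg backend is selected only so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove for a pop-up window

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical shortlist for illustration.
shortlist = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C", "Player D"],
        "Nationality": ["Spain", "Spain", "Belgium", "Germany"],
    }
)

# Share of each nationality in percent, largest first.
nationality_share = (
    shortlist["Nationality"].value_counts(normalize=True).mul(100).round(1)
)
print(nationality_share)

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    nationality_share.values,
    labels=nationality_share.index.tolist(),
    autopct="%1.1f%%",
)
ax.set_title("Nationality share of shortlisted players")
# Call fig.savefig("nationality_share.png") or plt.show() to view the chart.
```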

Next, I would like to group all players in our final dataset by position category: Goalkeeper, Defender, Midfielder or Attacker. After this categorisation, we will create another graph to look at the result. Here is how we can do this:
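One way to sketch this grouping is to map each position code to a category with a dictionary. The mapping below is an assumption about which codes belong to which category, and the sample rows are invented; adjust both to the codes that actually appear in your data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove for a pop-up window

import matplotlib.pyplot as plt
import pandas as pd

# Assumed mapping from position codes to the four categories.
POSITION_GROUPS = {
    "Goalkeeper": ["GK"],
    "Defender": ["CB", "LCB", "RCB", "LB", "RB", "LWB", "RWB"],
    "Midfielder": ["CM", "LCM", "RCM", "CDM", "LDM", "RDM",
                   "CAM", "LAM", "RAM", "LM", "RM"],
    "Attacker": ["ST", "LS", "RS", "CF", "LF", "RF", "LW", "RW"],
}
# Invert the mapping so each code points to its category.
CODE_TO_GROUP = {
    code: group for group, codes in POSITION_GROUPS.items() for code in codes
}

# Hypothetical shortlist for illustration.
shortlist = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
        "Position": ["GK", "LCB", "RCM", "LF", "ST"],
    }
)

# Codes missing from the mapping become NaN, which makes gaps easy to spot.
shortlist["Position Group"] = shortlist["Position"].map(CODE_TO_GROUP)
group_counts = shortlist["Position Group"].value_counts()
print(group_counts)

fig, ax = plt.subplots()
ax.bar(group_counts.index.tolist(), group_counts.values)
ax.set_ylabel("Number of players")
ax.set_title("Shortlisted players by position group")
# Call fig.savefig("position_groups.png") or plt.show() to view the chart.
```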

Let’s check the top 3 players in our final data frame to make a decision.
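A small sketch of picking the top three, assuming "top" means the highest Overall score with Potential as the tie-breaker (the article does not state the ranking rule, and the sample rows are made up):

```python
import pandas as pd

# Hypothetical shortlist for illustration.
shortlist = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
        "Overall": [91, 90, 91, 89, 90],
        "Potential": [93, 90, 92, 91, 91],
    }
)

# Sort by Overall first, then Potential, both descending, and keep three rows.
top_three = (
    shortlist.sort_values(["Overall", "Potential"], ascending=False)
    .head(3)
    .reset_index(drop=True)
)
print(top_three)
```

If the data frame also has a position category column, calling groupby on that column before head(3) would give the top three per category instead of the top three overall.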

According to the filters we have applied so far, we can consider the following our most favourable players for each position:
- De Gea for the Goalkeeper position.
- M. Hummels for the Defender position.
- K. De Bruyne for the Midfielder position.
- E. Hazard for the Attacker position.
What would your player choices be under the conditions we applied so far?
Conclusion
Python is powerful when it comes to data analysis, and there are many online platforms, such as Kaggle, where you can start doing your own data analysis. With this article, we come to the end of another data analysis series. I hope you enjoyed reading it.
Happy coding! Happy analysing!
Note: You can also find this blog post and some other interesting topics on my web page. Additionally, you can find me on LinkedIn via this link.