
Analytics on FIFA 2019 Players!

Football analytics and modelling of the FIFA

Photo by Connor Coyne on Unsplash

CRISP-DM Data Science Football

In this post we will perform simple data analysis and modelling of the FIFA 2019 complete player dataset, following the CRISP-DM process. The dataset was collected from Kaggle and consists of a single CSV file.

FIFA 2019 is a football simulation video game developed as part of Electronic Arts’ FIFA series. It is the 26th instalment in the series and has sold approximately 20 million units.

Let’s dive in!

In a sport like football, each player adds significant value to the team’s success, so it is important to understand each player’s skills. How does age affect a player’s potential? Which player is best at which performance indicator? The study also evaluates a player’s overall performance based on the performance indicators and compares how various models perform on the prepared data.

Data Understanding

In the second stage of CRISP-DM, it is important to explore the data and address the data mining questions using data visualization and querying. The dataset consists of 89 columns, but we will limit ourselves to the following columns:

Index(['Name', 'Age', 'Overall', 'Potential', 'Value', 'Wage', 'Special',
       'Preferred Foot', 'International Reputation', 'Weak Foot',
       'Skill Moves', 'Crossing', 'Finishing', 'HeadingAccuracy',
       'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy',
       'LongPassing', 'BallControl', 'Acceleration', 'SprintSpeed', 'Agility',
       'Reactions', 'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength',
       'LongShots', 'Aggression', 'Interceptions', 'Positioning', 'Vision',
       'Penalties', 'Composure', 'Marking', 'StandingTackle', 'SlidingTackle',
       'GKDiving', 'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes'],
      dtype='object')
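
A minimal sketch of this column selection with pandas. The toy frame below stands in for the Kaggle CSV (its values are made up), and `select_columns` and the shortened `keep` list are illustrative names rather than the code used for this post.

```python
import pandas as pd

# Toy stand-in for the Kaggle CSV; all values are made up for illustration.
# With the real file, `raw` would come from pd.read_csv on the downloaded CSV.
raw = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["Player A", "Player B", "Player C"],
    "Age": [21, 27, 33],
    "Overall": [78, 85, 80],
    "Potential": [88, 86, 80],
    "Club": ["Club X", "Club Y", "Club Z"],
})


def select_columns(df, keep):
    """Return a copy of df restricted to the columns in `keep` that exist in df."""
    present = [col for col in keep if col in df.columns]
    return df[present].copy()


# Shortened version of the column list above; names missing from the toy
# frame (here "Preferred Foot") are skipped instead of raising a KeyError.
keep = ["Name", "Age", "Overall", "Potential", "Preferred Foot"]
players = select_columns(raw, keep)
print(players.columns.tolist())
```

Working on a copy of the subset keeps later cleaning steps from modifying the original frame.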

1. Best players in various aspects?

The following players are rated the best in their respective fields. For example, K. Mbappé has the highest potential, Cristiano Ronaldo has the best skill moves, and Naldo has the highest heading accuracy.
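
One way to find the top-rated player for each attribute is pandas' `idxmax`. The sketch below uses a toy frame with made-up scores, and `best_per_attribute` is a hypothetical helper, not the code behind the figure.

```python
import pandas as pd

# Toy data with made-up scores; the real dataset has one row per player.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Potential": [95, 88, 90, 84],
    "Finishing": [80, 94, 70, 60],
    "HeadingAccuracy": [65, 70, 93, 75],
})


def best_per_attribute(df, attributes, name_col="Name"):
    """Map each attribute to the name of the player with the highest score."""
    # idxmax returns, per column, the index label of the first maximum.
    best_rows = df[attributes].idxmax()
    return {attr: df.loc[row, name_col] for attr, row in best_rows.items()}


best = best_per_attribute(players, ["Potential", "Finishing", "HeadingAccuracy"])
for attr, name in best.items():
    print(f"{attr}: {name}")
```

Note that `idxmax` keeps only the first row on ties, which matters for narrow-range attributes such as Skill Moves (rated 1 to 5), where several players can share the top score.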

Figure 1: Best Player based on their performance score

2. Most Preferred Foot of the Players?

Figure 2: Most Preferred Foot of the Player

3. Effect of preferred foot on player’s potential

Figure 3: Impact of Foot on player’s Potential

It can be observed from the above plot that a player’s potential hardly depends on whether the player is left-footed or right-footed.
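
The same comparison can be checked numerically by grouping on the preferred foot. This is a rough sketch on made-up values; the column names match the dataset, but the numbers do not come from it.

```python
import pandas as pd

# Made-up sample; the real data has thousands of rows.
players = pd.DataFrame({
    "Preferred Foot": ["Right", "Left", "Right", "Right", "Left", "Right"],
    "Potential": [70, 72, 68, 75, 71, 73],
})

# How many players prefer each foot (question 2).
foot_counts = players["Preferred Foot"].value_counts()

# Summary of Potential within each foot group (question 3).
by_foot = players.groupby("Preferred Foot")["Potential"].agg(["count", "mean", "median"])

print(foot_counts)
print(by_foot)
```

If the group means and medians are close, as they are in this toy sample, the preferred foot tells us little about potential.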

4. Does Age have an Impact on Potential?

Figure 4: Age vs Potential

It can be observed from the bar plot that as age increases, the potential of the player tends to fall.
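
A bar plot like this is typically built from the mean potential per age. The sketch below computes that table and a Pearson correlation on made-up values, as one possible way to quantify the trend.

```python
import pandas as pd

# Made-up ages and potentials, chosen only to illustrate the computation.
players = pd.DataFrame({
    "Age": [18, 18, 24, 24, 30, 30, 35],
    "Potential": [85, 80, 78, 76, 74, 72, 68],
})

# Mean Potential for each age: the heights of the bars in an age-vs-potential plot.
mean_potential_by_age = players.groupby("Age")["Potential"].mean().sort_index()

# Pearson correlation between Age and Potential; a negative value means
# potential tends to fall as age rises.
age_potential_corr = players["Age"].corr(players["Potential"])

print(mean_potential_by_age)
print(f"Correlation: {age_potential_corr:.2f}")
```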

Modeling

Performance indicators are a combination of attributes that describe a player’s suitability for selection and overall performance. A heatmap is used to see how the performance indicators affect the player’s overall performance.

Figure 3: Correlation of overall with other performance indicators

It can be observed from the heatmap that Overall is positively correlated with the majority of the performance indicators.
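
The numbers behind such a heatmap come from a correlation matrix. A minimal sketch with pandas, using made-up scores for three of the indicators:

```python
import pandas as pd

# Made-up scores; only the column names are taken from the dataset.
players = pd.DataFrame({
    "Overall":     [60, 65, 70, 75, 80, 85],
    "Reactions":   [55, 62, 68, 74, 79, 88],
    "BallControl": [50, 66, 64, 72, 83, 84],
    "GKDiving":    [12, 10, 14, 9, 11, 13],
})

# Pairwise Pearson correlations between all numeric columns.
corr_matrix = players.corr()

# Correlation of every indicator with Overall, strongest first.
overall_corr = corr_matrix["Overall"].drop("Overall").sort_values(ascending=False)
print(overall_corr.round(2))
```

The full `corr_matrix` is what a plotting library such as seaborn would render as a heatmap; the sorted `overall_corr` series is often easier to read when there are dozens of indicators.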

The third stage of CRISP-DM is data preparation. The data is cleaned (handling categorical data and missing values) and prepared for predicting overall performance. A linear regression model is then built to predict the overall performance of a player based on the performance scores.
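
The preparation and modelling steps could look roughly like the sketch below. It runs on synthetic data, and the specific choices (median imputation, one-hot encoding, an 80/20 split) are assumptions for illustration; the post does not state which ones were actually used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: Overall is a noisy linear mix of two indicators.
rng = np.random.default_rng(0)
n = 200
players = pd.DataFrame({
    "Reactions": rng.integers(40, 95, n).astype(float),
    "BallControl": rng.integers(40, 95, n).astype(float),
    "Preferred Foot": rng.choice(["Left", "Right"], n),
})
players["Overall"] = (
    0.6 * players["Reactions"] + 0.3 * players["BallControl"] + rng.normal(0, 2, n)
)
# Blank out a few values to mimic missing data.
players.loc[::25, "BallControl"] = np.nan


def prepare(df, target="Overall"):
    """Impute numeric gaps with the median and one-hot encode categoricals."""
    X = df.drop(columns=[target])
    num_cols = X.select_dtypes(include="number").columns
    # Median computed on the full frame for brevity; in practice it would be
    # fitted on the training split only.
    X[num_cols] = X[num_cols].fillna(X[num_cols].median())
    X = pd.get_dummies(X, drop_first=True, dtype=float)
    return X, df[target]


X, y = prepare(players)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")
```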

Evaluation

We also fitted the data to several other models (a random forest regressor, K-nearest neighbours and a decision tree regressor) and evaluated all models using the following metrics:

  1. Mean Absolute Error
  2. R-squared
  3. Mean Squared Error

Figure 5: Evaluation of various models
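
A comparison like this can be sketched with scikit-learn as follows. The data is synthetic and the hyperparameters (`n_neighbors=5`, `n_estimators=50`, the 75/25 split) are assumptions, so the scores say nothing about the results in the figure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic features and target standing in for the prepared player data.
rng = np.random.default_rng(0)
X = rng.uniform(40, 95, size=(300, 3))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 2, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Hyperparameters below are illustrative assumptions, not tuned values.
models = {
    "LinearRegression": LinearRegression(),
    "KNeighborsRegressor": KNeighborsRegressor(n_neighbors=5),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    preds = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "MAE": mean_absolute_error(y_test, preds),
        "MSE": mean_squared_error(y_test, preds),
        "R2": r2_score(y_test, preds),
    }

for name, scores in results.items():
    print(f"{name}: MAE={scores['MAE']:.2f} MSE={scores['MSE']:.2f} R2={scores['R2']:.3f}")
```

Scoring every model on the same held-out split keeps the three metrics directly comparable across models.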

Conclusion

In this article, we performed simple data analysis on the FIFA 2019 complete player dataset.

  1. We looked at which player is best at a specific performance indicator. For example, L. Messi is best at Finishing.
  2. We then looked at how age influences the potential of the player, i.e. as age increases, the potential of the player decreases.
  3. We built a model that predicts the overall performance of a player, given his skill scores for each of the performance indicators.
  4. Finally, we evaluated our linear regression model against KNeighborsRegressor, DecisionTreeRegressor and RandomForestRegressor.

The findings here are observational, and a lot more analysis remains.

How will YOU solve the Problem?

All the code is available in my GitHub repository.

