Useful Pandas Features to Explore Summer Transfers
The summer transfer window offers football teams in Europe the chance to recruit key players to strengthen their respective squads for the forthcoming season. Fringe and disruptive players can also be cast aside and loannees dispatched out to various clubs to gain experience.
To gain insight into the transfer dealings of the 25 biggest spenders in Europe, I scraped and tabulated key information for each team. This information included the expenditure and income for each team. In addition, I also wanted to know the recruitment strategy in relation to players respective ages. The scraped data is shown below.
This tutorial piece will demonstrate how the data can be explored using simple, intuitive Pandas functions and features. Please also note, this information is subject to change and correct as of 13/08/19.
Spain’s La Liga, Italy’s Serie A, Germany’s Bundesliga and France’s Ligue 1 all have transfer deadlines at varying times on the 2nd of September.

1. Converting Object Datatypes into Float Datatypes
Before beginning with any data analysis, it is important to check that each column is a numeric datatype. The df.dtypes attribute reveals, the ‘Income’ column is a Object (string) datatype. Pandas has parsed the ‘Income’ column as a object datatype from my Excel file, because of the non-numeric entry (the hyphen characters) for certain teams, for example Aston Villa at Index 8 as shown above.

To convert this object column, it is necessary to use the to_numeric function, and tell Pandas to convert any invalid input into NaN values by using errors=’coerce’. If you know the NaN values actually represent 0, you can fill them with zeros using the .fillna(0) method with 0 passed as a parameter.
Now, when the datatypes are checked using df.dtypes, the income column is a float datatype and can now be used for numeric based operations later on. In the DataFrame, at Index 8, Aston Villa’s income can now be seen as a 0, as apposed to the hyphen which was previously shown.

2. New Columns and Columns Reorder
The Net difference between expenditure on players and money recouped on player sales is easy to perform, but gives an interesting insight into the financial transfer dealings of the 25 biggest spenders in Europe.
The new column, ‘Net_Difference’ is created by subtracting the Income column from the Expenditure column. For ease of readability, it may be better to have the Net_Difference column close to the two columns in which it refers to.

The columns of the DataFrame are converted to a list, and the DataFrame, df is reassigned according to the indexed columns. Here the first three columns remain as they were, but the last column, denoted by python index -1 follows, as do the final columns with age-specific data. By printing the head of the DataFrame, the information aligns more intuitively for interpretation.

3. Filtering the Data Output from the Describe Method
The describe method gives a numerical data summary of the data set. If however, we want to truncate this output, simply use the .loc function, and filter the rows and columns as desired.
In the output shown below, I would like to see only a snapshot of data specifically about the Ages, and I am only interested in the rows mins to max in this snapshot.

4. Converting Continuous Data into Categorical Data
The Income column can be used to make a new Income category data column. Here, the continuous data in the income column can be binned into intervals ranging from -1–40, 40–80, and 80–350 using the pd.cut function. I assigned these bins the labels: Low, Medium and High.
This represents whether the income received by each club is low, medium or high. Now, a quick glance at the Table can reveal which Teams are securing high income for their respective players.

5. Styling the DataFrame
Styling the DataFrame can add visual impact and aid data interpretation. In the example code shown in the githib gist, I have shown various ways to style the DataFrame. For example, to add a background gradient for a column you can use the .background_gradient attribute, and this gradient will create an informative color scheme indicting trends in Income. The deeper the blue, the higher the Income received.
In addition, I have added a horizontal bar for each column value, to indicate whether that team made either a profit, or a loss in the Net Difference column. A caption can also be added to the DataFrame to inform the audience what the specific information relates to.

6. Recruitment Strategy and referring to a variable in a query () string
Shown below in the first line of the code snippet from the github gist, is an alternative way to re-order the columns.
I have also determined the mean for the age of arrivals and the age of departures.
It would be insightful to know which teams are recruiting players with an age lower than the average age of arrivals across all 25 teams.
To do this, I create a variable called, arrival_mean, and in order to refer to it within my query string, I simply prefix it with a ‘@’ meta-character.
It seems like teams like Real Madrid, Manchester United and even AS Roma are making a concerted effort to sign young players. This is informative of their transfer policies this summer, which all seem to be aimed at signing quality young players with their best playing years ahead of them!


Further to this, the data reveals that the top 25 teams, are recruiting players which are on average approximately 2 years younger than the players they are releasing!
Summary
This tutorial has shown a use-case for Pandas in data analysis and interpretation. Styling and creating categorical columns can simplify data interpretation, whilst referring to variables in query strings can add valuable insight into any investigated data set.