
When your plan to become a professional Data Scientist begins to materialise, you will often have to answer the infamous question: "So, what do you do?" If you want to become a machine learning expert, you will have a hard time explaining what machine learning is to colleagues who do not have a tech background.
Professor Tom Mitchell, a renowned computer scientist at Carnegie Mellon University, defines machine learning as follows:
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [1]
Honestly, his definition is not going to get you very far in any informal conversation. Also, as a Data Scientist, you will often have to explain technical terms to non-tech audiences. So, whenever I find myself explaining what I do, I use the same technique that my philosophy teacher once used with his students: the football analogy. If you are in North America, then you can call it the soccer analogy. Even if you do not like football, it seems that people can relate to the game and its rules one way or another.
Hopefully, the football analogy will help you to either understand or explain machine learning to others.
The players (data)
It seems obvious, but without players, there is no football game. It does not matter whether you are playing at Wembley Stadium at the professional level or on the street with friends. Without players, those places are just an empty football field and an ordinary street. In machine learning, data are like players; without data, there is nothing to be done. However, just like players, not all datasets are equal. Cristiano Ronaldo and Lionel Messi are great players, and they will exceed one’s expectations of what a great football match should be. That would not be the case if I were to play football. In short, good players make a great show. The same idea is behind the famous saying in Data Science, "garbage in, garbage out": no matter how good your programming skills are or how much you know about math, your machine learning project is likely to disappoint your team without a useful dataset.
Football managers (data preparation)
Football managers are crucial to the success of a football team. Even though England’s national football team has the luxury of selecting top players, it has not won a World Cup since 1966 [2]. The person in charge of deciding who is going to the World Cup is the manager. The manager is also responsible for providing guidance to players and influencing their training routines. This is a time-consuming process, and if it is not done correctly, the team will not be ready for the next championship.
A survey reported that Data Scientists spend roughly 80% of their time on data preparation and data cleaning [3]. Data professionals have to transform their datasets into a format that machine learning models can learn from (e.g. normalising data, dealing with null values, etc.). These are not the most exciting aspects of the job for either data scientists or football professionals.
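To make those two preparation steps a bit more concrete, here is a minimal sketch in plain Python. The column of goals is made up for illustration, and the helper names (fill_missing, min_max_normalise) are my own, not part of any library. Filling gaps with the column mean and min-max scaling are just one common choice among many.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]


def min_max_normalise(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical column: goals scored per season, with one missing record.
goals = [10, None, 30, 20]

cleaned = fill_missing(goals)        # the gap is filled with the mean of 10, 30 and 20
scaled = min_max_normalise(cleaned)  # values now lie between 0 and 1

print(cleaned)
print(scaled)
```

In a real project you would more likely reach for a dataframe library, but the idea is the same: the model only sees the data after the "manager" has prepared it.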
Football tactics (machine learning models)
To win a championship, a team has to change tactics according to each opponent. For example, if the US national football team were to play Germany, the four-time world champions, they would likely set up a robust defence system. If the US team were playing against Iceland’s football team, they might want to set up a strong attacking strategy with more aggressive tactics. So, a well-trained team with the right tactics is likely to score some goals and win the match after 90 minutes.
In our world, machine learning practitioners have to decide which algorithm or model to apply, given a particular dataset and desired outcome. For example, machine learning professionals choose a predictive model according to the problem: classification models predict a label, whereas regression models predict a quantity. Therefore, understanding the properties of the available models and techniques is critical to a successful project. Here are some machine learning models for you to check out later: K-Nearest Neighbour, Logistic Regression, Naïve Bayes Classifier and Random Forest.
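If you are curious what the difference between predicting a label and predicting a quantity looks like in practice, here is a rough sketch of K-Nearest Neighbour written in plain Python. The players, their numbers and the function names are all invented for illustration. The same neighbour search is used twice: once to vote on a label (classification) and once to average a number (regression).

```python
from collections import Counter
from math import dist


def nearest_targets(train, query, k):
    """Return the targets of the k training points closest to the query."""
    ranked = sorted(train, key=lambda pair: dist(pair[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(train, query, k=3):
    """Classification: predict the most common label among the neighbours."""
    votes = Counter(nearest_targets(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: predict the average target value among the neighbours."""
    neighbours = nearest_targets(train, query, k)
    return sum(neighbours) / len(neighbours)


# Made-up players described by (matches played, shots per match).
features = [(10, 1.0), (12, 1.5), (11, 1.2), (30, 4.0), (32, 4.5), (31, 4.2)]
positions = ["defender", "defender", "defender", "striker", "striker", "striker"]
season_goals = [1, 2, 3, 20, 22, 24]

new_player = (29, 3.9)

# Same features, different question: which position, and how many goals?
label = knn_classify(list(zip(features, positions)), new_player)
quantity = knn_regress(list(zip(features, season_goals)), new_player)

print(label)
print(quantity)
```

Choosing between the two is the "tactic": the data stay the same, but the question you ask of them decides which kind of model you field.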
Football equipment (hardware and software)
Different football positions require different equipment and training. Goalkeepers, for example, are the only ones who can touch the ball with their hands. So, they need special gloves and specific physical training, compared with other players who have to run back and forth for 90 minutes and try to score a goal with their feet or head. Also, teams with great sponsors have the luxury of hiring dieticians, medical professionals and even Data Scientists to analyse performance data. Ultimately, equipment and specialist staff contribute to the success of a team in a World Cup.
Similarly, processing a small dataset (1,000 rows x 5 columns) to create some graphs may run on a standard laptop using MS Excel, but extracting data from multiple servers and processing millions of rows requires a programming language such as Python and high-performance hardware with far greater computing power.

Different leagues (domain expertise)
I would argue that wherever you go, there is always someone playing football. It might be kids or adults, men or women, indoors or outdoors, online or on a pitch, amateurs or professionals. It doesn’t matter; there is always someone playing. Also, you will encounter an enormous variation in skill levels. Different skill levels and types of games are not a downside of football; they represent the diversity and the inclusiveness of the sport. Each skill level or type of match addresses a particular demand or need. Some people like to play outside on a real grass field, whereas others enjoy playing online with a couple of close friends. That is OK; these individuals simply specialise in a particular type of football.
Machine learning is just like football. Different professionals have different expertise and work in their respective domains, for example, the business and corporate domain (financial market) and the academic and technical domain (researching at universities to develop new algorithms).
Conclusion
If you are in the process of becoming a machine learning professional, you will inevitably have to explain what you do to people with different backgrounds. So, having a simple and effective analogy will help you make machine learning more accessible to them. This is where the football (soccer) analogy comes in. Focus on what a general audience knows about football and make easy-to-remember links with machine learning. Hopefully, you now have an entertaining analogy for a complex topic that is part of our daily lives.
Thanks for reading.
References:
[1] Mitchell, T. (1997). Machine Learning. McGraw Hill. p. 2.
[2] World Cup Winners https://en.wikipedia.org/wiki/FIFA_World_Cup
[3] Data Cleaning https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/?sh=6bba28296f63