
Data Science at The 2022 World Cup

How Data Levels the Playing Field

Opinion

The 2022 FIFA World Cup is well underway! One of the things I love about sport in general is that, in so many ways, each game, season, and tournament can be seen as a microcosm of the business world.

From a Data Science perspective, there’s a big data revolution underway in football. The biggest teams in football are either building or bulking up their Data Science departments. Broadcasts are starting to introduce meaningful statistics that have at least some substance. Analytics websites are popping up everywhere with vast quantities of interesting data to sink your teeth into [1].

It’s a data revolution that happened much earlier in some other sports, particularly baseball, where many Data Scientists reference Michael Lewis’s Moneyball and Nate Silver (The Signal and The Noise, FiveThirtyEight, PECOTA) as inspiration for what Data Science is capable of in both sport and business.

But much of what made statistics so successful in baseball doesn’t translate easily to football. Baseball is made up of discrete events. Football is intrinsically fluid. It’s hard to break a football game into discrete events, though much progress has been made. From a statistical point of view, football has an extra layer of abstraction compared to baseball.

Fluidity and abstractness are a big part of what slowed down the data revolution in football. But they also make football a much better mirror for Data Science in business. In business, most of what Data Scientists are trying to do can be boiled down to turning some fluid, abstract, ill-defined concept into concrete models and numbers that can be more easily understood.

That’s not to bash baseball. The success of Data Science in baseball was a necessary step for the building of Data Science in football, and even in the business community. In fact, The Signal and The Noise resonates with me in many of the challenges I have faced in Data Science. Nate Silver’s success in baseball and politics makes me feel that there’s such a powerful role for Data Science solutions in the world today.

So, while Data Science is still building momentum in football, there are some fascinating trends emerging at the World Cup that speak to Data Science as a whole. One of these is the uptick in major upsets. In the first 32 games of the 2022 World Cup, there were 11 matches that could have potentially produced a significant upset and 2 that actually did, while the 2018 World Cup had produced no upsets at the same stage.

As data enters a new field we often see entrenched competitors hold onto their traditional behaviours and suffer for it. That’s because those traditions typically stem from the same set of entrenched competitors. Those traditions are what made those teams, those businesses, successful in the past. And those traditions, while often blindly followed, are generally the embodiment of institutional knowledge. They are behaviours that have slowly evolved, and continue to evolve, toward optimal solutions. That slow evolution protects entrenched competitors from dangerous mistakes but also inhibits their ability to innovate as quickly as the underdogs.

The power of Data Science is that it offers a faster way to an optimal solution. It lets the underdog quickly learn things that entrenched competitors spent dozens of years figuring out. Fundamentally, Data Science, done well, lets us quantify situations and quickly pull out meaningful patterns without having to experience each situation ourselves. With Data Science we can learn faster than ever before. We can level out the playing field for the underdogs.

So, for the entrenched competitor, it is a difficult situation when the data disagrees with existing traditions. Innovation and learning live in those disagreements, but so do bad recommendations and mistakes. Furthermore, it can be embarrassing for entrenched competitors to act on innovative recommendations, because it’s an admission that some of their institutional knowledge is incorrect.

Less entrenched competitors are more open to the innovative solutions recommended by Data Science because they are not beholden to the standard traditions. In other words, the data doesn’t contradict their institutional knowledge, so there’s no implied admission of error.

However, as Data Scientists we can’t discount the traditions of a business. We need to respect the institutional knowledge behind their common behaviours. We need to take a Bayesian approach to understanding our clients. When we uncover an opportunity for innovative action by a business, we need to understand that we’re potentially overturning a huge amount of institutional knowledge. We need to ask ourselves:

  1. Where did that tradition come from, and how much evidence does the business have that it is actually a good idea?
  2. How confident are we that the innovative recommendation is correct?
  3. Is our model solid? Is the data behind it clean and clear?
  4. Are we sure we asked the right question to begin with?

In football, an interesting example is the timing of substitutions. Traditionally, a losing team begins its substitutions around the 60th minute of a match, which is a long-held practice. But is that the right practice? Anyone analysing optimal substitution timing should be excited at the possibility of important innovation – maybe 45th-minute substitutions are significantly better – but also needs to pay respect to the institutional knowledge that built that tradition.

One key rule of thumb to fall back on is that good analytics should follow an 80/20 rule. If we’re working in a successful business then about 80% of the recommendations coming out of our models should validate existing behaviours, and 20% should be innovative.

That is to say that 80% of the recommendations or insights might slightly modify existing behaviours, but they won’t change them in a big way, and thus won’t have a large impact on business performance. The validating 80% enables business leaders to take action with confidence and builds our confidence that we have good data, a solid model, and that we’ve asked the right questions. The innovative 20% is where we can really get excited that there’s an opportunity to have a big impact on business performance.

If we are well below the 80% validation threshold, then there is cause for concern. That means we’re about to tell the business that a lot of what they have ‘learned’ is wrong. It’s certainly possible, but we need to be extremely careful. We need to be Bayesian about it and build our confidence, from where the business currently is to what we’re recommending. We can’t just switch off a big collection of behaviours and switch on a whole new collection without having a deep understanding of how we got there. In fact, if we’re way below the 80% threshold then we should consider the model as a whole – are there fundamental issues that could be leading us astray?
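As a rough illustration of that check, here is a minimal Python sketch. It assumes each recommendation coming out of a model has already been hand-labelled as "validating" or "innovative". The function names, the labels, and the 15-point tolerance used to define "well below" are hypothetical choices for illustration, not an established standard.

```python
def validation_share(labels):
    """Fraction of recommendations that agree with existing practice."""
    if not labels:
        raise ValueError("need at least one labelled recommendation")
    validating = sum(1 for label in labels if label == "validating")
    return validating / len(labels)


def review_model_first(labels, target=0.80, tolerance=0.15):
    """True when the validating share is well below the 80% rule of thumb.

    `tolerance` is an assumed cut-off for "well below", not a fixed rule.
    """
    return validation_share(labels) < target - tolerance


# Hypothetical labels for ten recommendations from one model.
healthy = ["validating"] * 8 + ["innovative"] * 2
suspect = ["validating"] * 5 + ["innovative"] * 5

print(validation_share(healthy), review_model_first(healthy))
print(validation_share(suspect), review_model_first(suspect))
```

In this sketch the second set of recommendations would be flagged, which is a prompt to re-examine the data, the model, and the question before taking anything to the business.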

Even if we’ve met our 80% validation threshold, our innovative 20% should be considered from a Bayesian perspective. The fact that our model is 80% validating gives us some confidence that things are working well, but we still need to recognize that the innovative 20% could be incorrect because, as we’ve said, it contradicts institutional knowledge that carries some weight. There are many ways we can build that confidence: through experimentation (e.g. act fast and break things) where the stakes are low or the business has the risk appetite to do so, or through additional analysis where needed.
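One way to picture this weighing of tradition against new evidence is a simple Beta-Binomial update, sketched below in Python. It assumes we can run small head-to-head trials in which the innovative behaviour either beats the tradition or doesn’t. The class name BetaBelief, the prior pseudo-counts, and the trial results are all illustrative assumptions, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about how often the new practice beats the tradition."""

    alpha: float  # pseudo-count of past "innovation wins"
    beta: float   # pseudo-count of past "tradition wins"

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, wins: int, losses: int) -> "BetaBelief":
        """Conjugate update: add observed trial outcomes to the pseudo-counts."""
        return BetaBelief(self.alpha + wins, self.beta + losses)


# Two hypothetical traditions with the same sceptical prior mean (0.2),
# but backed by very different amounts of institutional experience.
weak_tradition = BetaBelief(alpha=2, beta=8)
strong_tradition = BetaBelief(alpha=20, beta=80)

# The same assumed experiment: the innovation wins 6 of 10 trials.
for name, prior in [("weak", weak_tradition), ("strong", strong_tradition)]:
    posterior = prior.update(wins=6, losses=4)
    print(f"{name} tradition: {prior.mean:.2f} -> {posterior.mean:.2f}")
```

With these made-up numbers, the same experiment moves the belief from 0.20 to 0.40 against the weakly supported tradition, but only to about 0.24 against the strongly supported one. The more evidence a tradition rests on, the more evidence we need before overturning it.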

Going back to our substitution time example, it turns out that modelling suggests the optimal time for a trailing team to make the first substitution is in the 58th minute [2]. That’s close enough to the existing tradition that it would go into our 80% ‘validating’. So, on one hand, there’s not a big innovation there, but the cool thing is that the data is showing institutional knowledge to be more or less correct on that topic. That is something we might expect in this scenario, given how important a decision it could be for the match.

And while validation is great, businesses do need to expect disruption from Data Science teams. The innovative 20% is only valuable if the business is willing to implement the changes and move in a new direction. If they’re not, then the less entrenched competitors who are more willing to leverage those new insights will beat them. In the World Cup, we’ll continue to see more upsets as the underdogs one-up the strategies of the existing elite.


As Data Scientists we should be excited about our unique position to move sport, and business, in new and innovative directions. But let us also remember that innovation means overturning tradition, and that most traditions come from a wealth of knowledge and experience.

When we recommend innovation we are signalling that something is wrong with our organization’s institutional knowledge base. That doesn’t mean we’re wrong, but it does mean we need to be careful. We need to understand that those traditions we are overriding likely didn’t arise out of nothing. We need to weigh the strength of the evidence supporting them against the strength of our new recommendations. We need to be Bayesian not only in our models but in how we roll out our recommendations.

Get in touch

Feel free to contact me on LinkedIn for more perspective on the data science field.

References

[1] Lyttleton, Ben. Data and Decisions in Soccer.

[2] Myers, Bret. (2012). A Proposed Decision Rule for the Timing of Soccer Substitutions. Journal of Quantitative Analysis in Sports. 8. 11–11. 10.1515/1559-0410.1349.

[3] Cox, Michael. World Cup shocks: Do group stage surprises make for a less entertaining tournament?

