Understanding the Importance of First Serve in Tennis with Data Analysis

Can we judge the performance of a tennis player based on his first serve?

Andrea Cazzaro
Towards Data Science


Photo by FILMDUDES on Unsplash

Tennis is a dynamic and complex sport. There are several shots involved in a single point, but only one of them is played without the opponent’s influence: the serve. Indeed, the serve gives players the chance to start the point with a concrete advantage. Having a good serve is essential for every professional player, but can we correlate a good first serve with an excellent performance? If you are a tennis player, you may have heard the saying “You are only as good as your first serve”. Is this true?

Dataset

To answer our research question, we need good data. Jeff Sackmann founded and maintains Tennis Abstract, an incredible website that offers thousands of datasets on tennis matches. For this analysis I used his dataset on ATP matches played in 2019. The dataset includes data on every match played in 126 tournaments, including Davis Cup matches, for a total of 2,782 matches.

The important columns in this dataset are the ones that show for each player and each match: the number of serve points played, the number of first serves in, the number of points won with the first serve, the number of points won with the second serve and the number of double faults.

Let’s start the analysis.

Performance

The first thing we should analyze is the performance of each player, measured by the number of wins. Players with fewer than 10 wins have been left out of the analysis, leaving a total of 89 players in the dataset.
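The win count and the 10-win cutoff described above can be sketched with the standard library. This is a minimal sketch rather than the code used for the article: the `winner_name` key and the sample rows are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows; in the real dataset there is one row per match.
# The "winner_name" key is an assumed column name for the match winner.
matches = [
    {"winner_name": "Player A", "loser_name": "Player B"},
    {"winner_name": "Player A", "loser_name": "Player C"},
    {"winner_name": "Player B", "loser_name": "Player C"},
    {"winner_name": "Player A", "loser_name": "Player C"},
    {"winner_name": "Player B", "loser_name": "Player A"},
    {"winner_name": "Player C", "loser_name": "Player A"},
]


def count_wins(rows, min_wins=10):
    """Count wins per player and keep only players with at least min_wins."""
    wins = Counter(row["winner_name"] for row in rows)
    # most_common() yields players sorted from most to fewest wins.
    return {player: n for player, n in wins.most_common() if n >= min_wins}


# A lower threshold is used here only because the sample is tiny.
print(count_wins(matches, min_wins=2))
```

On the full dataset the default `min_wins=10` would reproduce the cutoff used in the article.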

As we can see in the table, the top 10 performers won at least 40 matches, while the top 5 performers won at least 53 matches. If we take a look at the overall performance of the best 5 players, we can conclude that they win 30% more matches than the bottom 5 players of the top 10 list.

The distribution of wins per player confirms that the best 5 performers are outliers, which means that their performance is way above the performance of other players. Hence, the dominance in the circuit is dictated by a handful of athletes.

The serve

Now that we have a picture of performance, we can start looking at the serve. By manipulating the dataset, I measured the percentages of successful first and second serves for each player (successful: the serve landed in the service box). In addition, I calculated the percentage of points won on first and second serve when the serve was successful. Below are the data for the 10 best performers.
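One way to derive these four percentages from the counts listed in the Dataset section is sketched below. The parameter names are hypothetical, and the sketch assumes that every missed first serve is followed by a second serve and that a double fault is a missed second serve.

```python
def serve_rates(svpt, first_in, first_won, second_won, double_faults):
    """Return serve percentages from a player's aggregated counts.

    svpt          -- serve points played
    first_in      -- first serves that landed in
    first_won     -- points won behind a first serve
    second_won    -- points won behind a second serve
    double_faults -- second serves that missed
    """
    second_attempts = svpt - first_in            # a second serve was needed
    second_in = second_attempts - double_faults  # second serves that landed in
    return {
        "first_in_pct": 100 * first_in / svpt,
        "first_won_pct": 100 * first_won / first_in,
        "second_in_pct": 100 * second_in / second_attempts,
        "second_won_pct": 100 * second_won / second_in,
    }


# Made-up counts for a single player, for illustration only.
rates = serve_rates(svpt=1000, first_in=650, first_won=500,
                    second_won=190, double_faults=30)
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```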

With these data, we can start analyzing the correlation between performance and serve. The first correlation we should analyze is between performance and percentage of successful first serves, which is considered an important factor.

As you can see from the above scatter plot, there is no significant correlation between the two variables. We need to dig further. Maybe the correlation between performance and percentage of points won on successful first serves can give us more information.

Again, there is no significant correlation between the two variables. At this point there are two possible explanations for these results: 1) the level in the circuit is very high and performance is not determined by the serve, or 2) we should look at probabilities to see if we get more significant results.
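The strength of a relationship like the ones above can be summarized with a correlation coefficient. The article does not state which measure was used, so the sketch below assumes Pearson's r and uses made-up numbers in place of the real per-player values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: wins and first-serve-in percentage for five players.
wins = [57, 40, 25, 18, 12]
first_in_pct = [65.0, 70.0, 61.0, 66.0, 63.0]
r = pearson_r(wins, first_in_pct)
print(f"r = {r:.2f}")
```

A value of r close to 0 indicates little linear relationship, while values near 1 or -1 indicate a strong positive or negative one.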

Analyzing the first serve with a different perspective

We have seen that a high percentage of successful first serves does not predict high performance. Indeed, some players may play a slower first serve to avoid playing the point with a second serve. This is why we need to analyze the correlation between the percentage of successful first serves and the percentage of points won on successful first serves to understand the importance of the two variables.

Here we start to see interesting results. The two variables are negatively correlated, which means that the higher the percentage of successful first serves, the lower the percentage of points won on successful first serves. Why? Well, it is hard to say, but we can explain this phenomenon with risk. When we serve, we can choose among three main strategies: 1) we can aim to serve an ace (highest risk of missing the service box), 2) we can aim to put our opponent under pressure (medium risk), or 3) we can aim to just serve within the service box (lowest risk). We can hypothesize that players with a high percentage of successful first serves are not taking enough risk to win the point and, therefore, are not winning many points with their first serve.

The size of the data points in the scatter plot is based on the player’s number of wins. If you take a look at the biggest data points, almost all of them fall within the area between 75 < X < 80 and 60 < Y < 65. This means that the best performers found a good equilibrium between risk and the chance to maximize the win. This will become clearer by calculating the probability of winning the point on the first serve.

The perfect equilibrium

The probability of winning the point on the first serve is given by p1*q1, where p1 is the probability that the first serve will be inside the service box and q1 is the conditional probability that the point is won given that the first serve is successful. This is a very simple calculation that we can do with our two variables.
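In terms of raw counts, the product simplifies to a single ratio. The symbols below are notation introduced here for illustration: N_serve is the number of serve points played, N_1in the number of first serves in, and N_1won the number of points won behind a first serve.

```latex
p_1 q_1
  = \frac{N_{\mathrm{1in}}}{N_{\mathrm{serve}}} \cdot \frac{N_{\mathrm{1won}}}{N_{\mathrm{1in}}}
  = \frac{N_{\mathrm{1won}}}{N_{\mathrm{serve}}}
```

In other words, p1*q1 is the share of all serve points that a player wins behind the first serve.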

The best performers (with a few exceptions) are the ones who maximize the probability of winning the point on their first serve. If we take a look at the X-axis, we can see that the top five performers win more than 50% of their serving points with their first serve (except for Medvedev). Indeed, very few players are able to do that and the ones who do are usually big servers like John Isner, who perform worse in rallies.

Maximizing the probability of winning the point with the first serve means finding the right balance between the percentage of successful first serves and the percentage of points won with successful first serves. Let’s take two players, Roger Federer and Dusan Lajovic, as an example. Dusan Lajovic’s percentage of successful first serves is 70%, while Roger Federer’s is 65%. However, Roger Federer wins 79% of points when his first serve is successful, while Lajovic wins 70% of points. Hence, their probabilities of winning the point with the first serve are 49% for Lajovic and 52% for Federer. Peanuts, you may argue, but at such a high level, every small percentage makes a difference.
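The comparison can be reproduced with a few lines of Python. The function name is hypothetical, and the inputs are the rounded percentages quoted above. With those rounded inputs Federer’s product comes out at about 51%; the 52% quoted in the text presumably reflects unrounded inputs, which is an assumption.

```python
def first_serve_point_prob(p1, q1):
    """Probability of winning a serve point behind the first serve: p1 * q1."""
    return p1 * q1


# Rounded figures quoted in the text (first serve in, points won when in).
lajovic = first_serve_point_prob(p1=0.70, q1=0.70)
federer = first_serve_point_prob(p1=0.65, q1=0.79)

print(f"Lajovic: {lajovic:.3f}")
print(f"Federer: {federer:.3f}")
```

Federer lands fewer first serves but wins enough of them to come out ahead, which is the trade-off between risk and reward described above.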

The second serve

I will not keep boring you with another scatter plot since there is no correlation between performance and probability of winning the point on the second serve. However, a few players like Federer and Nadal have incredible stats on their second serve, with a probability of winning the point above 60%.

As we can see, the distribution of the probability of winning the point on successful first serves tends to 0. What does that mean? It means that very few players have a high probability of winning the point on their successful first serves. Is it the same for second serves? Not exactly. If we take a closer look at the distribution, we can see that the second violin is inverted when compared to the first. Indeed, many players have the same probability of winning the point on their successful second serves.

When looking at the plot, our eyes may deceive us into believing that ATP players have a better chance of winning the point with their second serve. It is true that the probability of winning the point with a successful second serve is higher, but this is due to p2 (the probability of a successful second serve) being higher than p1 (the probability of a successful first serve).

Probability of winning a point when serving

To determine the strength of a player’s serve, we can calculate the overall probability of winning a point when serving. According to O’Donoghue, the probability of winning a point when serving can be calculated with the formula p1*q1 + (1-p1) * p2*q2, where p is the probability that the serve is successful and q is the conditional probability that the point is won given that the serve is successful. The subscripts 1 and 2 denote the first and second serve respectively.
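The formula translates directly into a small function. This is a minimal sketch: the function name and the example inputs are made up for illustration and do not come from the dataset.

```python
def serve_point_win_prob(p1, q1, p2, q2):
    """Overall probability of winning a point on serve.

    p1, p2 -- probability that the first / second serve lands in
    q1, q2 -- probability of winning the point given that serve landed in
    """
    for name, value in (("p1", p1), ("q1", q1), ("p2", p2), ("q2", q2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # The second-serve term only applies when the first serve misses.
    return p1 * q1 + (1 - p1) * p2 * q2


# Hypothetical player: 62% first serves in, 75% of those points won,
# 90% second serves in, 55% of those points won.
example = serve_point_win_prob(p1=0.62, q1=0.75, p2=0.90, q2=0.55)
print(f"{example:.4f}")
```

The factor (1 - p1) explains why a weak second serve hurts more for players who miss many first serves.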

Again, we can see from the plot above that the best performers have a high probability of winning a point when serving. Let’s look at Federer and Zverev to better understand the importance of having a good first and second serve. If we go back to the last scatter plot, we can see that Zverev has a higher probability than Federer of winning the point on his first serve (52.6% to 51.8%). However, Federer’s overall probability of winning a point when serving is 72%, while Zverev’s is 65.8%. This is quite a big difference if we consider that many matches are decided by two or three important points.

Why does Zverev have a lower probability than Federer? Because Zverev’s second serve needs some improvement. Indeed, Federer has a probability of winning a point with his second serve of 56.7%, while Zverev’s probability is only 38.4%. That is a gap of almost 20 percentage points, which makes a big difference in terms of performance.
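As a rough check, the figures quoted above can be plugged into the formula. This assumes that the second-serve probabilities quoted here are the products p2*q2 and that Federer’s p1 is the 65% quoted earlier; Zverev’s p1 is not given in the text, so it is backed out from his overall figure.

```latex
\text{Federer:}\quad 0.518 + (1 - 0.65) \times 0.567 \approx 0.716

\text{Zverev:}\quad 1 - p_1 = \frac{0.658 - 0.526}{0.384} \approx 0.344
\quad\Rightarrow\quad p_1 \approx 0.656
```

Under these assumptions Federer’s result is consistent with the reported 72%, and the two players miss their first serve at nearly the same rate, so the gap in overall probability comes almost entirely from the second serve.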

Limitations

As mentioned before, tennis is a dynamic and complex sport. It is hard to conclude that some players perform better than others because they have a higher probability of winning the point on their successful serves. Along with volleys, serves are among the least frequently hit shots. For a better analysis, we should take into consideration several other variables, especially each player’s return.

Conclusion

Hitting a good percentage of serves within the service box is not enough for ATP players. Indeed, players need to maximize their chance of winning a point when serving by balancing the probability of hitting a successful serve and the probability of winning the point when hitting a successful serve. A high probability of hitting a successful serve may indicate that the player is not taking enough risk to put his opponent under pressure.

The best performers in the ATP circuit are able to find a perfect equilibrium to maximize their chance to win the point with a successful first serve. In addition, even though the probability of winning a point with a successful second serve is not correlated to performance, the second serve remains crucial to maximize the chance to win the point when serving. Therefore, tennis players should try to find their perfect equilibrium by balancing power and risk in order to optimize their serving games.

References

  • P. O’Donoghue, A. Ballantyne, The impact of speed of service in Grand Slam singles tennis (2004), Science and racket sports III: the proceedings of the eighth international table tennis federation sports science congress and the third world congress of science and racket sports.
  • E. Gillet, D. Leroy, R. Thouvarecq, J. F. Stein, A Notational Analysis of Elite Tennis Serve and Serve-Return Strategies on Slow Surface (2009), Journal of Strength and Conditioning Research, 23(2), pp. 532–539.

