We put the “Tsitsi-fast” phenomenon under the microscope, applying simple machine learning techniques and statistical methods to see which players the 22-year-old international most resembles and to predict his career path over the next decade.
He is the first Greek tennis player to be included in the list of the top 100 professional tennis athletes in the world, having moved up hundreds of places. Among other things, he has wins over the “Big 3” (Novak Djokovic, Roger Federer, and Rafael Nadal) under his belt and the Nitto ATP Finals title in his trophy cabinet – in November 2019 he made history at the O2 Arena in London. A year later, 22-year-old Stefanos Tsitsipas, who got Greeks into tennis and put Greece on the elite sports map, was taken out in the same venue, during the opening sets of the 2020 ATP Finals that took place behind closed doors, causing him to drop from sixth to seventh place in the world rankings and leaving fans wondering what position the season-ending finale would find him in.
By performing statistical analysis on publicly available data from 2000 to the beginning of March 2020 on the performance, earnings and characteristics of 1,602 international tennis players, we try to find out where Stefanos Tsitsipas “stands” on the international sports scene and what the course of his career might look like in the coming years.
As far as the latter is concerned, our approach focuses on the world ranking positions that Tsitsipas is predicted to hold, while the data analysis is based on parameters such as athletes’ rankings to date, physical and professional age, performance on different surfaces, number of titles won and earnings, as well as technical characteristics, such as the “backhand”.
Primary data was collected in March 2020 from the Ultimate Tennis Statistics website, which is licensed under a Creative Commons license (CC BY-NC-SA 4.0) and is based on open source software available on GitHub. Data processing and analysis is presented in detail in the corresponding publication on our research Methodology.
Tsitsipas resembles his rivals and idols
Our method identified a striking resemblance between Stefanos Tsitsipas and German peer Alexander Zverev, the previous holder of the Nitto ATP Finals title (2018), who currently ranks sixth in the world (as a result of Tsitsipas’ defeat at this year’s Nitto ATP Finals), in terms of their undisputed competitiveness and subsequent rivalry, which has occasionally made headlines.
“Stef” also resembles his idols, including Swiss star Stan Wawrinka (who eliminated, but also commended Tsitsipas at Roland-Garros in June 2019), Argentine Juan Martin del Potro – as well as Austrian player, Dominic Thiem, who was defeated by Tsitsipas in the final of the 2019 Nitto ATP Finals, but kicked off his 2020 Nitto ATP Finals campaign with a win against the Greek athlete, on November 15th.
David Ferrer, Tomas Berdych, Marin Cilic, Kei Nishikori and Milos Raonic (who knocked Tsitsipas out of the Australian Open in January 2020 and the Cincinnati Open in August 2020) complete the cluster of players who are similar to Tsitsipas, according to the statistical models we applied.
We analyzed data about 1,602 players and we clustered them in 150 different groups, based on their similarity according to the statistical method we developed.
The following radar plot displays the players who are considered most similar to Stefanos Tsitsipas and who have been placed in the same cluster as him. It also shows the characteristics studied and how similar the players are with respect to each of them: the more the individual players’ shapes overlap, the more similar these tennis players are.
You can select and/or deselect specific players in order to compare them: the closer to the perimeter of the circle the vertices of the polygons are plotted, the “better” the player is considered. However, the variable of physical and professional age, which is considered a neutral feature, is an exception, which means that the closer to the perimeter of the circle the corresponding vertex is, the older the player.
Indicatively, Tsitsipas and Zverev seem to be almost identical, with the latter having a superior “backhand” and a slight advantage in terms of success rate on various surfaces. On the contrary, Tsitsipas performs better in terms of number of titles won in proportion to the duration of his professional career [see “Titles (std)”]. In fact, Tsitsipas is superior to all other players in his cluster in terms of number of titles adjusted. After all, he is, by far, the youngest player (along with Zverev), but also the one who turned professional when he was older than the rest. The “Tsitsi-fast phenomenon”, as it has been dubbed, is also mirrored in the plot: Stefanos Tsitsipas stands out for his rapid ascent in world rankings (see “First rank diff.” for the rise of places between the first and the second season), with Milos Raonic being the only one from the cluster to slightly outperform him.
The plot offers users the opportunity to compare the Tsitsipas cluster with tennis titans Rafael Nadal and Roger Federer: by activating the lines that correspond to the duo, it is clear how similar these two are to each other and how significantly they differ from Stefanos Tsitsipas and his peers, for the time being at least.
Players grouped together in terms of earnings, titles and courts
After we conducted data analysis – the method we employed to find similar players – players were grouped together based on the following: success rate on different surfaces (hard court, carpet court, grass court) and its correlation to the players’ world rankings position, titles won in relation to the years they have been active, and total prize money in relation to a number of parameters, such as season, professional age and highest position achieved. The following graphs illustrate this grouping of players.
The method of statistical analysis was applied, taking into account Stefanos Tsitsipas’ high earnings [$10.43 million by the beginning of March, when data collection was completed – $12.23 million in total (singles and doubles combined) as of the time of writing, according to the ATP], the high number of titles he has won and his high ranking, in relation to seasons played. Consequently, the model placed him in the same category as players who enjoy equally high earnings, hold similar titles and turned professional at a similar age to him.
Interestingly, the cluster to which Roger Federer and Rafael Nadal belong consists exclusively of the duo, a result that can be interpreted both as a reflection of their perceived unique success and as a confirmation of the model’s effective groupings.
The “golden” decade has just begun
It is clear that Stefanos Tsitsipas is now entering his prime. In order to predict his future career path and given that his professional career spans five years, we decided to predict his annual rankings until 2030.
For this purpose, after grouping players based on degree of similarity, a prediction model was created with the statistical technique of the sample mean:
We predict the athlete’s performance over the next decade, based on the mean performance of similar players, i.e. athletes from the same cluster who, however, have longer career spans.
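As a rough illustration of the sample-mean idea described above, the following minimal Python sketch averages the rankings that peers from the same cluster held at each later career year. The function name, the data layout (one list of year-end rankings per player, indexed by career year) and all numbers are invented for illustration and are not taken from the study.

```python
from statistics import mean


def predict_by_cluster_mean(peer_rankings, start_year, horizon):
    """Forecast rankings as the mean ranking of cluster peers per career year.

    peer_rankings: dict mapping a (hypothetical) player name to a list of
                   year-end rankings, where index 0 is the first pro season.
    start_year:    first career year to forecast (0-based).
    horizon:       number of career years to forecast.
    """
    predictions = []
    for career_year in range(start_year, start_year + horizon):
        # Only peers whose careers are long enough contribute to this year.
        observed = [r[career_year] for r in peer_rankings.values()
                    if len(r) > career_year]
        if not observed:
            break  # no peer has played this long, so stop forecasting
        predictions.append(mean(observed))
    return predictions


# Invented rankings for three hypothetical peers with longer careers.
peers = {
    "peer_a": [300, 120, 40, 15, 9, 7, 5, 6],
    "peer_b": [250, 90, 35, 20, 12, 8, 10],
    "peer_c": [400, 150, 60, 25, 10, 6, 4, 3],
}

# A player with five completed seasons is forecast for career years 5 to 7.
forecast = predict_by_cluster_mean(peers, start_year=5, horizon=3)
print(forecast)
```

Note that the last forecast year rests on only two peers, because one of the invented careers is shorter; forecasts further into the future therefore depend on fewer players.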
Prior to the application of the model in the case of Tsitsipas, it was tested on other players in his cluster: more specifically, by using data from the first five years of these players’ performance trajectories, we made “pseudo predictions” for the remaining ten years of their career, which we then compared to their actual world ranking data. In this way, the margin of error in our predictions is calculated and depicted. These predictions are likely to be more effective in terms of the estimated rise or fall of Tsitsipas over the years – and not the specific place he is speculated to hold in the world rankings per year.
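The “pseudo prediction” test can be sketched in the same spirit. The example below holds out one player, forecasts his later seasons from the remaining peers, and compares the forecast with what actually happened. The leave-one-out setup and the use of mean absolute error as the error measure are assumptions made for this sketch, as the article does not specify them, and the data is invented.

```python
from statistics import mean


def pseudo_prediction_error(rankings, held_out, known_years):
    """Mean absolute error of cluster-mean forecasts for one held-out player.

    rankings:    dict mapping (hypothetical) player names to lists of
                 year-end rankings, indexed by career year.
    held_out:    name of the player whose later seasons are "predicted".
    known_years: number of initial seasons treated as already known.
    """
    others = [r for name, r in rankings.items() if name != held_out]
    actual = rankings[held_out]
    errors = []
    for year in range(known_years, len(actual)):
        observed = [r[year] for r in others if len(r) > year]
        if not observed:
            break  # no remaining peer has data for this career year
        # Compare the peers' mean ranking with the player's real ranking.
        errors.append(abs(mean(observed) - actual[year]))
    return mean(errors)


# Invented rankings for three hypothetical players in one cluster.
cluster = {
    "peer_a": [300, 120, 40, 15, 9, 8, 4],
    "peer_b": [250, 90, 35, 20, 12, 9, 7],
    "peer_c": [400, 150, 60, 25, 10, 5, 3],
}

error = pseudo_prediction_error(cluster, held_out="peer_a", known_years=5)
print(error)
```

Repeating this for several held-out players gives a spread of errors that can be drawn as a band around the forecast.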
A “golden” decade seems to have just begun for Stefanos Tsitsipas, who is estimated to remain in the world’s top 10 until 2029 and to reach the peak of his career in about seven years from now: in particular, it is predicted that he will rank 6th (mean prediction) in 2027 – with predictions ranging from 5th to 7th, as the highest and lowest estimates for the same year respectively. Tsitsipas is also expected to move up to 5th place (mean prediction) in the world rankings the following year, before he finally ranks 4th (mean prediction) in 2029 – when his descent, which is reflected in the last year of predictions, is estimated to begin.
How the analysis was conducted
How we grouped 1,602 tennis players to find out which of them Stefanos Tsitsipas “resembles” and to predict his career path.
The methodology of the analysis includes three main stages. Initially, we used all available data on players’ world rankings, earnings and other characteristics (demographic and technical); drawing on this data, we calculated the similarity of every possible pair of players among the 1,602 athletes.
Next, using cluster analysis, we categorized players into groups according to the degree of similarity between them. Finally, we predicted Stefanos Tsitsipas’ world rankings for the next decade, based on the progress of his more experienced peers – that is, players who are placed in the same cluster as Tsitsipas and have longer career spans. For the purposes of testing the model, we selected four athletes for whom “pseudo predictions” were made, and thus calculated the margin of error. Detailed information on statistical analysis and the creation of the prediction model is provided in the relevant publication on our research Methodology.
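To make the first two stages more concrete, here is a small self-contained Python sketch of pairwise similarity followed by grouping. It is not the study's method: the similarity measure (Euclidean distance on standardized features), the grouping rule (linking any two players closer than a threshold) and the two features and their values are all assumptions chosen to keep the example short.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def standardize(features):
    """Rescale each feature column to zero mean and unit variance."""
    columns = list(zip(*features.values()))
    centers = [mean(c) for c in columns]
    scales = [pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return {name: [(v - m) / s for v, m, s in zip(vals, centers, scales)]
            for name, vals in features.items()}


def pairwise_distances(features):
    """Distance for every possible pair of players (smaller = more similar)."""
    scaled = standardize(features)
    return {(a, b): dist(scaled[a], scaled[b])
            for a, b in combinations(sorted(scaled), 2)}


def threshold_clusters(features, threshold):
    """Group players by linking every pair closer than `threshold`."""
    parent = {name: name for name in features}

    def find(x):
        # Follow parent links to the group's representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in pairwise_distances(features).items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in features:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Invented features per hypothetical player: [titles per season, best ranking].
players = {
    "p1": [2.0, 5.0],
    "p2": [2.1, 6.0],
    "p3": [0.1, 120.0],
    "p4": [0.2, 110.0],
}

clusters = threshold_clusters(players, threshold=0.5)
print(clusters)
```

Standardizing first keeps a feature measured in large units, such as ranking position, from dominating the distance. With 1,602 players there are about 1.28 million pairs, which is still manageable for this kind of all-pairs computation.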
It should be noted that a multivariate modelling strategy would be useful for making even more realistic predictions, which is why we suggest that this strategy be applied in future studies: in tennis, an athlete’s trajectory depends significantly on their opponents’ progress. In the present model, the working hypothesis is that a player (p) will follow a trajectory similar to that of his peers (n), but it is not assumed that his opponents will perform similarly to the opponents of those peers (n).
This is the result of a collaboration between iMEdD Lab and the AUEB Sports Analytics Group research team, which aims to promote robust quantitative analysis in sports, on both an academic and a professional level. The team works in the field of “Sports Analytics”, which includes the creation of statistical models and the production of predictions regarding sports results, draws on sports economics, and uses such tools as performance analysis, and the visualization and measurement of competitive balance.
Translation: Anatoli Stavroulopoulou