Open Data

Performance and characteristics of 1,602 tennis players

Data on 1,602 tennis players regarding their demographic and technical characteristics, performance and prize money earnings over the last 20 years.

The following dataset was created for the story “What statistics can tell us about the career of Stefanos Tsitsipas”. The aim was to analyze the data with simple machine learning and statistical techniques in order to predict Stefanos Tsitsipas’ world ranking over the next decade.

Primary data was collected in March 2020 from the Ultimate Tennis Statistics website, which is published under a Creative Commons license (CC BY-NC-SA 4.0) and is built on open-source software available on GitHub.

More specifically, we collected annual data on the Association of Tennis Professionals (ATP) world rankings for the period from 2000 to March 2020, as published on the Ultimate Tennis Statistics website. We then collected the profile data published on the same website for the 3,912 tennis players who appear in these rankings.

For example, as Stefanos Tsitsipas’ profile on the website shows, each athlete’s “profile data” includes, among other things, the player’s age, the year he turned professional, the seasons he has played, his backhand, his favorite surface, the prize money he has earned, the titles he has won, his best ranking, his current ranking and his Elo rank.

The original dataset, built by combining the above elements, can be described as a matrix with n rows and p columns: each row is one athlete and each column one variable (characteristic). After computing the proportion of missing values in each row and each column, we found that in several cases more than 50% of a row’s or a column’s entries were missing. We therefore deleted those rows and columns, since the information they carried would be negligible.
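As a minimal sketch of this cleaning step (not the team’s actual code), the following pandas function computes the share of missing values in every row and every column and drops those above the 50% threshold. The function name `drop_sparse`, the toy matrix and its `grass_titles` column are hypothetical, and the sketch assumes both proportions are computed on the original matrix rather than one after the other.

```python
import numpy as np
import pandas as pd


def drop_sparse(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop rows and columns whose share of missing values exceeds `threshold`.

    Assumption: both shares are computed on the original matrix, so a row is
    judged on all p columns and a column on all n rows.
    """
    col_missing = df.isna().mean(axis=0)  # share of NaN in each column
    row_missing = df.isna().mean(axis=1)  # share of NaN in each row
    keep_cols = col_missing[col_missing <= threshold].index
    keep_rows = row_missing[row_missing <= threshold].index
    return df.loc[keep_rows, keep_cols]


# Hypothetical toy matrix: 4 athletes (rows) x 3 variables (columns).
toy = pd.DataFrame(
    {
        "best_rank": [12, 37, np.nan, 5],
        "age": [38, 38, np.nan, 34],
        "grass_titles": [np.nan, np.nan, np.nan, 1],
    },
    index=["A", "B", "C", "D"],
)
print(drop_sparse(toy))
```

On the toy matrix, the `grass_titles` column (75% missing) and athlete C (100% missing) are dropped, leaving athletes A, B and D with `best_rank` and `age`.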

The data-cleaning process produced the dataset listed below, on which our clustering analysis of the players was based.

namesprize_moneybest_rankoverall_surface_pcthard_surface_pctclay_surface_pctgrass_surface_pctranksDiff1ranksDiff2ranksDiff3best_rank_stdage_turned_proageplaysbackhandfavorite_surfacehard_titles_stdclay_titles_stdtitles_stdprize_money_std
Feliciano Lopez777932.27120.520.510.490.65112.097.034.018.015.038.02290.09090909090909090.04545454545454550.3181818181818179513.5643947428156
Nicolas Mahut513113.95370.440.410.30.62172.053.0175.014.018.038.03290.213.1482532242432
Tommy Robredo636963.5750.60.560.660.54101.01.09.08.015.037.03240.04761904761904760.5238095238095240.571428571428571113.364467742966001
Paolo Lorenzi351977.79330.380.360.420.299.0415.0134.014.021.038.033150.07142857142857140.071428571428571412.7713233559987
Ivo Karlovic468826.9140.520.520.420.6393.08.0128.08.021.041.03290.190476190476190.04761904761904760.3809523809523810413.057988896144801
Roger Federer5618777.8710.820.830.760.8716.07.04.06.016.038.03293.086956521739130.47826086956521714.478260869565219515.5416247373677
Guillermo Garcia Lopez470796.06230.460.420.50.48501.0132.0115.09.018.036.032160.117647058823529020.176470588235293960.29411764705882413.062180285599199
Jo Wilfried Tsonga1379538.4450.680.680.640.69399.0106.0231.08.018.034.03361.06250.06251.12514.137259537419801
Fernando Verdasco913358.4770.570.540.610.55291.064.073.08.017.036.02340.105263157894736990.2631578947368420.36842105263157913.724883711215199
Andreas Seppi587859.78180.480.460.50.57444.0113.094.011.018.036.03390.05555555555555560.05555555555555560.16666666666666713.2842437290547
Philipp Kohlschreiber674084.05160.560.540.570.6512.039.0120.011.017.036.032140.05263157894736840.3157894736842110.42105263157894713.421110085383699
Teymuraz Gabashvili291151.07430.370.360.390.2771.0596.023.015.015.034.0331212.5815975523401
Rafael Nadal6294819.010.830.780.920.78611.0151.02.07.014.033.02341.15789473684211023.10526315789474034.4736842105263215.655237472068698
Jurgen Melzer519159.480.510.510.520.53190.077.012.012.017.038.03320.20.050.2513.159966244087999
Dustin Brown264191.91640.390.340.40.45198.0293.0198.014.017.035.033912.484431049859701
Stan Wawrinka1877237.7230.640.640.670.5489.03.0114.012.016.034.032150.50.388888888888888950.888888888888889114.4453119564572
Richard Gasquet953971.4270.630.620.620.6768.014.091.05.015.033.03290.4210526315789470.1578947368421050.789473684210525913.7683889919104
David Ferrer1749106.1730.660.640.70.63197.0150.012.013.017.037.03340.6666666666666670.72222222222222211.514.3746154554174
Go Soeda135495.27470.380.370.310.24249.0133.0200.035.033311.816692010943502
Carlos Berlocq296433.93370.410.330.460.25292.0364.010.011.018.037.03240.142857142857143020.1428571428571430212.5995796395367
Marcel Granollers772940.57190.450.410.480.45445.0119.0110.09.016.033.033160.07142857142857140.2142857142857140.2857142857142860313.5579574423371
Gilles Muller315361.79210.520.530.460.56305.0280.060.016.017.036.02360.05263157894736840.1052631578947369912.661475798423199
Dudi Sela259822.33290.420.450.190.48214.049.0138.07.016.034.032612.4677533302564
Daniel Gimeno Traver227631.36480.360.250.420.1619.012.075.09.018.034.033412.3354827573315
Julien Benneteau530930.11250.480.490.410.47149.018.0115.014.018.038.033313.182385671975801
Novak Djokovic10.830.840.80.84493.0108.062.08.015.032.03363.47058823529412040.8235294117647064.64705882352941
Yen Hsun Lu282170.35330.420.420.240.44351.02.0103.09.017.036.033612.5502662455528
Viktor Troicki575913.2120.520.530.510.52181.0449.0126.05.020.034.03320.20.213.263712233878
Florian Mayer485266.13180.480.440.50.59480.0143.0215.010.017.036.03390.06666666666666670.13333333333333313.0924527410764
Rogerio Dutra Silva166942.0630.320.290.360.01138.0144.0351.014.019.036.0321512.025401725685198
Janko Tipsarevic506824.9480.530.540.520.53453.022.044.010.017.035.03320.176470588235293960.05882352941176470.2352941176470590113.1359209369523
Ruben Ramirez Hidalgo167582.5500.340.190.390.01179.013.061.08.020.042.033412.029231046304
Mischa Zverev403370.93250.40.410.320.535.026.0444.012.017.032.02390.071428571428571412.9076118394366
Mikhail Youzhny713222.580.540.550.510.5755.026.011.09.016.037.03230.30.150.513.477548712426401
Gael Monfils1054753.8860.640.660.610.59686.0209.016.012.017.033.033100.5294117647058820.05882352941176470.58823529411764713.8688180085766
Andy Murray4102933.810.770.780.70.84129.0347.047.011.017.032.03392.26666666666666970.23.0666666666666715.227212836758499
Marco Chiudinelli134908.0520.350.350.260.416.0109.033.010.018.038.033311.812348343624999
Marcos Baghdatis557432.3180.560.570.430.58376.038.0104.03.017.034.03360.18750.2513.2310963579044
Gilles Simon869037.8860.580.580.580.57310.053.079.07.017.035.03320.5294117647058820.2941176470588240.82352941176470613.675141993631199
Sergiy Stakhovsky314649.47310.450.460.380.48198.0151.011.07.017.034.03260.176470588235293960.2352941176470590112.659214504542401
Simone Bolelli360726.07360.430.360.480.52354.019.0123.06.017.034.0321612.7958741404096
Stephane Robert218865.73500.340.360.310.33573.0103.027.015.020.039.033212.2962137157501
Radek Stepanek597024.4280.560.560.550.6265.0479.017.010.017.041.03320.2631578947368420.26315789473684213.299713296060801
Jaroslav Pospisil179194.671030.120.330.010.01157.099.0338.039.031112.096228035777099
Lukasz Kubot590473.08410.430.360.470.5147.013.069.08.019.037.0331613.288679325096
Jan Mertl300402.51630.670.990.587.052.0229.05.020.038.031112.6128785210745
Daniel Munoz De La Nava138074.38680.240.30.22205.0241.0276.017.017.038.0231011.835547804446099
Frank Dancevic117914.62650.380.380.050.48233.030.017.04.018.035.032311.6777160822304
Lamine Ouahab38605.171140.50.380.53361.076.0114.07.017.035.033410.561141484307901
Giovanni Lapentti49029.01100.340.460.240.33505.0142.0149.03.019.037.0331010.8001672387612
Fabio Fognini841913.3890.540.480.590.51415.058.0152.015.016.032.03340.06250.50.562513.643432413823302
Flavio Cipolla203031.25700.350.380.360.12319.051.070.09.019.036.0321512.2211151870629
Filippo Volandri263308.73250.440.130.540.1548.060.0106.010.015.038.03240.1333333333333330.13333333333333312.4810825010305
Santiago Giraldo61.57280.450.390.510.43390.0152.034.08.018.032.03344.12017473892312
Adrian Menendez Maceiras199829.831110.250.290.170.25298.076.0208.010.019.034.0331012.2052214333519
Jan Hernych143583.67590.40.420.320.4893.035.034.011.018.040.033911.8746732104669
Maximo Gonzalez203559.91580.320.190.370.0150.0250.0265.07.018.036.033412.2237156385726
Michal Przysiezny103031.92570.290.280.210.3350.0135.0107.013.017.036.032311.542794122114401
Albert Montanes345078.82220.470.30.530.32109.013.03.011.018.039.03340.35294117647058790.352941176470587912.7515281336877
Nicolas Almagro672014.6290.590.470.660.47106.0582.053.08.017.034.03240.81250.812513.418035375220999
Konstantin Kravchuk114634.22780.260.220.50.228.0295.02.012.019.035.0331411.649501642529
Lleyton Hewitt1043996.710.70.70.640.766.01.016.03.017.039.03391.00.11.513.8585668865002
Tomas Berdych1734784.040.650.650.630.68285.068.021.013.016.034.033140.5294117647058820.117647058823529020.764705882352940914.3663934679356
Igor Sijsling216377.5520.360.350.30.44101.0481.083.09.017.032.032912.284779846426801
Teodor Dacian Craciun218141.0253.051.09.017.039.0331
Denis Istomin372861.12330.470.460.450.54662.04.030.08.017.033.03390.06250.12512.8289612968533
Steve Darcis220764.13380.470.470.470.46114.0215.0330.014.018.035.03220.06666666666666670.06666666666666670.13333333333333312.3048501254777
Kevin Anderson1163846.7150.590.610.530.59101.0249.0296.011.020.033.03360.428571428571428940.4285714285714289413.9672412061614
Dmitry Tursunov394675.0200.510.520.40.57146.0146.0222.06.017.037.03330.333333333333333040.4666666666666670612.8858179203999
Michael Berrer212823.15420.380.410.30.29127.0152.026.011.018.039.0221012.2682168181267
Tobias Kamke216612.27640.380.370.360.4693.0271.0235.07.017.033.033912.285864260144
Paul Henri Mathieu370534.88120.480.460.490.42125.0114.047.09.017.038.03330.117647058823529020.2352941176470590112.822702862337
Frederico Gil145393.3620.470.390.550.12344.0125.010.08.017.034.033411.8871977632399
Malek Jaziri294278.0420.410.420.430.3168.0201.0158.016.019.036.0331512.5922801777746
Ernests Gulbis453547.06100.510.50.520.38277.080.08.010.015.031.03330.31250.06250.37513.024854313826099
Sam Querrey794143.47110.560.560.450.64485.067.024.012.018.032.03390.53333333333333290.06666666666666670.66666666666666713.5850194166015
Blaz Kavcic163640.09680.390.370.420.2458.060.0106.07.018.033.0331512.005424722031
Toshihide Matsui98480.672610.670.6237.072.0223.041.033111.4976155642471
Juan Monaco577459.79100.560.460.630.39315.0146.0251.010.017.035.03340.07142857142857140.57142857142857110.642857142857142913.2663940912483
Robin Haase508178.29330.460.440.510.4502.053.02.07.017.032.03340.142857142857143020.1428571428571430213.1385876295539
Michael Russell156858.0600.340.370.260.368.070.0345.09.019.041.0331011.9630962164622
Jimmy Wang87971.23850.460.460.290.39516.0197.026.05.016.035.033311.3847651081883
Lukas Lacko232969.07440.40.420.20.42190.092.0186.08.017.032.033612.358660976955099
Tommy Haas680499.3520.630.640.590.6315.03.06.06.017.041.03220.550.10.7513.430582145893199
Pablo Andujar411965.85320.410.290.510.12611.0146.069.011.018.034.03340.3076923076923080.30769230769230812.9286957365467
Adrian Mannarino531070.77220.460.460.260.59457.0122.0190.014.015.031.02390.076923076923076913.1826505681797
Matthias Bachinger142133.73850.360.390.370.22316.0159.052.06.017.032.0331511.8645236539685
Lukas Rosol300705.64260.440.390.520.4328.017.089.010.018.034.03340.07142857142857140.07142857142857140.1428571428571430212.613887125036198
Jeremy Chardy577229.53250.50.480.530.48303.069.0117.08.018.033.03320.06666666666666670.066666666666666713.2659952653493
Matteo Viola123883.81180.240.330.010.01142.0154.0402.09.016.032.0331011.727099308463302
Marc Fornell Mestres2360.010.0194.0178.066.038.0311
Pablo Cuevas530023.94190.530.410.60.41480.0124.0117.012.018.034.03240.3750.37513.1806774543195
Potito Starace315379.17270.460.310.530.0884.0200.033.06.019.038.033412.6615309082103
Victor Hanescu306932.21260.450.340.540.41265.040.0102.010.017.038.03240.07142857142857140.071428571428571412.634382187854
Ivo Klec68838.171840.220.20.25282.0309.0264.07.018.039.033111.1395136665904
Leonardo Mayer499417.46210.480.430.530.46453.067.064.012.015.032.03240.1538461538461540.15384615384615413.121197618171001
Carlos Salamanca55554.831370.380.010.75197.048.0446.09.018.037.023110.9251257399828
Donald Young308544.0380.40.420.220.3859.0394.038.08.014.030.023612.639619737765301
Alejandro Falla193833.19500.40.390.430.39291.0148.0104.012.016.036.023212.1747532228056
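The article does not say which clustering method was applied to this dataset. Purely as an illustration, here is a plain NumPy k-means sketch on z-scored features. The choice of k-means, the farthest-first initialization, the three features used and the numbers below are all assumptions: they are not the team’s method and not real rows of the table.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Z-score each column so that ranks and percentages weigh equally."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100):
    """Plain k-means (Lloyd's algorithm) with deterministic farthest-first init."""
    # Start from the first row, then repeatedly add the row farthest
    # from every centre chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for every player.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centre to the mean of its players
        # (an empty cluster keeps its previous centre).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical players: best_rank, overall_surface_pct, titles_std.
X = np.array([
    [1, 0.82, 4.5],
    [2, 0.80, 3.9],
    [3, 0.77, 3.1],
    [45, 0.40, 0.0],
    [60, 0.36, 0.1],
    [52, 0.38, 0.0],
], dtype=float)

labels, _ = kmeans(standardize(X), k=2)
print(labels)
```

Standardizing first matters here: without it, `best_rank` (values in the tens) would dominate the distances and the percentage column would barely count. On these toy values, the three strong profiles and the three weak profiles end up in separate clusters.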

Dataset columns explained

  • names: player’s name
  • prize_money: total prize money earnings ($)
  • best_rank: best position held in the world rankings
  • overall_surface_pct: overall success rate (all courts included)
  • hard_surface_pct: success rate on hard courts
  • clay_surface_pct: success rate on clay courts
  • grass_surface_pct: success rate on grass courts
  • ranksDiff1: first rapid ascent in the world rankings (difference between the ranking positions held in the first and second seasons)
  • ranksDiff2: second rapid ascent in the world rankings (difference between the ranking positions held in the second and third seasons)
  • ranksDiff3: third rapid ascent in the world rankings (difference between the ranking positions held in the third and fourth seasons)
  • best_rank_std: best position in the world rankings, adjusted for years of professional activity
  • age_turned_pro: age the player turned professional
  • age: player’s age
  • plays¹: whether the player is left-handed or right-handed
  • backhand²: whether the player hits a one-handed or two-handed backhand
  • favorite_surface³: player’s favorite court surface
  • hard_titles_std: titles won on hard courts, adjusted for years of professional activity
  • clay_titles_std: titles won on clay courts, adjusted for years of professional activity
  • titles_std: total titles won, adjusted for years of professional activity
  • prize_money_std: prize money earned, adjusted for years of professional activity
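The column descriptions do not give exact formulas for `ranksDiff1`–`ranksDiff3` or for the adjusted `_std` columns. The sketch below shows one plausible reading, under two labeled assumptions: each rank difference is the earlier season’s rank minus the later season’s rank (so a positive value is a climb), and a column such as `titles_std` is a career total divided by the number of seasons of professional activity. The helper names `rank_diffs` and `per_season` are hypothetical.

```python
from typing import List, Optional


def rank_diffs(season_ranks: List[int]) -> List[Optional[int]]:
    """ranksDiff1..3 from the ranks held in a player's first four seasons.

    Assumed sign: earlier-season rank minus later-season rank, so a climb
    from 420th to 240th gives +180. Seasons that are missing yield None.
    """
    diffs: List[Optional[int]] = []
    for i in range(3):
        if i + 1 < len(season_ranks):
            diffs.append(season_ranks[i] - season_ranks[i + 1])
        else:
            diffs.append(None)
    return diffs


def per_season(total: float, seasons: int) -> float:
    """Assumed reading of an adjusted column: total divided by seasons played."""
    if seasons <= 0:
        raise ValueError("seasons must be positive")
    return total / seasons


print(rank_diffs([420, 240, 130, 95]))  # hypothetical first four seasons
print(per_season(7, 22))  # e.g. 7 titles over 22 seasons
```

With these hypothetical inputs, the first call gives the three differences 180, 110 and 35, and the second gives roughly 0.318 titles per season.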

¹ Decoding: 1 = Unknown, 2 = Left-handed, 3 = Right-handed

² Decoding: 1 = Unknown, 2 = One-handed, 3 = Two-handed

³ Decoding: 1 = Unknown, 2 = All-Rounder, 3 = Carpet, 4 = Clay, 5 = Fast (H, G, Cp), 6 = Fast (H, G), 7 = Fastest (G, Cp), 8 = Firm (H, Cp), 9 = Grass, 10 = Hard, 11 = Non-Carpet, 12 = Non-Grass, 13 = Non-Hard, 14 = None, 15 = Slow (H, Cl), 16 = Soft (Cl, G), where H = hard, Cl = clay, G = grass and Cp = carpet
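To make the coded columns readable, the three decoding notes can be turned into lookup tables. This is a minimal pandas sketch that assumes the data have been loaded into a DataFrame whose column names match the table header (`plays`, `backhand`, `favorite_surface`). The `decode` helper and the one-row example are hypothetical.

```python
import pandas as pd

# Lookup tables transcribed from the three decoding notes above.
PLAYS = {1: "Unknown", 2: "Left-handed", 3: "Right-handed"}
BACKHAND = {1: "Unknown", 2: "One-handed", 3: "Two-handed"}
FAVORITE_SURFACE = {
    1: "Unknown", 2: "All-Rounder", 3: "Carpet", 4: "Clay",
    5: "Fast (H, G, Cp)", 6: "Fast (H, G)", 7: "Fastest (G, Cp)",
    8: "Firm (H, Cp)", 9: "Grass", 10: "Hard", 11: "Non-Carpet",
    12: "Non-Grass", 13: "Non-Hard", 14: "None", 15: "Slow (H, Cl)",
    16: "Soft (Cl, G)",
}


def decode(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the integer codes replaced by text labels."""
    out = df.copy()
    # Series.map with a dict: codes not found in the table become NaN.
    out["plays"] = out["plays"].map(PLAYS)
    out["backhand"] = out["backhand"].map(BACKHAND)
    out["favorite_surface"] = out["favorite_surface"].map(FAVORITE_SURFACE)
    return out


# Hypothetical one-row example.
row = pd.DataFrame({"plays": [2], "backhand": [3], "favorite_surface": [4]})
print(decode(row))
```

Working on a copy keeps the numeric codes available for modelling while the decoded frame is used for display.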


This dataset is the result of a collaboration between iMEdD Lab and the Sports Analytics Group of the Athens University of Economics and Business (AUEB), aimed at promoting robust quantitative analysis in sports at both the academic and the professional level. The team works in the field of sports analytics, which includes building statistical models and producing predictions of sports results, draws on sports economics, and uses tools such as performance analysis, visualization and the measurement of competitive balance.




Translation: Anatoli Stavroulopoulou

License: Creative Commons Non-Commercial International