A data-driven approach to the “Everesting” cycling challenge

Table 1 presents the descriptive statistics of the Everesting dataset, which comprises 2561 records (see “Methods” section). We emphasize that specific Everesting parameters relate to each other; for instance, the maximum gradient of 17.20% corresponds to the shortest distance (approximately 105.74 km) and vice versa. Furthermore, as expected, most cyclists attempt the Everesting challenge when the outdoor temperature is mild (16.24 ± 5.76 °C). Roads with steep gradients (> 10%) are not available everywhere and, thus, cyclists may be restricted to selecting a hill with a moderate incline (7.57 ± 1.94%) for their attempt. The number of hill repeats ranges from 2 to 1001, even though the majority of the datapoints lie within 63.24 ± 63.92.

Figure 1 shows the normalized linear correlation coefficient between each Everesting parameter (input attribute) and the time to complete the Everesting challenge (target attribute), ranked in descending order. The absolute value of the correlation coefficient quantifies the relative contribution of each input attribute to the target attribute, and the sign indicates a positive or negative relationship. The effect of temperature, age, and number of hill repeats on the time to complete the Everesting challenge is smaller than that of the distance, the gradient of the hill, and power per unit body mass (see “Methods” section for power/unit mass estimation). Evidently, the correlation coefficient of the latter is −1 since the estimate of the power per unit mass is proportional to h_tot/t_tot. Surprisingly, the number of hill repeats, which determines the intervals between effort (ascent) and recovery (descent), does not strongly correlate with the total time to complete the Everesting challenge. This observation contradicts the optimal number of hill repeats (24) theoretically derived by Swinnen et al.14. Decreasing the length of the ascent allows more frequent yet shorter recovery periods during the descent, but also requires more frequent turning at the bottom and top of the hill. The limited correlation with the time to complete the Everesting challenge indicates that the time gained by recovering more frequently during the attempt may be lost to the time needed to turn and accelerate at the bottom and top of the hill.
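The ranking described above can be illustrated with a minimal sketch that computes the Pearson linear correlation coefficient between each input attribute and the target, then sorts the attributes in descending order. The attribute names and the five toy records are hypothetical and are not taken from the Everesting dataset; the normalization used for Fig. 1 is not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_by_correlation(inputs, target):
    """Return (name, r) pairs sorted by r in descending order."""
    scores = {name: pearson(values, target) for name, values in inputs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy records, NOT taken from the Everesting dataset.
time_h = [9.0, 12.5, 16.0, 20.0, 24.5]
inputs = {
    "distance_km": [120.0, 160.0, 210.0, 260.0, 330.0],
    "gradient_pct": [14.0, 11.0, 8.5, 7.0, 5.5],
    "temperature_c": [15.0, 22.0, 12.0, 18.0, 16.0],
}

ranking = rank_by_correlation(inputs, time_h)
for name, r in ranking:
    print(f"{name:>14s}  r = {r:+.2f}")
```

In this toy example the distance ends up at the top of the ranking (strong positive correlation with time), the gradient at the bottom (strong negative correlation), and the temperature near zero, which mirrors the qualitative pattern reported for the real data.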

Figure 1

Normalized linear correlation coefficient between each Everesting (input) parameter and time (target attribute) to complete the Everesting challenge, showing that power, distance, and gradient are the main factors that determine time.

The correlation coefficient shows that the power per unit body mass is the most important input attribute, i.e., increasing power decreases the time to complete the Everesting challenge, which is intuitive because increasing power increases propulsion. This input attribute is dependent on a cyclist’s fitness and talent, as well as their body mass. The total distance and the gradient of the hill are the next two most important input attributes, and they are partially related to each other through the number of hill repeats. The tradeoff between total distance and gradient of the hill is at the heart of the Everesting challenge. The time to complete the Everesting challenge increases with increasing distance and decreases with increasing gradient because a rider gains more elevation per unit distance. However, the average speed also decreases with increasing gradient at constant power. Reports in the media anecdotally suggest that pro-level riders, who are talented, trained, and can output a high power per unit body mass for a long time, prefer the gradient to be as steep as possible to minimize the time to complete the Everesting challenge7,8,16,17, which is substantiated by our dataset. For instance, the top-10 record times of the Everesting challenge (on 08/24/22) were all achieved on a hill with gradient > 10%, 8 out of 10 with gradient > 13%, and 4 out of 10 with gradient > 15%, compared to the average gradient of 7.57% calculated based on the entire dataset (see Table 1).
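The tradeoff between distance and gradient can be made concrete with a rough back-of-the-envelope sketch. It assumes a total elevation gain of 8848 m (the height of Mount Everest), that every ascent is descended on the same road, and that the road length is close to the horizontal run. These are simplifying assumptions for illustration, not the relationship used in the paper; the function name is hypothetical.

```python
H_TOT_M = 8848.0  # assumed total elevation gain in m (height of Mount Everest)

def approx_total_distance_km(gradient_pct, h_tot_m=H_TOT_M):
    """Approximate ride distance (ascents plus descents) for an average gradient.

    Assumes each ascent is descended on the same road and that the road length
    is close to the horizontal run, which is reasonable for typical road gradients.
    """
    ascent_m = h_tot_m / (gradient_pct / 100.0)  # total climbing distance
    return 2.0 * ascent_m / 1000.0               # up and down, in km

for g in (5.0, 7.57, 10.0, 17.2):
    print(f"{g:5.2f}%  ->  {approx_total_distance_km(g):6.1f} km")
```

Under these assumptions, the maximum gradient of 17.20% gives roughly 103 km, the same order as the shortest distance of approximately 105.74 km quoted from Table 1, while the mean gradient of 7.57% gives roughly 234 km.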

Figure 2 shows the pairwise relationship between all input attributes and time (target attribute) in matrix format. The main diagonal shows the probability density function of each attribute. Note that even though Fig. 2 shows the entire dataset, the results are similar when separating male and female cyclists. We observe that the time to complete the Everesting challenge decreases with increasing gradient and increasing power, as expected from Fig. 1. All attempts that took under 10 h occurred on a hill with an average gradient > 7%. Figure 3 shows the estimated power per unit body mass versus the gradient, and the color of each datapoint represents the time to complete the Everesting challenge (see colorbar), thus illustrating that talented or well-trained cyclists preferentially select hills with steep gradients7,8,16,17 to capitalize on their talent and fitness, as opposed to recreational cyclists who cannot sustain such a demanding physical effort and, therefore, select a less steep hill. Hence, a cyclist’s ability to ascend steep hills fast (high power per unit body mass), rather than simply selecting a hill with a steep gradient, likely drives the time to complete the Everesting challenge. Selecting a gradient below 7% most often results in attempts that exceed 20 h, driven by the increased distance of the ride (see Fig. 2). Furthermore, no unique relationship between gradient and distance exists, and the variation of the data for constant gradient or constant distance in Fig. 2 is due to the number of hill repeats, which may differ for each individual attempt as a result of selecting different hills.

Figure 2

Pairwise relationship between input and target attributes in matrix format with the probability density function of each attribute along the main diagonal.

Figure 3

Estimated power per unit body mass as a function of the gradient, with the color of each datapoint indicating the total time (colorbar), illustrating that cyclists who can output high power per unit mass preferentially select a steep gradient.

We perform cluster analysis considering all input and target attributes to segment the dataset into distinct cyclist types using unsupervised machine learning algorithms. Figure 4 shows the estimated power per unit body mass versus the distance, where we indicate the direction of increasing/decreasing gradient and speed with arrows. Each datapoint clusters into one of three distinct cyclist types: groups 1 (Chicago maroon), 2 (Burnt orange), and 3 (Hokie stone), using a Gaussian mixture model (GMM, see “Methods” section), which we determine to work better than k-means, k-medoids, density-based (DBSCAN), and spectral methods based on internal and external metrics. We also show the datapoints colored according to time (see colorbar) to assist with the interpretation of the different groups. Figure 4 shows that group 1 (Chicago maroon) likely represents talented, highly trained cyclists, characterized by a high power per unit body mass. These cyclists can select a steep hill because they are able to maintain high power output for a long time, which results in a fast time to complete the Everesting challenge and allows them to compete for the world record. In contrast, group 3 (Hokie stone) likely represents recreational cyclists who generally select a hill with a shallow gradient because they cannot maintain high power output and, as a result, their total time to finish the challenge exceeds 20–25 h. In between groups 1 and 3, we define group 2 (Burnt orange) as a group of well-trained amateur cyclists who complete the Everesting challenge in 15–25 h, yet in general select a hill with a gradient that is shallower (4–10%) than that selected by the elite cyclists of group 1.

Figure 4

Estimated power per unit body mass versus distance, indicating different clusters of cyclists (groups 1, 2, and 3), and indicating the time to complete the Everesting challenge.

Figure 5 shows the pairwise relationship between the distance, gradient, power, and time (target attribute) in matrix format, similar to Fig. 2 but only showing the three input attributes that most substantially affect the target attribute based on Fig. 1. Each of these attributes approximately follows a Gaussian distribution (see main diagonal in Fig. 2) and, thus, approximately 68% of the datapoints of each attribute are confined within one standard deviation of the arithmetic mean (see Table 1). The main diagonal shows the probability density function of each attribute, and we color each datapoint according to the three distinct cyclist types of Fig. 4: groups 1 (Chicago maroon), 2 (Burnt orange), and 3 (Hokie stone). From Fig. 5 we observe the relationship between different Everesting parameters, presented as a function of the distinct cyclist types, thus further substantiating the intuitive interpretation of these different cyclist types.
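The 68% figure follows from the Gaussian cumulative distribution, as the short sketch below illustrates. It evaluates the fraction of a Gaussian within k standard deviations of the mean and the corresponding one-standard-deviation interval for the gradient, using the mean and standard deviation quoted from Table 1. The function names are illustrative, and the result only applies to the extent that an attribute is actually close to Gaussian.

```python
import math

def fraction_within(k):
    """Fraction of a Gaussian distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

def one_sigma_interval(mean, std):
    """Interval [mean - std, mean + std]."""
    return (mean - std, mean + std)

print(f"within 1 sigma: {fraction_within(1.0):.4f}")

# Gradient in %, using the mean and standard deviation quoted from Table 1.
lo, hi = one_sigma_interval(7.57, 1.94)
print(f"gradient interval: {lo:.2f}% to {hi:.2f}%")
```

For the gradient this gives an interval of about 5.63% to 9.51%, which would contain roughly 68% of the attempts if the gradient were exactly Gaussian.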

Figure 5

Pairwise relationship between input and target attributes in matrix format with the probability density function of each attribute along the main diagonal, and indicating different clusters of cyclists (groups 1, 2, and 3) based on a Gaussian mixture model.

Even though not explicitly shown in Figs. 4 and 5, we determine that separating female and male cyclists yields similar results and interpretations, but it is important to emphasize that the dataset for female cyclists is an order of magnitude smaller than that of male cyclists. We also underscore the limitations of estimating the power per unit body mass in this work, which is solely based on the energy to overcome the actual elevation gain in the total time to complete the Everesting challenge, rather than on a measurement using a power meter. Hence, it neglects energy losses from rolling resistance, aerodynamic drag, and friction forces in the bearings of the bicycle, which are small compared to the power required to overcome the elevation gain25. Consequently, we underestimate the power per unit body mass compared to the values documented in the media for Everesting world record attempts, which originate from power meter measurements during specific world record attempts. For instance, Sean Gardner averaged 4.73 W/kg on the ascents during his Everesting world record17, whereas Keegan Swenson averaged 3.59 W/kg during his entire attempt26. For comparison, the power per unit body mass estimate for Keegan Swenson in this paper is 3.14 W/kg, i.e., a difference of approximately 13%. Our estimate could be improved by considering only the time spent ascending, if those data were available, instead of the total time. Accounting for energy losses related to aerodynamic drag, rolling resistance, and friction forces in the bearings of the bicycle would also improve the power estimate, but would require more information. Power meter data, if correctly calibrated, in combination with the body mass of the cyclist, would yield the exact value.
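A minimal sketch of this type of estimate is shown below, assuming that the power per unit body mass is approximated as g · h_tot / t_tot, with g = 9.81 m/s² and a nominal elevation gain of 8848 m. Both values and the function names are assumptions made for illustration (the exact expression is given in the “Methods” section), and the 10 h attempt is hypothetical. The second function reproduces the relative difference between the measured and estimated values quoted above.

```python
G = 9.81          # gravitational acceleration in m/s^2 (assumed value)
H_TOT_M = 8848.0  # nominal elevation gain in m (assumed; real attempts may climb more)

def power_per_mass(t_tot_h, h_tot_m=H_TOT_M):
    """Estimated power per unit body mass in W/kg: g * h_tot / t_tot.

    Only accounts for the potential energy of the elevation gain, so it neglects
    drag, rolling resistance, bearing friction, and the time spent descending.
    """
    return G * h_tot_m / (t_tot_h * 3600.0)

def relative_difference(measured, estimated):
    """Relative underestimate of the measured value."""
    return (measured - estimated) / measured

# Hypothetical 10 h attempt.
print(f"{power_per_mass(10.0):.2f} W/kg")

# Measured (power meter) versus estimated values quoted in the text.
print(f"{100.0 * relative_difference(3.59, 3.14):.1f} %")
```

Because the total time includes the descents, during which little power is produced, the estimate is systematically lower than the average power measured on the ascents alone.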

We performed cluster analysis with different unsupervised machine learning algorithms to segment cyclists into homogeneous groups, based on their Everesting performance. Each algorithm yields a solution, and the number of clusters is specified a priori. Hence, interpreting the results is important for selecting a relevant solution. We attempted k-means and k-medoids with between 2 and 5 clusters to segment the data, and we used internal and external metrics, such as shadow plots, to evaluate the quality of the different clusters. Even though the shadow plots showed little overlap between adjacent clusters, it was not intuitive to recognize and associate different types of cyclists with the resulting clusters. We also attempted spectral methods and density-based methods (DBSCAN) with limited success, likely because the datapoints are densely packed. The Gaussian mixture models (GMM) were most successful in segmenting the cyclist data into homogeneous groups, because the clustering was repeatable and driven by the input attributes that show the highest correlation coefficient with the target attribute. As a result, the algorithm segments the data into clusters based on distinct power levels, which is one of the most important parameters to distinguish between cyclists’ Everesting attempts.
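To illustrate how a Gaussian mixture model segments data by power level, the sketch below fits a three-component, one-dimensional GMM with the expectation-maximization (EM) algorithm. This is a deliberately simplified stand-in: the analysis described above uses all input and target attributes, whereas this example uses a single synthetic "power" attribute with made-up group means (1.5, 2.5, and 3.5 W/kg), and the function names are hypothetical. In practice, a library implementation such as scikit-learn's `GaussianMixture` would be a more robust choice.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Probability density of a 1D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, k, n_iter=100):
    """Fit a k-component 1D Gaussian mixture with EM.

    Returns (weights, means, variances), with components ordered by the
    deterministic initialization (ascending means).
    """
    xs = sorted(data)
    n = len(xs)
    # Deterministic initialization: one component per equal-sized slice of sorted data.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    overall_mean = sum(xs) / n
    variances = [sum((x - overall_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each datapoint.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([q / total for q in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against collapsing components
    return weights, means, variances

def assign(x, weights, means, variances):
    """Index of the most likely component for datapoint x."""
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))

# Synthetic power values in W/kg for three hypothetical cyclist groups (NOT real data).
random.seed(0)
true_means = (1.5, 2.5, 3.5)
data = [random.gauss(mu, 0.15) for mu in true_means for _ in range(200)]

weights, means, variances = fit_gmm_1d(data, k=3)
for w, m, v in zip(weights, means, variances):
    print(f"weight = {w:.2f}, mean = {m:.2f} W/kg, std = {math.sqrt(v):.2f} W/kg")
print("cluster of a 3.4 W/kg rider:", assign(3.4, weights, means, variances))
```

As with the other clustering algorithms discussed above, the number of components is fixed a priori, so fitting the model for several values of k and inspecting the resulting groups remains part of the workflow.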
