Tour de Águas Claras: a data analysis of a bicycle ride

Perhaps there are no places as enchanting as Paris, Carcassonne, and Tours on this road. Maybe it is only a small fraction of the European reference, say less than 1%. Maybe I am a very *10¹⁰ amateur cyclist. Perhaps, in fact, I have borrowed only the words "Tour de" from the Tour de France. Nevertheless, my Tour de Águas Claras is not fake: it is a very well documented bike ride through my neighborhood.
I invite you to see how an app that records speed, time, and altitude can inspire a good data analysis that connects with math, physics, and even public policy. Grab your bike and come with me.
Some words about my tour
Águas Claras is the name of my neighborhood. It is part of the Federal District (DF) where Brasilia, the capital of Brazil, is situated. DF is located in Planalto Central, a plateau that dominates large areas in the center of our country.
Comparing some cities' altitudes
The high altitude matters in an analysis of bike tour data: even though the tour took place on a plateau, the altitude variations along the route are larger than the average altitudes of some cities. This is exactly the point made in the graph below.
But before the picture, a little bit of R code. Some ggplot, please.
detalhe_percurso %>%
ggplot(aes(x= as.POSIXct(time_posix_millisecond/1000, origin = "1970-01-01"), y= altitude_meter)) +
geom_area(fill = "#976C42") + # color of Brasília's soil during the dry season
geom_hline(yintercept = destaque[-c(3:5)], linetype = 5, alpha = 0.5 ) +
scale_y_continuous(breaks = c(seq(0,1200,300), destaque[-c(3:5)]))+
theme_light() +
theme(
#panel.background = element_rect(fill="#696969"),
panel.grid = element_blank()
) +
annotate("text",
x=hora_inicial,
y= altitude_fortaleza,
hjust = -0.1,
vjust= -0.1,
label= paste0("Fortaleza ",altitude_fortaleza,"m"))+
annotate("text",
x=hora_inicial,
y= altitude_ny,
hjust = -5,
vjust= -0.1,
label= paste0("NY ",altitude_ny,"m"))+
annotate("text",
x=hora_inicial,
y= altitude_guaramiranga,
hjust = -0.1,
vjust= -0.1,
label= "Guaramiranga")+
annotate("text",
x=hora_inicial,
y= altitude_paris,
hjust = -6,
vjust= -0.1,
label= paste0("Paris ",altitude_paris,"m"))+
annotate("text",
x=hora_inicial,
y= altitude_santiago,
hjust = -0.1,
vjust= -0.1,
label= "Santiago del Chile") +
labs(
x= "Horário",
y= "Altitude (m)"
)
And now the image produced from the code above.

As we can see, the altitude of my tour ranges from 1045 to 1150 meters. These marks are much higher than Santiago, Chile, which is surrounded by the Andes, and a little higher than Guaramiranga, a small and charming town in the Brazilian state of Ceará. Finally, the difference between the highest and lowest marks, about 105 meters, is greater than the average altitude of big cities such as Fortaleza, New York, and Paris, which sit almost at sea level.
All those altitude variations belong to the path shown below with the help of OpenStreetMap. Let’s see how I implemented the map visualization.
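The comparison can be checked with a few lines of R. This is only a sketch: the minimum and maximum of the tour come from the text, while the city altitudes are rough round figures assumed here for illustration, not the values used to draw the graph.

```r
# Altitude range of the tour, taken from the text (meters)
tour_min <- 1045
tour_max <- 1150
tour_range <- tour_max - tour_min

# Rough average altitudes in meters; ASSUMED values for illustration only
city_altitude <- c(Fortaleza = 20, `New York` = 10, Paris = 35,
                   Santiago = 570, Guaramiranga = 865)

# Cities whose whole altitude is smaller than the ups and downs of the tour
lower_than_range <- names(city_altitude)[city_altitude < tour_range]

# Cities that lie below the lowest point of the tour
below_tour <- names(city_altitude)[city_altitude < tour_min]

print(tour_range)
print(lower_than_range)
print(below_tour)
```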
library(leaflet)
library(sp)
# define bounding box with longitude/latitude coordinates
spdf_geo <- g_total$data
coordinates(spdf_geo) <- ~ longitude_degree + latitude_degree
proj4string(spdf_geo) <- "+init=epsg:4326"
bbox <- list(
p1 = list(long = min(g_total$data$longitude_degree ), lat= min(g_total$data$latitude_degree) ),
p2 = list(long = max(g_total$data$longitude_degree), lat= max(g_total$data$latitude_degree))
)
leaflet() %>%
addTiles() %>%
fitBounds(
lng1 = bbox$p1$long, lat1 = bbox$p1$lat,
lng2 = bbox$p2$long, lat2 = bbox$p2$lat,
) %>%
addPolylines(data=coordinates(spdf_geo), color = "blue")

The path in blue represents the ride circuit. I spent an hour and fifty minutes on my bike to complete the nearly 30 km I had planned as soon as I woke up on that sunny August Sunday. The route included the Ecological Park, a bike path along one of the busiest roads in DF, some beautiful trees, and, as always, the risk of a stray dog biting my leg.
Describing the whole ride
The positive and negative gradients along the trip imply large variations in speed. After all, as Newton already knew, gravity is always present and, in a way, makes a ride like this a difficult task. Just below is the code used to show in an image how speed responds to altitude differences along the bike path. Scroll down a bit to see the graph produced.
g_total<-
patch_graph( c("2021-08-01 08:56:00", "2021-08-01 10:48:55"))
g_total + geom_hline(yintercept = destaque[1:2], linetype = 5, alpha = 0.5 ) +
scale_y_continuous(breaks = c(seq(0,1200,300), destaque[(1:2)]))

The color gradient shows how the speed changes along the route. Each colored vertical line represents the speed measured by the CycleDroid app every second. The darker the color, the higher the speed.
Since the scale of the graph starts at the zero mark, the altitude variation is not seen as clearly. It might be a good idea to view the speed graph with a zoom. And trust me, for those of us on a bike seat, the next image is a much better representation of the emotions brought on by the hillsides along the way.
g_total_zoom<-
patch_graph( bottom_altitude =1000) +
geom_hline(yintercept = destaque[1:2], linetype = 5, alpha = 0.5 )
g_total_zoom

Let’s comment on this chart. As we can see, there are many ups and downs and, as a consequence, a large variation around the average speed of 15.8 km/h. The graph is also very useful for breaking the complete route down into critical parts. For this text, we have chosen two sections that contain the steepest inclines, as you can read a few paragraphs below.
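As a quick sanity check, the average speed can be combined with the recording window to see whether it agrees with the nearly 30 km mentioned earlier. This is a minimal sketch, assuming the average is taken over the whole recorded time, stops included.

```r
# Start and end of the recording, as passed to patch_graph() in the text.
# The time zone does not matter here because only the difference is used.
start_time <- as.POSIXct("2021-08-01 08:56:00", tz = "UTC")
end_time   <- as.POSIXct("2021-08-01 10:48:55", tz = "UTC")

# Elapsed time in hours
elapsed_h <- as.numeric(difftime(end_time, start_time, units = "hours"))

# Average speed reported in the text (km/h)
avg_speed_kmh <- 15.8

# Distance implied by average speed times elapsed time
implied_distance_km <- avg_speed_kmh * elapsed_h

print(round(elapsed_h, 2))
print(round(implied_distance_km, 1))
```

Under that assumption the implied distance comes out a little under 30 km, in line with the figure given for the ride.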
The chart is also useful for detecting unusual events. This is the case of an interruption sometime between 10:00 and 10:30. At that moment I had to wait to safely cross a very busy road with no crosswalk. It took me a few precious seconds. Perhaps findings like this can be useful for the government to improve public policies around urban mobility and leisure.
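Interruptions like this one can also be found without looking at the chart. The sketch below uses run-length encoding on a toy speed vector; the data and the threshold are assumptions made for illustration, and it relies on one reading per second, as the app records.

```r
# Toy speeds in m/s, one reading per second (ASSUMED data, not the real ride)
speed <- c(4.2, 3.9, 1.0, 0, 0, 0, 0, 2.5, 4.0, 0, 3.8)

# Treat anything below this threshold as "stopped" (assumed value)
stop_threshold <- 0.5
stopped <- speed < stop_threshold

# Run-length encoding groups consecutive equal values
runs <- rle(stopped)
run_end   <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only the runs where the rider was stopped
stops <- data.frame(
  start_index = run_start[runs$values],
  duration_s  = runs$lengths[runs$values]
)
print(stops)
```

On real data, the start index could be mapped back to the timestamp and coordinates of the reading to locate the crossing.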
Sine, derivatives, statistics, and physics summarizing the complete path
To better describe the ride, it is important to add a few more elements to our analysis. Here we have a problem. The app only gives us data on instantaneous speed, altitude, and time. A deeper description will also require cumulative distance, acceleration, and degrees of inclination. In this regard, the first task was to calculate all these new variables using the given ones.
Borrowing formulas from physics, together with some calculus and trigonometry, I calculated the remaining variables. The code below contains the two functions where I do this math.
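# Convert an angle from radians to degrees
rad2deg <- function(rad) {(rad * 180) / (pi)}
# Add step-by-step differences and derived kinematic variables to the ride data
complete_ride_data <- function(df_ride){
  n <- NROW(df_ride)
  # Differences between consecutive readings (the first reading has no predecessor, so 0)
  df_ride$dif_altitude <- c(0, diff(df_ride$altitude_meter, lag=1))
  df_ride$dif_speed <- c(0, diff(df_ride$speed_meter_per_second, lag=1))
  df_ride$dif_time <- c(0, diff(df_ride$time_posix_millisecond, lag=1))/1000 # ms to s
  # Acceleration (m/s^2): change in speed over elapsed time
  df_ride$aceleration <- c(0, df_ride$dif_speed[2:n] / df_ride$dif_time[2:n])
  # Distance covered in each step (m) and cumulative distance
  df_ride$dist_percorrida <- c(0, df_ride$speed_meter_per_second[2:n] * df_ride$dif_time[2:n])
  df_ride$dist_acumulada <- cumsum(df_ride$dist_percorrida)
  # Sine of the incline: height gained over distance ridden; NaN or Inf when the bike is stopped
  df_ride$inclinacao <- c(0, df_ride$dif_altitude[2:n]/df_ride$dist_percorrida[2:n])
  # Incline in degrees; asin() returns NaN when a noisy ratio falls outside [-1, 1]
  df_ride$inclinacao_graus <- c(0, rad2deg(asin(df_ride$inclinacao[2:n])))
  df_ride
}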
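To make the trigonometry concrete, here is a small worked example on toy readings. It is a sketch, not the function above: the data are invented, and the guard against zero distance and the clamping of the sine are additions of this example.

```r
# Convert radians to degrees
rad2deg <- function(rad) rad * 180 / pi

# Toy readings (ASSUMED): one per second, constant 5 m/s, climbing 0.5 m each second
toy <- data.frame(
  time_posix_millisecond = c(0, 1000, 2000, 3000),
  speed_meter_per_second = c(5, 5, 5, 5),
  altitude_meter         = c(1000, 1000.5, 1001, 1001.5)
)

dif_time     <- c(0, diff(toy$time_posix_millisecond)) / 1000  # seconds
dif_altitude <- c(0, diff(toy$altitude_meter))                 # meters
distance     <- toy$speed_meter_per_second * dif_time          # meters along the path

# Sine of the incline = height gained / distance ridden along the path.
# Rows with no movement would divide by zero, so they are set to 0 here.
sine_incline <- ifelse(distance > 0, dif_altitude / distance, 0)

# Clamp to [-1, 1] so that noisy readings cannot make asin() return NaN
sine_incline <- pmax(pmin(sine_incline, 1), -1)
incline_deg  <- rad2deg(asin(sine_incline))

print(cumsum(distance))
print(round(incline_deg, 2))
```

Gaining 0.5 m over 5 m of road gives a sine of 0.1, which corresponds to an incline of a little under 6 degrees.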
The code below uses the function complete_ride_data, which in turn is used as input for another function that generates a graph describing the evolution of the variables cumulative distance, speed, acceleration, and slope.
df_ride<- complete_ride_data(g_total_zoom$data )
graph_cinematic(df_ride)

For all the graphs shown above, I applied an R function based on loess regression that smooths the shape of the curves. There was actually a lot of noise in the calculated values, especially for the last three curves.
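The effect of this kind of smoothing can be illustrated with the `loess()` function from base R. This is a sketch on synthetic data; the span and the noise level are arbitrary assumptions, and the settings used for the graphs above may well be different.

```r
set.seed(42)  # reproducible toy data

# Toy signal (ASSUMED): a slow wave plus noise, one reading per second for 10 minutes
t_sec <- 0:599
true_speed  <- 4 + sin(2 * pi * t_sec / 300)
noisy_speed <- true_speed + rnorm(length(t_sec), sd = 0.8)

# Local regression from the stats package; span controls how much is smoothed
# (0.3 is an arbitrary choice for this sketch)
fit <- loess(noisy_speed ~ t_sec, span = 0.3)
smooth_speed <- predict(fit)

# The smoothed curve should sit closer to the underlying signal than the raw data
raw_error    <- mean(abs(noisy_speed - true_speed))
smooth_error <- mean(abs(smooth_speed - true_speed))
print(c(raw = raw_error, smoothed = smooth_error))
```

A larger span gives a smoother curve at the cost of flattening short events, which matters when reading the acceleration and slope curves.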
As we can see, the cumulative distance can, at first glance, be perceived as an almost straight line for the entire route. However, a closer look reveals small changes in slope along the curve. In fact, in the last few minutes the cumulative distance grows at a slightly lower rate than at the start of the ride.
These small changes in the first curve are related to the not-so-small changes observed in the second graph. Once the loess algorithm is applied, the instantaneous speed varies in an almost sine-shaped fashion until a point where it turns into a straight downward line. For the entire route, this behavior is mainly the result of the negative slope of the bike path that prevails from 9:15 until the last second. This implies a negative acceleration that best describes the ride as a whole.
For specific parts of the ride, the picture was quite different. Just below are two of them, where we can feel adventure and hard work.
Let’s take a look at a specific part of the tour
First, let’s see how altitude and speed relate for the specific part that begins at 09:33 and ends at 09:48.
The graph is just below, but first some code.
g_trecho_1<-
patch_graph(limits= c("2021-08-01 09:33:00","2021-08-01 09:48:00"), bottom_altitude =1000)
g_trecho_1
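Since patch_graph is not shown in the text, here is a sketch of how a time window like the one above could be cut from the ride data. The helper `filter_window`, the toy data, and the time zone are hypothetical and introduced only for this example.

```r
# Toy data (ASSUMED): one reading per minute starting at 09:30 local time
tz_ride <- "America/Sao_Paulo"  # assumed time zone for the recording
first_reading <- as.POSIXct("2021-08-01 09:30:00", tz = tz_ride)
ride <- data.frame(
  time_posix_millisecond = (as.numeric(first_reading) + 60 * (0:29)) * 1000,
  altitude_meter = seq(1050, 1108, by = 2)
)

# Hypothetical helper: keep the readings between two timestamps (inclusive)
filter_window <- function(df, limits, tz) {
  reading_time <- as.POSIXct(df$time_posix_millisecond / 1000,
                             origin = "1970-01-01", tz = tz)
  bounds <- as.POSIXct(limits, tz = tz)
  df[reading_time >= bounds[1] & reading_time <= bounds[2], ]
}

stretch <- filter_window(ride, c("2021-08-01 09:33:00", "2021-08-01 09:48:00"), tz_ride)
print(nrow(stretch))
```

The conversion from POSIX milliseconds mirrors the one used in the first ggplot call, where the time column is divided by 1000 before being turned into a date-time.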

The graph shows a positive slope with steadily decreasing velocity, followed by a kind of plateau where the velocity stays constant, drops to zero, and quickly recovers its previous value. At the end of the plateau, there is a negative slope with a high positive rate of change of velocity. This is a bit confusing, isn’t it? Maybe a map of this stretch will help us identify what really happened. But as you know, some code first.
# define bounding box with longitude/latitude coordinates
spdf_geo <- g_trecho_1$data
lat_marker_1<- max(g_trecho_1$data$latitude_degree)
long_marker_1 <- g_trecho_1$data$longitude_degree[g_trecho_1$data$latitude_degree==lat_marker_1]
long_marker_2<- min(g_trecho_1$data$longitude_degree )
lat_marker_2 <- max(g_trecho_1$data$latitude_degree[g_trecho_1$data$longitude_degree==long_marker_2])
lat_marker_3<- min(g_trecho_1$data$latitude_degree)
long_marker_3 <- g_trecho_1$data$longitude_degree[g_trecho_1$data$latitude_degree==lat_marker_3]
long_marker_4<- max(g_trecho_1$data$longitude_degree )
lat_marker_4 <- g_trecho_1$data$latitude_degree[g_trecho_1$data$longitude_degree==long_marker_4]
coordinates(spdf_geo) <- ~ longitude_degree + latitude_degree
proj4string(spdf_geo) <- "+init=epsg:4326"
bbox <- list(
p1 = list(long = min(g_trecho_1$data$longitude_degree ), lat= min(g_trecho_1$data$latitude_degree) ), # south-west corner
p2 = list(long = max(g_trecho_1$data$longitude_degree), lat= max(g_trecho_1$data$latitude_degree)) # north-east corner
)
leaflet() %>%
addTiles() %>%
fitBounds(
lng1 = bbox$p1$long, lat1 = bbox$p1$lat,
lng2 = bbox$p2$long, lat2 = bbox$p2$lat,
) %>%
addPolylines(data=coordinates(spdf_geo), color = "blue") %>%
addMarkers(lng= long_marker_1, lat= lat_marker_1, label = "Início subida",
labelOptions = labelOptions(noHide = T) ) %>%
addMarkers(lng= long_marker_2, lat= lat_marker_2, label = "Fim subida",
labelOptions = labelOptions(noHide = T) ) %>%
addMarkers(lng= long_marker_3, lat= lat_marker_3, label = "Fim ciclovia",
labelOptions = labelOptions(noHide = T) ) %>%
addMarkers(lng= long_marker_4, lat= lat_marker_4, label = "Fim descida",
labelOptions = labelOptions(noHide = T, direction = "right") )

We can see four landmarks on the map. From top to bottom, the first indicates the beginning of a long positive slope that ends at the third landmark. The last landmark represents the end of the bike path. At that point, I made a U-turn, which explains the zero speed. I went back to the third marker and then down to the second marker. I think this map description is enough to explain the confusing speed variation.
Some statistics about this stretch:
- Total distance: 4153 m
- Positive slope distance: 1595 m
- Positive slope time: 6min 38s
- Total time: 15 minutes
- Max speed: 34.7 km/h
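Figures like these can be derived from the per-reading columns computed earlier. The helper below is a hypothetical sketch on toy data: it counts any reading with an altitude gain as uphill, which may not be the exact rule behind the numbers above.

```r
# Hypothetical helper: summary figures for a stretch, given per-reading columns
# dif_time (s), dist_percorrida (m), dif_altitude (m), speed_meter_per_second
stretch_stats <- function(df) {
  uphill <- df$dif_altitude > 0
  list(
    total_distance_m  = sum(df$dist_percorrida),
    uphill_distance_m = sum(df$dist_percorrida[uphill]),
    uphill_time_s     = sum(df$dif_time[uphill]),
    total_time_s      = sum(df$dif_time),
    max_speed_kmh     = max(df$speed_meter_per_second) * 3.6  # m/s to km/h
  )
}

# Toy stretch (ASSUMED): three seconds uphill, then two seconds downhill
toy <- data.frame(
  dif_time               = c(0, 1, 1, 1, 1, 1),
  speed_meter_per_second = c(4, 4, 3, 3, 6, 8),
  dif_altitude           = c(0, 0.3, 0.3, 0.2, -0.4, -0.5)
)
toy$dist_percorrida <- toy$speed_meter_per_second * toy$dif_time

stats <- stretch_stats(toy)
print(stats)
```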
A bit more help from science
Now come the four graphs that describe the path from a kinematics perspective.

As expected, science tells us another story when we focus on a small part of the whole trajectory. In the first graph, it’s easier to see the changes in slope, although they are still small. The other graphs are also more dynamic. You can see how differences in slope influence the acceleration, which in turn increases or decreases the speed. As an example, in the first four minutes the speed decreases under a negative, though increasing, acceleration, which in turn follows a slight rise in slope.
Try this: look at the graphs beyond the point where the speed reaches its lowest value, then relate the four graphs to the map description in order to interpret the behavior of the four variables represented in the graphs.
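One way to put numbers on the link between slope and acceleration is the component of gravity along the road, a = -g sin(θ). The sketch below assumes a rider who is coasting and ignores pedaling, friction, and air drag; the angles are illustrative values, not measurements from the ride.

```r
# Standard gravity in m/s^2
g <- 9.81

# A few incline angles in degrees (illustrative values)
incline_deg <- c(-4, -2, 0, 2, 4)

# Component of gravity along the road: negative values slow the rider on a climb
gravity_accel <- -g * sin(incline_deg * pi / 180)

print(data.frame(incline_deg, gravity_accel = round(gravity_accel, 3)))
```

Even a 2 degree climb takes about a third of a meter per second off the speed every second unless the rider compensates by pedaling.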
The most difficult part of the tour
Near the end of the tour, I faced a very steep positive slope. My energy was almost drained, but since I am writing this text, I think I survived. Let’s take a look at the first graph.

We can see a positive slope where the speed decreases steadily, then some weird measurements, and finally a sudden and consistent drop in altitude that brings a high speed followed by some kind of deceleration. I think once again a map will help us understand better what happened.
The map again
Below, we can see the map related to this second peculiar stretch of our tour.

The path is a long climb that ends at a point where I cross a bridge for pedestrians and cyclists and then there is a sharp descent. This trajectory explains the speed and altitude graph shown above. By the way, the bridge crossing is associated with the strange measurements I mentioned earlier. And now the statistics of the stretch:
- Total distance: 1696 m
- Distance traveled uphill: 855 m
- Uphill time: 5min 42s
- Total time: 8min 38s
- Max speed: 39.3 km/h
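The uphill figures of the two stretches can be compared directly by turning distance and time into an average climbing speed. This is simple arithmetic on the numbers reported in the two lists, sketched in R.

```r
# Figures reported in the text for the two climbs
climb <- data.frame(
  stretch    = c("first stretch", "most difficult part"),
  distance_m = c(1595, 855),
  time_s     = c(6 * 60 + 38, 5 * 60 + 42)
)

# Average speed on the climb, converted from m/s to km/h
climb$avg_speed_kmh <- climb$distance_m / climb$time_s * 3.6

print(climb)
```

With these figures the second climb averages about 9 km/h against roughly 14 km/h for the first, which fits the description of it as the most difficult part of the tour.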
Pythagoras, Newton & Gauss helping us
Let’s see in more detail the physics behind the scene.

In this section of the ride, it is much easier to see the slope changes in the cumulative distance curve. Note that the first few minutes show a small rate of change, and then this behavior reverses. This is a consequence of the variations in slope, acceleration, and velocity. Spend a few minutes looking at the four graphs to get some interesting insights.
Science over two wheels
I believe that most of this analysis can be done with a good application that brings together most of the data I have calculated here and then presents it in beautiful graphs. My intention here was to show how you can really customize the analysis to your goals. In doing so, the idea is also to practice the concepts we have learned in our classrooms. This text is, in other words, a collection of possibilities to do science creatively using data that have been generated from a healthy and sustainable practice: a good bike ride through a neighborhood. And as a plus, perhaps data like this will be a good starting point for improving some public policies when analyzed at larger scales.
Code and data
Most of the code I used is described in the text, although some functions and configuration variables are not shown. The complete code and data are available on my GitHub.