cycleRtools was written to bring specialised cycling analysis procedures to R, thus permitting R to be used in place of dedicated cycling analysis applications and facilitating a much greater depth of analysis than is otherwise available.
The following is a simple example workflow to illustrate the core functions of this package.
Firstly, we need to import some data. For this example, I’ll import one of my own rides from a Garmin 810 head-unit (the same data are made available within this package; see data(cycling_data)).
library(cycleRtools)
data("cycling_data")
Note that the above was originally achieved via:
f <- "path/to/file.fit"
cycling_data <- read(f, format = TRUE)
read() is a simple wrapper for all read_*() functions that selects the appropriate one from the file extension. Hence, the above could also have been achieved with:
f <- "path/to/file.fit"
cycling_data <- read_fit(f, format = TRUE)
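The extension-based dispatch that read() performs can be pictured with a small sketch. Everything in it is hypothetical: read_any() and the stand-in reader functions are illustrative names, not part of cycleRtools, and the package's own implementation may well differ.

```r
# Hypothetical sketch: choose a reader function from the file extension.
read_any <- function(path, readers) {
  ext <- tolower(tools::file_ext(path))  # e.g. "ride.FIT" gives "fit"
  reader <- readers[[ext]]
  if (is.null(reader)) {
    stop("No reader registered for extension: '", ext, "'")
  }
  reader(path)
}

# Stand-in readers (assumptions); real ones would parse the file contents.
readers <- list(
  fit = function(path) paste("would parse FIT file:", path),
  srm = function(path) paste("would parse SRM file:", path)
)

read_any("path/to/file.fit", readers)
```

Keeping the readers in a named list means that supporting a new file type only requires adding one entry.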
At this point I should apologise that read_*() functions lack portability. This is a shortfall of the package that results from the complex file structures of most cycling head-units. It is also one I would like to rectify in future releases.
Reading a file with the format = TRUE argument generates an object of class c("cycleRdata", "data.frame"). The main purpose of the "cycleRdata" class is to ensure certain columns are present in a certain order. These are:
colnames(cycling_data)
## [1] "timer.s" "timer.min"
## [3] "timestamp" "delta.t"
## [5] "lat" "lng"
## [7] "distance.km" "speed.kmh"
## [9] "elevation.m" "delta.elev"
## [11] "VAM" "power.W"
## [13] "power.smooth.W" "work.J"
## [15] "Wexp.kJ" "cadence.rpm"
## [17] "lap" ".elevation.corrected.m"
All of these should be self-explanatory, perhaps with the exception of VAM, which is the rate of vertical ascent. Also note that power.smooth.W is a 25-second, exponentially weighted moving average of power values, and that .elevation.corrected.m is not typical, but rather added for the purpose of example. If any of these columns are not present in the original data file (e.g. latitude/longitude data in an .srm file), the columns are still present in the final formatted object, but are filled with NAs. To avoid any errors, do not rename existing columns; new columns, however, may be appended without complication.
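To give a feel for what an exponentially weighted moving average does to power data, here is a minimal sketch. It assumes 1 Hz data and the common smoothing factor alpha = 2 / (n + 1); the function name ewma_sketch() is hypothetical, and I make no claim that this reproduces power.smooth.W exactly.

```r
# Hypothetical sketch of an exponentially weighted moving average.
# Assumes one sample per second and alpha = 2 / (n + 1), a common
# convention that may not match the package's own smoothing.
ewma_sketch <- function(x, n = 25) {
  if (length(x) == 0) return(numeric(0))
  alpha <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]  # Seed with the first observation.
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

power <- c(0, 0, 400, 400, 400, 0, 0)  # A short, made-up effort.
round(ewma_sketch(power), 1)
```

Each smoothed value leans mostly on the previous smoothed value, so brief spikes and dropouts are damped rather than removed.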
Of course, certain columns are discarded from the original data as they are not deemed useful in this package - for example, heart rate data. If you want to view all original data, then set format = FALSE. This will, however, preclude the use of most functions in this package.
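The column guarantee described above (required columns always present, in a fixed order, with NAs where the file had no data) can be sketched in a few lines of base R. The helper enforce_columns() is a hypothetical name used purely for illustration; unlike the package's formatting, this sketch keeps any extra columns and simply moves them to the end.

```r
# Hypothetical sketch: make sure required columns exist, in order.
enforce_columns <- function(df, required) {
  for (nm in setdiff(required, names(df))) {
    df[[nm]] <- NA  # Column absent from the source file: fill with NA.
  }
  extras <- setdiff(names(df), required)
  df[c(required, extras)]  # Required columns first, appended ones after.
}

raw <- data.frame(power.W = c(210, 250), my.notes = c("a", "b"))
enforce_columns(raw, required = c("timer.s", "power.W", "lat", "lng"))
```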
Perhaps the first thing you’ll want to do on reading a data file into R is generate W’ balance data. If you are unfamiliar with the concept of W’ (read “W prime”), see the reference in ?Wbal. While this has already been done for cycling_data, to illustrate:
cycling_data$Wexp.kJ <- Wbal(data = cycling_data,
                             time = timer.s,  # Name of time column.
                             pwr  = power.W,  # Name of power column.
                             CP   = 330)      # Critical power value.
# summary(cycling_data$Wexp.kJ)
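For readers curious about what such a calculation might involve, below is a rough sketch of W’ expended under one published approach, the integral model of Skiba et al. (2012), in which work done above CP decays exponentially with a time constant derived from the average power below CP. This is an assumption on my part for illustration: I have not confirmed here that Wbal() follows this model, the name wexp_sketch() is hypothetical, and the code assumes no missing power values.

```r
# Hypothetical sketch of W' expended (kJ), assuming the Skiba et al.
# (2012) integral model. Not necessarily what Wbal() does internally.
wexp_sketch <- function(time_s, power_w, cp) {
  stopifnot(length(time_s) == length(power_w), length(time_s) >= 2)
  below <- power_w[power_w < cp]
  d_cp <- if (length(below) > 0) cp - mean(below) else 0
  tau <- 546 * exp(-0.01 * d_cp) + 316  # Recovery time constant (s).
  dt <- diff(time_s)
  dt <- c(dt[1], dt)  # Assume the first sample lasts as long as the second.
  excess_j <- pmax(power_w - cp, 0) * dt  # Work done above CP (J).
  wexp <- numeric(length(power_w))
  wexp[1] <- excess_j[1]
  for (i in 2:length(wexp)) {
    # Earlier expenditure decays exponentially; new expenditure is added.
    wexp[i] <- wexp[i - 1] * exp(-dt[i] / tau) + excess_j[i]
  }
  wexp / 1000
}

# Ten seconds at 430 W followed by easy riding, with CP = 330 W.
wexp_sketch(time_s = 1:20, power_w = c(rep(430, 10), rep(150, 10)), cp = 330)
```

Because the time constant is fixed for the whole ride, the running sum can be updated recursively rather than recomputed at every time point.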
These data and others can then be plotted:
Wbal_plots(data = cycling_data,
           x    = 2,    # Plot against distance.
           CP   = 330)  # Not necessary; annotates power plot.
The barometric altimeters in GPS head-units cannot always be trusted to provide accurate and/or reliable topographical data. This package can at least generate reliable values via positional coordinates (if available):
# When invoked for the first time, this requires an internet connection to
# download topographical data files (see ?download_elev_data). Package "raster"
# must also be installed.
cycling_data$.elevation.corrected.m <-
  elevation_correct(cycling_data, country = "GBR")
We can then compare the above:
par(mfrow = c(1, 2), mar = c(4.1, 4.1, 1.1, 1.1))
with(cycling_data, plot(x = distance.km,
                        y = elevation.m,
                        type = "l",
                        col = "green"))
with(cycling_data, plot(x = distance.km,
                        y = .elevation.corrected.m,
                        type = "l",
                        col = "red"))
Or, perhaps more usefully, plot the differences throughout the ride:
par(mar = c(4.1, 4.1, 1.1, 1.1))
with(cycling_data, plot(x = distance.km,
                        y = (elevation.m - .elevation.corrected.m),
                        cex = 0.2))
As cyclists, we are often most interested in gleaning our “best” power outputs for certain durations - e.g. best 20 minute power. In more mathematical terms, these are “maximum mean values”, or more specifically “maximum mean powers” (MMP, as they are often referred to). Computationally, this amounts to identifying the maximum value of a rolling average. One issue that arises from this process is that efficient algorithms will roll over row windows, rather than time windows, the latter being of interest here. Hence the function uniform() was written to make rows of a data set uniform in terms of time, thereby making row windows and time windows one and the same (see ?uniform).
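The idea of making rows uniform in time can be sketched in base R: build the complete sequence of timestamps and merge the recorded rows onto it, so that every row represents the same time step. The function make_uniform() below is a hypothetical stand-in, not the package's uniform(); it assumes whole-second timestamps and leaves the inserted rows as NA, whereas the real function may fill them differently.

```r
# Hypothetical sketch: one row per second, with NA rows where the
# head-unit recorded nothing. Assumes whole-second timestamps.
make_uniform <- function(df, time_col = "timer.s", by = 1) {
  full <- data.frame(seq(min(df[[time_col]]), max(df[[time_col]]), by = by))
  names(full) <- time_col
  merge(full, df, by = time_col, all.x = TRUE)  # Keep every timestamp.
}

ride <- data.frame(timer.s = c(1, 2, 5, 6), power.W = c(200, 220, 310, 300))
make_uniform(ride)
```

After this step, a window of n rows always spans n seconds, which is what a rolling average over rows needs.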
For the sake of example, let’s say we want to know the best 1, 5 and 20 minute powers for the present data:
tsec <- c(1, 5, 20) * 60 # Time windows must be in units of seconds.
mmv(data = cycling_data,
    windows = tsec,
    column = power.W,
    verbose = FALSE)
##                      60     300   1200
## Best mean value  522.47  390.06 245.19
## Recorded @      6275.00 6271.00 145.00
The function mmv() (maximal mean values) returns a matrix object with two rows: the first row shows the actual best values; the second row states where in the data (start time; seconds) those values were recorded. So the best 20 minute power was recorded from roughly 2.5 minutes to 22.5 minutes.
A simpler and slightly more efficient version of this function is provided as mmv2(). This can be used with any numeric vector (rather than just “cycleRdata” objects, as above), with windows given in row units. Hence, to achieve the above:
mmv2(x = uniform(cycling_data, verbose = FALSE)$power.W, # Messy!
     windows = tsec)
## [1] 522.4667 376.6833 245.1867
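The rolling-average idea behind these functions can be illustrated with a cumulative sum, which gives every window mean in a single pass. This is only a sketch under my own assumptions: max_mean() is a hypothetical name, it expects a numeric vector with no missing values and a window given in rows, and it is not how mmv() or mmv2() are necessarily implemented.

```r
# Hypothetical sketch: best rolling mean over a window of `window` rows.
max_mean <- function(x, window) {
  n <- length(x)
  stopifnot(window >= 1, window <= n)
  cs <- c(0, cumsum(x))
  # Sum of each window = difference of two cumulative sums.
  means <- (cs[(window + 1):(n + 1)] - cs[1:(n - window + 1)]) / window
  start <- which.max(means)
  c(best = means[start], start_row = start)
}

max_mean(c(1, 5, 3, 4, 2), window = 2)
##      best start_row
##         4         2
```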
Another interesting aspect of cycling data is “time in zones”. In terms of power, we might be interested in how much time during a ride we spent between 300 and 400 watts, for example. This can be achieved quite simply via:
zone_time(data = cycling_data,
          column = power.W,       # Because we're interested in power.
          zbounds = c(300, 400),  # Zone boundaries.
          pct = TRUE)             # Return zone times as percentages.
## Zone 1 Zone 2 Zone 3
##     84     11      5
# And plot the above:
zdist_plot(data = cycling_data,
           column = power.W,
           zbounds = c(300, 400))
These analyses are made possible by the function zone_index(), which generates a zone identifier column for a numeric vector according to supplied boundaries. This identifier can then be used for factor-wise operations (e.g. tapply()). This will be useful for those wanting to conduct heart rate zone analyses etc.
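As a rough picture of how a zone identifier supports factor-wise work, the sketch below uses base R's findInterval() on a made-up power vector. This is an illustration only; I am not suggesting zone_index() is built this way, and the data are invented.

```r
# Made-up 1 Hz power values (W) and the same boundaries as above.
power <- c(150, 320, 410, 290, 350, 500)
zbounds <- c(300, 400)

# findInterval() counts how many boundaries each value has reached,
# so adding 1 gives zone numbers starting from 1.
zones <- findInterval(power, zbounds) + 1
zones
## [1] 1 2 3 1 2 3

# Factor-wise operations then follow naturally:
tapply(power, zones, mean)         # Mean power within each zone.
table(zones) / length(zones) * 100 # Percentage of samples per zone.
```

The same pattern applies to any numeric column, heart rate included, given a suitable set of boundaries.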
A common scenario is that we want to partition a ride according to breaks in the data - that is, stops or time periods for which nothing was recorded. For example, the beginning of a race might be neutralised, and hence we want to exclude this from our retrospective analysis. One way to do so might be to scroll through the timer.s values in our formatted data and find the break of interest, after which we would subset the data to omit this neutralised region. Or, more efficiently, scroll down the delta time (delta.t) values to find the atypical values, and thus stops. The latter is performed by diff_section(). This is perhaps best explained by way of example.
# A hypothetical, and tragically short, 1 minute ride.
timer.s <- 1:60
diff_section(timer.s)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# A similarly tragic ride, but now with a 1 minute cafe stop.
timer.s <- c(1:60, 120:180)
diff_section(timer.s)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
## [71] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [106] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
# Applied to an *actual* ride:
cycling_data$stop_sections <- diff_section(cycling_data$timer.s)
# How many breaks were there?
unique(cycling_data$stop_sections)
## [1] 1 2 3 4 5
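The logic of numbering sections by gaps in the timer can be sketched with diff() and cumsum(). The function section_sketch() and its max_gap threshold are hypothetical; diff_section() may decide what counts as an atypical gap in a different way.

```r
# Hypothetical sketch: start a new section whenever the gap between
# consecutive timer values exceeds max_gap seconds (an assumed threshold).
section_sketch <- function(timer_s, max_gap = 10) {
  if (length(timer_s) == 0) return(integer(0))
  cumsum(c(TRUE, diff(timer_s) > max_gap))
}

timer.s <- c(1:60, 120:180)
table(section_sketch(timer.s))
##
##  1  2
## 60 61
```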
With these section levels appended to the data, it is straightforward to subset the data as desired:
subset(cycling_data, stop_sections == 2)
Or perform factor-wise operations:
# Average powers for the 5 sections.
with(cycling_data, tapply(power.W, stop_sections, mean))
##        1        2        3        4        5
## 233.0274 199.5548 130.0452 188.3177 168.5938
Elapsed time - the time from starting recording to stopping recording, irrespective of stoppages - is easy to calculate from head-unit data:
max(cycling_data$timer.s) / 60 # In minutes.
## [1] 147.9667
However, if you’re interested in time spent actually riding, use ride_time():
ride_time(cycling_data$timer.s) / 60 # In minutes.
## [1] 145.1667
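The difference between elapsed and riding time comes down to whether the long gaps are counted. A minimal sketch, using a hypothetical moving_time() function and an assumed max_gap threshold (ride_time() may identify stops differently):

```r
# Hypothetical sketch: sum only the time steps short enough to count
# as continuous riding; longer gaps are treated as stops.
moving_time <- function(timer_s, max_gap = 10) {
  d <- diff(timer_s)
  sum(d[d <= max_gap])
}

timer.s <- c(1:60, 120:180)      # The cafe-stop ride from earlier.
max(timer.s) - min(timer.s)      # Elapsed: 179 seconds.
moving_time(timer.s)             # Riding: 119 seconds.
```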
While not formally integrated into this package, a final want of cyclists is the ability to map their ride. Perhaps the easiest way to achieve this is via the package leaflet.
# GPS data must be available; and the user must be
# connected to the internet.
# Not Run.
library(leaflet)
leaflet(cycling_data) %>% addTiles() %>% addPolylines(~lng, ~lat)
And as simply as that you have an interactive route, as well as all the extended functionality of leaflet.
Those are the core functions of this package at this time. Used alongside the standard operations available within R, I hope this package enables useRs to gain much greater insight into their own cycling data.