Getting Started with REDCapTidieR

Introduction to REDCap and REDCapR

REDCap is a powerful database solution used by many institutions around the world:

“REDCap is a secure web application for building and managing online surveys and databases. While REDCap can be used to collect virtually any type of data in any environment (including compliance with 21 CFR Part 11, FISMA, HIPAA, and GDPR), it is specifically geared to support online and offline data capture for research studies and operations. The REDCap Consortium, a vast support network of collaborators, is composed of thousands of active institutional partners in over one hundred countries who utilize and support their own individual REDCap systems.”

The {REDCapR} package streamlines calls to the REDCap API. Arguably, its main use is to import records from a REDCap project. This works well for simple projects but becomes unwieldy for complex databases with longitudinal structure and/or repeating instruments.

We wrote the REDCapTidieR package to make life easier for analysts who deal with complex REDCap databases. It does so by building upon {REDCapR} to make its output tidier. Instead of one large, sparsely populated data frame, the analyst gets to work with a set of tidy tibbles, one for each REDCap instrument.

To demonstrate the use of REDCapTidieR, let’s look at a REDCap database that has information about some 734 superheroes, derived from data scraped from the Superhero Database.

This REDCap project contains two instruments: heroes_information, a nonrepeating instrument, and super_hero_powers, a repeating instrument.

Here is a screenshot of the REDCap Status Dashboard of this database. Note that Abin Sur (record #2) has a single circle in the Super Hero Powers column, indicating that they have one superpower. Agent 13 (record #8) has no superpowers.

Great! Now let’s import the superheroes data into R. We can use REDCapR::redcap_read_oneshot(), which returns a list with an element named data that contains all of the data as a data frame. We turn this data frame into a tibble for better readability:

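library(dplyr) # provides the %>% pipe

# redcap_uri and token hold the REDCap API URI and the project's API token
superheroes <- REDCapR::redcap_read_oneshot(redcap_uri, token)$data

superheroes %>% tibble::as_tibble()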

This data structure is sometimes called the sparse matrix. It’s what happens when REDCap mashes the contents of a database that has both repeating and non-repeating instruments into a single table.
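To make the shape of this structure concrete, here is a minimal sketch built by hand rather than pulled from REDCap. The columns record_id, name, and power and all of the values are made up for illustration; redcap_repeat_instrument and redcap_repeat_instance are the columns REDCap adds to mark rows that come from a repeating instrument.

```r
library(tibble)

# Hand-built toy data, not real REDCap output.
# Record 1 has two powers; record 2 has none.
sparse <- tribble(
  ~record_id, ~redcap_repeat_instrument, ~redcap_repeat_instance, ~name,    ~power,
  1,          NA,                        NA,                      "Hero A", NA,
  1,          "super_hero_powers",       1,                       NA,       "Flight",
  1,          "super_hero_powers",       2,                       NA,       "Strength",
  2,          NA,                        NA,                      "Hero B", NA
)

sparse

# Proportion of cells that are empty
mean(is.na(sparse))
```

Rows from the nonrepeating instrument leave the repeating fields empty and vice versa, which is why the table fills up with NA values as instruments are added.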

While it may seem a good idea to have everything in one data frame, there are significant downsides, including:

Tidying REDCap Exports

The main function of the {REDCapTidieR} package is the read_redcap_tidy() function. It has a similar API to REDCapR::redcap_read_oneshot(), requiring a REDCap database URI and an API token.

Let’s try it out and observe the output:

library(REDCapTidieR)
superheroes_tidy <- read_redcap_tidy(redcap_uri, token)

superheroes_tidy

This returns a tibble with two rows. This may be surprising because you might expect more rows from a database with 734 superheroes. However, this is a tibble of tibbles, or a supertibble.

In the REDCapTidieR supertibble, each row represents a REDCap instrument. The first column contains the instrument name. The second column is a list column containing a tibble for each instrument. The third column indicates the repeat/nonrepeat structure of the instrument.
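The three-column layout described above can be imitated with a hand-built tibble that has a list column. This is only a sketch of the idea: the column names instrument, data, and structure, the structure labels, and the toy values are assumptions chosen for illustration and may not match what read_redcap_tidy() actually returns.

```r
library(tibble)

# Toy per-instrument tibbles (made-up data)
heroes_information <- tibble(record_id = c(1, 2), name = c("Hero A", "Hero B"))
super_hero_powers <- tibble(
  record_id = c(1, 1),
  redcap_repeat_instance = c(1, 2),
  power = c("Flight", "Strength")
)

# One row per instrument; the tibbles live in a list column
toy_supertibble <- tibble(
  instrument = c("heroes_information", "super_hero_powers"),
  data = list(heroes_information, super_hero_powers),
  structure = c("nonrepeating", "repeating")
)

toy_supertibble

# Manual extraction: pull one tibble out of the list column by position
toy_supertibble$data[[2]]
```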

Extracting Tibbles from the Supertibble

When we pull data from a REDCap database, there’s a good chance we’d like that data represented as individual tibbles in the global environment. While it’s possible to do this manually (see below), it can become tedious if the REDCap project has many instruments. We wrote the bind_tables() function to automate this.

bind_tables() takes the output of read_redcap_tidy(), extracts the individual tibbles, and binds them to an environment. By default, this is the global environment, but you can also supply your own environment object, which will be modified in place using reference semantics.
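For readers unfamiliar with binding objects to an environment, the following base R sketch shows the general mechanism with list2env(). It is not taken from the REDCapTidieR source; the tables list and the my_env environment are hypothetical stand-ins.

```r
# Toy named list standing in for the tibbles held by a supertibble
tables <- list(
  heroes_information = data.frame(record_id = c(1, 2)),
  super_hero_powers = data.frame(record_id = c(1, 1), power = c("Flight", "Strength"))
)

# A fresh environment, so the global environment stays untouched
my_env <- new.env()

# list2env() assigns each list element into the environment under its name.
# Environments are modified in place, so no reassignment is needed.
list2env(tables, envir = my_env)

ls(my_env)
```

Because environments have reference semantics, my_env contains both tables after the call even though the result of list2env() was never assigned.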

Let’s take a look at the data frames in the global environment before and after calling bind_tables():

ls.str(Filter(is.data.frame, as.list(.GlobalEnv)))
superheroes_tidy %>%
  bind_tables()
ls.str(Filter(is.data.frame, as.list(.GlobalEnv)))

Note that there are now two additional tibbles, heroes_information and super_hero_powers, in the global environment!

If you don’t like the idea of tibbles magically appearing in your environment, or if you’d prefer a side-effect-free approach to extracting tibbles, you can use the extract_table() or extract_tables() functions.

Use extract_table() to extract a single tibble from the supertibble:

superheroes_tidy %>%
  extract_table("heroes_information")

Use extract_tables() to create a named list of tibbles from a supertibble. The default is to extract all tibbles:

superheroes_list_of_tibbles <- superheroes_tidy %>%
  extract_tables()

str(superheroes_list_of_tibbles, max.level = 1)

A neat feature of these extraction functions is that they support tidy-select semantics and selectors for picking tables:

superheroes_list_of_tibbles_ending_with_powers <- superheroes_tidy %>%
  extract_tables(ends_with("powers"))

str(superheroes_list_of_tibbles_ending_with_powers, max.level = 1)
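To illustrate what tidy-select semantics mean when the things being selected are tables rather than columns, here is a small sketch that applies tidyselect::eval_select() to a named list. It shows the general technique only and is not necessarily how extract_tables() is implemented; the tables list is a hypothetical stand-in.

```r
library(tidyselect)

# Toy named list standing in for the tibbles in a supertibble
tables <- list(
  heroes_information = data.frame(record_id = c(1, 2)),
  super_hero_powers = data.frame(record_id = c(1, 1))
)

# eval_select() resolves a selection expression against the names of `tables`
# and returns the matching positions as a named integer vector
pos <- eval_select(quote(ends_with("powers")), data = tables)

# Subset the list by position to keep only the matching tables
selected <- tables[pos]
names(selected)
```

The same pattern works with other selection helpers such as starts_with() or contains().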

Structure of REDCapTidieR Tibbles

So what do the REDCapTidieR tibbles (the ones inside the supertibble) look like? Consider heroes_information, which contains data from a nonrepeating instrument, and note the following:

heroes_information

Now look at the super_hero_powers tibble, which contains data from a repeating instrument, and note the following:

super_hero_powers

In summary, here are the rules by which REDCapTidieR constructs tibbles:

Note: Taken in combination, the identifying columns of any REDCapTidieR tibble are guaranteed to be unique and NOT NULL, and can be used as a composite primary key. This makes it easy to join REDCapTidieR tibbles!
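As a rough illustration of such a join, the sketch below uses two hand-built tibbles in place of real REDCapTidieR output. The record_id column name and all values are assumptions for this example; in a real project the record identifier is whatever the REDCap project defines.

```r
library(dplyr)

# Toy stand-ins for a nonrepeating and a repeating instrument
heroes_information <- tibble(record_id = c(1, 2), name = c("Hero A", "Hero B"))
super_hero_powers <- tibble(
  record_id = c(1, 1),
  redcap_repeat_instance = c(1, 2),
  power = c("Flight", "Strength")
)

# Taken together, the identifying columns have no duplicates and no missing values
key <- select(super_hero_powers, record_id, redcap_repeat_instance)
anyDuplicated(key) == 0
!anyNA(key)

# One row per power, with the hero's name attached via record_id
powers_with_names <- left_join(super_hero_powers, heroes_information, by = "record_id")
powers_with_names
```

A left join from the repeating tibble keeps one row per power; heroes without powers drop out, so start from heroes_information instead if every hero should appear.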