Modular Reporting with `heddlr`

Mike Mahoney

2020-03-23

This vignette is a basic introduction to the heddlr package, a set of utilities that make it easier to write R Markdown documents whose sections repeat, or whose sections need to be added or removed based on an underlying data source. To demonstrate the essentials of how the package works, let’s imagine we have a super cool R Markdown document, which looks something like this:

---
title: "My cool report!"
author: "Captain heddlr"
output: html_document
---
# Let's talk about irises!

## Iris setosa

This species of flower is great! It has a mean sepal length of 
`r mean(iris[iris$Species == "setosa", "Sepal.Length"])`, and a
mean sepal width of `r mean(iris[iris$Species == "setosa", "Sepal.Width"])`. 
That looks like this on a graph!

```{r}
iris %>%
  filter(Species == "setosa") %>%
  ggplot(aes(Sepal.Length, Sepal.Width)) + 
  geom_point()
```

## Iris virginica

This species of flower is great! It has a mean sepal length of 
`r mean(iris[iris$Species == "virginica", "Sepal.Length"])`, and a
mean sepal width of `r mean(iris[iris$Species == "virginica", "Sepal.Width"])`. 
That looks like this on a graph!

```{r}
iris %>%
  filter(Species == "virginica") %>%
  ggplot(aes(Sepal.Length, Sepal.Width)) + 
  geom_point()
```

This is a great report, and it probably didn’t take that long to create. However, one day Joe down the hall points out that there are actually more types of irises than the ones you’re always talking about - and your database already has information about one called versicolor!

If you wanted, you could copy and paste the species section again, making sure to change all the species names to versicolor. However, there’s some risk in that: you can wind up with errors in your reporting if you miss replacing a value. More importantly, copying and pasting by hand doesn’t scale to reports which are put out often and need multiple sections added or removed. It can save a lot of time and energy to automate that task away instead.

That’s the motivation behind heddlr¹: to reduce that repetitive work and, as a side effect, simplify your code. To do so, heddlr looks at reports as a collection of components, which we’ll call patterns. As it happens, our super cool report above is made up of two patterns – first, the setup material:

---
title: "My cool report!"
author: "Captain heddlr"
output: html_document
---

```{r setup}
library(dplyr)
library(ggplot2)
```

# Let's talk about irises!

And secondly the species-specific section, which gets repeated for each species in our dataset – I’ve swapped the specific name out for a placeholder value:

## Iris SPECIES_NAME

This species of flower is great! It has a mean sepal length of 
`r mean(iris[iris$Species == "SPECIES_NAME", "Sepal.Length"])`, and a
mean sepal width of `r mean(iris[iris$Species == "SPECIES_NAME", "Sepal.Width"])`. 
That looks like this on a graph!

```{r}
iris %>%
  filter(Species == "SPECIES_NAME") %>%
  ggplot(aes(Sepal.Length, Sepal.Width)) + 
  geom_point()
```

When using heddlr, we’ll usually save each of those patterns in its own file. For our example report, we’ll name those files setup_pattern.Rmd and species_pattern.Rmd respectively².

We then have a few ways we can import them into our R session. Let’s load heddlr and then walk through them:

library(heddlr)

The first and most straightforward method is to use heddlr::import_pattern(), which does more or less what you’d expect and imports a single pattern into a single R object.

# These can be any sort of plaintext file -- I tend to save them as .Rmd
# so that I get code highlighting for them in RStudio, but any extension
# should work fine
setup_pattern <- import_pattern("setup_pattern.Rmd")
species_pattern <- import_pattern("species_pattern.Rmd")

This gives us objects that contain strings like this:

setup_pattern
#> [1] "---\ntitle: \"My cool report!\"\nauthor: \"Captain heddlr\"\noutput: html_document\n---\n\n```{r setup}\nlibrary(dplyr)\nlibrary(ggplot2)\n```\n\n# Let's talk about irises!\n\n"
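
For a rough sense of what importing a pattern amounts to, the sketch below reads a text file into a single string using only base R. The helper name `read_pattern()` and the temporary file are made up for illustration. This is not heddlr's own code, and details such as trailing newlines may differ from what import_pattern() returns.

```r
# Hypothetical stand-in for illustration only; not heddlr's implementation.
read_pattern <- function(path) {
  # Read every line, then glue the lines back together with newlines
  paste(readLines(path), collapse = "\n")
}

# Write a tiny pattern to a temporary file so the example is self-contained
pattern_file <- tempfile(fileext = ".Rmd")
writeLines(
  c("## Iris SPECIES_NAME", "", "Some text about SPECIES_NAME."),
  pattern_file
)

toy_pattern <- read_pattern(pattern_file)
toy_pattern
#> [1] "## Iris SPECIES_NAME\n\nSome text about SPECIES_NAME."
```

The point is only that a pattern is an ordinary length-one character vector, which is why the placeholder replacement described later can treat it as plain text.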

However, it can be helpful in reports with multiple patterns to store everything in one object, just to have fewer things floating around your top-level environment. heddlr provides heddlr::import_draft() for this purpose, wrapping an lapply call which will return a single list object holding all of your patterns:

iris_draft <- import_draft(
  "setup_pattern" = "setup_pattern.Rmd",
  "species_pattern" = "species_pattern.Rmd"
)

iris_draft
#> $setup_pattern
#> [1] "---\ntitle: \"My cool report!\"\nauthor: \"Captain heddlr\"\noutput: html_document\n---\n\n```{r setup}\nlibrary(dplyr)\nlibrary(ggplot2)\n```\n\n# Let's talk about irises!\n\n"
#> 
#> $species_pattern
#> [1] "## Iris SPECIES_NAME\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == \"SPECIES_NAME\", \"Sepal.Length\"])`, and \nmean sepal width of `r mean(iris[iris$Species == \"SPECIES_NAME\", \"Sepal.Width\"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n  filter(Species == \"SPECIES_NAME\") %>%\n  ggplot(aes(Sepal.Length, Sepal.Width)) + \n  geom_point()\n```\n"
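
The "named list of patterns" idea can be pictured with base R alone. In the sketch below, `read_draft()` is a hypothetical name and the two temporary files are throwaway examples. It only illustrates the lapply-over-named-files shape described above, under the assumption that each file is read into one string; it is not how import_draft() is necessarily written.

```r
# Hypothetical sketch; `read_draft()` is an illustrative name, not a heddlr function.
read_draft <- function(...) {
  files <- list(...)
  # lapply() keeps the names, so each pattern is stored under its label
  lapply(files, function(path) paste(readLines(path), collapse = "\n"))
}

# Two throwaway pattern files so the example runs anywhere
setup_file <- tempfile(fileext = ".Rmd")
species_file <- tempfile(fileext = ".Rmd")
writeLines("# Let's talk about irises!", setup_file)
writeLines("## Iris SPECIES_NAME", species_file)

toy_draft <- read_draft(setup_pattern = setup_file, species_pattern = species_file)

# Individual patterns are then available with `$`, as with iris_draft above
toy_draft$species_pattern
```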

Now that we’ve got our patterns into R, it’s time to start working with them. To demonstrate how we do that with heddlr, we first need to load a few libraries:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(purrr)

The first function we’ll use to work with patterns is heddlr::heddle(). At the most basic level, this is the function that replaces the placeholders in our patterns with our data. If we have a vector containing the values we want to use for each pattern, we can use this function as follows and get a vector in return:

# heddle takes three arguments: data, pattern, placeholder to replace
heddle(unique(iris$Species), "This is a pattern - CODE ", "CODE")
#> [1] "This is a pattern - setosa "     "This is a pattern - versicolor "
#> [3] "This is a pattern - virginica "
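
Conceptually, this is a literal find-and-replace applied once per value. The base R sketch below mimics that behaviour for the vector case only. `fill_placeholder()` is a hypothetical helper written for this illustration, and heddle() itself handles more cases than this.

```r
# Hypothetical helper for illustration; heddlr's heddle() does more than this.
fill_placeholder <- function(values, pattern, placeholder) {
  # Produce one copy of the pattern per value, with the placeholder swapped out.
  # fixed = TRUE treats the placeholder as literal text rather than a regex.
  vapply(
    as.character(values),
    function(value) gsub(placeholder, value, pattern, fixed = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
}

components <- fill_placeholder(
  unique(iris$Species),
  "This is a pattern - CODE ",
  "CODE"
)
components
```

For this input the sketch returns the same three strings as the heddle() call above, one per species.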

In the context of our example document, this function call would look like this:

heddle(unique(iris$Species), iris_draft$species_pattern, "SPECIES_NAME")[[1]]
#> [1] "## Iris setosa\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == \"setosa\", \"Sepal.Length\"])`, and \nmean sepal width of `r mean(iris[iris$Species == \"setosa\", \"Sepal.Width\"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n  filter(Species == \"setosa\") %>%\n  ggplot(aes(Sepal.Length, Sepal.Width)) + \n  geom_point()\n```\n"

It isn’t super important, but I think of the outputs from heddle() as being components of a larger template, and will be using similar terminology from here on out.

If we instead want to use heddle() as part of a magrittr pipeline, we can create new dataframe columns using dplyr::mutate():

iris %>%
  distinct(Species) %>%
  # exact same pattern of arguments: data, pattern, placeholder to replace
  mutate(component = heddle(Species, iris_draft$species_pattern, "SPECIES_NAME"))
#>      Species
#> 1     setosa
#> 2 versicolor
#> 3  virginica
#>                                                                                                                                                                                                                                                                                                                                                                                                       component
#> 1                 ## Iris setosa\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == "setosa", "Sepal.Length"])`, and \nmean sepal width of `r mean(iris[iris$Species == "setosa", "Sepal.Width"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n  filter(Species == "setosa") %>%\n  ggplot(aes(Sepal.Length, Sepal.Width)) + \n  geom_point()\n```\n
#> 2 ## Iris versicolor\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == "versicolor", "Sepal.Length"])`, and \nmean sepal width of `r mean(iris[iris$Species == "versicolor", "Sepal.Width"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n  filter(Species == "versicolor") %>%\n  ggplot(aes(Sepal.Length, Sepal.Width)) + \n  geom_point()\n```\n
#> 3     ## Iris virginica\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == "virginica", "Sepal.Length"])`, and \nmean sepal width of `r mean(iris[iris$Species == "virginica", "Sepal.Width"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n  filter(Species == "virginica") %>%\n  ggplot(aes(Sepal.Length, Sepal.Width)) + \n  geom_point()\n```\n

We can also use heddle() as its own step in a pipeline. When we pass heddle() a dataframe like this, instead of a vector, we have to specify which column we want to replace the placeholder with:

iris %>%
  distinct(Species) %>%
  # the data argument is provided by %>%
  # so we just provide the pattern and placeholder
  # (format "PLACEHOLDER" = Variable)
  heddle("This is a pattern - CODE ", "CODE" = Species)
#> [1] "This is a pattern - setosa "     "This is a pattern - versicolor "
#> [3] "This is a pattern - virginica "

An advantage of this method is that we can now replace multiple placeholders with our data in a single function call:

iris %>%
  distinct(Species) %>%
  heddle("This is a pattern - CODE ", "CODE" = Species, "This" = Species)
#> [1] "setosa is a pattern - setosa "        
#> [2] "versicolor is a pattern - versicolor "
#> [3] "virginica is a pattern - virginica "
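
One way to picture multiple replacements is as a loop over placeholder and column pairs, applied row by row. The sketch below is a base R approximation under that assumption. `fill_placeholders()` and its `replacements` argument are hypothetical and do not reflect heddle()'s real interface.

```r
# Hypothetical helper; an illustration of the idea, not heddlr's code.
fill_placeholders <- function(data, pattern, replacements) {
  # `replacements` maps placeholder text (names) to column names in `data` (values)
  out <- rep(pattern, nrow(data))
  for (placeholder in names(replacements)) {
    values <- as.character(data[[replacements[[placeholder]]]])
    # mapply() pairs each row's value with that row's copy of the pattern
    out <- mapply(
      function(text, value) gsub(placeholder, value, text, fixed = TRUE),
      out, values,
      USE.NAMES = FALSE
    )
  }
  out
}

species <- data.frame(Species = unique(iris$Species))
filled <- fill_placeholders(
  species,
  "This is a pattern - CODE ",
  c(CODE = "Species", This = "Species")
)
filled
#> [1] "setosa is a pattern - setosa "        
#> [2] "versicolor is a pattern - versicolor "
#> [3] "virginica is a pattern - virginica "
```

Because the replacements are applied one after another, a placeholder that also appears inside an earlier replacement value would be replaced again, so distinctive placeholder names are the safer choice.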

One last way we can use heddle() is to use purrr::map() to apply it to nested columns made via tidyr::nest(). To do so, we just provide arguments in the same way as above:

iris %>%
  nest(nested = Species) %>%
  mutate(component = map(nested,
    heddle,
    "This is a pattern - CODE ",
    "CODE" = Species
  )) %>%
  head(2)
#> # A tibble: 2 x 6
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width nested           component
#>          <dbl>       <dbl>        <dbl>       <dbl> <list>           <list>   
#> 1          5.1         3.5          1.4         0.2 <tibble [1 × 1]> <chr [1]>
#> 2          4.9         3            1.4         0.2 <tibble [1 × 1]> <chr [1]>

This is also the supported way to change multiple placeholder values while saving a component as a column in a dataframe via mutate(): nest the columns that hold the replacement values, and then use purrr to replace the placeholders in one step.

You’ll notice that the output of this method is another list column, not the string column we’re used to seeing. Those familiar with purrr::map() might know that we can often get a character output instead by using purrr::map_chr(). However, this doesn’t work when any element of the map() call returns more than one value:

iris %>%
  nest(nested = Species) %>%
  mutate(component = map_chr(nested, heddle,
    "This is a pattern - CODE ",
    "CODE" = Species
  ))
#> Error: Result 102 must be a single string, not a character vector of length 2
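
The same constraint can be reproduced without purrr. In the base R sketch below, vapply() with `character(1)` plays a role similar in spirit to map_chr(): it insists on exactly one string per element and stops as soon as an element holds two. The `results` list is invented for the illustration.

```r
# A toy list standing in for a list column: the second element has two strings
results <- list("one string", c("two", "strings"))

outcome <- tryCatch(
  # character(1) demands exactly one string per element
  vapply(results, identity, character(1)),
  error = function(e) "failed: an element had length 2"
)
outcome
#> [1] "failed: an element had length 2"

# lengths() shows which elements would need collapsing first
lengths(results)
#> [1] 1 2
```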

Instead, we can turn to our next function, heddlr::make_template(). There are a few different ways that we can use this function. For our current situation, for instance, we can use purrr::map_chr() to apply make_template() to our new component list column, transforming it into the normal set of strings we’re used to:

iris %>%
  nest(nested = Species) %>%
  mutate(
    component = map(nested, heddle, "This is a pattern - CODE ", "CODE" = Species),
    component = map_chr(component, make_template)
  ) %>%
  head(2)
#> # A tibble: 2 x 6
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width nested     component        
#>          <dbl>       <dbl>        <dbl>       <dbl> <list>     <chr>            
#> 1          5.1         3.5          1.4         0.2 <tibble [… "This is a patte…
#> 2          4.9         3            1.4         0.2 <tibble [… "This is a patte…

In addition to this, there are two other contexts in which we can use make_template(). If we pass it a dataframe and one of its columns, it will collapse that column into a single string:

iris %>%
  nest(nested = Species) %>%
  mutate(
    component = map(nested, heddle,
      "This is a pattern - CODE ",
      "CODE" = Species
    ),
    component = map_chr(component, make_template)
  ) %>%
  head(2) %>%
  make_template(component)
#> [1] "This is a pattern - setosa This is a pattern - setosa "

If instead the first argument we pass it is a vector, it will combine all the vectors you provide it into a single string:

make_template("Part one, ", "part two")
#> [1] "Part one, part two"
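
The vector case behaves much like pasting everything together with no separator. The sketch below shows that idea in base R. `collapse_strings()` is a hypothetical helper, not make_template() itself.

```r
# Hypothetical helper; not heddlr's make_template().
collapse_strings <- function(...) {
  # c() flattens every argument into one character vector,
  # and collapse = "" joins the pieces with nothing in between
  paste0(c(...), collapse = "")
}

collapse_strings("Part one, ", "part two")
#> [1] "Part one, part two"

# Longer vectors are flattened in the order they are given
collapse_strings(c("a", "b"), "c")
#> [1] "abc"
```

Since nothing is inserted between the pieces, any blank lines wanted between sections have to be part of the patterns themselves.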

In the context of our example report, these steps would look something like this:

species_template <- iris %>%
  distinct(Species) %>%
  mutate(component = heddle(
    Species,
    iris_draft$species_pattern,
    "SPECIES_NAME"
  )) %>%
  make_template(component)

report_template <- make_template(iris_draft$setup_pattern, species_template)

I refer to these output strings as templates, which are the second-to-last step in the heddlr pipeline. To actually create our reports, though, we need to export these templates to .Rmd files. For that, heddlr provides the aptly named heddlr::export_template(). Normally, export_template() takes two arguments: the template to write out, and the file to write it to. Here, I’m going to tell it to print to stdout() instead, which will just output here what our sample report would look like:

suppressWarnings(export_template(report_template, stdout()))
#> ---
#> title: "My cool report!"
#> author: "Captain heddlr"
#> output: html_document
#> ---
#> 
#> ```{r setup}
#> library(dplyr)
#> library(ggplot2)
#> ```
#> 
#> # Let's talk about irises!
#> 
#> ## Iris setosa
#> 
#> This species of flower is great! It has a mean sepal length of 
#> `r mean(iris[iris$Species == "setosa", "Sepal.Length"])`, and 
#> mean sepal width of `r mean(iris[iris$Species == "setosa", "Sepal.Width"])`. 
#> That looks like this on a graph!
#> 
#> ```{r}
#> iris %>%
#>   filter(Species == "setosa") %>%
#>   ggplot(aes(Sepal.Length, Sepal.Width)) + 
#>   geom_point()
#> ```
#> ## Iris versicolor
#> 
#> This species of flower is great! It has a mean sepal length of 
#> `r mean(iris[iris$Species == "versicolor", "Sepal.Length"])`, and 
#> mean sepal width of `r mean(iris[iris$Species == "versicolor", "Sepal.Width"])`. 
#> That looks like this on a graph!
#> 
#> ```{r}
#> iris %>%
#>   filter(Species == "versicolor") %>%
#>   ggplot(aes(Sepal.Length, Sepal.Width)) + 
#>   geom_point()
#> ```
#> ## Iris virginica
#> 
#> This species of flower is great! It has a mean sepal length of 
#> `r mean(iris[iris$Species == "virginica", "Sepal.Length"])`, and 
#> mean sepal width of `r mean(iris[iris$Species == "virginica", "Sepal.Width"])`. 
#> That looks like this on a graph!
#> 
#> ```{r}
#> iris %>%
#>   filter(Species == "virginica") %>%
#>   ggplot(aes(Sepal.Length, Sepal.Width)) + 
#>   geom_point()
#> ```

And there we have it! Repeating the above steps will always generate sections for each flower included in your dataset, whether or not Joe down the hall remembered to tell you about it.

To recap, the essential steps to make our report can be summed up as follows:

1. We decomposed our report into patterns, saved as individual .Rmd files.
2. We imported those files using either import_pattern() or import_draft().
3. We replicated the patterns and replaced placeholders using heddle().
4. We combined those components into templates via make_template().
5. We exported our final template into a report through export_template().

I’ll usually save these steps into a “generator” file, which I can then run every time I need to regenerate my report. I’ll also often include a step in that file that calls rmarkdown::render() on the generated report file, so that I can run a single script to remake and rebuild my entire report. Obviously, this can be overkill for reports as simple as our example here, but it can be a huge time saver for more complex reports that have more repeating sections, are built more often, or are based on more dynamic data sources that require maintenance to add or remove components between builds. It can also help make your code DRYer³, so you can make edits in a single place and see them replicated across your entire report.
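
A generator file for the iris report might look roughly like the sketch below. The heddlr calls mirror the ones shown earlier in this vignette. The function name `build_iris_report()`, the output file name, and the `render` switch are assumptions made for the example, and the pattern files are assumed to sit in the working directory.

```r
# Sketch of a generator script; names and file paths are illustrative assumptions.
build_iris_report <- function(out_file = "iris_report.Rmd", render = TRUE) {
  # Import both patterns into a single named list
  draft <- heddlr::import_draft(
    "setup_pattern" = "setup_pattern.Rmd",
    "species_pattern" = "species_pattern.Rmd"
  )

  # One component per species, then collapse them into a single template
  species <- dplyr::distinct(iris, Species)
  species <- dplyr::mutate(
    species,
    component = heddlr::heddle(Species, draft$species_pattern, "SPECIES_NAME")
  )
  species_template <- heddlr::make_template(species, component)

  # Put the setup material first and write the finished template to disk
  report_template <- heddlr::make_template(draft$setup_pattern, species_template)
  heddlr::export_template(report_template, out_file)

  # Optionally knit the generated .Rmd into the final report
  if (render) rmarkdown::render(out_file)

  invisible(out_file)
}
```

Wrapping the steps in a function keeps the script easy to source and rerun, for example with `build_iris_report()` after the underlying data changes.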

If you want to see this process applied to a somewhat more complicated example, check out how we can make this dashboard via this article.

¹ It’s a play on the loom component, since we’re trying to automate the Sweave/knitr process and someone already took the name loomr.
² In order for the vignettes for this package to build correctly, I haven’t actually saved these files off separately. If you look at the source code for this vignette, you’ll notice I’m not actually using the heddlr functions, but rather using some workarounds to get the same exact results.
³ Don’t repeat yourself.
