Introduction to gestalt

Eugene Ha

The gestalt package provides a function composition operator, %>>>%, which improves the clarity, modularity, and versatility of your functions by enabling you to:

More importantly, gestalt fosters a powerful way of thinking about values as functions.

Overview

The following example (adapted from purrr) uses %>>>% to express a function that takes the name of a factor-column of the mtcars data frame, splits the data into the corresponding groups, fits a linear model to each group, then computes the R² of each fit from its summary.

library(gestalt)

fit <- mpg ~ wt

r2 <- {split(mtcars, mtcars[[.]])} %>>>%
  lapply(function(data) lm(!!fit, data)) %>>>%
  summarize: (
    lapply(summary) %>>>%
      stat: sapply(`[[`, "r.squared")
  )

r2("cyl")
#>         4         6         8 
#> 0.5086326 0.4645102 0.4229655

gestalt leverages the ubiquity of the magrittr %>% operator, by adopting its semantics and augmenting it to enable you to:

Ceci n’est pas une %>%

Despite the syntactic similarity, the %>>>% operator is conceptually distinct from the magrittr %>% operator. Whereas %>% “pipes” a value into a function to yield a value, %>>>% composes functions to yield a function.
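A minimal sketch of the difference, assuming both packages are installed (the vector x and the name root_sum are illustrative and do not come from the package documentation): the same two steps yield a number under %>% and a reusable function under %>>>%.

```r
library(gestalt)
library(magrittr)

x <- c(1, 4, 9)

# %>% applies sqrt, then sum, to x immediately and returns a value
piped <- x %>% sqrt %>% sum
piped
#> [1] 6

# %>>>% joins sqrt and sum into a new function; nothing is computed yet
root_sum <- sqrt %>>>% sum
is.function(root_sum)
#> [1] TRUE

# The value appears only when the composite function is called
root_sum(x)
#> [1] 6
```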

The most significant distinction, however, is that list idioms apply to composite functions made by %>>>%, so that you can inspect, modify, and repurpose them intuitively.

Select segments of functions using indexing

To select the first two functions in r2, in order to get the fitted model, index with the vector 1:2:

r2[1:2]("cyl")[["6"]]  # Cars with 6 cylinders
#> 
#> Call:
#> lm(formula = mpg ~ wt, data = data)
#> 
#> Coefficients:
#> (Intercept)           wt  
#>       28.41        -2.78
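The same indexing can be tried on a smaller composite. The sketch below uses a hypothetical three-step function, total, made up for illustration; it assumes only the `[` behaviour shown above.

```r
library(gestalt)

# Hypothetical three-step composite: square roots, their sum, times ten
total <- sqrt %>>>% sum %>>>% {. * 10}
total(c(1, 4, 9))
#> [1] 60

# The first two steps alone: stop before the scaling
total[1:2](c(1, 4, 9))
#> [1] 6

# The last two steps alone: skip the square root
total[2:3](c(1, 4, 9))
#> [1] 140
```

Each index vector selects a contiguous segment here, but the result is in every case an ordinary function that can be called directly.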

Repurpose using subset-assignment

To compute the residuals rather than the R², reassign the summary-statistic function:

residuals <- r2
residuals$summarize$stat <- function(s) sapply(s, `[[`, "residuals")
residuals("cyl")[["6"]]
#>      Mazda RX4  Mazda RX4 Wag Hornet 4 Drive        Valiant       Merc 280 
#>     -0.1249670      0.5839601      1.9291961     -0.6896780      0.3547199 
#>      Merc 280C   Ferrari Dino 
#>     -1.0452801     -1.0079511
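As a smaller illustration of the same idiom, the sketch below names one step of a hypothetical composite, centre, and then swaps that step in a copy. The names centre, centre_med and stat are assumptions made for this example only.

```r
library(gestalt)

# Hypothetical composite: drop missing values, then apply a named statistic
centre <- {.[!is.na(.)]} %>>>% stat: mean
x <- c(1, 2, NA, 9)
centre(x)
#> [1] 4

# Copy the composite and swap the step named `stat` for the median
centre_med <- centre
centre_med$stat <- median
centre_med(x)
#> [1] 2

# The original composite is unchanged
centre(x)
#> [1] 4
```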

Inspect or modify using higher-order functions

Consider a function that capitalizes and joins a random selection of characters:

scramble <- sample %>>>% toupper %>>>% paste(collapse = "")

set.seed(1)
scramble(letters, 5)
#> [1] "YDGAB"

Here you see the final result of the composition. But because scramble is a list-like object, you can also inspect its intermediate steps by applying a standard “map-reduce” strategy, such as the following higher-order function:

stepwise <- lapply(`%>>>%`, print) %>>>% compose

stepwise maps over the constituent functions of a composite function to add printing at each step:

set.seed(1)
stepwise(scramble)(letters, 5)
#> [1] "y" "d" "g" "a" "b"
#> [1] "Y" "D" "G" "A" "B"
#> [1] "YDGAB"

The value of values as functions

Whenever you have a value that results from a series of piped values, such as

library(magrittr)

mtcars %>% 
  split(.$cyl) %>% 
  lapply(function(data) lm(mpg ~ wt, data)) %>% 
  lapply(summary) %>% 
  sapply(`[[`, "r.squared")
#>         4         6         8 
#> 0.5086326 0.4645102 0.4229655

you can transpose it to a constant composite function that computes the same value, simply by treating the input value as a constant function and replacing each function application, %>%, by function composition, %>>>%:

R2 <- {mtcars} %>>>% 
  split(.$cyl) %>>>%
  lapply(function(data) lm(mpg ~ wt, data)) %>>>%
  lapply(summary) %>>>%
  sapply(`[[`, "r.squared")
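The same transposition works for any piped value. As a sketch with a smaller input (the name abc is illustrative), the pipeline letters %>% head(3) %>% toupper becomes:

```r
library(gestalt)

# {letters} ignores its input, so it plays the role of the piped value
abc <- {letters} %>>>% head(3) %>>>% toupper

# Nothing is computed until the composite is called, with no arguments
abc()
#> [1] "A" "B" "C"
```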

You gain power by treating (piped) values as (composite) functions:

  1. Values as functions are lazy. You can separate the value’s declaration from its point of use, since the value is computed only on demand:

    R2()
    #>         4         6         8 
    #> 0.5086326 0.4645102 0.4229655

  2. Values as functions are cheap. You can cache the value of R2 by declaring it as a constant:

    R2 <- constant(R2)
    R2()
    #>         4         6         8 
    #> 0.5086326 0.4645102 0.4229655
    
    # On a 2016 vintage laptop
    microbenchmark::microbenchmark(R2(), times = 1e6)  
    #> Unit: nanoseconds
    #>  expr min  lq     mean median  uq      max neval
    #>  R2() 532 567 709.1435    585 647 39887308 1e+06

  3. Values as functions encode their computation. Since a composite function qua computation is a list-like object, you can compute on it to extract latent information.

    For instance, you can get the normal Q–Q plot of the fitted model for 6-cylinder cars from the head of R2:

    head(R2, 3)() %>% .[["6"]] %>% plot(2)
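The effect of caching is easiest to see with a value that would otherwise change between calls. The sketch below is an illustration rather than part of the package documentation: draw and fixed are hypothetical names, and the only gestalt features assumed are %>>>% and constant() as used above.

```r
library(gestalt)

# Hypothetical value-as-function: a random draw, rounded to 3 decimals;
# called on its own, draw() would compute a fresh number every time
draw <- {runif(1)} %>>>% round(3)

# Declared as a constant, the function returns one cached value on every call
fixed <- constant(draw)
identical(fixed(), fixed())
#> [1] TRUE
```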

Complements

In conjunction with %>>>%, gestalt also provides:

See the package documentation for more details (help(package = gestalt)).

Acknowledgments