The idea is seplyr
package lets you program over dplyr
0.7.* without needing a Ph.D. in computer science.
In dplyr
if you know the names of columns when you are writing code you can write code such as the following.
suppressPackageStartupMessages(library("dplyr"))
datasets::mtcars %>%
group_by(cyl, gear) %>%
head()
## # A tibble: 6 x 11
## # Groups: cyl, gear [4]
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If instead the names of the columns are coming from a variable set elsewhere you need to use a tool to substitute those names in as show below.
groupingVars <- c('cyl', 'gear') # assume this is set elsewhere
datasets::mtcars %>%
group_by(!!!rlang::syms(groupingVars)) %>%
head()
## # A tibble: 6 x 11
## # Groups: cyl, gear [4]
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If you don’t want to try and digest entire theory of quasi-quoting (the rlang::syms()
) and splicing (the !!!
) then you can use seplyr
which conveniently wraps the operations as follows:
library("seplyr")
datasets::mtcars %>%
group_by_se(groupingVars) %>%
head()
## # A tibble: 6 x 11
## # Groups: cyl, gear [4]
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
And that is it. seplyr::group_by_se()
performs the wrapping for you without you having to work through the details of rlang
. If you are interested in the details seplyr
itself is a good tutorial. For example you can examine seplyr
’s implementation to see the necessary notations:
print(group_by_se)
## function (.data, groupingVars, add = FALSE)
## {
## groupingSyms <- rlang::syms(groupingVars)
## group_by(.data = .data, !(!(!groupingSyms)), add = add)
## }
## <bytecode: 0x7fc26bc92ca8>
## <environment: namespace:seplyr>
And of course we try to supply some usable help entries, example help(group_by_se)
:
group_by standard interface.
Description
Group a data frame by the groupingVars. Author: John Mount, Win-Vector LLC.
Usage
group_by_se(.data, groupingVars, add = FALSE)
Arguments
.data data.frame
groupingVars character vector of column names to group by.
add logical, passed to group_by
Value
.data grouped by columns named in groupingVars
Examples
group_by_se(datasets::mtcars, c("cyl", "gear")) %>%
head()
# roughly equivalent to:
# do.call(group_by_, c(list(datasets::mtcars), c('cyl', 'gear')))
In addition to a series of adapters we also supply a number of useful new verbs including:
add_group_indices()
Adds a column of per-group ids to data.add_group_summaries()
Adds per-group summaries to data.group_summarize()
Binds grouping, arrangement, and summarization together for clear documentation of intent.