Brief note: I created this package to make my analyses easier, so the statistical methods implemented so far are the ones I have needed in my own work. If you would like a particular statistical method included, please open an Issue and I will try to implement it!
Most analyses follow a pattern similar to how construction and engineering projects are developed: design -> add specifications -> construction -> (optional) add to the design and specs -> cleaning, scrubbing, and polishing. The mason package tries to emulate this process to make it easier to do analyses in a consistent and ‘tidy’ format.
The general command flow for using mason is:
1. Start the design of the analysis (design()).
2. Add settings for the statistical method (add_settings()).
3. Add variables to the analysis (add_variables()). These variables include the y variables (outcomes), the x variables (predictors), covariates, and interaction variables.
4. Construct the analysis and its results (construct()).
5. (Optional) Use add_variables() or add_settings() after construct() to add to the existing results.
6. Clean up the results (scrub() and polish_*() commands). The results are now ready for further presentation in a figure or table!
Let’s go over an example analysis. We’ll use glm for a simple linear regression on the built-in swiss dataset. A quick peek at it shows:
head(swiss)
#> Fertility Agriculture Examination Education Catholic
#> Courtelary 80.2 17.0 15 12 9.96
#> Delemont 83.1 45.1 6 9 84.84
#> Franches-Mnt 92.5 39.7 5 5 93.40
#> Moutier 85.8 36.5 12 7 33.77
#> Neuveville 76.9 43.5 17 15 5.16
#> Porrentruy 76.1 35.3 9 7 90.57
#> Infant.Mortality
#> Courtelary 22.2
#> Delemont 22.2
#> Franches-Mnt 20.2
#> Moutier 20.3
#> Neuveville 20.6
#> Porrentruy 26.6
Ok, let’s say we want to run several models. We are interested in Fertility and Infant.Mortality as outcomes and Education and Agriculture as potential predictors. We also want to control for Catholic. This setup gives four potential models to analyze (two outcomes times two predictors). With mason this is relatively easy. Analyses in mason are essentially separated into a blueprint phase and a construction phase. Since any structure or building always needs a blueprint, let’s get that started.
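To see where the count of four comes from, here is a minimal base-R sketch (not mason code; the object names `combos` and `covariates` are only illustrative) that crosses every outcome with every predictor and writes out the formula each model would use:

```r
# Outcomes, predictors, and the covariate from the example above
yvars <- c("Fertility", "Infant.Mortality")
xvars <- c("Education", "Agriculture")
covariates <- "Catholic"

# Every outcome/predictor pair: 2 x 2 = 4 combinations
combos <- expand.grid(y = yvars, x = xvars, stringsAsFactors = FALSE)

# One model formula per pair, each adjusted for the same covariate
combos$formula <- vapply(
  seq_len(nrow(combos)),
  function(i) deparse(reformulate(c(combos$x[i], covariates), response = combos$y[i])),
  character(1)
)

nrow(combos)  # 4
combos$formula
```

mason does this bookkeeping for you, which is why the blueprint below only needs the variable lists.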
library(mason)
design(swiss, 'glm')
#> Fertility Agriculture Examination Education Catholic
#> Courtelary 80.2 17.0 15 12 9.96
#> Delemont 83.1 45.1 6 9 84.84
#> Franches-Mnt 92.5 39.7 5 5 93.40
#> Moutier 85.8 36.5 12 7 33.77
#> Neuveville 76.9 43.5 17 15 5.16
#> Porrentruy 76.1 35.3 9 7 90.57
#> Broye 83.8 70.2 16 7 92.85
#> Glane 92.4 67.8 14 8 97.16
#> Gruyere 82.4 53.3 12 7 97.67
#> Sarine 82.9 45.2 16 13 91.38
#> Veveyse 87.1 64.5 14 6 98.61
#> Aigle 64.1 62.0 21 12 8.52
#> Aubonne 66.9 67.5 14 7 2.27
#> Avenches 68.9 60.7 19 12 4.43
#> Cossonay 61.7 69.3 22 5 2.82
#> Echallens 68.3 72.6 18 2 24.20
#> Grandson 71.7 34.0 17 8 3.30
#> Lausanne 55.7 19.4 26 28 12.11
#> La Vallee 54.3 15.2 31 20 2.15
#> Lavaux 65.1 73.0 19 9 2.84
#> Morges 65.5 59.8 22 10 5.23
#> Moudon 65.0 55.1 14 3 4.52
#> Nyone 56.6 50.9 22 12 15.14
#> Orbe 57.4 54.1 20 6 4.20
#> Oron 72.5 71.2 12 1 2.40
#> Payerne 74.2 58.1 14 8 5.23
#> Paysd'enhaut 72.0 63.5 6 3 2.56
#> Rolle 60.5 60.8 16 10 7.72
#> Vevey 58.3 26.8 25 19 18.46
#> Yverdon 65.4 49.5 15 8 6.10
#> Conthey 75.5 85.9 3 2 99.71
#> Entremont 69.3 84.9 7 6 99.68
#> Herens 77.3 89.7 5 2 100.00
#> Martigwy 70.5 78.2 12 6 98.96
#> Monthey 79.4 64.9 7 3 98.22
#> St Maurice 65.0 75.9 9 9 99.06
#> Sierre 92.2 84.6 3 3 99.46
#> Sion 79.3 63.1 13 13 96.83
#> Boudry 70.4 38.4 26 12 5.62
#> La Chauxdfnd 65.7 7.7 29 11 13.79
#> Le Locle 72.7 16.7 22 13 11.22
#> Neuchatel 64.4 17.6 35 32 16.92
#> Val de Ruz 77.6 37.6 15 7 4.97
#> ValdeTravers 67.6 18.7 25 7 8.65
#> V. De Geneve 35.0 1.2 37 53 42.34
#> Rive Droite 44.7 46.6 16 29 50.43
#> Rive Gauche 42.8 27.7 22 29 58.33
#> Infant.Mortality
#> Courtelary 22.2
#> Delemont 22.2
#> Franches-Mnt 20.2
#> Moutier 20.3
#> Neuveville 20.6
#> Porrentruy 26.6
#> Broye 23.6
#> Glane 24.9
#> Gruyere 21.0
#> Sarine 24.4
#> Veveyse 24.5
#> Aigle 16.5
#> Aubonne 19.1
#> Avenches 22.7
#> Cossonay 18.7
#> Echallens 21.2
#> Grandson 20.0
#> Lausanne 20.2
#> La Vallee 10.8
#> Lavaux 20.0
#> Morges 18.0
#> Moudon 22.4
#> Nyone 16.7
#> Orbe 15.3
#> Oron 21.0
#> Payerne 23.8
#> Paysd'enhaut 18.0
#> Rolle 16.3
#> Vevey 20.9
#> Yverdon 22.5
#> Conthey 15.1
#> Entremont 19.8
#> Herens 18.3
#> Martigwy 19.4
#> Monthey 20.2
#> St Maurice 17.8
#> Sierre 16.3
#> Sion 18.1
#> Boudry 20.3
#> La Chauxdfnd 20.5
#> Le Locle 18.9
#> Neuchatel 23.0
#> Val de Ruz 20.0
#> ValdeTravers 19.5
#> V. De Geneve 18.0
#> Rive Droite 18.2
#> Rive Gauche 19.3
So far, all we’ve done is create a blueprint of the analysis, but it doesn’t contain much. Let’s add some settings to the blueprint. mason was designed to make use of the %>% pipe from the magrittr package (also available through dplyr), so let’s load magrittr!
library(magrittr)
dp <- design(swiss, 'glm') %>%
add_settings(family = gaussian())
You’ll notice that, so far, the only thing ever printed to the console is the dataset. That’s because we haven’t constructed the analysis yet! We are still in the blueprint phase, so no results have been added. Since we have two outcomes and two predictors, we have a total of four models to analyze. Normally we would need to run each of the models separately. However, if we simply list the outcomes and the predictors in mason, it will ‘loop’ through each combination and run all four models! Let’s add the variables.
dp <- dp %>%
add_variables('yvars', c('Fertility', 'Infant.Mortality')) %>%
add_variables('xvars', c('Education', 'Agriculture'))
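Before constructing, it may help to see what that ‘looping’ amounts to. The base-R sketch below fits one gaussian glm() per outcome/predictor pair and stacks the coefficient tables; it illustrates the idea only and is not mason’s actual implementation. The helper name `fit_all()` is hypothetical.

```r
# Hypothetical helper: fit one gaussian glm per outcome/predictor pair
# and stack the coefficient tables into a single data frame
fit_all <- function(data, yvars, xvars) {
  combos <- expand.grid(y = yvars, x = xvars, stringsAsFactors = FALSE)
  rows <- lapply(seq_len(nrow(combos)), function(i) {
    f <- reformulate(combos$x[i], response = combos$y[i])
    fit <- glm(f, family = gaussian(), data = data)
    # Columns: Estimate, Std. Error, t value, Pr(>|t|)
    coefs <- summary(fit)$coefficients
    data.frame(
      Yterms = combos$y[i],
      Xterms = combos$x[i],
      term = rownames(coefs),
      estimate = unname(coefs[, "Estimate"]),
      std.error = unname(coefs[, "Std. Error"]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

res <- fit_all(datasets::swiss,
               yvars = c("Fertility", "Infant.Mortality"),
               xvars = c("Education", "Agriculture"))
nrow(res)  # 4 models x 2 coefficients = 8 rows
```

Writing this by hand for every new combination of variables is exactly the repetition mason is meant to remove.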
Alright, still nothing has happened. However, we are now ready to construct the analysis using construct().
dp <- construct(dp)
head(dp)
#> Fertility Agriculture Examination Education Catholic
#> Courtelary 80.2 17.0 15 12 9.96
#> Delemont 83.1 45.1 6 9 84.84
#> Franches-Mnt 92.5 39.7 5 5 93.40
#> Moutier 85.8 36.5 12 7 33.77
#> Neuveville 76.9 43.5 17 15 5.16
#> Porrentruy 76.1 35.3 9 7 90.57
#> Infant.Mortality
#> Courtelary 22.2
#> Delemont 22.2
#> Franches-Mnt 20.2
#> Moutier 20.3
#> Neuveville 20.6
#> Porrentruy 26.6
Cool! The results of the unadjusted models (without any covariates) are now stored in the blueprint, even though printing it still shows only the dataset. We said we wanted to adjust for Catholic, but let’s say we want to keep the unadjusted analysis too. Even though we have ‘finished’ the analysis by constructing it, we can still add to the blueprint.
dp2 <- dp %>%
add_variables('covariates', 'Catholic') %>%
construct()
head(dp2)
#> Fertility Agriculture Examination Education Catholic
#> Courtelary 80.2 17.0 15 12 9.96
#> Delemont 83.1 45.1 6 9 84.84
#> Franches-Mnt 92.5 39.7 5 5 93.40
#> Moutier 85.8 36.5 12 7 33.77
#> Neuveville 76.9 43.5 17 15 5.16
#> Porrentruy 76.1 35.3 9 7 90.57
#> Infant.Mortality
#> Courtelary 22.2
#> Delemont 22.2
#> Franches-Mnt 20.2
#> Moutier 20.3
#> Neuveville 20.6
#> Porrentruy 26.6
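Constructing again after adding a covariate appends a second set of models to the existing results instead of replacing them (the full pipe chain further down returns both sets, 20 rows in total). A minimal base-R sketch of that append step, using a hypothetical helper `fit_set()` with an optional `covariates` argument; this mimics the behaviour only and is not mason’s code:

```r
# Hypothetical helper: one gaussian glm per outcome/predictor pair,
# optionally adjusted for the same covariates in every model
fit_set <- function(data, yvars, xvars, covariates = NULL) {
  combos <- expand.grid(y = yvars, x = xvars, stringsAsFactors = FALSE)
  do.call(rbind, lapply(seq_len(nrow(combos)), function(i) {
    f <- reformulate(c(combos$x[i], covariates), response = combos$y[i])
    coefs <- summary(glm(f, family = gaussian(), data = data))$coefficients
    data.frame(Yterms = combos$y[i], Xterms = combos$x[i],
               term = rownames(coefs),
               estimate = unname(coefs[, "Estimate"]),
               stringsAsFactors = FALSE)
  }))
}

yvars <- c("Fertility", "Infant.Mortality")
xvars <- c("Education", "Agriculture")

unadjusted <- fit_set(datasets::swiss, yvars, xvars)
adjusted <- fit_set(datasets::swiss, yvars, xvars, covariates = "Catholic")

# Keep both sets: 4 models x 2 terms + 4 models x 3 terms = 20 rows
results <- rbind(unadjusted, adjusted)
nrow(results)
```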
We now have two sets of models in the results: the four unadjusted models and the four models adjusted for Catholic. We’re happy with them, so let’s clean them up using the scrub() function.
dp_clean <- dp2 %>%
scrub()
All scrub() does is remove the extra specs stored in the attributes and set the results as the main dataset. You can see this by looking at its details and comparing them to the unscrubbed version.
colnames(dp2)
#> [1] "Fertility" "Agriculture" "Examination"
#> [4] "Education" "Catholic" "Infant.Mortality"
colnames(dp_clean)
#> [1] "Yterms" "Xterms" "term" "estimate" "std.error"
#> [6] "statistic" "p.value" "conf.low" "conf.high" "sample.size"
names(attributes(dp2))
#> [1] "names" "class" "row.names" "specs"
names(attributes(dp_clean))
#> [1] "names" "class" "row.names"
class(dp2)
#> [1] "glm_bp" "bp" "data.frame"
class(dp_clean)
#> [1] "tbl_df" "tbl" "data.frame"
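To make that difference concrete, here is a toy base-R imitation of what the comparison shows: a constructed blueprint is the original data frame with its results tucked into a `specs` attribute, and scrubbing promotes those results to the main dataset and drops the attribute. The functions `toy_blueprint()` and `toy_scrub()` are hypothetical and only mimic the observable behaviour; mason’s real internals may differ (for example, the real scrub() returns a tibble, while this sketch returns a plain data frame).

```r
# Hypothetical stand-in for a constructed blueprint: the original data,
# with the model results stored in a "specs" attribute
toy_blueprint <- function(data, results) {
  structure(data,
            class = c("glm_bp", "bp", "data.frame"),
            specs = list(results = results))
}

# Hypothetical stand-in for scrub(): return the stored results as the
# main dataset, leaving the blueprint's data and attributes behind
toy_scrub <- function(bp) {
  results <- attr(bp, "specs")$results
  rownames(results) <- NULL
  results
}

# Results from one real model, so nothing here is made up
coefs <- summary(glm(Fertility ~ Education, data = datasets::swiss))$coefficients
results <- data.frame(Yterms = "Fertility", Xterms = "Education",
                      term = rownames(coefs),
                      estimate = unname(coefs[, "Estimate"]),
                      stringsAsFactors = FALSE)

bp <- toy_blueprint(datasets::swiss, results)
clean <- toy_scrub(bp)

"specs" %in% names(attributes(bp))     # TRUE
"specs" %in% names(attributes(clean))  # FALSE
colnames(clean)
```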
And all as a single pipe chain:
swiss %>%
design('glm') %>%
add_settings() %>%
add_variables('yvars', c('Fertility', 'Infant.Mortality')) %>%
add_variables('xvars', c('Education', 'Agriculture')) %>%
construct() %>%
add_variables('covariates', 'Catholic') %>%
construct() %>%
scrub()
#> # A tibble: 20 x 10
#> Yterms Xterms term estimate std.error
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Fertility Agriculture (Intercept) 60.304375228 4.25125562
#> 2 Fertility Agriculture <-Xterm 0.194201749 0.07671176
#> 3 Fertility Education (Intercept) 79.610058532 2.10409711
#> 4 Fertility Education <-Xterm -0.862350293 0.14484472
#> 5 Infant.Mortality Agriculture (Intercept) 20.337954766 1.05754318
#> 6 Infant.Mortality Agriculture <-Xterm -0.007805071 0.01908283
#> 7 Infant.Mortality Education (Intercept) 20.272865076 0.65272716
#> 8 Infant.Mortality Education <-Xterm -0.030086548 0.04493333
#> 9 Fertility Agriculture (Intercept) 59.863923712 3.98753957
#> 10 Fertility Agriculture <-Xterm 0.109528109 0.07848208
#> 11 Fertility Agriculture Catholic 0.114962125 0.04273900
#> 12 Fertility Education (Intercept) 74.233689201 2.35197061
#> 13 Fertility Education <-Xterm -0.788329259 0.12929324
#> 14 Fertility Education Catholic 0.110920955 0.02980965
#> 15 Infant.Mortality Agriculture (Intercept) 20.274208854 1.04449977
#> 16 Infant.Mortality Agriculture <-Xterm -0.020059765 0.02055767
#> 17 Infant.Mortality Agriculture Catholic 0.016638302 0.01119509
#> 18 Infant.Mortality Education (Intercept) 19.717357317 0.82539716
#> 19 Infant.Mortality Education <-Xterm -0.022438401 0.04537398
#> 20 Infant.Mortality Education Catholic 0.011460792 0.01046136
#> # ... with 5 more variables: statistic <dbl>, p.value <dbl>,
#> # conf.low <dbl>, conf.high <dbl>, sample.size <int>
There are also additional polish_*-type commands that are more or less simple wrappers around operations you might do on the results dataset, like filtering or renaming. The list of polish commands can be found in ?mason::polish.
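Because the polish commands wrap ordinary data manipulation, the same kind of clean-up can be written by hand. A minimal base-R sketch, assuming a scrubbed-style results table with the Yterms, Xterms, term, estimate, and p.value columns shown above (built here from two glm() fits so the example runs on its own). It does not call or reproduce any specific polish_*() function; the helper `fit_one()` and the new column names are hypothetical.

```r
# Hypothetical helper: tidy coefficient table for a single gaussian glm
fit_one <- function(y, x) {
  coefs <- summary(glm(reformulate(x, response = y),
                       data = datasets::swiss))$coefficients
  data.frame(Yterms = y, Xterms = x, term = rownames(coefs),
             estimate = unname(coefs[, "Estimate"]),
             p.value = unname(coefs[, "Pr(>|t|)"]),
             stringsAsFactors = FALSE)
}

results <- rbind(fit_one("Fertility", "Education"),
                 fit_one("Fertility", "Agriculture"))

# "Filtering": keep only the predictor rows, dropping the intercepts
polished <- results[results$term != "(Intercept)", ]

# "Renaming": swap the generic column names for presentation-ready ones
names(polished)[names(polished) == "Yterms"] <- "Outcome"
names(polished)[names(polished) == "Xterms"] <- "Predictor"

rownames(polished) <- NULL
polished
```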