Introduction to mason

Luke W. Johnston

2016-07-15

Brief note: I created this package to make my own analyses easier, so the statistical methods implemented so far are the ones I have needed. If you would like a particular statistical method included, please open an Issue and I will try to implement it!

Most analyses follow a similar pattern to how construction/engineering projects are developed: design -> add specifications -> construction -> (optional) add to the design and specs -> cleaning, scrubbing, and polishing. The mason package tries to emulate this process to make it easier to do analyses in a consistent and ‘tidy’ format.

Basic command flow

The general command flow for using mason is:

  1. Start the design of a blueprint for the analysis by specifying which statistical technique to use in your analysis (design()).
  2. Add settings/options to the blueprint for the methods of the statistics (add_settings()).
  3. Add the variables you want to run the statistics on (add_variables()). These variables include the \(y\) variables (outcomes), the \(x\) variables (predictors), covariates, and interaction variables.
  4. Using the blueprint, construct the ‘mason project’ (stats analysis) so that the results are generated (construct()).
  5. Sometimes an analysis is too big for a single pass from blueprint to construction, and you need to add more to the blueprint. Use add_variables() or add_settings() after construct(), then run construct() again, to add to the existing results.
  6. When you are ready, clean up the ‘mason project’ by scrubbing it down and polishing it up (scrub() and polish_*() commands). The results are now ready for further presentation in a figure or table!

Example usage

Let’s go over an example analysis. We’ll use glm for a simple linear regression. Let’s use the built-in swiss dataset. A quick peek at it shows:

head(swiss)
#>              Fertility Agriculture Examination Education Catholic
#> Courtelary        80.2        17.0          15        12     9.96
#> Delemont          83.1        45.1           6         9    84.84
#> Franches-Mnt      92.5        39.7           5         5    93.40
#> Moutier           85.8        36.5          12         7    33.77
#> Neuveville        76.9        43.5          17        15     5.16
#> Porrentruy        76.1        35.3           9         7    90.57
#>              Infant.Mortality
#> Courtelary               22.2
#> Delemont                 22.2
#> Franches-Mnt             20.2
#> Moutier                  20.3
#> Neuveville               20.6
#> Porrentruy               26.6

Ok, let’s say we want to run several models. We are interested in Fertility and Infant.Mortality as outcomes and Education and Agriculture as potential predictors. We also want to control for Catholic. With two outcomes and two predictors, this setup gives us four potential models to analyze. With mason this is relatively easy. Analyses in mason are essentially separated into a blueprint phase and a construction phase. Since any structure or building always needs a blueprint, let’s get that started.

library(mason)
design(swiss, 'glm')
#>              Fertility Agriculture Examination Education Catholic
#> Courtelary        80.2        17.0          15        12     9.96
#> Delemont          83.1        45.1           6         9    84.84
#> Franches-Mnt      92.5        39.7           5         5    93.40
#> Moutier           85.8        36.5          12         7    33.77
#> Neuveville        76.9        43.5          17        15     5.16
#> Porrentruy        76.1        35.3           9         7    90.57
#> Broye             83.8        70.2          16         7    92.85
#> Glane             92.4        67.8          14         8    97.16
#> Gruyere           82.4        53.3          12         7    97.67
#> Sarine            82.9        45.2          16        13    91.38
#> Veveyse           87.1        64.5          14         6    98.61
#> Aigle             64.1        62.0          21        12     8.52
#> Aubonne           66.9        67.5          14         7     2.27
#> Avenches          68.9        60.7          19        12     4.43
#> Cossonay          61.7        69.3          22         5     2.82
#> Echallens         68.3        72.6          18         2    24.20
#> Grandson          71.7        34.0          17         8     3.30
#> Lausanne          55.7        19.4          26        28    12.11
#> La Vallee         54.3        15.2          31        20     2.15
#> Lavaux            65.1        73.0          19         9     2.84
#> Morges            65.5        59.8          22        10     5.23
#> Moudon            65.0        55.1          14         3     4.52
#> Nyone             56.6        50.9          22        12    15.14
#> Orbe              57.4        54.1          20         6     4.20
#> Oron              72.5        71.2          12         1     2.40
#> Payerne           74.2        58.1          14         8     5.23
#> Paysd'enhaut      72.0        63.5           6         3     2.56
#> Rolle             60.5        60.8          16        10     7.72
#> Vevey             58.3        26.8          25        19    18.46
#> Yverdon           65.4        49.5          15         8     6.10
#> Conthey           75.5        85.9           3         2    99.71
#> Entremont         69.3        84.9           7         6    99.68
#> Herens            77.3        89.7           5         2   100.00
#> Martigwy          70.5        78.2          12         6    98.96
#> Monthey           79.4        64.9           7         3    98.22
#> St Maurice        65.0        75.9           9         9    99.06
#> Sierre            92.2        84.6           3         3    99.46
#> Sion              79.3        63.1          13        13    96.83
#> Boudry            70.4        38.4          26        12     5.62
#> La Chauxdfnd      65.7         7.7          29        11    13.79
#> Le Locle          72.7        16.7          22        13    11.22
#> Neuchatel         64.4        17.6          35        32    16.92
#> Val de Ruz        77.6        37.6          15         7     4.97
#> ValdeTravers      67.6        18.7          25         7     8.65
#> V. De Geneve      35.0         1.2          37        53    42.34
#> Rive Droite       44.7        46.6          16        29    50.43
#> Rive Gauche       42.8        27.7          22        29    58.33
#>              Infant.Mortality
#> Courtelary               22.2
#> Delemont                 22.2
#> Franches-Mnt             20.2
#> Moutier                  20.3
#> Neuveville               20.6
#> Porrentruy               26.6
#> Broye                    23.6
#> Glane                    24.9
#> Gruyere                  21.0
#> Sarine                   24.4
#> Veveyse                  24.5
#> Aigle                    16.5
#> Aubonne                  19.1
#> Avenches                 22.7
#> Cossonay                 18.7
#> Echallens                21.2
#> Grandson                 20.0
#> Lausanne                 20.2
#> La Vallee                10.8
#> Lavaux                   20.0
#> Morges                   18.0
#> Moudon                   22.4
#> Nyone                    16.7
#> Orbe                     15.3
#> Oron                     21.0
#> Payerne                  23.8
#> Paysd'enhaut             18.0
#> Rolle                    16.3
#> Vevey                    20.9
#> Yverdon                  22.5
#> Conthey                  15.1
#> Entremont                19.8
#> Herens                   18.3
#> Martigwy                 19.4
#> Monthey                  20.2
#> St Maurice               17.8
#> Sierre                   16.3
#> Sion                     18.1
#> Boudry                   20.3
#> La Chauxdfnd             20.5
#> Le Locle                 18.9
#> Neuchatel                23.0
#> Val de Ruz               20.0
#> ValdeTravers             19.5
#> V. De Geneve             18.0
#> Rive Droite              18.2
#> Rive Gauche              19.3

So far, all we’ve done is create a blueprint of the analysis, but it doesn’t contain much. Let’s add some settings to the blueprint. mason was designed to make use of the %>% pipes from the package magrittr (also found in dplyr), so let’s load up magrittr!

library(magrittr)
dp <- design(swiss, 'glm') %>% 
    add_settings(family = gaussian())

You’ll notice that each time, the only thing that is printed to the console is the dataset. That’s because we haven’t constructed the analysis yet! We are still in the blueprint phase, so no results have been generated. Since we have two outcomes and two predictors, we have a total of four models to analyze. Normally we would need to run each of the models separately. However, if we simply list the outcomes and the predictors in mason, it will ‘loop’ through each combination and run all four models! Let’s add the variables.

dp <- dp %>% 
    add_variables('yvars', c('Fertility', 'Infant.Mortality')) %>% 
    add_variables('xvars', c('Education', 'Agriculture'))
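To see what this ‘loop’ saves, here is a minimal base R sketch of fitting the same four unadjusted models by hand. It only illustrates the idea and is not mason’s internal code; the names combos, fit_one, and manual_fits are made up for this example.

```r
# Every outcome/predictor pairing: 2 outcomes x 2 predictors = 4 models.
outcomes <- c("Fertility", "Infant.Mortality")
predictors <- c("Education", "Agriculture")
combos <- expand.grid(yvar = outcomes, xvar = predictors,
                      stringsAsFactors = FALSE)

# Hypothetical helper: fit one Gaussian glm and return its coefficients.
fit_one <- function(yvar, xvar, data = swiss) {
    model <- glm(reformulate(xvar, response = yvar),
                 data = data, family = gaussian())
    coef(model)
}

# Fit one model per row of `combos` and label each by its formula.
manual_fits <- Map(fit_one, combos$yvar, combos$xvar)
names(manual_fits) <- paste(combos$yvar, combos$xvar, sep = " ~ ")
manual_fits[["Fertility ~ Education"]]
```

With mason, the two add_variables() calls above stand in for all of this bookkeeping.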

Alright, still nothing has happened. However, we are now at the phase where we can construct the analysis using construct().

dp <- construct(dp)
dp
#>              Fertility Agriculture Examination Education Catholic
#> Courtelary        80.2        17.0          15        12     9.96
#> Delemont          83.1        45.1           6         9    84.84
#> Franches-Mnt      92.5        39.7           5         5    93.40
#> Moutier           85.8        36.5          12         7    33.77
#> Neuveville        76.9        43.5          17        15     5.16
#> Porrentruy        76.1        35.3           9         7    90.57
#> Broye             83.8        70.2          16         7    92.85
#> Glane             92.4        67.8          14         8    97.16
#> Gruyere           82.4        53.3          12         7    97.67
#> Sarine            82.9        45.2          16        13    91.38
#> Veveyse           87.1        64.5          14         6    98.61
#> Aigle             64.1        62.0          21        12     8.52
#> Aubonne           66.9        67.5          14         7     2.27
#> Avenches          68.9        60.7          19        12     4.43
#> Cossonay          61.7        69.3          22         5     2.82
#> Echallens         68.3        72.6          18         2    24.20
#> Grandson          71.7        34.0          17         8     3.30
#> Lausanne          55.7        19.4          26        28    12.11
#> La Vallee         54.3        15.2          31        20     2.15
#> Lavaux            65.1        73.0          19         9     2.84
#> Morges            65.5        59.8          22        10     5.23
#> Moudon            65.0        55.1          14         3     4.52
#> Nyone             56.6        50.9          22        12    15.14
#> Orbe              57.4        54.1          20         6     4.20
#> Oron              72.5        71.2          12         1     2.40
#> Payerne           74.2        58.1          14         8     5.23
#> Paysd'enhaut      72.0        63.5           6         3     2.56
#> Rolle             60.5        60.8          16        10     7.72
#> Vevey             58.3        26.8          25        19    18.46
#> Yverdon           65.4        49.5          15         8     6.10
#> Conthey           75.5        85.9           3         2    99.71
#> Entremont         69.3        84.9           7         6    99.68
#> Herens            77.3        89.7           5         2   100.00
#> Martigwy          70.5        78.2          12         6    98.96
#> Monthey           79.4        64.9           7         3    98.22
#> St Maurice        65.0        75.9           9         9    99.06
#> Sierre            92.2        84.6           3         3    99.46
#> Sion              79.3        63.1          13        13    96.83
#> Boudry            70.4        38.4          26        12     5.62
#> La Chauxdfnd      65.7         7.7          29        11    13.79
#> Le Locle          72.7        16.7          22        13    11.22
#> Neuchatel         64.4        17.6          35        32    16.92
#> Val de Ruz        77.6        37.6          15         7     4.97
#> ValdeTravers      67.6        18.7          25         7     8.65
#> V. De Geneve      35.0         1.2          37        53    42.34
#> Rive Droite       44.7        46.6          16        29    50.43
#> Rive Gauche       42.8        27.7          22        29    58.33
#>              Infant.Mortality
#> Courtelary               22.2
#> Delemont                 22.2
#> Franches-Mnt             20.2
#> Moutier                  20.3
#> Neuveville               20.6
#> Porrentruy               26.6
#> Broye                    23.6
#> Glane                    24.9
#> Gruyere                  21.0
#> Sarine                   24.4
#> Veveyse                  24.5
#> Aigle                    16.5
#> Aubonne                  19.1
#> Avenches                 22.7
#> Cossonay                 18.7
#> Echallens                21.2
#> Grandson                 20.0
#> Lausanne                 20.2
#> La Vallee                10.8
#> Lavaux                   20.0
#> Morges                   18.0
#> Moudon                   22.4
#> Nyone                    16.7
#> Orbe                     15.3
#> Oron                     21.0
#> Payerne                  23.8
#> Paysd'enhaut             18.0
#> Rolle                    16.3
#> Vevey                    20.9
#> Yverdon                  22.5
#> Conthey                  15.1
#> Entremont                19.8
#> Herens                   18.3
#> Martigwy                 19.4
#> Monthey                  20.2
#> St Maurice               17.8
#> Sierre                   16.3
#> Sion                     18.1
#> Boudry                   20.3
#> La Chauxdfnd             20.5
#> Le Locle                 18.9
#> Neuchatel                23.0
#> Val de Ruz               20.0
#> ValdeTravers             19.5
#> V. De Geneve             18.0
#> Rive Droite              18.2
#> Rive Gauche              19.3

Cool! This is the unadjusted model, without any covariates. We said we wanted to adjust for Catholic. But let’s say we want to keep the unadjusted analysis too. Since we haven’t ‘finished’ the analysis by cleaning it up, we can still add to the blueprint.

dp2 <- dp %>%
    add_variables('covariates', 'Catholic') %>% 
    construct()
head(dp2)
#>              Fertility Agriculture Examination Education Catholic
#> Courtelary        80.2        17.0          15        12     9.96
#> Delemont          83.1        45.1           6         9    84.84
#> Franches-Mnt      92.5        39.7           5         5    93.40
#> Moutier           85.8        36.5          12         7    33.77
#> Neuveville        76.9        43.5          17        15     5.16
#> Porrentruy        76.1        35.3           9         7    90.57
#>              Infant.Mortality
#> Courtelary               22.2
#> Delemont                 22.2
#> Franches-Mnt             20.2
#> Moutier                  20.3
#> Neuveville               20.6
#> Porrentruy               26.6
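In formula terms, adding a covariate simply extends the right-hand side of each model. As a hedged base R sketch of one adjusted model (an illustration, not mason’s internals; adjusted_formula and adjusted_fit are names chosen for this example):

```r
# Adjusted model: the predictor plus the covariate on the right-hand side.
adjusted_formula <- reformulate(c("Education", "Catholic"),
                                response = "Fertility")
adjusted_fit <- glm(adjusted_formula, data = swiss, family = gaussian())

# Estimates with Wald confidence intervals. mason may compute its
# intervals differently, so the bounds need not match exactly.
cbind(estimate = coef(adjusted_fit), confint.default(adjusted_fit))
```

The unadjusted version of the same model drops "Catholic" from the first argument of reformulate().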

We now have two sets of models in the results: the unadjusted models and the models adjusted for Catholic. We’re happy with them, so let’s clean it up using the scrub() function.

dp_clean <- dp2 %>% 
    scrub()

All scrub() does is remove any extra specs in the attributes and set the results as the main dataset. You can see this by looking at its details and comparing them to the unscrubbed version.

colnames(dp2)
#> [1] "Fertility"        "Agriculture"      "Examination"     
#> [4] "Education"        "Catholic"         "Infant.Mortality"
colnames(dp_clean)
#>  [1] "Yterms"      "Xterms"      "term"        "estimate"    "std.error"  
#>  [6] "statistic"   "p.value"     "conf.low"    "conf.high"   "sample.size"
names(attributes(dp2))
#> [1] "names"     "class"     "row.names" "specs"
names(attributes(dp_clean))
#> [1] "names"     "class"     "row.names"
class(dp2)
#> [1] "glm_bp"     "bp"         "data.frame"
class(dp_clean)
#> [1] "tbl_df"     "tbl"        "data.frame"
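The same idea can be mimicked with plain data frame attributes. The toy below is only an analogy under the assumption that results travel inside a "specs" attribute; toy_bp, toy_scrub, and the layout of the list are hypothetical and do not describe how mason is actually implemented.

```r
# Toy stand-in for a blueprint: a data frame carrying an extra attribute.
toy_bp <- head(swiss)
attr(toy_bp, "specs") <- list(
    results = data.frame(term = "Education", estimate = -0.86)
)
names(attributes(toy_bp))

# Hypothetical scrub-like step: return the stored results on their own,
# leaving the original data and the "specs" attribute behind.
toy_scrub <- function(bp) {
    attr(bp, "specs")$results
}
toy_clean <- toy_scrub(toy_bp)
names(attributes(toy_clean))
colnames(toy_clean)
```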

And all as a single pipe chain:

swiss %>% 
    design('glm') %>% 
    add_settings() %>% 
    add_variables('yvars', c('Fertility', 'Infant.Mortality')) %>% 
    add_variables('xvars', c('Education', 'Agriculture')) %>% 
    construct() %>% 
    add_variables('covariates', 'Catholic') %>% 
    construct() %>% 
    scrub()
#> # A tibble: 20 x 10
#>              Yterms      Xterms        term     estimate  std.error
#>               <chr>       <chr>       <chr>        <dbl>      <dbl>
#> 1         Fertility Agriculture (Intercept) 60.304375228 4.25125562
#> 2         Fertility Agriculture     <-Xterm  0.194201749 0.07671176
#> 3         Fertility   Education (Intercept) 79.610058532 2.10409711
#> 4         Fertility   Education     <-Xterm -0.862350293 0.14484472
#> 5  Infant.Mortality Agriculture (Intercept) 20.337954766 1.05754318
#> 6  Infant.Mortality Agriculture     <-Xterm -0.007805071 0.01908283
#> 7  Infant.Mortality   Education (Intercept) 20.272865076 0.65272716
#> 8  Infant.Mortality   Education     <-Xterm -0.030086548 0.04493333
#> 9         Fertility Agriculture (Intercept) 59.863923712 3.98753957
#> 10        Fertility Agriculture     <-Xterm  0.109528109 0.07848208
#> 11        Fertility Agriculture    Catholic  0.114962125 0.04273900
#> 12        Fertility   Education (Intercept) 74.233689201 2.35197061
#> 13        Fertility   Education     <-Xterm -0.788329259 0.12929324
#> 14        Fertility   Education    Catholic  0.110920955 0.02980965
#> 15 Infant.Mortality Agriculture (Intercept) 20.274208854 1.04449977
#> 16 Infant.Mortality Agriculture     <-Xterm -0.020059765 0.02055767
#> 17 Infant.Mortality Agriculture    Catholic  0.016638302 0.01119509
#> 18 Infant.Mortality   Education (Intercept) 19.717357317 0.82539716
#> 19 Infant.Mortality   Education     <-Xterm -0.022438401 0.04537398
#> 20 Infant.Mortality   Education    Catholic  0.011460792 0.01046136
#> # ... with 5 more variables: statistic <dbl>, p.value <dbl>,
#> #   conf.low <dbl>, conf.high <dbl>, sample.size <int>

There are also additional polish_* type commands, which are essentially wrappers around operations you might otherwise perform on the results dataset yourself, such as filtering or renaming. The list of polish commands can be found in ?mason::polish.
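Because the arguments of the polish_* commands are not shown here, the following sketch uses base R only to show the kind of filtering and renaming they wrap. The rows are copied from the scrubbed results above; results and xterm_only are names chosen for this example.

```r
# A few unadjusted rows copied from the scrubbed results shown earlier.
results <- data.frame(
    Yterms = c("Fertility", "Fertility",
               "Infant.Mortality", "Infant.Mortality"),
    Xterms = c("Education", "Education", "Education", "Education"),
    term = c("(Intercept)", "<-Xterm", "(Intercept)", "<-Xterm"),
    estimate = c(79.610058532, -0.862350293, 20.272865076, -0.030086548),
    stringsAsFactors = FALSE
)

# Filtering: keep only the rows for the predictor of interest.
xterm_only <- results[results$term == "<-Xterm", ]

# Renaming: make the outcome labels easier to read in a table.
xterm_only$Yterms <- gsub(".", " ", xterm_only$Yterms, fixed = TRUE)
xterm_only
```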