Usage guidance

2021-11-07

Introduction

DescrTab2 is the successor to the DescrTab package. It supports many customization options and can be used in .Rmd files in conjunction with knitr.

Preamble settings

DescrTab2 works in the R console as well as in .Rmd documents rendered to the output formats pdf_document, html_document and word_document. It even supports YAML headers with multiple output formats! For example, if your YAML header looks like the one below, DescrTab2 should automagically detect the output format that matches the rendering option you choose from the drop-down menu (the arrow next to the “Knit” button in the top menu bar).

---
title: "DescrTab2 tutorial"
output:
  word_document: default
  pdf_document: default
  html_document: default
---

When rendering to PDF, the required LaTeX packages should also be loaded automatically.

Getting started

Make sure you load the DescrTab2 package by typing

library(DescrTab2)

somewhere in the document before you use it. You are now ready to go!

For instructive purposes, we will use the following dataset:

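library(dplyr)     # mutate(), select()
library(magrittr)  # the %>% and %<>% pipes

dat <- iris[, c("Species", "Sepal.Length")]
# "Mammal" and "Fish" alternate down the 150 rows
dat %<>% mutate(animal = c("Mammal", "Fish") %>% rep(75) %>% factor())
# "food" is drawn at random, so its counts will vary from run to run
dat %<>% mutate(food = c("fries", "wedges") %>% sample(150, TRUE) %>% factor())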
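If you prefer not to load dplyr and magrittr, you can build the same example data with base R. This is a minimal sketch. The name dat_base and the seed value are arbitrary choices for illustration. Because food is random, its counts will not match the tables in this guide.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

dat_base <- iris[, c("Species", "Sepal.Length")]
# Same pattern as rep(c("Mammal", "Fish"), 75): Mammal, Fish, Mammal, Fish, ...
dat_base$animal <- factor(rep(c("Mammal", "Fish"), 75))
# 150 draws with replacement from the two food options
dat_base$food <- factor(sample(c("fries", "wedges"), 150, replace = TRUE))

str(dat_base)
```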

Producing beautiful descriptive tables is now as easy as typing:

descr(dat)

| Variables | Total (N=150) | p |
|---|---|---|
| Species | | |
| setosa | 50 (33%) | >0.999 (Chi) |
| versicolor | 50 (33%) | |
| virginica | 50 (33%) | |
| Sepal.Length | | |
| N | 150 | <0.001 (Stu) |
| mean | 5.8 | |
| sd | 0.83 | |
| median | 5.8 | |
| Q1 - Q3 | 5.1 – 6.4 | |
| min - max | 4.3 – 7.9 | |
| animal | | |
| Fish | 75 (50%) | >0.999 (Chi) |
| Mammal | 75 (50%) | |
| food | | |
| fries | 75 (50%) | >0.999 (Chi) |
| wedges | 75 (50%) | |

Chi: Chi-squared goodness-of-fit test
Stu: Student’s one-sample t-test
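The following sketch recomputes the numbers in the Total column with base R, so you can see where they come from. It assumes that DescrTab2's defaults correspond to R's standard functions:

- quantile() with its default type 7 for Q1 and Q3
- a one-sample t-test against a mean of 0
- a goodness-of-fit test against equal proportions

The package's internal definitions may differ in details.

```r
x <- iris$Sepal.Length

# Summary statistics shown (rounded) in the Sepal.Length rows
summary_total <- c(
  N = length(x), mean = mean(x), sd = sd(x), median = median(x),
  Q1 = unname(quantile(x, 0.25)), Q3 = unname(quantile(x, 0.75)),
  min = min(x), max = max(x)
)
round(summary_total, 2)

# Student's one-sample t-test (null hypothesis: true mean = 0, t.test's default mu)
p_t <- t.test(x)$p.value

# Chi-squared goodness-of-fit test of Species against equal proportions
p_chi <- chisq.test(table(iris$Species))$p.value
c(p_t = p_t, p_chi = p_chi)
```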

Accessing table elements

The object returned by the descr function is essentially a named list. If you want to reference certain summary statistics from the table in your document, save the list returned by descr:

my_table <- descr(dat)

You can then access the elements of the list using the $ operator.

my_table$variables$Sepal.Length$results$Total$mean
#> NULL

RStudio's autocomplete suggestions are very helpful when navigating this list.

The print function returns a formatted version of this list, which you can also save and access using the same syntax.

my_table <- descr(dat) %>% print(silent=TRUE)
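The result is an ordinary nested list, so the usual R tools for exploring lists apply. The sketch below uses a small hypothetical mock_table that only imitates the nesting of the path used above. It is not the object that descr() returns, and that object's exact layout may differ between DescrTab2 versions.

Note that $ returns NULL for a name that does not exist at that level. A NULL result, as in the output above, therefore usually means that one link in the chain is named differently. Use names() or str() to reveal the actual names.

```r
# Hypothetical stand-in that mimics the nesting of my_table; not real descr() output
mock_table <- list(
  variables = list(
    Sepal.Length = list(
      results = list(Total = list(mean = 5.843333, sd = 0.8280661))
    )
  )
)

# names() shows which elements exist at a given level
names(mock_table$variables$Sepal.Length$results$Total)

# A $ chain and recursive [[ ]] indexing with a character path are equivalent
path <- c("variables", "Sepal.Length", "results", "Total", "mean")
via_dollar <- mock_table$variables$Sepal.Length$results$Total$mean
via_path <- mock_table[[path]]

# A nonexistent name silently yields NULL instead of an error
missing_value <- mock_table$variables$Sepal.Length$results$Total$median

# str() prints the overall structure, here limited to two levels
str(mock_table, max.level = 2)
```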

Specifying a group

Use the group option to specify the name of a grouping variable in your data:

descr(dat, "Species")

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
|---|---|---|---|---|---|
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F-t) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (Pea) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 32 (64%) | 22 (44%) | 21 (42%) | 75 (50%) | 0.052 (Pea) |
| wedges | 18 (36%) | 28 (56%) | 29 (58%) | 75 (50%) | |

F-t: F-test (ANOVA)
Pea: Pearson’s chi-squared test
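To cross-check the grouped table, this sketch recomputes the group means and the two kinds of p-values with base R. It makes two assumptions: the F-test (ANOVA) corresponds to a one-way ANOVA fitted with lm(), and Pearson's chi-squared test corresponds to chisq.test() on the contingency table. The sketch rebuilds animal exactly as in dat. It omits the food rows because that variable is random.

```r
# Group means of Sepal.Length (the table rounds them)
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
group_means

# One-way ANOVA F-test of Sepal.Length across species
p_anova <- anova(lm(Sepal.Length ~ Species, data = iris))[["Pr(>F)"]][1]

# Pearson's chi-squared test of animal versus Species
animal <- factor(rep(c("Mammal", "Fish"), 75))
p_pearson <- chisq.test(table(animal, iris$Species))$p.value
c(p_anova = p_anova, p_pearson = p_pearson)
```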

Assigning labels

Use the group_labels option to assign group labels and the var_labels option to assign variable labels:

descr(dat, "Species", group_labels=list(setosa="My custom group label"), var_labels = list(Sepal.Length = "My custom variable label"))

| Variables | My custom group label (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
|---|---|---|---|---|---|
| My custom variable label | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F-t) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (Pea) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 32 (64%) | 22 (44%) | 21 (42%) | 75 (50%) | 0.052 (Pea) |
| wedges | 18 (36%) | 28 (56%) | 29 (58%) | 75 (50%) | |

F-t: F-test (ANOVA)
Pea: Pearson’s chi-squared test
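These labels only change how the table is displayed. If you would rather change the data itself, one base-R alternative (not a DescrTab2 feature) is to rename the factor level directly before calling descr(). In the sketch, dat_lab is a hypothetical copy used for illustration.

```r
dat_lab <- iris[, c("Species", "Sepal.Length")]

# Rename only the "setosa" level; the other levels keep their names
lv <- levels(dat_lab$Species)
levels(dat_lab$Species)[lv == "setosa"] <- "My custom group label"

levels(dat_lab$Species)
table(dat_lab$Species)
```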

Assigning a table caption

Use the caption member of the format_options argument to assign a table caption:

descr(dat, "Species", format_options = list(caption="Description of our example dataset."))

Description of our example dataset.

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
|---|---|---|---|---|---|
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F-t) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (Pea) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 32 (64%) | 22 (44%) | 21 (42%) | 75 (50%) | 0.052 (Pea) |
| wedges | 18 (36%) | 28 (56%) | 29 (58%) | 75 (50%) | |

F-t: F-test (ANOVA)
Pea: Pearson’s chi-squared test

Confidence intervals for two group comparisons

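For two-group comparisons, DescrTab2 automatically calculates confidence intervals for differences in effect measures: a mean difference for continuous variables and a difference in proportions for binary categorical variables.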

descr(dat, "animal")

| Variables | Fish (N=75) | Mammal (N=75) | Total (N=150) | p | CI |
|---|---|---|---|---|---|
| Species | | | | | |
| setosa | 25 (33%) | 25 (33%) | 50 (33%) | >0.999 (Pea) | |
| versicolor | 25 (33%) | 25 (33%) | 50 (33%) | | |
| virginica | 25 (33%) | 25 (33%) | 50 (33%) | | |
| Sepal.Length | | | | | |
| N | 75 | 75 | 150 | 0.961 (Wel) | Mean dif. CI |
| mean | 5.8 | 5.8 | 5.8 | | [-0.26, 0.27] |
| sd | 0.86 | 0.81 | 0.83 | | |
| median | 5.7 | 5.8 | 5.8 | | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | | |
| food | | | | | |
| fries | 37 (49%) | 38 (51%) | 75 (50%) | 0.870 (Pea) | Prop. dif. CI |
| wedges | 38 (51%) | 37 (49%) | 75 (50%) | | [-0.17, 0.15] |

Pea: Pearson’s chi-squared test
Wel: Welch’s two-sample t-test
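The sketch below reproduces the idea behind the CI column with base R. It assumes that the Welch test corresponds to t.test() with its default var.equal = FALSE. For the difference in proportions it uses prop.test(), with the fries counts read off the table above. prop.test() computes a continuity-corrected Wald-type interval, which is not necessarily the method DescrTab2 uses, so its limits may differ from [-0.17, 0.15].

```r
sepal <- iris$Sepal.Length
animal <- factor(rep(c("Mammal", "Fish"), 75))  # same construction as in dat

# Welch's two-sample t-test; the CI is for mean(Fish) - mean(Mammal),
# i.e. first factor level minus second
welch <- t.test(sepal ~ animal)
welch$p.value
welch$conf.int

# Difference in the proportion of "fries" between the groups (37/75 vs. 38/75)
prop <- prop.test(x = c(37, 38), n = c(75, 75))
prop$estimate
prop$conf.int
```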

Different tests

Many different tests are available. See the test choice vignette for details: https://imbi-heidelberg.github.io/DescrTab2/articles/test_choice_tree_pdf.pdf

Here are some of them in action:

descr(dat %>% select(-"Species"), "animal", test_options = list(exact=TRUE, nonparametric=TRUE))

| Variables | Fish (N=75) | Mammal (N=75) | Total (N=150) | p | CI |
|---|---|---|---|---|---|
| Sepal.Length | | | | | |
| N | 75 | 75 | 150 | 0.870 (Man) | HL CI |
| mean | 5.8 | 5.8 | 5.8 | | [-0.3, 0.3] |
| sd | 0.86 | 0.81 | 0.83 | | |
| median | 5.7 | 5.8 | 5.8 | | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | | |
| food | | | | | |
| fries | 37 (49%) | 38 (51%) | 75 (50%) | 0.935 (Bos) | Prop. dif. CI |
| wedges | 38 (51%) | 37 (49%) | 75 (50%) | | [-0.17, 0.15] |

Man: Mann-Whitney’s U test
Bos: Boschloo’s test

descr(dat %>% select(c("Species", "Sepal.Length")), "Species", test_options = list(nonparametric=TRUE))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
|---|---|---|---|---|---|
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (Kru) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |

Kru: Kruskal-Wallis’s one-way ANOVA
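Base R's stats package offers counterparts for two of these tests, sketched below. The Hodges-Lehmann (HL) estimate and CI come from wilcox.test(conf.int = TRUE). Sepal.Length contains ties, so this sketch uses the normal approximation (exact = FALSE). Its p-value can therefore differ slightly from the exact one in the table above. Boschloo's test is not part of base R.

```r
sepal <- iris$Sepal.Length
animal <- factor(rep(c("Mammal", "Fish"), 75))

# Mann-Whitney U test plus Hodges-Lehmann location-shift estimate and CI
mw <- wilcox.test(sepal ~ animal, conf.int = TRUE, exact = FALSE)
mw$p.value
mw$estimate
mw$conf.int

# Kruskal-Wallis test of Sepal.Length across the three species
p_kw <- kruskal.test(Sepal.Length ~ Species, data = iris)$p.value
p_kw
```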

Paired observations

With paired data, the group variable usually denotes the time of measurement (e.g. “before” and “after”, or “time 1”, “time 2”, etc.). In these scenarios, you need an additional index variable that specifies which observations from the different time points belong together.

Specify the pairing with the option test_options = list(paired = TRUE, indices = <name of the index variable as a character string, or a vector of indices>), as shown in the examples below.

DescrTab2 only works with data in “long format”. See e.g. ?reshape or ?tidyr::pivot_longer for how to transform your data from wide to long format.

library(forcats)  # fct_recode()

descr(dat %>% mutate(animal = fct_recode(animal, Before = "Fish", After = "Mammal")) %>% select(-"Species"),
      "animal",
      test_options = list(paired = TRUE, indices = rep(1:75, each = 2)))
#> You specified paired tests and did not explicitly
#> specify format_options$print_Total. print_Total is set to FALSE.
#> Warning in sig_test(var, group, test_options, test_override, var_name): Confidence intervals for differences in proportions ignore the paired structure of the data.
#> Use Exact McNemar's test if you want confidence intervals which use the test statistic of the
#> exact McNemar's test.

| Variables | Before (N=75) | After (N=75) | p | CI |
|---|---|---|---|---|
| Sepal.Length | | | | |
| N | 75 | 75 | 0.937 (Stu) | Mean dif. CI |
| mean | 5.8 | 5.8 | | [-0.16, 0.18] |
| sd | 0.86 | 0.81 | | |
| median | 5.7 | 5.8 | | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | | |
| food | | | | |
| fries | 37 (49%) | 38 (51%) | >0.999 (McN) | Prop. dif. CI |
| wedges | 38 (51%) | 37 (49%) | | [-0.17, 0.15] |

Stu: Student’s paired t-test
McN: McNemar’s test

descr(dat %>% mutate(animal = fct_recode(animal, Before="Fish", After="Mammal"), idx = rep(1:75, each=2)) %>% select(-"Species"), "animal", test_options = list(paired=TRUE, indices="idx" ))
#> You specified paired tests and did not explicitly
#> specify format_options$print_Total. print_Total is set to FALSE.
#> Warning in sig_test(var, group, test_options, test_override, var_name): Confidence intervals for differences in proportions ignore the paired structure of the data.
#> Use Exact McNemar's test if you want confidence intervals which use the test statistic of the
#> exact McNemar's test.

| Variables | Before (N=75) | After (N=75) | p | CI |
|---|---|---|---|---|
| Sepal.Length | | | | |
| N | 75 | 75 | 0.937 (Stu) | Mean dif. CI |
| mean | 5.8 | 5.8 | | [-0.16, 0.18] |
| sd | 0.86 | 0.81 | | |
| median | 5.7 | 5.8 | | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | | |
| food | | | | |
| fries | 37 (49%) | 38 (51%) | >0.999 (McN) | Prop. dif. CI |
| wedges | 38 (51%) | 37 (49%) | | [-0.17, 0.15] |

Stu: Student’s paired t-test
McN: McNemar’s test
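The pairing matches each “Before” value with the “After” value that shares its index. The base-R sketch below makes this explicit in three steps:

1. It rebuilds the recoded data. Odd rows are “After” (Mammal) and even rows are “Before” (Fish).
2. It reshapes the data from long to wide with reshape().
3. It runs a paired t-test.

The names long and wide are illustrative. The sketch assumes that DescrTab2's Student's paired t-test matches t.test(paired = TRUE).

```r
# Long format: one row per measurement, idx identifies the pair
long <- data.frame(
  idx = rep(1:75, each = 2),
  time = rep(c("After", "Before"), 75),
  Sepal.Length = iris$Sepal.Length
)

# Long -> wide: one row per idx, with columns Sepal.Length.After and Sepal.Length.Before
wide <- reshape(long, idvar = "idx", timevar = "time",
                v.names = "Sepal.Length", direction = "wide")
head(wide, 3)

# Paired t-test on the within-pair differences
p_paired <- t.test(wide$Sepal.Length.Before, wide$Sepal.Length.After,
                   paired = TRUE)$p.value
p_paired
```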

Significant digits

Every summary statistic in DescrTab2 is formatted by a corresponding formatting function. You can replace these formatting functions as you please, e.g. to show the mean with four significant digits:

descr(dat, "Species", format_summary_stats = list(mean=function(x)formatC(x, digits = 4)) )

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
|---|---|---|---|---|---|
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F-t) |
| mean | 5.006 | 5.936 | 6.588 | 5.843 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (Pea) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 32 (64%) | 22 (44%) | 21 (42%) | 75 (50%) | 0.052 (Pea) |
| wedges | 18 (36%) | 28 (56%) | 29 (58%) | 75 (50%) | |

F-t: F-test (ANOVA)
Pea: Pearson’s chi-squared test
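A formatting function is just an R function that turns a number into a character string. The sketch below contrasts two options:

- formatC(digits = 4), which in its default "g" format keeps four significant digits
- a fixed two-decimal alternative built on sprintf()

The helper names are hypothetical. Either function could presumably be passed as, e.g., format_summary_stats = list(mean = fmt_2_decimals), but this guide does not show that call.

```r
# Four significant digits, as in the example above ("g" format)
fmt_4_significant <- function(x) formatC(x, digits = 4)
# Always exactly two decimal places
fmt_2_decimals <- function(x) sprintf("%.2f", x)

overall_mean <- mean(iris$Sepal.Length)  # 5.843333...
fmt_4_significant(overall_mean)
fmt_2_decimals(overall_mean)
fmt_4_significant(c(5.006, 5.936, 6.588))
```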

Omitting summary statistics

Suppose you don’t want to calculate quantiles for your numeric variables. Use the summary_stats_cont option to include all summary statistics except the quantiles:

descr(dat, "Species", summary_stats_cont = list(
  N = DescrTab2:::.N, Nmiss = DescrTab2:::.Nmiss,
  mean = DescrTab2:::.mean, sd = DescrTab2:::.sd,
  median = DescrTab2:::.median,
  min = DescrTab2:::.min, max = DescrTab2:::.max
))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
|---|---|---|---|---|---|
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F-t) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (Pea) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 32 (64%) | 22 (44%) | 21 (42%) | 75 (50%) | 0.052 (Pea) |
| wedges | 18 (36%) | 28 (56%) | 29 (58%) | 75 (50%) | |

F-t: F-test (ANOVA)
Pea: Pearson’s chi-squared test
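Conceptually, summary_stats_cont is a named list of functions. Each function maps a numeric vector to one statistic, and leaving out entries removes the corresponding rows. The sketch below imitates this idea with hypothetical base-R stand-ins for DescrTab2's internal functions such as DescrTab2:::.mean. The exact definitions of those internal functions may differ, for example in how they handle missing values.

```r
x <- iris$Sepal.Length[iris$Species == "setosa"]

# Hypothetical stand-ins for DescrTab2's internal summary functions (no quantiles)
stats_no_quantiles <- list(
  N = function(v) sum(!is.na(v)),
  Nmiss = function(v) sum(is.na(v)),
  mean = function(v) mean(v, na.rm = TRUE),
  sd = function(v) sd(v, na.rm = TRUE),
  median = function(v) median(v, na.rm = TRUE),
  min = function(v) min(v, na.rm = TRUE),
  max = function(v) max(v, na.rm = TRUE)
)

# Apply every function to the same vector; the result is a named numeric vector
setosa_stats <- sapply(stats_no_quantiles, function(f) f(x))
setosa_stats
```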

Adding summary statistics

Suppose you have a categorical variable whose levels are numerals, and for some reason you want to calculate its mean. No problem:

# Create example dataset
dat2 <- iris
dat2$cat_var <- c(1,2) %>% sample(150, TRUE) %>% factor()
dat2 <- dat2[, c("Species", "cat_var")]

descr(dat2, "Species", summary_stats_cat=list(mean=DescrTab2:::.factormean))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
|---|---|---|---|---|---|
| cat_var | | | | | |
| 1 | 24 (48%) | 21 (42%) | 25 (50%) | 70 (47%) | 0.706 (Pea) |
| 2 | 26 (52%) | 29 (58%) | 25 (50%) | 80 (53%) | |
| mean | 1.5 | 1.6 | 1.5 | 1.5 | |

Pea: Pearson’s chi-squared test
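Computing the mean of a factor needs care in R, because as.numeric() on a factor returns the internal level codes rather than the numbers printed as levels. The sketch below shows the general technique with a hypothetical factor f. It does not show how DescrTab2:::.factormean handles this internally, which we have not checked. In cat_var above, the levels are 1 and 2, so codes and values happen to coincide, but they diverge for levels such as 2 and 10.

```r
f <- factor(c(2, 2, 10))

# Internal codes: 1, 1, 2, not the values 2, 2, 10
codes <- as.numeric(f)

# Convert via the labels to recover the actual values
mean_via_labels <- mean(as.numeric(as.character(f)))
# Equivalent and cheaper for long factors: convert each level once, then index by code
mean_via_levels <- mean(as.numeric(levels(f))[f])
c(mean(codes), mean_via_labels, mean_via_levels)
```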

Combining mean and sd

Use the format_options = list(combine_mean_sd=TRUE) option:

descr(dat, "Species", format_options = list(combine_mean_sd = TRUE))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
|---|---|---|---|---|---|
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F-t) |
| mean ± sd | 5 ± 0.35 | 5.9 ± 0.52 | 6.6 ± 0.64 | 5.8 ± 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (Pea) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 32 (64%) | 22 (44%) | 21 (42%) | 75 (50%) | 0.052 (Pea) |
| wedges | 18 (36%) | 28 (56%) | 29 (58%) | 75 (50%) | |

F-t: F-test (ANOVA)
Pea: Pearson’s chi-squared test
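The combined row is simply the two formatted statistics joined by “±”. The sketch below rebuilds the group entries shown above. It assumes that the default formatting keeps two significant digits, which is what formatC(digits = 2) does. The table suggests this assumption, but the package documentation quoted here does not confirm it.

```r
sepal_by_species <- split(iris$Sepal.Length, iris$Species)
m <- sapply(sepal_by_species, mean)
s <- sapply(sepal_by_species, sd)

# Two significant digits each, joined with a plus-minus sign (Unicode U+00B1)
mean_sd <- paste(formatC(m, digits = 2), "\u00b1", formatC(s, digits = 2))
names(mean_sd) <- names(m)
mean_sd
```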

Omitting p-values

Set format_options = list(print_p = FALSE) to omit p-values:

descr(dat, "animal", format_options = list(print_p = FALSE))

| Variables | Fish (N=75) | Mammal (N=75) | Total (N=150) | CI |
|---|---|---|---|---|
| Species | | | | |
| setosa | 25 (33%) | 25 (33%) | 50 (33%) | |
| versicolor | 25 (33%) | 25 (33%) | 50 (33%) | |
| virginica | 25 (33%) | 25 (33%) | 50 (33%) | |
| Sepal.Length | | | | |
| N | 75 | 75 | 150 | Mean dif. CI |
| mean | 5.8 | 5.8 | 5.8 | [-0.26, 0.27] |
| sd | 0.86 | 0.81 | 0.83 | |
| median | 5.7 | 5.8 | 5.8 | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | |
| food | | | | |
| fries | 37 (49%) | 38 (51%) | 75 (50%) | Prop. dif. CI |
| wedges | 38 (51%) | 37 (49%) | 75 (50%) | [-0.17, 0.15] |

Similarly, set format_options = list(print_CI = FALSE) to omit confidence intervals:

descr(dat, "animal", format_options = list(print_CI = FALSE))

| Variables | Fish (N=75) | Mammal (N=75) | Total (N=150) | p |
|---|---|---|---|---|
| Species | | | | |
| setosa | 25 (33%) | 25 (33%) | 50 (33%) | >0.999 (Pea) |
| versicolor | 25 (33%) | 25 (33%) | 50 (33%) | |
| virginica | 25 (33%) | 25 (33%) | 50 (33%) | |
| Sepal.Length | | | | |
| N | 75 | 75 | 150 | 0.961 (Wel) |
| mean | 5.8 | 5.8 | 5.8 | |
| sd | 0.86 | 0.81 | 0.83 | |
| median | 5.7 | 5.8 | 5.8 | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | |
| food | | | | |
| fries | 37 (49%) | 38 (51%) | 75 (50%) | 0.870 (Pea) |
| wedges | 38 (51%) | 37 (49%) | 75 (50%) | |

Pea: Pearson’s chi-squared test
Wel: Welch’s two-sample t-test

Controlling options on a per-variable level

You can use the var_options list to control formatting and test options on a per-variable basis. Suppose that in the iris dataset, only the Sepal.Length variable should show more digits for the mean and be tested with a nonparametric test:

descr(iris, "Species", var_options = list(Sepal.Length = list(
  format_summary_stats = list(
    mean = function(x)
      formatC(x, digits = 4)
  ),
  test_options = list(nonparametric = TRUE)
)))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
|---|---|---|---|---|---|
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (Kru) |
| mean | 5.006 | 5.936 | 6.588 | 5.843 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| Sepal.Width | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F-t) |
| mean | 3.4 | 2.8 | 3 | 3.1 | |
| sd | 0.38 | 0.31 | 0.32 | 0.44 | |
| median | 3.4 | 2.8 | 3 | 3 | |
| Q1 - Q3 | 3.2 – 3.7 | 2.5 – 3 | 2.8 – 3.2 | 2.8 – 3.3 | |
| min - max | 2.3 – 4.4 | 2 – 3.4 | 2.2 – 3.8 | 2 – 4.4 | |
| Petal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F-t) |
| mean | 1.5 | 4.3 | 5.6 | 3.8 | |
| sd | 0.17 | 0.47 | 0.55 | 1.8 | |
| median | 1.5 | 4.3 | 5.5 | 4.3 | |
| Q1 - Q3 | 1.4 – 1.6 | 4 – 4.6 | 5.1 – 5.9 | 1.6 – 5.1 | |
| min - max | 1 – 1.9 | 3 – 5.1 | 4.5 – 6.9 | 1 – 6.9 | |
| Petal.Width | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F-t) |
| mean | 0.25 | 1.3 | 2 | 1.2 | |
| sd | 0.11 | 0.2 | 0.27 | 0.76 | |
| median | 0.2 | 1.3 | 2 | 1.3 | |
| Q1 - Q3 | 0.2 – 0.3 | 1.2 – 1.5 | 1.8 – 2.3 | 0.3 – 1.8 | |
| min - max | 0.1 – 0.6 | 1 – 1.8 | 1.4 – 2.5 | 0.1 – 2.5 | |

Kru: Kruskal-Wallis’s one-way ANOVA
F-t: F-test (ANOVA)
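One way to think about per-variable options is as overrides layered on top of the global settings. The base-R sketch below illustrates that idea with modifyList(). The names global_test_options, var_overrides and effective_options are hypothetical. DescrTab2's actual rules for combining var_options with the global arguments may differ.

```r
# Hypothetical global settings and per-variable overrides
global_test_options <- list(nonparametric = FALSE, exact = FALSE)
var_overrides <- list(Sepal.Length = list(nonparametric = TRUE))

# Settings in effect for one variable: the global list, updated by its overrides (if any)
effective_options <- function(var_name) {
  override <- var_overrides[[var_name]]  # NULL if the variable has no entry
  if (is.null(override)) global_test_options else modifyList(global_test_options, override)
}

str(effective_options("Sepal.Length"))
str(effective_options("Sepal.Width"))
```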

Using user-defined test statistics

DescrTab2 has many predefined significance tests, but sometimes you may need a custom test. In this case, use the test_override option in test_options (or as part of the per-variable options, see above):

custom_ttest <- list(
  name = "custom t-test",
  abbreviation = "custom",
  p = function(var) {
    return(t.test(var, alternative = "greater")$p.value)
  }
)

descr(iris %>% select(-Species), test_options = list(test_override = custom_ttest))
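
| Variables | Total (N=150) | p |
|---|---|---|
| Sepal.Length | | |
| N | 150 | <0.001 (cus) |
| mean | 5.8 | |
| sd | 0.83 | |
| median | 5.8 | |
| Q1 - Q3 | 5.1 – 6.4 | |
| min - max | 4.3 – 7.9 | |
| Sepal.Width | | |
| N | 150 | <0.001 (cus) |
| mean | 3.1 | |
| sd | 0.44 | |
| median | 3 | |
| Q1 - Q3 | 2.8 – 3.3 | |
| min - max | 2 – 4.4 | |
| Petal.Length | | |
| N | 150 | <0.001 (cus) |
| mean | 3.8 | |
| sd | 1.8 | |
| median | 4.3 | |
| Q1 - Q3 | 1.6 – 5.1 | |
| min - max | 1 – 6.9 | |
| Petal.Width | | |
| N | 150 | <0.001 (cus) |
| mean | 1.2 | |
| sd | 0.76 | |
| median | 1.3 | |
| Q1 - Q3 | 0.3 – 1.8 | |
| min - max | 0.1 – 2.5 | |

cus: custom t-test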
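Because the p element is an ordinary R function, you can check it on a single variable before handing it to descr(). The sketch below does this for the ungrouped case shown above. With the one-sided alternative = "greater", a sample with a clearly positive mean gets a tiny p-value, and the same data with the sign flipped gets a p-value close to 1. This sketch does not cover how DescrTab2 calls p for grouped data.

```r
custom_ttest <- list(
  name = "custom t-test",
  abbreviation = "custom",
  # One-sided one-sample t-test of H0: mean = 0 against H1: mean > 0
  p = function(var) {
    t.test(var, alternative = "greater")$p.value
  }
)

p_positive <- custom_ttest$p(iris$Sepal.Length)   # mean clearly above 0
p_flipped <- custom_ttest$p(-iris$Sepal.Length)   # mean clearly below 0
c(p_positive = p_positive, p_flipped = p_flipped)
```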