This vignette is meant for those who wish to contribute to {gtsummary}, and for users who wish to understand the inner workings of a {gtsummary} object so they can more easily modify it to suit their own needs. If this does not describe you, please refer to the {gtsummary} website for an introduction to the package’s functions and tutorials on advanced use.
All {gtsummary} objects share a few characteristics. Here, we review those characteristics and provide instructions on how to construct a {gtsummary} object.
Every {gtsummary} object is a list comprising, at minimum, these elements: .$table_body, .$table_header, .$gt_calls, .$kable_calls, and .$fmt_fun.
The .$table_body object is the data frame that will ultimately be printed as the output. The table must include columns "label", "row_type", and "variable". The "label" column is printed, and the other two are hidden from the final output.
tbl_summary_ex$table_body
#> # A tibble: 8 x 5
#> variable row_type label stat_1 stat_2
#> <chr> <chr> <chr> <chr> <chr>
#> 1 age label Age, yrs 46 (37, 59) 48 (39, 56)
#> 2 age missing Unknown 7 4
#> 3 grade label Grade <NA> <NA>
#> 4 grade level I 35 (36%) 33 (32%)
#> 5 grade level II 32 (33%) 36 (35%)
#> 6 grade level III 31 (32%) 33 (32%)
#> 7 response label Tumor Response 28 (29%) 33 (34%)
#> 8 response missing Unknown 3 4
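As a minimal sketch of the requirement above, the following base R code builds a small data frame in the shape of .$table_body and checks for the three required columns. The helper name check_table_body() and the example values are hypothetical; they are not part of {gtsummary}.

```r
# Hypothetical helper (not part of {gtsummary}): verify that a data frame
# has the three columns every .$table_body is required to have.
check_table_body <- function(table_body) {
  required <- c("variable", "row_type", "label")
  missing_cols <- setdiff(required, names(table_body))
  if (length(missing_cols) > 0) {
    stop("table_body is missing: ", paste(missing_cols, collapse = ", "))
  }
  invisible(table_body)
}

# Illustrative values only, loosely following the output above
table_body <- data.frame(
  variable = c("age", "age"),
  row_type = c("label", "missing"),
  label    = c("Age, yrs", "Unknown"),
  stat_1   = c("46 (37, 59)", "7"),
  stringsAsFactors = FALSE
)

# Returns the data frame invisibly when all required columns are present
check_table_body(table_body)
```

A check like this could be run at the top of a new table-building function, before any header or call information is attached.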
The .$table_header object is a data frame containing information about each of the columns in .$table_body (one row per column in .$table_body). The table header has the following columns:
Column | Description |
---|---|
column | Column name from table_body |
label | Label that will be displayed (if column is displayed in output) |
hide | Logical indicating whether the column is hidden in the output |
text_interpret | The {gt} function that is used to interpret the column label |
fmt_fun | If the column needs to be formatted, this list column contains the function that performs the formatting. Note that this is the function object, not the character name of a function. |
bold | For columns whose rows are bolded conditionally, the threshold below which values are bolded. The most common use is to bold p-values below a threshold. |
footnote_abbrev | Lists the abbreviation footnotes for a table. All abbreviation footnotes are collated into a single footnote. For example, ‘OR = Odds Ratio’ and ‘CI = Confidence Interval’ appear in a single footnote. |
footnote | Lists the footnotes that will appear for each column. Duplicate entries appear only once. |
tbl_regression_ex$table_header
#> # A tibble: 11 x 8
#> column label hide text_interpret fmt_fun bold footnote_abbrev footnote
#> <chr> <chr> <lgl> <chr> <list> <dbl> <list> <list>
#> 1 variab~ variable TRUE gt::md <NULL> NA <NULL> <NULL>
#> 2 var_ty~ var_type TRUE gt::md <NULL> NA <NULL> <NULL>
#> 3 row_ref row_ref TRUE gt::md <NULL> NA <NULL> <NULL>
#> 4 row_ty~ row_type TRUE gt::md <NULL> NA <NULL> <NULL>
#> 5 label **Charac~ FALSE gt::md <NULL> NA <NULL> <NULL>
#> 6 N N TRUE gt::md <NULL> NA <NULL> <NULL>
#> 7 estima~ **Beta** FALSE gt::md <fn> NA <NULL> <NULL>
#> 8 conf.l~ conf.low TRUE gt::md <fn> NA <NULL> <NULL>
#> 9 conf.h~ conf.high TRUE gt::md <fn> NA <NULL> <NULL>
#> 10 ci **95% CI~ FALSE gt::md <NULL> NA <chr [1]> <NULL>
#> 11 p.value **p-valu~ FALSE gt::md <fn> 0.5 <NULL> <NULL>
The .$gt_calls object is a list of {gt} calls saved as strings (this may be updated to expressions at some point). Every {gt} function is referenced with the double colon, ::. The calls are executed in the order they appear in the list, and always begin with the gt::gt() call.
tbl_regression_ex$gt_calls
#> $gt
#> gt::gt(data = x$table_body)
#>
#> $cols_align
#> gt::cols_align(align = 'center') %>% gt::cols_align(align = 'left', columns = gt::vars(label))
#>
#> $fmt_missing
#> gt::fmt_missing(columns = gt::everything(), missing_text = '')
#>
#> $fmt_missing_ref
#> gt::fmt_missing(columns = gt::vars(estimate, ci), rows = row_ref == TRUE, missing_text = '---')
#>
#> $tab_style_text_indent
#> gt::tab_style(style = gt::cell_text(indent = gt::px(10), align = 'left'),locations = gt::cells_body(columns = gt::vars(label), rows = row_type != 'label'))
#>
#> $cols_label
#> gt::cols_label(label = gt::md("**Characteristic**"), estimate = gt::md("**Beta**"), ci = gt::md("**95% CI**"), p.value = gt::md("**p-value**"))
#>
#> $cols_hide
#> gt::cols_hide(columns = gt::vars(variable, var_type, row_ref, row_type, N, conf.low, conf.high))
#>
#> $fmt
#> gt::fmt(columns = gt::vars(estimate), rows = !is.na(estimate), fns = x$fmt_fun$estimate) %>% gt::fmt(columns = gt::vars(conf.low), rows = !is.na(conf.low), fns = x$fmt_fun$conf.low) %>% gt::fmt(columns = gt::vars(conf.high), rows = !is.na(conf.high), fns = x$fmt_fun$conf.high) %>% gt::fmt(columns = gt::vars(p.value), rows = !is.na(p.value), fns = x$fmt_fun$p.value) %>% gt::tab_style(style = gt::cell_text(weight = 'bold'), locations = gt::cells_body(columns = gt::vars(p.value), rows = p.value <= 0.5))
#>
#> $tab_footnote
#> gt::tab_footnote(footnote = 'CI = Confidence Interval', locations = gt::cells_column_labels(columns = gt::vars(ci)))
The .$kable_calls object is a list of data frame manipulation calls saved as strings (this may be updated later to be expressions). The calls are executed in the order they appear in the list.
tbl_regression_ex$kable_calls
#> $kable
#> x$table_body
#>
#> $fmt
#> dplyr::mutate(estimate = x$fmt_fun$estimate(estimate)) %>% dplyr::mutate(conf.low = x$fmt_fun$conf.low(conf.low)) %>% dplyr::mutate(conf.high = x$fmt_fun$conf.high(conf.high)) %>% dplyr::mutate(p.value = dplyr::case_when(p.value <= 0.5 ~ paste0('__', x$fmt_fun$p.value(p.value), '__'), TRUE ~ x$fmt_fun$p.value(p.value)))
#>
#> $fmt_missing_ref
#> dplyr::mutate_at(dplyr::vars(estimate, conf.low), ~ dplyr::case_when(row_ref == TRUE ~ '---', TRUE ~ .))
#>
#> $cols_hide
#> dplyr::select(-c("variable", "var_type", "row_ref", "row_type", "N", "conf.low", "conf.high"))
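The idea of storing calls as strings and executing them in list order can be sketched in base R as follows. This is an illustration under stated assumptions, not the {gtsummary} implementation: the helper run_calls() is hypothetical, and each string refers to the running result through a placeholder named tbl instead of being chained with the pipe, so that the example has no package dependencies.

```r
# Hypothetical helper: evaluate a list of call strings in order.
# `x` is the object the first call refers to; `tbl` holds the running result.
run_calls <- function(x, calls) {
  tbl <- NULL
  for (call_string in calls) {
    # parse() turns the string into an expression; eval() runs it in this
    # function's environment, where both `x` and `tbl` are visible
    tbl <- eval(parse(text = call_string))
  }
  tbl
}

# Toy object with only a table_body (illustrative values)
x <- list(
  table_body = data.frame(
    variable = "marker",
    row_type = "label",
    label    = "Marker Level",
    p.value  = 0.0321,
    stringsAsFactors = FALSE
  )
)

# Calls saved as strings; the first one always starts from x$table_body
calls <- list(
  start     = "x$table_body",
  fmt       = "transform(tbl, p.value = sprintf('%.3f', p.value))",
  cols_hide = "tbl[, setdiff(names(tbl), c('variable', 'row_type')), drop = FALSE]"
)

result <- run_calls(x, calls)
```

Because the calls are plain strings in a named list, an individual step can be replaced or removed by name before the list is evaluated.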
.$fmt_fun is a named list. If a formatting function is applied to a column in .$table_body, the function is saved in this list. The names of the list are the names of the columns of .$table_body. For example, the "p.value" column is often styled with style_pvalue(). In this case, .$fmt_fun$p.value = style_pvalue.
The list is generated from .$table_header.
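One way such a named list could be derived from a header table is sketched below in base R. The toy table_header and the two formatting functions are made up for illustration (they stand in for functions such as style_pvalue()), and the extraction logic is an assumption about the general approach, not the package's own code.

```r
# Toy table_header with a list column holding function objects (not names)
table_header <- data.frame(
  column = c("label", "estimate", "p.value"),
  stringsAsFactors = FALSE
)
table_header$fmt_fun <- list(
  NULL,                                           # "label" is not formatted
  function(x) format(round(x, 2), nsmall = 2),    # illustrative estimate formatter
  function(x) ifelse(x < 0.001, "<0.001", format(round(x, 3), nsmall = 3))
)

# Keep only the columns that have a formatting function, and name each
# list element after the column it formats
has_fun <- !vapply(table_header$fmt_fun, is.null, logical(1))
fmt_fun <- stats::setNames(
  table_header$fmt_fun[has_fun],
  table_header$column[has_fun]
)

# Functions are then available by column name
fmt_fun$estimate(1.234)
```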
When constructing a {gtsummary} object, the author will begin with the .$table_body object. Recall that the .$table_body data frame must include columns "label", "row_type", and "variable". Of these columns, only the "label" column will be printed with the final results. The "row_type" column typically controls whether or not the label column is indented. The "variable" column is often used in the inline_text() family of functions to select the rows to print in the body of an R markdown document.
tbl_regression_ex %>%
  pluck("table_body") %>%
  select(variable, row_type, label)
#> # A tibble: 5 x 3
#> variable row_type label
#> <chr> <chr> <chr>
#> 1 grade label Grade
#> 2 grade level I
#> 3 grade level II
#> 4 grade level III
#> 5 marker label Marker Level, ng/mL
The other columns in .$table_body are created by the user and are likely printed in the output. Formatting instructions for these columns are stored in .$table_header.
The .$table_header has one row for every column in .$table_body, with instructions on how to format each column, the column headers, and more. There are a few internal {gtsummary} functions to assist in constructing and modifying a .$table_header data frame.
First is the table_header_fill_missing() function. This function ensures .$table_header contains a row for every column of .$table_body. If the row for a column does not exist, it is added and populated with appropriate default values.
gtsummary:::table_header_fill_missing(
  table_header = tibble(column = names(tbl_regression_ex$table_body))
)
#> # A tibble: 11 x 8
#> column label hide text_interpret fmt_fun bold footnote_abbrev footnote
#> <chr> <chr> <lgl> <chr> <list> <dbl> <list> <list>
#> 1 variable variab~ TRUE gt::md <NULL> NA <NULL> <NULL>
#> 2 var_type var_ty~ TRUE gt::md <NULL> NA <NULL> <NULL>
#> 3 row_ref row_ref TRUE gt::md <NULL> NA <NULL> <NULL>
#> 4 row_type row_ty~ TRUE gt::md <NULL> NA <NULL> <NULL>
#> 5 label label TRUE gt::md <NULL> NA <NULL> <NULL>
#> 6 N N TRUE gt::md <NULL> NA <NULL> <NULL>
#> 7 estimate estima~ TRUE gt::md <NULL> NA <NULL> <NULL>
#> 8 conf.low conf.l~ TRUE gt::md <NULL> NA <NULL> <NULL>
#> 9 conf.high conf.h~ TRUE gt::md <NULL> NA <NULL> <NULL>
#> 10 ci ci TRUE gt::md <NULL> NA <NULL> <NULL>
#> 11 p.value p.value TRUE gt::md <NULL> NA <NULL> <NULL>
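The fill-in behavior can be approximated in base R as shown below. The function fill_missing_header() is a hypothetical stand-in written for this illustration. It assumes the incoming header already has the four columns column, label, hide, and text_interpret, and it only covers those four; the defaults mirror the ones visible in the output above.

```r
# Hypothetical sketch: add a default header row for every table_body column
# that does not yet have one. Not the {gtsummary} internal function.
fill_missing_header <- function(table_header, table_body) {
  new_cols <- setdiff(names(table_body), table_header$column)
  if (length(new_cols) > 0) {
    defaults <- data.frame(
      column         = new_cols,
      label          = new_cols,   # default label is the column name
      hide           = TRUE,       # new columns start hidden
      text_interpret = "gt::md",
      stringsAsFactors = FALSE
    )
    table_header <- rbind(table_header, defaults)
  }
  # Keep rows in the same order as the columns of table_body
  table_header[match(names(table_body), table_header$column), , drop = FALSE]
}

table_body <- data.frame(
  variable = "age", row_type = "label", label = "Age", stat_1 = "46",
  stringsAsFactors = FALSE
)
table_header <- data.frame(
  column = "label", label = "**Characteristic**", hide = FALSE,
  text_interpret = "gt::md", stringsAsFactors = FALSE
)

filled <- fill_missing_header(table_header, table_body)
```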
The modify_header_internal() function is useful for assigning column headers. The function accepts a complete {gtsummary} object as its input, and returns an updated version where the column labels have been added to .$table_header. The function also switches .$table_header$hide from its default of TRUE to FALSE for those columns, so that columns with labels are printed.
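The effect on the header table can be sketched in base R. The helper set_header_labels() below is hypothetical and operates on a bare header data frame, whereas the internal function described above takes a full {gtsummary} object; it is only meant to show the two changes made for each labeled column.

```r
# Hypothetical sketch: assign labels and un-hide the labeled columns
set_header_labels <- function(table_header, ...) {
  labels <- list(...)
  for (col in names(labels)) {
    row <- table_header$column == col
    table_header$label[row] <- labels[[col]]  # store the new column label
    table_header$hide[row]  <- FALSE          # labeled columns are printed
  }
  table_header
}

table_header <- data.frame(
  column = c("variable", "label", "estimate"),
  label  = c("variable", "label", "estimate"),
  hide   = c(TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

updated <- set_header_labels(
  table_header,
  label    = "**Characteristic**",
  estimate = "**Beta**"
)
```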
Lastly, any time the .$table_header object is modified, it is critical that the author also runs update_calls_from_table_header(). This function uses the information in .$table_header to update the gt and kable calls.
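To make the relationship between the header and the calls concrete, the sketch below rebuilds one call string, the column-hiding call, from the hide column of a header table. The helper build_cols_hide_call() is hypothetical; it only produces a string in the same format as the cols_hide entry shown earlier and does not call {gt}.

```r
# Hypothetical sketch: regenerate the cols_hide call string from table_header
build_cols_hide_call <- function(table_header) {
  hidden <- table_header$column[table_header$hide]
  if (length(hidden) == 0) {
    return(NULL)  # nothing to hide, so no call is needed
  }
  paste0(
    "gt::cols_hide(columns = gt::vars(",
    paste(hidden, collapse = ", "),
    "))"
  )
}

table_header <- data.frame(
  column = c("variable", "label", "N"),
  hide   = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

cols_hide_call <- build_cols_hide_call(table_header)
```

Since the string is derived entirely from the header, editing the header without regenerating the calls would leave the two out of sync, which is why the update step matters.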
Each {gtsummary} object must return calls for printing with either the {gt} package or the knitr::kable() function. A function author writes a basic set of calls; for example, the first gt call is always gt::gt(). After the basics are covered, more complex calls are added via update_calls_from_table_header(). If the new function you’re writing combines existing {gtsummary} objects (for example, using tbl_merge() or tbl_stack()), the basic calls should already be covered.
All {gtsummary} objects are printed with print.gtsummary(). Within the print function, the {gtsummary} object is converted to either a gt object or a knitr::kable object, depending on the chosen print engine. While the actual print function is slightly more involved, it is basically this:
print.gtsummary <- function(x, ...) {
  # Simplified: convert with the function matching the chosen print engine
  if (getOption("gtsummary.print_engine") == "gt") {
    return(as_gt(x) %>% print())
  } else if (getOption("gtsummary.print_engine") == "kable") {
    return(as_kable(x) %>% print())
  }
}
The as_gt() and as_kable() functions execute the calls saved in .$gt_calls and .$kable_calls, respectively, converting the object from {gtsummary} to the specified type.