Definition of a gtsummary Object

This vignette is meant for those who wish to contribute to {gtsummary}, or for users who wish to understand the inner workings of a {gtsummary} object so they can more easily modify it to suit their own needs. If this does not describe you, please refer to the {gtsummary} website for an introduction to the package’s functions and tutorials on advanced use.


Every {gtsummary} table shares a few characteristics. Here, we review those characteristics and provide instructions on how to construct a {gtsummary} object.


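library(gtsummary)
library(dplyr) # provides select() and the %>% pipe

# regression table; bold_p() bolds p-values using the threshold t = 0.5
tbl_regression_ex <-
  lm(age ~ grade + marker, trial) %>%
  tbl_regression() %>%
  bold_p(t = 0.5)

# summary table split by treatment arm
tbl_summary_ex <-
  trial %>%
  select(trt, age, grade, response) %>%
  tbl_summary(by = trt)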

Structure of a {gtsummary} object

Every {gtsummary} object is a list comprising, at minimum, these elements:

.$table_body
.$table_styling


The .$table_body object is the data frame that will ultimately be printed as the output. The table must include columns "label", "row_type", and "variable". The "label" column is printed, and the other two are hidden from the final output.

#> # A tibble: 8 × 7
#>   variable var_type    var_label      row_type label          stat_1      stat_2
#>   <chr>    <chr>       <chr>          <chr>    <chr>          <chr>       <chr> 
#> 1 age      continuous  Age            label    Age            46 (37, 59) 48 (3…
#> 2 age      continuous  Age            missing  Unknown        7           4     
#> 3 grade    categorical Grade          label    Grade          <NA>        <NA>  
#> 4 grade    categorical Grade          level    I              35 (36%)    33 (3…
#> 5 grade    categorical Grade          level    II             32 (33%)    36 (3…
#> 6 grade    categorical Grade          level    III            31 (32%)    33 (3…
#> 7 response dichotomous Tumor Response label    Tumor Response 28 (29%)    33 (3…
#> 8 response dichotomous Tumor Response missing  Unknown        3           4
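A quick way to confirm these requirements on any table is to inspect .$table_body directly. The sketch below builds its own small table (the object names tbl, body, and required are chosen here for illustration) and checks that the three required columns are present.

```r
library(gtsummary)

# Build a small summary table from the trial data shipped with {gtsummary}
tbl <- tbl_summary(trial, include = c(age, grade), by = trt)

# .$table_body is an ordinary data frame stored inside the list
body <- tbl$table_body
required <- c("variable", "row_type", "label")

# TRUE when all required columns are present
has_required <- all(required %in% names(body))
has_required
```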


The .$table_styling object is a list of data frames containing information about how .$table_body is printed, formatted, and styled.
The list contains the following data frames: header, footnote, footnote_abbrev, fmt_fun, text_format, fmt_missing, and cols_merge. It also contains the following objects: source_note, caption, and horizontal_line_above.


The header table has one row per column found in .$table_body and the columns listed below. It contains styling information that applies to an entire column or to the column headers.

Column                     Description
column                     Column name from .$table_body
hide                       Logical indicating whether the column is hidden in the output. This column is also scoped in modify_header() (and friends) to be used in a selecting environment
align                      Specifies the alignment/justification of the column, e.g. 'center' or 'left'
label                      Label that will be displayed (if column is displayed in output)
interpret_label            The {gt} function that is used to interpret the column label, gt::md() or gt::html()
spanning_header            Text printed above columns as spanning headers
interpret_spanning_header  The {gt} function that is used to interpret the column spanning headers, gt::md() or gt::html()
modify_stat_*              Any column beginning with modify_stat_ is a statistic available to report in modify_header() (and others)
modify_selector_*          Any column beginning with modify_selector_ is a column that is scoped in modify_header() (and friends) to be used in a selecting environment
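As a small illustration of how the header table drives the output, the sketch below lists which columns of a summary table are visible and what their header labels are. It only reads the object; the names tbl, header, and visible are illustrative.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade), by = trt)

# One row of styling information per column of .$table_body
header <- tbl$table_styling$header

# Columns that will appear in the printed table
visible <- header$column[!header$hide]
visible

# Header labels for the visible columns
header$label[!header$hide]
```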

footnote & footnote_abbrev

Each {gtsummary} table may contain a single footnote per header and cell within the table. Footnotes and footnote abbreviations are handled separately. Updates/changes to footnote are appended to the bottom of the tibble. A footnote of NA_character_ deletes an existing footnote.

Column    Description
column    Column name from .$table_body
rows      Expression selecting rows in .$table_body; NA indicates to add the footnote to the header
footnote  String containing the footnote to add to the column/row


Numeric columns/rows are styled with the functions stored in fmt_fun. Updates/changes to styling functions are appended to the bottom of the tibble.

Column   Description
column   Column name from .$table_body
rows     Expression selecting rows in .$table_body
fmt_fun  List of formatting/styling functions
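The append-on-update behaviour can be seen with modify_table_styling(), which accepts a fmt_fun argument. This is a minimal sketch, assuming a regression table with an estimate column; the three-decimal formatter and the object names are chosen for illustration only.

```r
library(gtsummary)

tbl <- tbl_regression(lm(age ~ marker, trial))
n_before <- nrow(tbl$table_styling$fmt_fun)

# Request three decimal places for the coefficient column
tbl2 <- modify_table_styling(
  tbl,
  columns = "estimate",
  fmt_fun = function(x) style_number(x, digits = 3)
)

# The new instruction is appended as the last row of fmt_fun
fmt <- tbl2$table_styling$fmt_fun
n_after <- nrow(fmt)
last_column <- fmt$column[n_after]
last_column
```

Because later rows are applied after earlier ones, the appended function is the one that takes effect for the estimate column.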


Columns/rows are styled with the bold, italic, or indenting instructions stored in text_format. Updates/changes to text formatting are appended to the bottom of the tibble.

Column            Description
column            Column name from .$table_body
rows              Expression selecting rows in .$table_body
format_type       One of c('bold', 'italic', 'indent')
undo_text_format  Logical indicating whether the formatting indicated should be undone/removed
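For example, bolding the label rows of a summary table adds one row to text_format. The sketch below uses modify_table_styling() with its text_format argument; the object names are illustrative.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade))

# Bold the label column, but only on rows where row_type is "label"
tbl2 <- modify_table_styling(
  tbl,
  columns = "label",
  rows = row_type == "label",
  text_format = "bold"
)

# Inspect the instruction that was just appended
tf <- tbl2$table_styling$text_format
last_row <- tf[nrow(tf), ]
last_row
```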


By default, all NA values are shown as blanks. Missing values in the columns/rows listed in fmt_missing are instead replaced with the given symbol. For example, reference rows in tbl_regression() are shown with an em-dash. Updates/changes to missing-value symbols are appended to the bottom of the tibble.

Column  Description
column  Column name from .$table_body
rows    Expression selecting rows in .$table_body
symbol  String to replace missing values with, e.g. an em-dash
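As a sketch of how a missing-value symbol is registered, the example below replaces the blank coefficient on reference rows with the text "Ref." using the missing_symbol argument of modify_table_styling(). The model and the object names are chosen for illustration.

```r
library(gtsummary)

# grade is categorical, so the table contains a reference row
tbl <- tbl_regression(lm(age ~ grade, trial))

tbl2 <- modify_table_styling(
  tbl,
  columns = "estimate",
  rows = reference_row %in% TRUE,
  missing_symbol = "Ref."
)

# The new symbol is stored as the last row of fmt_missing
fm <- tbl2$table_styling$fmt_missing
last_symbol <- fm$symbol[nrow(fm)]
last_symbol
```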


The cols_merge object is experimental and may change in the future. This tibble gives instructions for merging columns into a single column. The implementation in as_gt() will be updated after gt::cols_label() gains a rows= argument.

Column   Description
column   Column name from .$table_body
rows     Expression selecting rows in .$table_body
pattern  Glue pattern directing how to combine/merge columns. The merged columns will replace the column indicated in 'column'.


source_note: String that is made into a table source note. The attribute "text_interpret" is either "md" or "html".

caption: String that is made into the table caption. The attribute "text_interpret" is either "md" or "html".

horizontal_line_above: Expression identifying a row above which a horizontal line is placed in the table.

Example from tbl_regression()

#> $header
#> # A tibble: 24 × 9
#>    column     hide  align interpret_label label interpret_spann… spanning_header
#>    <chr>      <lgl> <chr> <chr>           <chr> <chr>            <chr>          
#>  1 variable   TRUE  cent… gt::md          vari… gt::md           <NA>           
#>  2 var_label  TRUE  cent… gt::md          var_… gt::md           <NA>           
#>  3 var_type   TRUE  cent… gt::md          var_… gt::md           <NA>           
#>  4 reference… TRUE  cent… gt::md          refe… gt::md           <NA>           
#>  5 row_type   TRUE  cent… gt::md          row_… gt::md           <NA>           
#>  6 header_row TRUE  cent… gt::md          head… gt::md           <NA>           
#>  7 N_obs      TRUE  cent… gt::md          N_obs gt::md           <NA>           
#>  8 N          TRUE  cent… gt::md          **N** gt::md           <NA>           
#>  9 coefficie… TRUE  cent… gt::md          coef… gt::md           <NA>           
#> 10 coefficie… TRUE  cent… gt::md          coef… gt::md           <NA>           
#> # … with 14 more rows, and 2 more variables: modify_stat_N <int>,
#> #   modify_stat_n <int>
#> $footnote
#> # A tibble: 0 × 4
#> # … with 4 variables: column <chr>, rows <list>, text_interpret <chr>,
#> #   footnote <chr>
#> $footnote_abbrev
#> # A tibble: 2 × 4
#>   column    rows      text_interpret footnote                
#>   <chr>     <list>    <chr>          <chr>                   
#> 1 ci        <quosure> gt::md         CI = Confidence Interval
#> 2 std.error <quosure> gt::md         SE = Standard Error     
#> $text_format
#> # A tibble: 2 × 4
#>   column  rows       format_type undo_text_format
#>   <chr>   <list>     <chr>       <lgl>           
#> 1 label   <language> indent      FALSE           
#> 2 p.value <quosure>  bold        FALSE           
#> $fmt_missing
#> # A tibble: 4 × 3
#>   column    rows      symbol
#>   <chr>     <list>    <chr> 
#> 1 estimate  <quosure> —     
#> 2 ci        <quosure> —     
#> 3 std.error <quosure> —     
#> 4 statistic <quosure> —     
#> $fmt_fun
#> # A tibble: 10 × 3
#>    column      rows      fmt_fun   
#>    <chr>       <list>    <list>    
#>  1 estimate    <quosure> <fn>      
#>  2 N           <quosure> <fn>      
#>  3 N_obs       <quosure> <fn>      
#>  4 n_obs       <quosure> <fn>      
#>  5 conf.low    <quosure> <fn>      
#>  6 conf.high   <quosure> <fn>      
#>  7 p.value     <quosure> <fn>      
#>  8 std.error   <quosure> <prrr_fn_>
#>  9 statistic   <quosure> <prrr_fn_>
#> 10 var_nlevels <quosure> <prrr_fn_>
#> $cols_merge
#> # A tibble: 0 × 3
#> # … with 3 variables: column <chr>, rows <list>, pattern <chr>

Constructing a {gtsummary} object


When constructing a {gtsummary} object, the author will begin with the .$table_body object. Recall the .$table_body data frame must include columns "label", "row_type", and "variable". Of these columns, only the "label" column will be printed with the final results. The "row_type" column typically controls whether or not the label column is indented. The "variable" column is often used in the inline_text() family of functions, and when merging {gtsummary} tables with tbl_merge().

tbl_regression_ex %>%
  purrr::pluck("table_body") %>%
  select(variable, row_type, label)
#> # A tibble: 5 × 3
#>   variable row_type label               
#>   <chr>    <chr>    <chr>               
#> 1 grade    label    Grade               
#> 2 grade    level    I                   
#> 3 grade    level    II                  
#> 4 grade    level    III                 
#> 5 marker   label    Marker Level (ng/mL)

The other columns in .$table_body are created by the user and are likely printed in the output. Formatting and printing instructions for these columns are stored in .$table_styling.


There are a few {gtsummary} functions, some internal and some exported, to assist in constructing and modifying the .$table_styling list.

  1. .create_gtsummary_object(table_body): After a user creates a table_body, pass it to this function and the skeleton of a gtsummary object is created and returned (including the full table_styling list of tables).

  2. .update_table_styling(): After columns are added to or removed from table_body, run this function to update .$table_styling to include or remove styling instructions for the columns. Note that each new column is hidden by default.

  3. modify_table_styling(): This exported function modifies the printing instructions for a single column or groups of columns.

  4. modify_table_body(): This exported function helps users make changes to .$table_body. The function runs .update_table_styling() internally to keep the printing instructions consistent with the table body.
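The steps above can be sketched as follows. This is a minimal illustration, not the package's prescribed workflow: the table_body values are made up, and .create_gtsummary_object() is reached with ::: on the assumption that it is an internal function in the installed version, so its behaviour may change between releases.

```r
library(gtsummary)

# A hand-made table_body with the three required columns plus one statistic
# (the values are invented for illustration)
body <- data.frame(
  variable = c("height", "weight"),
  row_type = "label",
  label = c("Height", "Weight"),
  estimate = c(1.25, 3.5),
  stringsAsFactors = FALSE
)

# Internal helper: returns the skeleton object, including table_styling
tbl <- gtsummary:::.create_gtsummary_object(table_body = body)

# The new statistic column starts hidden, so unhide it and give it a header
tbl <- modify_table_styling(
  tbl,
  columns = "estimate",
  hide = FALSE,
  label = "**Estimate**"
)

# Confirm the styling instructions now mark the column as visible
header <- tbl$table_styling$header
estimate_hidden <- header$hide[header$column == "estimate"]
estimate_hidden
```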

Printing a {gtsummary} object

All {gtsummary} objects are printed with print.gtsummary(). Before a {gtsummary} object is printed, it is converted to a {gt} object using as_gt(). This function takes the {gtsummary} object as its input, and uses the information in .$table_styling to construct a list of {gt} calls that will be executed on .$table_body. After the {gtsummary} object is converted to {gt}, it is then printed as any other {gt} object.
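The list of {gt} calls can be inspected without executing it by passing return_calls = TRUE to as_gt(). The sketch below shows both forms; the object names are illustrative.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = age)

# Return the unevaluated {gt} calls instead of the finished table
gt_calls <- as_gt(tbl, return_calls = TRUE)
names(gt_calls)

# The default: evaluate the calls and return a {gt} table
gt_tbl <- as_gt(tbl)
class(gt_tbl)
```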

In some cases, the package defaults to printing with other engines, such as flextable (as_flex_table()), huxtable (as_hux_table()), kableExtra (as_kable_extra()), and kable (as_kable()). The default print engine is set with the theme element "pkgwide-str:print_engine".
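As a hedged sketch of how the theme element is used, the example below switches the print engine to kable for one table and then restores the defaults. It assumes {knitr} is installed; capture.output() is used only so the printed text can be stored in a variable.

```r
library(gtsummary)

# Ask {gtsummary} to print with knitr::kable() instead of {gt}
set_gtsummary_theme(list("pkgwide-str:print_engine" = "kable"))

tbl <- tbl_summary(trial, include = age)

# print() now dispatches through as_kable()
printed <- capture.output(print(tbl))

# Restore the package defaults
reset_gtsummary_theme()
```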

While the actual print function is slightly more involved, it is basically this:

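print.gtsummary <- function(x) {
  # look up the print engine set by the active theme, convert the table with
  # the matching as_*() function, then print the converted object
  get_theme_element("pkgwide-str:print_engine") %>%
    switch(
      "gt" = as_gt(x),
      "flextable" = as_flex_table(x),
      "huxtable" = as_hux_table(x),
      "kable_extra" = as_kable_extra(x),
      "kable" = as_kable(x)
    ) %>%
    print()
}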

The .$meta_data$df_stats tibble

Some {gtsummary} tables contain an internal object called .$meta_data containing a list column called "df_stats". The column is a list of tibbles with each tibble containing the summary statistics presented in the final gtsummary table. While the statistics contained in each "df_stats" tibble can vary within a single gtsummary object, all the tibbles have a few common characteristics.

Each tibble contains the following columns:

Column Description


String of the variable name


String matching the variable's values in .$table_body$label


The column name the statistics appear under in .$table_body, e.g. 'stat_0', 'stat_1'


This column appears if and only if the variable being summarized has multiple levels. The column is equal to the variable's levels.


Primarily, the tibble stores the summary statistics for each variable. For example, when the mean is requested in tbl_summary(), there will be a column called 'mean'.

The statistics columns each have an attribute called "fmt_fun" containing the formatting function that will be applied before the statistic is placed in .$table_body.