Table 1

2016-11-15

Making Table 1

This vignette demonstrates the main function of the furniture package, table1. Its arguments are shown below:

table1(.data, ..., splitby, row_wise, test, output_type, format_output, format_number, NAkeep, piping, splitby_labels, var_names)

It contains several useful features for summarizing your data:

  1. It summarizes many variables succinctly, providing means/counts and SDs/percentages.
  2. The summary can be by a grouping factor (i.e., splitby).
  3. It uses an API similar to that of the popular tidyverse group of packages.
  4. It can be used in piping.
  5. It can give bivariate test results for each variable against the grouping variable, choosing the test type based on the variable types.
  6. Its output is flexible: it can be printed as regular console output or in latex, markdown, or pandoc format (see knitr::kable).
  7. Numbers can be formatted nicely.

To illustrate, we’ll walk through the main arguments with an example on some fictitious data.

Example

library(furniture)

set.seed(84332)
## Create fictitious data containing several types of variables
df <- data.frame(a = rnorm(10000),
                 b = runif(10000) + rnorm(10000),
                 c = factor(sample(c(1,2,3,4), 10000, replace=TRUE)),
                 d = factor(sample(c(0,1), 10000, replace=TRUE)),
                 e = trunc(rnorm(10000, 20, 5)))

We will use df to show these main features of table1.

The ...

For table1, the ellipsis (the ...) holds the variables to be summarized, which must be found in your data. Here, we have a through e in df.

table1(df, 
       a, b, c, d, e)
## 
## |===================================
##               Mean/Count (SD/%)
##  Observations 10000            
##  a                             
##               -0.01 (1.01)     
##  b                             
##               0.51 (1.04)      
##  c                             
##     1         2491 (20.00%)    
##     2         2478 (20.00%)    
##     3         2525 (30.00%)    
##     4         2506 (30.00%)    
##  d                             
##     0         5022 (50.00%)    
##     1         4978 (50.00%)    
##  e                             
##               19.45 (5.00)     
## |===================================
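
Note that the percentages printed above appear to be rounded to the nearest ten (for example, 2491 of 10000 is shown as 20.00%), which looks like a rounding artifact of the package version used to build this vignette. As a minimal base R sketch, the exact column percentages can be recovered from the printed counts:

```r
# Counts for variable c, copied from the printed table above
counts <- c("1" = 2491, "2" = 2478, "3" = 2525, "4" = 2506)

# Column percentages: each count divided by the total number of observations
pct <- round(100 * counts / sum(counts), 2)
print(pct)
```

The same arithmetic applies to d and to the split tables later in this vignette, using the group sizes as denominators.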

Splitby

To get means/counts and SDs/percentages by a stratifying variable, simply use the splitby argument. The splitby can be a quoted variable (e.g., "d") or a one-sided formula as shown below (e.g., ~d).

table1(df,
       a, b, c,
       splitby = ~d)
## 
## |================================================
##               0             1            
##  Observations 5022          4978         
##  a                                       
##               -0.02 (1.00)  -0.01 (1.02) 
##  b                                       
##               0.50 (1.05)   0.52 (1.04)  
##  c                                       
##     1         1267 (30.00%) 1224 (20.00%)
##     2         1274 (30.00%) 1204 (20.00%)
##     3         1221 (20.00%) 1304 (30.00%)
##     4         1260 (30.00%) 1246 (30.00%)
## |================================================
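
For intuition, the per-group means and SDs that a split table reports can be sketched in base R with tapply. The toy data frame and its column names below are made up for illustration and are not part of the vignette's df:

```r
# Hypothetical toy data: one numeric variable and a two-level grouping factor
toy <- data.frame(value = c(1, 2, 3, 4, 5, 6),
                  group = factor(c(0, 0, 0, 1, 1, 1)))

# Mean and standard deviation of `value` within each level of `group`
group_means <- tapply(toy$value, toy$group, mean)
group_sds   <- tapply(toy$value, toy$group, sd)

print(group_means)
print(group_sds)
```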

Row Wise

You can get percentages by rows instead of by columns (i.e., groups) by using the row_wise = TRUE option.

table1(df,
       a, b, c,
       splitby = ~d,
       row_wise = TRUE)
## 
## |================================================
##               0             1            
##  Observations 5022          4978         
##  a                                       
##               -0.02 (1.00)  -0.01 (1.02) 
##  b                                       
##               0.50 (1.05)   0.52 (1.04)  
##  c                                       
##     1         1267 (50.00%) 1224 (50.00%)
##     2         1274 (50.00%) 1204 (50.00%)
##     3         1221 (50.00%) 1304 (50.00%)
##     4         1260 (50.00%) 1246 (50.00%)
## |================================================
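
The difference between column-wise and row-wise percentages can be sketched in base R with prop.table, using the counts of c by d printed above. This only illustrates the arithmetic and is not necessarily how table1 computes it:

```r
# Counts of c (rows) by d (columns), copied from the printed table above
counts <- matrix(c(1267, 1274, 1221, 1260,
                   1224, 1204, 1304, 1246),
                 ncol = 2,
                 dimnames = list(c = c("1", "2", "3", "4"),
                                 d = c("0", "1")))

# Column-wise: each column (group) sums to 100%
col_pct <- round(100 * prop.table(counts, margin = 2), 2)

# Row-wise: each row (level of c) sums to 100%
row_pct <- round(100 * prop.table(counts, margin = 1), 2)

print(col_pct)
print(row_pct)
```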

Test

It is easy to test for bivariate relationships, as is common in many Table 1s, using test = TRUE.

table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE)
## 
## |=====================================================
##               0             1             P-Value
##  Observations 5022          4978                 
##  a                                        0.608  
##               -0.02 (1.00)  -0.01 (1.02)         
##  b                                        0.53   
##               0.50 (1.05)   0.52 (1.04)          
##  c                                        0.149  
##     1         1267 (30.00%) 1224 (20.00%)        
##     2         1274 (30.00%) 1204 (20.00%)        
##     3         1221 (20.00%) 1304 (30.00%)        
##     4         1260 (30.00%) 1246 (30.00%)        
## |=====================================================

By default, only the p-values are shown, but other options are available through the format_output argument, such as stars or test statistics alongside the p-values.
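
This vignette does not state which tests table1 runs. As a sketch under the assumption that a factor is compared with the grouping factor by a chi-squared test and a numeric variable by a two-sample t-test (the conventional choices), the equivalent base R calls look like this. On the counts of c by d printed above, the chi-squared p-value should land close to the 0.149 shown in the table; the t-test data are made up for illustration.

```r
# Counts of c (rows) by d (columns), copied from the printed table above
counts <- matrix(c(1267, 1274, 1221, 1260,
                   1224, 1204, 1304, 1246),
                 ncol = 2,
                 dimnames = list(c = c("1", "2", "3", "4"),
                                 d = c("0", "1")))

# Assumed test for factor vs. grouping factor: Pearson chi-squared
chi <- chisq.test(counts)
print(chi$p.value)

# Assumed test for numeric vs. two-level factor: Welch two-sample t-test,
# run on hypothetical values that are not part of the vignette's data
toy <- data.frame(value = c(4.1, 5.3, 6.0, 5.5, 6.8, 7.2, 6.1, 7.9),
                  group = factor(rep(c(0, 1), each = 4)))
tt <- t.test(value ~ group, data = toy)
print(tt$p.value)
```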

Output Type

Besides the default console text, the table can be rendered in any of the knitr::kable formats, including "html" as shown below. Others include:

  1. “latex”
  2. “markdown”
  3. “pandoc”

table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE,
       output_type = "html")
              0              1              P-Value
Observations  5022           4978
a                                           0.608
              -0.02 (1.00)   -0.01 (1.02)
b                                           0.53
              0.50 (1.05)    0.52 (1.04)
c                                           0.149
– 1 –         1267 (30.00%)  1224 (20.00%)
– 2 –         1274 (30.00%)  1204 (20.00%)
– 3 –         1221 (20.00%)  1304 (30.00%)
– 4 –         1260 (30.00%)  1246 (30.00%)

Format Number

For some papers you may want to format large numbers with a comma as the thousands separator (e.g., 30,000 vs. 30000). You can do this by using format_number = TRUE.

table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE,
       format_number = TRUE)
## 
## |=======================================================
##               0              1              P-Value
##  Observations 5,022          4,978                 
##  a                                          0.608  
##               -0.02 (1.00)   -0.01 (1.02)          
##  b                                          0.53   
##               0.50 (1.05)    0.52 (1.04)           
##  c                                          0.149  
##     1         1,267 (30.00%) 1,224 (20.00%)        
##     2         1,274 (30.00%) 1,204 (20.00%)        
##     3         1,221 (20.00%) 1,304 (30.00%)        
##     4         1,260 (30.00%) 1,246 (30.00%)        
## |=======================================================
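
The same kind of formatting is available in base R through the big.mark argument of format. This is a sketch of the idea and not necessarily how table1 implements it:

```r
# Group sizes copied from the Observations row above
n_obs <- c(5022, 4978)

# Insert a comma as the thousands separator
formatted <- format(n_obs, big.mark = ",", scientific = FALSE)
print(formatted)

# The example from the text: 30000 becomes "30,000"
single <- format(30000, big.mark = ",", scientific = FALSE)
print(single)
```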

NA Keep

To explore missingness in the factor variables, use NAkeep = TRUE, which adds the counts and percentages of missing values to the table.

table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE,
       NAkeep = TRUE)
## 
## |=====================================================
##               0             1             P-Value
##  Observations 5022          4978                 
##  a                                        0.608  
##               -0.02 (1.00)  -0.01 (1.02)         
##  b                                        0.53   
##               0.50 (1.05)   0.52 (1.04)          
##  c                                        0.149  
##     1         1267 (30.00%) 1224 (20.00%)        
##     2         1274 (30.00%) 1204 (20.00%)        
##     3         1221 (20.00%) 1304 (30.00%)        
##     4         1260 (30.00%) 1246 (30.00%)        
##     NA        0 (0.00%)     0 (0.00%)            
## |=====================================================

Here we do not have any missing values, so the NA row shows zeros to confirm that none are present.
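
Base R's table function behaves in a similar way when asked to always report missing values. A minimal sketch on a made-up factor (not the vignette's df):

```r
# Hypothetical factor with no missing values
x <- factor(c(1, 2, 2, 3, 4, 4))

# Default: only observed levels are counted
tab_default <- table(x)

# useNA = "always" adds an NA cell even when its count is zero
tab_na <- table(x, useNA = "always")

print(tab_default)
print(tab_na)
```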

Piping

Finally, to make table1 easier to use with the tidyverse group of packages, a piping option is available. With piping = TRUE, table1 prints the table in the console and invisibly returns the data frame it was given, so the pipeline can continue.

library(tidyverse)

df %>%
  table1(a, b, c,
         splitby = ~d,
         test = TRUE,
         piping = TRUE) %>%
  ggplot(aes(x = b, y = a, group = d)) +
    geom_point(aes(color = d), alpha = 0.25) +
    scale_color_manual(values = c("dodgerblue3", "chartreuse4"), name = "Group")
## 
## |=====================================================
##               0             1             P-Value
##  Observations 5022          4978                 
##  a                                        0.608  
##               -0.02 (1.00)  -0.01 (1.02)         
##  b                                        0.53   
##               0.50 (1.05)   0.52 (1.04)          
##  c                                        0.149  
##     1         1267 (30.00%) 1224 (20.00%)        
##     2         1274 (30.00%) 1204 (20.00%)        
##     3         1221 (20.00%) 1304 (30.00%)        
##     4         1260 (30.00%) 1246 (30.00%)        
## |=====================================================
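
The "print, then invisibly return the input" pattern described above can be sketched with a small helper. The function name summarize_and_pass is hypothetical and is not part of furniture:

```r
# Hypothetical helper: print a summary as a side effect, then hand the
# untouched data back so that later steps in a pipeline can keep using it
summarize_and_pass <- function(data) {
  print(summary(data))
  invisible(data)
}

toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))

# The summary is printed, and `result` is the original data frame
result <- summarize_and_pass(toy)
print(identical(result, toy))
```

Because the return value is invisible, nothing extra is printed when the call ends a pipeline, yet the data remain available to whatever step comes next.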

Splitby Labels and Var Names

The splitby_labels argument gives you the opportunity to name the levels of the splitby factor and the var_names argument lets you rename the variables.

table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE,
       splitby_labels = c("No", "Yes"),
       var_names = c("A", "B", "C"))
## 
## |=====================================================
##               No            Yes           P-Value
##  Observations 5022          4978                 
##  A                                        0.608  
##               -0.02 (1.00)  -0.01 (1.02)         
##  B                                        0.53   
##               0.50 (1.05)   0.52 (1.04)          
##  C                                        0.149  
##     1         1267 (30.00%) 1224 (20.00%)        
##     2         1274 (30.00%) 1204 (20.00%)        
##     3         1221 (20.00%) 1304 (30.00%)        
##     4         1260 (30.00%) 1246 (30.00%)        
## |=====================================================

This is particularly useful when you adjust a variable within the function:

table1(df,
       factor(ifelse(a > 1, 1, 0)), b, c,
       splitby = ~d,
       test = TRUE,
       splitby_labels = c("No", "Yes"),
       var_names = c("A", "B", "C"))
## 
## |=====================================================
##               No            Yes           P-Value
##  Observations 5022          4978                 
##  A                                        0.642  
##     0         4250 (80.00%) 4195 (80.00%)        
##     1         772 (20.00%)  783 (20.00%)         
##  B                                        0.53   
##               0.50 (1.05)   0.52 (1.04)          
##  C                                        0.149  
##     1         1267 (30.00%) 1224 (20.00%)        
##     2         1274 (30.00%) 1204 (20.00%)        
##     3         1221 (20.00%) 1304 (30.00%)        
##     4         1260 (30.00%) 1246 (30.00%)        
## |=====================================================

Here we changed a to a factor within the function. To give the variable a cleaner label, we can use the var_names argument; otherwise it would be named something like factor.ifelse.a....
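
To see where a name like that comes from, base R's make.names shows how an expression is turned into a syntactically valid column name. Whether table1 uses make.names internally is an assumption; the sketch only illustrates the naming pattern:

```r
# The expression passed to table1 above, as text
expr_text <- "factor(ifelse(a > 1, 1, 0))"

# Characters that are not valid in a name are replaced with dots
auto_name <- make.names(expr_text)
print(auto_name)
```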

Final Note

As a final note, the "table1" object can be coerced to a data.frame very easily:

tab1 <- table1(df,
               a, b, c,
               splitby = ~d,
               test = TRUE)
data.frame(tab1)
##              X.            X0            X1 P.Value
## 1  Observations          5022          4978        
## 6             a                               0.608
## 7                -0.02 (1.00)  -0.01 (1.02)        
## 8             b                                0.53
## 9                 0.50 (1.05)   0.52 (1.04)        
## 10            c                               0.149
## 11            1 1267 (30.00%) 1224 (20.00%)        
## 12            2 1274 (30.00%) 1204 (20.00%)        
## 13            3 1221 (20.00%) 1304 (30.00%)        
## 14            4 1260 (30.00%) 1246 (30.00%)

Conclusions

table1 can be a valuable addition to the tools used to analyze biobehavioral and social data. Let me know if you find any bugs or have suggestions for development.