# Functions for descriptive statistics

#### 2020-01-26

The webr package makes it easy to build tables that summarize descriptive statistics.

## Installation of packages

Install the latest versions of the “webr” and “moonBook” packages from GitHub. The “rrtable” package is needed only for the reproducible research examples.

if(!require(devtools)) install.packages("devtools")
devtools::install_github("cardiomoon/webr")
devtools::install_github("cardiomoon/moonBook")   # For examples
devtools::install_github("cardiomoon/rrtable")    # For reproducible research

## Load packages

require(webr)
require(moonBook) # For data acs

## Summarizing Frequencies

You can summarize frequencies with the freqSummary() function, and you can make a table of those frequencies with the freqTable() function.

freqSummary(acs$Dx)
                Count Percent Valid Percent Cum Percent
NSTEMI          "153" "17.9"  "17.9"        "17.9"
STEMI           "304" "35.5"  "35.5"        "53.3"
Unstable Angina "400" "46.7"  "46.7"        "100.0"
Sum             "857" "100.0" "100.0"       ""

freqTable(acs$Dx)

| rowname | Count | Percent | Valid Percent | Cum Percent |
|---|---|---|---|---|
| NSTEMI | 153 | 17.9 | 17.9 | 17.9 |
| STEMI | 304 | 35.5 | 35.5 | 53.3 |
| Unstable Angina | 400 | 46.7 | 46.7 | 100.0 |
| Sum | 857 | 100.0 | 100.0 | |
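To make the four columns concrete, here is a minimal base R sketch of how they can be computed. It is not webr's implementation: `freq_columns()` is a hypothetical helper, the built-in `mtcars` data stands in for `acs`, and the choice to accumulate the cumulative percent over valid (non-missing) values is an assumption.

```r
# Sketch only: base R, not webr's actual implementation.
# freq_columns() is a hypothetical helper name.
freq_columns <- function(x) {
  counts <- as.integer(table(x, useNA = "no"))  # counts of non-missing values
  n_all <- length(x)                            # denominator for Percent (includes NA)
  n_valid <- sum(counts)                        # denominator for Valid Percent
  data.frame(
    level = names(table(x, useNA = "no")),
    Count = counts,
    Percent = round(100 * counts / n_all, 1),
    ValidPercent = round(100 * counts / n_valid, 1),
    # Assumption: cumulative percent is accumulated over valid values
    CumPercent = round(100 * cumsum(counts) / n_valid, 1),
    stringsAsFactors = FALSE
  )
}

res <- freq_columns(mtcars$cyl)
print(res)
```

With no missing values, as in `acs$Dx`, Percent and Valid Percent coincide, which matches the output shown above.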

### Ready for reproducible research

The freqTable() function returns an object of class “flextable”. With this object, you can easily produce HTML, PDF, docx, or pptx files.

result=freqTable(acs$Dx)
class(result)
[1] "flextable"

### Frequency table for a continuous variable

You can also make a frequency table for a continuous variable. Because every distinct value gets its own row, the table can be long.

freqTable(mtcars$mpg)

| rowname | Count | Percent | Valid Percent | Cum Percent |
|---|---|---|---|---|
| 10.4 | 2 | 6.2 | 6.2 | 6.2 |
| 13.3 | 1 | 3.1 | 3.1 | 9.4 |
| 14.3 | 1 | 3.1 | 3.1 | 12.5 |
| 14.7 | 1 | 3.1 | 3.1 | 15.6 |
| 15 | 1 | 3.1 | 3.1 | 18.8 |
| 15.2 | 2 | 6.2 | 6.2 | 25.0 |
| 15.5 | 1 | 3.1 | 3.1 | 28.1 |
| 15.8 | 1 | 3.1 | 3.1 | 31.2 |
| 16.4 | 1 | 3.1 | 3.1 | 34.4 |
| 17.3 | 1 | 3.1 | 3.1 | 37.5 |
| 17.8 | 1 | 3.1 | 3.1 | 40.6 |
| 18.1 | 1 | 3.1 | 3.1 | 43.8 |
| 18.7 | 1 | 3.1 | 3.1 | 46.9 |
| 19.2 | 2 | 6.2 | 6.2 | 53.1 |
| 19.7 | 1 | 3.1 | 3.1 | 56.2 |
| 21 | 2 | 6.2 | 6.2 | 62.5 |
| 21.4 | 2 | 6.2 | 6.2 | 68.8 |
| 21.5 | 1 | 3.1 | 3.1 | 71.9 |
| 22.8 | 2 | 6.2 | 6.2 | 78.1 |
| 24.4 | 1 | 3.1 | 3.1 | 81.2 |
| 26 | 1 | 3.1 | 3.1 | 84.4 |
| 27.3 | 1 | 3.1 | 3.1 | 87.5 |
| 30.4 | 2 | 6.2 | 6.2 | 93.8 |
| 32.4 | 1 | 3.1 | 3.1 | 96.9 |
| 33.9 | 1 | 3.1 | 3.1 | 100.0 |
| Sum | 32 | 100.0 | 100.0 | |
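If one row per distinct value is too long, one option is to bin the variable before tabulating. The following is a sketch in base R only; the 5-unit bin width is an arbitrary choice for illustration and is not something the webr package prescribes.

```r
# Sketch only: bin a continuous variable with base R before tabulating.
# The 5-unit bin width is an arbitrary choice for illustration.
mpg_bins <- cut(mtcars$mpg, breaks = seq(10, 35, by = 5))

# Each bin is a half-open interval such as (10,15]
bin_counts <- table(mpg_bins)
print(bin_counts)
```

The result of `cut()` is a factor, so it could presumably be passed to freqTable() in the same way as `acs$Dx`; that combination is not tested here.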

## Frequency table for two categorical variables

You can make a table that cross-tabulates two categorical variables and tests their independence.

x2Table(acs,Dx,sex)

| rowname | Female | Male | Total |
|---|---|---|---|
| NSTEMI | 50(32.70%) | 103(67.30%) | 153(100 %) |
| STEMI | 84(27.60%) | 220(72.40%) | 304(100 %) |
| Unstable Angina | 153(38.20%) | 247(61.80%) | 400(100 %) |
| Total | 287(33.50%) | 570(66.50%) | 857(100 %) |

Chi-squared=8.798, df=2, Cramer's V=0.101, Chi-squared p=0.0123
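The statistics in the table footer can be reproduced in spirit with base R. This is a sketch, not the code x2Table() runs: it uses the built-in `mtcars` data (`cyl` by `am`) in place of `acs`, and it assumes the usual definition of Cramer's V, the square root of chi-squared divided by n times (min(rows, columns) - 1).

```r
# Sketch only: base R computation of a chi-squared test and Cramer's V.
# Uses the built-in mtcars data (cyl by am) rather than acs.
tab <- table(mtcars$cyl, mtcars$am)

# Small expected counts in this toy table trigger a warning, suppressed here
test <- suppressWarnings(chisq.test(tab, correct = FALSE))

n <- sum(tab)                          # total number of observations
k <- min(nrow(tab), ncol(tab))         # smaller table dimension
cramers_v <- sqrt(unname(test$statistic) / (n * (k - 1)))

cat(sprintf("Chi-squared=%.3f, df=%d, Cramer's V=%.3f, p=%.4f\n",
            test$statistic, as.integer(test$parameter),
            cramers_v, test$p.value))
```

Applying the same formula to the figures reported above gives sqrt(8.798 / 857) ≈ 0.101, which agrees with the Cramer's V in the footer.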

You can make a table with column-wise percentages by setting margin=2.

x2Table(acs,Dx,sex,margin=2)

| rowname | Female | Male | Total |
|---|---|---|---|
| NSTEMI | 50(17.40%) | 103(18.10%) | 153(17.90%) |
| STEMI | 84(29.30%) | 220(38.60%) | 304(35.50%) |
| Unstable Angina | 153(53.30%) | 247(43.30%) | 400(46.70%) |
| Total | 287(100 %) | 570(100 %) | 857(100 %) |

Chi-squared=8.798, df=2, Cramer's V=0.101, Chi-squared p=0.0123
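The two outputs above are consistent with x2Table() following the margin convention of base R's `prop.table()`, where 1 means percentages within rows and 2 means percentages within columns. That reading is an inference from the output, not a statement about webr's internals. A base R sketch of the difference, using the built-in `mtcars` data:

```r
# Sketch only: row-wise versus column-wise proportions in base R.
tab <- table(mtcars$cyl, mtcars$am)

row_prop <- prop.table(tab, margin = 1)   # each row sums to 1
col_prop <- prop.table(tab, margin = 2)   # each column sums to 1

print(round(100 * row_prop, 1))           # row percentages
print(round(100 * col_prop, 1))           # column percentages
```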

You can hide the percentages by setting show.percent=FALSE.

x2Table(acs,Dx,sex,show.percent=FALSE)

| rowname | Female | Male | Total |
|---|---|---|---|
| NSTEMI | 50 | 103 | 153 |
| STEMI | 84 | 220 | 304 |
| Unstable Angina | 153 | 247 | 400 |
| Total | 287 | 570 | 857 |

Chi-squared=8.798, df=2, Cramer's V=0.101, Chi-squared p=0.0123

## Numerical summary

### Numerical summary of a vector

You can make a numerical summary with the numSummary() function, and a table of it with the numSummaryTable() function. Applied to a continuous vector, numSummary() returns the following summary. This function uses the psych::describe() function.

require(dplyr)
numSummary(acs$age)
# A tibble: 1 x 12
n  mean    sd median trimmed   mad   min   max range   skew kurtosis    se
<dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl>
1   857  63.3  11.7     64    63.6  13.3    28    91    63 -0.175   -0.566 0.400
numSummaryTable(acs$age)

| n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 857.00 | 63.31 | 11.70 | 64.00 | 63.56 | 13.34 | 28.00 | 91.00 | 63.00 | -0.18 | -0.57 | 0.40 |
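For readers who want to see what each column means, the statistics can be approximated in base R. This is a sketch under stated assumptions, not the code of numSummary() or psych::describe(): `describe_sketch()` is a hypothetical helper, and it assumes the psych::describe() defaults of a 10% trimmed mean and type 3 skewness and kurtosis.

```r
# Sketch only: base R versions of the reported statistics.
# describe_sketch() is a hypothetical helper, not part of webr or psych.
# Assumes psych::describe() defaults (trim = 0.1, type = 3 skew/kurtosis).
describe_sketch <- function(x) {
  x <- x[!is.na(x)]                        # use non-missing values only
  n <- length(x)
  m <- mean(x)
  s <- sd(x)                               # sample SD (n - 1 denominator)
  data.frame(
    n = n,
    mean = m,
    sd = s,
    median = median(x),
    trimmed = mean(x, trim = 0.1),         # drop 10% from each tail
    mad = mad(x),                          # scaled by 1.4826 by default
    min = min(x),
    max = max(x),
    range = max(x) - min(x),
    skew = sum((x - m)^3) / n / s^3,       # type 3 skewness
    kurtosis = sum((x - m)^4) / n / s^4 - 3,  # type 3 excess kurtosis
    se = s / sqrt(n)                       # standard error of the mean
  )
}

print(describe_sketch(mtcars$mpg))
```

As a check against the output above, the standard error for age is 11.70 / sqrt(857) ≈ 0.40 and the range is 91 - 28 = 63.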

### Numerical summary of a data.frame or a tibble

You can make a numerical summary of a data.frame. The numSummary() function uses the is.numeric() function to select the numeric columns and summarizes each of them.

numSummary(acs)
# A tibble: 9 x 13
vars      n  mean    sd median trimmed   mad   min   max range   skew kurtosis
<chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1 age     857  63.3 11.7    64      63.6 13.3   28    91    63   -0.175  -0.566
2 EF      723  55.8  9.62   58.1    56.8  7.86  18    79    61   -0.978   1.11
3 heig…   764 163.   9.08  165     164.   7.41 130   185    55   -0.440  -0.0145
4 weig…   766  64.8 11.4    65      64.5 10.4   30   112    82    0.336   0.444
5 BMI     764  24.3  3.35   24.2    24.2  3.01  15.6  41.4  25.8  0.668   2.12
6 TC      834 185.  47.8   183     184.  43.0   25   493   468    0.737   3.77
7 LDLC    833 117.  41.1   114     115.  40.0   15   366   351    0.787   2.33
8 HDLC    834  38.2 11.1    38      38.0 10.4    4    89    85    0.366   1.46
9 TG      842 125.  90.9   106.    111.  60.0   11   877   866    3.02   14.9
# … with 1 more variable: se <dbl>
numSummaryTable(acs)

| rowname | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | age | 857.00 | 63.31 | 11.70 | 64.00 | 63.56 | 13.34 | 28.00 | 91.00 | 63.00 | -0.18 | -0.57 | 0.40 |
| 2 | EF | 723.00 | 55.83 | 9.62 | 58.10 | 56.77 | 7.86 | 18.00 | 79.00 | 61.00 | -0.98 | 1.11 | 0.36 |
| 3 | height | 764.00 | 163.18 | 9.08 | 165.00 | 163.52 | 7.41 | 130.00 | 185.00 | 55.00 | -0.44 | -0.01 | 0.33 |
| 4 | weight | 766.00 | 64.84 | 11.36 | 65.00 | 64.55 | 10.38 | 30.00 | 112.00 | 82.00 | 0.34 | 0.44 | 0.41 |
| 5 | BMI | 764.00 | 24.28 | 3.35 | 24.16 | 24.16 | 3.01 | 15.62 | 41.42 | 25.80 | 0.67 | 2.12 | 0.12 |
| 6 | TC | 834.00 | 185.20 | 47.77 | 183.00 | 183.76 | 43.00 | 25.00 | 493.00 | 468.00 | 0.74 | 3.77 | 1.65 |
| 7 | LDLC | 833.00 | 116.58 | 41.09 | 114.00 | 114.62 | 40.03 | 15.00 | 366.00 | 351.00 | 0.79 | 2.33 | 1.42 |
| 8 | HDLC | 834.00 | 38.24 | 11.09 | 38.00 | 37.95 | 10.38 | 4.00 | 89.00 | 85.00 | 0.37 | 1.46 | 0.38 |
| 9 | TG | 842.00 | 125.24 | 90.85 | 105.50 | 111.29 | 60.05 | 11.00 | 877.00 | 866.00 | 3.02 | 14.91 | 3.13 |
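The column selection described in this section can be sketched in base R. The toy data frame below is made up for illustration, and the code is not numSummary()'s own. It also shows one plausible reason why n differs between rows of the summary: each column is presumably counted over its own non-missing values.

```r
# Sketch only: select numeric columns with is.numeric(), then count
# non-missing values per column. The data frame is made-up toy data.
df <- data.frame(
  id = c("a", "b", "c", "d"),
  age = c(63, 58, NA, 71),
  ef = c(55.1, NA, NA, 60.2),
  stringsAsFactors = FALSE
)

numeric_cols <- vapply(df, is.numeric, logical(1))  # TRUE for age and ef
df_numeric <- df[, numeric_cols, drop = FALSE]      # drops the character id

n_valid <- colSums(!is.na(df_numeric))              # per-column non-missing n
print(n_valid)
```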

### Use of dplyr::group_by() and dplyr::select() functions to summarize

You can use the dplyr::select() function to select the variables to summarize.

acs %>% select(age,EF) %>% numSummary
# A tibble: 2 x 13
vars      n  mean    sd median trimmed   mad   min   max range   skew kurtosis
<chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1 age     857  63.3 11.7    64      63.6 13.3     28    91    63 -0.175   -0.566
2 EF      723  55.8  9.62   58.1    56.8  7.86    18    79    61 -0.978    1.11
# … with 1 more variable: se <dbl>
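acs %>% select(age,EF) %>% numSummaryTable

| rowname | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | age | 857.00 | 63.31 | 11.70 | 64.00 | 63.56 | 13.34 | 28.00 | 91.00 | 63.00 | -0.18 | -0.57 | 0.40 |
| 2 | EF | 723.00 | 55.83 | 9.62 | 58.10 | 56.77 | 7.86 | 18.00 | 79.00 | 61.00 | -0.98 | 1.11 | 0.36 |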

You can combine the dplyr::group_by() and dplyr::select() functions to summarize the selected variables by group.

acs %>% group_by(sex) %>% select(age,EF) %>% numSummary
# A tibble: 4 x 14
# Groups:   sex [2]
sex   vars      n  mean    sd median trimmed   mad   min   max range    skew
<chr> <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1 Male  age     570  60.6 11.2    61      60.6 11.9   28      91  63   -0.0148
2 Male  EF      483  55.6  9.40   57.3    56.4  8.01  18      79  61   -0.789
3 Fema… age     287  68.7 10.7    70      69.4 10.4   39      90  51   -0.593
4 Fema… EF      240  56.3 10.1    59.2    57.6  7.19  18.4    75  56.6 -1.30
# … with 2 more variables: kurtosis <dbl>, se <dbl>
acs %>% group_by(sex) %>% select(age,EF) %>% numSummaryTable

| sex | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Male | age | 570.00 | 60.61 | 11.23 | 61.00 | 60.65 | 11.86 | 28.00 | 91.00 | 63.00 | -0.01 | -0.36 | 0.47 |
| Male | EF | 483.00 | 55.62 | 9.40 | 57.30 | 56.38 | 8.01 | 18.00 | 79.00 | 61.00 | -0.79 | 0.76 | 0.43 |
| Female | age | 287.00 | 68.68 | 10.73 | 70.00 | 69.43 | 10.38 | 39.00 | 90.00 | 51.00 | -0.59 | -0.26 | 0.63 |
| Female | EF | 240.00 | 56.27 | 10.06 | 59.25 | 57.57 | 7.19 | 18.40 | 75.00 | 56.60 | -1.30 | 1.70 | 0.65 |
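The grouped summary follows a split-apply-combine pattern: split the rows by group, summarize each piece, and stack the results. A minimal base R sketch of that pattern, using the built-in `mtcars` data and only the mean for brevity; it illustrates the idea and is not how numSummary() is implemented.

```r
# Sketch only: split-apply-combine in base R on built-in mtcars.
# Split two numeric columns by transmission type (am = 0 or 1)
by_group <- split(mtcars[, c("mpg", "hp")], mtcars$am)

# Summarize each piece, then combine: one row per group, one column per variable
group_means <- t(sapply(by_group, colMeans))
print(round(group_means, 2))
```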

You can summarize by multiple groups.

acs %>% group_by(sex,Dx) %>% select(age,EF) %>% numSummary
# A tibble: 12 x 15
# Groups:   sex, Dx [6]
sex   Dx    vars      n  mean    sd median trimmed   mad   min   max range
<chr> <chr> <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
1 Male  STEMI age     220  59.4 11.7    59.5    59.4 11.1   30    86    56
2 Male  STEMI EF      195  52.4  8.90   54      52.9  8.45  18    73.6  55.6
3 Fema… STEMI age      84  69.1 10.4    70      70.0 10.4   42    89    47
4 Fema… STEMI EF       77  52.3 10.9    55.7    53.7  9.04  18.4  67.1  48.7
5 Male  NSTE… age     103  61.1 11.6    59      61.3 13.3   28    85    57
6 Male  NSTE… EF       94  55.1  9.42   58      55.9  7.12  21.8  74    52.2
7 Fema… Unst… age     153  67.7 10.7    70      68.3  8.90  39    90    51
8 Fema… Unst… EF      118  59.4  8.76   61.1    60.8  5.49  22    71.9  49.9
9 Male  Unst… age     247  61.4 10.6    61      61.4 10.4   35    91    56
10 Male  Unst… EF      194  59.1  8.67   60      60.2  5.93  24.7  79    54.3
11 Fema… NSTE… age      50  70.9 11.4    74.5    71.9  8.90  42    88    46
12 Fema… NSTE… EF       45  54.8  9.10   57      55.3  9.79  36.8  75    38.2
# … with 3 more variables: skew <dbl>, kurtosis <dbl>, se <dbl>
acs %>% group_by(sex,Dx) %>% select(age,EF) %>% numSummaryTable

| sex | Dx | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Male | STEMI | age | 220.00 | 59.43 | 11.72 | 59.50 | 59.43 | 11.12 | 30.00 | 86.00 | 56.00 | 0.00 | -0.55 | 0.79 |
| Male | STEMI | EF | 195.00 | 52.37 | 8.90 | 54.00 | 52.88 | 8.45 | 18.00 | 73.60 | 55.60 | -0.62 | 0.53 | 0.64 |
| Female | STEMI | age | 84.00 | 69.11 | 10.36 | 70.00 | 70.04 | 10.38 | 42.00 | 89.00 | 47.00 | -0.65 | -0.09 | 1.13 |
| Female | STEMI | EF | 77.00 | 52.32 | 10.94 | 55.70 | 53.72 | 9.04 | 18.40 | 67.10 | 48.70 | -1.17 | 1.01 | 1.25 |
| Male | NSTEMI | age | 103.00 | 61.15 | 11.57 | 59.00 | 61.28 | 13.34 | 28.00 | 85.00 | 57.00 | -0.11 | -0.53 | 1.14 |
| Male | NSTEMI | EF | 94.00 | 55.08 | 9.42 | 58.00 | 55.86 | 7.12 | 21.80 | 74.00 | 52.20 | -0.83 | 0.57 | 0.97 |
| Female | Unstable Angina | age | 153.00 | 67.72 | 10.67 | 70.00 | 68.33 | 8.90 | 39.00 | 90.00 | 51.00 | -0.54 | -0.34 | 0.86 |
| Female | Unstable Angina | EF | 118.00 | 59.40 | 8.76 | 61.10 | 60.79 | 5.49 | 22.00 | 71.90 | 49.90 | -1.86 | 4.06 | 0.81 |
| Male | Unstable Angina | age | 247.00 | 61.44 | 10.57 | 61.00 | 61.41 | 10.38 | 35.00 | 91.00 | 56.00 | 0.07 | -0.15 | 0.67 |
| Male | Unstable Angina | EF | 194.00 | 59.14 | 8.67 | 60.00 | 60.15 | 5.93 | 24.70 | 79.00 | 54.30 | -1.25 | 2.54 | 0.62 |
| Female | NSTEMI | age | 50.00 | 70.88 | 11.35 | 74.50 | 71.88 | 8.90 | 42.00 | 88.00 | 46.00 | -0.72 | -0.34 | 1.61 |
| Female | NSTEMI | EF | 45.00 | 54.85 | 9.10 | 57.00 | 55.26 | 9.79 | 36.80 | 75.00 | 38.20 | -0.32 | -0.83 | 1.36 |

## For reproducible research

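You can use the rrtable package for reproducible research. Describe each table by its type, title, and code in a data.frame, then pass that data.frame to data2pptx() or data2docx().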

require(rrtable)
type=c("table","table")
title=c("Frequency Table","Numerical Summary")
code=c("freqTable(acs$Dx)","acs %>% group_by(sex) %>% select(EF,age) %>% numSummaryTable")
data=data.frame(type,title,code,stringsAsFactors = FALSE)
data2pptx(data)
[1] "/var/folders/ft/_w6lflrs4mz4f8n_r5w_h7vh0000gn/T/Rtmp5bd86I/Report.pptx"
data2docx(data)
[1] "/var/folders/ft/_w6lflrs4mz4f8n_r5w_h7vh0000gn/T/Rtmp5bd86I/Report.docx"
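The specification passed to data2pptx() and data2docx() stores each piece of code as a string. One way such strings can be turned into results is `eval(parse(text = ...))`. The sketch below shows that mechanism in base R only; it is an assumption for illustration, not rrtable's actual code, and the titles and code strings are made up around the built-in `mtcars` data.

```r
# Sketch only: how a type/title/code specification could be processed.
# This is an assumption for illustration, not rrtable's actual code.
spec <- data.frame(
  type = c("table", "table"),
  title = c("Cylinder Counts", "Mean MPG by Cylinder"),
  code = c("table(mtcars$cyl)",
           "aggregate(mpg ~ cyl, data = mtcars, FUN = mean)"),
  stringsAsFactors = FALSE
)

# Parse each code string into an expression and evaluate it
results <- lapply(spec$code, function(src) eval(parse(text = src)))
names(results) <- spec$title

str(results, max.level = 1)
```

A literal `$` needs no backslash inside an R string, so `"table(mtcars$cyl)"` is written as is.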