Get started

The goal of this package is to make it easy to apply the same t-tests or basic descriptive statistics across several sub-groups and to return the output as a neatly arranged data.frame. Multiple-comparison correction and significance symbols are also provided.

This kind of analysis is commonly seen in ROI (Region-of-interest) analysis of brain imaging data. That’s why the package is called roistats.

library(roistats)

Get some basic descriptive statistics for the data

After data cleaning and wrangling, we obtain a data.frame called color_index. This data.frame contains the neural analysis result: the degree of color memory sensitivity at each brain region for each subject. color_index has three columns:

head(color_index)
#>   subj_id  roi_id  color_index
#> 1      01     AnG -0.032384500
#> 2      01 dLatIPS -0.042524083
#> 3      01      LO -0.032643250
#> 4      01    pIPS -0.014760833
#> 5      01      V1 -0.001259167
#> 6      01    vIPS -0.023800500

Before we dive into the statistical tests, we want the mean, sd, and se (standard error of the mean) of color_index for each brain region. The df_sem function provided in the package can help us with this.

To use this function, first group your data.frame with group_by from dplyr so that it is split into the sub-groups you want summary statistics for.

Next, specify the data.frame and the name of the column you want to summarize. In this case, the data.frame is called color_index, and the column is also called color_index (a confusing example, sorry).

Note, the data.frame color_index was already grouped by roi_id.

str(color_index)
#> Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  232 obs. of  3 variables:
#>  $ subj_id    : chr  "01" "01" "01" "01" ...
#>  $ roi_id     : chr  "AnG" "dLatIPS" "LO" "pIPS" ...
#>  $ color_index: num  -0.03238 -0.04252 -0.03264 -0.01476 -0.00126 ...
#>  - attr(*, "groups")=Classes 'tbl_df', 'tbl' and 'data.frame':   8 obs. of  2 variables:
#>   ..$ roi_id: chr [1:8] "AnG" "dLatIPS" "LO" "pIPS" ...
#>   ..$ .rows :List of 8
#>   .. ..$ : int [1:29] 1 9 17 25 33 41 49 57 65 73 ...
#>   .. ..$ : int [1:29] 2 10 18 26 34 42 50 58 66 74 ...
#>   .. ..$ : int [1:29] 3 11 19 27 35 43 51 59 67 75 ...
#>   .. ..$ : int [1:29] 4 12 20 28 36 44 52 60 68 76 ...
#>   .. ..$ : int [1:29] 5 13 21 29 37 45 53 61 69 77 ...
#>   .. ..$ : int [1:29] 6 14 22 30 38 46 54 62 70 78 ...
#>   .. ..$ : int [1:29] 7 15 23 31 39 47 55 63 71 79 ...
#>   .. ..$ : int [1:29] 8 16 24 32 40 48 56 64 72 80 ...
#>   .. ..- attr(*, "ptype")= int(0) 
#>   .. ..- attr(*, "class")= chr [1:3] "vctrs_list_of" "vctrs_vctr" "list"
#>   ..- attr(*, ".drop")= logi TRUE

df_sem(color_index, color_index) # first arg is the data.frame; second arg is the column
#> # A tibble: 8 x 5
#>   roi_id  mean_color_index     sd     n      se
#>   <chr>              <dbl>  <dbl> <int>   <dbl>
#> 1 AnG              0.00537 0.0507    29 0.00942
#> 2 dLatIPS          0.0159  0.0510    29 0.00946
#> 3 LO               0.0181  0.0428    29 0.00796
#> 4 pIPS             0.0102  0.0297    29 0.00552
#> 5 V1               0.00955 0.0421    29 0.00782
#> 6 vIPS             0.0162  0.0327    29 0.00607
#> 7 vLatIPS          0.0162  0.0514    29 0.00955
#> 8 VTC              0.00468 0.0218    29 0.00405
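For readers who want to see what these columns amount to, here is a minimal base-R sketch of the same kind of per-group summary. It is not the package's implementation: the toy data frame and its values are hypothetical, and it assumes se is computed as sd / sqrt(n), which is consistent with the table above (for AnG, 0.0507 / sqrt(29) is about 0.0094).

```r
# Toy data (hypothetical values, not the package's color_index data)
toy <- data.frame(
  roi_id = rep(c("A", "B"), each = 4),
  value  = c(1, 2, 3, 4, 2, 4, 6, 8)
)

# Split the values by sub-group, then summarise each group
by_roi <- split(toy$value, toy$roi_id)
toy_summary <- data.frame(
  roi_id = names(by_roi),
  mean   = sapply(by_roi, mean),
  sd     = sapply(by_roi, sd),
  n      = sapply(by_roi, length),
  row.names = NULL
)

# Standard error of the mean (assumed definition: sd / sqrt(n))
toy_summary$se <- toy_summary$sd / sqrt(toy_summary$n)
toy_summary
```

The result has one row per sub-group, which is the same shape as the df_sem output.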

You can also call df_sem in a typical tidyverse pipeline.

library(magrittr) # No need to import magrittr if you have imported tidyverse already
color_index_summary <- color_index %>%
  df_sem(color_index)

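knitr::kable(color_index_summary, digits = 3)

| roi_id  | mean_color_index |    sd |  n |    se |
|:--------|-----------------:|------:|---:|------:|
| AnG     |            0.005 | 0.051 | 29 | 0.009 |
| dLatIPS |            0.016 | 0.051 | 29 | 0.009 |
| LO      |            0.018 | 0.043 | 29 | 0.008 |
| pIPS    |            0.010 | 0.030 | 29 | 0.006 |
| V1      |            0.010 | 0.042 | 29 | 0.008 |
| vIPS    |            0.016 | 0.033 | 29 | 0.006 |
| vLatIPS |            0.016 | 0.051 | 29 | 0.010 |
| VTC     |            0.005 | 0.022 | 29 | 0.004 |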

Yay! We have easily obtained the SEM (commonly used for error bars in psychology and cognitive neuroscience plots) for each sub-group.

One-sample t-tests for all sub-groups

Now we want to test whether color_index differs significantly from 0 in each sub-group (roi_id). That is, for each roi_id sub-group, we test whether the values in the color_index column of the data.frame color_index are significantly different from 0. We have eight sub-groups here, so we will get eight one-sample t-test results in total. As a first-pass analysis to figure out which brain regions might be interesting, we don’t care much about the detailed output of the t.test function from the {stats} package. So the package provides t_test_one_sample, which applies the same t-test to each sub-group, extracts the key results, and wraps everything in a data.frame.

Again, the data.frame color_index was already grouped by roi_id.

t_test_one_sample(color_index, "color_index", mu = 0)
#> # A tibble: 8 x 5
#> # Groups:   roi_id [8]
#>   roi_id  tvalue    df      p p_bonferroni
#>   <chr>    <dbl> <dbl>  <dbl>        <dbl>
#> 1 AnG      0.570    28 0.573        1     
#> 2 dLatIPS  1.68     28 0.104        0.835 
#> 3 LO       2.27     28 0.0311       0.249 
#> 4 pIPS     1.85     28 0.0752       0.601 
#> 5 V1       1.22     28 0.232        1     
#> 6 vIPS     2.67     28 0.0124       0.0991
#> 7 vLatIPS  1.69     28 0.101        0.811 
#> 8 VTC      1.16     28 0.257        1

Here we see the t-values, degrees of freedom, p-values, and Bonferroni-corrected p-values. Nice: we get the t statistics for each brain region, and p-values corrected for multiple comparisons are provided as well.
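Conceptually, this amounts to running stats::t.test once per sub-group and collecting the key numbers. The following is a minimal base-R sketch of that idea, not the package's internal code; the toy data frame and its values are hypothetical.

```r
# Toy data (hypothetical values): two sub-groups with five observations each
toy <- data.frame(
  roi_id = rep(c("A", "B"), each = 5),
  value  = c(0.1, 0.3, 0.2, 0.4, 0.5, -0.2, 0.1, 0.0, 0.2, -0.1)
)

# Run one one-sample t-test per sub-group and keep only the key results
res_list <- lapply(split(toy, toy$roi_id), function(d) {
  tt <- t.test(d$value, mu = 0)
  data.frame(
    roi_id = d$roi_id[1],
    tvalue = unname(tt$statistic),  # t statistic
    df     = unname(tt$parameter),  # degrees of freedom
    p      = tt$p.value             # uncorrected p-value
  )
})

# Stack the per-group rows into a single data.frame
toy_res <- do.call(rbind, res_list)
rownames(toy_res) <- NULL
toy_res
```

Each row of toy_res corresponds to one sub-group, mirroring the layout of the t_test_one_sample output.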

However, I believe the Bonferroni method is too conservative, and I want to compare it with the FDR method. This time, we write it as a tidyverse pipeline again:

color_index_one_sample_t_res <- color_index %>%
  t_test_one_sample("color_index", mu = 0, p_adjust = c("bonferroni","fdr"))
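knitr::kable(color_index_one_sample_t_res, digits = 3)

| roi_id  | tvalue | df |     p | p_bonferroni | p_fdr |
|:--------|-------:|---:|------:|-------------:|------:|
| AnG     |  0.570 | 28 | 0.573 |        1.000 | 0.573 |
| dLatIPS |  1.678 | 28 | 0.104 |        0.835 | 0.167 |
| LO      |  2.270 | 28 | 0.031 |        0.249 | 0.124 |
| pIPS    |  1.848 | 28 | 0.075 |        0.601 | 0.167 |
| V1      |  1.221 | 28 | 0.232 |        1.000 | 0.294 |
| vIPS    |  2.673 | 28 | 0.012 |        0.099 | 0.099 |
| vLatIPS |  1.694 | 28 | 0.101 |        0.811 | 0.167 |
| VTC     |  1.156 | 28 | 0.257 |        1.000 | 0.294 |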
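The two corrections can be cross-checked with base R's p.adjust. The sketch below feeds it the rounded p-values printed in the one-sample output, so the results agree with the table only up to rounding. It assumes that the package's "fdr" option corresponds to the Benjamini-Hochberg procedure, which is what p.adjust uses for method = "fdr".

```r
# p-values copied (rounded) from the one-sample t-test output above
p <- c(AnG = 0.573, dLatIPS = 0.104, LO = 0.0311, pIPS = 0.0752,
       V1 = 0.232, vIPS = 0.0124, vLatIPS = 0.101, VTC = 0.257)

# Bonferroni: multiply each p by the number of tests, capped at 1
p_bonferroni <- p.adjust(p, method = "bonferroni")

# FDR: Benjamini-Hochberg step-up adjustment
p_fdr <- p.adjust(p, method = "fdr")

round(cbind(p, p_bonferroni, p_fdr), 3)
```

Bonferroni scales every p-value by the full number of tests, while the FDR adjustment scales by the number of tests divided by the p-value's rank, which is why it is less conservative for all but the smallest p-value.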

Significance symbols for an even clearer table and possible visualization

Usually, we want significance symbols to highlight results in a table or plot. The p_range function creates these symbols:

library(dplyr)
color_index_one_sample_t_with_sig <- color_index_one_sample_t_res %>% 
  mutate(sig_origin_p = p_range(p))
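knitr::kable(color_index_one_sample_t_with_sig, digits = 3)

| roi_id  | tvalue | df |     p | p_bonferroni | p_fdr | sig_origin_p |
|:--------|-------:|---:|------:|-------------:|------:|:-------------|
| AnG     |  0.570 | 28 | 0.573 |        1.000 | 0.573 |              |
| dLatIPS |  1.678 | 28 | 0.104 |        0.835 | 0.167 |              |
| LO      |  2.270 | 28 | 0.031 |        0.249 | 0.124 | *            |
| pIPS    |  1.848 | 28 | 0.075 |        0.601 | 0.167 |              |
| V1      |  1.221 | 28 | 0.232 |        1.000 | 0.294 |              |
| vIPS    |  2.673 | 28 | 0.012 |        0.099 | 0.099 | *            |
| vLatIPS |  1.694 | 28 | 0.101 |        0.811 | 0.167 |              |
| VTC     |  1.156 | 28 | 0.257 |        1.000 | 0.294 |              |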

You can use p_range for a single number too:

p_range(0.002)
#> [1] "**"
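To make the mapping from p-values to symbols concrete, here is a minimal base-R sketch. sig_symbol is a hypothetical helper, not part of the package, and its cut-offs (0.001, 0.01, 0.05) are the conventional ones. They are an assumption that happens to be consistent with the examples shown here; p_range's actual thresholds are not stated in this text.

```r
# Hypothetical helper; the cut-offs are conventional values and an assumption,
# not taken from the package source
sig_symbol <- function(p) {
  as.character(cut(
    p,
    breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
    labels = c("***", "**", "*", ""),
    right  = FALSE  # intervals are [lower, upper), so p = 0.05 gets no symbol
  ))
}

sig_symbol(c(0.0005, 0.002, 0.031, 0.573))
```

Because cut is vectorized, the same helper works on a single number or on a whole column inside mutate.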

Two-sample t-tests for all sub-groups

t_test_two_sample is for applying two-sample t-tests to all sub-groups.

Here we have color_index_two_sample, with the following columns:

* subj_id: identifies the subject. This labels the single data point within each roi_id.
* roi_id: the brain sub-region of interest for the analysis. We are interested in eight brain regions.
* group: whether the test condition was Paired or Control.
* color_effect: the value that indicates the memory trace of color. For each subj_id, in each test condition (group) and at each brain region (roi_id), we obtained a single color_effect value.

Note, the data.frame was already grouped by roi_id.

head(color_index_two_sample)
#> # A tibble: 6 x 4
#> # Groups:   roi_id [6]
#>   subj_id roi_id  group  color_effect
#>   <chr>   <chr>   <fct>         <dbl>
#> 1 01      AnG     Paired     -0.0155 
#> 2 01      dLatIPS Paired     -0.0484 
#> 3 01      LO      Paired     -0.00366
#> 4 01      pIPS    Paired     -0.0398 
#> 5 01      V1      Paired     -0.0120 
#> 6 01      vIPS    Paired     -0.0366
str(color_index_two_sample)
#> tibble [464 × 4] (S3: grouped_df/tbl_df/tbl/data.frame)
#>  $ subj_id     : chr [1:464] "01" "01" "01" "01" ...
#>  $ roi_id      : chr [1:464] "AnG" "dLatIPS" "LO" "pIPS" ...
#>  $ group       : Factor w/ 2 levels "Paired","Control": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ color_effect: num [1:464] -0.01546 -0.04841 -0.00366 -0.03982 -0.01201 ...
#>  - attr(*, "groups")= tibble [8 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ roi_id: chr [1:8] "AnG" "dLatIPS" "LO" "pIPS" ...
#>   ..$ .rows : list<int> [1:8] 
#>   .. ..$ : int [1:58] 1 9 17 25 33 41 49 57 65 73 ...
#>   .. ..$ : int [1:58] 2 10 18 26 34 42 50 58 66 74 ...
#>   .. ..$ : int [1:58] 3 11 19 27 35 43 51 59 67 75 ...
#>   .. ..$ : int [1:58] 4 12 20 28 36 44 52 60 68 76 ...
#>   .. ..$ : int [1:58] 5 13 21 29 37 45 53 61 69 77 ...
#>   .. ..$ : int [1:58] 6 14 22 30 38 46 54 62 70 78 ...
#>   .. ..$ : int [1:58] 7 15 23 31 39 47 55 63 71 79 ...
#>   .. ..$ : int [1:58] 8 16 24 32 40 48 56 64 72 80 ...
#>   .. ..@ ptype: int(0) 
#>   ..- attr(*, ".drop")= logi TRUE

Here is an example of how to obtain the paired t-test for each sub-group:

t_test_two_sample(color_index_two_sample, x = "color_effect", y = "group", paired = TRUE)
#> # A tibble: 8 x 5
#> # Groups:   roi_id [8]
#>   roi_id  tvalue    df      p p_bonferroni
#>   <chr>    <dbl> <dbl>  <dbl>        <dbl>
#> 1 AnG      0.570    28 0.573        1     
#> 2 dLatIPS  1.68     28 0.104        0.835 
#> 3 LO       2.27     28 0.0311       0.249 
#> 4 pIPS     1.85     28 0.0752       0.601 
#> 5 V1       1.22     28 0.232        1     
#> 6 vIPS     2.67     28 0.0124       0.0991
#> 7 vLatIPS  1.69     28 0.101        0.811 
#> 8 VTC      1.16     28 0.257        1
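A paired t-test is equivalent to a one-sample t-test on the within-subject differences. That may be why this table matches the one-sample results shown earlier: it would follow if color_index were the per-subject Paired minus Control difference, which is an assumption and is not stated in this text. The base-R sketch below illustrates the equivalence with hypothetical toy values.

```r
# Toy paired data (hypothetical values): one value per subject per condition
paired  <- c(0.12, 0.05, 0.20, -0.03, 0.10, 0.08)
control <- c(0.02, 0.07, 0.09, -0.01, 0.01, 0.03)

# Paired two-sample t-test
tt_paired <- t.test(paired, control, paired = TRUE)

# One-sample t-test on the within-subject differences
tt_diff <- t.test(paired - control, mu = 0)

# The two approaches give the same t statistic and p-value
c(paired = unname(tt_paired$statistic), diff = unname(tt_diff$statistic))
c(paired = tt_paired$p.value, diff = tt_diff$p.value)
```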

t_test_two_sample can be integrated into a tidyverse pipeline too.

color_index_two_sample_t_res <- color_index_two_sample %>%
  t_test_two_sample(
    x = "color_effect", y = "group", paired = TRUE, p_adjust = c("bonferroni","fdr")
  )
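knitr::kable(color_index_two_sample_t_res, digits = 3)

| roi_id  | tvalue | df |     p | p_bonferroni | p_fdr |
|:--------|-------:|---:|------:|-------------:|------:|
| AnG     |  0.570 | 28 | 0.573 |        1.000 | 0.573 |
| dLatIPS |  1.678 | 28 | 0.104 |        0.835 | 0.167 |
| LO      |  2.270 | 28 | 0.031 |        0.249 | 0.124 |
| pIPS    |  1.848 | 28 | 0.075 |        0.601 | 0.167 |
| V1      |  1.221 | 28 | 0.232 |        1.000 | 0.294 |
| vIPS    |  2.673 | 28 | 0.012 |        0.099 | 0.099 |
| vLatIPS |  1.694 | 28 | 0.101 |        0.811 | 0.167 |
| VTC     |  1.156 | 28 | 0.257 |        1.000 | 0.294 |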