The goal of this package is to make it easy to apply the same t-tests or basic data descriptions across several sub-groups, with the output returned as a neatly arranged data.frame. Multiple-comparison corrections and significance symbols are also provided.
This kind of analysis is commonly seen in ROI (region-of-interest) analysis of brain imaging data, which is why the package is called roistats.
After data cleaning and wrangling, we obtain a data.frame called color_index. This data.frame contains the result of a neural analysis: the degree of color memory sensitivity at each brain region for each subject. color_index has three columns:
* subj_id: identifies the subject. This labels the single data point within each roi_id.
* roi_id: the brain sub-region of interest for the analysis. We are interested in eight brain regions.
* color_index: the value that indicates how sensitive a certain brain region is to the memory of color. For each subj_id and roi_id, we obtained a single color_index value.
head(color_index)
#> subj_id roi_id color_index
#> 1 01 AnG -0.032384500
#> 2 01 dLatIPS -0.042524083
#> 3 01 LO -0.032643250
#> 4 01 pIPS -0.014760833
#> 5 01 V1 -0.001259167
#> 6 01 vIPS -0.023800500
Before we dive into the statistical tests, we want to get the mean, sd, and se (standard error of the mean) of color_index at each brain region. The df_sem function provided in the package can help us with this.
To use this function, first use group_by from dplyr to group your data.frame into the sub-groups for which you want the summary statistics.
Next, specify the data.frame and the name of the column you want to summarize. In this case, the data.frame is called color_index, and the column is also called color_index (a confusing example, sorry).
Note that the data.frame color_index was already grouped by roi_id.
str(color_index)
#> Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 232 obs. of 3 variables:
#> $ subj_id : chr "01" "01" "01" "01" ...
#> $ roi_id : chr "AnG" "dLatIPS" "LO" "pIPS" ...
#> $ color_index: num -0.03238 -0.04252 -0.03264 -0.01476 -0.00126 ...
#> - attr(*, "groups")=Classes 'tbl_df', 'tbl' and 'data.frame': 8 obs. of 2 variables:
#> ..$ roi_id: chr [1:8] "AnG" "dLatIPS" "LO" "pIPS" ...
#> ..$ .rows :List of 8
#> .. ..$ : int [1:29] 1 9 17 25 33 41 49 57 65 73 ...
#> .. ..$ : int [1:29] 2 10 18 26 34 42 50 58 66 74 ...
#> .. ..$ : int [1:29] 3 11 19 27 35 43 51 59 67 75 ...
#> .. ..$ : int [1:29] 4 12 20 28 36 44 52 60 68 76 ...
#> .. ..$ : int [1:29] 5 13 21 29 37 45 53 61 69 77 ...
#> .. ..$ : int [1:29] 6 14 22 30 38 46 54 62 70 78 ...
#> .. ..$ : int [1:29] 7 15 23 31 39 47 55 63 71 79 ...
#> .. ..$ : int [1:29] 8 16 24 32 40 48 56 64 72 80 ...
#> .. ..- attr(*, "ptype")= int(0)
#> .. ..- attr(*, "class")= chr [1:3] "vctrs_list_of" "vctrs_vctr" "list"
#> ..- attr(*, ".drop")= logi TRUE
df_sem(color_index, color_index) # first arg is the data.frame; second arg is the column
#> # A tibble: 8 x 5
#> roi_id mean_color_index sd n se
#> <chr> <dbl> <dbl> <int> <dbl>
#> 1 AnG 0.00537 0.0507 29 0.00942
#> 2 dLatIPS 0.0159 0.0510 29 0.00946
#> 3 LO 0.0181 0.0428 29 0.00796
#> 4 pIPS 0.0102 0.0297 29 0.00552
#> 5 V1 0.00955 0.0421 29 0.00782
#> 6 vIPS 0.0162 0.0327 29 0.00607
#> 7 vLatIPS 0.0162 0.0514 29 0.00955
#> 8 VTC 0.00468 0.0218 29 0.00405
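If you are curious what these columns mean, here is a minimal sketch that computes the same kind of summary with plain dplyr on a tiny made-up data set. The toy data frame and its value column are invented for illustration, and se = sd / sqrt(n) is the usual definition of the standard error of the mean (it is consistent with the numbers above, e.g. 0.0507 / sqrt(29) ≈ 0.0094). This is not necessarily how df_sem is implemented internally.

```r
library(dplyr)

# Synthetic toy data (NOT the package's data set): 4 subjects x 2 regions
toy <- data.frame(
  subj_id = rep(c("01", "02", "03", "04"), each = 2),
  roi_id  = rep(c("V1", "LO"), times = 4),
  value   = c(0.10, -0.20, 0.30, 0.00, 0.20, 0.10, 0.40, -0.10)
)

toy_summary <- toy %>%
  group_by(roi_id) %>%          # one sub-group per brain region
  summarise(
    mean_value = mean(value),
    sd = sd(value),
    n  = n(),                   # number of observations in the sub-group
    se = sd / sqrt(n)           # standard error of the mean
  )

print(toy_summary)
```

The grouping step is the important part: whatever you pass to group_by decides which sub-groups get their own row in the summary.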
You can also call df_sem in a typical tidyverse pipeline.
library(magrittr) # No need to load magrittr if you have already loaded the tidyverse
color_index_summary <- color_index %>%
  df_sem(color_index)
knitr::kable(color_index_summary, digits = 3)
roi_id | mean_color_index | sd | n | se |
---|---|---|---|---|
AnG | 0.005 | 0.051 | 29 | 0.009 |
dLatIPS | 0.016 | 0.051 | 29 | 0.009 |
LO | 0.018 | 0.043 | 29 | 0.008 |
pIPS | 0.010 | 0.030 | 29 | 0.006 |
V1 | 0.010 | 0.042 | 29 | 0.008 |
vIPS | 0.016 | 0.033 | 29 | 0.006 |
vLatIPS | 0.016 | 0.051 | 29 | 0.010 |
VTC | 0.005 | 0.022 | 29 | 0.004 |
Yay! We have easily obtained the SEM (which is commonly used for error bars in psychology and cognitive neuroscience plots) for each sub-group.
Now, we want to test whether color_index is significantly different from 0 for each sub-group (roi_id). That is, for each roi_id sub-group, we want to test whether the values in the color_index column of the data.frame color_index differ significantly from 0. Here, we have eight sub-groups, which means we will get eight one-sample t-test results in total. As a first-step analysis to figure out which brain regions are interesting, we don’t care much about the detailed output of the t.test function provided by the {stats} package. So, here we have the t_test_one_sample function, which helps us apply the same t-test to each sub-group, extract the key results, and wrap everything in a data.frame.
Again, the data.frame color_index was already grouped by roi_id.
t_test_one_sample(color_index, "color_index", mu = 0)
#> # A tibble: 8 x 5
#> # Groups: roi_id [8]
#> roi_id tvalue df p p_bonferroni
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 AnG 0.570 28 0.573 1
#> 2 dLatIPS 1.68 28 0.104 0.835
#> 3 LO 2.27 28 0.0311 0.249
#> 4 pIPS 1.85 28 0.0752 0.601
#> 5 V1 1.22 28 0.232 1
#> 6 vIPS 2.67 28 0.0124 0.0991
#> 7 vLatIPS 1.69 28 0.101 0.811
#> 8 VTC 1.16 28 0.257 1
Here, we see the t-values, degrees of freedom, p-values, and Bonferroni-corrected p-values! Nice: we get the t-statistics for each brain region, and p-values corrected for multiple comparisons are provided as well.
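To make the idea concrete, here is a rough base-R sketch of the same workflow: split the values by sub-group, run t.test on each piece, and collect the key numbers in a data.frame. The helper name one_sample_by_group and the simulated data are made up for this illustration; it is only a sketch of the general approach, not the actual implementation of t_test_one_sample.

```r
# Simulated toy data (NOT the package's data set): 3 regions x 10 subjects
set.seed(1)
toy <- data.frame(
  roi_id = rep(c("A", "B", "C"), each = 10),
  value  = rnorm(30, mean = 0.02, sd = 0.05)
)

# Hypothetical helper: one-sample t-test against `mu` within each group
one_sample_by_group <- function(values, groups, mu = 0) {
  tests <- lapply(split(values, groups), t.test, mu = mu)
  res <- data.frame(
    group  = names(tests),
    tvalue = vapply(tests, function(tt) unname(tt$statistic), numeric(1)),
    df     = vapply(tests, function(tt) unname(tt$parameter), numeric(1)),
    p      = vapply(tests, function(tt) tt$p.value, numeric(1)),
    row.names = NULL
  )
  # Bonferroni: multiply each p by the number of tests, capped at 1
  res$p_bonferroni <- p.adjust(res$p, method = "bonferroni")
  res
}

res <- one_sample_by_group(toy$value, toy$roi_id, mu = 0)
print(res)
```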
However, I believe the Bonferroni method is too conservative, and I want to compare its results with those of the FDR method. This time, we write things up as a tidyverse pipeline again:
color_index_one_sample_t_res <- color_index %>%
  t_test_one_sample("color_index", mu = 0, p_adjust = c("bonferroni","fdr"))
knitr::kable(color_index_one_sample_t_res, digits = 3)
roi_id | tvalue | df | p | p_bonferroni | p_fdr |
---|---|---|---|---|---|
AnG | 0.570 | 28 | 0.573 | 1.000 | 0.573 |
dLatIPS | 1.678 | 28 | 0.104 | 0.835 | 0.167 |
LO | 2.270 | 28 | 0.031 | 0.249 | 0.124 |
pIPS | 1.848 | 28 | 0.075 | 0.601 | 0.167 |
V1 | 1.221 | 28 | 0.232 | 1.000 | 0.294 |
vIPS | 2.673 | 28 | 0.012 | 0.099 | 0.099 |
vLatIPS | 1.694 | 28 | 0.101 | 0.811 | 0.167 |
VTC | 1.156 | 28 | 0.257 | 1.000 | 0.294 |
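The two corrections can be checked by hand with p.adjust from {stats}. The sketch below feeds in the (rounded) uncorrected p-values printed above, so the results may differ from the table in the third decimal place. Whether the package itself calls p.adjust is an assumption on our side.

```r
# Uncorrected p-values copied (rounded) from the one-sample output above
p <- c(AnG = 0.573, dLatIPS = 0.104, LO = 0.0311, pIPS = 0.0752,
       V1 = 0.232, vIPS = 0.0124, vLatIPS = 0.101, VTC = 0.257)

p_bonferroni <- p.adjust(p, method = "bonferroni")  # p * 8, capped at 1
p_fdr        <- p.adjust(p, method = "fdr")         # Benjamini-Hochberg

comparison <- data.frame(
  roi_id = names(p), p = unname(p),
  p_bonferroni = unname(p_bonferroni), p_fdr = unname(p_fdr)
)
print(comparison, digits = 3)
```

The FDR-adjusted value is never larger than the Bonferroni-adjusted one, which is why FDR is the less conservative of the two.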
Usually, we want significance symbols to highlight results in a table or a plot. Here we have the p_range function to create the significance symbols:
library(dplyr)
color_index_one_sample_t_with_sig <- color_index_one_sample_t_res %>%
  mutate(sig_origin_p = p_range(p))
knitr::kable(color_index_one_sample_t_with_sig, digits = 3)
roi_id | tvalue | df | p | p_bonferroni | p_fdr | sig_origin_p |
---|---|---|---|---|---|---|
AnG | 0.570 | 28 | 0.573 | 1.000 | 0.573 | |
dLatIPS | 1.678 | 28 | 0.104 | 0.835 | 0.167 | |
LO | 2.270 | 28 | 0.031 | 0.249 | 0.124 | * |
pIPS | 1.848 | 28 | 0.075 | 0.601 | 0.167 | |
V1 | 1.221 | 28 | 0.232 | 1.000 | 0.294 | |
vIPS | 2.673 | 28 | 0.012 | 0.099 | 0.099 | * |
vLatIPS | 1.694 | 28 | 0.101 | 0.811 | 0.167 | |
VTC | 1.156 | 28 | 0.257 | 1.000 | 0.294 | |
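For readers who want to see how such a mapping can work, here is a small base-R sketch. The function name p_to_stars and the cut-offs (0.05, 0.01, 0.001) are assumptions based on common convention; the table above only confirms that 0.031 and 0.012 get one star and that values above 0.05 get none, so the exact thresholds used by p_range may differ.

```r
# Hypothetical helper (not part of roistats): map p-values to star symbols
# Assumed cut-offs: p < 0.001 "***", p < 0.01 "**", p < 0.05 "*", else ""
p_to_stars <- function(p) {
  symbols <- c("***", "**", "*", "")
  # findInterval counts how many cut-offs are <= p (0, 1, 2, or 3)
  symbols[findInterval(p, c(0.001, 0.01, 0.05)) + 1]
}

# Works on a whole column of p-values ...
stars <- p_to_stars(c(0.573, 0.0311, 0.0124, 0.0752))
print(stars)

# ... and on a single number
print(p_to_stars(0.0005))
```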
You can use p_range for a single number too:
t_test_two_sample is for applying two-sample t-tests to all sub-groups.
Here we have color_index_two_sample:
* subj_id: identifies the subject. This labels the single data point within each roi_id.
* roi_id: the brain sub-region of interest for the analysis. We are interested in eight brain regions.
* group: whether the test condition was Paired or Control.
* color_effect: the value that indicates the memory trace of color. For each subj_id, in each test condition (group), at each brain region (roi_id), we obtained a single color_effect value.
Note that the data.frame was already grouped by roi_id.
head(color_index_two_sample)
#> # A tibble: 6 x 4
#> # Groups: roi_id [6]
#> subj_id roi_id group color_effect
#> <chr> <chr> <fct> <dbl>
#> 1 01 AnG Paired -0.0155
#> 2 01 dLatIPS Paired -0.0484
#> 3 01 LO Paired -0.00366
#> 4 01 pIPS Paired -0.0398
#> 5 01 V1 Paired -0.0120
#> 6 01 vIPS Paired -0.0366
str(color_index_two_sample)
#> tibble [464 × 4] (S3: grouped_df/tbl_df/tbl/data.frame)
#> $ subj_id : chr [1:464] "01" "01" "01" "01" ...
#> $ roi_id : chr [1:464] "AnG" "dLatIPS" "LO" "pIPS" ...
#> $ group : Factor w/ 2 levels "Paired","Control": 1 1 1 1 1 1 1 1 1 1 ...
#> $ color_effect: num [1:464] -0.01546 -0.04841 -0.00366 -0.03982 -0.01201 ...
#> - attr(*, "groups")= tibble [8 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ roi_id: chr [1:8] "AnG" "dLatIPS" "LO" "pIPS" ...
#> ..$ .rows : list<int> [1:8]
#> .. ..$ : int [1:58] 1 9 17 25 33 41 49 57 65 73 ...
#> .. ..$ : int [1:58] 2 10 18 26 34 42 50 58 66 74 ...
#> .. ..$ : int [1:58] 3 11 19 27 35 43 51 59 67 75 ...
#> .. ..$ : int [1:58] 4 12 20 28 36 44 52 60 68 76 ...
#> .. ..$ : int [1:58] 5 13 21 29 37 45 53 61 69 77 ...
#> .. ..$ : int [1:58] 6 14 22 30 38 46 54 62 70 78 ...
#> .. ..$ : int [1:58] 7 15 23 31 39 47 55 63 71 79 ...
#> .. ..$ : int [1:58] 8 16 24 32 40 48 56 64 72 80 ...
#> .. ..@ ptype: int(0)
#> ..- attr(*, ".drop")= logi TRUE
Here is an example of how to run a paired t-test for each sub-group:
t_test_two_sample(color_index_two_sample, x = "color_effect", y = "group", paired = TRUE)
#> # A tibble: 8 x 5
#> # Groups: roi_id [8]
#> roi_id tvalue df p p_bonferroni
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 AnG 0.570 28 0.573 1
#> 2 dLatIPS 1.68 28 0.104 0.835
#> 3 LO 2.27 28 0.0311 0.249
#> 4 pIPS 1.85 28 0.0752 0.601
#> 5 V1 1.22 28 0.232 1
#> 6 vIPS 2.67 28 0.0124 0.0991
#> 7 vLatIPS 1.69 28 0.101 0.811
#> 8 VTC 1.16 28 0.257 1
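A side note on why these numbers look familiar: a paired t-test is mathematically the same as a one-sample t-test on the within-subject differences. The output above matching the earlier one-sample table would be consistent with color_index being the Paired minus Control difference of color_effect, although that is our assumption and is not stated here. The equivalence itself can be checked in base R with simulated data (made up for this sketch):

```r
# Simulated toy data (NOT the package's data set): 12 subjects, 2 conditions
set.seed(42)
paired_cond  <- rnorm(12, mean = 0.03, sd = 0.05)
control_cond <- rnorm(12, mean = 0.00, sd = 0.05)

# Paired two-sample t-test
paired_res <- t.test(paired_cond, control_cond, paired = TRUE)

# One-sample t-test on the per-subject differences
diff_res <- t.test(paired_cond - control_cond, mu = 0)

# Same t-value, degrees of freedom, and p-value
same_t <- all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
same_p <- all.equal(paired_res$p.value, diff_res$p.value)
print(c(same_t = same_t, same_p = same_p))
```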
t_test_two_sample can be integrated into a tidyverse pipeline too.
color_index_two_sample_t_res <- color_index_two_sample %>%
  t_test_two_sample(
    x = "color_effect", y = "group", paired = TRUE, p_adjust = c("bonferroni","fdr")
  )
knitr::kable(color_index_two_sample_t_res, digits = 3)
roi_id | tvalue | df | p | p_bonferroni | p_fdr |
---|---|---|---|---|---|
AnG | 0.570 | 28 | 0.573 | 1.000 | 0.573 |
dLatIPS | 1.678 | 28 | 0.104 | 0.835 | 0.167 |
LO | 2.270 | 28 | 0.031 | 0.249 | 0.124 |
pIPS | 1.848 | 28 | 0.075 | 0.601 | 0.167 |
V1 | 1.221 | 28 | 0.232 | 1.000 | 0.294 |
vIPS | 2.673 | 28 | 0.012 | 0.099 | 0.099 |
vLatIPS | 1.694 | 28 | 0.101 | 0.811 | 0.167 |
VTC | 1.156 | 28 | 0.257 | 1.000 | 0.294 |