Introduction

There are two prevaling methods for using HTS-SIP data to estimate the amount of isotope that each OTU incorporated:

In this vignette, we are going to show how to run both analyses and also compare the results a bit.

Dataset

First, let's load some packages including HTSSIP.

library(dplyr)
library(ggplot2)
library(HTSSIP)

OK. We're going to be using 2 data files:

We'll be using the dataset that we simulated in the HTSSIP_sim vignette.

The phyloseq object is similar to the dataset in the other vignettes.

# HTS-SIP data
physeq_rep3
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 6 taxa and 144 samples ]
## sample_data() Sample Data:       [ 144 samples by 5 sample variables ]

The associated qPCR data is a list of length = 2.

# qPCR data (list object)
physeq_rep3_qPCR %>% names
## [1] "raw"     "summary"

For the analyses in this vignette, we only need the 'summary' table.

# qPCR data (list object)
physeq_rep3_qPCR_sum = physeq_rep3_qPCR$summary
physeq_rep3_qPCR_sum %>% head(n=4)
##   IS_CONTROL                  Sample Buoyant_density qPCR_tech_rep_mean
## 1      FALSE 13C-Glu_rep1_1.671712_2        1.671712           46598172
## 2      FALSE 13C-Glu_rep1_1.671722_1        1.671722           42299449
## 3      FALSE 13C-Glu_rep1_1.680311_3        1.680311           53536519
## 4      FALSE 13C-Glu_rep1_1.683540_4        1.683540           51449609
##   qPCR_tech_rep_sd     Gradient Fraction Treatment Replicate
## 1          9055833 13C-Glu_rep1        2   13C-Glu         1
## 2         16369298 13C-Glu_rep1        1   13C-Glu         1
## 3         18265667 13C-Glu_rep1        3   13C-Glu         1
## 4          3796576 13C-Glu_rep1        4   13C-Glu         1

q-SIP

OK. Let's quantify isotope incorporation witht the q-SIP method.

# transforming OTU counts
physeq_rep3_t = OTU_qPCR_trans(physeq_rep3, physeq_rep3_qPCR_sum)

# calculating atom fraction excess
atomX = qSIP_atom_excess(physeq_rep3_t,
                         control_expr='Treatment=="12C-Con"',
                         treatment_rep='Replicate')
atomX %>% names
## [1] "W" "A"

The resulting list object contains 2 data.frames. We are interested in the 'A' table, which contains estimated BD shifts (Z) and atom fraction excess (A).

atomX$A %>% head(n=4)
## # A tibble: 4 x 9
##   OTU    Wlab Wlight      Z    Gi Mlight Mheavymax  Mlab     A
##   <chr> <dbl>  <dbl>  <dbl> <dbl>  <dbl>     <dbl> <dbl> <dbl>
## 1 OTU.1  1.72   1.70 0.0222 0.636   308.      318.  312. 0.412
## 2 OTU.2  1.71   1.69 0.0117 0.586   308.      318.  310. 0.218
## 3 OTU.3  1.73   1.70 0.0347 0.608   308.      318.  314. 0.644
## 4 OTU.4  1.73   1.70 0.0261 0.705   308.      318.  313. 0.484

Next, let's calculate bootstrap confidence intervales for the atom fraction excess estimations.

# calculating bootstrapped CI values
df_atomX_boot = qSIP_bootstrap(atomX, n_boot=100)
df_atomX_boot %>% head(n=4)
## # A tibble: 4 x 11
##   OTU    Wlab Wlight      Z    Gi Mlight Mheavymax  Mlab     A A_CI_low
##   <chr> <dbl>  <dbl>  <dbl> <dbl>  <dbl>     <dbl> <dbl> <dbl>    <dbl>
## 1 OTU.1  1.72   1.70 0.0222 0.636   308.      318.  312. 0.412   0.0252
## 2 OTU.2  1.71   1.69 0.0117 0.586   308.      318.  310. 0.218   0.126 
## 3 OTU.3  1.73   1.70 0.0347 0.608   308.      318.  314. 0.644   0.431 
## 4 OTU.4  1.73   1.70 0.0261 0.705   308.      318.  313. 0.484   0.366 
## # … with 1 more variable: A_CI_high <dbl>

delta_BD

Now for delta_BD. The setup is easier because we are not using qPCR data, just relative abundances from 16S rRNA sequence data.

df_dBD = delta_BD(physeq_rep3, control_expr='Treatment=="12C-Con"')
df_dBD %>% head(n=4)
## # A tibble: 4 x 4
##   OTU   CM_control CM_treatment delta_BD
##   <chr>      <dbl>        <dbl>    <dbl>
## 1 OTU.1       1.69         1.72   0.0260
## 2 OTU.2       1.69         1.70   0.0130
## 3 OTU.3       1.69         1.74   0.0562
## 4 OTU.4       1.71         1.73   0.0208

Comparing results

Let's plot the data and compare all of the results. First, let's join all of the data into one table for plotting. We'll also format it for plotting.

# checking & joining data 
stopifnot(nrow(df_atomX_boot) == nrow(df_dBD))
df_j = dplyr::inner_join(df_atomX_boot, df_dBD, c('OTU'='OTU'))
stopifnot(nrow(df_atomX_boot) == nrow(df_j))

# formatting data for plotting
df_j = df_j %>%
  dplyr::mutate(OTU = reorder(OTU, -delta_BD))

OK. Time to plot!

# plotting BD shift (Z)
ggplot(df_j, aes(OTU)) +
  geom_point(aes(y=Z), color='blue') +
  geom_point(aes(y=delta_BD), color='red') +
  geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
  labs(x='OTU', y='BD shift (Z)') +
  theme_bw() +
  theme(
    axis.text.x = element_blank()
  )

plot of chunk unnamed-chunk-10

In the figure, red points are delta_BD and blue points are q-SIP. It's easy to see that delta_BD is a lot more variable than q-SIP. This is likely due to a high influence of compositional data artifacts on delta_BD versus q-SIP.

Let's make a boxplot to show the difference in estimation variance between the two methods.

# plotting BD shift (Z): boxplots

## formatting the table
df_j_g = df_j %>%
  dplyr::select(OTU, Z, delta_BD) %>%
  tidyr::gather(Method, BD_shift, Z, delta_BD) %>%
  mutate(Method = ifelse(Method == 'Z', 'q-SIP', 'delta-BD'))

## plotting 
ggplot(df_j_g, aes(Method, BD_shift)) +
  geom_boxplot() +
  geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
  labs(x='Method', y='BD shift (Z)') +
  theme_bw() 

plot of chunk unnamed-chunk-11

The boxplot helps to summarize how much more variance delta_BD produces versus q-SIP.

Session info

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] phyloseq_1.22.3 HTSSIP_1.4.1    ggplot2_3.2.0   tidyr_0.8.3    
## [5] dplyr_0.8.0.1  
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-131               bitops_1.0-6              
##   [3] matrixStats_0.54.0         bit64_0.9-7               
##   [5] doParallel_1.0.14          RColorBrewer_1.1-2        
##   [7] GenomeInfoDb_1.14.0        tools_3.4.3               
##   [9] backports_1.1.4            utf8_1.1.4                
##  [11] R6_2.4.0                   coenocliner_0.2-2         
##  [13] vegan_2.5-1                rpart_4.1-15              
##  [15] Hmisc_4.2-0                DBI_1.0.0                 
##  [17] lazyeval_0.2.2             BiocGenerics_0.24.0       
##  [19] mgcv_1.8-28                colorspace_1.4-1          
##  [21] permute_0.9-5              ade4_1.7-13               
##  [23] nnet_7.3-12                withr_2.1.2               
##  [25] tidyselect_0.2.5           gridExtra_2.3             
##  [27] DESeq2_1.18.1              bit_1.1-14                
##  [29] compiler_3.4.3             cli_1.1.0                 
##  [31] Biobase_2.38.0             htmlTable_1.13.1          
##  [33] DelayedArray_0.4.1         labeling_0.3              
##  [35] scales_1.0.0               checkmate_1.9.3           
##  [37] genefilter_1.60.0          stringr_1.4.0             
##  [39] digest_0.6.19              foreign_0.8-71            
##  [41] XVector_0.18.0             base64enc_0.1-3           
##  [43] pkgconfig_2.0.2            htmltools_0.3.6           
##  [45] highr_0.8                  htmlwidgets_1.3           
##  [47] rlang_0.4.0                RSQLite_2.1.1             
##  [49] rstudioapi_0.10            jsonlite_1.6              
##  [51] BiocParallel_1.12.0        acepack_1.4.1             
##  [53] RCurl_1.95-4.12            magrittr_1.5              
##  [55] GenomeInfoDbData_1.0.0     Formula_1.2-3             
##  [57] biomformat_1.6.0           Matrix_1.2-17             
##  [59] fansi_0.4.0                Rcpp_1.0.1                
##  [61] munsell_0.5.0              S4Vectors_0.16.0          
##  [63] ape_5.3                    stringi_1.4.3             
##  [65] MASS_7.3-51.4              SummarizedExperiment_1.8.1
##  [67] zlibbioc_1.24.0            rhdf5_2.22.0              
##  [69] plyr_1.8.4                 blob_1.1.1                
##  [71] grid_3.4.3                 parallel_3.4.3            
##  [73] crayon_1.3.4               lattice_0.20-38           
##  [75] Biostrings_2.46.0          splines_3.4.3             
##  [77] multtest_2.34.0            annotate_1.56.2           
##  [79] locfit_1.5-9.1             zeallot_0.1.0             
##  [81] knitr_1.18                 pillar_1.4.1              
##  [83] igraph_1.2.4               GenomicRanges_1.30.3      
##  [85] markdown_0.9               geneplotter_1.56.0        
##  [87] reshape2_1.4.3             codetools_0.2-16          
##  [89] stats4_3.4.3               XML_3.98-1.19             
##  [91] glue_1.3.1                 evaluate_0.14             
##  [93] latticeExtra_0.6-28        data.table_1.10.4-3       
##  [95] vctrs_0.1.0                foreach_1.4.4             
##  [97] gtable_0.3.0               purrr_0.3.2               
##  [99] assertthat_0.2.1           mime_0.6                  
## [101] xtable_1.8-4               survival_2.44-1.1         
## [103] tibble_2.1.1               iterators_1.0.10          
## [105] memoise_1.1.0              AnnotationDbi_1.40.0      
## [107] IRanges_2.12.0             cluster_2.0.6