ProliferativeIndex Vignette

Brittany N. Lasseigne, PhD and Ryne C. Ramaker

2017-02-16

The ProliferativeIndex R package [1] provides R functions for calculating and analyzing the proliferative index (PI) from an RNA-seq dataset.

The PI was adapted from Venet, et al. [2]:

“The proliferating cell nuclear antigen, PCNA, is a ring-shaped protein that encircles DNA and regulates several processes leading to DNA replication. As suggested by its name, this is one of the most widely used antigen target for immunohistochemical measures of the fraction of proliferating cells in tissues. Ge et al. profiled with microarrays 36 tissues from normal, healthy individuals encompassing 27 organs. We call ‘meta-PCNA’ the signature composed of the 1% genes the most positively correlated with PCNA expression across these 36 tissues. In plain language, meta-PCNA genes are consistently expressed when PCNA is expressed in normal tissues and consistently repressed when PCNA is repressed. We define the meta-PCNA index as the median expression of meta-PCNA genes.”
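
As a rough illustration of the definition quoted above, a per-sample index can be computed as the median expression over a gene signature. The sketch below uses made-up values and a hypothetical three-gene signature; toyExpr, toySignature, and toyIndex are illustrative names and are not part of the package, and this is not the package's implementation.

```r
# Toy expression data: rows are genes, columns are samples (values are made up)
toyExpr <- data.frame(
  sampleA = c(8.1, 7.9, 8.4, 4.2, 5.0),
  sampleB = c(9.0, 8.8, 9.3, 4.1, 5.2),
  row.names = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5")
)

# Hypothetical signature standing in for the meta-PCNA gene list
toySignature <- c("GENE1", "GENE2", "GENE3")

# Index per sample = median expression of the signature genes in that sample
toyIndex <- apply(toyExpr[toySignature, , drop = FALSE], 2, median)
print(toyIndex)
```

Using the median rather than the mean keeps a single extreme gene from dominating the index.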

IMPORTANT: Proliferative indices are only interpretable relative to other PIs, for example a higher or lower PI in tumors compared to normal tissues, or in post-mitotic tissues compared to tissues with high rates of cell turnover. Additionally, the PI measures proliferation-associated expression (as described above) and not necessarily proliferation itself.
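
Because a PI is only meaningful in comparison, one simple way to use it is to compare its distribution between two groups of samples. The sketch below assumes made-up PI values and hypothetical group labels (toyPI and toyGroup are not package objects) and uses a Wilcoxon rank-sum test from base R as one possible comparison, not a method prescribed by the package.

```r
# Made-up PI values for eight samples; the first four are labeled "normal"
# and the last four "tumor" purely for illustration
toyPI <- c(8.01, 8.08, 8.10, 8.13, 8.19, 8.34, 8.40, 8.45)
toyGroup <- factor(rep(c("normal", "tumor"), each = 4))

# Median PI within each group
groupMedians <- tapply(toyPI, toyGroup, median)
print(groupMedians)

# Non-parametric test of whether PI differs between the two groups
groupTest <- wilcox.test(toyPI ~ toyGroup)
print(groupTest$p.value)
```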

ProliferativeIndex contains the following functions: readDataForPI, calculatePI, comparePI, and compareModeltoPI.

Example Data Set

Included with ProliferativeIndex, specifically for use with this vignette, is data from The Cancer Genome Atlas (TCGA) Adrenocortical Carcinoma (ACC) dataset [3].

First, load the ProliferativeIndex library:

library(ProliferativeIndex)

The dataset, vstTCGA_ACCData_sub, can then be accessed from the package:

data(vstTCGA_ACCData_sub)

#Examine only the first few columns and rows because the dataset is large (20501 genes x 10 samples):
dim(vstTCGA_ACCData_sub)
## [1] 20501    10
#Note that sample IDs are column names and HGNC gene IDs (http://www.genenames.org) are rownames and that vst data is numeric.
str(vstTCGA_ACCData_sub)
## 'data.frame':    20501 obs. of  10 variables:
##  $ TCGA.OR.A5J1: num  5.87 4.19 5.92 8.43 6.99 ...
##  $ TCGA.OR.A5J2: num  5.49 4.19 5.2 8.74 4.19 ...
##  $ TCGA.OR.A5J3: num  6.04 4.52 5.44 8.04 4.76 ...
##  $ TCGA.OR.A5J5: num  11.4 4.71 5.22 7.08 6.8 ...
##  $ TCGA.OR.A5J6: num  10.07 4.19 5.11 8.8 4.66 ...
##  $ TCGA.OR.A5J7: num  5.57 4.19 4.96 7.52 4.91 ...
##  $ TCGA.OR.A5J8: num  6.86 4.19 4.19 6.91 5.1 ...
##  $ TCGA.OR.A5J9: num  5.4 4.19 6.46 8.94 6.34 ...
##  $ TCGA.OR.A5JA: num  6.8 4.19 5.25 8.77 6.36 ...
##  $ TCGA.OR.A5JB: num  8.53 4.19 4.19 6.84 4.19 ...
knitr::kable(vstTCGA_ACCData_sub[1:5,1:5])
TCGA.OR.A5J1 TCGA.OR.A5J2 TCGA.OR.A5J3 TCGA.OR.A5J5 TCGA.OR.A5J6
A1BG 5.871339 5.490145 6.036080 11.397348 10.065106
A1CF 4.190503 4.190503 4.523434 4.713955 4.190503
A2BP1 5.915039 5.196520 5.443088 5.221104 5.112238
A2LD1 8.431843 8.741279 8.043286 7.075708 8.798831
A2ML1 6.986670 4.190503 4.764641 6.798125 4.657211
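
When substituting your own data, it may help to confirm that it has the same shape as the example: a data frame of numeric values with gene IDs as row names and sample IDs as column names. The helper below, checkVstInput, is hypothetical and not part of ProliferativeIndex; it is a minimal sketch of such a check on toy data.

```r
# Hypothetical helper (not part of ProliferativeIndex): returns TRUE if the
# data frame has only numeric columns and non-default row names
checkVstInput <- function(vst) {
  stopifnot(is.data.frame(vst))
  allNumeric <- all(vapply(vst, is.numeric, logical(1)))
  # Default row names are "1", "2", ...; gene IDs should differ from these
  hasGeneRowNames <- !identical(rownames(vst), as.character(seq_len(nrow(vst))))
  allNumeric && hasGeneRowNames
}

# Toy inputs: one well-formed, one with a character column by mistake
goodInput <- data.frame(S1 = c(5.9, 4.2), S2 = c(5.5, 4.2),
                        row.names = c("A1BG", "A1CF"))
badInput <- data.frame(S1 = c("5.9", "4.2"), S2 = c(5.5, 4.2),
                       row.names = c("A1BG", "A1CF"))

print(checkVstInput(goodInput))
print(checkVstInput(badInput))
```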

readDataForPI function

Functions in the ProliferativeIndex package come with help pages that can be accessed as usual (for example, ?readDataForPI).

The function readDataForPI is used to read data in for use with the ProliferativeIndex package.

#Inputs are the user's vst dataframe and a model of interest for examining PI:
exampleTCGAData<-readDataForPI(vstTCGA_ACCData_sub, c("AIFM3", "ATP9B", "CTRC", "MCL1", "MGAT4B", "ODF2L", "SNORA65", "TPPP2"))

#examine output which is a list of two objects:
# exampleTCGAData$vstData is the user's vst dataframe and exampleTCGAData$modelIDs is a character vector of the user's gene IDs for their model of interest
str(exampleTCGAData)
## List of 2
##  $ vstData :'data.frame':    20501 obs. of  10 variables:
##   ..$ TCGA.OR.A5J1: num [1:20501] 5.87 4.19 5.92 8.43 6.99 ...
##   ..$ TCGA.OR.A5J2: num [1:20501] 5.49 4.19 5.2 8.74 4.19 ...
##   ..$ TCGA.OR.A5J3: num [1:20501] 6.04 4.52 5.44 8.04 4.76 ...
##   ..$ TCGA.OR.A5J5: num [1:20501] 11.4 4.71 5.22 7.08 6.8 ...
##   ..$ TCGA.OR.A5J6: num [1:20501] 10.07 4.19 5.11 8.8 4.66 ...
##   ..$ TCGA.OR.A5J7: num [1:20501] 5.57 4.19 4.96 7.52 4.91 ...
##   ..$ TCGA.OR.A5J8: num [1:20501] 6.86 4.19 4.19 6.91 5.1 ...
##   ..$ TCGA.OR.A5J9: num [1:20501] 5.4 4.19 6.46 8.94 6.34 ...
##   ..$ TCGA.OR.A5JA: num [1:20501] 6.8 4.19 5.25 8.77 6.36 ...
##   ..$ TCGA.OR.A5JB: num [1:20501] 8.53 4.19 4.19 6.84 4.19 ...
##  $ modelIDs: chr [1:8] "AIFM3" "ATP9B" "CTRC" "MCL1" ...

*note, the R package includes a data object, ‘exReadDataObj’, that is the output from the readDataForPI function, for comparison
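
Since the model is specified by gene IDs, it can be useful to see which of those IDs actually occur among the row names of the data before going further. The sketch below uses a toy set of row names and a shortened model list (toyGenes and toyModelIDs are illustrative, not package objects) and relies only on base R set operations.

```r
# Toy stand-ins: gene IDs present in a dataset, and a model's gene IDs
toyGenes <- c("AIFM3", "ATP9B", "MCL1", "PCNA")
toyModelIDs <- c("AIFM3", "ATP9B", "CTRC", "MCL1")

# Model genes that are present in the data, and those that are not
foundIDs <- intersect(toyModelIDs, toyGenes)
missingIDs <- setdiff(toyModelIDs, toyGenes)

print(foundIDs)
print(missingIDs)
```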

calculatePI function

The function calculatePI calculates PI for all samples in the user’s vst dataframe using a list of PCNA-associated genes collected from Venet et al. (including alternative gene names).

*note, the function will print to the screen how many of the genes used to calculate the PI were found in the vstData

proliferativeIndices<-calculatePI(exampleTCGAData)
## [1] "vstData contained 131/131 of the PI-associated genes"
summary(proliferativeIndices)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   8.005   8.097   8.130   8.152   8.179   8.343
#Examine function output:
knitr::kable(head(proliferativeIndices))
TCGA.OR.A5J1 TCGA.OR.A5J2 TCGA.OR.A5J3 TCGA.OR.A5J5 TCGA.OR.A5J6 TCGA.OR.A5J7
8.005387 8.190936 8.131733 8.342858 8.084313 8.126670

*note, the R package includes a data object, ‘exVSTPI’, that is the output from the calculatePI function, for comparison

comparePI function

This function will summarize the PI values within the user’s dataset.

Min. 1st Qu. Median Mean 3rd Qu. Max.
8.005 8.097 8.130 8.152 8.179 8.343

[Figure: plot output from comparePI]

compareModeltoPI function

The function compareModeltoPI will take, as input, the user’s data and model identifiers and compare to PI:

modelComparison<-compareModeltoPI(exampleTCGAData, proliferativeIndices)

[Figure: plot output from compareModeltoPI]

#the output is a table, inspect:
knitr::kable(modelComparison)
SpearmanRho SpearmanPvalue PCAPropOfVariance
PC1 0.3818182 0.2789652 0.51527
PC2 0.0424242 0.9186333 0.11587
PC3 0.2242424 0.5366881 0.07491
PC4 0.1272727 0.7328868 0.06558
PC5 -0.3575758 0.3128003 0.05897
PC6 -0.5515152 0.1042978 0.05068
PC7 0.3818182 0.2789652 0.05002
PC8 0.0060606 1.0000000 0.03992
PC9 0.1636364 0.6567214 0.02878
PC10 0.2727273 0.4482722 0.00000
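
The column names of this table suggest that the model genes are summarized by principal components and that each component is then correlated with PI using Spearman's rank correlation. The sketch below reproduces that general idea in base R on simulated data. It is an assumption about the approach, not the package's code, and toyModel, toyPI, and toyComparison are illustrative names.

```r
set.seed(1)

# Simulated data: 4 hypothetical model genes (rows) x 10 samples (columns)
toyModel <- matrix(rnorm(40, mean = 8), nrow = 4,
                   dimnames = list(paste0("GENE", 1:4), paste0("S", 1:10)))
# Simulated PI value for each of the 10 samples
toyPI <- rnorm(10, mean = 8, sd = 0.1)

# PCA with samples as observations, so transpose to samples x genes
pca <- prcomp(t(toyModel), center = TRUE, scale. = FALSE)

# Spearman correlation between each PC's sample scores and PI
toyComparison <- t(apply(pca$x, 2, function(scores) {
  ct <- cor.test(scores, toyPI, method = "spearman")
  c(SpearmanRho = unname(ct$estimate), SpearmanPvalue = ct$p.value)
}))

# Add the proportion of variance explained by each PC
toyComparison <- data.frame(toyComparison,
                            PCAPropOfVariance = pca$sdev^2 / sum(pca$sdev^2))
print(toyComparison)
```

With only a handful of samples, as in the example dataset, individual correlations are unlikely to reach significance, so a table like this is best read as descriptive.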


  1. Ramaker and Lasseigne, et al. bioRxiv, 2016.

  2. Venet, et al. PLoS Computational Biology, 2011 and Ge, et al. Genomics, 2005.

  3. The TCGA ACC dataset was obtained from the TCGA data portal (tcga-data.nci.nih.gov) in June 2015. Level 3 RNASeqV2 raw count data was variance stabilized with the DESeq2 v1.8.2 ‘varianceStabilizingTransformation’ function.