We developed a novel software package (SMDIC) that enables automated identification of Somatic Mutation-Driven Immune Cell. The operation modes include inference of the relative abundance matrix of tumor-infiltrating immune cells, detection of differential abundance immune cells concerning the gene mutation status, conversion of the abundance matrix of significantly dysregulated cells into two binary matrices (one for up-regulated and one for down-regulated cells), identification of somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples, and visualization of immune cell abundance of samples in different mutation status for each gene. Here we present the flow diagram of SMDIC.
SMDIC has three main functions (flow diagram):
(A) inferring
the relative abundance matrix of tumor-infiltrating immune cells
(B) detecting differential abundance immune cells concerning a
particular gene mutation status and converting the abundance matrix of
significantly dysregulated immune cells into two binary matrices (one
for up-regulated and one for down-regulated cells)
(C) identifying
somatic mutation-driven immune cells by comparing the gene mutation
status with each immune cell in the binary matrices across all samples.
This vignette illustrates how to easily use the SMDIC package. With the use of functions in this package, users could identify the immune cells driven by somatic mutations in the tumor microenvironment.
This package provides the exp2cell
function to use
gene expression profile to quantify the cell abundance matrix.
This package provides the maf2matrix
function to use
mutation annotation file (MAF) format data to build a binary mutations
matrix.
This package provides the mutcorcell
function to
identify the immune cells driven by somatic mutations in the tumor
microenvironment.
This package provides the plotwaterfall
function to
plot the waterfall for mutation genes that drive immune cells.
This package provides the plotCoocMutex
function to
plot the co-occurrence and mutual exclusivity plots for mutation genes
that drive immune cells.
This package provides the heatmapcell
function to
draw clustered heatmaps for the cells driven by a somatic
mutation.
This package provides the survcell
function to draw
Kaplan–Meier survival curves in the above-median and below-median groups
for cell risk score.
We can use the function GetExampleData
to return example
data and environment variables.
The function exp2cell can use gene expression profiles to
quantify cell abundance matrix. ‘exp2cell’ provides three methods for
estimating the relative infiltration abundance of different cell types
in the tumor microenvironment (TME), which including xCell, ssGSEA
estimated method proposed by Şenbabaoğlu et al. and CIBERSORT.Through
these methods, the relative abundance matrix of immune cells is inferred
with cells as the row and samples as the column.
We selected the breast cancer gene expression profile data in the GDC TCGA database for exp2cell and obtained the results of the cell abundance matrix. The commands are as follows.
library(SMDIC)
#get breast cancer gene expression profile.
exp.example<-GetExampleData("exp.example")
# perform the exp2cell method. The method must be one of "xCell","ssGSEA" and "CIBERSORT".
cellmatrix.example<-exp2cell(exp.example,method="ssGSEA")
#> Warning in .filterFeatures_newAPI(dataMatrix, dropConstantRows = FALSE): 659
#> rows with constant values throughout the samples.
#> Warning in .gsva_newAPI(expr = dataMatrix, gset.idx.list = mappedGeneSets, :
#> Some gene sets have size one. Consider setting 'minSize > 1'.
#get the result of the exp2cell function
#view the first six rows and six columns of the cell abundance matrix.
head(cellmatrix.example)
#> TCGA.A7.A13G.01 TCGA.A7.A26E.01 TCGA.A1.A0SQ.01 TCGA.GI.A2C9.11
#> aDC -0.1721896 -0.2365829 -0.09769543 -0.11102476
#> B cells -0.1785598 -0.2060056 -0.21060641 -0.15950503
#> CD8 T cells 0.2603276 0.2939651 0.32180621 0.34343045
#> Cytotoxic cells -0.1710623 -0.1744474 -0.13125948 0.04400304
#> DC -0.2110966 -0.2248506 -0.16642146 0.16974204
#> Eosinophils 0.1456574 0.1297275 0.16402532 0.12760198
#> TCGA.E9.A1NE.01 TCGA.A2.A0SX.01 TCGA.EW.A1PD.01 TCGA.BH.A0HN.01
#> aDC 0.30829474 0.255985640 -0.13083749 -0.06885779
#> B cells -0.06773615 -0.007057109 -0.25506334 -0.25029785
#> CD8 T cells 0.42561320 0.337348145 0.30512749 0.28520165
#> Cytotoxic cells 0.29537554 0.125879324 -0.16228083 -0.15758420
#> DC -0.01435208 0.109405174 -0.08489845 -0.26360357
#> Eosinophils 0.13450722 0.068610081 0.11624230 0.09294411
#> TCGA.E2.A1IU.01 TCGA.BH.A0BO.01
#> aDC -0.008594315 -0.08444126
#> B cells -0.124540116 -0.19944249
#> CD8 T cells 0.317656631 0.32995068
#> Cytotoxic cells -0.048766592 -0.09360501
#> DC -0.158910587 0.04338637
#> Eosinophils 0.125777697 0.16156528
The somatic mutation data used in the package is the mutation annotation file (MAF) format. We extract the non-silent somatic mutations (nonsense mutation, missense mutation, frame-shift indels, splice site, nonstop mutation, translation start site, inframe indels) in protein-coding regions, and built a binary mutations matrix, in which 1 represents any mutation occurs in a particular gene in a particular sample, otherwise the element is 0. The genes with a given mutation frequency greater than a threshold value (one percent as the default value) are retained for the following analysis.
We selected the breast cancer somatic mutation data in the GDC TCGA database for maf2matrix and obtained the results of a binary mutation matrix. The commands are as follows.
# get the path of the mutation annotation file.
maf <- system.file("extdata","example.maf.gz",package = "SMDIC")
# perform the maf2matrix method.
mutmatrix.example<-maf2matrix(maffile = maf,percent = 0.01)
#get the result of the exp2cell function
#view the first six rows and six columns of the binary mutations matrix
head(mutmatrix.example)[1:6,1:6]
#> TCGA.E9.A22G.01 TCGA.OL.A6VO.01 TCGA.AR.A1AP.01 TCGA.BH.A0AW.01
#> TP53 1 1 1 1
#> TP53BP2 0 0 0 0
#> TP53BP1 0 0 0 0
#> PCDH10 0 0 0 0
#> CDH1 0 0 0 0
#> CDH10 0 0 0 0
#> TCGA.E2.A1LG.01 TCGA.A2.A04U.01
#> TP53 1 1
#> TP53BP2 0 0
#> TP53BP1 0 0
#> PCDH10 1 0
#> CDH1 0 0
#> CDH10 0 0
For a particular mutation gene, we compare the binary mutation vector
with each binary cell abundance vector in the up-regulated or
down-regulated immune cell-matrix respectively, and a 2×2 contingency
table is constructed.Fisher’s exact test is applied to recover the cells
that had drastic mutation-correlated up-regulated or down-regulated
response. This process is repeated for each immune cell in the binary
abundance matrices. To correct for multiple comparisons, we adjust the
exact test p-values by using the false discovery rate (FDR) method
proposed by Benjamini and Hochberg. The immune cells with the default
FDR<0.05 are deemed as statistically significant mutation-correlated,
and which may be driven by the somatic mutation. We repeat the above
process for each mutated gene.
The “mutcorcell” function is implemented to identify somatic
mutation-specific immune cell response by inputting the abundance matrix
of immune cells and binary mutations matrix. This function firstly
detects differential abundance of immune cells concerning a particular
gene mutation status with the Significance Analysis of Microarrays (SAM)
method. Function mutcorcell
detects the differential immune
cells concerning a particular gene mutation status and construction of
two binary matrices based on the abundance matrix of significant
differential immune cell (one for up-regulated and one for
down-regulated), identification of somatic mutation-driven immune cells
by comparing the gene mutation status with each immune cell in the
binary matrices.
We selected the abundance matrix of immune cells and binary mutations
matrix for mutcorcell and obtained the results of A list of four
matrices.
* A binary numerical matrix which shows the immune cells
driven by somatic mutant gene;
* Two numerical matrices which show
the pvalue and fdr of the immune cells driven by somatic mutant gene;
* A character matrix which shows the cell responses of the immune
cells driven by a somatic mutant gene.
The commands are as
follows.
# get breast cancer cell abundance matrix, which can be the result of exp2cell function.
cellmatrix<-GetExampleData("cellmatrix")
# get breast cancer binary mutations matrix, which can be the result of maf2matrix function.
mutmatrix<-GetExampleData("mutmatrix")
# perform the function `mutcorcell`.
mutcell<-mutcorcell(cellmatrix= cellmatrix,mutmatrix = mutmatrix,fisher.adjust = TRUE)
#get the result of the `mutcorcell` function
mutcell<-GetExampleData("mutcell")
# the binary numerical matrix which shows the immune cells driven by somatic mutant gene.
mutcell$mut_cell[1:6,1:6]
#the numerical matrix which shows the pvalue of the immune cells driven by a somatic mutant gene
#mutcell$mut_cell_p
#the numerical matrix which show the fdr of the immune cells driven by somatic mutant gene
#mutcell$mut_cell_fdr
#the character matrix which shows the cell responses of the immune cells driven by a somatic mutant gene."up" means up-regulation, "down" means down-regulation, and "0" means no significant adjustment relationship
#mutcell$mut_cell_cellresponses
tip: The binary matrix can not only come from the maf2matrix function but also any binary mutations matrix, in which 1 represents any mutation occurs in a particular gene in a particular sample, otherwise the element is 0.
Function mutcellsummary produces result summaries of the
results of mutcorcell function. The summary has four columns.
The first column is gene names, the second column is the cells driven by
the gene, the third column is the number of cells driven by the gene,
the fourth column is mutation rates of the gene.
We selected the result of mutcorcell function, a binary
mutations matrix and a cell abundance matrix for mutcellsummary
and obtained the summaries of mutcorcell function.
The
commands are as follows.
# perform the function mutcellsummary
summary<-mutcellsummary(mutcell =mutcell,mutmatrix = mutmatrix,cellmatrix = cellmatrix)
# get the result of the mutcellsummary function
head(summary)
#> gene
#> 1 TP53
#> 2 CDH1
#> 3 NBPF14
#> 4 OBSCN
#> 5 NF1
#> 6 KCNA4
#> cells
#> 1 Astrocytes,CD8+ naive T-cells,CD8+ Tem,DC,Keratinocytes,Macrophages,Macrophages M1,Melanocytes,NK cells,pDC,Pericytes,Plasma cells,pro B-cells,Sebocytes,Tgd cells,Th1 cells,Th2 cells,Tregs
#> 2 Adipocytes,CD4+ Tcm,Chondrocytes,CMP,Endothelial cells,Fibroblasts,HSC,ly Endothelial cells,Megakaryocytes,Mesangial cells,mv Endothelial cells,NKT
#> 3 CD8+ Tem,DC,iDC,Keratinocytes,pro B-cells,Sebocytes
#> 4 Epithelial cells,Keratinocytes,pro B-cells,Sebocytes,Th1 cells
#> 5 aDC,B-cells,Memory B-cells,pDC,Plasma cells
#> 6 CD8+ T-cells,CD8+ Tcm,Class-switched memory B-cells,NK cells,pDC
#> count mut rate
#> 1 18 0.34394251
#> 2 12 0.12936345
#> 3 6 0.01334702
#> 4 5 0.02977413
#> 5 5 0.03696099
#> 6 5 0.01129363
Function gene2cellsummary produces result summaries of the
immune cells driven by a somatic mutation. The summaries are a matrix
that shows the short name, full name, pvalue, fdr, cell responses(up or
down) of the cells driven by a somatic mutation.
We selected the
result of mutcorcell function, the method we use in exp2cell
and a gene we are interested in for gene2cellsummary ,and
obtained the result summaries of the immune cells driven by a somatic
mutation.
The commands are as follows.
# perform the function gene2cellsummary
gene2cellsummary(gene="TP53",method="xCell",mutcell = mutcell)
#> gene cell name full name
#> Astrocytes TP53 Astrocytes Astrocytes
#> CD8+ naive T-cells TP53 CD8+ naive T-cells CD8+ naive T-cells
#> CD8+ Tem TP53 CD8+ Tem CD8+ effector memory T-cells
#> DC TP53 DC Dendritic cells
#> Keratinocytes TP53 Keratinocytes Keratinocytes
#> Macrophages TP53 Macrophages Macrophages
#> Macrophages M1 TP53 Macrophages M1 Macrophages M1
#> Melanocytes TP53 Melanocytes Melanocytes
#> NK cells TP53 NK cells NK cells
#> pDC TP53 pDC Plasmacytoid dendritic cells
#> Pericytes TP53 Pericytes Pericytes
#> Plasma cells TP53 Plasma cells Plasma cells
#> pro B-cells TP53 pro B-cells pro B-cells
#> Sebocytes TP53 Sebocytes Sebocytes
#> Tgd cells TP53 Tgd cells Gamma delta T-cells
#> Th1 cells TP53 Th1 cells Type 1 T-helper cells
#> Th2 cells TP53 Th2 cells Type 2 T-helper cells
#> Tregs TP53 Tregs Regulatory T-cells
#> pvalue fdr cell responses
#> Astrocytes 1.010916e-07 1.162554e-06 up
#> CD8+ naive T-cells 3.992545e-03 1.311836e-02 up
#> CD8+ Tem 1.748251e-02 4.467753e-02 up
#> DC 1.604871e-02 4.385580e-02 up
#> Keratinocytes 2.331616e-07 1.981328e-06 up
#> Macrophages 1.820606e-04 1.196398e-03 up
#> Macrophages M1 6.327255e-09 9.701790e-08 up
#> Melanocytes 5.155187e-03 1.580924e-02 up
#> NK cells 5.359277e-04 2.465267e-03 up
#> pDC 2.076020e-03 8.121323e-03 up
#> Pericytes 3.445410e-04 1.981111e-03 up
#> Plasma cells 1.620758e-02 4.385580e-02 up
#> pro B-cells 2.637040e-03 9.331064e-03 up
#> Sebocytes 1.510706e-09 3.474624e-08 up
#> Tgd cells 2.118606e-03 8.121323e-03 up
#> Th1 cells 2.584341e-07 1.981328e-06 up
#> Th2 cells 8.859960e-13 4.075581e-11 up
#> Tregs 4.775752e-04 2.440940e-03 up
We provide a set of visual analysis functions including
heatmapcell
,plotwaterfall
,plotCoocMutex
,and survcell
.
Function heatmapcell
is a function to draw clustered
heatmaps for the cells driven by a somatic mutation.
The commands
are as follows.
# load dependent package.
require(pheatmap)
#> Loading required package: pheatmap
# plot significant up-regulation or down-regulation cells heat map specific for breast cancer
heatmapcell(gene = "TP53",mutcell = mutcell,cellmatrix = cellmatrix,mutmatrix = mutmatrix)
We use the TCGA breast cancer sample data set to display the
waterfall and co-occurrence and mutual exclusivity plots, and users can
also enter the cancer data set they are interested in.
Function
plotwaterfall
and plotCoocMutex
provide
visualization of the mutation genes correlated with immune cells using a
waterfall plot and mutually exclusive or co-occurring plot.
The commands are as follows.
#maf<-"dir"
#tips: dir is the name of the mutation annotation file (MAF) format data. It must be an absolute path or the name relative to the current working directory.
#plot the waterfall for mutation genes which drive immune cells
#plotwaterfall(maffile = maf,mutcell.summary = summary,cellnumcuoff =4)
#plot the co-occurrence and mutual exclusivity plots for mutation genes that drive immune cells.
#plotCoocMutex(maffile = maf,mutcell.summary = summary,cellnumcuoff =4)
#view the result of the plotwaterfall function
knitr::include_graphics("../inst/plotwaterfall.jpeg")
#view the result of the plotCoocMutex function
knitr::include_graphics("../inst/plotCoocMutex.jpeg")
Function survcell
draws Kaplan–Meier survival curves in
the above-median and below-median groups for cell risk score. The cell
risk score is calculated by the weighted mean of cells driven by a gene
mutation, where the weight of cells is estimated by the “Univariate” or
“Multivariate” cox regression.
# get the result of `mutcorcell` function.
mutcell<-GetExampleData("mutcell")
# get the result of `exp2cell` function.
cellmatrix<-GetExampleData("cellmatrix")
#get the survival data, the first column is the sample name, the second column is the survival time, and the third is the survival event.
surv<-GetExampleData("surv")
#draw Kaplan–Meier survival curves
survcell(gene ="TP53",mutcell=mutcell,cellmatrix=cellmatrix,surv=surv,palette = c("#E7B800", "#2E9FDF"))