Overview
The goal of sigminer is to provide a uniform interface for genomic variation signature analysis and visualization. sigminer originated from the VSHunter package I wrote. I was unhappy with the messy structure and function names in VSHunter, so I rebuilt it using concise function names, S4 classes, S3 methods, etc. I will continue to add more features to uncover genomic variation signatures and their correlations with phenotypes and genotypes.
Installation
You can install the stable release of sigminer from CRAN with:
install.packages("sigminer")
You can also install the development version of sigminer from Github with:
remotes::install_github("ShixiangWang/sigminer")
Usage
The following example shows how to extract mutational signatures.
library(sigminer)
#> Registered S3 methods overwritten by 'ggplot2':
#> method from
#> [.quosures rlang
#> c.quosures rlang
#> print.quosures rlang
#> Welcome to 'sigminer' package!
#> ======================================================
#> sigminer version 0.1.11
#> Github page: https://github.com/ShixiangWang/sigminer
#>
#> More info please call 'hello()' in console.
#> ======================================================
#>
Load data as a MAF object
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read_maf(maf = laml.maf)
#> reading maf..
#> silent variants: 475
#> ID N
#> 1: Samples 157
#> 2: 5'Flank 3
#> 3: IGR 5
#> 4: Intron 8
#> 5: RNA 10
#> 6: Silent 449
#> Summarizing..
#> ID summary Mean Median
#> 1: NCBI_Build 37 NA NA
#> 2: Center genome.wustl.edu NA NA
#> 3: Samples 193 NA NA
#> 4: nGenes 1241 NA NA
#> 5: Frame_Shift_Del 52 0.271 0
#> 6: Frame_Shift_Ins 91 0.474 0
#> 7: In_Frame_Del 10 0.052 0
#> 8: In_Frame_Ins 42 0.219 0
#> 9: Missense_Mutation 1342 6.990 7
#> 10: Nonsense_Mutation 103 0.536 0
#> 11: Splice_Site 92 0.479 0
#> 12: total 1732 9.021 9
#> Gene Summary..
#> Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del
#> 1: FLT3 0 0 1
#> 2: DNMT3A 4 0 0
#> 3: NPM1 0 33 0
#> 4: IDH2 0 0 0
#> 5: IDH1 0 0 0
#> ---
#> 1237: ZNF689 0 0 0
#> 1238: ZNF75D 0 0 0
#> 1239: ZNF827 1 0 0
#> 1240: ZNF99 0 0 0
#> 1241: ZPBP 0 0 0
#> In_Frame_Ins Missense_Mutation Nonsense_Mutation Splice_Site total
#> 1: 33 15 0 3 52
#> 2: 0 39 5 6 54
#> 3: 0 1 0 0 34
#> 4: 0 20 0 0 20
#> 5: 0 18 0 0 18
#> ---
#> 1237: 0 1 0 0 1
#> 1238: 0 1 0 0 1
#> 1239: 0 0 0 0 1
#> 1240: 0 1 0 0 1
#> 1241: 0 1 0 0 1
#> MutatedSamples AlteredSamples
#> 1: 52 52
#> 2: 48 48
#> 3: 33 33
#> 4: 20 20
#> 5: 18 18
#> ---
#> 1237: 1 1
#> 1238: 1 1
#> 1239: 1 1
#> 1240: 1 1
#> 1241: 1 1
#> Checking clinical data..
#> NOTE: Missing clinical data! It is strongly recommended to provide clinical data associated with samples if available.
#> Done !
Prepare data for signature analysis
library(BSgenome.Hsapiens.UCSC.hg19, quietly = TRUE)
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:base':
#>
#> expand.grid
#>
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#>
#> strsplit
sig_pre <- sig_prepare(laml, ref_genome = "BSgenome.Hsapiens.UCSC.hg19",
prefix = "chr", add = TRUE)
#> Warning in maftools::trinucleotideMatrix(maf, ref_genome = ref_genome, prefix = prefix, : Chromosome names in MAF must match chromosome names in reference genome.
#> Ignorinig 101 single nucleotide variants from missing chromosomes chr23
#> Extracting 5' and 3' adjacent bases..
#> Extracting +/- 20bp around mutated bases for background C>T estimation..
#> Estimating APOBEC enrichment scores..
#> Performing one-way Fisher's test for APOBEC enrichment..
#> APOBEC related mutations are enriched in 3.315% of samples (APOBEC enrichment score > 2 ; 6 of 181 samples)
#> Creating mutation matrix..
#> matrix of dimension 188x96
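The 96 columns of the mutation matrix above correspond to the standard trinucleotide substitution classes: six pyrimidine-centred substitution types, each combined with the 16 possible pairs of 5' and 3' flanking bases. The base-R sketch below only enumerates those classes to make the 6 x 16 = 96 arithmetic concrete; it is not sigminer's or maftools' internal code, and the `5'[REF>ALT]3'` label format is an assumption borrowed from common COSMIC-style notation.

```r
# Hypothetical illustration: enumerate the 96 trinucleotide substitution classes
# (6 pyrimidine-centred substitution types x 4 possible 5' bases x 4 possible 3' bases).
substitutions <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
bases <- c("A", "C", "G", "T")

five_prime  <- rep(bases, each = 4)   # A A A A C C C C G G G G T T T T
three_prime <- rep(bases, times = 4)  # A C G T A C G T A C G T A C G T

# For each substitution, label all 16 flanking-base combinations as 5'[REF>ALT]3'.
contexts <- unlist(lapply(substitutions, function(s) {
  paste0(five_prime, "[", s, "]", three_prime)
}))

length(contexts)   # 96
head(contexts, 4)  # "A[C>A]A" "A[C>A]C" "A[C>A]G" "A[C>A]T"
```

Each sample row of the matrix then holds the count of its SNVs falling into each of these 96 classes.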
Extract signatures
Before extracting signatures, we can estimate the number of signatures with the sig_estimate function.
library(NMF)
#> Loading required package: pkgmaker
#> Loading required package: registry
#>
#> Attaching package: 'pkgmaker'
#> The following object is masked from 'package:S4Vectors':
#>
#> new2
#> The following object is masked from 'package:base':
#>
#> isFALSE
#> Loading required package: rngtools
#> Loading required package: cluster
#> NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 7/8
#> To enable shared memory capabilities, try: install.extras('
#> NMF
#> ')
#>
#> Attaching package: 'NMF'
#> The following object is masked from 'package:S4Vectors':
#>
#> nrun
The sig_est object keeps all information from the estimation.
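Conceptually, estimating the number of signatures means factorizing the mutation matrix with NMF at several candidate ranks and comparing quality measures across them; the NMF package reports rank-survey measures such as the cophenetic correlation coefficient and the residual sum of squares (RSS). The sketch below is a simplified stand-in for that idea, not sig_estimate's actual procedure: it uses a hypothetical helper, nmf_mu(), implementing Lee-Seung multiplicative updates for the Frobenius norm, runs it on a synthetic 96 x 20 matrix built from three toy signatures, and compares RSS across ranks 2 to 5.

```r
# Hypothetical helper (not part of sigminer or NMF): Lee-Seung multiplicative-update
# NMF minimizing ||V - W H||_F^2. Note that it calls set.seed() for a reproducible start.
nmf_mu <- function(V, rank, n_iter = 500, eps = 1e-9, seed = 1) {
  set.seed(seed)
  W <- matrix(runif(nrow(V) * rank), nrow(V), rank)   # mutation types x signatures
  H <- matrix(runif(rank * ncol(V)), rank, ncol(V))   # signatures x samples
  for (i in seq_len(n_iter)) {
    # Multiplicative updates keep W and H non-negative; eps avoids division by zero.
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H, rss = sum((V - W %*% H)^2))
}

# Synthetic count-like matrix: 96 mutation types x 20 samples from 3 toy signatures.
set.seed(42)
true_W <- matrix(rgamma(96 * 3, shape = 0.5), 96, 3)
true_H <- matrix(rpois(3 * 20, lambda = 50), 3, 20)
V <- true_W %*% true_H

# Reconstruction error at each candidate rank; one simple rank-selection signal.
rss <- sapply(2:5, function(k) nmf_mu(V, k)$rss)
names(rss) <- 2:5
print(rss)
```

RSS alone keeps shrinking as the rank grows, which is why rank surveys usually look for the point where it stops dropping sharply and combine it with stability measures such as the cophenetic coefficient.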
Please note the ‘pConstant’ option: it is a small positive value added to the matrix. Use it ONLY if the function throws a non-conformable arrays error.
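To show what such an offset does, here is a base-R sketch on a toy matrix, assuming (as the description says) that pConstant is simply added to every cell. The matrix, its row and sample names, and the value 1e-4 are illustrative assumptions. All-zero rows or columns are the usual trouble spot for NMF fitting, and the offset removes exact zeros while barely changing the totals.

```r
# Toy mutation count matrix (rows = mutation types, columns = samples);
# the second mutation type is never observed.
m <- matrix(c(5, 0, 3,
              0, 0, 0,
              2, 7, 1), nrow = 3, byrow = TRUE,
            dimnames = list(c("A[C>A]A", "A[C>A]C", "A[C>A]G"), paste0("S", 1:3)))

# Rows that sum to zero are what typically upset NMF fitting.
zero_rows <- which(rowSums(m) == 0)

# Hypothetical pConstant value, added to every cell so no entry is exactly zero.
p_constant <- 1e-4
m_shifted <- m + p_constant

print(zero_rows)
print(min(m_shifted))
```

Because the offset is tiny relative to real mutation counts, it should only be used as a workaround, in line with the advice above.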
Plot mutational signatures.
Plot cosine similarities against validated signatures.
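Cosine similarity compares two signature profiles by the angle between them, dot(a, b) / (|a| |b|), so 1 means identical shape and 0 means no shared mutation types. The base-R sketch below computes all pairwise similarities between the columns of two signature matrices; the helper name cosine_matrix() and the tiny three-row matrices are hypothetical, and a real comparison would use full 96-row profiles of validated (e.g. COSMIC) signatures, which are not included here.

```r
# Hypothetical helper: cosine similarity between every column of `a`
# (extracted signatures) and every column of `b` (reference signatures).
cosine_matrix <- function(a, b) {
  num <- crossprod(a, b)                             # t(a) %*% b: pairwise dot products
  den <- sqrt(colSums(a^2)) %o% sqrt(colSums(b^2))   # products of column norms
  num / den
}

# Toy profiles over three mutation types.
extracted <- cbind(Sig1 = c(1, 0, 0), Sig2 = c(1, 1, 0))
reference <- cbind(Ref1 = c(1, 0, 0), Ref2 = c(0, 0, 1))

sim <- cosine_matrix(extracted, reference)
print(round(sim, 3))
```

Here Sig1 matches Ref1 exactly (similarity 1), Sig2 is only partly similar to Ref1 (about 0.707), and neither resembles Ref2 (0).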
Citation
If you use sigminer in academic work, please cite:
Wang, Shixiang, et al. “APOBEC3B and APOBEC mutational signature as
potential predictive markers for immunotherapy response in non-small
cell lung cancer.” Oncogene (2018).
and
Gaujoux, Renaud, and Cathal Seoighe. "A Flexible R Package for
Nonnegative Matrix Factorization." BMC Bioinformatics 11, no. 1 (December 2010).
Acknowledgments
The code for extracting copy number signatures was based in part on the source code from the paper "Copy number signatures and mutational processes in ovarian carcinoma". If you use this feature, please also cite:
Macintyre, Geoff, et al. "Copy number signatures and mutational
processes in ovarian carcinoma." Nature Genetics 50.9 (2018): 1262.
The code for extracting mutational signatures was based in part on the source code of the maftools package. If you use this feature, please also cite:
Mayakonda, Anand, et al. "Maftools: efficient and comprehensive analysis
of somatic variants in cancer." Genome Research 28.11 (2018): 1747-1756.
LICENSE
MIT © 2019 Shixiang Wang, Geoffrey Macintyre, Xue-Song Liu