xQTLbiolinks has provided a pipeline for disease target gene discovery through colocalization analysis with xQTLs to detect susceptible genes and causal variants.
To perform colocalization analysis using xQTLbiolinks, users need to provide their own GWAS summary statistics data, specify the tissue of interest and import following packages:
library(data.table)
library(xQTLbiolinks)
library(stringr)
library(coloc)
Download and load an example file of summary statistics dataset
(GRCh38). We perform colocalization analysis in
Brain - Cerebellum
.
<- fread("https://raw.githubusercontent.com/dingruofan/exampleData/master/gwas/AD/gwasDFsub.txt")
gwasDF ="Brain - Cerebellum" tissueSiteDetail
In this example, a data.table object of 16538 (rows) x 5 (cols) is loaded. Five columns are required (arbitrary column names is supported, but columns must be as the following order):
Col 1
. “variants” (character), , using an rsID
(e.g. “rs11966562”);
Col 2
. “chromosome” (character), one of the chromosome
from chr1-chr22;
Col 3
. “position” (integer), genome position of snp;
Col 4
. “P-value” (numeric);
Col 5
. “MAF” (numeric). Allel frequency;
Col 6
. “beta” (numeric). effect size.
Col 7
. “se” (numeric). standard error.
head(gwasDF)
Sentinel SNPs can be detected using
xQTLanalyze_getSentinelSnp
with the arguments
p-value < 5e-8
and
SNP-to-SNP distance > 10e6 bp
. We recommend converting
the genome version of the GWAS file to GRCh38 if that is GRCh37 (run
with parameter: genomeVersion="grch37"
;
grch37To38=TRUE
, and package rtracklayer
is
required).
<- xQTLanalyze_getSentinelSnp(gwasDF, pValueThreshold = 5e-08) sentinelSnpDF
After filtering, a sentinel SNP with P-value<5e-8 is detected in this example:
sentinelSnpDF
Trait genes are genes that located in the range of 1Mb (default, can
be changed with parameter detectRange
) of sentinel SNPs.
Every gene within 1Mb of sentinel SNPs is searched by fuction
xQTLanalyze_getTraits
. Besides, In order to reduce the
number of trait genes and thus reduce the running time, we take the
overlap of eGenes and trait genes as the final output of the function
xQTLanalyze_getTraits
.
<- xQTLanalyze_getTraits(sentinelSnpDF, detectRange=1e6, tissueSiteDetail=tissueSiteDetail) traitsAll
Totally, 3 associations between 3 traits genes and 1 SNPs are detected
traitsAll
For a single trait gene, like CLNK in above
table, colocalization analysis (using coloc
method) can be
performed with:
<- xQTLanalyze_coloc(gwasDF, "CLNK", tissueSiteDetail=tissueSiteDetail) # using gene symbol output
output is a list, including three parts: coloc_Out_summary, gwasEqtlInfo, and gwasEqtlInfo.
$coloc_Out_summary output
For multiple trait genes, a for loop or lapply
function can be used to get all genes’ outputs (using both
coloc
and hyprcoloc
methods).
<- rbindlist(lapply( unique(traitsAll$gencodeId), function(x){ # using gencode ID.
outputs xQTLanalyze_coloc(gwasDF, x, tissueSiteDetail=tissueSiteDetail, method = "Both")$colocOut }))
outputs is a data.table that combined all results of coloc_Out_summary of all genes.
outputs
For the potential casual gene ENSG00000109684.14 (PP4=0.9937 & hypr_posterior=0.9793), we can get its significant associations across tissues:
xQTLvisual_eqtl("ENSG00000109684.14")
For the purpose of visualization of p-value distribution and comparison of the signals of GWAS and eQTL: We merge the variants of GWAS and eQTL by rsid first.
<- xQTLdownload_eqtlAllAsso(gene="ENSG00000109684.14",
eqtlAsso tissueLabel = tissueSiteDetail)
<- merge(gwasDF, eqtlAsso[,.(rsid=snpId, position=pos, maf, pValue)],
gwasEqtldata by=c("rsid", "position"), suffixes = c(".gwas",".eqtl"))
Function xQTLvisual_locusCompare
displays the candidate
SNP rs13120565
(SNP.PP.H4=0.5328849 &
hypr_posterior_explainedBySnp=0.4747) in the top right
corner:
xQTLvisual_locusCompare(gwasEqtldata[,.(rsid, pValue.eqtl)],
legend_position = "bottomright") gwasEqtldata[,.(rsid, pValue.gwas)],
Note: The information on SNP linkage disequilibrium is automatically downloaded online. Due to network issues, the download may fail at times. If this happens, please try running it again.
Locuszoom plot of GWAS signals:
xQTLvisual_locusZoom(gwasEqtldata[,.(rsid, chrom, position, pValue.gwas)], legend=FALSE)
Locuszoom plot of eQTL signals:
xQTLvisual_locusZoom(gwasEqtldata[,.(rsid, chrom, position, pValue.eqtl)],
highlightSnp = "rs13120565", legend=FALSE)
Violin plot of normalized exprssion of eQTL (rs13120565-ENSG00000187323.11):
xQTLvisual_eqtlExp("rs13120565", "ENSG00000109684.14", tissueSiteDetail = tissueSiteDetail)
We can also combine locuscompare and locuszoom plot using
xQTLvisual_locusCombine
:
xQTLvisual_locusCombine(gwasEqtldata[,c("rsid","chrom", "position", "pValue.gwas", "pValue.eqtl")],
highlightSnp="rs13120565")
Colocalized loci should show a general pattern where SNPs in high LD
will show strong associations with expression levels of the colocalized
gene, and the eQTL associations will weaken for SNPs in lower LD. This
pattern of the eQTL varies among different tissues / cell-types, the
strength of which indicates the power of the regulatory effect of the
variant. We can visualize the correlation between p-values of eQTLs and
LD across numerous tissues/cell types to easily distinguish this patten
using xQTLvisual_coloc
:
<- xQTLvisual_coloc(gene="ENSG00000109684.14", variantName="rs13120565",
multi_tissue_coloc tissueLabels = c("Brain - Cerebellar Hemisphere",
"Brain - Cerebellum", "Thyroid", "Lung",
"Cells - EBV-transformed lymphocytes"))
For applying xQTLbiolinks
to a whole case study, please
find this Document.