If you use mvMAPIT in published research, please cite:
Crawford L, Zeng P, Mukherjee S, & Zhou X (2017). Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLoS genetics, 13(7), e1006869. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006869
Stamp J, DenAdel A, Weinreich D, Crawford, L (2022). Leveraging the Genetic Correlation between Traits Improves the Detection of Epistasis in Genome-wide Association Studies. bioRxiv 2022.11.30.518547; doi: https://doi.org/10.1101/2022.11.30.518547
Stamp J, Crawford L (2022). mvMAPIT: Multivariate Genome Wide Marginal Epistasis Test. https://github.com/lcrawlab/mvMAPIT, https://lcrawlab.github.io/mvMAPIT/
Load necessary libraries. For the sake of getting started,
mvMAPIT
comes with a small set of simulated data. This data
contains random genotype-like data and two simulated quantitative traits
with epistatic interactions. To make use of this data, call the genotype
data as simulated_data$genotype
, and the simulated trait
data as simulated_data$trait
. The vignette traces the
analysis of simulated data. The simulations are described in
vignette("simulations")
.
For a working installation of mvMAPIT please look at
theREADME.md
or vignette("docker-mvmapit")
library(mvMAPIT)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(tidyr)
library(kableExtra)
library(RcppAlgos)
The R routine mvmapit
can be run in multiple modes. By
default it runs in a hybrid mode, performing tests both wtih a normal
Z-test as well as the Davies method. The resulting p
-values
can be combined using functions provided by mvMAPIT
,
e.g. fishers_combined()
, that work on the
pvalues
tibble that mvmapit
returns.
NOTE: mvMAPIT takes the X matrix as \(p \times n\); not as \(n \times p\).
<- mvmapit(
mvmapit_hybrid t(data$genotype),
t(data$trait),
test = "hybrid"
)<- fishers_combined(mvmapit_hybrid$pvalues) fisher
To visualize the genome wide p
-values, we use a
Manhattan plot. The p
-values are plotted after combining
the results from the multivariate analysis using Fisher’s method.
<- ggplot(fisher, aes(x = 1:nrow(fisher), y = -log10(p))) +
manhplot geom_hline(yintercept = -log10(thresh), color = "grey40", linetype = "dashed") +
geom_point(alpha = 0.75, color = "grey50") +
geom_text_repel(aes(label=ifelse(p < thresh, as.character(id), '')))
plot(manhplot)
To control the type I error despite multiple testing, we recommend
the conservative Bonferroni correction. The significant SNPs returned by
the mvMAPIT
analysis are shown in the output below. There
are in total 6 significant SNPs after multiple test correction. Of the
significant SNPs, 4 are true positives.
<- 0.05 / (nrow(fisher) / 2)
thresh <- fisher %>%
significant_snps filter(p < thresh) # Call only marginally significant SNPs
<- simulated_data$epistatic %>%
truth ungroup() %>%
mutate(discovered = (name %in% significant_snps$id)) %>%
select(name, discovered) %>%
unique()
%>%
significant_snps mutate_if(is.numeric, ~(as.character(signif(., 3)))) %>%
mutate(true_pos = id %in% truth$name) %>%
kable(., linesep = "") %>%
kable_material(c("striped"))
id | trait | p | true_pos |
---|---|---|---|
snp_00068 | fisher | 2.65e-10 | TRUE |
snp_00343 | fisher | 5.27e-05 | FALSE |
snp_00460 | fisher | 7.86e-07 | FALSE |
snp_00469 | fisher | 1.47e-09 | TRUE |
snp_00665 | fisher | 1.17e-09 | TRUE |
snp_00917 | fisher | 2.83e-07 | TRUE |
To compare this list to the full list of causal epistatic SNPs of the simulations, refer to the following list. There are 5 causal SNPs. Of these 5 causal SNPs, 4 were succesfully discovered by mvMAPIT.
%>%
truth kable(., linesep = "") %>%
kable_material(c("striped"))
name | discovered |
---|---|
snp_00068 | TRUE |
snp_00156 | FALSE |
snp_00469 | TRUE |
snp_00665 | TRUE |
snp_00917 | TRUE |
Now we may take only the significant SNPs according to their marginal epistatic effects and run a simple exhaustive search between them.
The search itself is a simple regression on the interaction terms between all significant interactions.
# exhaustive search for p-values
<- NULL
pairs if (nrow(significant_snps) > 1) {
<- comboGeneral(significant_snps$id, 2)
pairnames # Generate unique pairs of SNP names;
# for length(names) = n, the result is a (n * (n-1)) x 2 matrix with one row corresponding to a pair
for (k in seq_len(nrow(pairnames))) {
<- lm(y ~ X[, pairnames[k, 1]]:X[, pairnames[k, 2]])
fit <- coefficients(summary(fit))[[1]][2, 4]
p_value1 <- coefficients(summary(fit))[[2]][2, 4]
p_value2 <- dplyr::tibble(
tib x = p_value1,
y = p_value2,
u = pairnames[k, 1],
v = pairnames[k, 2]
)<- bind_rows(pairs, tib)
pairs
}
}
colnames(pairs) <- c(colnames(y), "var1", "var2")
We plot the \(-\mathrm{log}_{10}(p)\) of the
p
-values for the regression coefficients as tile plot to
highlight the identified interaction structure.
<- pairs %>%
plotable pivot_longer(
cols = starts_with("p_"),
names_to = "trait",
names_prefix = "trait ",
values_to = "p",
values_drop_na = TRUE
%>%
) mutate(trait = case_when(
== "p_01" ~ "Trait 1",
trait == "p_02" ~ "Trait 2"))
trait <- ggplot(data = plotable, aes(x=var1, y=var2, fill=-log10(p)))+
tiles geom_tile() +
facet_wrap(~trait) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
scale_fill_viridis_c()
plot(tiles)
The only significant interactions after multiple testing correction
are the interaction between snp00068
and
snp_00665
as well as snp_00465
and
snp_00917
.
%>%
pairs filter(p_01 < 0.05/15 | p_02 < 0.05/15) %>%
kable(., linesep = "", digits = 14) %>%
kable_material(c("striped"))
p_01 | p_02 | var1 | var2 |
---|---|---|---|
1.000000e-14 | 1.645102e-07 | snp_00068 | snp_00665 |
2.640252e-07 | 1.559000e-11 | snp_00469 | snp_00917 |
Compare the results of the exhaustive search to the true interaction structure. Notice that the only significant interactions in the exhaustive search are the two with the largest true effects.
<- simulated_data$interactions %>%
true_interactions mutate(var1 = sprintf(group1, fmt = "snp_%05d")) %>%
mutate(var2 = sprintf(group2, fmt = "snp_%05d")) %>%
mutate(trait = case_when(
== 1 ~ "Trait 1",
trait == 2 ~ "Trait 2")) %>%
trait select(-c("group1", "group2"))
<- true_interactions[, c("var1", "var2")]
X <- t(apply(X, 1, sort))
X c("var1", "var2")] <- X
true_interactions[,
<- comboGeneral(simulated_data$epistatic$name %>% unique(), 2)
epistatic_pairnames <- NULL
true_pairs for (k in seq_len(nrow(epistatic_pairnames))) {
<- dplyr::tibble(var1 = epistatic_pairnames[k, 1],
tib var2 = epistatic_pairnames[k, 2])
<- bind_rows(true_pairs, tib)
true_pairs
}<- anti_join(true_pairs, true_interactions) %>%
anti mutate(effect_size = 0)
# Joining, by = c("var1", "var2")
<- true_interactions %>%
true_int_plot bind_rows(anti %>% mutate(trait = "Trait 1")) %>%
bind_rows(anti %>% mutate(trait = "Trait 2"))
<- ggplot(data = true_int_plot, aes(x=var1, y=var2, fill=effect_size)) +
true_tiles geom_tile() +
facet_wrap(~trait) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
scale_fill_gradient2(low = "blue", high = "red", mid = "white")
plot(true_tiles)
While mvMAPIT does not identify the explicit partner, it still implicates more correct SNPs in this example. All true epistatic SNPs are listed in the following table.
%>%
true_interactions kable(., linesep = "") %>%
kable_material(c("striped"))
effect_size | trait | var1 | var2 |
---|---|---|---|
-0.1446703 | Trait 1 | snp_00156 | snp_00469 |
-0.1463573 | Trait 1 | snp_00068 | snp_00156 |
0.0876213 | Trait 1 | snp_00469 | snp_00665 |
-0.4940839 | Trait 1 | snp_00068 | snp_00665 |
0.4244533 | Trait 1 | snp_00469 | snp_00917 |
0.1504647 | Trait 1 | snp_00068 | snp_00917 |
-0.4336821 | Trait 2 | snp_00156 | snp_00469 |
-0.3663968 | Trait 2 | snp_00068 | snp_00156 |
0.2189245 | Trait 2 | snp_00469 | snp_00665 |
-0.7666908 | Trait 2 | snp_00068 | snp_00665 |
0.9583180 | Trait 2 | snp_00469 | snp_00917 |
-0.2884704 | Trait 2 | snp_00068 | snp_00917 |