This document is part of the “DrBats” project whose goal is to implement exploratory statistical analysis on large sets of data with uncertainty. The idea is to visualize the results of the analysis in a way that explicitely illustrates the uncertainty in the data.
The “DrBats” project applies a Bayesian Latent Factor Model.
This project involves the following persons, listed in alphabetical order :
Bénédicte Fontez (aut)
Nadine Hilgert (aut)
Susan Holmes (aut)
Gabrielle Weinrott (cre, aut)
First, let’s load the package.
require(DrBats)
mycol<-c("#ee204d", "#1f75fe", "#1cac78", "#ff7538", "#b4674d", "#926eae",
"#fce883", "#000000", "#78dbe2", "#6e5160", "#ff43a4")
To fit the model, you need to specify a bunch of parameters: the type of prior (“PLT” or “cauchyPLT” for now), the program you want to use (only “stan” is implemented for the time being), a dataset projected onto a histogram basis, the number of latent factors (“D”), and then some other habitual MCMC things (iterations, thinning, burnin).
Then you can execute the modelFit()
function with the chosen parameters. STAN calculations can be long, so the following example is intentionally very small, with few iterations and few chains on the toy dataset.
If you want to estimate many parameteres and require a large number of iterations, you may want to consider running the following commands on a server or a desktop machine.
data("toydata")
proj.data <- toydata$Y.simul$Y
stanfit <- modelFit(model = "igPLT", prog = "stan", parallel = T, Xhisto = proj.data,
nchains = 2, nthin = 10, niter = 1000, D = toydata$wlu$D)
First we must convert the stanfit object to an mcmc.list. In this step we have loaded a stanfit object that is available in the package (with more iterations and chains than in the example above). We can do this using the function coda.obj()
.
data("stanfit")
coda.plt <- coda.obj(stanfit)
Now we must rotate the posterior estimations to obtain a coherent rotation for all simulations. We choose to do this with respect to the rotation obtained in Principal Component Analysis.
The function to obtain the proper rotation of all parameters is coda.plt.clean()
, and takes as arguments the dimension of the original dataset (\(N \times P\)), the number of latent factors (\(D\)), the coda object, a rotation matrix, the “real” latent factors (obtained by a transformation of the PCA output using the function wlu()
), and the “real” factor loadings (the scores obtainted with PCA).
N = nrow(proj.data)
P = ncol(proj.data)
D = toydata$wlu$D
# PCA in the histogram basis
obs <- toydata$X
times <- toydata$t
pca.data <- pca.Deville(obs, times, t.range = c(min(times), max(times)), breaks = 8)
# Post-processing landmark information
rotation <- toydata$wlu$Q # rotation matrix
real.W <- toydata$wlu$W # PCA-determined latent factors
real.B <- t(pca.data$Cp[ , 1:(toydata$wlu$D)]) # PCA-determined scores
coda.plt.clean <- clean.mcmc(N, P, D, coda.plt, rotation, real.W, real.B)
We can plot the posterior density of the likelihood:
post <- postdens(coda.plt.clean, Y = toydata$Y.simul$Y, D = toydata$wlu$D, chain = 1)
hist(post, main = "Histogram of the posterior density", xlab = "Density")
We can also plot the individual projections on the plane spanned by the estimated latent factors, with \(90 \%\) uncertainty regions.
beta.res <- visbeta(coda.plt.clean, toydata$Y.simul$Y, toydata$wlu$D, chain = 1, axes = c(1, 2), quant = c(0.05, 0.95))
ggplot2::ggplot() +
ggplot2::geom_path(data = beta.res$contour.df, ggplot2::aes(x = x, y = y, colour = ind)) +
ggplot2::geom_point(data = beta.res$mean.df, ggplot2::aes(x = x, y = y, colour = ind)) +
ggplot2::ggtitle("Convex hull of HMC estimates of the scores")