This file is meant to present the function runMCMC_btadjust()
in the runMCMCbtadjust
package. The aim of this function is to run a Markov Chain Monte Carlo (MCMC) for a specified Bayesian model while adapting automatically the burn-in and thinning parameters to meet pre-specified targets in terms of MCMC convergence and number of effective values of MCMC outputs - where the term “number of effective values” for the MCMC outputs refers to sample size adjusted for autocorrelation. This is done in only one call to the function that repeatedly calls the MCMC until criteria for convergence and number of effective values are met. This allows to obtain a MCMC output that is out of the transient phase of the MCMC (convergence) and that contains a pre-specified number of nearly independent draws from the posterior distribution (number of effective values).
This function has four main advantages: (i) it saves the analyst’s programming time since he/she does not have to repeatedly diagnose and re-run MCMCs until desired levels of convergence and number of effective values are reached; (ii) it allows a minimal, normalized quality control of MCMC outputs by allowing to meet pre-specified levels in terms of convergence and number of quasi-independent values; (iii) it may save computer’s time when compared to cases where we have to restart the MCMC from the beginning if it has not converged or reached the specified number of effective values (as e.g. with runMCMC
function in NIMBLE
); and (iv) it can be applied with different MCMC R languages.
Indeed, runMCMC_btadjust()
uses other Bayesian packages to fit the MCMC. At present, only the JAGS
, NIMBLE
and greta
languages can be used as these are the main Bayesian languages in R known by the package author and that permit to continue an already fitted MCMC - which is required for numerical efficiency. We will here show how to fit and compare a very simple model under these three languages, using the possibilities allowed by runMCMC_btadjust()
. Our model is one of the simplest statistical model we could think of: inspired from @Kery_2010, we model data of weights of 1,000 Pilgrim falcons (Falco peregrinus) simulated from a Gaussian distribution with mean 600 grams and standard error 30 grams:
set.seed(1)
y1000<-rnorm(n=1000,mean=600,sd=30)
This document is made to provide simple examples with the three languages - JAGS
, NIMBLE
and greta
. Yet only the languages among these three that are available/installed on the computer compiling this document will be developed. And if NIMBLE
is not available, no example will be developed given that Nimble is the reference example herein.
We start with fitting the example with NIMBLE
(cf. https://r-nimble.org/).
library(runMCMCbtadjust)
library(nimble)
#> Warning: le package 'nimble' a été compilé avec la version R 4.3.2
As NIMBLE
distinguishes data that have random distributions from other data, we specify two distinct lists to contain these:
ModelData <-list(mass = y1000)
ModelConsts <- list(nobs = length(y1000))
We then write our Bayesian code within R with the nimbleCode
function in the nimble
package:
ModelCode<-nimbleCode(
{
# Priors
population.mean ~ dunif(0,5000)
population.sd ~ dunif(0,100)
# Normal distribution parameterized by precision = 1/variance in Nimble
population.variance <- population.sd * population.sd
precision <- 1 / population.variance
# Likelihood
for(i in 1:nobs){
mass[i] ~ dnorm(population.mean, precision)
}
})
The model is a simple linear - and Gaussian - model with only an intercept - actually the same model -for the likelihood section - as the probabilistic model used to generate the data.
Our - optional - next step is to specify starting values for model’s parameters. This is done by first writing a function that is repetitively called for each chain. We - also optionally - indicate the names of parameters to be saved and diagnosed in a vector called params
:
ModelInits <- function()
{list (population.mean = rnorm(1,600,90), population.sd = runif(1, 1, 30))}
Nchains <- 3
set.seed(1)
Inits<-lapply(1:Nchains,function(x){ModelInits()})
#specifying the names of parameters to analyse and save:
params <- c("population.mean", "population.sd")
#devising the maximum level allowed for the Geweke diagnostic of convergence (cf. following)
npars<-length(params)
Gew.Max<-as.double(format(quantile(sapply(1:100000,function(x,N){max(abs(rnorm(N)))},npars),0.95),digits=3,scientific=FALSE))
We are now ready to launch runMCMC_btadjust()
: since we use NIMBLE
, we must specify arguments code
, data
, constants
(see below), which are specific to NIMBLE
, as well as MCMC_language="Nimble"
. The next arguments of runMCMC_btadjust()
that we will here work with are for most of them shared among MCMC_language
s. We first do it on one chain (argument Nchains=1
) using in the control
list argument neff.method="Coda"
to use the Coda
method to calculate the number of effective parameters and convtype="Geweke"
to use the Geweke method to diagnose convergence, with the pre-specified maximum - over analyzed parameters - convergence of 2.23 (conv.max=
2.23) - coming from simulated 95% quantiles from standard gaussian distributions that Geweke diagnostics should theoretically follow - and the minimum - over analyzed parameters - number of effective values of 1,000 (neff.min=1000
). Other arguments that are the same for all MCMC languages include params
(parameter names to diagnose and save), inits
(initial values - which are here provided through the list of values Inits[1]
but could also have been specified through a function giving such a result - such as here ModInits
), niter.min
(minimum number of iterations), niter.max
(maximum number of iterations), nburnin.min
, nburnin.max
thin.min
, thin.max
(minimum and maximum number of iterations for respectively the burn-in and thinning parameters):
out.mcmc.Coda.Geweke<-runMCMC_btadjust(code=ModelCode, constants = ModelConsts, data = ModelData, MCMC_language="Nimble",
Nchains=1, params=params, inits=Inits[1],
niter.min=1000, niter.max=300000,
nburnin.min=100, nburnin.max=200000,
thin.min=1, thin.max=1000,
conv.max=Gew.Max, neff.min=1000,
control=list(neff.method="Coda", convtype="Geweke"))
#> [1] "control$seed is NULL. Replaced by 1"
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#> - population.mean
#> - population.sd
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "Raw multiplier of thin: 14.897"
#> [1] "###################################################################################"
#> [1] "Testing multiplier of thin: 15 :"
#> [1] "Testing multiplier of thin: 14 :"
#> [1] "Testing multiplier of thin: 13 :"
#> [1] "Testing multiplier of thin: 12 :"
#> [1] "Testing multiplier of thin: 11 :"
#> [1] "Testing multiplier of thin: 10 :"
#> [1] "Testing multiplier of thin: 9 :"
#> [1] "Testing multiplier of thin: 8 :"
#> [1] "Testing multiplier of thin: 7 :"
#> [1] "Testing multiplier of thin: 6 :"
#> [1] "Retained multiplier of thin: 6 :"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of convergence."
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of effective values."
#> [1] "###################################################################################"
The above information printed during running runMCMC_btadjust()
is somewhat elusive. It is possible to get more information printed with components identifier.to.print
, print.diagnostics
, print.thinmult
and innerprint
of the argument control
of runMCMC_btadjust()
or the component showCompilerOutput
of control.MCMC
- this last one being active only with MCMC_language=="Nimble"
. See the help file of runMCMC_btadjust()
for more information. By default, only print.thinmult
is TRUE
and therefore activated. We hereafter just show the activation of component print.diagnostics
to show the reader that it can produce useful information to better realize what is being done in terms of control of convergence and number of effective values.
out.mcmc.Coda.Geweke.with.print.diagnostics<-runMCMC_btadjust(code=ModelCode, constants = ModelConsts, data = ModelData, MCMC_language="Nimble",
Nchains=1, params=params, inits=Inits[1],
niter.min=1000, niter.max=300000,
nburnin.min=100, nburnin.max=200000,
thin.min=1, thin.max=1000,
conv.max=Gew.Max, neff.min=1000,
control=list(neff.method="Coda", convtype="Geweke",print.diagnostics=TRUE))
#> [1] "control$seed is NULL. Replaced by 1"
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#> - population.mean
#> - population.sd
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "###################################################################################"
#> [1] "Current state of diagnostics:"
#> Nchains thin niter.tot Nvalues nu.burn
#> MCMC parameters 1 1 1000 900 101
#> [1] "###################################################################################"
#> max median mean name_max prop_ab_p975 prop_ab_p995 prop_ab_p9995
#> Geweke 1.388 1.15 1.15 population.sd 0 0 0
#> [1] "###################################################################################"
#> min median mean name_min prop_bel_1000
#> Neff 53.312 95.868 95.868 population.mean 1
#> prop_bel_5000 prop_bel_10000
#> Neff 1 1
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "###################################################################################"
#> [1] "Current state of diagnostics:"
#> Nchains thin niter.tot Nvalues nu.burn
#> MCMC parameters 1 1 2000 1900 101
#> [1] "###################################################################################"
#> max median mean name_max prop_ab_p975 prop_ab_p995 prop_ab_p9995
#> Geweke 1.537 1.22 1.22 population.sd 0 0 0
#> [1] "###################################################################################"
#> min median mean name_min prop_bel_1000
#> Neff 127.546 241.16 241.16 population.mean 1
#> prop_bel_5000 prop_bel_10000
#> Neff 1 1
#> [1] "###################################################################################"
#> [1] "Raw multiplier of thin: 14.897"
#> [1] "###################################################################################"
#> [1] "Testing multiplier of thin: 15 :"
#> [1] "Testing multiplier of thin: 14 :"
#> [1] "Testing multiplier of thin: 13 :"
#> [1] "Testing multiplier of thin: 12 :"
#> [1] "Testing multiplier of thin: 11 :"
#> [1] "Testing multiplier of thin: 10 :"
#> [1] "Testing multiplier of thin: 9 :"
#> [1] "Testing multiplier of thin: 8 :"
#> [1] "Testing multiplier of thin: 7 :"
#> [1] "Testing multiplier of thin: 6 :"
#> [1] "###################################################################################"
#> [1] "Current state of diagnostics:"
#> Nchains thin niter.tot Nvalues nu.burn
#> MCMC parameters 1 6 2000 317 101
#> [1] "###################################################################################"
#> max median mean name_max prop_ab_p975 prop_ab_p995 prop_ab_p9995
#> Geweke 0.912 0.891 0.891 population.sd 0 0 0
#> [1] "###################################################################################"
#> min median mean name_min prop_bel_1000
#> Neff 103.353 158.328 158.328 population.mean 1
#> prop_bel_5000 prop_bel_10000
#> Neff 1 1
#> [1] "###################################################################################"
#> [1] "Retained multiplier of thin: 6 :"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "###################################################################################"
#> [1] "Current state of diagnostics:"
#> Nchains thin niter.tot Nvalues nu.burn
#> MCMC parameters 1 6 18455 3059 101
#> [1] "###################################################################################"
#> max median mean name_max prop_ab_p975 prop_ab_p995
#> Geweke 0.381 0.24 0.24 population.mean 0 0
#> prop_ab_p9995
#> Geweke 0
#> [1] "###################################################################################"
#> min median mean name_min prop_bel_1000
#> Neff 1848.939 2132.664 2132.664 population.mean 0
#> prop_bel_5000 prop_bel_10000
#> Neff 1 1
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of convergence."
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of effective values."
#> [1] "###################################################################################"
We advise the reader to use the print.diagnostics
functionality but do not in the following to keep the length of the document to a minimum.
Before turning to other models, let us depict the nature of the result of runMCMC_btadjust()
function. The length of the output 1 is always equal to the number of Markov Chains - argument Nchains
. Its class is mcmc.list - a type of object classical for MCMC outputs and defined in the coda
package. Each component of this list contains the successive retained values for each saved parameter as well as attributes that give information on the MCMC they come from - see the beginning of the first component below:
length(out.mcmc.Coda.Geweke)
#> [1] 1
class(out.mcmc.Coda.Geweke)
#> [1] "mcmc.list"
head(out.mcmc.Coda.Geweke[[1]])
#> Markov Chain Monte Carlo (MCMC) output:
#> Start = 100
#> End = 136
#> Thinning interval = 6
#> population.mean population.sd
#> [1,] 584.7840 34.71150
#> [2,] 587.4004 34.86733
#> [3,] 589.9136 33.16044
#> [4,] 593.8190 31.64323
#> [5,] 597.1063 31.45800
#> [6,] 597.4063 31.51227
#> [7,] 600.5426 30.18588
The output however has extra information in its attributes
: indeed, attributes
has 5 components, whose name are: call.params, final.params, final.diags, sessionInfo, class: in addition to containing the class of the object - here “mcmc.list”, these attributes include information on the R session in which the function was executed - component sessionInfo, the final diagnostics of the model - component final.diags -, the parameters used in the call of the runMCMC_btadjust()
function - component call.params - and finally the final “parameters” of the function - component final.params. In case MCMC_language
is not “Greta”, the call.params component contains either the entire data and /or the constants or a summary of these (to keep the output to a controlled object size) - this choice is controlled by the component save.data
of parameter control
. The component final.params has a series of heterogeneous parameters including whether the model has converged, has reached its targets in terms of numbers of effective values…, as well as information in terms of duration of the different sections of the analysis - which we will use in the sequence of this document. See the help file for more information as well as the below printing. Finally, the sessionInfo component has many interesting info relative to the context in which the function runMCMC_btadjust()
was executed (including platform, version of R, versions of packages…).
names(attributes(out.mcmc.Coda.Geweke))
#> [1] "call.params" "final.params" "final.diags" "sessionInfo" "class"
names(attributes(out.mcmc.Coda.Geweke)$package.versions)
#> NULL
attributes(out.mcmc.Coda.Geweke)$final.params
#> $converged
#> [1] TRUE
#>
#> $neffs.reached
#> [1] TRUE
#>
#> $final.Nchains
#> [1] 1
#>
#> $burnin
#> [1] 100
#>
#> $thin
#> [1] 6
#>
#> $niter.tot
#> [1] 18455
#>
#> $WAIC
#> NULL
#>
#> $extraResults
#> NULL
#>
#> $Temps
#> $Temps$chain1
#> NULL
#>
#>
#> $duration
#> Time difference of 21.81957 secs
#>
#> $duration.MCMC.preparation
#> Time difference of 15.97213 secs
#>
#> $duration.MCMC.transient
#> Time difference of 0.008876192 secs
#>
#> $duration.MCMC.asymptotic
#> Time difference of 1.628959 secs
#>
#> $duration.MCMC.after
#> Time difference of 6.41346e-05 secs
#>
#> $duration.btadjust
#> Time difference of 4.209543 secs
#>
#> $CPUduration
#> [1] 7.71
#>
#> $CPUduration.MCMC.preparation
#> [1] 5.57
#>
#> $CPUduration.MCMC.transient
#> [1] 0.008779536
#>
#> $CPUduration.MCMC.asymptotic
#> [1] 1.61122
#>
#> $CPUduration.MCMC.after
#> [1] 0
#>
#> $CPUduration.btadjust
#> [1] 0.52
#>
#> $childCPUduration
#> [1] NA
#>
#> $childCPUduration.MCMC.preparation
#> [1] NA
#>
#> $childCPUduration.MCMC.transient
#> [1] NA
#>
#> $childCPUduration.MCMC.asymptotic
#> [1] NA
#>
#> $childCPUduration.MCMC.after
#> [1] NA
#>
#> $childCPUduration.btadjust
#> [1] NA
#>
#> $time_end
#> [1] "2024-04-25 10:50:38 CEST"
attributes(out.mcmc.Coda.Geweke)$sessionInfo
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C LC_CTYPE=French_France.utf8
#> [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C
#> [5] LC_TIME=French_France.utf8
#>
#> time zone: Europe/Paris
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] nimble_1.0.1 runMCMCbtadjust_1.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.6-4 jsonlite_1.8.8 compiler_4.3.1
#> [4] crayon_1.5.2 Rcpp_1.0.11 tensorflow_2.14.0
#> [7] parallel_4.3.1 greta_0.4.5 callr_3.7.3
#> [10] globals_0.16.2 progress_1.2.3 tfruns_1.5.1
#> [13] png_0.1-8 reticulate_1.34.0 lattice_0.21-8
#> [16] coda_0.19-4 R6_2.5.1 igraph_1.5.1
#> [19] knitr_1.45 future_1.33.1 rlang_1.1.2
#> [22] xfun_0.41 cli_3.6.1 withr_2.5.2
#> [25] magrittr_2.0.3 ps_1.7.5 digest_0.6.33
#> [28] grid_4.3.1 processx_3.8.2 rstudioapi_0.15.0
#> [31] base64enc_0.1-3 rappdirs_0.3.3 hms_1.1.3
#> [34] lifecycle_1.0.4 runjags_2.2.2-4 prettyunits_1.2.0
#> [37] vctrs_0.6.4 pracma_2.4.4 glue_1.6.2
#> [40] evaluate_0.23 numDeriv_2016.8-1.1 whisker_0.4.1
#> [43] listenv_0.9.1 codetools_0.2-19 parallelly_1.37.0
#> [46] tools_4.3.1 pkgconfig_2.0.3
We then run the MCMC with Nchains
MCMC chains, the - default - Gelman-Rubin diagnostic of convergence and the -default- rstan
method for calculating the number of effective values:
out.mcmc<-runMCMC_btadjust(code=ModelCode, constants = ModelConsts, data = ModelData, MCMC_language="Nimble",
Nchains=Nchains, params=params, inits=Inits,
niter.min=1000, niter.max=300000,
nburnin.min=100, nburnin.max=200000,
thin.min=1, thin.max=1000,
conv.max=1.05, neff.min=1000)
#> [1] "control$seed is NULL. Replaced by 1"
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#> - population.mean
#> - population.sd
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#> - population.mean
#> - population.sd
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#> - population.mean
#> - population.sd
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "Raw multiplier of thin: 4.796"
#> [1] "###################################################################################"
#> [1] "Testing multiplier of thin: 5 :"
#> [1] "Retained multiplier of thin: 5 :"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of convergence."
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of effective values."
#> [1] "###################################################################################"
We compare the characteristics of the two MCMCs, both in terms of burn-in, thinning parameter, number of iterations and in terms of time (both total time and CPU time).
Table: Comparison of the efficiency of first two NIMBLE models:
Nimble.Coda.Geweke | Nimble.default | |
---|---|---|
converged | 1.00000000 | 1.00000000 |
neffs.reached | 1.00000000 | 1.00000000 |
final.Nchains | 1.00000000 | 3.00000000 |
burnin | 100.00000000 | 575.00000000 |
thin | 6.00000000 | 5.00000000 |
niter.tot | 18455.00000000 | 2955.00000000 |
duration | 21.81957293 | 41.27244711 |
duration.MCMC.preparation | 15.97213101 | 32.59623289 |
duration.MCMC.transient | 0.00887619 | 0.16122569 |
duration.MCMC.asymptotic | 1.62895883 | 0.66733418 |
duration.MCMC.after | 0.00006413 | 0.00004601 |
duration.btadjust | 4.20954275 | 7.84760833 |
CPUduration | 7.71000000 | 10.92000000 |
CPUduration.MCMC.preparation | 5.57000000 | 7.67000000 |
CPUduration.MCMC.transient | 0.00877954 | 0.15566836 |
CPUduration.MCMC.asymptotic | 1.61122046 | 0.64433164 |
CPUduration.MCMC.after | 0.00000000 | 0.00000000 |
CPUduration.btadjust | 0.52000000 | 2.45000000 |
childCPUduration | NA | NA |
childCPUduration.MCMC.preparation | NA | NA |
childCPUduration.MCMC.transient | NA | NA |
childCPUduration.MCMC.asymptotic | NA | NA |
childCPUduration.MCMC.after | NA | NA |
childCPUduration.btadjust | NA | NA |
We acknowledge that the Coda.Geweke algorithm takes much less time (rows named “duration” and “CPUduration” in the previous table) than the classical setting to prepare data (rows named “duration.MCMC.preparation” and “CPUduration.MCMC.preparation”)- as NIMBLE takes quite a lot of time to prepare each MCMC chain - and we have 3 chains to prepare in the default setting compared to 1 with Geweke.
out.mcmc.Geweke<-runMCMC_btadjust(code=ModelCode, constants = ModelConsts, data = ModelData, MCMC_language="Nimble",
Nchains=1, params=params, inits=Inits[1],
niter.min=1000, niter.max=300000,
nburnin.min=100, nburnin.max=200000,
thin.min=1, thin.max=1000,
conv.max=Gew.Max, neff.min=1000,
control=list(convtype="Geweke"))
#> [1] "control$seed is NULL. Replaced by 1"
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#> - population.mean
#> - population.sd
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "Raw multiplier of thin: 13.571"
#> [1] "###################################################################################"
#> [1] "Testing multiplier of thin: 14 :"
#> [1] "Testing multiplier of thin: 13 :"
#> [1] "Testing multiplier of thin: 12 :"
#> [1] "Testing multiplier of thin: 11 :"
#> [1] "Testing multiplier of thin: 10 :"
#> [1] "Testing multiplier of thin: 9 :"
#> [1] "Testing multiplier of thin: 8 :"
#> [1] "Testing multiplier of thin: 7 :"
#> [1] "Testing multiplier of thin: 6 :"
#> [1] "Retained multiplier of thin: 6 :"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of convergence."
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of effective values."
#> [1] "###################################################################################"
We compare the characteristics of the three NIMBLE
MCMCs,
Table: Comparison of the efficiency of the three NIMBLE models:
Nimble.Coda.Geweke | Nimble.Geweke | Nimble.default | |
---|---|---|---|
converged | 1.00000000 | 1.00000000 | 1.00000000 |
neffs.reached | 1.00000000 | 1.00000000 | 1.00000000 |
final.Nchains | 1.00000000 | 1.00000000 | 3.00000000 |
burnin | 100.00000000 | 100.00000000 | 575.00000000 |
thin | 6.00000000 | 6.00000000 | 5.00000000 |
niter.tot | 18455.00000000 | 17232.00000000 | 2955.00000000 |
duration | 21.81957293 | 18.75002503 | 41.27244711 |
duration.MCMC.preparation | 15.97213101 | 11.06818104 | 32.59623289 |
duration.MCMC.transient | 0.00887619 | 0.00950664 | 0.16122569 |
duration.MCMC.asymptotic | 1.62895883 | 1.62829668 | 0.66733418 |
duration.MCMC.after | 0.00006413 | 0.00004697 | 0.00004601 |
duration.btadjust | 4.20954275 | 6.04399371 | 7.84760833 |
CPUduration | 7.71000000 | 4.41000000 | 10.92000000 |
CPUduration.MCMC.preparation | 5.57000000 | 2.54000000 | 7.67000000 |
CPUduration.MCMC.transient | 0.00877954 | 0.00946134 | 0.15566836 |
CPUduration.MCMC.asymptotic | 1.61122046 | 1.62053866 | 0.64433164 |
CPUduration.MCMC.after | 0.00000000 | 0.00000000 | 0.00000000 |
CPUduration.btadjust | 0.52000000 | 0.24000000 | 2.45000000 |
childCPUduration | NA | NA | NA |
childCPUduration.MCMC.preparation | NA | NA | NA |
childCPUduration.MCMC.transient | NA | NA | NA |
childCPUduration.MCMC.asymptotic | NA | NA | NA |
childCPUduration.MCMC.after | NA | NA | NA |
childCPUduration.btadjust | NA | NA | NA |
Results did not completely corroborate our above expectations: the thinning parameter was not increased when changing from Coda.Geweke to Geweke as expected above (row “thin”).
We now turn to the comparison of the statistical parameter outputs. We use two sample Kolmogorov-Smirnov tests to compare each parameter by pairs of MCMC methods:
Table: P-values of paired Kolmogorov-Smirnov tests of output parameters (columns) between the first three NIMBLE models (rows):
mean | sd | |
---|---|---|
default vs. Geweke | 0.2004 | 0.3160 |
Coda.Geweke vs. Geweke | 1.0000 | 1.0000 |
Default vs. Coda.Geweke | 0.1554 | 0.2840 |
The p-values associated to the KS tests are not very small. This indicates that the MCMC outputs can be considered as being drawn from the same distributions.
These parameters are summarized in the next tables.
Table: Summary of the statistical parameters of the Nimble Coda.Geweke model:
Mean | SD | Naive SE | Time-series SE | |
---|---|---|---|---|
population.mean | 599.607 | 1.052 | 0.019 | 0.024 |
population.sd | 31.093 | 0.698 | 0.013 | 0.014 |
Table: Summary of the statistical parameters of the Nimble Geweke model:
Mean | SD | Naive SE | Time-series SE | |
---|---|---|---|---|
population.mean | 599.610 | 1.059 | 0.020 | 0.025 |
population.sd | 31.090 | 0.698 | 0.013 | 0.015 |
Table: Summary of the statistical parameters of the Nimble default model:
Mean | SD | Naive SE | Time-series SE | |
---|---|---|---|---|
population.mean | 599.686 | 0.968 | 0.026 | 0.029 |
population.sd | 31.078 | 0.718 | 0.019 | 0.019 |
We notice that parameter values are very close, that naive standard errors (SEs) are very close to Time-series SEs - which is linked to the automatic tuning of the thinning parameter which produces output samples which are nearly independent - and that differences between mean estimators are within several units of Time series-SEs - which we interpret is mostly due to the control of convergence.
We now turn to analyzing the same data with the same statistical model using JAGS
with runMCMC_btadjust()
. We rely on the data simulated above. In JAGS
, we now put all the data in the same list:
ModelData.Jags <-list(mass = y1000, nobs = length(y1000))
We then propose the use of JAGS
with a specification of the model from within R - which we find more convenient. We therefore write the model within R
as a character chain:
modeltotransfer<-"model {
# Priors
population.mean ~ dunif(0,5000)
population.sd ~ dunif(0,100)
# Normal distribution parameterized by precision = 1/variance in Jags
population.variance <- population.sd * population.sd
precision <- 1 / population.variance
# Likelihood
for(i in 1:nobs){
mass[i] ~ dnorm(population.mean, precision)
}
}"
The other objects useful or required for running runMCMC_btadjust
with JAGS
are similar to those required with NIMBLE
(Inits
, Nchains
, params
) and are not repeated here.
We then launch runMCMC_btadjust()
with MCMC_language="Jags"
, specifying arguments code
and data
which are required in this case:
set.seed(1)
out.mcmc.Jags<-runMCMC_btadjust(code=modeltotransfer, data = ModelData.Jags, MCMC_language="Jags",
Nchains=Nchains, params=params, inits=Inits,
niter.min=1000,niter.max=300000,
nburnin.min=100,nburnin.max=200000,
thin.min=1,thin.max=1000,
conv.max=1.05,neff.min=1000)
#> [1] "control$seed is NULL. Replaced by 1"
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 1000
#> Unobserved stochastic nodes: 2
#> Total graph size: 1009
#>
#> Initializing model
#>
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of convergence."
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of effective values."
#> [1] "###################################################################################"
Note that if we had written the JAGS
code in a text file named "ModelJags.txt"
, we would just have replaced in the command above code=modeltotransfer
by code="ModelJags.txt"
.
Table: Summary of the statistical parameters of the Jags model:
Mean | SD | Naive SE | Time-series SE | |
---|---|---|---|---|
population.mean | 599.640 | 1.009 | 0.019 | 0.023 |
population.sd | 31.081 | 0.681 | 0.013 | 0.016 |
Results seem in line with those of NIMBLE
. We check this using a paired Kolmogorov-Smirnov tests with NIMBLE
models:
Table: P-values of paired Kolmogorov-Smirnov tests of output parameters (columns) of the Jags model with the three NIMBLE models (rows):
mean | sd | |
---|---|---|
Nimble.Geweke vs. Jags | 0.7510 | 0.9239 |
Nimble.Coda.Geweke vs. Jags | 0.5774 | 0.7943 |
Nimble.Default vs. Jags | 0.1728 | 0.2101 |
Our results do confirm that the JAGS
result cannot be considered as stemming from a different probability distribution than NIMBLE
results.
We finally compare the efficiency of the JAGS
and default NIMBLE
MCMCs:
Table: Comparison of the efficiency of the default NIMBLE model and the Jags model:
Nimble.default | Jags | |
---|---|---|
converged | 1.00000000 | 1.00000000 |
neffs.reached | 1.00000000 | 1.00000000 |
final.Nchains | 3.00000000 | 3.00000000 |
burnin | 575.00000000 | 100.00000000 |
thin | 5.00000000 | 1.00000000 |
niter.tot | 2955.00000000 | 1000.00000000 |
duration | 41.27244711 | 6.71807909 |
duration.MCMC.preparation | 32.59623289 | 2.47208190 |
duration.MCMC.transient | 0.16122569 | 0.15866129 |
duration.MCMC.asymptotic | 0.66733418 | 1.42795165 |
duration.MCMC.after | 0.00004601 | 0.00004601 |
duration.btadjust | 7.84760833 | 2.65933824 |
CPUduration | 10.92000000 | 3.16000000 |
CPUduration.MCMC.preparation | 7.67000000 | 1.60000000 |
CPUduration.MCMC.transient | 0.15566836 | 0.15300000 |
CPUduration.MCMC.asymptotic | 0.64433164 | 1.37700000 |
CPUduration.MCMC.after | 0.00000000 | 0.00000000 |
CPUduration.btadjust | 2.45000000 | 0.03000000 |
childCPUduration | NA | NA |
childCPUduration.MCMC.preparation | NA | NA |
childCPUduration.MCMC.transient | NA | NA |
childCPUduration.MCMC.asymptotic | NA | NA |
childCPUduration.MCMC.after | NA | NA |
childCPUduration.btadjust | NA | NA |
The conclusion is that JAGS
is much faster than NIMBLE
on this example (row named duration
in the previous table), due to much less time devoted to MCMC preparation - as well as to burn-in/thinning adjustment (rows named duration.MCMC.preparation
and duration.btadjust
in the previous table). Actually there is no adjustment with JAGS
(niter.tot
is equal to the initial number of iterations). Yet, NIMBLE
is quicker regarding MCMC updating by iteration since it took NIMBLE
less time than JAGS
for the transient phase (respectively less than twice time for the asymptotic phase) although using more than 5.75 (resp. 2.6444444 for the asymptotic phase) times more iterations than JAGS
.
At first sight, we would also conclude that MCMC efficiency per effective value is also better with NIMBLE
since both languages had the same target for the minimum number of effective value - 1,000 - and the total MCMC time was lower with NIMBLE
. Yet, the number of effective values are different:
Table: Comparison of the number of effective values between the default NIMBLE model and the JAGS model:
Nimble.default | Jags | |
---|---|---|
Min. Number Eff. values | 1113.0000000 | 1699.0000000 |
MCMC CPU time per Effective Value | 0.0007188 | 0.0009005 |
Indeed, “JAGS” with just the first iterations produced a higher number of effective values - actually bigger than the targeted “neff.min” - than “NIMBLE”. Yet, the MCMC time per effective value remained lower with “NIMBLE” than with “JAGS” with this model (cf. table above).
We finally run the greta
version of our model with runMCMC_btadjust()
. greta
is rather different from JAGS
and NIMBLE
in that the model defines objects in R and thus does not require a model code to be passed to runMCMC_btadjust()
, nor Data or Constants. We rely on the very data simulated above. The coding with greta
is as follows:
#in my setting I need to load not only greta but R6 & tensorflow packages
library(greta)
#> Warning: le package 'greta' a été compilé avec la version R 4.3.3
library (R6)
#> Warning: le package 'R6' a été compilé avec la version R 4.3.2
library(tensorflow)
#> Warning: le package 'tensorflow' a été compilé avec la version R 4.3.2
#first requirement of greta: declaring the data that will be analyzed with the function as_data
Y<-as_data(y1000)
#we then proceed by writing the model directly in R, starting with the priors of the parameters using greta functions for probability distributions - here uniform()
population.mean<-uniform(0,5000)
population.sd<-uniform(0,100)
#we then define the distribution of the data - here with the normal distribution - by default parametrized with a standard deviation in greta:
try({distribution(Y)<-normal(population.mean,population.sd) })
#we finally declare the greta model, which will be the object passed to runMCMC_btadjust
m<-model(population.mean, population.sd)
### we finally have to prepare initial values with a specific greta function - initials:
ModelInits.Greta <- function()
{initials(population.mean = rnorm(1,600,90), population.sd = runif(1, 1, 30))}
set.seed(1)
Inits.Greta<-lapply(1:Nchains,function(x){ModelInits.Greta()})
We are now ready to fit the model with runMCMC_btadjust()
, specifying MCMC_language="Greta"
and giving the argument model
instead of code
and data
:
out.mcmc.greta<-runMCMC_btadjust(model=m, MCMC_language="Greta",
Nchains=Nchains,params=params,inits=Inits.Greta,
niter.min=1000,niter.max=300000,
nburnin.min=100,nburnin.max=200000,
thin.min=1,thin.max=1000,
conv.max=1.05, neff.min=1000)
#> [1] "control$seed is NULL. Replaced by 1"
#> [1] "###################################################################################"
#>
#> [1] "Raw multiplier of thin: 4.093"
#> [1] "###################################################################################"
#> [1] "Testing multiplier of thin: 4 :"
#> [1] "Retained multiplier of thin: 4 :"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#>
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of convergence."
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of effective values."
#> [1] "###################################################################################"
Table: Summary of the statistical parameters of the greta model:
Mean | SD | Naive SE | Time-series SE | |
---|---|---|---|---|
population.mean | 599.683 | 0.983 | 0.026 | 0.027 |
population.sd | 31.106 | 0.694 | 0.018 | 0.016 |
We first check that estimations are similar to those with NIMBLE
and JAGS
with paired Kolmogorov-Smirnov tests:
Table: P-values of paired Kolmogorov-Smirnov tests of output parameters of the greta model with the default NIMBLE model and the JAGS model:
mean | sd | |
---|---|---|
Nimble vs. greta | 0.9974 | 0.2307 |
Jags vs. greta | 0.3427 | 0.6979 |
We then report the efficiency of the MCMCs.
Table: Comparison of the efficiency of the default NIMBLE, the JAGS and the greta models:
Nimble.default | Jags | Greta | |
---|---|---|---|
converged | 1.00000000 | 1.00000000 | 1.00000000 |
neffs.reached | 1.00000000 | 1.00000000 | 1.00000000 |
final.Nchains | 3.00000000 | 3.00000000 | 3.00000000 |
burnin | 575.00000000 | 100.00000000 | 0.00000000 |
thin | 5.00000000 | 1.00000000 | 4.00000000 |
niter.tot | 2955.00000000 | 1000.00000000 | 1972.00000000 |
duration | 41.27244711 | 6.71807909 | 17.57540703 |
duration.MCMC.preparation | 32.59623289 | 2.47208190 | 0.54078507 |
duration.MCMC.transient | 0.16122569 | 0.15866129 | 4.27031694 |
duration.MCMC.asymptotic | 0.66733418 | 1.42795165 | 8.42106500 |
duration.MCMC.after | 0.00004601 | 0.00004601 | 0.00004697 |
duration.btadjust | 7.84760833 | 2.65933824 | 4.34319305 |
CPUduration | 10.92000000 | 3.16000000 | 42.81000000 |
CPUduration.MCMC.preparation | 7.67000000 | 1.60000000 | 0.02000000 |
CPUduration.MCMC.transient | 0.15566836 | 0.15300000 | 14.37752355 |
CPUduration.MCMC.asymptotic | 0.64433164 | 1.37700000 | 28.35247645 |
CPUduration.MCMC.after | 0.00000000 | 0.00000000 | 0.00000000 |
CPUduration.btadjust | 2.45000000 | 0.03000000 | 0.06000000 |
childCPUduration | NA | NA | NA |
childCPUduration.MCMC.preparation | NA | NA | NA |
childCPUduration.MCMC.transient | NA | NA | NA |
childCPUduration.MCMC.asymptotic | NA | NA | NA |
childCPUduration.MCMC.after | NA | NA | NA |
childCPUduration.btadjust | NA | NA | NA |
MCMC time (rows duration.MCMC.transient
& duration.MCMC.asymptotic
) was far greater with greta
than with JAGS
and NIMBLE
, for a minimum number of effective values with greta
of 1127. Total duration is rather close with greta
compared with NIMBLE
, due to the great time required by NIMBLE
for MCMC preparation - while this preparation is done outside runMCMC_btadjust()
with greta
. Yet, when we compare CPU total durations (CPUduration
), greta
gets worse than NIMBLE
while it was the reverse for total duration (duration
), simply because greta
parallelized its process and therefore required more CPU time per time unit.
We tried to give a second chance to greta
, based on the following post: https://forum.greta-stats.org/t/size-and-number-of-leapfrog-steps-in-hmc/332. The idea was to let greta
have more information to adapt its hmc parameters during the warm-up phase by just having more chains to run - hereafter, 15.
Nchains.Greta<-15
ModelInits.Greta <- function()
{initials(population.mean = rnorm(1,600,90), population.sd = runif(1, 1, 30))}
set.seed(1)
Inits.Greta<-lapply(1:Nchains.Greta,function(x){ModelInits.Greta()})
out.mcmc.greta.morechains<-runMCMC_btadjust(model=m, MCMC_language="Greta",
Nchains=Nchains.Greta,params=params,inits=Inits.Greta,
niter.min=1000,niter.max=300000,
nburnin.min=100,nburnin.max=200000,
thin.min=1,thin.max=1000,
conv.max=1.05, neff.min=1000)
#> [1] "control$seed is NULL. Replaced by 1"
#> [1] "###################################################################################"
#>
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of convergence."
#> [1] "###################################################################################"
#> [1] "MCMC has reached the required level of effective values."
#> [1] "###################################################################################"
Table: Summary of the statistical parameters of the greta model with 15 chains:
Mean | SD | Naive SE | Time-series SE | |
---|---|---|---|---|
population.mean | 599.6800 | 0.9764 | 0.0080 | 0.0166 |
population.sd | 31.1108 | 0.7070 | 0.0058 | 0.0112 |
This run was indeed much faster. Parameter estimates were still not significantly different from those with NIMBLE
and JAGS
based on paired Kolmogorov-Smirnov tests:
Table: P-values of paired Kolmogorov-Smirnov tests of output parameters of the greta model with 15 chains with the default NIMBLE model and the JAGS model:
mean | sd | |
---|---|---|
Nimble vs. greta.morechains | 0.97118 | 0.07684 |
Jags vs. greta.morechains | 0.10423 | 0.09067 |
We now report the efficiency of the MCMCs:
Table: Comparison of the efficiency of the default NIMBLE, the JAGS and the greta.morechains models:
Nimble.default | Jags | Greta.morechains | |
---|---|---|---|
converged | 1.00000000 | 1.00000000 | 1.00000000 |
neffs.reached | 1.00000000 | 1.00000000 | 1.00000000 |
final.Nchains | 3.00000000 | 3.00000000 | 15.00000000 |
burnin | 575.00000000 | 100.00000000 | 0.00000000 |
thin | 5.00000000 | 1.00000000 | 1.00000000 |
niter.tot | 2955.00000000 | 1000.00000000 | 1000.00000000 |
duration | 41.27244711 | 6.71807909 | 15.10678387 |
duration.MCMC.preparation | 32.59623289 | 2.47208190 | 0.52773094 |
duration.MCMC.transient | 0.16122569 | 0.15866129 | 5.90620852 |
duration.MCMC.asymptotic | 0.66733418 | 1.42795165 | 5.90620852 |
duration.MCMC.after | 0.00004601 | 0.00004601 | 0.00004292 |
duration.btadjust | 7.84760833 | 2.65933824 | 2.76659298 |
CPUduration | 10.92000000 | 3.16000000 | 44.22000000 |
CPUduration.MCMC.preparation | 7.67000000 | 1.60000000 | 0.00000000 |
CPUduration.MCMC.transient | 0.15566836 | 0.15300000 | 22.04500000 |
CPUduration.MCMC.asymptotic | 0.64433164 | 1.37700000 | 22.04500000 |
CPUduration.MCMC.after | 0.00000000 | 0.00000000 | 0.00000000 |
CPUduration.btadjust | 2.45000000 | 0.03000000 | 0.13000000 |
childCPUduration | NA | NA | NA |
childCPUduration.MCMC.preparation | NA | NA | NA |
childCPUduration.MCMC.transient | NA | NA | NA |
childCPUduration.MCMC.asymptotic | NA | NA | NA |
childCPUduration.MCMC.after | NA | NA | NA |
childCPUduration.btadjust | NA | NA | NA |
We still observed more CPU duration with greta
, although the associated number of effective values for greta
was now 3244, which rendered MCMC CPU efficiency with greta
closer to NIMBLE
.
The function runMCMC_btadjust()
now allows automatic parallelization of different Markov chains of the MCMC parts of the algorithm which can be of interest when we have several Markov chains. We initially planned to run the parallelized versions of both the NIMBLE
and JAGS
models with 3 chains - recall that greta
already uses parallelization. Yet, due to intrincacies of the CRAN process, the vignette did not work on CRAN whereas it did on my computer. I can therefore not put it here. The only difference with the preceding calls are the addition of control.MCMC=list(parallelize=TRUE)
- or of parallelize=TRUE
in control.MCMC
if already present.
We hope we have convinced the R user of Bayesian models that runMCMC_btadjust()
can help having a more efficient, quality oriented use of these types of models while saving analyst’s and potentially computer time. Indeed, to recap, the aim of this function is to run a Markov Chain Monte Carlo (MCMC) for a specified Bayesian model while adapting automatically the burn-in and thinning parameters to meet pre-specified targets in terms of MCMC convergence and number of effective values of MCMC outputs. This is done in only one call to the function that repeatedly calls the MCMC until criteria for convergence and number of effective values are met. The function has four main advantages:
(i) it saves the analyst’s programming time since he/she does not have to repeatedly diagnose and re-run MCMCs until desired levels of convergence and number of effective values are reached;
(ii) it allows a minimal, normalized quality control of MCMC outputs by allowing to meet pre-specified levels in terms of convergence and number of quasi-independent values;
(iii) it may save computer’s time when compared to cases where we have to restart the MCMC from the beginning if it has not converged or reached the specified number of effective values;
(iv) it can be applied with different MCMC R languages - at present greta
, NIMBLE
and JAGS
. This comes with two positive consequences in practice: first, allowing the user a more rigorous comparison between the three Bayesian fitting languages in terms of comparability of inference and of MCMC efficiency - especially in terms of CPU time per effective value; second, making it easier to develop the same Bayesian model with these different languages, which is to our experience welcome in practical cases, since these different languages have advantages over the other ones that vary from one context to the other.