Multi model usage

bayesnec

The background of bayesnec is covered in the Single model usage vignette. Here we explain multi model usage using bayesnec. In bayesnec it is possible to fit a custom model set, specific model set, or all of the available models. When multiple models are specified the bnec function returns a model weighted estimate of predicted posterior values, based on the "pseudobma" using Bayesian bootstrap through loo_model_weights (Vehtari et al. 2020; Vehtari, Gelman, and Gabry 2017). These are reasonably analogous to the way model weights are generated using AIC or AICc (Burnham and Anderson 2002).

It is also possible to obtain all individual model fits from the fitted bayesnecfit model object if required using the pull_out function, and also to update an existing model fit with additional models, or to drop models using the function amend.

Multi-model inference can be useful where there are a range of plausible models that could be used (Burnham and Anderson 2002) and has been recently adopted in ecotoxicology for Species Sensitivity Distribution (SSD) model inference (Thorley and Schwarz 2018). The approach may have considerable value in concentration-response modelling because there is often no a priori knowledge of the functional form that the response relationship should take. In this case model averaging can be a useful way of allowing the data to drive the model selection process, with weights proportional to how well the individual models fits the data. Well-fitting models will have high weights, dominating the model averaged outcome. Conversely, poorly fitting models will have very low model weights and will therefore have little influence on the outcome. Where multiple models fit the data equally well, these can equally influence the outcome, and the resultant posterior predictions reflect that model uncertainty. It is possible to specify the "stacking" method (Yao et al. 2018) for model weights if desired (through the argument loo_controls) which aims to minimise prediction error. We do not currently recommend using stacking weights given the typical sample sizes associated with most concentration—response experiments, and because the main motivation for model averaging within the bayesnec package is to properly capture model uncertainty rather than reduce prediction error.

Examples

Fitting multiple models and model averaging using the bnec function

Fitting a bnec model

So far we have explored how to fit individual models via the function bnec. The bayesnec package also has the capacity to fit a custom selection of models, pre-specified sets of models, or even all the available models in the package. Note that as these are Bayesian methods requiring multiple Hamiltonian Monte Carlo (HMC) chains, using bnec can be very slow when specifying models = "all" within a bayesnecformula. See details under ?bnec for more information on the models, and model sets that can be specified, as well as the Model details vignette which contains considerable information on the available models in bnec and their appropriate usage. In general it is safe to call models = "all", because by default bnec will discard invalid models and the model averaging approach should result in an overall fit that reflects the models that best fit the data. However, because the HMC can be slow for models = "all" we do recommend testing your fit using a single (likely) model in the first instance, to make sure there are no issues with dispersion, the appropriate distribution is selected and model fitting appears robust (see the Single model usage vignette for more details).

To run this vignette, we will need the ggplot2 package:

library(ggplot2)
library(bayesnec)
# function which generates an "ecx4param" model
make_ecx_data <- function(top, bot, ec50, beta, x) {
  top + (bot - top) / (1 + exp((ec50 - x) * exp(beta)))
}
x <- seq(0, 10, length = 12)
y <- make_ecx_data(x = x, bot = 0, top = 1, beta = 0.5, ec50 = 5)
set.seed(333)
df_ <- data.frame(x = rep(x, 15), y = rnorm(15 * length(x), y, 0.2))
exp_5 <- bnec(y ~ crf(x, model = "decline"), data = df_, iter = 2e3)

Here we run bnec using model = "decline" using a simulated data example for a Beta-distributed response variable. We are using the "decline" set here because we are not going to consider hormesis (these allow an initial increase in the response), largely to save time in fitting this example. Moreover, you might want to consider saving the output as an .RData file—doing so can be a useful way of fitting large model sets (ie model = "all", or model = "decline") at a convenient time (this can be very slow, and may be run overnight for example) so you can reload them later to explore, plot, extract values, and amend the model set as required.

Whenever the model argument of crf is a model set, or a concatenation of specific models, bnec will return an object of class bayesmanecfit.

Exploring a bayesmanecfit model

We have created some plotting method functions for both the bayesnecfit and bayesmanecfit model fits, so we can plot a bayesmanecfit model object simply with autoplot.

autoplot(exp_5)

plot of chunk exmp2-decline

The default plot looks exactly the same as our regular bayesnecfit plot, but the output is based on a weighted average of all the model fits in the model = "decline" model set that were able to fit successfully using the default bnec settings. Because the model set contains a mix of NEC and ECx models, no-effect toxicity estimate reported by default on this plot (and in the summary output below) is reported as a N(S)EC (see Fisher et al. (2023)).

The fitted bayesmanecfit object contains different elements to the bayesnecfit. In particular, mod_stats contains the table of model fit statistics for all the fitted models. This includes the model name, the WAIC (as returned from brms), wi (the model weight, currently defaulting to "pseudobma" using Bayesian bootstrap from loo), and the dispersion estimates (only reported if response is modelled with a Poisson or Binomial distribution, otherwise NA).

exp_5$mod_stats
#>               model      waic           wi dispersion_Estimate dispersion_Q2.5 dispersion_Q97.5
#> nec4param nec4param -84.96557 3.288210e-01                  NA              NA               NA
#> neclin       neclin -52.48550 2.765032e-04                  NA              NA               NA
#> ecxlin       ecxlin -27.78986 6.449676e-06                  NA              NA               NA
#> ecx4param ecx4param -82.07242 1.014342e-01                  NA              NA               NA
#> ecxwb1       ecxwb1 -75.03545 1.148493e-02                  NA              NA               NA
#> ecxwb2       ecxwb2 -85.41745 3.002406e-01                  NA              NA               NA
#> ecxll5       ecxll5 -84.23712 1.602656e-01                  NA              NA               NA
#> ecxll4       ecxll4 -81.99187 9.747066e-02                  NA              NA               NA

We can obtain a neater summary of the model fit by using the summary method for a bayesmanecfit object. A list of fitted models, and model weights are provided. In addition, the model averaged no-effect estimate is reported, in this case as an N(S)EC. For this example the nec4param and ecxwb2 models have the highest weights.

All these model fits are satisfactory despite the relatively low number of iterations set in our example, but the summary would also include a warning if there were fits with divergent transitions.

summary(exp_5)
#> Object of class bayesmanecfit
#> 
#>  Family: gaussian  
#>   Links: mu = identity; sigma = identity  
#> 
#> Number of posterior draws per model:  1600
#> 
#> Model weights (Method: pseudobma_bb_weights):
#>             waic   wi
#> nec4param -84.97 0.33
#> neclin    -52.49 0.00
#> ecxlin    -27.79 0.00
#> ecx4param -82.07 0.10
#> ecxwb1    -75.04 0.01
#> ecxwb2    -85.42 0.30
#> ecxll5    -84.24 0.16
#> ecxll4    -81.99 0.10
#> 
#> 
#> Summary of weighted N(S)EC posterior estimates:
#> NB: Model set contains a combination of ECx and NEC
#>     models, and is therefore a model averaged
#>     combination of NEC and NSEC estimates.
#>        Estimate Q2.5 Q97.5
#> N(S)EC     3.34 1.98  3.97
#> 
#> 
#> Bayesian R2 estimates:
#>           Estimate Est.Error Q2.5 Q97.5
#> nec4param     0.86      0.01 0.84  0.87
#> neclin        0.83      0.01 0.80  0.84
#> ecxlin        0.80      0.01 0.77  0.82
#> ecx4param     0.85      0.01 0.84  0.87
#> ecxwb1        0.85      0.01 0.83  0.86
#> ecxwb2        0.86      0.01 0.84  0.87
#> ecxll5        0.86      0.01 0.84  0.87
#> ecxll4        0.85      0.01 0.84  0.87

The bayesmanecfit object also contains all of the individual brms model fits, which can be extracted using the pull_out function. For example, we can pull out and plot the model ecx4param.

exp_5_nec4param <- pull_out(exp_5, model = "ecx4param")
autoplot(exp_5_nec4param)

plot of chunk exmp2-ecx4param

This would extract the nec4param model from the bayesmanecfit and create a new object of class bayesnecfit which contains just a single fit. This would be identical to fitting the ecx4param as a single model using bnec. All of the models in the bayesmanecfit can be simultaneously plotted using the argument all_models = TRUE.

autoplot(exp_5, all_models = TRUE)

plot of chunk exmp2-allmods

You can see that some of these models represent very bad fits, and accordingly have extremely low model weights, such as the ecxlin and neclin models in this example. There is no harm in leaving in poor models with low weight, precisely because they have such a low model weight and therefore will not influence posterior predictions. However, it is important to assess the adequacy of model fits of all models, because a poor fit may be more to do with a model that has not converged.

We can assess the chains for one of the higher weighted models to make sure they show good convergence. It is good practice to do this for all models with a high weight, as these are the models most influencing the multi-model inference.

plot(pull_brmsfit(exp_5, "ecxwb1"))

plot of chunk exmp2-goodmod

Assessing chains for all the models in bayesmanecfit does not work as well using the default brms plotting method. Instead use check_chains and make sure to pass a filename argument, which means plots are automatically saved to pdf with a message.

check_chains(exp_5, filename = "example_5_all_chains")

We can also make a plot to compare the posterior probability density to that of the prior using the check_priors function, for an individual model fit, but also saving all fits to a file in the working directory.

check_priors(pull_out(exp_5, "nec4param"))

plot of chunk exmp2-checkpriors

check_priors(exp_5, filename = "example_5_all_priors")

Where a large number of models are failing to converge, obviously it would be better to adjust iter and warmup in the bnec call, as well as some of the other arguments to brms such as adapt_delta. See the ?brm documentation for more details. It is possible to investigate if models from a bayesmanecfit class achieved convergence according to Rhat:

rhat(exp_5) |>
  sapply("[[", "failed")
#> nec4param    neclin    ecxlin ecx4param    ecxwb1    ecxwb2    ecxll5    ecxll4 
#>     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE

Here we get a message because none of our models failed the default Rhat criterion. A more conservative cut off of 1.01 can also be used by changing the default argument to the desired value. In this case one model fails, although we note that this is a very stringent criterion, and we have also used less than the default bayesnec value of iter (10,000).

rhat(exp_5, rhat_cutoff = 1.01) |>
  sapply("[[", "failed")
#> nec4param    neclin    ecxlin ecx4param    ecxwb1    ecxwb2    ecxll5    ecxll4 
#>     FALSE     FALSE     FALSE     FALSE     FALSE      TRUE      TRUE     FALSE

Extracting toxicity values

The models prefixed with ecx are all models that do not have the NEC as a parameter in the model. That is, they are smooth curves as a function of concentration and have no breakpoint (no true No-Effect-Concentration, NEC). Thus the reported no-effect toxicity estimate is the NSEC (Fisher and Fox 2023) (see the Model details vignette for more details). If a true model averaged estimate of NEC based only on threshold models is desired, this can be obtained with model = "nec". We can use the helper functions pull_out and amend to alter the model set as required. pull_out has a model argument and can be used to pull a single model out (as above) or to pull out a specific set of models.

We can use this to obtain first a set of NEC only models from the existing set.

exp_5_nec <- pull_out(exp_5, model = "nec")

In this case, because we have already fitted "decline" models, we can ignore the message regarding the missing NEC models—these are all models that are not appropriate for a Beta family with a logit link function, or allow hormesis, which we did not consider in this example.

Now we have two model sets, an NEC set, and a mixed NEC and ECx set. Of course, before we use this model set for any inference, we would need to check the chain mixing and acf plot for each of the input models. For the “all” set, the model with the highest weight is nec4param.

Now we can use the ecx function to get EC10 and EC50 values. We can do this using our all model set, because it is valid to use NEC models for estimating ECx (see more information in the Model details vignette).

ECx10 <- ecx(exp_5, ecx_val = 10)
ECx50 <- ecx(exp_5, ecx_val = 50)
ECx10
#>      Q50     Q2.5    Q97.5 
#> 3.631996 2.739285 4.157703 
#> attr(,"resolution")
#> [1] 1000
#> attr(,"ecx_val")
#> [1] 10
#> attr(,"toxicity_estimate")
#> [1] "ecx"
ECx50
#>      Q50     Q2.5    Q97.5 
#> 5.100009 4.852034 5.358647 
#> attr(,"resolution")
#> [1] 1000
#> attr(,"ecx_val")
#> [1] 50
#> attr(,"toxicity_estimate")
#> [1] "ecx"

The weighted NEC estimates can be extracted directly from the NEC model set object, as they are an explicit parameter in these models.

NECvals <- exp_5_nec$w_ne
NECvals
#> Estimate     Q2.5    Q97.5 
#> 3.533912 3.160582 4.035932

Note that the new NEC estimates from the nec-containing model fits are slightly higher than those reported in the summary output of all the fitted models. This can happen for smooth curves, which is what was used as the underlying data generation model in the simulations here, and is explored in more detail in the Compare posteriors vigenette.

Putting it all together

Now we can make a combined plot of our output, showing the model averaged NEC model and the “all averaged model”, along with the relevant thresholds.

preds <- exp_5_nec$w_pred_vals$data

autoplot(exp_5, nec = FALSE, all = FALSE) +
  geom_vline(mapping = aes(xintercept = ECx10, colour = "ECx 10"),
             linetype = c(1, 3, 3), key_glyph = "path") +
  geom_vline(mapping = aes(xintercept = ECx50, colour = "ECx 50"),
             linetype = c(1, 3, 3), key_glyph = "path") +
  geom_vline(mapping = aes(xintercept = NECvals, colour = "NEC"),
             linetype = c(1, 3, 3), key_glyph = "path") +
  scale_color_manual(values = c("ECx 10" = "orange", "ECx 50" = "blue",
                                "NEC" = "darkgrey"), name = "") +
  geom_line(data = preds, mapping = aes(x = x, y = Estimate),
            colour = "tomato", linetype = 2) +
  geom_line(data = preds, mapping = aes(x = x, y = Q2.5),
            colour = "tomato", linetype = 2) +
  geom_line(data = preds, mapping = aes(x = x, y = Q97.5),
            colour = "tomato", linetype = 2) +
  theme(legend.position = c(0.8, 0.8),
        axis.title = element_text(size = 16),
        axis.text = element_text(size = 12),
        strip.text.x = element_text(size = 16))

plot of chunk exmp2-pretty

References

Burnham, K P, and D R Anderson. 2002. Model Selection and Multimodel Inference; A Practical Information-Theoretic Approach. 2nd ed. New York: Springer.
Fisher, Rebecca, and David R Fox. 2023. “Introducing the No Significant Effect Concentration (NSEC).” Journal Article. Environmental Toxicology and Chemistry 42 (9): 2019–28. https://doi.org/https://doi.org/10.1002/etc.5610.
Fisher, Rebecca, David R. Fox, Andrew P. Negri, Joost van Dam, Florita Flores, and Darren Koppel. 2023. “Methods for Estimating No-Effect Toxicity Concentrations in Ecotoxicology.” Integrated Environmental Assessment and Management n/a (n/a). https://doi.org/https://doi.org/10.1002/ieam.4809.
Thorley, Joe, and Carl Schwarz. 2018. ssdtools: Species Sensitivity Distributions. https://CRAN.R-project.org/package=ssdtools.
Vehtari, Aki, Jonah Gabry, Mans Magnusson, Yuling Yao, Paul-Christian Bürkner, Topi Paananen, and Andrew Gelman. 2020. Loo: Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models. https://mc-stan.org/loo/.
Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–32. https://doi.org/10.1007/s11222-016-9696-4.
Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. Using Stacking to Average Bayesian Predictive Distributions (with Discussion).” Bayesian Analysis 13: 917–1007.