Selective Inference for Linear Mixed and Additive Models

David Rügamer 2021-04-13


This vignette describes the generic use of the selfmade software to produce valid post-selection inference, or more specifically selective inference, for linear mixed and additive models after any type of variable selection mechanism, which can be repeated in a bootstrap-like manner.


Rügamer, D., & Greven, S. (2018). Selective inference after likelihood-or test-based model selection in linear models. Statistics & Probability Letters, 140, 7-12.

Rügamer, D., & Greven, S. (2020). Inference for (L_2)-Boosting. Statistics and Computing, 30(2), 279-289.


selection_function <- function(y)
  # based on any input y of the same dimension as the original response
  # a model is selected and mapped to an integer value
  # ....
  best_model_index <- get_best_model(list_of_models)

Note that the selection_function should return the original result when called with the original data vector (y).


  1. Run the experiment with the original_response
  2. Save the model selection result (e.g., as integer indicating the selected model) as well as the final model final_model (after refitting the model with gamm4 if a different package has been used for model selection)
  3. Define the model selection function (selection_function)
  4. Define the wrapper (check_congruency) function returning a logical value whether the result of any model call of selection_function is equivalent to the original model selection result
  5. Run the mocasin-function providing selective inference as follows:
res <- mocasin(mod = final_model,
               checkFun = check_congruency,
               this_y = original_response
               # further options


Example 1

dat <- gamSim(1,n=500,scale=2) ## simulate 4 term additive truth

dat$y <- 3 + dat$x0^2 + rnorm(n=500)
br <- gamm4(y~ s(x0) + s(x1), data = dat)
summary(br$gam) ## summary of gam

# do not use any selection
# - hence it's not necessary to define selection_function
#   and the checl_congruency always returns TRUE
checkFun <- function(yb) TRUE

# calculate selective inference, which, in this case,
# except for an approximation error, should be equivalent
# to the unconditional inference
res <- mocasin(br, this_y = dat$y,
               checkFun = checkFun,
               nrlocs = c(0.7,1),
               nrSamples = 1000, trace = FALSE)
# we get very similar results using"rbind", res$selinf)

Example 2

# use the ham data and use scaled information liking
# as response
ham$Informed.liking <- scale(ham$Informed.liking)

# We first define a function to fit a model based on response
# This function is usually not required but can be 
# specified in order to define the selection_function
# as a function of the model instead of as a function
# of the response vector
modFun <- function(y)
  ham$y <- y
  lmer(y ~ Gender + Information * Product + (1 | Consumer) +
  (1 | Product), data=ham)

# define the selection_function:
# here this is done as function of a model
# which, in combination with modFun, can then
# be used as function 
selFun <- function(mod) step(mod, reduce.fixed = FALSE)

# define a function which extracts the results 
# of the selection procedure
extractSelFun <- function(this_mod){

this_mod <- attr(this_mod, "model")
  return(attr(this_mod$coefficients, "names")) else
             names(getME(this_mod, "theta"))))


# Now we run the initial model selection on the 
# orginal data, which is a 
# backward elimination of non-significant effects:
(step_result <- selFun(modFun(ham$Informed.liking)))
attr(step_result, "model")
# Elimination tables for random- and fixed-effect terms:
(sel <- extractSelFun(step_result))

# Now we can define the function checking the congruency
# with the original selection
checkFun <- function(yb){ 

 this_mod <- modFun(yb)
 setequal( extractSelFun(selFun(this_mod)), sel )

# and compute valid p-values conditional on the selection
# (this takes some time and will produce a lot of warnings)
res <- mocasin(attr(step_result, "model"), this_y = ham$Informed.liking,
          checkFun = checkFun, which = 1:4, nrSamples = 50, trace = FALSE)


Example 3

# Run an AIC comparison between different additive models 
# fitted with mgcv::gam

# create data and models
# use enough noise to get a diverse model selection
dat <- gamSim(1,n=400,dist="normal",scale=10)
b0123 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
b123 <- gam(y~s(x1)+s(x2)+s(x3),data=dat)
b013 <- gam(y~s(x0)+s(x1)+s(x3),data=dat)

# seems that the second model seems to be the most 
# 'appropriate' one
which.min(AIC(b0123, b123, b013)$AIC)
# and refit the model with gamm4
b123_gamm4 <- gamm4(y~s(x0)+s(x1)+s(x3),data=dat)

# define selection_function
selection_function <- function(y)
  dat$y <- y
  list_of_models <- list(

  # return an integer value which model is best
    which.min(sapply(list_of_models, AIC))

# define the congruency function
checkFun <- function(y) selection_function(y)==2

# compute inference
res <- mocasin(mod = b123_gamm4,
               checkFun = checkFun,
               nrlocs = 3, # test one position of the spline
               nrSamples = 10)