`library(GLMMselect)`

The `GLMMselect`

package provides a novel Bayesian model
selection approach for the analysis of Poisson and binary data.
`GLMMselect`

contains functions to simultaneously select
fixed effects and random effects for non-Gaussian data. In the
`GLMMselect`

package, we use pseudo likelihood method to
solve the problem that the random effects cannot be analytically
integrated out of GLMMs. In addition, we develop a fractional Bayes
factor (FBF) approach to obtain posterior probabilities of the competing
models. Full details on the methods implemented in
`GLMMselect`

can be found in the manuscript (Xu et
al. 202X).

This vignette contains a estimated example for count data and a case
study presented in the manuscript (Xu et al. 202X) to illustrate how
`GLMMselect`

works. For the simulated example, the count data
are simulated from Poisson GLMM with four candidate fixed effects and
two types of random effects (spatial random effects and overdispersion
random effects).

The function `GLMMselect()`

implemented in the
`GLMMselect`

package is described below:

- Function
`GLMMselect()`

performs the ARM and HCM (Xu et al. 202X) which performs model selection for generalized linear mixed models, given a numeric vector of binary or count data`Y`

, a matrix of covariates`X`

, a list of covariance matrices for random effects`Sigma`

, a list of design matrices for random effects`Z`

, the description of the error distribution`family`

, and the prior for variance component of random effects`prior`

. Arguments`pip_fixed`

and`pip_random`

are the thresholds that if the posterior inclusion probability of fixed effects or random effects is larger than`pip_fixed`

or`pip_random`

, the corresponding fixed effects and random effects will be included in the best model. In addition, the argument`NumofModel`

can adjust the number of models with the largest posterior probabilities which are printed out in the output. The function`GLMMselect()`

returns a list of the indices of fixed effects and random effects that are identified in the best model, a table of models’ posterior probabilities (MPP) which are sorted in decreasing order, a table for fixed effects’ inference, and a table for random effects’ inference which includes point estimates and standard deviations for coefficients, and posterior inclusion probabilities (PIP) for each effect.

The generalized linear mixed model used in the
`GLMMselect`

package is \[
g(\pmb{Y})=X\pmb{\beta}+\sum_{q=1}^Q Z_q \pmb{\alpha}_q,\]
where

- \(g()\) is the link function.
- \(\pmb{Y}\) is an \(n \times 1\) vector of observations, either binary data or count data.
- \(X\) is the matrix of covariates.
- \(\pmb{\beta}\) is the \(p \times 1\) vector of regression coefficients.
- \(Z_q\) is an incidence matrix relating the vector of random effects \(\pmb{\alpha}_q\) to the observations.
- \(\pmb{\alpha}_q\) is a vector of random effects with covariance matrix \(\tau_q \Sigma_q\). \(\Sigma_q\) is a known semi-positive definite matrix, and \(\tau_q\) is an unknown scalar. The common types of \(\pmb{\alpha}_q\) can be spatial random effects, longitudinal random effects and overdispersion random effects.

Currently, the `GLMMselect`

package can analyze binary
responses (`family = "bernoulli"`

) and Poisson responses
(`family = "poisson"`

). The manuscript that develops the
methods in GLMMselect (Xu et al. 202X) provides details on the priors
for \(\pmb{\beta}\) and \(\tau_q\), as well as details about the FBF
approach used by GLMMselect.

The `GLMMselect()`

function requires a vector of
observations (either Bernoulli or assumed Poisson distributed), a matrix
of covariates, a list of design matrices for random effects, and a list
of covariance matrices for random effects.

The vector of observations must be a numeric \(n \times 1\) vector. In the GLMMselect package, there is an example simulated from the Poisson generalized linear mixed model with four candidate covariates, a vector of spatial random effects, and a vector of overdispersion random effects. Here are the first five elements of the Poisson simulated vector of observations:

```
data("Y")
1:5]
Y[#> [1] 33 47 187 26 35
```

The covariate matrix passed to the function contains all candidate covariates. Each column corresponds to a candidate covariate. Each row corresponds to an observation. In this example, the covariate matrix contains four candidate covariates. The covariates in the first two columns are in the true model. Here are the first five rows of the covariate matrix:

```
data("X")
1:5,]
X[#> [,1] [,2] [,3] [,4]
#> [1,] 0.3586051 -1.39887035 -1.2441304 -0.97847040
#> [2,] -0.1766957 0.09031447 0.5870933 -1.19242832
#> [3,] 0.7548008 0.54639426 -0.2518881 0.02433473
#> [4,] 0.0854506 -0.97412286 0.1203769 0.04743402
#> [5,] 0.5478787 -1.19069820 -1.3385666 -0.28752443
```

The design matrices for candidate random effects are passed to the GLMMselect function as a list. In this example, this is a list of two design matrices for two types of random effects. The first component is for spatial random effects. The second component is for overdispersion random effects. Here are the first five rows and the first five columns for each of these design matrices:

```
data("Z")
1]][1:5,1:5]
Z[[#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
2]][1:5,1:5]
Z[[#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
```

The covariance matrices for the candidate random effects are also passed to the GLMMselect function as a list. In this example, this is a list of two covariance matrices for two types of random effects. The first component is for spatial random effects. The second componet is for overdispersion random effects. Here are the first five rows and the first five columns for each of these covariance matrices:

```
data("Sigma")
1]][1:5,1:5]
Sigma[[#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1.2861009 0.7911009 0.4988302 0.3047474 0.1670184
#> [2,] 0.7911009 0.9938302 0.5970181 0.3611012 0.2041574
#> [3,] 0.4988302 0.5970181 0.8561012 0.4964282 0.2877719
#> [4,] 0.3047474 0.3611012 0.4964282 0.7827719 0.4448714
#> [5,] 0.1670184 0.2041574 0.2877719 0.4448714 0.7498331
2]][1:5,1:5]
Sigma[[#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
```

Data `Y`

,`X`

, `Sigma`

, and
`Z`

above are attached in the `GLMMselect`

package.

In this example, we use `GLMMselect`

to analyze Poisson
count data with an approximate reference (AR) prior for the variance
components of random effects.

```
<- GLMMselect(Y=Y, X=X, Sigma=Sigma, Z=Z,
Model_selection_output family="poisson", prior="AR", offset=NULL)
$BestModel
Model_selection_output#> $covariate_inclusion
#> [1] 1 2
#>
#> $random_effect_inclusion
#> [1] 1
$PosteriorProb
Model_selection_output#> x1 x2 x3 x4 r1 r2 MPP
#> 1 1 1 0 0 1 0 0.624
#> 2 1 1 1 0 1 0 0.137
#> 3 1 1 0 1 1 0 0.114
#> 4 1 1 1 1 1 0 0.055
#> 5 1 1 0 0 0 1 0.032
#> 6 1 1 0 0 1 1 0.022
#> 7 1 1 1 0 0 1 0.006
#> 8 1 1 0 1 0 1 0.004
#> 9 1 1 1 0 1 1 0.002
#> 10 1 1 0 1 1 1 0.002
$FixedEffect
Model_selection_output#> Est SD PIP
#> x1 1.028 0.028 1.000
#> x2 0.997 0.027 1.000
#> x3 -0.012 0.020 0.202
#> x4 0.000 0.023 0.177
$RandomEffect
Model_selection_output#> Est SD PIP
#> r1 0.042 0.016 0.956
#> r2 0.012 0.006 0.070
```

`GLMMselect()`

outputs the indices of the covariates and
the indices of the random effects which are in the best model. The true
model contains the first two covariates and spatial random effects.
`GLMMselect`

correctly identifies these two covariates and
the spatial random effects.

Here is a case study to help illustrate how to use
`GLMMselect`

. This dataset provides the number of male lip
cancer cases in the 56 counties of Scotland during the period 1975-1980,
as well as the percentage of the work force employed in agriculture,
fishing, or forestry (AFF) as a covariate. The model we consider is
\[
y_i|\mu_i \stackrel{ind}{\sim} \mbox{Poisson}(\mu_i), \ \ \ i=1\dots 56,
\\
\log(\mu_i) = \log(n_i)+\pmb{x}_i^T\pmb{\beta}+\alpha_{1i}+\alpha_{2i},
\\
\pmb{\alpha}_1 \sim N(\pmb{0},\tau_1 \Sigma_1), \\
\pmb{\alpha}_2 \sim N(\pmb{0}, \tau_2 \Sigma_2).
\]

\(n_i\) is the expected number of lip cancer cases in the \(i^{th}\) county, which are assumed to be known constants. \(\pmb{\alpha}_1\) is a vector of spatial random effects. \(\pmb{\alpha}_2\) is a vector of overdispersion random effects.

In this example, we use `GLMMselect`

to analyze lip cancer
data with a half Cauchy (HC) prior for variance components of random
effects. Data `lipcancer_Y`

,`lipcancer_X`

,
`lipcancer_Sigma`

, and `lipcancer_Z`

are attached
in the `GLMMselect`

package.

```
<- GLMMselect(Y=lipcancer_Y, X=lipcancer_X, Sigma=lipcancer_Sigma, Z=lipcancer_Z,
lip_cancer_output family="poisson", prior="HC", offset=lipcancer_offset)
$BestModel
lip_cancer_output#> $covariate_inclusion
#> [1] 1
#>
#> $random_effect_inclusion
#> [1] 1
$PosteriorProb
lip_cancer_output#> x1 r1 r2 MPP
#> 1 1 1 0 0.905
#> 2 0 1 0 0.084
#> 3 1 1 1 0.009
#> 4 0 1 1 0.002
#> 5 1 0 1 0.000
#> 6 0 0 1 0.000
#> 7 1 0 0 0.000
#> 8 0 0 0 0.000
$FixedEffect
lip_cancer_output#> Est SD PIP
#> x1 0.428 0.128 0.914
$RandomEffect
lip_cancer_output#> Est SD PIP
#> r1 0.486 0.157 1.000
#> r2 0.000 0.052 0.011
```

`ref.ICAR`

provides an objective Bayesian approach for
modeling spatially correlated areal data using an intrinsic conditional
autoregressive prior on a vector of spatial random effects.

Xu, Shuangshuang, Ferreira, M. A. R., Porter, E. M., and Franck, C. T. (202X). Bayesian Model Selection for Generalized Linear Mixed Models, Biometrics, under review.