2 Introduction to estimands and estimation methods

2.1 Estimands

The ICH E9(R1) addendum on estimands and sensitivity analyses describes a systematic approach to ensure alignment among clinical trial objectives, trial execution/conduct, statistical analyses, and interpretation of results (ICH E9 working group (2019)). As per the addendum, an estimand is a precise description of the treatment effect reflecting the clinical question posed by the trial objective which summarizes at a population level what the outcomes would be in the same patients under different treatment conditions being compared. One important attribute of an estimand is a list of possible intercurrent events (ICEs), i.e. events occurring after treatment initiation that affect either the interpretation or the existence of the measurements associated with the clinical question of interest, and the definition of appropriate strategies to deal with ICEs. The three most relevant strategies for the purpose of this document are the hypothetical strategy, the treatment policy strategy, and the composite strategy. For the hypothetical strategy, a scenario is envisaged in which the ICE would not occur. Under this scenario, endpoint values after the ICE are not directly observable and are treated using models for missing data. For the treatment policy strategy, the treatment effect in the presence of the ICEs is targeted and analyses are based on the observed outcomes regardless of whether the subject had an ICE or not. For the composite strategy, the ICE itself is included as a component of the endpoint.

2.2 Alignment between the estimand and the estimation method

The ICH E9(R1) addendum distinguishes between ICEs and missing data (ICH E9 working group (2019)). Whereas ICEs such as treatment discontinuations reflect clinical practice, the amount of missing data can be minimized in the conduct of a clinical trial. However, there are many connections between missing data and ICEs. For example, it is often difficult to retain subjects in a clinical trial after treatment discontinuation and a subject’s dropout from the trial leads to missing data. As another example, outcome values after ICEs addressed using a hypothetical strategy are not directly observable under the hypothetical scenario. Consequently, any observed outcome values after such ICEs are typically discarded and treated as missing data.

The addendum proposes that estimation methods to address the problem presented by missing data should be selected to align with the estimand. A recent overview of methods to align the estimator with the estimand is Mallinckrodt et al. (2020). A short introduction on estimation methods for studies with longitudinal endpoints can also be found in Wolbers et al. (2022). One prominent statistical method for this purpose is multiple imputation (MI), which is the target of the rbmi package.

2.2.1 Missing data prior to ICEs

Missing data may occur in subjects without an ICE or prior to the occurrence of an ICE. As such missing outcomes are not associated with an ICE, it is often plausible to impute them under a missing-at-random (MAR) assumption using a standard MMRM imputation model of the longitudinal outcomes. Informally, MAR occurs if the missing data can be fully accounted for by the baseline variables included in the model and the observed longitudinal outcomes, and if the model is correctly specified.

2.2.2 Implementation of the hypothetical strategy

The MAR imputation model described above is often also a good starting point for imputing data after an ICE handled using a hypothetical strategy (Mallinckrodt et al. (2020)). Informally, this assumes that unobserved values after the ICE would have been similar to the observed data from subjects who did not have the ICE and remained under follow-up. However, in some situations, it may be more reasonable to assume that missingness is “informative” and indicates a systematically better or worse outcome than in observed subjects. In such situations, MNAR imputation with a \(\delta\)-adjustment could be explored as a sensitivity analysis. \(\delta\)-adjustments add a fixed or random quantity to the imputations in order to make the imputed outcomes systematically worse or better than those observed as described in Cro et al. (2020). In rbmi only fixed \(\delta\)-adjustments are implemented.

2.2.3 Implementation of the treatment policy strategy

Ideally, data collection continues after an ICE handled with a treatment policy strategy and no missing data arise. Indeed, such post-ICE data are increasingly collected systematically in RCTs. However, despite best efforts, missing data after an ICE such as study treatment discontinuation may still occur because the subject drops out from the study after discontinuation. It is difficult to give definite recommendations regarding the implementation of the treatment policy strategy in the presence of missing data at this stage because the optimal method is highly context dependent and a topic of ongoing statistical research.

For ICEs which are thought to have a negligible effect on efficacy outcomes, standard MAR-based imputation may be appropriate. In contrast, an ICE such as treatment discontinuation may be expected to have a more substantial impact on efficacy outcomes. In such settings, the MAR assumption may still be plausible after conditioning on the subject’s time-varying treatment status (Guizzaro et al. (2021)). In this case, one option is to impute missing post-discontinuation data based on subjects who also discontinued treatment but continued to be followed up (Polverejan and Dragalin (2020)). Another option which may require somewhat less post-discontinuation data is to include all subjects in the imputation procedure but to model post-discontinuation data by using time-varying treatment status indicators (e.g. time-varying indicators of treatment compliance, discontinuation, or initiation of rescue treatment) (Guizzaro et al. (2021)). In this approach, post-ICE outcomes are included in every step of the analysis, including in the fitting of the imputation model. It assumes that ICEs may impact post-ICE outcomes but that otherwise missingness is non-informative. The approach also assumes that the time-varying covariates do not contain missing values, that deviations in outcomes after the ICE are correctly modeled by these time-varying covariates, and that sufficient post-ICE data are available to inform the regression coefficients of the time-varying covariates. These proposals are relatively recent and there remain open questions regarding the appropriate trade-off between model complexity (e.g. should the model account for a potentially differential effect on post-ICE outcomes depending on the timing of the ICE?) and the variance in the resulting treatment effect estimate. More generally, it is not yet established how much post-discontinuation data is required to implement such methods robustly and without the risk of substantial inflation of variance.

In some trial settings, only a few subjects discontinue the randomized treatment. In other settings, treatment discontinuation rates are higher but it is difficult to retain subjects in the trial after treatment discontinuation, leading to sparse data collection after treatment discontinuation. In both settings, the amount of available data after treatment discontinuation may be insufficient to inform an imputation model which explicitly models post-discontinuation data. Depending on the disease area and the anticipated mechanism of action of the intervention, it may be plausible to assume that subjects in the intervention group behave similarly to subjects in the control group after the ICE treatment discontinuation. In this case, reference-based imputation methods are an option (Mallinckrodt et al. (2020)). Reference-based imputation methods formalize the idea of imputing missing data in the intervention group based on data from a control or reference group. For a general description and review of reference-based imputation methods, we refer to Carpenter, Roger, and Kenward (2013), Cro et al. (2020), I. White, Royes, and Best (2020) and Wolbers et al. (2022). For a technical description of the implemented statistical methodology for reference-based imputation, we refer to section 3 (in particular section 3.4).

2.2.4 Implementation of the composite strategy

The composite strategy is typically applied to binary or time-to-event outcomes but it can also be used for continuous outcomes by ascribing a suitably unfavorable value to patients who experience ICEs for which a composite strategy has been defined. One possibility to implement this is to use MI with a \(\delta\)-adjustment for post-ICE data as described in Darken et al. (2020).

3 Statistical methodology

3.1 Overview of the imputation procedure

Analyses of datasets with missing data always rely on missing data assumptions. The methods described here can be used to produce valid imputations under a MAR assumption or under reference-based imputation assumptions. MNAR imputation based on fixed \(\delta\)-adjustments, as typically used in sensitivity analyses such as tipping-point analyses, is also supported.

Three general imputation approaches are implemented in rbmi:

Conventional MI based on Bayesian (or approximate Bayesian) posterior draws from the imputation model combined with Rubin’s rules for inference as described in Carpenter, Roger, and Kenward (2013) and Cro et al. (2020).
Conditional mean imputation based on the REML estimate of the imputation model combined with resampling techniques (the jackknife or the bootstrap) for inference as described in Wolbers et al. (2022).
Bootstrapped MI methods based on REML estimates of the imputation model as described in von Hippel and Bartlett (2021).

3.1.1 Conventional MI

Conventional MI approaches include the following steps:

Base imputation model fitting step (Section 3.3)

Fit a Bayesian multivariate normal mixed model for repeated measures (MMRM) to the observed longitudinal outcomes after exclusion of data after ICEs for which reference-based missing data imputation is desired (Section 3.3.3). Draw \(M\) posterior samples of the estimated parameters (regression coefficients and covariance matrices) from this model.
Alternatively, \(M\) approximate draws from the posterior distribution can be obtained by repeatedly applying conventional restricted maximum-likelihood (REML) parameter estimation of the MMRM model to nonparametric bootstrap samples from the original dataset (Section 3.3.4).

Imputation step (Section 3.4)

Take a single sample \(m\) (\(m\in \{1,\ldots, M\}\)) from the posterior distribution of the imputation model parameters.
For each subject, use the sampled parameters and the defined imputation strategy to determine the mean and covariance matrix describing the subject’s marginal outcome distribution for all longitudinal outcome assessments (i.e. observed and missing outcomes).
For each subject, construct the conditional multivariate normal distribution of their missing outcomes given their observed outcomes (including observed outcomes after ICEs for which a reference-based assumption is desired).
For each subject, draw a single sample from this conditional distribution to impute their missing outcomes leading to a complete imputed dataset.
For sensitivity analyses, a pre-defined \(\delta\)-adjustment may be applied to the imputed data prior to the analysis step (Section 3.5).

Analysis step (Section 3.6)

Analyze the imputed dataset using an analysis model (e.g. ANCOVA) resulting in a point estimate and a standard error (with corresponding degrees of freedom) of the treatment effect.

Pooling step for inference (Section 3.7)

Repeat the imputation and analysis steps for each posterior sample \(m\), resulting in \(M\) complete datasets, \(M\) point estimates of the treatment effect, and \(M\) standard errors (with corresponding degrees of freedom). Pool the \(M\) treatment effect estimates, standard errors, and degrees of freedom using the rules by Barnard and Rubin to obtain the final pooled treatment effect estimator, standard error, and degrees of freedom.
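The pooling step can be illustrated with a minimal sketch of Rubin's rules combined with the Barnard and Rubin degrees-of-freedom adjustment. The function name `pool_rubin` and its arguments are hypothetical and chosen for illustration only; this is not the rbmi implementation, and it assumes a common complete-data degrees of freedom for all \(M\) analyses.

```python
import math


def pool_rubin(estimates, std_errors, df_complete=math.inf):
    """Pool M estimates with Rubin's rules (hypothetical helper, not rbmi code).

    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need M >= 2 estimates with matching standard errors")

    theta = sum(estimates) / m                       # pooled point estimate
    within = sum(se ** 2 for se in std_errors) / m   # mean within-imputation variance
    between = sum((e - theta) ** 2 for e in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between           # total variance

    if between == 0:
        # No between-imputation variability: fall back to complete-data df.
        return theta, math.sqrt(total), df_complete

    lam = (1 + 1 / m) * between / total              # share of variance due to missingness
    df_old = (m - 1) / lam ** 2                      # classical Rubin df
    if math.isinf(df_complete):
        df = df_old
    else:
        # Barnard-Rubin adjustment for finite complete-data df.
        df_obs = (df_complete + 1) / (df_complete + 3) * df_complete * (1 - lam)
        df = df_old * df_obs / (df_old + df_obs)
    return theta, math.sqrt(total), df


theta, se, df = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

With a finite `df_complete`, the adjusted degrees of freedom are always smaller than both the classical value and the complete-data value.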

3.1.2 Conditional mean imputation

The conditional mean imputation approach includes the following steps:

Base imputation model fitting step (Section 3.3)

Fit a conventional multivariate normal/MMRM model using restricted maximum likelihood (REML) to the observed longitudinal outcomes after exclusion of data after ICEs for which reference-based missing data imputation is desired (Section 3.3.2).

Imputation step (Section 3.4)

For each subject, use the fitted parameters from the base imputation model fitting step to construct the conditional distribution of missing outcomes given observed outcomes (including observed outcomes after ICEs for which reference-based missing data imputation is desired) as described above.
For each subject, impute their missing data deterministically by the mean of this conditional distribution leading to a complete imputed dataset.
For sensitivity analyses, a pre-defined \(\delta\)-adjustment may be applied to the imputed data prior to the analysis step (Section 3.5).

Analysis step (Section 3.6)

Apply an analysis model (e.g. ANCOVA) to the completed dataset resulting in a point estimate of the treatment effect.

Jackknife or bootstrap inference step (Section 3.8)

Inference for the treatment effect estimate from the analysis step is based on re-sampling techniques. Both the jackknife and the bootstrap are supported. Importantly, these methods require repeating all steps of the imputation procedure (i.e. the base imputation model fitting, imputation, and analysis steps) on each of the resampled datasets.
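As an illustration of the resampling step, the following sketch computes a leave-one-subject-out jackknife standard error. The names `jackknife_se` and `toy_pipeline` are hypothetical; `toy_pipeline` is a simple mean standing in for the full sequence of model fitting, conditional mean imputation, and analysis, which would have to be rerun on every leave-one-out dataset.

```python
import math


def jackknife_se(subjects, estimate_fn):
    """Jackknife standard error of estimate_fn (hypothetical helper).

    subjects:    list with one entry per subject.
    estimate_fn: function that runs the *entire* procedure on a list of
                 subjects and returns a single point estimate.
    """
    n = len(subjects)
    if n < 2:
        raise ValueError("the jackknife needs at least two subjects")
    # Re-run the whole pipeline with each subject left out in turn.
    loo = [estimate_fn(subjects[:i] + subjects[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((e - mean_loo) ** 2 for e in loo))


def toy_pipeline(subjects):
    """Stand-in for model fitting + imputation + analysis: a plain mean."""
    return sum(subjects) / len(subjects)


se = jackknife_se([1.0, 2.0, 3.0, 4.0, 5.0], toy_pipeline)
```

For a plain mean, the jackknife standard error coincides with the usual standard error of the mean, which gives a simple sanity check of the formula.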

3.1.3 Bootstrapped MI

The bootstrapped MI approach includes the following steps:

Base imputation model fitting step (Section 3.3)

Apply conventional restricted maximum-likelihood (REML) parameter estimation of the MMRM model to \(B\) nonparametric bootstrap samples from the original dataset using the observed longitudinal outcomes after exclusion of data after ICEs for which reference-based missing data imputation is desired.

Imputation step (Section 3.4)

Take a bootstrapped dataset \(b\) (\(b\in \{1,\ldots, B\}\)) and its corresponding imputation model parameter estimates.
For each subject (from the bootstrapped dataset), use the parameter estimates and the defined strategy for dealing with their ICEs to determine the mean and covariance matrix describing the subject’s marginal outcome distribution for all longitudinal outcome assessments (i.e. observed and missing outcomes).
For each subject (from the bootstrapped dataset), construct the conditional multivariate normal distribution of their missing outcomes given their observed outcomes (including observed outcomes after ICEs for which reference-based missing data imputation is desired).
For each subject (from the bootstrapped dataset), draw \(D\) samples from this conditional distribution to impute their missing outcomes leading to \(D\) complete imputed datasets for bootstrap sample \(b\).
For sensitivity analyses, a pre-defined \(\delta\)-adjustment may be applied to the imputed data prior to the analysis step (Section 3.5).

Analysis step (Section 3.6)

Analyze each of the \(B\times D\) imputed datasets using an analysis model (e.g. ANCOVA) resulting in \(B\times D\) point estimates of the treatment effect.

Pooling step for inference (Section 3.9)

Pool the \(B\times D\) treatment effect estimates as described in von Hippel and Bartlett (2021) to obtain the final pooled treatment effect estimate, standard error, and degrees of freedom.

3.2 Setting, notation, and missing data assumptions

Assume that the data are from a study with \(n\) subjects in total and that each subject \(i\) (\(i=1,\ldots,n\)) has \(J\) scheduled follow-up visits at which the outcome of interest is assessed. In most applications, the data will be from a randomized trial of an intervention vs a control group and the treatment effect of interest is a comparison in outcomes at a specific visit between these randomized groups. However, single-arm trials or multi-arm trials are in principle also supported by the rbmi implementation.

Denote the observed outcome vector of length \(J\) for subject \(i\) by \(Y_i\) (with missing assessments coded as NA (not available)) and its non-missing and missing components by \(Y_{i!}\) and \(Y_{i?}\), respectively. By default, imputation of missing outcomes in \(Y_{i}\) is performed under a MAR assumption in rbmi. Therefore, if missing data following an ICE are to be handled using MAR imputation, this is compatible with the default assumption. As discussed in Section 2, the MAR assumption is often a good starting point for implementing a hypothetical strategy. Note, however, that observed outcome data after an ICE handled using a hypothetical strategy is not compatible with this strategy. Therefore, we assume that all post-ICE data after ICEs handled using a hypothetical strategy are already set to NA in \(Y_i\) prior to calling any rbmi functions. However, any observed outcomes after ICEs handled using a treatment policy strategy should be included in \(Y_i\) as they are compatible with this strategy.

Subjects may also experience up to one ICE after which missing data imputation according to a reference-based imputation method is foreseen. For a subject \(i\) with such an ICE, denote their first visit which is affected by the ICE by \(\tilde{t}_i \in \{1,\ldots,J\}\). For all other subjects, set \(\tilde{t}_i=\infty\). A subject’s outcome vector after setting observed outcomes from visit \(\tilde{t}_i\) onwards to missing (i.e. NA) is denoted as \(Y'_i\) and the corresponding data vector after removal of NA elements as \(Y'_{i!}\).
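The construction of \(Y'_i\) and \(Y'_{i!}\) can be sketched as follows. This is an illustration only: the function names are hypothetical, `None` plays the role of NA, visits are numbered from 1, and `math.inf` encodes \(\tilde{t}_i=\infty\) for subjects without such an ICE.

```python
import math


def censor_from_ice(y, first_affected_visit):
    """Return Y'_i: a copy of y with all visits >= t~_i set to None.

    y:                    outcome list of length J, None = missing (NA).
    first_affected_visit: t~_i in {1, ..., J}, or math.inf if no ICE.
    """
    return [
        None if visit >= first_affected_visit else value
        for visit, value in enumerate(y, start=1)
    ]


def drop_missing(y):
    """Return the vector of non-missing elements (Y'_{i!} when applied to Y'_i)."""
    return [value for value in y if value is not None]


y_i = [10.0, None, 12.0, 13.0]          # visit 2 is intermittently missing
y_prime = censor_from_ice(y_i, 3)       # ICE first affects visit 3
y_prime_obs = drop_missing(y_prime)
y_no_ice = censor_from_ice(y_i, math.inf)
```

Note that the observed values at visits 3 and 4 are only masked in the copy; the original vector `y_i` is left unchanged so that it remains available for later steps.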

MNAR \(\delta\)-adjustments are added to the imputed datasets after the formal imputation steps. This is covered in a separate section (Section 3.5).

3.3 The base imputation model

3.3.1 Included data and model specification

The purpose of the imputation model is to estimate (covariate-dependent) mean trajectories and covariance matrices for each group in the absence of ICEs handled using reference-based imputation methods. Conventionally, publications on reference-based imputation methods have implicitly assumed that the corresponding post-ICE data is missing for all subjects (Carpenter, Roger, and Kenward (2013)). We also allow the situation where post-ICE data is available for some subjects but needs to be imputed using reference-based methods for others. However, any observed data after ICEs for which reference-based imputation methods are specified are not compatible with the imputation model described below; they are therefore removed and considered as missing for the purpose of estimating the imputation model, and for this purpose only. For example, if a patient has an ICE addressed with a reference-based method but outcomes after the ICE are collected, these post-ICE outcomes will be excluded when fitting the base imputation model (but they will be included again in the following steps). That is, the base imputation model is fitted to \(Y'_{i!}\) and not to \(Y_{i!}\). If we did not exclude these data, then the imputation model would mistakenly estimate mean trajectories based on a mixture of observed pre- and post-ICE data which are not relevant for reference-based imputations.

Observed post-ICE outcomes in the control or reference group are also excluded from the base imputation model if the user specifies a reference-based imputation strategy for such ICEs. This ensures that an ICE has the same impact on the data included in the imputation model regardless of whether the ICE occurred in the control or the intervention group. On the other hand, imputation in the reference group is based on a MAR assumption even for reference-based imputation methods and it may be preferable in some settings to include post-ICE data from the control group in the base imputation model. This can be implemented by specifying a MAR strategy for the ICE in the control group and a reference-based strategy for the same ICE in the intervention group.

The base imputation model of the longitudinal outcomes \(Y'_i\) assumes that the mean structure is a linear function of covariates. Full flexibility for the specification of the linear predictor of the model is supported. At a minimum the covariates should include the treatment group, the (categorical) visit, and treatment-by-visit interactions. Typically, other covariates including the baseline outcome are also included. External time-varying covariates (e.g. calendar time of the visit) as well as internal time-varying covariates (e.g. time-varying indicators of treatment discontinuation or initiation of rescue treatment) may in principle also be included if indicated (Guizzaro et al. (2021)). Missing covariate values are not allowed. This means that the values of time-varying covariates must be non-missing at every visit regardless of whether the outcome is measured or missing.

Denote the \(J\times p\) design matrix for subject \(i\) corresponding to the mean structure model by \(X_i\) and the same matrix after removal of rows corresponding to missing outcomes in \(Y'_{i}\) by \(X'_{i!}\). Here \(p\) is the number of parameters in the mean structure of the model for the elements of \(Y'_{i!}\). The base imputation model for the observed outcomes is defined as: \[ Y'_{i!} = X'_{i!}\beta + \epsilon_{i!} \mbox{ with } \epsilon_{i!}\sim N(0,\Sigma_{i!!})\] where \(\beta\) is the vector of regression coefficients and \(\Sigma_{i!!}\) is a covariance matrix which is obtained from the complete-data \(J\times J\)-covariance matrix \(\Sigma\) by omitting rows and columns corresponding to missing outcome assessments for subject \(i\).
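To make the subsetting concrete, the sketch below derives \(X'_{i!}\) and \(\Sigma_{i!!}\) from the complete \(X_i\) and \(\Sigma\) for a single subject. The helper names are hypothetical, matrices are plain nested lists, and `None` marks a missing outcome in \(Y'_i\).

```python
def observed_index(y_prime):
    """Zero-based positions of the non-missing outcomes in Y'_i."""
    return [j for j, value in enumerate(y_prime) if value is not None]


def subset_rows(x, idx):
    """Keep only the rows of the J x p design matrix listed in idx."""
    return [list(x[j]) for j in idx]


def subset_cov(sigma, idx):
    """Keep only the rows and columns of the J x J covariance matrix in idx."""
    return [[sigma[r][c] for c in idx] for r in idx]


# J = 3 visits, p = 2 columns (intercept and a visit-3 indicator).
y_prime = [5.0, None, 7.0]
x_i = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
sigma = [[4.0, 2.0, 1.0], [2.0, 3.0, 1.0], [1.0, 1.0, 2.0]]

idx = observed_index(y_prime)
x_obs = subset_rows(x_i, idx)        # X'_{i!}
sigma_obs = subset_cov(sigma, idx)   # Sigma_{i!!}
```

The same index set is applied to the design matrix rows and to both dimensions of the covariance matrix, so the subject's likelihood contribution only involves visits with an observed outcome.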

Typically, a common unstructured covariance matrix for all subjects is assumed for \(\Sigma\) but separate covariance matrices per treatment group are also supported. Indeed, the implementation also supports the specification of separate covariance matrices according to an arbitrarily defined categorical variable which groups the subjects into disjoint subsets. For example, this could be useful if different covariance matrices are suspected in different subject strata. Finally, for all imputation methods described below that do not rely on Bayesian model fitting through MCMC, there is further flexibility in the choice of the covariance structure, i.e. unstructured (default), heterogeneous Toeplitz, heterogeneous compound symmetry, and AR(1) covariance structures are supported.

3.3.2 Restricted maximum likelihood estimation (REML)

Frequentist parameter estimation for the base imputation model is based on REML. The use of REML as an improved alternative to maximum likelihood (ML) for covariance parameter estimation was originally proposed by Patterson and Thompson (1971). Since then, it has become the default method for parameter estimation in linear mixed effects models. rbmi allows the user to choose between ML and REML for estimating the model parameters, with REML being the default option.

3.3.3 Bayesian model fitting

The Bayesian imputation model is fitted with the R package rstan (Stan Development Team (2020)). rstan is the R interface of Stan. Stan is a powerful and flexible statistical software developed by a dedicated team and implements Bayesian inference with state-of-the-art MCMC sampling procedures. The multivariate normal model with missing data specified in section 3.3.1 can be considered a generalization of the models described in the Stan user’s guide (see Stan Development Team (2020, sec. 3.5)).

The same prior distributions as in the SAS implementation of the “five macros” are used (Roger (2021)), i.e. improper flat priors for the regression coefficients and a weakly informative inverse Wishart prior for the covariance matrix (or matrices). Specifically, let \(S \in \mathbb{R}^{J \times J}\) be a symmetric positive definite matrix and \(\nu \in (J-1, \infty)\). Then the symmetric positive definite matrix \(x \in \mathbb{R}^{J \times J}\) has density: \[ \text{InvWish}(x \vert \nu, S) = \frac{1}{2^{\nu J/2}} \frac{1}{\Gamma_J(\frac{\nu}{2})} \vert S \vert^{\nu/2} \vert x \vert ^{-(\nu + J + 1)/2} \text{exp}(-\frac{1}{2} \text{tr}(Sx^{-1})). \] For \(\nu > J+1\) the mean is given by: \[ E[x] = \frac{S}{\nu - J - 1}. \] We choose \(S\) equal to the estimated covariance matrix from the frequentist REML fit and \(\nu = J+2\), as this is the lowest integer degrees of freedom that guarantees a finite mean. Setting the degrees of freedom to such a low \(\nu\) ensures that the prior has little impact on the posterior. Moreover, this choice allows the parameter \(S\) to be interpreted as the mean of the prior distribution.
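The interpretation of \(S\) as the prior mean follows directly from substituting \(\nu = J+2\) into the formula for the mean: \[ E[x] = \frac{S}{\nu - J - 1} = \frac{S}{(J+2) - J - 1} = \frac{S}{1} = S. \] For example, with \(J=4\) visits the prior has \(\nu = 6\) degrees of freedom and its mean equals the REML estimate of the \(4\times 4\) covariance matrix.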

As in the “five macros”, the MCMC algorithm is initialized at the parameters from a frequentist REML fit (see section 3.3.2). As described above, we are using only weakly informative priors for the parameters. Therefore, the Markov chain is essentially starting from the targeted stationary posterior distribution and only a minimal amount of burn-in of the chain is required.

3.3.4 Approximate Bayesian posterior draws via the bootstrap

Several authors have suggested that a more stable way to get Bayesian posterior draws from the imputation model is to bootstrap the incomplete data and to calculate REML estimates for each bootstrap sample (Little and Rubin (2002), Efron (1994), Honaker and King (2010), von Hippel and Bartlett (2021)). This method is proper in that the REML estimates from the bootstrap samples are asymptotically equivalent to a sample from the posterior distribution and may provide additional robustness to model misspecification (Little and Rubin (2002, sec. 10.2.3, part 6), Honaker and King (2010)). In order to retain balance between treatment groups and stratification factors across bootstrap samples, the user is able to provide stratification variables for the bootstrap in the rbmi implementation.
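A stratified nonparametric bootstrap of subjects can be sketched as below. This is a minimal illustration and not the rbmi code: the function name and data layout are hypothetical, and it assumes that stratification means resampling subjects with replacement separately within each stratum, so that the number of subjects per stratum is identical in every bootstrap sample.

```python
import random
from collections import defaultdict


def stratified_bootstrap_ids(strata_by_subject, rng):
    """Resample subject ids with replacement within each stratum.

    strata_by_subject: dict mapping subject id -> stratum label
                       (e.g. the treatment group).
    rng:               a random.Random instance, for reproducibility.
    """
    ids_by_stratum = defaultdict(list)
    for subject_id, stratum in strata_by_subject.items():
        ids_by_stratum[stratum].append(subject_id)

    sample = []
    for stratum in sorted(ids_by_stratum):
        ids = ids_by_stratum[stratum]
        # Draw as many subjects as the stratum originally contained.
        sample.extend(rng.choices(ids, k=len(ids)))
    return sample


strata = {"s1": "control", "s2": "control", "s3": "control",
          "s4": "intervention", "s5": "intervention"}
boot_ids = stratified_bootstrap_ids(strata, random.Random(1))
```

Each resampled id would then contribute the subject's complete longitudinal record to the bootstrap dataset on which the REML fit is repeated.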

3.4 Imputation step

3.4.1 Marginal imputation distribution for a subject - MAR case

For each subject \(i\), the marginal distribution of the complete \(J\)-dimensional outcome vector from all assessment visits according to the imputation model is a multivariate normal distribution. Its mean \(\tilde{\mu}_i\) is given by the predicted mean from the imputation model conditional on the subject’s baseline characteristics, group, and, optionally, time-varying covariates. Its covariance matrix \(\tilde{\Sigma}_i\) is given by the overall estimated covariance matrix or, if different covariance matrices are assumed for different groups, the covariance matrix corresponding to subject \(i\)’s group.

3.4.2 Marginal imputation distribution for a subject - reference-based imputation methods

For each subject \(i\), we calculate the mean and covariance matrix of the complete \(J\)-dimensional outcome vector from all assessment visits as for the MAR case and denote them by \(\mu_i\) and \(\Sigma_i\). For reference-based imputation methods, a corresponding reference group is also required for each group. Typically, the reference group for the intervention group will be the control group. The reference mean \(\mu_{ref,i}\) is defined as the predicted mean from the imputation model conditional on the reference group (rather than the actual group subject \(i\) belongs to) and the subject’s baseline characteristics. The reference covariance matrix \(\Sigma_{ref,i}\) is the overall estimated covariance matrix or, if different covariance matrices are assumed for different groups, the estimated covariance matrix corresponding to the reference group. In principle, time-varying covariates could also be included in reference-based imputation methods. However, this is only sensible for external time-varying covariates (e.g. calendar time of the visit) and not for internal time-varying covariates (e.g. treatment discontinuation) because the latter likely depend on the actual treatment group and it is typically not sensible to assume the same trajectory of the time-varying covariate for the reference group.

Based on these means and covariance matrices, the subject’s marginal imputation distribution for the reference-based imputation methods is then calculated as detailed in Carpenter, Roger, and Kenward (2013, sec. 4.3). Denote the mean and covariance matrix of this marginal imputation distribution by \(\tilde{\mu}_i\) and \(\tilde{\Sigma}_i\). Recall that the subject’s first visit which is affected by the ICE is denoted by \(\tilde{t}_i \in \{1,\ldots,J\}\) (and visit \(\tilde{t}_i-1\) is the last visit unaffected by the ICE). The marginal distribution for the patient \(i\) is then built according to the specific assumption for the data up to and post the ICE as follows:

Jump to reference (JR): the patient’s outcome distribution is normally distributed with the following mean: \[\tilde{\mu}_i = (\mu_i[1], \dots, \mu_i[\tilde{t}_i-1], \mu_{ref,i}[\tilde{t}_i], \dots, \mu_{ref,i}[J])^T.\] The covariance matrix is constructed as follows. First, we partition the covariance matrices \(\Sigma_i\) and \(\Sigma_{ref,i}\) in blocks according to the time of the ICE \(\tilde{t}_i\): \[ \Sigma_{i} = \begin{bmatrix} \Sigma_{i, 11} & \Sigma_{i, 12} \\ \Sigma_{i, 21} & \Sigma_{i,22} \\ \end{bmatrix} \] \[ \Sigma_{ref,i} = \begin{bmatrix} \Sigma_{ref, i, 11} & \Sigma_{ref, i, 12} \\ \Sigma_{ref, i, 21} & \Sigma_{ref, i,22} \\ \end{bmatrix}. \] We want the covariance matrix \(\tilde{\Sigma}_i\) to match \(\Sigma_i\) for the pre-deviation measurements, and \(\Sigma_{ref,i}\) for the conditional components for the post-deviation given the pre-deviation measurements. The solution is derived in Carpenter, Roger, and Kenward (2013, sec. 4.3) and is given by: \[ \begin{matrix} \tilde{\Sigma}_{i,11} = \Sigma_{i, 11} \\ \tilde{\Sigma}_{i, 21} = \Sigma_{ref,i, 21} \Sigma^{-1}_{ref,i, 11} \Sigma_{i, 11} \\ \tilde{\Sigma}_{i, 22} = \Sigma_{ref, i, 22} - \Sigma_{ref,i, 21} \Sigma^{-1}_{ref,i, 11} (\Sigma_{ref,i, 11} - \Sigma_{i,11}) \Sigma^{-1}_{ref,i, 11} \Sigma_{ref,i, 12}. \end{matrix} \]
Copy increments in reference (CIR): the patient’s outcome distribution is normally distributed with the following mean: \[ \begin{split} \tilde{\mu}_i =& (\mu_i[1], \dots, \mu_i[\tilde{t}_i-1], \mu_i[\tilde{t}_i-1] + (\mu_{ref,i}[\tilde{t}_i] - \mu_{ref,i}[\tilde{t}_i-1]), \dots,\\ & \mu_i[\tilde{t}_i-1]+(\mu_{ref,i}[J] - \mu_{ref,i}[\tilde{t}_i-1]))^T. \end{split} \] The covariance matrix is derived as for the JR method.
Copy reference (CR): the patient’s outcome distribution is normally distributed with mean and covariance matrix taken from the reference group: \[ \tilde{\mu}_i = \mu_{ref,i} \] \[ \tilde{\Sigma}_i = \Sigma_{ref,i}. \]
Last mean carried forward (LMCF): the patient’s outcome distribution is normally distributed with the following mean: \[ \tilde{\mu}_i = (\mu_i[1], \dots, \mu_i[\tilde{t}_i-1], \mu_i[\tilde{t}_i-1], \dots, \mu_i[\tilde{t}_i-1])^T\] and covariance matrix: \[ \tilde{\Sigma}_i = \Sigma_i.\]
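The four mean vectors above can be compared side by side with a small sketch. The function `reference_based_mean` is a hypothetical helper written for illustration (it is not part of rbmi), it covers the means only and not the covariance matrices, and it assumes at least one pre-ICE visit for CIR and LMCF because \(\mu_i[\tilde{t}_i-1]\) is needed.

```python
def reference_based_mean(mu, mu_ref, t_ice, method):
    """Mean of the marginal imputation distribution for one subject.

    mu, mu_ref: lists of length J with the subject's own-group and
                reference-group predicted means.
    t_ice:      first visit affected by the ICE (1-based, in 1..J).
    method:     one of "JR", "CIR", "CR", "LMCF".
    """
    n_visits = len(mu)
    if len(mu_ref) != n_visits or not 1 <= t_ice <= n_visits:
        raise ValueError("inconsistent inputs")
    if method == "CR":
        return list(mu_ref)

    pre = list(mu[: t_ice - 1])                  # visits 1, ..., t_ice - 1
    if method == "JR":
        return pre + list(mu_ref[t_ice - 1:])

    if t_ice < 2:
        raise ValueError("this sketch needs a pre-ICE visit for CIR and LMCF")
    last = mu[t_ice - 2]                         # mu_i[t_ice - 1] in 1-based notation
    if method == "CIR":
        ref_last = mu_ref[t_ice - 2]
        return pre + [last + (mu_ref[j] - ref_last) for j in range(t_ice - 1, n_visits)]
    if method == "LMCF":
        return pre + [last] * (n_visits - t_ice + 1)
    raise ValueError("unknown method: " + method)


mu = [1.0, 2.0, 3.0, 4.0]
mu_ref = [1.0, 1.5, 2.0, 2.5]
means = {m: reference_based_mean(mu, mu_ref, 3, m) for m in ("JR", "CIR", "CR", "LMCF")}
```

In this toy example the ICE first affects visit 3, so all methods except CR keep the subject's own-group means at visits 1 and 2 and differ only in how visits 3 and 4 are filled in.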

3.4.3 Imputation of missing outcome data

The joint marginal multivariate normal imputation distribution of subject \(i\)’s observed and missing outcome data has mean \(\tilde{\mu}_i\) and covariance matrix \(\tilde{\Sigma}_i\) as defined above. The actual imputation of the missing outcome data is obtained by conditioning this marginal distribution on the subject’s observed outcome data. Of note, this approach is valid regardless of whether the subject has intermittent or terminal missing data.

The conditional distribution used for the imputation is again a multivariate normal distribution and explicit formulas for the conditional mean and covariance are readily available. For completeness, we report them here with the notation and terminology of our setting. The marginal distribution for the outcome of patient \(i\) is \(Y_i \sim N(\tilde{\mu}_i, \tilde{\Sigma}_i)\) and the outcome \(Y_i\) can be decomposed into the observed (\(Y_{i,!}\)) and the unobserved (\(Y_{i,?}\)) components. Analogously, the mean \(\tilde{\mu}_i\) can be decomposed as \((\tilde{\mu}_{i,!},\tilde{\mu}_{i,?})\) and the covariance \(\tilde{\Sigma}_i\) as: \[ \tilde{\Sigma}_i = \begin{bmatrix} \tilde{\Sigma}_{i, !!} & \tilde{\Sigma}_{i,!?} \\ \tilde{\Sigma}_{i, ?!} & \tilde{\Sigma}_{i, ??} \end{bmatrix}. \] The conditional distribution of \(Y_{i,?}\) conditional on \(Y_{i,!}\) is then a multivariate normal distribution with expectation \[ E(Y_{i,?} \vert Y_{i,!})= \tilde{\mu}_{i,?} + \tilde{\Sigma}_{i, ?!} \tilde{\Sigma}_{i,!!}^{-1} (Y_{i,!} - \tilde{\mu}_{i,!}) \] and covariance matrix \[ Cov(Y_{i,?} \vert Y_{i,!}) = \tilde{\Sigma}_{i,??} - \tilde{\Sigma}_{i,?!} \tilde{\Sigma}_{i,!!}^{-1} \tilde{\Sigma}_{i,!?}. \]

Conventional random imputation consists in sampling from this conditional multivariate normal distribution. Conditional mean imputation imputes missing values with the deterministic conditional expectation \(E(Y_{i,?} \vert Y_{i,!})\).
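The conditional expectation used for conditional mean imputation can be illustrated with a small, dependency-free Python sketch. The helper names `solve` and `conditional_mean`, the use of `None` to mark missing visits, and the numerical values are assumptions made for illustration and do not reflect the rbmi implementation.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the row with the largest entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def conditional_mean(mu, sigma, y):
    """Return {visit index: E(Y_? | Y_!)} for Y ~ N(mu, sigma).

    Entries of y equal to None mark missing visits (intermittent or terminal).
    """
    obs = [k for k, v in enumerate(y) if v is not None]
    mis = [k for k, v in enumerate(y) if v is None]
    if not obs:
        # Nothing to condition on: the conditional mean is the marginal mean.
        return {k: mu[k] for k in mis}
    sigma_oo = [[sigma[r][c] for c in obs] for r in obs]
    resid = [y[k] - mu[k] for k in obs]
    w = solve(sigma_oo, resid)  # Sigma_!!^{-1} (Y_! - mu_!)
    return {k: mu[k] + sum(sigma[k][c] * w[j] for j, c in enumerate(obs))
            for k in mis}


mu = [10.0, 12.0, 14.0]
sigma = [[4.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 4.0]]
imputed = conditional_mean(mu, sigma, [11.0, 15.0, None])
```

In this example the third visit is missing and its conditional mean works out to 15.5 (up to floating-point rounding). Random imputation would instead draw from the conditional normal distribution with this mean and the conditional covariance given above.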

3.5 \(\delta\)-adjustment

A marginal \(\delta\)-adjustment approach similar to the “five macros” in SAS is implemented (Roger (2021)), i.e. fixed non-stochastic values are added after the multivariate normal imputation step and prior to the analysis. This is relevant for sensitivity analyses in which imputed data are made systematically worse or better than observed data. In addition, some authors have suggested \(\delta\)-type adjustments to implement a composite strategy for continuous outcomes (Darken et al. (2020)).

The implementation provides full flexibility regarding the specification of the \(\delta\)-adjustment, i.e. the value that is added may depend on the randomized treatment group, the timing of the subject’s ICE, and other factors. For suggestions and case studies regarding this topic, we refer to Cro et al. (2020).
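The marginal \(\delta\)-adjustment can be sketched as follows for a single subject. The function `apply_delta`, the imputation flags, and the \(\delta\) values are hypothetical and chosen for illustration only; this is not the rbmi interface.

```python
def apply_delta(values, is_imputed, delta):
    """Add delta[k] to the outcome at visit k, but only where it was imputed.

    values     : completed outcome vector of one subject (observed and imputed)
    is_imputed : flags marking which visits were imputed
    delta      : fixed, non-stochastic adjustment per visit
    """
    return [v + d if imp else v for v, imp, d in zip(values, is_imputed, delta)]


# Assumed example: visits 3 and 4 were imputed after an ICE, and the
# adjustment grows by 1.0 per visit since the ICE.
adjusted = apply_delta(
    values=[11.0, 15.0, 15.5, 16.0],
    is_imputed=[False, False, True, True],
    delta=[0.0, 0.0, 1.0, 2.0],
)
```

Observed values are left untouched, so the adjustment only shifts imputed data before the analysis step.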

3.6 Analysis step

After data imputation, a standard analysis model can be applied to the completed data resulting in a treatment effect estimate. As the imputed data no longer contains missing values, the analysis model is often simple. For example, it can be an analysis of covariance (ANCOVA) model with the outcome (or the change in the outcome from baseline) at a specific visit j as the dependent variable, the randomized treatment group as the primary covariate and, typically, adjustment for the same baseline covariates as for the imputation model.

3.7 Pooling step for inference of (approximate) Bayesian MI and Rubin’s rules

Assume that the analysis model has been applied to \(M\) multiple imputed random datasets which resulted in \(M\) treatment effect estimates \(\hat{\theta}_m\) (\(m=1,\ldots,M\)) with corresponding standard errors \(SE_m\) and (if available) degrees of freedom \(\nu_{com}\). If degrees of freedom are not available for an analysis model, set \(\nu_{com}=\infty\) for inference based on the normal distribution.

Rubin’s rules are used for pooling the treatment effect estimates and corresponding variance estimates from the analysis steps across the \(M\) multiple imputed datasets. According to Rubin’s rules, the final estimate of the treatment effect is calculated as the sample mean over the \(M\) treatment effect estimates: \[ \hat{\theta} = \frac{1}{M} \sum_{m = 1}^M \hat{\theta}_m. \] The pooled variance is based on two components that reflect the within and the between variance of the treatment effects across the multiple imputed datasets: \[ V(\hat{\theta}) = V_W(\hat{\theta}) + (1 + \frac{1}{M}) V_B(\hat{\theta}) \] where \(V_W(\hat{\theta}) = \frac{1}{M}\sum_{m = 1}^M SE^2_m\) is the within-variance and \(V_B(\hat{\theta}) = \frac{1}{M-1} \sum_{m = 1}^M (\hat{\theta}_m - \hat{\theta})^2\) is the between-variance.

Confidence intervals and tests of the null hypothesis \(H_0: \theta=\theta_0\) are based on the \(t\)-statistic \(T\):

\[ T= (\hat{\theta}-\theta_0)/\sqrt{V(\hat{\theta})}. \] Under the null hypothesis, \(T\) has an approximate \(t\)-distribution with \(\nu\) degrees of freedom. \(\nu\) is calculated according to the Barnard and Rubin approximation, see Barnard and Rubin (1999) (formula 3) or Little and Rubin (2002) (formula (5.24), page 87):

\[ \nu = \frac{\nu_{old}* \nu_{obs}}{\nu_{old} + \nu_{obs}} \] with \[ \nu_{old} = \frac{M-1}{\lambda^2} \quad\mbox{and}\quad \nu_{obs} = \frac{\nu_{com} + 1}{\nu_{com} + 3} \nu_{com} (1 - \lambda) \] where \(\lambda = \frac{(1 + \frac{1}{M})V_B(\hat{\theta})}{V(\hat{\theta})}\) is the fraction of missing information.
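The pooling formulas of this section can be collected in a short Python sketch. The function name `pool_rubin` and the example inputs are assumptions for illustration. For \(\nu_{com}=\infty\), the sketch uses the limit \(\nu = \nu_{old}\) of the Barnard and Rubin formula.

```python
import math


def pool_rubin(estimates, std_errors, nu_com=math.inf):
    """Pool M estimates with Rubin's rules and Barnard-Rubin degrees of freedom.

    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    theta = sum(estimates) / m
    v_within = sum(se ** 2 for se in std_errors) / m
    v_between = sum((e - theta) ** 2 for e in estimates) / (m - 1)
    v_total = v_within + (1 + 1 / m) * v_between
    lam = (1 + 1 / m) * v_between / v_total  # fraction of missing information
    nu_old = (m - 1) / lam ** 2 if lam > 0 else math.inf
    if math.isinf(nu_com):
        # Limit of the formula for nu_com -> infinity.
        return theta, math.sqrt(v_total), nu_old
    nu_obs = (nu_com + 1) / (nu_com + 3) * nu_com * (1 - lam)
    if math.isinf(nu_old):
        return theta, math.sqrt(v_total), nu_obs
    nu = nu_old * nu_obs / (nu_old + nu_obs)
    return theta, math.sqrt(v_total), nu


theta_hat, se_pooled, nu = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

For these inputs, \(V_W = 1\), \(V_B = 1\), \(V(\hat{\theta}) = 7/3\), \(\lambda = 4/7\) and \(\nu = 6.125\). Passing a finite `nu_com` yields smaller degrees of freedom.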

3.8 Bootstrap and jackknife inference for conditional mean imputation

3.8.1 Point estimate of the treatment effect

The point estimator is obtained by applying the analysis model (Section 3.6) to a single conditional mean imputation of the missing data (see Section 3.4.3) based on the REML estimator of the parameters of the imputation model (see Section 3.3.2). We denote this treatment effect estimator by \(\hat{\theta}\).

As demonstrated in Wolbers et al. (2022) (Section 2.4), this treatment effect estimator is valid if the analysis model is an ANCOVA model or, more generally, if the treatment effect estimator is a linear function of the imputed outcome vector. Indeed, if this is the case, then the estimator is identical to the pooled treatment effect across multiple random REML imputations with an infinite number of imputations and corresponds to a computationally efficient implementation of a proposal by von Hippel and Bartlett (2021). We expect that the conditional mean imputation method is also applicable to some other analysis models (e.g. for general MMRM analysis models) but this has not been formally justified.

3.8.2 Jackknife standard errors, confidence intervals (CI) and tests for the treatment effect

For a dataset containing \(n\) subjects, the jackknife standard error depends on treatment effect estimates \(\hat{\theta}_{(-b)}\) (\(b=1,\ldots,n\)) from samples of the original dataset which leave out the observations from subject \(b\). As described previously, to obtain treatment effect estimates for leave-one-subject-out datasets, all steps of the imputation procedure (i.e. fitting the imputation model, conditional mean imputation, and the analysis step) need to be repeated on this new dataset.

Then, the jackknife standard error is defined as \[\hat{se}_{jack}=[\frac{(n-1)}{n}\cdot\sum_{b=1}^{n} (\hat{\theta}_{(-b)}-\bar{\theta}_{(.)})^2]^{1/2}\] where \(\bar{\theta}_{(.)}\) denotes the mean of all jackknife estimates (Efron and Tibshirani (1994), chapter 10). The corresponding two-sided normal approximation \(1-\alpha\) CI is defined as \(\hat{\theta}\pm z^{1-\alpha/2}\cdot \hat{se}_{jack}\) where \(\hat{\theta}\) is the treatment effect estimate from the original dataset. Tests of the null hypothesis \(H_0: \theta=\theta_0\) are then based on the \(Z\)-score \(Z=(\hat{\theta}-\theta_0)/\hat{se}_{jack}\) using a standard normal approximation.
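The jackknife formulas above can be sketched in Python as follows. The function `jackknife_inference` is hypothetical, and the sample mean stands in for the treatment effect estimator purely for illustration; in the setting of this document, each leave-one-out estimate would require repeating the full imputation procedure.

```python
import math
from statistics import NormalDist


def jackknife_inference(theta_hat, theta_loo, theta0=0.0, alpha=0.05):
    """Jackknife standard error, normal-approximation CI and two-sided p-value.

    theta_hat : estimate from the original dataset
    theta_loo : the n leave-one-subject-out estimates
    """
    n = len(theta_loo)
    mean_loo = sum(theta_loo) / n
    se = math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in theta_loo))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (theta_hat - z_crit * se, theta_hat + z_crit * se)
    z = (theta_hat - theta0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, ci, p_value


# Stand-in estimator: the sample mean of five observations.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
estimate = sum(data) / len(data)
leave_one_out = [(sum(data) - x) / (len(data) - 1) for x in data]
se, ci, p_value = jackknife_inference(estimate, leave_one_out)
```

For the sample mean, the jackknife standard error coincides with the usual standard error of the mean, which gives a simple check of the formula: here both equal \(\sqrt{0.5}\).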

A simulation study reported in Wolbers et al. (2022) demonstrated exact protection of the type I error for jackknife-based inference with a relatively low sample size (n = 100 per group) and a substantial amount of missing data (>25% of subjects with an ICE).

3.8.3 Bootstrap standard errors, confidence intervals (CI) and tests for the treatment effect

As an alternative to the jackknife, the bootstrap has also been implemented in rbmi (Efron and Tibshirani (1994), Davison and Hinkley (1997)).

Two different bootstrap methods are implemented in rbmi: methods based on the bootstrap standard error and the normal approximation, and percentile bootstrap methods. Denote the treatment effect estimates from \(B\) bootstrap samples by \(\hat{\theta}^*_b\) (\(b=1,\ldots,B\)). The bootstrap standard error \(\hat{se}_{boot}\) is defined as the empirical standard deviation of the bootstrapped treatment effect estimates. Confidence intervals and tests based on the bootstrap standard error can then be constructed in the same way as for the jackknife. Confidence intervals using the percentile bootstrap are based on empirical quantiles of the bootstrap distribution and corresponding statistical tests are implemented in rbmi via inversion of the confidence interval. Explicit formulas for bootstrap inference as implemented in the rbmi package and some considerations regarding the required number of bootstrap samples are included in the Appendix of Wolbers et al. (2022).
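Both bootstrap variants can be illustrated with a minimal Python sketch. The function `bootstrap_inference`, the plain resampling of observations, and the simple order-statistic rule for the percentile limits are assumptions made for illustration; the exact formulas used by rbmi are those given in the Appendix of Wolbers et al. (2022) and may differ.

```python
import random
import statistics


def bootstrap_inference(data, estimator, n_boot=2000, alpha=0.05, seed=2022):
    """Return the bootstrap standard error and a percentile confidence interval."""
    rng = random.Random(seed)
    n = len(data)
    # Re-estimate on n_boot resamples drawn with replacement, then sort.
    boot = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    se_boot = statistics.stdev(boot)  # empirical SD of the bootstrap estimates
    # Percentile CI from empirical quantiles (simple order-statistic rule).
    lower = boot[int(alpha / 2 * (n_boot - 1))]
    upper = boot[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return se_boot, (lower, upper)


# Stand-in estimator: the sample mean of twenty observations.
data = [float(x) for x in range(1, 21)]
se_boot, percentile_ci = bootstrap_inference(data, statistics.mean)
```

A normal-approximation interval would be obtained from `se_boot` in the same way as for the jackknife, whereas `percentile_ci` uses the bootstrap distribution directly.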

A simulation study reported in Wolbers et al. (2022) demonstrated a small inflation of the type I error rate for inference based on the bootstrap standard error (up to \(5.3\%\) for a nominal type I error rate of \(5\%\)) for a sample size of n = 100 per group and a substantial amount of missing data (>25% of subjects with an ICE). Based on these simulations, we recommend the jackknife over the bootstrap for inference because it performed better in our simulation study and is typically much faster to compute than the bootstrap.

3.9 Pooling step for inference of the bootstrapped MI methods

Assume that the analysis model has been applied to \(B\times D\) multiple imputed random datasets which resulted in \(B\times D\) treatment effect estimates \(\hat{\theta}_{bd}\) (\(b=1,\ldots,B\); \(d=1,\ldots,D\)).

The final estimate of the treatment effect is calculated as the sample mean over the \(B\times D\) treatment effect estimates: \[ \hat{\theta} = \frac{1}{BD} \sum_{b = 1}^B \sum_{d = 1}^D \hat{\theta}_{bd}. \] The pooled variance is based on two components that reflect the variability within and between imputed bootstrap samples (von Hippel and Bartlett (2021), formula 8.4): \[ V(\hat{\theta}) = (1 + \frac{1}{B})\frac{MSB - MSW}{D} + \frac{MSW}{BD} \]

where \(MSB\) is the mean square between the bootstrapped datasets, and \(MSW\) is the mean square within the bootstrapped datasets and between the imputed datasets:

\[ \begin{align*} MSB &= \frac{D}{B-1} \sum_{b = 1}^B (\bar{\theta}_{b} - \hat{\theta})^2 \\ MSW &= \frac{1}{B(D-1)} \sum_{b = 1}^B \sum_{d = 1}^D (\hat{\theta}_{bd} - \bar{\theta}_{b})^2 \end{align*} \] where \(\bar{\theta}_{b}\) is the mean across the \(D\) estimates \(\hat{\theta}_{bd}\) obtained from random imputation of the \(b\)-th bootstrap sample.

The degrees of freedom are estimated with the following formula (von Hippel and Bartlett (2021), formula 8.6):

\[ \nu = \frac{(MSB\cdot (B+1) - MSW\cdot B)^2}{\frac{MSB^2\cdot (B+1)^2}{B-1} + \frac{MSW^2\cdot B}{D-1}} \]

Confidence intervals and tests of the null hypothesis \(H_0: \theta=\theta_0\) are based on the \(t\)-statistic \(T\):

\[ T= (\hat{\theta}-\theta_0)/\sqrt{V(\hat{\theta})}. \] Under the null hypothesis, \(T\) has an approximate \(t\)-distribution with \(\nu\) degrees of freedom.
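The pooling formulas for bootstrapped MI can be written out in a few lines of Python. The function name `pool_bootstrapped_mi` and the toy input are assumptions for illustration. The sketch does not guard against \(MSB < MSW\), which would need separate handling.

```python
def pool_bootstrapped_mi(theta):
    """Pool B x D estimates, where theta[b][d] is imputation d of bootstrap sample b.

    Returns (pooled estimate, pooled variance, degrees of freedom).
    """
    n_b = len(theta)     # B bootstrap samples
    n_d = len(theta[0])  # D imputations per bootstrap sample
    means_b = [sum(row) / n_d for row in theta]
    theta_hat = sum(means_b) / n_b
    # Mean square between bootstrap samples.
    msb = n_d / (n_b - 1) * sum((mb - theta_hat) ** 2 for mb in means_b)
    # Mean square within bootstrap samples, between imputations.
    msw = sum(
        (t - mb) ** 2 for row, mb in zip(theta, means_b) for t in row
    ) / (n_b * (n_d - 1))
    variance = (1 + 1 / n_b) * (msb - msw) / n_d + msw / (n_b * n_d)
    nu = (msb * (n_b + 1) - msw * n_b) ** 2 / (
        msb ** 2 * (n_b + 1) ** 2 / (n_b - 1) + msw ** 2 * n_b / (n_d - 1)
    )
    return theta_hat, variance, nu


# Toy input with B = 3 bootstrap samples and D = 2 imputations each.
theta_hat, variance, nu = pool_bootstrapped_mi(
    [[1.0, 1.2], [2.0, 2.2], [3.0, 3.2]]
)
```

For this toy input, \(MSB = 2\) and \(MSW = 0.02\), so the pooled variance is approximately 1.3233. The resulting \(\nu\) would be used as the degrees of freedom of the \(t\)-distribution for \(T\).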

3.10 Comparison between the implemented approaches

3.10.1 Treatment effect estimation

All approaches provide consistent treatment effect estimates for standard and reference-based imputation methods in case the analysis model of the completed datasets is a general linear model such as ANCOVA. Methods other than conditional mean imputation should also be valid for other analysis models. The validity of conditional mean imputation has only been formally demonstrated for analyses using the general linear model (Wolbers et al. (2022, sec. 2.4)) though it may also be applicable more widely (e.g. for general MMRM analysis models).

Treatment effects based on conditional mean imputation are deterministic. All other methods are affected by Monte Carlo sampling error and the precision of estimates depends on the number of imputations or bootstrap samples, respectively.

3.10.2 Standard errors of the treatment effect

All approaches provide frequentist consistent estimates of the standard error for imputation under a MAR assumption. For reference-based imputation methods, methods based on conditional mean imputation or bootstrapped MI provide frequentist consistent estimates of the standard error whereas Rubin’s rules applied to conventional MI methods provide so-called information-anchored inference (Bartlett (2021), Cro, Carpenter, and Kenward (2019), von Hippel and Bartlett (2021), Wolbers et al. (2022)). Frequentist consistent estimates of the standard error lead to confidence intervals and tests which have (asymptotically) correct coverage and type I error control under the assumption that the reference-based assumption reflects the true data-generating mechanism. For finite samples, simulations for a sample size of \(n=100\) per group reported in Wolbers et al. (2022) demonstrated that conditional mean imputation combined with the jackknife provided exact protection of the type I error rate whereas the bootstrap was associated with a small type I error inflation (between 5.1% and 5.3% for a nominal level of 5%).

It is well known that Rubin’s rules do not provide frequentist consistent estimates of the standard error for reference-based imputation methods (Seaman, White, and Leacy (2014), Liu and Pang (2016), Tang (2017), Cro, Carpenter, and Kenward (2019), Bartlett (2021)). Standard errors from Rubin’s rules are typically larger than frequentist standard error estimates leading to conservative inference and a corresponding loss of statistical power, see e.g. the simulations reported in Wolbers et al. (2022). Intuitively, this occurs because reference-based imputation methods borrow information from the reference group for imputations in the intervention group leading to a reduction in the frequentist variance of the resulting treatment effect contrast which is not captured by Rubin’s variance estimator. Formally, this occurs because the imputation and analysis models are uncongenial for reference-based imputation methods (Meng (1994), Bartlett (2021)). Cro, Carpenter, and Kenward (2019) argued that Rubin’s rules are nevertheless valid for reference-based imputation methods because they are approximately information-anchored, i.e. the proportion of information lost due to missing data under MAR is approximately preserved in reference-based analyses. In contrast, frequentist standard errors for reference-based imputation are not information-anchored, and standard errors under reference-based assumptions are typically smaller than those for MAR imputation.

Information anchoring is a sensible concept for sensitivity analyses, whereas for a primary analysis, it may be more important to adhere to the principles of frequentist inference. Analyses of data with missing observations generally rely on unverifiable missing data assumptions and the assumptions for reference-based imputation methods are relatively strong. Therefore, these assumptions need to be clinically justified as appropriate or at least conservative for the considered disease area and the anticipated mechanism of action of the intervention.

Conditional mean imputation combined with the jackknife is the only method which leads to deterministic standard error estimates and, consequently, confidence intervals and \(p\)-values are also deterministic. This is particularly relevant in a regulatory setting where it is important to ascertain whether a calculated \(p\)-value which is close to the critical boundary of 5% is truly below or above that threshold rather than being uncertain about this because of Monte Carlo error.

3.10.3 Computational complexity

Bayesian MI methods rely on the specification of prior distributions and the usage of Markov chain Monte Carlo (MCMC) methods. All other methods based on multiple imputation or bootstrapping require no other tuning parameters than the specification of the number of imputations \(M\) or bootstrap samples \(B\) and rely on numerical optimization for fitting the MMRM imputation models via REML. Conditional mean imputation combined with the jackknife has no tuning parameters.

In our rbmi implementation, the fitting of the MMRM imputation model via REML is computationally most expensive. MCMC sampling using rstan (Stan Development Team (2020)) is typically relatively fast in our setting and requires only a small burn-in and burn-between of the chains. In addition, the number of random imputations for reliable inference using Rubin’s rules is often smaller than the number of resamples required for the jackknife or the bootstrap (see e.g. the discussions in I. R. White, Royston, and Wood (2011, sec. 7) for Bayesian MI and the Appendix of Wolbers et al. (2022) for the bootstrap). Thus, for many applications, we expect that conventional MI based on Bayesian posterior draws will be fastest, followed by conventional MI using approximate Bayesian posterior draws and conditional mean imputation combined with the jackknife. Conditional mean imputation combined with the bootstrap and bootstrapped MI methods will typically be most computationally demanding. Of note, all implemented methods are conceptually straightforward to parallelize and some parallelization support is provided by rbmi.

rbmi: Statistical Specifications

Alessandro Noci, Craig Gower-Page, and Marcel Wolbers

1 Scope of this document

2 Introduction to estimands and estimation methods

2.1 Estimands

2.2 Alignment between the estimand and the estimation method

2.2.1 Missing data prior to ICEs

2.2.2 Implementation of the hypothetical strategy

2.2.3 Implementation of the treatment policy strategy

2.2.4 Implementation of the composite strategy

3 Statistical methodology

3.1 Overview of the imputation procedure

3.1.1 Conventional MI

3.1.2 Conditional mean imputation

3.1.3 Bootstrapped MI

3.2 Setting, notation, and missing data assumptions

3.3 The base imputation model

3.3.1 Included data and model specification

3.3.2 Restricted maximum likelihood estimation (REML)

3.3.3 Bayesian model fitting

3.3.4 Approximate Bayesian posterior draws via the bootstrap

3.4 Imputation step

3.4.1 Marginal imputation distribution for a subject - MAR case

3.4.2 Marginal imputation distribution for a subject - reference-based imputation methods

3.4.3 Imputation of missing outcome data

3.5 \(\delta\)-adjustment

3.6 Analysis step

3.7 Pooling step for inference of (approximate) Bayesian MI and Rubin’s rules

3.8 Bootstrap and jackknife inference for conditional mean imputation

3.8.1 Point estimate of the treatment effect

3.8.2 Jackknife standard errors, confidence intervals (CI) and tests for the treatment effect

3.8.3 Bootstrap standard errors, confidence intervals (CI) and tests for the treatment effect

3.9 Pooling step for inference of the bootstrapped MI methods

3.10 Comparison between the implemented approaches

3.10.1 Treatment effect estimation

3.10.2 Standard errors of the treatment effect

3.10.3 Computational complexity

4 Mapping of statistical methods to `rbmi` functions

5 Comparison to other software implementations

References

4 Mapping of statistical methods to `rbmi` functions