A valid principal surrogate endpoint can be used as a primary endpoint for evaluating treatments in phase II clinical trials and for predicting individual treatment effects post licensure. A surrogate is considered valid if it provides reliable predictions of treatment effects on the clinical endpoint of interest. Frangakis and Rubin (2002) introduced the concept of principal stratification and the definition of a principal surrogate (PS). Informally, a post-treatment intermediate response variable is a principal surrogate if causal effects of the treatment on the clinical outcome exist only when causal effects of the treatment on the intermediate variable exist. The criteria for a PS have been modified and extended in more recent works, with most current literature focusing on wide effect modification as the primary criterion of interest; tests for wide effect modification are included in the package.
The goal of PS evaluation is estimation and testing of how treatment efficacy on the clinical outcome of interest varies over subgroups defined by possible treatment and surrogate combinations of interest; this is an effect modification objective. The combinations of interest are called the principal strata, and they include a set of unobservable counterfactual responses: responses that would have occurred under a set of conditions counter to the observed conditions. To finesse this problem of unobservable responses, a variety of clever trial designs and estimation approaches have been proposed. Several of these have been implemented in the pseval
package.
Let \(Z_i\) be the treatment indicator for subject \(i\), where 0 indicates the control or standard treatment, and 1 indicates the experimental treatment. We currently only allow for two levels of treatment and assume that the treatment assignments are randomized. Let \(S_i\) be the observed value of the intermediate response for subject \(i\). Since \(S_i\) can be affected by treatment, there are two naturally occurring counterfactual values of \(S_i\): \(S_i(1)\) under treatment, and \(S_i(0)\) under control. Let \(s_z\) be the realization of the random variable \(S(z)\), for \(z \in \{0, 1\}\). The outcome of interest is denoted \(Y_i\). We consider the counterfactual values of \(Y_i(0)\) and \(Y_i(1)\). We allow for both binary and time-to-event outcomes, thus \(Y_i\) may be a vector containing a time variable and an event/censoring indicator, i.e. \(Y_i = (T_i, \Delta_i)\) where \(\Delta_i = 1\) if \(T_i\) is an event time, and \(\Delta_i = 0\) if \(T_i\) is a censoring time. For all of the methods, \(S_i(z)\) is only defined if the clinical outcome \(Y_i(z)\) does not occur before the potential surrogate \(S_i(z)\) is measured at a fixed time \(\tau\) after entry into the study. The data analyses only include participants who have not experienced the clinical outcome by time \(\tau\). For validity all of the methods assume no individual-level treatment effects on \(Y\) before \(\tau\), which we refer to as the ‘Equal early individual risk’ assumption below.
Criteria for \(S\) to be a good surrogate are based on risk estimands that condition on the potential intermediate responses. The risk is defined as a mapping \(g\) of the cumulative distribution function of \(Y(z)\) conditional on the intermediate responses. Currently we focus only on marginal risk estimands which condition only on \(S(1)\), the intermediate response or biomarker under active treatment:
\[ risk_1(s_1) = g\left\{F_{s_1}\left[Y(1) | S(1) = s_1\right]\right\}, \\ risk_0(s_1) = g\left\{F_{s_1}\left[Y(0) | S(1) = s_1\right]\right\}. \]
The joint risk estimands also condition on \(S(0)\), the intermediate response or biomarker under control. In the special case where \(S(0)\) is constant, such as the immune response to HIV antigens in the placebo arm of a vaccine trial, the joint and marginal risk estimands are equivalent. This special case is referred to as case constant biomarker (CB) in much of the literature (P. Gilbert and Hudgens 2008).
For instance, for a binary outcome, the risk function may simply be the probability \(risk_z(s_1) = P(Y(z) = 1 | S(1) = s_1)\), or for a time-to-event outcome the risk function may be the cumulative distribution function \(risk_z(s_1) = P(Y(z) \leq t | S(1) = s_1)\).
Specification of the distribution of \(Y(z) | S(1)\) determines the likelihood; we will denote it as \(f(y | \beta, s_1, z)\). If \(S(1)\) were fully observed, simple maximum likelihood estimation could be used. The key challenge in estimating these risk estimands is that they condition on counterfactual values that are not observable for at least a subset of subjects in a randomized trial: in particular, \(S(1)\) is unobserved for all subjects who received treatment 0. Estimation therefore involves integrating out the missing values based on some model and/or set of assumptions.
Frangakis and Rubin (2002) gave a single criterion for a biomarker \(S\) to be a PS: causal effects of the treatment on the clinical outcome only exist when causal effects of the treatment on the intermediate variable exist. In general this can only be evaluated using the joint risk estimands, which consider not only the counterfactual values of the biomarker under treatment, but also under control, \(S(0)\). However, in the special case where all \(S(0)\) values are constant, say at level \(C\), such as an immune response to HIV in an HIV-negative population pre-vaccination, this criterion, often referred to as average causal necessity (ACN), can be written in terms of the marginal risk estimands as:
\[ risk_1(C)=risk_0(C) \]
More recently, other works (P. Gilbert and Hudgens 2008; J. Wolfson and Gilbert 2010; Ying Huang, Gilbert, and Wolfson 2013; Erin E. Gabriel and Gilbert 2014; Erin E Gabriel and Follmann 2015) have suggested that this criterion is both too restrictive and in some cases vacuously true. Instead, most current works suggest that the wide effect modification (WEM) criterion is of primary importance, with ACN of secondary importance. WEM is given formally in terms of the risk estimands and a known contrast function \(h\) satisfying \(h(x, y) = 0\) if and only if \(x = y\) by:
\[ |h(risk_1(s_1), risk_0(s_1)) - h(risk_1(s_1^*), risk_0(s_1^*))| > \delta \]
for at least some \(s_1 \neq s_{1}^*\) and \(\delta>0\), with the larger the \(\delta\) the better. To evaluate WEM and ACN we need to identify the risk estimands.
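To make the WEM criterion concrete, here is a minimal sketch in R using hypothetical risk curves and the risk difference as the contrast \(h\); the functions and parameter values below are purely illustrative and are not part of the package.

# Sketch: the WEM criterion with h(x, y) = x - y (the risk difference), which
# satisfies h(x, y) = 0 if and only if x = y. Risk curves are hypothetical.
expit <- function(x) exp(x) / (1 + exp(x))
risk1 <- function(s1) expit(-0.7 - 0.9 * s1)  # hypothetical risk under treatment given S(1) = s1
risk0 <- function(s1) expit(-1.0 + 0.4 * s1)  # hypothetical risk under control given S(1) = s1
h <- function(x, y) x - y
# WEM holds (for some delta > 0) if this gap is large for some pair s1 != s1*:
abs(h(risk1(-1), risk0(-1)) - h(risk1(1), risk0(1)))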
We first make three standard assumptions used in much of the literature for absorbing event outcomes:

- Stable unit treatment value assumption (SUTVA): a subject's potential outcomes are unaffected by the treatment assignments of other subjects.
- Ignorable treatment assignment: treatment is randomized, so that assignment is independent of the potential outcomes.
- Equal early individual risk: there are no individual-level treatment effects on \(Y\) before the surrogate is measured at time \(\tau\).
J. Wolfson and Gilbert (2010) outlines how these assumptions are needed for identification of the risk estimands. Now to deal with the missing \(S(1)\) values among those with \(Z = 0\), we focus on three trial augmentations: Baseline immunogenicity predictor (BIP), closeout placebo vaccination (CPV), and baseline surrogate measurement (BSM). For details on these augmentations, we refer you to D. Follmann (2006), P. Gilbert and Hudgens (2008), Erin E. Gabriel and Gilbert (2014), and Erin E Gabriel and Follmann (2015).
In time-to-event settings one more assumption is needed:

- Non-informative censoring.
It should be noted that the equal individual risk assumption requires that time-to-event analysis start at time \(\tau\), rather than at randomization.
Briefly, a BIP \(W\) is any baseline measurement or set of measurements that is highly correlated with \(S\). It is particularly useful if \(W\) is unlikely to be associated with the clinical outcome after conditioning on \(S\), i.e. \(Y \perp W | S(1)\); some of the methods leverage this assumption. The BIP \(W\) is used to integrate out the missing \(S(1)\) among those with \(Z = 0\) based on a model for \(S(1) | W\) that is estimated among those with \(Z = 1\). We describe how this model is used in the next section.
The assumptions needed for a BIP to be useful depend on the risk model used. If the BIP is included in the risk model, only the assumption that the BIP does not interact with treatment or the candidate surrogate is needed. However, if the BIP is not included in the risk model, the assumption that the clinical outcome is independent of the BIP given the candidate surrogate is needed. Although not a requirement for identification of the risk estimands, most simulation studies have found that a correlation between the BIP and \(S(1)\) of greater than 0.6 is needed for unbiased estimation in finite samples.
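As a quick empirical check of this guideline, one can compute the correlation between the BIP and the observed surrogate in the treated arm, where \(S(1)\) is observed. A minimal sketch using the example data generator that is introduced later in this vignette (the object name dat is arbitrary):

# Sketch: check that the BIP is strongly correlated with S(1).
# S(1) is observed only in the treated arm (Z == 1).
library(pseval)
dat <- generate_example_data(n = 500)
with(subset(dat, Z == 1), cor(BIP, S.obs, use = "complete.obs"))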
Under a CPV augmented design, control recipients who do not experience the clinical event by the end of the follow-up period are given the experimental treatment, and their intermediate response is measured at some time post treatment. This measurement is then used as a direct imputation for the missing \(S(1)\). One set of conservative assumptions under which the CPV measurement can be used as a direct imputation for \(S(1)\) is given in J. Wolfson and Gilbert (2010).
Erin E. Gabriel and Gilbert (2014) suggested the baseline augmentation BSM, which is a pre-treatment measurement of the candidate PS, denoted \(S_B\). The BSM may be a good predictor of \(S(1)\) without any further assumptions, and it can be used in the same way as a BIP. Alternatively, one can use the difference \(S(1) - S_B\) as the candidate surrogate, further increasing the association with the BSM/BIP. Under the BSM assumption outlined in Erin E. Gabriel and Gilbert (2014), \(S(0) = S_B\) almost surely.
Let \(f(y | \beta, s_1, z)\) denote the density of \(Y | S(1), Z\) with parameters \(\beta\). Further, let \(R_i\) denote the indicator that \(S_i(1)\) is observed (so \(R_i = 0\) when \(S_i(1)\) is missing). We proceed to estimate \(\beta\) by maximizing
\[ \prod_{i = 1}^n \left\{f(Y_i | \beta, S_i(1), Z_i)\right\}^{R_i} \left\{\int f(Y_i | \beta, s, Z_i) \, d\hat{F}_{S(1) | W}(s | W_i)\right\}^{1 - R_i} \]
with respect to \(\beta\).
This procedure is called estimated maximum likelihood (EML) and was developed in Pepe and Fleming (1991). The key idea is that we are averaging the likelihood contributions for subjects missing \(S(1)\) with respect to the estimated distribution of \(S(1) | W\). A BIP \(W\) that is strongly associated with \(S(1)\) is needed for adequate performance.
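To make the EML idea concrete, the following sketch shows how the likelihood contribution of a single subject with missing \(S(1)\) (binary outcome, logistic risk) can be approximated by averaging over draws from an estimated normal working model for \(S(1) | W\). This is an illustration of the principle only, not the package's internal implementation; the function names and parameter values are hypothetical.

# Sketch: EML contribution for one subject with missing S(1), binary Y, logistic risk.
expit <- function(x) exp(x) / (1 + exp(x))
eml_contribution <- function(y, z, w, beta, fit.s1, D = 500) {
  # Draw D values of S(1) from the estimated distribution of S(1) | W = w
  mu <- predict(fit.s1, newdata = data.frame(BIP = w))
  s.draws <- rnorm(D, mean = mu, sd = summary(fit.s1)$sigma)
  # Average the likelihood contributions over the draws
  p <- expit(beta[1] + beta[2] * s.draws + beta[3] * z + beta[4] * s.draws * z)
  mean(ifelse(y == 1, p, 1 - p))
}
# Example usage, assuming dat has columns Z, S.obs, and BIP:
# fit.s1 <- lm(S.obs ~ BIP, data = subset(dat, Z == 1))
# eml_contribution(y = 0, z = 0, w = 0.2, beta = c(-1, 0.4, 0.3, -1.3), fit.s1 = fit.s1)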
Closed-form inference is not available for EML estimates, so we recommend the bootstrap for estimating standard errors. EML was suggested as an approach to principal surrogate evaluation by P. Gilbert and Hudgens (2008) and Y Huang and Gilbert (2011).
Ying Huang, Gilbert, and Wolfson (2013) suggest a different estimation procedure that does have a closed-form variance estimator. Pseudo-score estimation was also suggested in J. Wolfson (2009) and implemented for several special cases in Ying Huang, Gilbert, and Wolfson (2013). We have implemented only one of the special cases here: categorical \(BIP\) and binary \(Y\) (\(S\) may be continuous or categorical). In addition to having closed-form variance estimators, the pseudo-score estimators have been argued to be more efficient than the EML estimators. The closed-form variance estimates are not yet implemented.
The pseval
package allows users to specify the type of augmented design that suits their data, specify the form of the risk model along with the distribution of \(Y | S(1)\), and specify different integration models to estimate the distribution of \(S(1) | W\). Then the likelihood can be maximized and bootstraps run. Post-estimation summaries are available to display and analyze the treatment efficacy as a function of \(S(1)\). All of this is implemented with a flexible and familiar interface.
pseval
is an R package aimed at implementing existing methods for surrogate evaluation using a flexible and common interface. It is still in active development and testing. Development will take place on the Github page, and the current version of the package can be installed as shown below. First you must install the devtools
package, if you haven’t already, by running install.packages("devtools").
devtools::install_github("sachsmc/pseval")
Here we will walk through some basic analyses from the point of view of a new R user. Along the way we will highlight the main features of pseval
. pseval
supports both binary outcomes and time-to-event, thus we will also need to load the survival
package.
library(pseval)
library(survival)
First let’s create an example dataset. The pseval package provides the function generate_example_data
which takes a single argument: the sample size.
fakedata <- generate_example_data(n = 500)
head(fakedata)
Z | BIP | CPV | BSM | S.obs | time.obs | event.obs | Y.obs | S.obs.cat | BIP.cat |
---|---|---|---|---|---|---|---|---|---|
1 | 0.2274917 | NA | 0.1437389 | 1.0287231 | 0.1396934 | 0 | 0 | (0.569,1.27] | (0.0243,0.639] |
0 | -0.5432429 | 0.5792939 | -0.5768676 | -0.4233137 | 0.0081350 | 1 | 0 | (-Inf,-0.157] | (-0.593,0.0243] |
0 | -1.1516303 | -0.1803749 | -1.1055681 | -1.0844813 | 0.0251196 | 0 | 0 | (-Inf,-0.157] | (-Inf,-0.593] |
0 | 0.0778870 | NA | 0.4649162 | 0.3216446 | 0.0146523 | 1 | 1 | (-0.157,0.569] | (0.0243,0.639] |
1 | 0.4246523 | NA | 0.5006394 | 1.3380942 | 0.0340022 | 1 | 0 | (1.27, Inf] | (0.0243,0.639] |
1 | -0.0340998 | NA | 0.1321681 | 0.8794913 | 0.3585655 | 0 | 0 | (0.569,1.27] | (-0.593,0.0243] |
The example data include a time-to-event outcome, a binary outcome, a candidate surrogate, a BIP, a CPV, a BSM, and categorical versions of the surrogate and the BIP. The true model for the time-to-event outcome is exponential, with parameters (intercept) = 1, S(1) = -0.5, Z = 0, and S(1):Z = -1. The true model for the binary outcome is logistic, with the same parameter values.
In the above table, S.obs.cat and BIP.cat are formed as S.obs.cat <- cut(S.obs, breaks = c(-Inf, quantile(c(S.0, S.1), c(.25, .5, .75), na.rm = TRUE), Inf))
and similarly for BIP.cat. Alternatively a user could input arbitrary numeric values to represent different discrete subgroups (e.g., 0s and 1s to denote 2 subgroups).
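For example, a two-group version of the surrogate could be coded directly with 0/1 values (a sketch; S.obs.bin is a hypothetical variable name, not part of the generated data):

# Sketch: represent two discrete surrogate subgroups with 0/1 codes.
fakedata$S.obs.bin <- as.numeric(fakedata$S.obs > median(fakedata$S.obs, na.rm = TRUE))
table(fakedata$S.obs.bin)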
We begin by creating a psdesign object with the function of the same name. This is the object that combines the raw dataset with information about the study design and the structure of the data. Subsequent analysis will operate on this psdesign object. It is analogous to the svydesign
function in the survey package. The first argument is the data frame where the data are stored. All subsequent arguments describe the mappings from the variable names in the data frame to important variables in the PS analysis, using the same notation as above. An optional weights argument describes the sampling weights, if any. Our first analysis will use the binary version of the outcome, with continuous \(S.1\) and the BIP labeled \(BIP\). The object has a print method, so we can inspect the result.
binary.ps <- psdesign(data = fakedata, Z = Z, Y = Y.obs, S = S.obs, BIP = BIP)
binary.ps
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 1.029 NA 1 0.2275
## 2 0 0 NA -0.423 1 -0.5432
## 3 0 0 NA -1.084 1 -1.1516
## 4 0 1 NA 0.322 1 0.0779
## 5 1 0 1.338 NA 1 0.4247
## 6 1 0 0.879 NA 1 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## None present, see ?add_integration for information on integration models.
##
## Risk models:
## None present, see ?add_riskmodel for information on risk models.
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
The printout displays a brief description of the data, including the empirical vaccine efficacy estimate, the variables used in the analysis and their corresponding variables in the original dataset. Finally the printout invites the user to see the help page for add_integration
, in order to add an integration model to the psdesign object, the next step in the analysis.
psdesign
easily accommodates case-control or case-cohort sampling. Let’s modify the fake dataset to see how it works. We’re going to sample all of the cases, and 20% of the controls, for measurement of \(S\).
fakedata.cc <- fakedata
missdex <- sample((1:nrow(fakedata.cc))[fakedata.cc$Y.obs == 0],
size = floor(sum(fakedata.cc$Y.obs == 0) * .8))
fakedata.cc[missdex, ]$S.obs <- NA
fakedata.cc$weights <- ifelse(fakedata.cc$Y.obs == 1, 1, .2)
Now we can create the psdesign
object, using the entire dataset (including those missing S.obs
) and passing the weights to the weights
field.
binary.cc <- psdesign(data = fakedata.cc, Z = Z, Y = Y.obs, S = S.obs, BIP = BIP, weights = weights)
binary.cc
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 NA NA 0.2 0.2275
## 2 0 0 NA NA 0.2 -0.5432
## 3 0 0 NA NA 0.2 -1.1516
## 4 0 1 NA 0.322 1.0 0.0779
## 5 1 0 NA NA 0.2 0.4247
## 6 1 0 NA NA 0.2 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
## weights -> weights
##
## Integration models:
## None present, see ?add_integration for information on integration models.
##
## Risk models:
## None present, see ?add_riskmodel for information on risk models.
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
For survival outcomes, a key assumption is that the potential surrogate is measured at a fixed time \(\tau\) after entry into the study. Any subjects who have a clinical outcome prior to \(\tau\) will be removed from the analysis, with a warning. If tau
is not specified in the psdesign
object, then it is assumed to be 0.
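For example, a sketch of a design object for the time-to-event version of the example data with tau specified explicitly (the value of tau here is purely illustrative, and surv.ps is an arbitrary object name):

# Sketch: specify the surrogate measurement time tau for a time-to-event outcome.
# Subjects with clinical events before tau are dropped from the analysis, with a warning.
surv.ps <- psdesign(data = fakedata, Z = Z, Y = Surv(time.obs, event.obs),
                    S = S.obs, BIP = BIP, tau = 0.05)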
The EML procedure requires an estimate of \(F_{S(1) | W}\); this is referred to as the integration model. Let’s see the help page for add_integration
:
?add_integration
add_integration: Add integration model to a psdesign object

Usage: add_integration(psdesign, integration)

Arguments:

- psdesign: A psdesign object
- integration: An integration object
This is a list of the available integration models. The fundamental problem in surrogate evaluation is that there are unobserved values of the counterfactual surrogate responses S(1). In the estimated maximum likelihood framework, for subjects missing the S(1) values, we use an auxiliary pre-treatment variable or set of variables W, observed for every subject, to estimate the distribution of S(1) | W. Typically, this W is a BIP. Then, for each subject missing S(1), we average the likelihood contributions over the estimated distribution of S(1) given that subject's value of W.
integrate_parametric This is a parametric integration model that fits a linear model for the mean of S(1) | W and assumes a Gaussian distribution.
integrate_bivnorm This is another parametric integration model that assumes that S(1) and W are jointly normally distributed. The user must specify the means, variances, and correlation.
integrate_nonparametric This is a non-parametric integration model that is only valid for categorical S(1) and W. It uses the observed proportions to estimate the joint distribution of S(1), W.
integrate_semiparametric This is a semi-parametric model that uses the semi-parametric location-scale model of Heagerty and Pepe (1999). Models are specified for the location of S(1) | W and the scale of S(1) | W. Integration samples are then drawn from the empirical distribution of the residuals from that model and transformed to the appropriate location and scale.
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP)
add_integration(test, integrate_parametric(S.1 ~ BIP))
test + integrate_parametric(S.1 ~ BIP) # same as above
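As another example, the semi-parametric integration model described above is specified with separate formulas for the location and the scale; the call below constructs such an integration object (this matches the specification used later in this vignette for the survival example):

integrate_semiparametric(formula.location = S.1 ~ BIP, formula.scale = S.1 ~ 1)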
For this first example, let’s use the parametric integration model. We specify the mean model for \(S(1) | W\) as a formula. We can add the integration model directly to the psdesign object and inspect the results. Note that in the formula, we refer to the variable names in the augmented dataset.
binary.ps <- binary.ps + integrate_parametric(S.1 ~ BIP)
binary.ps
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 1.029 NA 1 0.2275
## 2 0 0 NA -0.423 1 -0.5432
## 3 0 0 NA -1.084 1 -1.1516
## 4 0 1 NA 0.322 1 0.0779
## 5 1 0 1.338 NA 1 0.4247
## 6 1 0 0.879 NA 1 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
##
## Risk models:
## None present, see ?add_riskmodel for information on risk models.
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
We can add multiple integration models to a psdesign object, say we want a model for \(S(0) | W\):
binary.ps + integrate_parametric(S.0 ~ BIP)
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 1.029 NA 1 0.2275
## 2 0 0 NA -0.423 1 -0.5432
## 3 0 0 NA -1.084 1 -1.1516
## 4 0 1 NA 0.322 1 0.0779
## 5 1 0 1.338 NA 1 0.4247
## 6 1 0 0.879 NA 1 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
## integration model for S.0 :
## integrate_parametric(formula = S.0 ~ BIP )
##
## Risk models:
## None present, see ?add_riskmodel for information on risk models.
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
In a future version of the package, we will allow for estimation of the joint risk estimands that depend on both \(S(0)\) and \(S(1)\). We can also use splines or other transformations in the formula:
library(splines)
binary.ps + integrate_parametric(S.1 ~ BIP^2)
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 1.029 NA 1 0.2275
## 2 0 0 NA -0.423 1 -0.5432
## 3 0 0 NA -1.084 1 -1.1516
## 4 0 1 NA 0.322 1 0.0779
## 5 1 0 1.338 NA 1 0.4247
## 6 1 0 0.879 NA 1 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP^2 )
##
## Risk models:
## None present, see ?add_riskmodel for information on risk models.
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
binary.ps + integrate_parametric(S.1 ~ bs(BIP, df = 3))
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 1.029 NA 1 0.2275
## 2 0 0 NA -0.423 1 -0.5432
## 3 0 0 NA -1.084 1 -1.1516
## 4 0 1 NA 0.322 1 0.0779
## 5 1 0 1.338 NA 1 0.4247
## 6 1 0 0.879 NA 1 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ bs(BIP, df = 3) )
##
## Risk models:
## None present, see ?add_riskmodel for information on risk models.
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
These are shown only as examples; we will proceed with the simple linear model for integration.
The next step is to define the risk model.
Let’s see how to add a risk model:
?add_riskmodel
add_riskmodel: Add risk model to a psdesign object

Usage: add_riskmodel(psdesign, riskmodel)

Arguments:

- psdesign: A psdesign object
- riskmodel: A risk model object, from the list below
The risk model component specifies the likelihood for the data. This involves specifying the distribution of the outcome variable, whether it is binary or time-to-event, and specifying how the surrogate S(1) and the treatment Z interact and affect the outcome. We use the formula notation to be consistent with other regression type models in R. Below is a list of available risk models.
risk_binary This is a generic risk model for binary outcomes. The user can specify the formula and the link function, using either risk.logit for the logistic link or risk.probit for the probit link. Custom link functions may also be specified; these take a single numeric vector argument and return a vector of corresponding probabilities.
risk_weibull This is a parameterization of the Weibull model for time-to-event outcomes that is consistent with that of rweibull. The user specifies the formula for the linear predictor of the scale parameter.
risk_exponential This is a simple exponential model for a time-to-event outcome.
risk_poisson This is a Poisson model for count outcomes. It allows for offsets in the formula.
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) +
  integrate_parametric(S.1 ~ BIP)
add_riskmodel(test, risk_binary())
test + risk_binary() # same as above
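As an illustration of the custom link option mentioned above, one could supply a complementary log-log risk function (a sketch; risk.cloglog is a user-defined function here, not part of pseval, and the result is not used elsewhere in this vignette):

# Sketch: a user-supplied link for risk_binary(); it must map a numeric vector
# of linear predictors to a vector of probabilities.
risk.cloglog <- function(x) 1 - exp(-exp(x))
binary.ps + risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.cloglog)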
Let’s add a simple binary risk model using the logit link. The argument D
specifies the number of samples to use for the empirical (Monte Carlo) integration in the EML procedure. In general, D should be set to something reasonably large, like 2 or 3 times the sample size. We use a smaller D so that this vignette builds in a reasonable amount of time.
binary.ps <- binary.ps + risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit)
binary.ps
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 1.029 NA 1 0.2275
## 2 0 0 NA -0.423 1 -0.5432
## 3 0 0 NA -1.084 1 -1.1516
## 4 0 1 NA 0.322 1 0.0779
## 5 1 0 1.338 NA 1 0.4247
## 6 1 0 0.879 NA 1 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
##
## Risk models:
## risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit )
##
## No estimates present, see ?ps_estimate.
## No bootstraps present, see ?ps_bootstrap.
We estimate the parameters and bootstrap using the same type of syntax. We can add a ps_estimate
object, which takes optional arguments start
for starting values, and other arguments that are passed to the optim
function. The method = "BFGS"
determines the optimization method; we have found that “BFGS” works well for these types of problems.
The ps_bootstrap
function takes the additional arguments n.boots
for the number of bootstrap replicates, and progress.bar
which is a logical that displays a progress bar in the R console if true. It is helpful to pass the estimates as starting values in the bootstrap resampling.
binary.est <- binary.ps + ps_estimate(method = "BFGS")
binary.boot <- binary.est + ps_bootstrap(n.boots = 50, progress.bar = FALSE,
start = binary.est$estimates$par, method = "BFGS")
binary.boot
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 1.029 NA 1 0.2275
## 2 0 0 NA -0.423 1 -0.5432
## 3 0 0 NA -1.084 1 -1.1516
## 4 0 1 NA 0.322 1 0.0779
## 5 1 0 1.338 NA 1 0.4247
## 6 1 0 0.879 NA 1 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
##
## Risk models:
## risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit )
##
## Estimated parameters:
## (Intercept) S.1 Z S.1:Z
## -1.384 0.387 0.276 -1.303
## Convergence: TRUE
##
## Bootstrap replicates:
## boot.se lower.CL.2.5% upper.CL.97.5%
## (Intercept) 0.232 -2.000 -1.075
## S.1 0.166 0.165 0.721
## Z 0.298 -0.108 0.993
## S.1:Z 0.249 -1.860 -0.961
##
## Out of 50 bootstraps, 50 converged ( 100 %)
##
## Test for wide effect modification on 1 degree of freedom. 2-sided p value < .0001
The next code chunk shows how the model can be defined and estimated all at once.
binary.est <- psdesign(data = fakedata, Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) +
integrate_parametric(S.1 ~ BIP) +
risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit) +
ps_estimate(method = "BFGS")
We provide summary and plotting methods for the psdesign object. If bootstrap replicates are present, the summary method does a test for wide effect modification. Under the parametric risk models implemented in this package, the test for wide effect modification is equivalent to a test that the \(S(1):Z\) coefficient is different from 0. This is implemented using a Wald test using the bootstrap estimate of the variance.
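The Wald test is easy to reproduce by hand from the printed output above: divide the estimated S.1:Z coefficient by its bootstrap standard error and compare to a standard normal. A sketch using the values from the printout:

# Sketch: Wald test for wide effect modification, using the printed estimates above.
est <- -1.303   # estimated S.1:Z coefficient
se  <- 0.249    # bootstrap standard error of the S.1:Z coefficient
2 * pnorm(-abs(est / se))  # two-sided p-value, < .0001 as reported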
Another way to assess wide effect modification is to compute the standardized total gain (STG) (Erin E. Gabriel, Sachs, and Gilbert 2015). This is implemented in the calc_STG
function. The standardized total gain can be interpreted as the area sandwiched between the risk difference curve and the horizontal line at the marginal risk difference. It is a measure of the spread of the distribution of the risk difference, and is a less parametric way to test for wide effect modification. The calc_STG
function computes the STG at the estimated parameters and at the bootstrap samples, if present, and conducts a permutation test of the null hypothesis that the STG is 0. The permutation test randomly permutes the labels of the outcome variable to simulate the null distribution and provides a p-value. The function prints the results and invisibly returns a list containing the observed STG, the bootstrapped STGs, and the permuted STGs.
calc_STG(binary.boot, permute.times = 100, progress.bar = FALSE)
## $obsSTG
## [1] 0.5719707
##
## $bootstraps
## STG.boot.se STG.lower.CL.2.5 STG.upper.CL.97.5
## V1 0.3950485 0.3988421 1.138319
##
## $permutation
## [1] "permutation p = 0"
The summary method also computes the marginal vaccine efficacy marginalized over \(S(1)\) and compares it to the average vaccine efficacy conditional on \(S(1)\). This is an assessment of model fit. A warning will be given if the two estimates are dramatically different. These estimates are presented in the summary along with the empirical marginal vaccine efficacy.
smary <- summary(binary.boot)
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 1.029 NA 1 0.2275
## 2 0 0 NA -0.423 1 -0.5432
## 3 0 0 NA -1.084 1 -1.1516
## 4 0 1 NA 0.322 1 0.0779
## 5 1 0 1.338 NA 1 0.4247
## 6 1 0 0.879 NA 1 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
##
## Risk models:
## risk_binary(model = Y ~ S.1 * Z, D = 50, risk = risk.logit )
##
## Estimated parameters:
## (Intercept) S.1 Z S.1:Z
## -1.384 0.387 0.276 -1.303
## Convergence: TRUE
##
## Bootstrap replicates:
## boot.se lower.CL.2.5% upper.CL.97.5%
## (Intercept) 0.232 -2.000 -1.075
## S.1 0.166 0.165 0.721
## Z 0.298 -0.108 0.993
## S.1:Z 0.249 -1.860 -0.961
##
## Out of 50 bootstraps, 50 converged ( 100 %)
##
## Test for wide effect modification on 1 degree of freedom. 2-sided p value < .0001
##
## Vaccine Efficacy:
## empirical marginal model
## 0.495 0.495 0.492
## Model-based average VE is -0.6 % different from the empirical and -0.6 % different from the marginal.
The calc_risk
function computes the risk in each treatment arm, and contrasts of the risks. By default it computes the vaccine efficacy, but there are other contrast functions available. A contrast function takes two inputs, \(risk_0\) and \(risk_1\), and returns a one-dimensional summary of those two inputs; it must be vectorized. Some built-in functions are “VE” for vaccine efficacy \(= 1 - risk_1(s)/risk_0(s)\), “RR” for relative risk \(= risk_1(s)/risk_0(s)\), “logRR” for log of the relative risk, and “RD” for the risk difference \(= risk_1(s) - risk_0(s)\). You can pass the name of the function, or the function itself, to calc_risk
. See ?calc_risk
for more information about contrast functions.
Other arguments of the calc_risk
function include t
, the time at which to calculate the risk for time-to-event outcomes, n.samps
which is the number of samples over the range of S.1 at which the risk will be calculated, and CI.type
, which can be "pointwise"
for pointwise confidence intervals or "band"
for a simultaneous confidence band. sig.level
is the significance level for the bootstrap confidence intervals.
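For example, a sketch of a call combining several of these arguments, requesting the risk difference over 50 values of \(S(1)\) with 90% simultaneous confidence bands (rd.est is an arbitrary object name):

rd.est <- calc_risk(binary.boot, contrast = "RD", n.samps = 50,
                    CI.type = "band", sig.level = 0.1)
head(rd.est)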
head(calc_risk(binary.boot, contrast = "VE", n.samps = 20))
S.1 | Y | R0 | R1 | Y.boot.se | Y.upper.CL.0.95 | Y.lower.CL.0.95 | R0.boot.se | R0.upper.CL.0.95 | R0.lower.CL.0.95 | R1.boot.se | R1.upper.CL.0.95 | R1.lower.CL.0.95 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
V1 | -0.8372564 | -1.7082106 | 0.1534442 | 0.4155592 | 1.1758273 | -0.7390779 | -4.0085902 | 0.0393556 | 0.2491279 | 0.0684448 | 0.0774733 | 0.5516630 | 0.2887396 |
V2 | -0.6339887 | -1.2639270 | 0.1639372 | 0.3711418 | 0.8860543 | -0.4920145 | -3.0174554 | 0.0383486 | 0.2551078 | 0.0781847 | 0.0672707 | 0.4863613 | 0.2637811 |
V3 | -0.1161197 | -0.3895842 | 0.1932667 | 0.2685603 | 0.4099263 | 0.0447200 | -1.3393349 | 0.0350019 | 0.2707639 | 0.1089484 | 0.0426309 | 0.3410452 | 0.2067538 |
V4 | 0.0490538 | -0.1792365 | 0.2034228 | 0.2398836 | 0.3178573 | 0.1853505 | -1.0019890 | 0.0338063 | 0.2758826 | 0.1208001 | 0.0361968 | 0.3064580 | 0.1836151 |
V5 | 0.1099355 | -0.1090032 | 0.2072651 | 0.2298577 | 0.2894938 | 0.2318730 | -0.8864849 | 0.0333702 | 0.2777843 | 0.1254417 | 0.0340765 | 0.2942145 | 0.1755247 |
V6 | 0.1190925 | -0.0987636 | 0.2078476 | 0.2283754 | 0.2854649 | 0.2386396 | -0.8695268 | 0.0333053 | 0.2780711 | 0.1261529 | 0.0337700 | 0.2923978 | 0.1743324 |
head(calc_risk(binary.boot, contrast = function(R0, R1) 1 - R1/R0, n.samps = 20))
S.1 | Y | R0 | R1 | Y.boot.se | Y.upper.CL.0.95 | Y.lower.CL.0.95 | R0.boot.se | R0.upper.CL.0.95 | R0.lower.CL.0.95 | R1.boot.se | R1.upper.CL.0.95 | R1.lower.CL.0.95 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
V1 | -0.7443456 | -1.4975481 | 0.1581705 | 0.3950385 | 1.0347684 | -0.6219071 | -3.5264716 | 0.0389201 | 0.2518496 | 0.0727490 | 0.0728466 | 0.5218994 | 0.2771584 |
V2 | -0.6783042 | -1.3556074 | 0.1616014 | 0.3806696 | 0.9434272 | -0.5429986 | -3.2149133 | 0.0385851 | 0.2537961 | 0.0759586 | 0.0695147 | 0.5006358 | 0.2691029 |
V3 | -0.3892786 | -0.8072635 | 0.1773256 | 0.3204741 | 0.6203441 | -0.2371016 | -2.0772624 | 0.0368851 | 0.2624311 | 0.0915840 | 0.0550529 | 0.4132195 | 0.2356341 |
V4 | -0.0982180 | -0.3653295 | 0.1943485 | 0.2653497 | 0.3987953 | 0.0610433 | -1.3010968 | 0.0348729 | 0.2713158 | 0.1101818 | 0.0418886 | 0.3372052 | 0.2043726 |
V5 | 0.1510366 | -0.0636918 | 0.2098892 | 0.2232574 | 0.2718759 | 0.2617865 | -0.8112173 | 0.0330806 | 0.2790727 | 0.1286610 | 0.0327268 | 0.2861123 | 0.1702232 |
V6 | 0.3605245 | 0.1426685 | 0.2236417 | 0.1917351 | 0.1989690 | 0.3972018 | -0.4612803 | 0.0317255 | 0.2873381 | 0.1461777 | 0.0269065 | 0.2469684 | 0.1451658 |
It is easy to plot the risk estimates.
plot(binary.boot, contrast = "VE", lwd = 2)
abline(h = smary$VE.estimates[2], lty = 2)
expit <- function(x) exp(x)/(1 + exp(x))
trueVE <- function(s){
r0 <- expit(-1 - 0 * s)
r1 <- expit(-1 - .75 * s)
1 - r1/r0
}
rug(binary.boot$augdata$S.1)
curve(trueVE(x), add = TRUE, col = "red")
legend("bottomright", legend = c("estimated VE", "marginal VE", "true VE"),
col = c("black", "black", "red"), lty = c(1, 3, 1), lwd = c(2, 1, 1))
By default, plots of psdesign objects with bootstrap samples will display simultaneous confidence bands for the curve. These bands \(L_\alpha\) satisfy

\[ P\left\{\sup_{s \in B} | \hat{VE}(s) - VE(s) | \leq L_\alpha \right\} \geq 1 - \alpha, \]

for significance level \(\alpha\). The alternative is to use pointwise confidence intervals, with the option CI.type = "pointwise". These intervals satisfy

\[ P\left\{\hat{L}_\alpha \leq VE(s) \leq \hat{U}_\alpha\right\} \geq 1 - \alpha, \mbox{ for all } s. \]
Different summary measures are available for plotting. The options are “VE” for vaccine efficacy = \(1 - risk_1(s)/risk_0(s)\), “RR” for relative risk = \(risk_1(s)/risk_0(s)\), “logRR” for log of the relative risk, “risk” for the risk in each treatment arm, and “RD” for the risk difference = \(risk_1(s) - risk_0(s)\). We can also transform using the log
option of plot
.
plot(binary.boot, contrast = "logRR", lwd = 2)
plot(binary.boot, contrast = "RR", log = "y", lwd = 2)
The calc_risk
function is the workhorse that creates the plots. You can call this function directly to obtain estimates, standard errors, and confidence intervals for the estimated risk in each treatment arm and transformations of the risk like VE. The parameter n.samps
determines the number of points at which to calculate the VE. The points are evenly spaced over the range of S.1. Use this function to compute other summaries, make plots using ggplot2
or lattice
and more.
ve.est <- calc_risk(binary.boot, CI.type = "pointwise", n.samps = 200)
head(ve.est)
S.1 | Y | R0 | R1 | Y.boot.se | Y.lower.CL.2.5 | Y.upper.CL.97.5 | R0.boot.se | R0.lower.CL.2.5 | R0.upper.CL.97.5 | R1.boot.se | R1.lower.CL.2.5 | R1.upper.CL.97.5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
V1 | -1.5715720 | -3.849498 | 0.1200610 | 0.5822355 | 2.963667 | -11.203998 | -1.8818612 | 0.0413275 | 0.0418645 | 0.1703962 | 0.1059844 | 0.3945413 | 0.7682222 |
V2 | -1.3902144 | -3.240127 | 0.1276712 | 0.5413419 | 2.389472 | -9.217149 | -1.5801108 | 0.0410731 | 0.0473655 | 0.1753151 | 0.1008133 | 0.3679758 | 0.7229872 |
V3 | -1.3317194 | -3.054958 | 0.1302121 | 0.5280047 | 2.225421 | -8.634962 | -1.4880785 | 0.0409602 | 0.0492700 | 0.1769255 | 0.0988394 | 0.3595665 | 0.7072877 |
V4 | -1.0740453 | -2.304896 | 0.1419214 | 0.4690353 | 1.609720 | -6.380699 | -1.1156254 | 0.0402724 | 0.0585549 | 0.1864632 | 0.0885919 | 0.3235911 | 0.6324371 |
V5 | -1.0227113 | -2.168115 | 0.1443565 | 0.4573379 | 1.505848 | -5.987884 | -1.0412398 | 0.0400972 | 0.0605912 | 0.1891923 | 0.0862922 | 0.3166504 | 0.6165750 |
V6 | -0.9673797 | -2.025319 | 0.1470200 | 0.4447825 | 1.400190 | -5.583945 | -0.9634884 | 0.0398938 | 0.0628602 | 0.1921698 | 0.0837369 | 0.3092595 | 0.5991961 |
In the examples above, D
was set to a small number so that the vignette can be built in a timely manner. This parameter is the number of samples to use for the empirical integration in the estimated maximum likelihood. In real applications, we recommend using a much larger D
, at least the sample size, and possibly some multiple of the sample size. Note that it will take much longer to run with a higher D
, but the results will be more accurate.
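For a real analysis, the full specification could simply be re-run with a larger D before estimating and bootstrapping, for example (a sketch, not run here because it is slow; binary.big is an arbitrary object name):

# Sketch: refit with a larger D (here 3 times the sample size) for more accurate
# empirical integration, at the cost of longer run time.
binary.big <- psdesign(data = fakedata, Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) +
  integrate_parametric(S.1 ~ BIP) +
  risk_binary(model = Y ~ S.1 * Z, D = 1500, risk = risk.logit) +
  ps_estimate(method = "BFGS") +
  ps_bootstrap(n.boots = 500, progress.bar = FALSE, method = "BFGS")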
plot(binary.boot, contrast = "VE", lwd = 2, CI.type = "band")
sbs <- calc_risk(binary.boot, CI.type = "pointwise", n.samps = 200)
lines(Y.lower.CL.2.5 ~ S.1, data = sbs, lty = 3, lwd = 2)
lines(Y.upper.CL.97.5 ~ S.1, data = sbs, lty = 3, lwd = 2)
legend("bottomright", lwd = 2, lty = 1:3, legend = c("estimate", "simultaneous CI", "pointwise CI"))
library(ggplot2)
VE.est <- calc_risk(binary.boot, n.samps = 200)
ggplot(VE.est,
aes(x = S.1, y = Y, ymin = Y.lower.CL.0.95, ymax = Y.upper.CL.0.95)) +
geom_line() + geom_ribbon(alpha = .2) + ylab(attr(VE.est, "Y.function"))
cc.fit <- binary.cc + integrate_parametric(S.1 ~ BIP) +
risk_binary(D = 10) + ps_estimate()
cc.fit
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 NA NA 0.2 0.2275
## 2 0 0 NA NA 0.2 -0.5432
## 3 0 0 NA NA 0.2 -1.1516
## 4 0 1 NA 0.322 1.0 0.0779
## 5 1 0 NA NA 0.2 0.4247
## 6 1 0 NA NA 0.2 -0.0341
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP
## weights -> weights
##
## Integration models:
## integration model for S.1 :
## integrate_parametric(formula = S.1 ~ BIP )
##
## Risk models:
## risk_binary(D = 10 )
##
## Estimated parameters:
## (Intercept) S.1 Z S.1:Z
## -1.367 0.385 0.214 -1.261
## Convergence: TRUE
##
## No bootstraps present, see ?ps_bootstrap.
surv.fit <- psdesign(fakedata, Z = Z, Y = Surv(time.obs, event.obs),
S = S.obs, BIP = BIP, CPV = CPV) +
integrate_semiparametric(formula.location = S.1 ~ BIP, formula.scale = S.1 ~ 1) +
risk_exponential(D = 10) + ps_estimate(method = "BFGS") + ps_bootstrap(n.boots = 20)
## Warning in psdesign(fakedata, Z = Z, Y = Surv(time.obs, event.obs), S
## = S.obs, : tau missing in psdesign: assuming that the surrogate S was
## measured at time 0.
## Bootstrapping 20 replicates:
## ===========================================================================
surv.fit
## Augmented data frame: 500 obs. by 7 variables.
## Z Y S.1 S.0 cdfweights BIP CPV
## 1 1 0.139693419+ 1.029 NA 1 0.2275 NA
## 2 0 0.008134972 NA -0.423 1 -0.5432 0.579
## 3 0 0.025119584+ -0.180 -1.084 1 -1.1516 -0.180
## 4 0 0.014652331 NA 0.322 1 0.0779 NA
## 5 1 0.034002155 1.338 NA 1 0.4247 NA
## 6 1 0.358565530+ 0.879 NA 1 -0.0341 NA
##
## Empirical VE: -0.38
##
## Mapped variables:
## Z -> Z
## Y -> Surv(time.obs, event.obs)
## S -> S.obs
## BIP -> BIP
## CPV -> CPV
##
## Integration models:
## integration model for S.1 :
## integrate_semiparametric(formula.location = S.1 ~ BIP, formula.scale = S.1 ~ 1 )
##
## Risk models:
## risk_exponential(D = 10 )
##
## Estimated parameters:
## (Intercept) S.1 Z S.1:Z
## -0.895 0.162 0.213 -0.941
## Convergence: TRUE
##
## Bootstrap replicates:
## boot.se lower.CL.2.5% upper.CL.97.5%
## (Intercept) 0.1025 -1.0141 -0.705
## S.1 0.0823 -0.0228 0.274
## Z 0.1503 0.0191 0.544
## S.1:Z 0.1409 -1.1913 -0.676
##
## Out of 20 bootstraps, 20 converged ( 100 %)
##
## Test for wide effect modification on 1 degree of freedom. 2-sided p value < .0001
plot(surv.fit)
## Warning in riskcalc(psdesign$risk.function, psdesign$augdata$Y, psdesign
## $estimates$par, : No time given for time to event outcome, using restricted
## mean survival: 0.4
## (warning repeated for each of the 20 bootstrap replicates)
S.obs.cat
and BIP.cat
are factors:
with(fakedata, table(S.obs.cat, BIP.cat))
S.obs.cat/BIP.cat | (-Inf,-0.593] | (-0.593,0.0243] | (0.0243,0.639] | (0.639, Inf] |
---|---|---|---|---|
(-Inf,-0.157] | 91 | 34 | 0 | 0 |
(-0.157,0.569] | 34 | 28 | 62 | 1 |
(0.569,1.27] | 0 | 63 | 31 | 31 |
(1.27, Inf] | 0 | 0 | 32 | 93 |
cat.fit <- psdesign(fakedata, Z = Z, Y = Y.obs,
S = S.obs.cat, BIP = BIP.cat) +
integrate_nonparametric(formula = S.1 ~ BIP) +
risk_binary(Y ~ S.1 * Z, D = 10, risk = risk.probit) + ps_estimate(method = "BFGS")
cat.fit
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 (0.569,1.27] <NA> 1 (0.0243,0.639]
## 2 0 0 <NA> (-Inf,-0.157] 1 (-0.593,0.0243]
## 3 0 0 <NA> (-Inf,-0.157] 1 (-Inf,-0.593]
## 4 0 1 <NA> (-0.157,0.569] 1 (0.0243,0.639]
## 5 1 0 (1.27, Inf] <NA> 1 (0.0243,0.639]
## 6 1 0 (0.569,1.27] <NA> 1 (-0.593,0.0243]
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs.cat
## BIP -> BIP.cat
##
## Integration models:
## integration model for S.1 :
## integrate_nonparametric(formula = S.1 ~ BIP )
##
## Risk models:
## risk_binary(model = Y ~ S.1 * Z, D = 10, risk = risk.probit )
##
## Estimated parameters:
## (Intercept) S.1(-0.157,0.569] S.1(0.569,1.27]
## -0.166 -3.227 -0.783
## S.1(1.27, Inf] Z S.1(-0.157,0.569]:Z
## -0.128 -0.201 2.685
## S.1(0.569,1.27]:Z S.1(1.27, Inf]:Z
## 0.182 -1.270
## Convergence: TRUE
##
## No bootstraps present, see ?ps_bootstrap.
plot(cat.fit)
Categorical W allows for estimation of the model using the pseudo-score method. \(S\) may be continuous or categorical:
cat.fit.ps <- psdesign(fakedata, Z = Z, Y = Y.obs,
S = S.obs, BIP = BIP.cat) +
integrate_nonparametric(formula = S.1 ~ BIP) +
risk_binary(Y ~ S.1 * Z, D = 10, risk = risk.logit) + ps_estimate(method = "pseudo-score") +
ps_bootstrap(n.boots = 20, method = "pseudo-score")
## Bootstrapping 20 replicates:
## ===========================================================================
summary(cat.fit.ps)
## Augmented data frame: 500 obs. by 6 variables.
## Z Y S.1 S.0 cdfweights BIP
## 1 1 0 1.029 NA 1 (0.0243,0.639]
## 2 0 0 NA -0.423 1 (-0.593,0.0243]
## 3 0 0 NA -1.084 1 (-Inf,-0.593]
## 4 0 1 NA 0.322 1 (0.0243,0.639]
## 5 1 0 1.338 NA 1 (0.0243,0.639]
## 6 1 0 0.879 NA 1 (-0.593,0.0243]
##
## Empirical VE: 0.495
##
## Mapped variables:
## Z -> Z
## Y -> Y.obs
## S -> S.obs
## BIP -> BIP.cat
##
## Integration models:
## integration model for S.1 :
## integrate_nonparametric(formula = S.1 ~ BIP )
##
## Risk models:
## risk_binary(model = Y ~ S.1 * Z, D = 10, risk = risk.logit )
##
## Estimated parameters:
## (Intercept) S.1 Z S.1:Z
## -1.237 0.257 0.128 -1.174
## Convergence: TRUE
##
## Bootstrap replicates:
## boot.se lower.CL.2.5% upper.CL.97.5%
## (Intercept) 0.227 -1.6588 -0.885
## S.1 0.150 0.0594 0.592
## Z 0.344 -0.2797 0.676
## S.1:Z 0.266 -1.7808 -0.959
##
## Out of 20 bootstraps, 20 converged ( 100 %)
##
## Test for wide effect modification on 1 degree of freedom. 2-sided p value < .0001
##
## Vaccine Efficacy:
## empirical marginal model
## 0.495 0.495 0.487
## Model-based average VE is -1.6 % different from the empirical and -1.6 % different from the marginal.
plot(cat.fit.ps)
Follmann, D. 2006. “Augmented Designs to Assess Immune Response in Vaccine Trials.” Biometrics 62 (4): 1161–69.
Frangakis, CE, and DB Rubin. 2002. “Principal Stratification in Causal Inference.” Biometrics 58 (1): 21–29.
Gabriel, Erin E., and Dean Follmann. 2015. “Augmented Trial Designs for Evaluation of Principal Surrogates.” Biostatistics 0 (0): 1–25.
Gabriel, Erin E., and Peter B. Gilbert. 2014. “Evaluating Principal Surrogate Endpoints with Time-to-Event Data Accounting for Time-Varying Treatment Efficacy.” Biostatistics 15 (2): 251–65.
Gabriel, Erin E., Michael C. Sachs, and Peter B. Gilbert. 2015. “Comparing and Combining Biomarkers as Principle Surrogates for Time-to-Event Clinical Endpoints.” Statistics in Medicine 34 (3): 381–95. doi:10.1002/sim.6349.
Gilbert, PB, and MG Hudgens. 2008. “Evaluating Candidate Principal Surrogate Endpoints.” Biometrics 64 (4): 1146–54.
Huang, Y, and PB Gilbert. 2011. “Comparing Biomarkers as Principal Surrogate Endpoints.” Biometrics.
Huang, Ying, Peter B Gilbert, and Julian Wolfson. 2013. “Design and Estimation for Evaluating Principal Surrogate Markers in Vaccine Trials.” Biometrics 69 (2): 301–9.
Pepe, MS, and TR Fleming. 1991. “A Nonparametric Method for Dealing with Mismeasured Covariate Data.” Journal of the American Statistical Association 86 (413): 108–13.
Wolfson, J. 2009. “Statistical Methods for Identifying Surrogate Endpoints in Vaccine Trials.” PhD thesis, University of Washington, Department of Biostatistics.
Wolfson, J, and PB Gilbert. 2010. “Statistical Identifiability and the Surrogate Endpoint Problem, with Application to Vaccine Trials.” Biometrics 66 (4): 1153–61.