# Analysing data with psborrow

This vignette provides additional details of the analysis model that is fitted as well as a demonstration of how to use the package to fit said models to an existing dataset. For examples of how to fit the analysis model to the simulated datasets generated by the package itself, please see the user-guide vignette.

The “Analysis Model” section of this vignette is a modified excerpt from Lu, Y. et al. (2021).

## Analysis Model

The package uses Markov chain Monte Carlo (MCMC) through JAGS to obtain the posterior distribution for the hazard ratio between treatment and control arms. The package relies on the Bayesian commensurate prior approach (Lewis, et al., 2019). The probability density function for the survival time of patient $$i$$ following a Weibull model is as below:

$t_i \sim Weibull( \nu, \lambda_i)$

for $$i = 1, 2, \dots,n$$, where:

• $$\nu$$ is the shape parameter
• $$\lambda$$ is the rate,
• $$n$$ is the total sample size of the analysis population.

The probability density function is:

$f(t_i, x_i) = \nu\lambda_i t^{\nu-1} e^{-\lambda_it^\nu}$

with the rate taking the format:

$\lambda_i = exp(\alpha_i + \beta_{trt} \times I_{trt} + \beta^T \times X_{i})$

Where:

• $$\alpha_i = \alpha_{cc}$$ if patient $$i$$ is in the concurrent trial, and $$\alpha_i = \alpha_{ec}$$ if patient $$i$$ is in the external trial. Because the impact of therapy is captured by $$I_{trt}$$ below, $$\alpha_{cc}$$ and $$\alpha_{ec}$$ effectively correspond to the log of baseline hazard rate for concurrent control arm and external control arm, respectively.
• $$\beta_n$$ is an n-length vector of regression coefficients
• $$X_{ni}$$ is an n-length covariate vector for patient $$i$$, including all additional parameters to adjust for (e.g. propensity score).
• $$I_{trt}$$ is the treatment indicator (value equals to 0 or 1).

A hierarchical model is defined in which the log hazard rate for the concurrent controls is centered on the external controls, i.e. $$\alpha_{cc} | \alpha_{ec} \sim N(\alpha_{ec}, \tau)$$. The hyperprior $$\tau$$, also referred to as the commensurability parameter, determines the extent of borrowing (see Lewis, et al., 2019).

Four borrowing methods have been implemented:

1. Not borrowing any external control information
2. Fully incorporating external control arm patients to the concurrent control group
3. Dynamic borrowing with the commensurability hyperparameter $$\tau$$ following a Gamma distribution
4. Dynamic borrowing with the commensurability hyperparameter $$\tau$$ following a half-Cauchy distribution

For methods (1) and (2), a conventional Bayesian model without dynamic borrowing is deployed with all relevant parameters assigned a non-informative prior. For method (1) no external control arm information is used and as such $$\alpha_{cc} = \alpha \sim N(0, 0.0001)$$ For method (2) the external control arm is treated as part of the concurrent control and as such $$\alpha_{cc} = \alpha_{ec} = \alpha \sim N(0, 0.0001)$$.

In comparison, methods (3) and (4) have the commensurability hyper-parameter $$\tau$$, which acts as the precision variable assessing the similarity between the concurrent and external control groups. The prior of $$tau$$ follows a Gamma distribution (default: shape = 1, rate = 0.001) and a half-Cauchy distribution (default: location = 0, scale = 0.2), respectively, in methods 3 and 4. Users can update the default values for the parameter if they wish to customize the hyper-prior.

After users state the prior choices and the simulated data is ready for use, the Markov chain Monte Carlo (MCMC) algorithm is deployed to obtain samples from the posterior distribution.

## Example of use

To demonstrate how to use the package to fit the analysis model we first generate a dataset where the time to event is drawn from the following distribution:

$t_i \sim Weibull(0.95, \lambda_i)$ Where:

$\lambda_i = \frac{1}{150}I_{ec} \times \frac{1}{200}I_{cc} \times \frac{1}{400}I_{trt} \times exp(0.2\times x_{age} - 0.1\times I_{female})$

• $$I_{ec}$$ is 1 if the patient is in the external trial else 0
• $$I_{cc}$$ is 1 if the patient is in the concurrent trial else 0
• $$I_{trt}$$ is 1 if the patient is in the treatment arm else 0
• $$I_{female}$$ is 1 if the patient is female else 0
• $$x_{age}$$ is the patient’s age

Patients are then censored by drawing a censoring time from the exponential distribution with a rate parameter of $$\lambda = 1/400$$

suppressPackageStartupMessages({
library(psborrow)
library(dplyr)
library(flexsurv)
})

set.seed(401)

n <- 1000

grp_lvl <- c("CC", "TRT", "EC")
sex_lvl <- c("Male", "Female")

log_hr_grp <- c(
"TRT" = log(1 / 400),
"CC" = log(1 / 200),
"EC" = log(1 / 150)
)

log_hr_sex <- c(
"Male" = 0,
"Female" = -0.1
)

log_hr_age <- 0.2

Shape <- 0.95

dat <- tibble(
pt = 1:n,
grp = factor(
sample(grp_lvl, size = n, replace = TRUE, prob = c(0.3, 0.3, 0.4)),
levels = grp_lvl
),
sex = factor(
sample(sex_lvl, size = n, replace = TRUE, prob = c(0.4, 0.6)),
levels = sex_lvl
),
age = rnorm(n, mean = 0, sd = 1),
lambda = exp(
log_hr_age * age +
log_hr_sex[as.character(sex)] +
log_hr_grp[as.character(grp)]
),
time = rweibullPH(n, scale = lambda, shape = Shape),
centime = rexp(n, 1 / 400)
) %>%
mutate(event = ifelse(time <= centime, 1, 0)) %>%
mutate(time = ifelse(time <= centime, time, centime)) %>%
select(pt, grp, age, sex, time, event)

dat
#> # A tibble: 1,000 × 6
#>       pt grp      age sex     time event
#>    <int> <fct>  <dbl> <fct>  <dbl> <dbl>
#>  1     1 TRT    1.12  Female  64.0     1
#>  2     2 CC    -0.577 Female  14.3     0
#>  3     3 CC    -0.281 Female 190.      0
#>  4     4 CC    -0.371 Male   348.      0
#>  5     5 CC     1.29  Female  43.6     1
#>  6     6 EC    -0.840 Male    27.8     0
#>  7     7 EC     0.901 Female 706.      1
#>  8     8 TRT    0.906 Female 105.      1
#>  9     9 CC     1.43  Male    57.9     1
#> 10    10 TRT    2.75  Female  85.9     1
#> # … with 990 more rows

To conduct an analysis we need to use the apply_mcmc() function. However to use this function we first need to convert our data into the required format. Full details of the required format can be found in help(apply_mcmc) however an example of this is shown below:

dat2 <- dat %>%
mutate(trt = ifelse(grp == "TRT", 1, 0)) %>%
mutate(ext = ifelse(grp == "EC", 1, 0)) %>%
mutate(cnsr = 1 - event) %>%
select(pt, trt, ext, time, cnsr, sex, age)

dat2
#> # A tibble: 1,000 × 7
#>       pt   trt   ext  time  cnsr sex       age
#>    <int> <dbl> <dbl> <dbl> <dbl> <fct>   <dbl>
#>  1     1     1     0  64.0     0 Female  1.12
#>  2     2     0     0  14.3     1 Female -0.577
#>  3     3     0     0 190.      1 Female -0.281
#>  4     4     0     0 348.      1 Male   -0.371
#>  5     5     0     0  43.6     0 Female  1.29
#>  6     6     0     1  27.8     1 Male   -0.840
#>  7     7     0     1 706.      0 Female  0.901
#>  8     8     1     0 105.      0 Female  0.906
#>  9     9     0     0  57.9     0 Male    1.43
#> 10    10     1     0  85.9     0 Female  2.75
#> # … with 990 more rows

We can now pass our data to apply_mcmc() to produce our posterior samples. Prior distributions are specified using the set_prior() function, details for which can be found in help(set_prior) as well as the user-guide vignette. Covariates for the analysis model need to be specified by a 1-sided formula provided to the formula_cov argument. Note that the treatment indicator is automatically added to the model and should not be specified in the covariate formula; if you require treatment interactions these should be specified explicitly via trt:cov rather than by trt*cov. The intercept term must always be included in this formula. You are advised to leave the pred parameter as either “all” or “ps” depending on your requirements. If you do wish to restrict the covariates included in the model then this should be done by omitting them from the formula_cov rather than the pred argument in set_prior().

postsamp <- apply_mcmc(
dt = dat2,
formula_cov = ~ 1 + age + sex,
priorObj = set_prior(pred = "all", prior = "gamma", r0 = 0.85, alpha = c(0, 0)),
n.chains = 1,
n.burn = 100,
n.iter = 200,
seed = 47
)
#> Input all is found in pred. Any other input is disregarded.
#>
|
|                                                  |   0%
|
|*                                                 |   2%
|
|**                                                |   4%
|
|***                                               |   6%
|
|****                                              |   8%
|
|*****                                             |  10%
|
|******                                            |  12%
|
|*******                                           |  14%
|
|********                                          |  16%
|
|*********                                         |  18%
|
|**********                                        |  20%
|
|***********                                       |  22%
|
|************                                      |  24%
|
|*************                                     |  26%
|
|**************                                    |  28%
|
|***************                                   |  30%
|
|****************                                  |  32%
|
|*****************                                 |  34%
|
|******************                                |  36%
|
|*******************                               |  38%
|
|********************                              |  40%
|
|*********************                             |  42%
|
|**********************                            |  44%
|
|***********************                           |  46%
|
|************************                          |  48%
|
|*************************                         |  50%
|
|**************************                        |  52%
|
|***************************                       |  54%
|
|****************************                      |  56%
|
|*****************************                     |  58%
|
|******************************                    |  60%
|
|*******************************                   |  62%
|
|********************************                  |  64%
|
|*********************************                 |  66%
|
|**********************************                |  68%
|
|***********************************               |  70%
|
|************************************              |  72%
|
|*************************************             |  74%
|
|**************************************            |  76%
|
|***************************************           |  78%
|
|****************************************          |  80%
|
|*****************************************         |  82%
|
|******************************************        |  84%
|
|*******************************************       |  86%
|
|********************************************      |  88%
|
|*********************************************     |  90%
|
|**********************************************    |  92%
|
|***********************************************   |  94%
|
|************************************************  |  96%
|
|************************************************* |  98%
|
|**************************************************| 100%

From here we can extract our posterior samples as follows:

head(extract_samples(postsamp))
#>       HR_cc_hc HR_trt_cc  alpha  alpha   beta_trt  beta_age beta_sexFemale        r0       tau
#> [1,] 1.1096928 0.3671814 -4.949371 -4.845288 -1.0018994 0.2745032    -0.25909116 0.9201647  360.0789
#> [2,] 1.0268710 0.4697018 -4.876703 -4.850186 -0.7556573 0.2539210    -0.25321051 0.9204208  283.2014
#> [3,] 1.0575392 0.3991505 -4.942033 -4.886088 -0.9184167 0.3249545    -0.22758668 0.9211114  667.9504
#> [4,] 1.0205880 0.4088358 -4.897589 -4.877210 -0.8944417 0.2851334    -0.16777557 0.9215409 2315.9094
#> [5,] 0.9857758 0.3605800 -4.859848 -4.874174 -1.0200414 0.2935454    -0.18770880 0.9191615 2675.4973
#> [6,] 0.9864785 0.3633342 -4.903234 -4.916847 -1.0124321 0.2535249    -0.09992967 0.9020110  783.1036

For reference the returned parameters are:

• HR_cc_hc - The hazard ratio between the historical control arm and the concurrent control arm. This can be be thought of as the ratio of the scale parameter between the baseline external control distribution and the baseline trial distribution. This is equivalent to exp(alpha - alpha)

• HR_trt_cc - The hazard ratio between the treatment arm and the concurrent control arm. This is equivalent to exp(beta_trt)

• alpha - The shape parameter for the trial’s baseline distribution

• alpha - The shape parameter for the historical control’s baseline distribution

• beta_trt - The log-hazard ratio for the treatment effect. This is equivalent to log(HR_trt_cc)

• beta_age - The log-hazard ratio for the covariate age

• beta_sexFemale - The log-hazard ratio for the covariate sexFemale

• r0 - The scale parameter for the baseline distribution of both the trial and the historical control

• tau/sigma - The precision/variance for alpha i.e. controls how much information is borrowed from the historical control arm

If prior = "no_ext" or prior = "full_ext" then alpha and alpha will be replaced with a single parameter alpha representing the shape parameter for the single baseline distribution for the trail.

Additionally general summary statistics of our samples can be generated by the summary() function:

summary(postsamp)
#>        parameter         mean           sd        lci           uci
#> 1       HR_cc_hc    1.0112903   0.03767595  0.9401316    1.09642467
#> 2      HR_trt_cc    0.3857669   0.03881582  0.3230326    0.46980137
#> 3       alpha   -4.9414910   0.09106815 -5.0775808   -4.76802672
#> 4       alpha   -4.9309512   0.08729994 -5.0899134   -4.75289074
#> 5       beta_trt   -0.9573926   0.09823656 -1.1300021   -0.75544616
#> 6       beta_age    0.2781143   0.04180654  0.2019639    0.35759578
#> 7 beta_sexFemale   -0.1390660   0.07769175 -0.2664242    0.02923811
#> 8             r0    0.9239697   0.01768511  0.8921297    0.95277956
#> 9            tau 1002.0669283 999.10913348 49.5932394 3564.40952449