Cumulative Incidence Regression

Klaus Holst & Thomas Scheike

2020-09-25

Fine-Gray model

The Fine-Gray estimator is based on the score equation \[\begin{align*} U^{FG}_{n}(\beta) = \sum_{i=1}^{n} \int_0^{+\infty} \left( X_i- E_n(t,\beta) \right) w_i(t,X_i) dN_{1,i}(t) \text{ where } E_n(t,\beta)=\frac{\tilde S_1(t,\beta) }{\tilde S_0(t,\beta)}, \end{align*}\] with \(w_i(t,X_i) = \frac{G_c(t,X_i)}{G_c(T_i \wedge t,X_i)} I( C_i > T_i \wedge t )\), \(\tilde S_k(t,\beta) = \sum_{j=1}^n X_j^k \exp(X_j^T\beta) \tilde Y_{1,j}(t)\) for \(k=0,1\), and \(\tilde Y_{1,i}(t) = Y_{1,i}(t) w_i(t,X_i)\) for \(i=1,\ldots,n\). In practice \(w_i(t)\) must be replaced by an estimate of the censoring distribution; when the censoring does not depend on \(X\) one uses \(\hat w_i(t) = \frac{\hat G_c(t)}{\hat G_c(T_i \wedge t)} I(C_i > T_i \wedge t)\), where \(\hat G_c\) is the Kaplan-Meier estimator of the censoring distribution.
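The IPCW weight \(\hat w_i(t)\) can be illustrated directly. The following is a minimal base-R sketch on a toy dataset (not the simulated data used below): `km_cens` and `w` are hypothetical helper names, with `km_cens` computing the Kaplan-Meier estimator of the censoring distribution by treating censorings as the events.

```r
## Toy data: status 0 = censored, 1/2 = the two causes (illustrative only)
time   <- c(1, 2, 3, 4)
status <- c(0, 1, 0, 2)

km_cens <- function(time, status) {
  ## Kaplan-Meier with censorings (status == 0) treated as the events
  ord  <- order(time)
  time <- time[ord]
  cens <- as.numeric(status[ord] == 0)
  surv <- cumprod(1 - cens / (length(time):1))
  stepfun(time, c(1, surv))        # right-continuous \hat G_c(t)
}
Gc <- km_cens(time, status)

## \hat w_i(t) = Gc(t)/Gc(min(T_i, t)) * I(subject i not censored by min(T_i, t))
w <- function(i, t) {
  if (status[i] == 0 && time[i] <= t) return(0)
  Gc(t) / Gc(min(time[i], t))
}
w(2, 3.5)  # cause-1 event at T = 2 kept in the risk set with a shrinking weight
```

Note how a subject with an observed event keeps a weight \(\hat G_c(t)/\hat G_c(T_i)\) after its event time, which is exactly what keeps it in the Fine-Gray risk set.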

In this article we briefly introduce some functions for doing cumulative incidence regression, and how to augment the Fine-Gray estimator.

First we simulate some competing risks data using some utility functions.

We simulate data with two causes based on the Fine-Gray model: \[\begin{align} F_1(t,X) & = P(T\leq t, \epsilon=1|X)= 1 - \exp(-\Lambda_1(t) \exp(X^T \beta_1)) \\ F_2(t,X) & = P(T\leq t, \epsilon=2|X)= ( 1 - \exp(-\Lambda_2(t) \exp(X^T \beta_2))) \cdot (1 - F_1(\infty,X)) \end{align}\] where the baselines are given as \(\Lambda_j(t) = \rho_j (1- \exp(-t/\nu_j))\) for \(j=1,2\), and the covariates in \(X\) are two independent binomials. Alternatively, one can replace the FG-model with a logistic link \(\mbox{expit}( \Lambda_j(t) + \exp(X^T \beta_j))\).

The advantage of the model is that it is easy to fit and to get standard errors for, and that it is quite flexible, essentially being a Cox model. The downside is that the coefficients are quite hard to interpret, since they are the \(cloglog\) coefficients of \(1-F_1(t,X)\). Specifically, \[\begin{align} \log(-\log( 1-F_1(t,X_1+1,X_2))) - \log(-\log( 1-F_1(t,X_1,X_2))) & = \beta_1, \end{align}\] so the effect of \(X_1\) on \(1-F_1(t,X)\) is \(\beta_1\) on the \(cloglog\) scale.
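As a quick numerical check of this interpretation (base R only; the values of \(\beta_1\) and \(F_1\) are illustrative, not taken from any fit below):

```r
## Under the FG model, 1 - F1(t, X1+1, X2) = (1 - F1(t, X1, X2))^exp(beta1),
## so the cloglog difference is exactly beta1 (illustrative values).
cloglog <- function(p) log(-log(1 - p))
beta1 <- -0.5
F1_x  <- 0.30                          # F_1(t, X1, X2) at some fixed t
F1_x1 <- 1 - (1 - F1_x)^exp(beta1)    # implied F_1(t, X1 + 1, X2)
cloglog(F1_x1) - cloglog(F1_x)        # recovers beta1 = -0.5
```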

 library(mets)
 options(warn=-1)
 set.seed(1000) # to control output in simulations for p-values below.

 rho1 <- 0.2; rho2 <- 10
 n <- 4000
 beta=c(0.0,-0.1,-0.5,0.3)
 ## beta1=c(0.0,-0.1); beta2=c(-0.5,0.3)
 dats <- simul.cifs(n,rho1,rho2,beta,rc=0.5,rate=7)
 dtable(dats,~status)
#> 
#> status
#>    0    1    2 
#> 1184  147 2669
 dsort(dats) <- ~time

We have a look at the non-parametric cumulative incidence curves

 par(mfrow=c(1,2))
 cifs1 <- cif(Event(time,status)~strata(Z1,Z2),dats,cause=1)
 plot(cifs1)

 cifs2 <- cif(Event(time,status)~strata(Z1,Z2),dats,cause=2)
 plot(cifs2)

Now fitting the Fine-Gray model

 fg <- cifreg(Event(time,status)~Z1+Z2,data=dats,cause=1,propodds=NULL)
 summary(fg)
#> 
#>     n events
#>  4000    147
#> 
#>  4000 clusters
#> log-coeffients:
#>     Estimate      S.E.   dU^-1/2 P-value
#> Z1  0.068822  0.081846  0.082637  0.4004
#> Z2 -0.368080  0.166064  0.167404  0.0267
#> 
#> exp(coeffients):
#>      Estimate  Std.Err     2.5%    97.5% P-value
#> [Z1] 1.071246 0.087678 0.899401 1.243090  0.4165
#> [Z2] 0.692062 0.114927 0.466810 0.917315  0.0074

 dd <- expand.grid(Z1=c(-1,1),Z2=0:1)
 pfg <- predict(fg,dd)
 plot(pfg,ylim=c(0,0.2))

Comparison

We compare with the crr function from the cmprsk package, but without running the code, to avoid dependencies:

run <- 0
if (run==1) {
 library(cmprsk)
 mm <- model.matrix(~Z1+Z2,dats)[,-1]
 cr <- with(dats,crr(time,status,mm))
 cbind(cr$coef,diag(cr$var)^.5,fg$coef,fg$se.coef,cr$coef-fg$coef,diag(cr$var)^.5-fg$se.coef)
}
#           [,1]       [,2]        [,3]       [,4]         [,5]          [,6]
# Z1  0.06882201 0.08184642  0.06882199 0.08184642 1.398233e-08 -9.805955e-11
# Z2 -0.36807861 0.16606441 -0.36807951 0.16606444 9.045060e-07 -2.529264e-08

if (run==1) {
 fg0 <- cifreg(Event(time,status)~Z1+Z2,data=dats,cause=1,propodds=NULL,beta=cr$coef,no.opt=TRUE)
 cbind(diag(cr$var)^.5,fg0$se.coef,diag(cr$var)^.5-fg0$se.coef)
}
#            [,1]       [,2]          [,3]
# [1,] 0.08184642 0.08184642 -8.975096e-11
# [2,] 0.16606441 0.16606441 -1.091264e-10

Estimates and SE’s are only very slightly different, agreeing to numerical precision.

We also remove all censorings from the data to compare the estimates with those based on coxph, and observe that both the estimates and the standard errors agree.

datsnc <- dtransform(dats,status=2,status==0)
dtable(datsnc,~status)
#> 
#> status
#>    1    2 
#>  147 3853
datsnc$id <- 1:n
datsnc$entry <- 0
max <- max(dats$time)+1
## for cause 2 add risk interval
datsnc2 <- subset(datsnc,status==2)
datsnc2 <- transform(datsnc2,entry=time)
datsnc2 <- transform(datsnc2,time=max)
datsncf <- rbind(datsnc,datsnc2)
#
cifnc <- cifreg(Event(time,status)~Z1+Z2,data=datsnc,cause=1,propodds=NULL)
cc <- coxph(Surv(entry,time,status==1)~Z1+Z2+cluster(id),datsncf,robust=TRUE)
cbind(cifnc$coef,cifnc$se.coef, cc$coef, diag(cc$var)^.5)
#>           [,1]       [,2]        [,3]       [,4]
#> Z1  0.06785579 0.08251841  0.06785579 0.08251841
#> Z2 -0.37911703 0.16716489 -0.37911703 0.16716489
cbind(cifnc$coef-cc$coef, diag(cc$var)^.5-cifnc$se.coef)
#>            [,1]          [,2]
#> Z1 2.341183e-14  4.260481e-15
#> Z2 5.834222e-14 -1.521006e-14

The cmprsk package again gives results that differ only very slightly:

if (run==1) {
 library(cmprsk)
 mm <- model.matrix(~Z1+Z2,datsnc)[,-1]
 cr <- with(datsnc,crr(time,status,mm))
 cbind(cifnc$coef-cr$coef, diag(cr$var)^.5-cifnc$se.coef)
}
#             [,1]          [,2]
# Z1  1.194188e-10 -6.742754e-11
# Z2 -8.828051e-07 -2.488335e-08

Strata-dependent censoring weights

We can improve efficiency and avoid bias by allowing the censoring weights to depend on the covariates:

 fgcm <- cifreg(Event(time,status)~Z1+Z2,data=dats,cause=1,propodds=NULL,
        cens.model=~strata(Z1,Z2))
 summary(fgcm)
#> 
#>     n events
#>  4000    147
#> 
#>  4000 clusters
#> log-coeffients:
#>     Estimate      S.E.   dU^-1/2 P-value
#> Z1  0.069400  0.078569  0.083018  0.3771
#> Z2 -0.282263  0.159823  0.167928  0.0774
#> 
#> exp(coeffients):
#>      Estimate  Std.Err     2.5%    97.5% P-value
#> [Z1] 1.071865 0.084216 0.906805 1.236925  0.3935
#> [Z2] 0.754076 0.120519 0.517863 0.990288  0.0413
 summary(fg)
#> 
#>     n events
#>  4000    147
#> 
#>  4000 clusters
#> log-coeffients:
#>     Estimate      S.E.   dU^-1/2 P-value
#> Z1  0.068822  0.081846  0.082637  0.4004
#> Z2 -0.368080  0.166064  0.167404  0.0267
#> 
#> exp(coeffients):
#>      Estimate  Std.Err     2.5%    97.5% P-value
#> [Z1] 1.071246 0.087678 0.899401 1.243090  0.4165
#> [Z2] 0.692062 0.114927 0.466810 0.917315  0.0074

We note that the standard errors are slightly smaller for the more efficient estimator.

The influence function of the FG-estimator is given by
\[\begin{align*} \phi_i^{FG} & = \int (X_i- e(t)) \tilde w_i(t) dM_{i1}(t,X_i) + \int \frac{q(t)}{\pi(t)} dM_{ic}(t) \\ & = \phi_i^{FG,1} + \phi_i^{FG,2}, \end{align*}\] where the first term is what would be obtained for a known censoring distribution, and the second term is due to the variability from the Kaplan-Meier estimator. Here \(M_{ic}(t) = N_{ic}(t) - \int_0^t Y_i(s) d\Lambda_c (s)\) is the standard censoring martingale.

The function \(q(t)\), which reflects that the censoring only affects the terms related to cause “2” jumps, can be written as (see Appendix B2) \[\begin{align*} q(t) & = E( H(t,X) I(T \leq t, \epsilon=2) I(C > T)/G_c(T)) = E( H(t,X) F_2(t,X) ), \end{align*}\] with \(H(t,X) = \int_t^{\infty} (X- e(s)) G(s) d \Lambda_1(s,X)\), and using that \(\pi(t)=E(Y(t))=S(t) G_c(t)\).

In the case where the censoring weights are stratified (based on \(X\)), the influence function related to the censoring term involves \[\begin{align*} q(t,X) & = E( H(t,X) I(T \leq t, \epsilon=2) I(T < C)/G_c(T,X) | X) = H(t,X) F_2(t,X), \end{align*}\] so that the influence function becomes \[\begin{align*} \int (X-e(t)) w(t) dM_1(t,X) + \int H(t,X) \frac{F_2(t,X)}{S(t,X)} \frac{1}{G_c(t,X)} dM_c(t,X), \end{align*}\] with \(H(t,X) = \int_t^{\infty} (X- e(s)) G(s,X) d \Lambda_1(s,X)\).

Augmenting the FG-estimator

Rather than using a larger censoring model, we can also compute the augmentation term directly, fit the FG-model based on this augmentation term, and then do a couple of iterations:

  fgaugS <- FG_AugmentCifstrata(Event(time,status)~Z1+Z2+strata(Z1,Z2),data=dats,cause=1,E=fg$E)
  summary(fgaugS)
#> 
#>     n events
#>  4000    147
#> 
#>  4000 clusters
#> log-coeffients:
#>     Estimate      S.E.   dU^-1/2 P-value
#> Z1  0.085343  0.078087  0.082735  0.2744
#> Z2 -0.271442  0.156652  0.166221  0.0831
#> 
#> exp(coeffients):
#>      Estimate  Std.Err     2.5%    97.5% P-value
#> [Z1] 1.089091 0.085044 0.922409 1.255773  0.2948
#> [Z2] 0.762280 0.119412 0.528236 0.996324  0.0465

  fgaugS2 <- FG_AugmentCifstrata(Event(time,status)~Z1+Z2+strata(Z1,Z2),data=dats,cause=1,E=fgaugS$E)
  summary(fgaugS2)
#> 
#>     n events
#>  4000    147
#> 
#>  4000 clusters
#> log-coeffients:
#>     Estimate      S.E.   dU^-1/2 P-value
#> Z1  0.085497  0.078116  0.082736  0.2737
#> Z2 -0.270548  0.156632  0.166211  0.0841
#> 
#> exp(coeffients):
#>      Estimate  Std.Err     2.5%    97.5% P-value
#> [Z1] 1.089258 0.085089 0.922487 1.256029  0.2942
#> [Z2] 0.762961 0.119504 0.528737 0.997185  0.0473

  fgaugS3 <- FG_AugmentCifstrata(Event(time,status)~Z1+Z2+strata(Z1,Z2),data=dats,cause=1,E=fgaugS2$E)
  summary(fgaugS3)
#> 
#>     n events
#>  4000    147
#> 
#>  4000 clusters
#> log-coeffients:
#>     Estimate      S.E.   dU^-1/2 P-value
#> Z1  0.085498  0.078116  0.082736  0.2737
#> Z2 -0.270540  0.156632  0.166211  0.0841
#> 
#> exp(coeffients):
#>      Estimate  Std.Err     2.5%    97.5% P-value
#> [Z1] 1.089260 0.085089 0.922488 1.256032  0.2942
#> [Z2] 0.762967 0.119505 0.528742 0.997193  0.0473

Again we note slightly smaller standard errors when augmenting the estimator.

The function computes the augmentation term for fixed \(E(t)\) based on the current \(\hat \beta\) \[\begin{align*} U_n^{A} = \sum_{i=1}^n \int_{0}^{+\infty} \frac{F_2(t,X_i)}{S(t,X_i)G_c(t,X_i)} H(t,X_i,E,G_c,\Lambda_1) dM_{ci}(t) \end{align*}\] using working models based on stratification to get \(F_1^s\) and \(F_2^s\), where the strata are given by \(strata()\) in the call. It then fits the FG model to solve \[\begin{align*} U_n^{A}(\beta_p) + U^{FG}_{n}(\beta) = 0. \end{align*}\]

Then we may iterate to get a solution to the augmented score equation \[\begin{align*} U_n^{A}(\beta_\infty) + U^{FG}_{n}(\beta_\infty) = 0. \end{align*}\]
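The iteration idea can be sketched generically in base R, with a toy scalar score and augmentation term standing in for \(U^{FG}_n\) and \(U^A_n\) (the functions `U` and `UA` below are purely illustrative, not the FG quantities):

```r
## Toy fixed-point iteration: solve U(beta) + UA(beta_prev) = 0, holding the
## augmentation term fixed at the previous estimate, until convergence.
U  <- function(beta) -(beta - 1)               # toy score with root at 1
UA <- function(beta_prev) 0.1 * sin(beta_prev) # toy augmentation term

beta <- 0
for (k in 1:20) {
  beta_new <- uniroot(function(b) U(b) + UA(beta),
                      c(-10, 10), tol = 1e-12)$root
  converged <- abs(beta_new - beta) < 1e-9
  beta <- beta_new
  if (converged) break
}
beta  # satisfies the augmented equation beta = 1 + 0.1 * sin(beta)
```

Because the augmentation term changes little between iterations, the procedure typically stabilizes after a couple of steps, as seen in the output above.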

The censoring model here is one overall Kaplan-Meier.

The influence function for the augmented estimator is \[\begin{align*} \int (X-e(t)) w(t) dM_1(t,X) + \int H(t,X) \frac{F_2(t,X)}{S(t,X)} \frac{1}{G_c(t)} dM_c, \end{align*}\] and standard errors are based on this formula.

SessionInfo

sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-redhat-linux-gnu (64-bit)
#> Running under: Fedora 32 (Workstation Edition)
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.10.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] mets_1.2.8.1   lava_1.6.8     timereg_1.9.6  survival_3.2-3
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.5          mvtnorm_1.1-1       lattice_0.20-41    
#>  [4] ucminf_1.1-4        digest_0.6.25       grid_4.0.2         
#>  [7] magrittr_1.5        evaluate_0.14       rlang_0.4.7        
#> [10] stringi_1.4.6       Matrix_1.2-18       rmarkdown_2.3      
#> [13] splines_4.0.2       tools_4.0.2         stringr_1.4.0      
#> [16] numDeriv_2016.8-1.1 xfun_0.15           yaml_2.2.1         
#> [19] compiler_4.0.2      htmltools_0.5.0     knitr_1.29