MCBoostSurv - Basics

library("mcboost")
library("mlr3")
library("mlr3proba")
library("mlr3pipelines")
library("mlr3learners")
library("tidyverse")
set.seed(27099)

Minimal Example: MCBoostSurv

To show the basic functionality of MCBoostSurv, we provide a minimal example on the standard survival data set rats. After loading and pre-processing the data, we train an mlr3 learner on the training data. We then instantiate an MCBoostSurv object with the default parameters and run its $multicalibrate() method on held-out validation data to multicalibrate the survival model. Finally, $predict_probs() returns multicalibrated predictions for new data.


# prepare task: one-hot encode the factor features of rats
task = tsk("rats")
prep_pipe = po("encode", param_vals = list(method = "one-hot"))
prep = prep_pipe$train(list(task))[[1]]

# split data into training, validation (used for multicalibration) and test rows
train = prep$clone()$filter(1:199)
val = prep$clone()$filter(200:250)
test = prep$clone()$filter(256:300)

# train a random survival forest as the baseline model
baseline = lrn("surv.ranger")$train(train)

# initialize MCBoostSurv with the trained baseline
mc_surv = MCBoostSurv$new(init_predictor = baseline)

# multicalibrate the model on the validation data
mc_surv$multicalibrate(data = val$data(cols = val$feature_names),
                       labels = val$data(cols = val$target_names))

# get multicalibrated predictions for the test data
mc_surv$predict_probs(test$data(cols = test$feature_names))

What does MCBoostSurv do?

Internally, MCBoostSurv runs the following procedure max_iter times (similar to MCBoost, but applied to predicted distributions over time):

  1. Predict on X using the model from the previous iteration (init_predictor in the first iteration).
  2. Compute the residuals res = y - y_hat for all time points.
  3. Split the predictions into num_buckets buckets according to y_hat and time.
  4. Fit the auditor (auditor_fitter), here called c(x), on the data in each bucket with target variable res.
  5. Compute misscal = mean(c(x) * res(x)) for each bucket.
  6. If misscal > alpha, update the model for the bucket with the highest misscal using the prediction c(x). Otherwise, stop the procedure.
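The six steps can be illustrated with a small, self-contained base R sketch. It is not the mcboost implementation: it assumes uncensored toy data, an ordinary least-squares auditor c(x) = lm(res ~ x) fitted in each bucket, buckets formed by binning y_hat and crossing the bins with the time points, and an additive update clipped to [0, 1]. The function multicalibrate_sketch and its arguments are hypothetical; only max_iter, num_buckets and alpha mirror the names used above, and eta is an added step size.

```r
# Hypothetical base-R sketch of the loop above (NOT mcboost's implementation).
# Assumptions: no censoring, OLS auditor lm(res ~ x) per bucket, buckets =
# binned prediction x time point, additive update clipped to [0, 1].
multicalibrate_sketch <- function(S, y, x, times, max_iter = 5,
                                  num_buckets = 2, alpha = 1e-4, eta = 1) {
  n <- nrow(S)
  x_long <- rep(x, times = length(times))  # as.vector() stacks columns, rows vary fastest
  t_long <- rep(times, each = n)
  history <- numeric(0)
  for (iter in seq_len(max_iter)) {
    s_vec <- as.vector(S)                  # 1. current predictions y_hat
    res <- as.vector(y) - s_vec            # 2. residuals for every (i, t) cell
    bins <- cut(s_vec, breaks = seq(0, 1, length.out = num_buckets + 1),
                include.lowest = TRUE, labels = FALSE)
    bucket <- paste(bins, t_long)          # 3. buckets by y_hat and time
    best <- NULL
    for (b in unique(bucket)) {
      idx <- which(bucket == b)
      if (length(idx) < 10) next           # skip near-empty buckets
      d <- data.frame(res = res[idx], x = x_long[idx])
      c_x <- fitted(lm(res ~ x, data = d)) # 4. auditor c(x)
      misscal <- mean(c_x * d$res)         # 5. miscalibration of this bucket
      if (is.null(best) || misscal > best$misscal)
        best <- list(misscal = misscal, idx = idx, c_x = c_x)
    }
    if (is.null(best) || best$misscal <= alpha) break    # 6. stop
    history <- c(history, best$misscal)
    s_vec[best$idx] <- s_vec[best$idx] + eta * best$c_x  # 6. update worst bucket
    S <- matrix(pmin(pmax(s_vec, 0), 1), nrow = n)
  }
  list(S = S, misscal = history)
}

# Toy data: the true hazard depends on x, the baseline prediction ignores x.
set.seed(1)
n <- 300
times <- c(0.5, 1, 1.5, 2)
x <- rnorm(n)
event_time <- rexp(n, rate = exp(0.5 * x))
y <- 1 * outer(event_time, times, ">")   # y[i, j] = 1 if alive at times[j]
S0 <- matrix(exp(-times), nrow = n, ncol = length(times), byrow = TRUE)
fit <- multicalibrate_sketch(S0, y, x, times)
cat("Brier before:", round(mean((y - S0)^2), 4),
    "after:", round(mean((y - fit$S)^2), 4), "\n")
```

Because each auditor is a least-squares fit with an intercept, mean(c(x) * res) equals the mean of c(x)^2 in that bucket, so it is never negative, and with eta = 1 every update lowers the squared error on the updated cells. The sketch does not enforce that updated survival curves remain non-increasing in time.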

Multicalibrate model trained on PBC data

Building on the minimal example, we now show multicalibration on a data set with two sensitive attributes (age and gender). Again, we load and pre-process the data.

Load Dataset

Multicalibrate the survival model with validation data

Development of the integrated Brier score (IBS) in the defined subgroups
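To make the subgroup comparison concrete, here is a base R sketch of the integrated Brier score (IBS) computed separately per subgroup. It uses Graf et al.'s inverse probability of censoring weighting, with the censoring distribution G estimated by Kaplan-Meier on the full sample, and integrates over a time grid with the trapezoidal rule. The helpers make_G, brier_ipcw and ibs, the simulated data, and the two-level attribute group are hypothetical illustrations, not the data or code used in this vignette.

```r
# Hypothetical sketch: per-subgroup IBS with inverse probability of censoring
# weights (Graf-style estimator), base R only, simulated toy data.

# Kaplan-Meier estimate of the censoring survival G(t) = P(C > t).
make_G <- function(time, status) {
  ut <- sort(unique(time[status == 0]))        # distinct censoring times
  if (length(ut) == 0) return(function(t, left = FALSE) rep(1, length(t)))
  surv <- cumprod(vapply(ut, function(u)
    1 - sum(time == u & status == 0) / sum(time >= u), numeric(1)))
  function(t, left = FALSE) {                  # left = TRUE gives G(t-)
    k <- findInterval(t, ut, left.open = left) # censoring times <= t (< t if left)
    out <- rep(1, length(t))
    out[k > 0] <- surv[k[k > 0]]
    out
  }
}

# Brier score at each time point; S is an n x length(times) matrix of S(t | x_i).
brier_ipcw <- function(S, time, status, times, G) {
  vapply(seq_along(times), function(j) {
    t <- times[j]
    died <- time <= t & status == 1            # event observed by t: target 0
    alive <- time > t                          # still at risk at t: target 1
    w_died <- ifelse(died, 1 / G(time, left = TRUE), 0)
    w_alive <- ifelse(alive, 1 / G(t), 0)
    mean(S[, j]^2 * w_died + (1 - S[, j])^2 * w_alive)
  }, numeric(1))
}

# Trapezoidal integration over the grid, normalised by its length.
ibs <- function(bs, times) {
  sum(diff(times) * (head(bs, -1) + tail(bs, -1)) / 2) / (max(times) - min(times))
}

set.seed(2)
n <- 400
group <- sample(c("A", "B"), n, replace = TRUE)  # hypothetical sensitive attribute
event_time <- rexp(n, rate = ifelse(group == "A", 1, 2))
cens_time <- rexp(n, rate = 0.5)
time <- pmin(event_time, cens_time)
status <- as.integer(event_time <= cens_time)
times <- seq(0.1, 1.5, by = 0.1)
S <- matrix(exp(-1.5 * times), n, length(times), byrow = TRUE)  # same curve for all
G <- make_G(time, status)                       # censoring fitted on the full sample
ibs_by_group <- sapply(c("A", "B"), function(g) {
  idx <- group == g
  ibs(brier_ipcw(S[idx, , drop = FALSE], time[idx], status[idx], times, G), times)
})
print(round(ibs_by_group, 4))
```

Estimating G once on the full sample and only averaging within each subgroup keeps the weights comparable across groups; a gap between the per-group IBS values is the kind of subgroup disparity that multicalibration aims to reduce.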