How to use the breakDown package with models created with caret

Przemyslaw Biecek

2018-04-06

This example demonstrates how to use the breakDown package for models created with the caret package.

First, we generate some data with the twoClassSim() function from caret.

library(caret)

set.seed(2)
training <- twoClassSim(50, linearVars = 2)
trainX <- training[, -ncol(training)]
trainY <- training$Class

head(training)
#>   TwoFactor1 TwoFactor2    Linear1    Linear2 Nonlinear1 Nonlinear2
#> 1 -0.6561702 -1.6480450  1.0744594  0.9758906  0.2342843  0.6805653
#> 2 -0.9849973  1.4598834  0.2605978 -0.1694232  0.1381283  0.7460168
#> 3  2.3722541  1.7069944 -0.3142720  0.7221918 -0.6920591  0.4642024
#> 4 -2.2067173 -0.6972704 -0.7496301 -0.8444186 -0.9303336  0.1374181
#> 5  0.5166671 -0.7228376 -0.8621983  1.2772937  0.9959069  0.8143796
#> 6  1.3331262 -0.9929323  2.0480403 -1.3431105  0.6711474  0.8321613
#>   Nonlinear3  Class
#> 1  0.6920055 Class1
#> 2  0.5599569 Class2
#> 3  0.3426912 Class1
#> 4  0.2344975 Class2
#> 5  0.4296028 Class1
#> 6  0.7367007 Class1

Now we are ready to train a model. Let’s train a glm model with caret.

cctrl1 <- trainControl(method = "cv", number = 3, returnResamp = "all",
                       classProbs = TRUE, 
                       summaryFunction = twoClassSummary)

test_class_cv_model <- train(trainX, trainY, 
                             method = "glm", 
                             trControl = cctrl1,
                             metric = "ROC", 
                             preProc = c("center", "scale"))
test_class_cv_model
#> Generalized Linear Model 
#> 
#> 50 samples
#>  7 predictor
#>  2 classes: 'Class1', 'Class2' 
#> 
#> Pre-processing: centered (7), scaled (7) 
#> Resampling: Cross-Validated (3 fold) 
#> Summary of sample sizes: 33, 33, 34 
#> Resampling results
#> 
#>   ROC       Sens       Spec       ROC SD      Sens SD     Spec SD   
#>   0.728588  0.6805556  0.6805556  0.07234259  0.06364688  0.06364688
#> 
#> 

To use breakDown we need a function that calculates scores/predictions for a single observation. By default, the predict() function returns the predicted class rather than a numeric score.

So we add the type = "prob" argument to get scores. Since this returns two scores for each observation (one per class), we need to extract one of them. Here we take the first column, which holds the score for Class1.

# Return the score for the first class (Class1) for each row of x
predict.fun <- function(model, x) predict(model, x, type = "prob")[,1]
testing <- twoClassSim(10, linearVars = 2)
predict.fun(test_class_cv_model, testing[1,])
#> [1] 0.1997649
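
The positional index `[,1]` relies on Class1 being the first column of the probability table. As a minimal sketch, the same score can be selected by column name instead. The names `fit`, `probs`, and `predict_class1` are illustrative and not part of the original example, and the model is refitted without resampling (`method = "none"`) only to keep the sketch short.

```r
library(caret)

# Recreate a small training set as in the example above
set.seed(2)
training <- twoClassSim(50, linearVars = 2)
trainX <- training[, -ncol(training)]
trainY <- training$Class

# Assumption: resampling is skipped here, since only predictions are needed
fit <- train(trainX, trainY,
             method = "glm",
             trControl = trainControl(method = "none", classProbs = TRUE),
             preProc = c("center", "scale"))

# type = "prob" returns one column per class, named after the factor levels
probs <- predict(fit, head(trainX, 3), type = "prob")
colnames(probs)

# Hypothetical helper: pick the Class1 score by name rather than by position
predict_class1 <- function(model, x) predict(model, x, type = "prob")[, "Class1"]
predict_class1(fit, head(trainX, 3))
```

Selecting by name keeps the function correct even if the order of the class levels changes.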

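Now we are ready to call the broken() function. We pass it the model, the observation to explain, the training predictors as the data argument, and our prediction function.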

library("breakDown")
explain_2 <- broken(test_class_cv_model, testing[1,], data = trainX, predict.function = predict.fun)
explain_2
#>                                  contribution
#> (Intercept)                             0.500
#> + TwoFactor2 = 1.0795496599768         -0.184
#> + TwoFactor1 = -0.47479191451519       -0.135
#> + Linear2 = -0.787000382007838         -0.083
#> + Nonlinear3 = 0.614545789547265       -0.016
#> + Linear1 = 0.75131230478348           -0.012
#> + Nonlinear1 = 0.915338645223528        0.022
#> + Nonlinear2 = 0.983319560531527        0.108
#> final_prognosis                         0.200
#> baseline:  0
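
To read the table above, it may help to check that the rows add up. The following base R sketch copies the rounded contributions from the printed output (the vector name `contributions` is illustrative) and sums them. Because the printed values are rounded to three digits, agreement is expected only up to rounding.

```r
# Contributions copied from the printed output of explain_2 (rounded values)
contributions <- c(
  "(Intercept)" =  0.500,
  TwoFactor2    = -0.184,
  TwoFactor1    = -0.135,
  Linear2       = -0.083,
  Nonlinear3    = -0.016,
  Linear1       = -0.012,
  Nonlinear1    =  0.022,
  Nonlinear2    =  0.108
)

# The running total starts at the intercept and ends at the final prognosis
running_total <- cumsum(contributions)
final_value <- round(sum(contributions), 3)
final_value  # 0.2, matching final_prognosis and predict.fun() up to rounding
```

Each element of `running_total` shows the score after one more variable's contribution has been added.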

Finally, we plot the explanation.

library(ggplot2)
plot(explain_2) + ggtitle("breakDown plot for caret/glm model")