Business leaders who see the results of highly accurate machine learning models often ask questions like the following:
- Are machine learning models interpretable and transparent?
- How can the results of the model be used to develop a business strategy?
- Can model predictions be used to explain to regulators why an application was accepted or rejected?
DataRobot provides many diagnostics, such as partial dependence, feature impact, and reason codes, to answer these questions, and with those diagnostics predictions can be converted into prescriptions for the business. This vignette covers reason codes; partial dependence is covered in detail in the companion vignette “Interpreting Predictive Models Using Partial Dependence Plots”.
The DataRobot modeling engine is a commercial product that supports the rapid development and evaluation of a large number of different predictive models from a single data source. The open-source R package datarobot allows users of the DataRobot modeling engine to interact with it from R, creating new modeling projects, examining model characteristics, and generating predictions from any of these models for a specified dataset. This vignette illustrates how to interact with DataRobot using the datarobot package: build models, make predictions with a model, and then use reason codes to explain why the model is predicting high or low values. Reason codes can be used to answer the questions mentioned earlier.
Let’s load the datarobot package and other useful packages:
library(datarobot)
library(httr)
library(knitr)
library(data.table)
To access the DataRobot modeling engine, it is necessary to establish an authenticated connection, which can be done in one of two ways. In both cases, two pieces of information are needed: an endpoint, the URL address of the specific DataRobot server being used, and a token, a previously validated access token.
- token is unique to each DataRobot modeling engine account and can be retrieved via the DataRobot web app in the account profile section.
- endpoint depends on the DataRobot modeling engine installation (cloud-based, on-premise, …) you are using; contact your DataRobot administrator for the endpoint to use. The endpoint for DataRobot cloud accounts is https://app.datarobot.com/api/v2.
The first access method uses a YAML configuration file with these two elements - labeled token and endpoint - located at $HOME/.config/datarobot/drconfig.yaml. If this file exists when the datarobot package is loaded, a connection to the DataRobot modeling engine is automatically established. It is also possible to establish a connection using this YAML file via the ConnectToDataRobot function, by specifying the configPath parameter.
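Under this convention, a minimal drconfig.yaml might look like the following sketch (the token shown is a placeholder, not a real credential):

```yaml
# $HOME/.config/datarobot/drconfig.yaml
# Placeholder values -- substitute your own endpoint and token.
endpoint: "https://app.datarobot.com/api/v2"
token: "<YOUR API TOKEN HERE>"
```

With this file in place, loading the datarobot package establishes the connection automatically.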
The second method of establishing a connection to the DataRobot modeling engine is to call the function ConnectToDataRobot with the endpoint and token parameters.
endpoint <- "https://<YOUR ENDPOINT HERE>/api/v2"
apiToken <- "<YOUR API TOKEN HERE>"
ConnectToDataRobot(endpoint = endpoint, token = apiToken)
We will use a sample credit-scoring dataset open-sourced by LendingClub (https://www.lendingclub.com/). The table below summarizes the variables.
variable | summary |  |  |  |  |  |  |
---|---|---|---|---|---|---|---|
Id | Min. : 1 | 1st Qu.: 2501 | Median : 5000 | Mean : 5000 | 3rd Qu.: 7500 | Max. :10000 | NA |
is_bad | Min. :0.0000 | 1st Qu.:0.0000 | Median :0.0000 | Mean :0.1295 | 3rd Qu.:0.0000 | Max. :1.0000 | NA |
emp_title | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
emp_length | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
home_ownership | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
annual_inc | Min. : 2000 | 1st Qu.: 40000 | Median : 58000 | Mean : 68203 | 3rd Qu.: 82000 | Max. :900000 | NA’s :1 |
verification_status | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
pymnt_plan | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
Notes | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
purpose_cat | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
purpose | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
zip_code | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
addr_state | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
debt_to_income | Min. : 0.00 | 1st Qu.: 8.16 | Median :13.41 | Mean :13.34 | 3rd Qu.:18.69 | Max. :29.99 | NA |
delinq_2yrs | Min. : 0.0000 | 1st Qu.: 0.0000 | Median : 0.0000 | Mean : 0.1482 | 3rd Qu.: 0.0000 | Max. :11.0000 | NA’s :5 |
earliest_cr_line | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
inq_last_6mths | Min. : 0.000 | 1st Qu.: 0.000 | Median : 1.000 | Mean : 1.067 | 3rd Qu.: 2.000 | Max. :25.000 | NA’s :5 |
mths_since_last_delinq | Min. : 0.00 | 1st Qu.: 18.00 | Median : 34.00 | Mean : 35.89 | 3rd Qu.: 53.00 | Max. :120.00 | NA’s :6316 |
mths_since_last_record | Min. : 0.00 | 1st Qu.: 0.00 | Median : 86.00 | Mean : 61.65 | 3rd Qu.:101.00 | Max. :119.00 | NA’s :9160 |
open_acc | Min. : 1.000 | 1st Qu.: 6.000 | Median : 9.000 | Mean : 9.335 | 3rd Qu.:12.000 | Max. :39.000 | NA’s :5 |
pub_rec | Min. :0.00000 | 1st Qu.:0.00000 | Median :0.00000 | Mean :0.06013 | 3rd Qu.:0.00000 | Max. :3.00000 | NA’s :5 |
revol_bal | Min. : 0 | 1st Qu.: 3524 | Median : 8646 | Mean : 14271 | 3rd Qu.: 16952 | Max. :1207359 | NA |
revol_util | Min. : 0.00 | 1st Qu.: 25.00 | Median : 48.70 | Mean : 48.45 | 3rd Qu.: 71.80 | Max. :100.60 | NA’s :26 |
total_acc | Min. : 1.00 | 1st Qu.:13.00 | Median :20.00 | Mean :22.01 | 3rd Qu.:29.00 | Max. :90.00 | NA’s :5 |
initial_list_status | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
collections_12_mths_ex_med | Min. :0 | 1st Qu.:0 | Median :0 | Mean :0 | 3rd Qu.:0 | Max. :0 | NA’s :32 |
mths_since_last_major_derog | Min. :1.000 | 1st Qu.:1.000 | Median :2.000 | Mean :2.002 | 3rd Qu.:3.000 | Max. :3.000 | NA |
policy_code | Length:10000 | Class :character | Mode :character | NA | NA | NA | NA |
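The code below assumes this dataset has been read into a dataframe named Lending, which the later chunks use; the file name here is hypothetical, so substitute the path to your own copy:

```r
# Hypothetical file name -- replace with the path to your copy of the
# LendingClub sample dataset.
Lending <- read.csv("lending_sample_10k.csv", stringsAsFactors = FALSE)
```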
Let’s divide our data into training and test sets. We use the training data to create a DataRobot project with the SetupProject function, and the test data to make predictions and generate reason codes. A detailed explanation of creating projects is given in the vignette “Introduction to the DataRobot R Package.” The specific sequence used here is:
target <- "is_bad"
projectName <- "Credit Scoring"
numWorkers <- 10
set.seed(1111)
split <- sample(nrow(Lending), round(0.9 * nrow(Lending)), replace = FALSE)
train <- Lending[split,]
test <- Lending[-split,]
project <- SetupProject(dataSource = train,
projectName = projectName)
SetTarget(project = project,
target = target)
Once the modeling process has completed, the ListModels function returns an S3 object of class “listOfModels” that characterizes all of the models in a specified DataRobot project. Calling this function before the modeling process is complete causes a partial result to be returned, with a warning; to avoid this problem, the WaitForAutopilot function is used before calling ListModels:
# increase the number of workers used by this project
UpdateProject(project = project$projectId,
workerCount = numWorkers)
WaitForAutopilot(project, verbosity = 1, timeout = 999999)
results <- as.data.frame(ListModels(project))
kable(head(results), longtable = TRUE, booktabs = TRUE, row.names = TRUE)
modelType | expandedModel | modelId | blueprintId | featurelistName | featurelistId | samplePct | validationMetric | |
---|---|---|---|---|---|---|---|---|
1 | Advanced AVG Blender | Advanced AVG Blender | 58d9501cf2ff94714e225839 | 300fb24a331bd7e8313023faf0db0a4d | Informative Features | 58d94dddfbfa573bd3aa3295 | 63.9929 | 0.34306 |
2 | ENET Blender | ENET Blender | 58d9501cf2ff94714e22583b | 9df9e8c1439604171b9a025faf88aa5a | Informative Features | 58d94dddfbfa573bd3aa3295 | 63.9929 | 0.34310 |
3 | AVG Blender | AVG Blender | 58d9501bf2ff94714e225837 | e3aadd9331abf90ea27fa28dcbd72fc7 | Informative Features | 58d94dddfbfa573bd3aa3295 | 63.9929 | 0.34570 |
4 | ENET Blender | ENET Blender | 58d9501cf2ff94714e22583d | 6ed326ca6e31c0ce27aa7729b256e814 | Informative Features | 58d94dddfbfa573bd3aa3295 | 63.9929 | 0.34572 |
5 | eXtreme Gradient Boosted Trees Classifier with Early Stopping | eXtreme Gradient Boosted Trees Classifier with Early Stopping::Tree-based Algorithm Preprocessing v21 | 58d94ec5f2ff947071225847 | 8565246f48026082fac7fb3153ed8aa0 | Informative Features | 58d94dddfbfa573bd3aa3295 | 63.9929 | 0.34636 |
6 | eXtreme Gradient Boosted Trees Classifier with Early Stopping | eXtreme Gradient Boosted Trees Classifier with Early Stopping::Tree-based Algorithm Preprocessing v1 | 58d94ec5f2ff94707122584b | c2cf08be8a3d2c7d91bbb299ac00fe83 | Informative Features | 58d94dddfbfa573bd3aa3295 | 63.9929 | 0.34636 |
The generation of model predictions is a three-step process:
1. Upload the dataset for prediction using UploadPredictionDataset.
2. Create a predict job using RequestPredictionsForDataset function, which returns the predictJobId.
3. Pass the predictJobId to GetPredictions along with the projectId for the DataRobot project containing the model. The result returned by this function is a vector of predicted responses; in the case of binary classification projects, the optional type parameter may be used to request a vector of probabilities instead of binary responses; refer to the help files for details.
As a specific example, the following code sequence identifies the model with the best performance, extracts it as bestModel, and generates predictions for it from the test dataframe we created earlier:
allModels <- ListModels(project)
modelFrame <- as.data.frame(allModels)
metric <- modelFrame$validationMetric
bestIndex <- which.min(metric)
bestModel <- allModels[[bestIndex]]
dataset <- UploadPredictionDataset(project, test, maxWait = 1200)
bestPredictJobId <- RequestPredictionsForDataset(project, bestModel$modelId, dataset$id)
bestPredictions <- GetPredictions(project, bestPredictJobId, type="probability")
testPredictions <- data.frame(original = test$is_bad, prediction = bestPredictions)
kable(head(testPredictions), longtable = TRUE, booktabs = TRUE, row.names = TRUE)
original | prediction | |
---|---|---|
1 | 0 | 0.1882636 |
2 | 0 | 0.0890647 |
3 | 0 | 0.0888415 |
4 | 0 | 0.2131941 |
5 | 0 | 0.0966023 |
6 | 0 | 0.0956107 |
We need to generate Feature Impact for the model before we can get reason codes from that model. Feature Impact, which is available for all model types, works by altering input data and observing the effect on a model’s score. It is an on-demand feature, meaning that you must initiate a calculation to see the results. Once you have had DataRobot compute the feature impact for a model, that information is saved with the project; you do not need to recalculate feature impact each time you re-open the project or each time you request reason codes on new data.
Feature Impact for a given column measures how much worse a model’s error score would be if DataRobot made predictions after randomly shuffling that column (while leaving other columns unchanged). This technique is sometimes called Permutation Importance.
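The idea behind Permutation Importance can be illustrated with a small self-contained sketch in base R. This is only a conceptual illustration, not DataRobot's implementation; the toy data, the lm model, and the RMSE error metric are all assumptions made for the example:

```r
# Conceptual sketch of permutation importance: shuffle one column,
# re-score the model, and measure how much the error worsens.
set.seed(42)
n <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- df$x1 + 0.1 * df$x2 + rnorm(n, sd = 0.1)  # x1 matters far more than x2

fit <- lm(y ~ x1 + x2, data = df)
baseRMSE <- sqrt(mean((df$y - predict(fit, df))^2))

permutationImportance <- function(feature) {
  shuffled <- df
  # Randomly shuffle this column, leaving the others unchanged
  shuffled[[feature]] <- sample(shuffled[[feature]])
  sqrt(mean((df$y - predict(fit, shuffled))^2)) - baseRMSE
}

sapply(c("x1", "x2"), permutationImportance)
# Shuffling x1 degrades the error far more than shuffling x2,
# so x1 receives the larger importance.
```

DataRobot applies this same principle to each feature using the project's chosen error metric, then normalizes the results so the most impactful feature has an impactNormalized of 1.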
featureImpactJobId <- RequestFeatureImpact(bestModel)
featureImpact <- GetFeatureImpactForJobId(project, featureImpactJobId, maxWait = 1200)
# Print the top 10 features
kable(featureImpact[1:10,], longtable = TRUE, booktabs = TRUE, row.names = TRUE)
featureName | impactNormalized | impactUnnormalized | |
---|---|---|---|
1 | purpose_cat | 1.0000000 | 0.0684926 |
2 | total_acc | 0.6393432 | 0.0437903 |
3 | Notes | 0.3740867 | 0.0256222 |
4 | open_acc | 0.2490012 | 0.0170547 |
5 | annual_inc | 0.1822035 | 0.0124796 |
6 | revol_util | 0.1383699 | 0.0094773 |
7 | inq_last_6mths | 0.1377131 | 0.0094323 |
8 | emp_title | 0.1308886 | 0.0089649 |
9 | purpose | 0.0708564 | 0.0048531 |
10 | emp_length | 0.0614025 | 0.0042056 |
For each prediction, DataRobot provides an ordered list of reasons; the number of reasons is controlled by the maxCodes setting described below. Each reason is a feature from the dataset and its corresponding value, accompanied by a qualitative indicator of the reason’s strength: strong (+++), medium (++), or weak (+) positive influence, or the corresponding negative (-) influence.
There are three main inputs you can set for DataRobot to use when computing reason codes:
1. maxCodes: the number of reasons to return for each prediction. Default is 3.
2. thresholdLow: probability threshold below which DataRobot should calculate reason codes.
3. thresholdHigh: probability threshold above which DataRobot should calculate reason codes.
# Calculate reason codes
reasonCodeJobID <- RequestReasonCodesInitialization(bestModel)
reasonCodeJobIDInitialization <- GetReasonCodesInitializationFromJobId(project, reasonCodeJobID)
# Calculate reason codes for our dataset
reasonCodeRequest <- RequestReasonCodes(bestModel, dataset$id, maxCodes = 3, thresholdLow = 0.25, thresholdHigh = 0.75)
# Get the reason codes we calculated
reasonCodeRequestMetaData <- GetReasonCodesMetadataFromJobId(project, reasonCodeRequest, maxWait = 1800)
reasonCodeMetadata <- GetReasonCodesMetadata(project, reasonCodeRequestMetaData$id)
reasonCodeAsDataFrame <- GetAllReasonCodesRowsAsDataFrame(project, reasonCodeRequestMetaData$id)
reasonCodeAsDataFrame$rowId <- NULL
# Subset the top 3 and bottom 3 predictions
reasonCodeAsDataFrameTopBottom <- rbind(reasonCodeAsDataFrame[order(reasonCodeAsDataFrame$class1Probability),][1:3,],
reasonCodeAsDataFrame[order(reasonCodeAsDataFrame$class2Probability),][1:3,])
kable(head(reasonCodeAsDataFrameTopBottom), longtable = TRUE, booktabs = TRUE, row.names = TRUE)
prediction | class1Label | class1Probability | class2Label | class2Probability | reason1FeatureName | reason1FeatureValue | reason1QualitativeStrength | reason1Strength | reason1Label | reason2FeatureName | reason2FeatureValue | reason2QualitativeStrength | reason2Strength | reason2Label | reason3FeatureName | reason3FeatureValue | reason3QualitativeStrength | reason3Strength | reason3Label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
590 | 0 | 1 | 0.0152559 | 0 | 0.9847441 | inq_last_6mths | 5 | — | -1.1721930 | 1 | purpose | CDT CARD RAISING RATE FR/17 TO 27 % | – | -0.4290407 | 1 | total_acc | 38 | – | -0.3091894 | 1 |
823 | 0 | 1 | 0.0170943 | 0 | 0.9829057 | inq_last_6mths | 5 | — | -1.2399844 | 1 | annual_inc | 230000 | – | -0.6646366 | 1 | total_acc | 40 | – | -0.4963727 | 1 |
596 | 0 | 1 | 0.0196080 | 0 | 0.9803920 | inq_last_6mths | 6 | — | -0.9582067 | 1 | total_acc | 28 | – | -0.3250731 | 1 | annual_inc | 110000 | – | -0.1900297 | 1 |
600 | 1 | 1 | 0.9692447 | 0 | 0.0307553 | purpose_cat | credit card small business | +++ | 5.1677089 | 1 | inq_last_6mths | 2 | + | 0.2373998 | 1 | revol_util | 84.1 | + | 0.2321250 | 1 |
414 | 1 | 1 | 0.9651950 | 0 | 0.0348050 | purpose_cat | major purchase small business | +++ | 4.9394082 | 1 | Notes | ++ | 0.3794873 | 1 | annual_inc | 38400 | + | 0.1755247 | 1 | |
540 | 1 | 1 | 0.9576436 | 0 | 0.0423564 | purpose_cat | small business small business | +++ | 4.5879754 | 1 | open_acc | 5 | – | -0.8902392 | 1 | earliest_cr_line (Year) | 2005 | - | -0.3356800 | 1 |
From the example above, you could answer “Why did the model give one of the customers a 97% probability of defaulting?” The top reason is that the loan’s purpose_cat was “credit card small business”, and the example also shows that whenever the model predicts a high probability of default, purpose_cat is related to small business.
Some notes on reasons:
- If the data points are very similar, the reasons can list the same rounded values.
- It is possible to have a reason state of MISSING if a “missing value” was important in making the prediction.
- Typically, the top reasons for a prediction have the same direction as the outcome, but it’s possible that with interaction effects or correlations among variables a reason could, for instance, have a strong positive impact on a negative prediction.
In some projects – such as insurance projects – the prediction adjusted by exposure is more useful to look at than the raw prediction. For example, in a project with an exposure column, the raw prediction (e.g. claim counts) is divided by the exposure (e.g. time), so the adjusted prediction provides insight into the predicted claim counts per unit of time. To include that information, set excludeAdjustedPredictions to FALSE in the corresponding method calls.
reasonCodeAsDataFrameWithExposure <- GetAllReasonCodesRowsAsDataFrame(project, reasonCodeRequestMetaData$id, excludeAdjustedPredictions = FALSE)
kable(head(reasonCodeAsDataFrameWithExposure), longtable = TRUE, booktabs = TRUE, row.names = TRUE)
rowId | prediction | class1Label | class1Probability | class2Label | class2Probability | reason1FeatureName | reason1FeatureValue | reason1QualitativeStrength | reason1Strength | reason1Label | reason2FeatureName | reason2FeatureValue | reason2QualitativeStrength | reason2Strength | reason2Label | reason3FeatureName | reason3FeatureValue | reason3QualitativeStrength | reason3Strength | reason3Label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | 1 | 0.0673742 | 0 | 0.9326258 | revol_util | 37.1 | — | -0.2352395 | 1 | earliest_cr_line (Month) | 9 | ++ | 0.1299938 | 1 | mths_since_last_delinq | 16 | ++ | 0.1047859 | 1 |
2 | 1 | 0 | 1 | 0.1144156 | 0 | 0.8855844 | total_acc | 27 | — | -0.5951351 | 1 | inq_last_6mths | 2 | ++ | 0.2421135 | 1 | annual_inc | 50000 | – | -0.1259684 | 1 |
3 | 2 | 0 | 1 | 0.0946424 | 0 | 0.9053576 | total_acc | 23 | — | -0.5863631 | 1 | annual_inc | 40000 | ++ | 0.2572561 | 1 | purpose | FICO score 762 want’s to buy a new car | – | -0.2285383 | 1 |
4 | 3 | 0 | 1 | 0.0532636 | 0 | 0.9467364 | total_acc | 24 | — | -0.5805456 | 1 | annual_inc | 112000 | – | -0.3023279 | 1 | revol_util | 39.5 | – | -0.2645794 | 1 |
5 | 4 | 0 | 1 | 0.1360759 | 0 | 0.8639241 | total_acc | 19 | — | -0.3870679 | 1 | annual_inc | 30000 | ++ | 0.1768992 | 1 | revol_util | 62.1 | ++ | 0.1716356 | 1 |
6 | 5 | 0 | 1 | 0.0907338 | 0 | 0.9092662 | total_acc | 24 | — | -0.3785583 | 1 | inq_last_6mths | 3 | ++ | 0.2634575 | 1 | revol_util | 3.1 | – | -0.1964933 | 1 |
This note has described reason codes, which are useful for understanding why a model predicts high or low values for a specific case. DataRobot also provides the qualitative strength of each reason. Reason codes can support a good business strategy by turning the reasons responsible for high or low predictions into prescriptions, and they are also useful for explaining actions taken on the basis of model predictions to regulatory or compliance departments within an organization.