The fmeffects package computes, aggregates, and visualizes forward
marginal effects (FMEs) for any supervised machine learning model. Read
how they are computed, or see the research paper, for a more in-depth
understanding. There are three main functions:
fme() computes FMEs for a given model, data, feature of interest, and
step size.
came() can be applied subsequently to find subspaces of the feature
space where FMEs are more homogeneous.
ame() provides an overview of the prediction function w.r.t. each
feature by using average marginal effects (AMEs) based on FMEs.
For demonstration purposes, we consider usage data from the Capital
Bike Sharing scheme (Fanaee-T and Gama, 2014). It contains information
about bike sharing usage in Washington, D.C. for the years 2011-2012
during the period from 7 to 8 a.m. We are interested in predicting
count (the total number of bikes lent out to users).
## Classes 'data.table' and 'data.frame': 727 obs. of 11 variables:
## $ season : Factor w/ 4 levels "fall","spring",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ year : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ month : num 1 1 1 1 1 1 1 1 1 1 ...
## $ holiday : Factor w/ 2 levels "True","False": 2 2 2 2 2 2 2 2 2 2 ...
## $ weekday : Factor w/ 7 levels "Sun","Mon","Tue",..: 7 1 2 3 4 5 6 7 1 2 ...
## $ workingday: Factor w/ 2 levels "True","False": 2 2 1 1 1 1 1 2 2 1 ...
## $ weather : Factor w/ 3 levels "clear","misty",..: 1 2 1 1 1 2 1 2 1 1 ...
## $ temp : num 8.2 16.4 5.74 4.92 7.38 6.56 8.2 6.56 3.28 4.92 ...
## $ humidity : num 0.86 0.76 0.5 0.74 0.43 0.59 0.69 0.74 0.53 0.5 ...
## $ windspeed : num 0 13 13 9 13 ...
## $ count : num 3 1 64 94 88 95 84 9 6 77 ...
## - attr(*, ".internal.selfref")=<externalptr>
FMEs are a model-agnostic interpretation method, i.e., they can be
applied to any regression or (binary) classification model. Before we
can compute FMEs, we need a trained model. The fmeffects package
supports models from the mlr3, tidymodels (parsnip) and caret
libraries. Let’s try it with a random forest using the ranger
algorithm:
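set.seed(123)
library(mlr3verse)
library(ranger)
# Define a regression task with count as the target
task = as_task_regr(x = bikes, id = "bikes", target = "count")
# Train a random forest (ranger) learner on the task
forest = lrn("regr.ranger")$train(task)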
FMEs can be used to compute feature effects for both numerical and
categorical features. This can be done with the fme() function.
The most common application is to compute the FME for a single
numerical feature, i.e., a univariate feature effect. The variable of
interest must be specified with the feature argument. In this case,
step.size can be any number deemed most useful for the purpose of
interpretation. Most of the time, this will be a unit change, e.g.,
step.size = 1. Because the concept of numerical FMEs extends to
multivariate feature effects, fme() can also compute a bivariate
feature effect. In this case, feature needs to be supplied with the
names of two numerical features, and step.size requires a vector, e.g.,
step.size = c(1, 1).
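The idea behind a numerical FME is a difference in predictions: the prediction after moving the feature(s) by the step, minus the prediction at the observed values. The following is a minimal base R sketch of that idea, not the package's implementation. It assumes a plain lm fit on the built-in mtcars data as a stand-in model, and forward_effect() is a hypothetical helper name introduced only for illustration.

```r
# Stand-in model (assumption): a linear model with a quadratic term,
# so that observation-wise effects differ across observations.
fit_num <- lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)

# Hypothetical helper: shift the named features by their step sizes and
# return the change in prediction for every observation.
forward_effect <- function(model, data, feature, step.size) {
  shifted <- data
  for (i in seq_along(feature)) {
    shifted[[feature[i]]] <- shifted[[feature[i]]] + step.size[i]
  }
  unname(predict(model, newdata = shifted) - predict(model, newdata = data))
}

# Univariate effect: increase wt by 1
fme_wt <- forward_effect(fit_num, mtcars, feature = "wt", step.size = 1)

# Bivariate effect: increase wt by 1 and hp by 10 at the same time
fme_wt_hp <- forward_effect(fit_num, mtcars,
                            feature = c("wt", "hp"), step.size = c(1, 10))

head(fme_wt)
```

Because the stand-in model is nonlinear in wt, each observation gets its own effect, which is what makes the observation-wise view informative.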
Assume we are interested in the effect of temperature on bike sharing
usage. Specifically, we set step.size = 1 to investigate the FME of an
increase in temperature by 1 degree Celsius (°C). Thus, we compute FMEs
for feature = "temp" and step.size = 1.
effects = fme(model = forest,
              data = bikes,
              target = "count",
              feature = "temp",
              step.size = 1,
              ep.method = "envelope")
Note that we have specified ep.method = "envelope". This means we
exclude observations for which adding 1°C to the temperature results in
the temperature value falling outside the range of temp in the overall
data. Thereby, we reduce the risk of asking the model to extrapolate.
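The range check described above can be sketched for a single feature in a few lines of base R. This is only a reading of the description, not the package's code: inside_envelope() is a hypothetical helper, the temperatures are made-up toy values, and the package may handle several features or edge cases differently.

```r
# Hypothetical helper: TRUE where the shifted value stays inside the
# observed range of the feature, FALSE where it would leave that range.
inside_envelope <- function(x, step.size) {
  shifted <- x + step.size
  shifted >= min(x) & shifted <= max(x)
}

# Toy temperatures (made-up values for illustration only)
temp_toy <- c(3.5, 8.0, 16.5, 29.0, 31.5)

# Flag observations that remain in range after a 1-degree increase
keep <- inside_envelope(temp_toy, step.size = 1)
keep

# Only the flagged observations would be used to compute effects
temp_toy[keep]
```

With a positive step, the observations closest to the maximum are the ones that drop out; with a negative step, those closest to the minimum do.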
plot(effects)
## `geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'
The black arrow indicates direction and magnitude of step.size. The
horizontal line is the average marginal effect (AME). The AME is
computed as a simple mean over all observation-wise FMEs. Therefore, on
average, the FME of a temperature increase of 1°C on bike sharing usage
is roughly 2.4. As can be seen, the observation-wise effects seem to
vary for different values of temp. While the FME tends to be positive
for lower temperature values (0-20°C), it turns negative for higher
temperature values (>20°C).
We can also extract aggregate information, such as the AME, from the
effects object:
## [1] 2.364761
For a more in-depth analysis, we can inspect the FME for each observation in the data set:
## obs.id fme
## 1: 1 1.831311
## 2: 2 3.148305
## 3: 3 5.689283
## 4: 4 -0.942657
## 5: 5 2.081746
## 6: 6 4.813296
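Since the AME is a simple mean over observation-wise FMEs, the aggregation step can be illustrated with the six values printed above. This is a small base R sketch; fme_head is a name introduced here, and the result covers only these six rows, so it is not expected to match the AME of the full data set.

```r
# First six observation-wise FMEs, copied from the output above
fme_head <- c(1.831311, 3.148305, 5.689283, -0.942657, 2.081746, 4.813296)

# Mean over these six observations only (not the AME of the full data set)
mean(fme_head)

# Share of these observations with a negative effect
mean(fme_head < 0)
```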
Bivariate feature effects can be considered when one is interested in
the combined effect of two features on the target variable. Let’s assume
we want to estimate the effect of a decrease in temperature by 3°C,
combined with a decrease in humidity by 10 percentage points, i.e., the
FME for feature = c("temp", "humidity") and step.size = c(-3, -0.1):
effects2 = fme(model = forest,
               data = bikes,
               target = "count",
               feature = c("temp", "humidity"),
               step.size = c(-3, -0.1),
               ep.method = "envelope")
plot(effects2, jitter = c(0.1, 0.02))
The plot for bivariate FMEs uses a color scale to indicate direction and magnitude of the estimated effect. Let’s check the AME:
## [1] -2.796907
A combined decrease in temperature by 3°C and humidity by 10 percentage points seems to result in slightly lower bike sharing usage (on average). However, a quick check of the variance of the FMEs implies that effects are highly heterogeneous:
## [1] 591.1291
Therefore, it could be interesting to move the interpretation of
feature effects from a global to a semi-global perspective via the
came() function.
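To see why the variance points to heterogeneity, it helps to put it on the same scale as the AME. The sketch below uses only the two numbers reported above for the bivariate effect; the variable names are introduced here for illustration.

```r
ame_biv <- -2.796907   # AME of the bivariate effect, as reported above
var_biv <- 591.1291    # variance of the FMEs, as reported above

# Standard deviation of the FMEs, in the same units as the AME
sd_biv <- sqrt(var_biv)

# Spread of the effects relative to the size of the average effect
ratio <- sd_biv / abs(ame_biv)

sd_biv
ratio
```

A spread several times larger than the average effect suggests that the AME alone says little about individual observations.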
For a categorical feature, the FME of an observation is simply the
difference in predictions when changing the observed category of the
feature to the category specified in step.size. For instance, one could
be interested in the effect of rainy weather on the bike sharing
demand, i.e., the FME of changing the feature value of weather to rain
for observations where weather is either clear or misty:
effects3 = fme(model = forest,
               data = bikes,
               target = "count",
               feature = "weather",
               step.size = "rain")
summary(effects3)
##
## Forward Marginal Effects Object
##
## Step type:
## categorical
##
## Feature & reference category:
## weather, rain
##
## Extrapolation point detection:
## none, EPs: 0 of 657 obs. (0 %)
##
## Average Marginal Effect (AME):
## -55.5029
Here, the AME of rain is about -55.5. Therefore, while holding all
other features constant, a change to rainy weather can be expected to
reduce bike sharing usage by roughly 55 bikes on average.
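The categorical case can be sketched in base R as well: switch the category for every observation that is not already in the target category, then take the difference in predictions. This is an illustration under stated assumptions, not the package's implementation. It uses an lm fit on the built-in iris data as a stand-in model, with Species playing the role of weather.

```r
# Stand-in model (assumption): linear model on the built-in iris data
fit_cat <- lm(Sepal.Length ~ Species + Petal.Width, data = iris)

# Only observations that are not already in the target category take part
target_category <- "virginica"
subset_data <- iris[iris$Species != target_category, ]

# Switch the category for every remaining observation, keeping the levels
switched <- subset_data
switched$Species <- factor(rep(target_category, nrow(switched)),
                           levels = levels(iris$Species))

# Observation-wise effect: prediction after the switch minus before
fme_cat <- unname(predict(fit_cat, newdata = switched) -
                    predict(fit_cat, newdata = subset_data))

length(fme_cat)  # number of observations that took part
mean(fme_cat)    # average effect of switching to the target category
```

Observations already in the target category are left out, which is consistent with the summary above reporting 657 observations for the switch to rain.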
For categorical feature effects, we can plot the empirical distribution
of the FMEs:
For an informative overview of all feature effects in a model, we can
use the ame() function:
## Feature step.size AME SD 0.25 0.75 n
## 1 season spring -29.5627 30.3933 -38.9776 -6.47 548
## 2 season summer 0.3712 22.2538 -8.3257 11.5291 543
## 3 season fall 13.9269 28.0969 -0.2271 35.7786 539
## 4 season winter 14.6231 24.5739 1.2331 25.8998 551
## 5 year 0 -100.0511 67.9522 -158.5412 -20.643 364
## 6 year 1 97.9793 61.0461 23.7845 149.0662 363
## 7 month 1 4.1008 13.1329 -1.2309 7.4386 727
## 8 holiday False -1.7886 21.8287 -9.6724 8.3443 21
## 9 holiday True -13.4861 25.6105 -32.4273 6.4777 706
## 10 weekday Sat -54.0185 48.8713 -85.2344 -16.5142 622
## 11 weekday Sun -82.8857 55.7827 -119.0325 -32.2624 622
## 12 weekday Mon 10.1004 27.9977 -8.8229 30.4342 623
## 13 weekday Tue 17.1576 24.7033 0.5063 32.5038 625
## 14 weekday Wed 20.3346 22.484 1.3541 34.6645 623
## 15 weekday Thu 19.5628 23.6865 -0.4163 35.5117 624
## 16 weekday Fri 1.3505 35.6711 -25.4026 29.6176 623
## 17 workingday False -204.8757 89.7998 -259.5304 -143.91 496
## 18 workingday True 162.7476 63.766 121.3106 210.1368 231
## 19 weather clear 26.0338 41.5209 3.8218 24.2506 284
## 20 weather misty 3.0396 32.1851 -8.7945 1.1693 513
## 21 weather rain -55.5029 52.9214 -93.484 -3.5707 657
## 22 temp 1 2.341 7.1894 -0.5878 4.3155 727
## 23 humidity 0.01 -0.2617 2.7596 -0.3505 0.434 727
## 24 windspeed 1 0.0207 2.1497 -0.1694 0.2686 727
This computes the AME for each feature included in the model, with a
default step size of 1 for numerical features (or 0.01 if their range
is smaller than 1). For categorical features, AMEs are computed for all
available categories.
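The default step rule stated above can be written as a one-line function. This is a sketch of the rule as described in the text, not the package's code; default_step() is a hypothetical helper, and the example vectors are the first five temp and humidity values from the data listing earlier.

```r
# Hypothetical helper mirroring the rule stated above: step 1 by default,
# 0.01 when the feature's observed range is smaller than 1.
default_step <- function(x) {
  if (diff(range(x)) < 1) 0.01 else 1
}

# First five values of each feature, taken from the data listing earlier
humidity_head <- c(0.86, 0.76, 0.5, 0.74, 0.43)
temp_head     <- c(8.2, 16.4, 5.74, 4.92, 7.38)

default_step(humidity_head)
default_step(temp_head)
```

This matches the step.size column in the overview table, where humidity uses 0.01 and temp uses 1.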
——
We can use came() on a specific FME object to compute subspaces of the
feature space where FMEs are more homogeneous. Let’s take the effect of
a decrease in temperature by 3°C combined with a decrease in humidity
by 10 percentage points, and see if we can find three appropriate
subspaces.
##
## PartitioningCtree of an FME object
##
## Method: partitions = 3
##
## n cAME SD(fME)
## 718 -2.796907 24.31315 *
## 649 -4.871194 22.35313
## 49 6.346359 21.69575
## 20 42.112717 39.89350
## ---
## * root node (non-partitioned)
##
## AME (Global): -2.7969
As can be seen, the CTREE algorithm was used to partition the feature space into three subspaces. The coefficient of variation (CoV) is used as a criterion to measure homogeneity in each subspace. We can see that the CoV is substantially smaller in each of the subspaces than in the root node, i.e., the global feature space. The conditional AME (cAME) can be used to interpret how the expected FME varies across the subspaces. Let’s visualize our results:
In this case, we get a decision tree that assigns observations to a
feature subspace according to the weather situation (weather) and the
day of the week (weekday). The information contained in the boxes below
the terminal nodes is equivalent to the summary output and can be
extracted from subspaces$results. With cAMEs of -4.87, 6.35, and 42.11,
respectively, the expected FME is estimated to vary substantially in
direction and magnitude across the subspaces. For example, the cAME is
highest on rainy days. It turns negative on non-rainy days in spring,
summer and winter.
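The homogeneity criterion and the cAMEs discussed above can be recomputed from the summary table. The sketch below copies n, cAME and SD from that output and computes a coefficient of variation as SD divided by the absolute cAME. That formula is an assumption made here for illustration; the package may define the CoV slightly differently.

```r
# Values copied from the summary output above (root node first)
nodes <- data.frame(
  n    = c(718, 649, 49, 20),
  cAME = c(-2.796907, -4.871194, 6.346359, 42.112717),
  SD   = c(24.31315, 22.35313, 21.69575, 39.89350)
)

# Coefficient of variation (assumed form): spread relative to the
# absolute size of the mean effect in each node
nodes$CoV <- nodes$SD / abs(nodes$cAME)
nodes

# Weighting the three cAMEs by subspace size should reproduce the global AME
subspaces_only <- nodes[-1, ]
weighted.mean(subspaces_only$cAME, w = subspaces_only$n)
```

Under this definition, each subspace has a smaller CoV than the root node even though the SD of the smallest subspace is larger, because its cAME is much larger in magnitude.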
Fanaee-T, H. and Gama, J. (2014). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence 2(2): 113–127.
Vanschoren, J., van Rijn, J. N., Bischl, B. and Torgo, L. (2013). OpenML: networked science in machine learning. SIGKDD Explorations 15(2): 49–60.