This vignette shows how to calculate marginal effects at specific values or levels for the terms of interest. It is recommended to read the general introduction first, if you haven’t done this yet.
The terms-argument not only defines the model terms of interest, but each model term can be limited to certain values. This allows you to compute and plot marginal effects for (grouping) terms at specific values only, or to define values for the main effect of interest.
There are several options to define these values, which should always be placed in square brackets directly after the term name and can vary for each model term.
- Concrete values, separated by a comma: terms = "c172code [1,3]". For factors, you could also use factor levels, e.g. terms = "Species [setosa,versicolor]".
- Ranges, specified with a colon: terms = c("c12hour [30:80]", "c172code [1,3]"). This would plot all values from 30 to 80 for the variable c12hour.
- Shortcuts for common values such as the mean +/- 1 SD (terms = "c12hour [meansd]"), quartiles (terms = "c12hour [quart]") or minimum and maximum values (terms = "c12hour [minmax]").
- The name of a function that is applied to the values of the term, e.g. terms = "hp [exp]".
- A pretty range, for numeric variables. In this case, ggpredict() tries to calculate a pretty numeric range based on how large the range of the related variable is. Example: terms = "age [pretty]". This is what the pretty-argument automatically does when a term has more than 25 unique values, so terms = "... [pretty]" gives you the option to selectively prettify terms.
library(ggeffects)
library(ggplot2)
data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
mydf <- ggpredict(fit, terms = c("c12hour [30:80]", "c172code [1,3]"))
mydf
#> # A tibble: 22 x 5
#> x predicted conf.low conf.high group
#> <int> <dbl> <dbl> <dbl> <fct>
#> 1 30 67.1 64.0 70.3 low level of education
#> 2 30 68.6 65.4 71.8 high level of education
#> 3 35 65.9 62.8 69.0 low level of education
#> 4 35 67.3 64.2 70.5 high level of education
#> 5 40 64.6 61.6 67.7 low level of education
#> 6 40 66.1 62.9 69.2 high level of education
#> 7 45 63.3 60.3 66.4 low level of education
#> 8 45 64.8 61.6 68.0 high level of education
#> 9 50 62.1 59.0 65.1 low level of education
#> 10 50 63.5 60.3 66.7 high level of education
#> # ... with 12 more rows
ggplot(mydf, aes(x, predicted, colour = group)) + geom_line()
Defining value ranges is especially useful when variables are, for instance, log-transformed. ggpredict() then typically only uses the range of the log-transformed variable, which is in most cases not what we want. In such situations, specify the range in the terms-argument.
data(mtcars)
mpg_model <- lm(mpg ~ log(hp), data = mtcars)
# x-values and predictions based on the log(hp)-values
ggpredict(mpg_model, "hp")
#> # A tibble: 22 x 5
#> x predicted conf.low conf.high group
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 3.95 57.9 49.4 66.3 1
#> 2 4.13 57.4 49.0 65.8 1
#> 3 4.17 57.3 48.9 65.6 1
#> 4 4.19 57.2 48.9 65.6 1
#> 5 4.51 56.4 48.2 64.6 1
#> 6 4.53 56.4 48.2 64.5 1
#> 7 4.55 56.3 48.2 64.5 1
#> 8 4.57 56.3 48.1 64.4 1
#> 9 4.65 56.1 48.0 64.2 1
#> 10 4.69 56.0 47.9 64.1 1
#> # ... with 12 more rows
# x-values and predictions based on hp-values from 50 to 150
ggpredict(mpg_model, "hp [50:150]")
#> # A tibble: 101 x 5
#> x predicted conf.low conf.high group
#> <int> <dbl> <dbl> <dbl> <fct>
#> 1 50 30.5 27.9 33.1 1
#> 2 51 30.3 27.8 32.9 1
#> 3 52 30.1 27.6 32.6 1
#> 4 53 29.9 27.4 32.4 1
#> 5 54 29.7 27.3 32.1 1
#> 6 55 29.5 27.1 31.9 1
#> 7 56 29.3 27.0 31.7 1
#> 8 57 29.1 26.8 31.4 1
#> 9 58 28.9 26.7 31.2 1
#> 10 59 28.7 26.5 31.0 1
#> # ... with 91 more rows
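To see what the explicit range does, the same predictions can be reproduced by hand with base R. This is only a sketch, assuming that for a simple lm() the call ggpredict(mpg_model, "hp [50:150]") amounts to predicting on a grid of hp-values. The object names grid, pred and manual are made up for illustration, and the confidence limits may differ slightly from those of ggpredict(), depending on how it computes intervals.

```r
data(mtcars)
mpg_model <- lm(mpg ~ log(hp), data = mtcars)

# grid of hp-values on the original scale, matching "hp [50:150]"
grid <- data.frame(hp = 50:150)

# predict() applies log() to hp itself, because the transformation
# is part of the model formula
pred <- predict(mpg_model, newdata = grid, interval = "confidence")

# combine x-values, predictions and confidence limits
manual <- data.frame(
  x = grid$hp,
  predicted = pred[, "fit"],
  conf.low = pred[, "lwr"],
  conf.high = pred[, "upr"]
)
head(manual)
```

Because the grid is defined on the original hp-scale, the x-values stay interpretable even though the model uses log(hp).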
Especially in situations where we have two continuous variables in interaction terms, or where the “grouping” variable is continuous, it is helpful to select specific values of the grouping variable. Otherwise, predictions would be made for too many groups, which is no longer helpful when interpreting marginal effects.
You can use:
- "minmax": minimum and maximum values (lower and upper bounds) of the variable are used.
- "meansd": uses the mean value as well as one standard deviation below and above the mean value.
- "zeromax": similar to the "minmax" option, however, 0 is always used as minimum value. This may be useful for predictors that don’t have an empirical zero-value.
- "quart": calculates and uses the quartiles (lower, median and upper), including minimum and maximum value.
- "quart2": calculates and uses the quartiles (lower, median and upper), excluding minimum and maximum value.
data(efc)
# short variable label, for plot
attr(efc$c12hour, "label") <- "hours of care"
fit <- lm(barthtot ~ c12hour * c161sex + neg_c_7, data = efc)
mydf <- ggpredict(fit, terms = c("c161sex", "c12hour [meansd]"))
plot(mydf)
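The shortcut tokens listed above can be approximated in base R. The helper shortcut_values() below is hypothetical and not part of ggeffects; it only sketches which values each token stands for, and ggeffects may round or compute them slightly differently.

```r
# Hypothetical helper (not a ggeffects function): returns the values
# that each shortcut token roughly stands for.
shortcut_values <- function(x, token = c("minmax", "meansd", "zeromax", "quart", "quart2")) {
  token <- match.arg(token)
  x <- x[!is.na(x)]
  switch(token,
    minmax  = c(min(x), max(x)),                  # lower and upper bound
    meansd  = mean(x) + c(-1, 0, 1) * sd(x),      # mean - SD, mean, mean + SD
    zeromax = c(0, max(x)),                       # zero and upper bound
    quart   = unname(quantile(x)),                # min, quartiles, max
    quart2  = unname(quantile(x, probs = c(0.25, 0.5, 0.75)))  # quartiles only
  )
}

data(mtcars)
shortcut_values(mtcars$hp, "meansd")
shortcut_values(mtcars$hp, "quart2")
```

Each token yields only two to five values, which keeps the number of groups in a plot small.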
The brackets in the terms-argument also accept the name of a valid function, to (back-)transform the values of the term. In the log(hp) example above, an alternative would be to specify that values should be exponentiated, which is indicated by [exp] in the terms-argument:
# x-values and predictions based on exponentiated hp-values
ggpredict(mpg_model, "hp [exp]")
#> # A tibble: 22 x 5
#> x predicted conf.low conf.high group
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 52. 30.1 27.6 32.6 1
#> 2 62. 28.2 26.1 30.3 1
#> 3 65.0 27.7 25.7 29.7 1
#> 4 66.0 27.5 25.5 29.5 1
#> 5 91.0 24.1 22.7 25.5 1
#> 6 93. 23.9 22.4 25.3 1
#> 7 95 23.6 22.3 25.0 1
#> 8 97 23.4 22.1 24.7 1
#> 9 105. 22.5 21.3 23.8 1
#> 10 109. 22.1 20.9 23.4 1
#> # ... with 12 more rows
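The x-values in the earlier output without a range (3.95, 4.13, ...) are the unique values of log(hp), and [exp] maps them back to the original scale. A base R sketch of that back-transformation, with x_log and x_back as illustrative names:

```r
data(mtcars)

# unique values of the log-transformed predictor, as used in the model
x_log <- sort(unique(log(mtcars$hp)))

# applying exp() recovers the original hp-scale
x_back <- exp(x_log)

round(head(x_log, 3), 2)
head(x_back, 3)
```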
This section is intended to show some examples of how the plotted output differs, depending on which value range is used. To see the difference in the “curvilinear” trend, we use a quadratic term on a standardized variable.
library(sjmisc)
data(efc)
efc$c12hour <- std(efc$c12hour)
m <- lm(barthtot ~ c12hour + I(c12hour^2) + neg_c_7 + c160age + c172code, data = efc)
me <- ggpredict(m, terms = "c12hour")
plot(me)
ggpredict() prints a message, which says that there are many unique values for the variable of interest, so these were “prettified”, resulting in a smaller set of unique values. This is less memory consuming and may be needed especially for more complex models.
You can turn off automatic “prettifying” with the pretty-argument.
This results in a smooth plot, as all values from the term of interest are taken into account.
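The idea behind prettifying can be illustrated with base R's pretty(), which returns a short sequence of round values covering the range of a vector. This is only an analogy, under the assumption that ggeffects does something comparable; its internal helper may choose different breakpoints.

```r
data(mtcars)
x <- mtcars$disp

# number of distinct observed values
n_unique <- length(unique(x))

# a short sequence of round values covering the range of x
x_pretty <- pretty(x)

n_unique
x_pretty
```

Predicting at the few values in x_pretty instead of at every observed value is cheaper, but the resulting curve is built from fewer points, which is why a plot based on all values looks smoother.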
By default, the typical-argument determines which function will be applied to the covariates to hold these terms constant. Use the condition-argument to define specific values at which a covariate should be held constant. condition requires a named vector, with the name indicating the covariate.
data(mtcars)
mpg_model <- lm(mpg ~ log(hp) + disp, data = mtcars)
# "disp" is held constant at its mean
ggpredict(mpg_model, "hp [exp]")
#> # A tibble: 22 x 5
#> x predicted conf.low conf.high group
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 52. 25.6 21.9 29.3 1
#> 2 62. 24.6 21.5 27.6 1
#> 3 65.0 24.3 21.4 27.2 1
#> 4 66.0 24.2 21.4 27.0 1
#> 5 91.0 22.3 20.6 24.0 1
#> 6 93. 22.2 20.5 23.8 1
#> 7 95 22.0 20.4 23.6 1
#> 8 97 21.9 20.4 23.4 1
#> 9 105. 21.4 20.1 22.8 1
#> 10 109. 21.2 20.0 22.5 1
#> # ... with 12 more rows
# "disp" is held constant at value 200
ggpredict(mpg_model, "hp [exp]", condition = c(disp = 200))
#> # A tibble: 22 x 5
#> x predicted conf.low conf.high group
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 52. 26.3 23.0 29.6 1
#> 2 62. 25.3 22.6 28.0 1
#> 3 65.0 25.0 22.4 27.5 1
#> 4 66.0 24.9 22.4 27.4 1
#> 5 91.0 23.0 21.5 24.4 1
#> 6 93. 22.8 21.4 24.3 1
#> 7 95 22.7 21.4 24.1 1
#> 8 97 22.6 21.3 23.9 1
#> 9 105. 22.1 21.0 23.3 1
#> 10 109. 21.9 20.8 23.0 1
#> # ... with 12 more rows
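What holding a covariate constant means can be sketched with base R's predict(). The example assumes that, for this lm(), the two calls above differ only in the value of disp in the prediction grid; the object names at_mean, at_200, p_mean and p_200 are made up for illustration.

```r
data(mtcars)
mpg_model <- lm(mpg ~ log(hp) + disp, data = mtcars)

# covariate held at its mean
at_mean <- data.frame(hp = 52, disp = mean(mtcars$disp))
# covariate held at a user-defined value, as with condition = c(disp = 200)
at_200 <- data.frame(hp = 52, disp = 200)

p_mean <- predict(mpg_model, newdata = at_mean)
p_200 <- predict(mpg_model, newdata = at_200)

c(mean = unname(p_mean), fixed = unname(p_200))
```

Only the value of disp changes between the two predictions, so the difference between them reflects the shift in the covariate alone.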