Note: This vignette requires version 0.8.0 of marginaleffects
, or the development version from Github.
This vignette replicates some of the analyses in this excellent blog post by Frank Harrell: Avoiding One-Number Summaries of Treatment Effects for RCTs with Binary Outcomes. Here, we show how one-number summaries and the entire distribution unit-level contrasts can be easily computed with comparisons()
.
Dr. Harrell discusses summaries from logistic regression models in the blog post above. He focuses on a context in which one is interested in comparing two groups, such as in randomized controlled trials. He highlights potential pitfalls of presenting “one-number summmaries”, e.g., odds ratio and mean proportion difference. Finally, he recommends focusing on the entire distribution of proportion difference between groups.
For clarification, we use the following terms interchangeably in the context of logistic regression where the covariate of interest is categorical:
We focus on subset data from the GUSTO-I study, where patients were randomly assigned to accelerated tissue plasminogen activator (tPA) or streptokinase (SK).
Load libraries, data and fit a covariate-adjusted logistic regression model.
library(marginaleffects)
library(modelsummary)
library(rms)
load(url(
"https://github.com/vincentarelbundock/modelarchive/raw/main/data-raw/gusto.rda"
))
subset(gusto, tx %in% c("tPA", "SK"))
gusto <-$tx <- factor(gusto$tx, levels = c("tPA", "SK"))
gusto
glm(
mod <-~ tx + rcs(age, 4) + Killip + pmin(sysbp, 120) + lsp(pulse, 50) +
day30 pmi + miloc + sex, family = "binomial",
data = gusto)
As usual, we can produce a one-number summary of the relationship of interest by exponentiating the coefficients, which yields an Odds Ratio (OR):
modelsummary(mod, exponentiate = TRUE, coef_omit = "^(?!txSK)")
Model 1 | |
---|---|
txSK | 1.230 |
(0.065) | |
Num.Obs. | 30510 |
AIC | 12428.6 |
BIC | 12553.5 |
Log.Lik. | −6199.317 |
F | 173.216 |
RMSE | 0.24 |
Unlike ORs, adjusted risk differences vary from individual to individual based on the values of the control variables. The comparisons()
function can compute adjusted risk differences for every individual. Here, we display only the first 6 of them:
comparisons(
mod,variables = "tx") |>
head()
#> rowid type term contrast comparison std.error statistic
#> 1 1 response tx SK - tPA 0.0010741928 0.0004966749 2.162768
#> 2 2 response tx SK - tPA 0.0008573104 0.0003799743 2.256233
#> 3 3 response tx SK - tPA 0.0017797796 0.0007784409 2.286339
#> 4 4 response tx SK - tPA 0.0011367499 0.0004999032 2.273940
#> 5 5 response tx SK - tPA 0.0013655083 0.0005934013 2.301155
#> 6 6 response tx SK - tPA 0.0024015964 0.0010127226 2.371426
#> p.value conf.low conf.high predicted predicted_hi
#> 1 0.03055900 0.0001007278 0.002047658 0.005769605 0.005769605
#> 2 0.02405605 0.0001125746 0.001602046 0.003742994 0.004600304
#> 3 0.02223446 0.0002540634 0.003305496 0.009589391 0.009589391
#> 4 0.02296960 0.0001569575 0.002116542 0.004970544 0.006107294
#> 5 0.02138288 0.0002024631 0.002528553 0.007343757 0.007343757
#> 6 0.01771961 0.0004166965 0.004386496 0.012975875 0.012975875
#> predicted_lo day30 tx Killip pmi miloc sex age pulse sysbp
#> 1 0.004695412 0 SK I no Anterior male 19.027 60 130
#> 2 0.003742994 0 tPA I no Inferior male 20.781 75 124
#> 3 0.007809612 0 SK I no Anterior male 20.969 85 135
#> 4 0.004970544 0 tPA I no Inferior male 20.984 90 129
#> 5 0.005978249 0 SK I no Anterior male 21.449 70 157
#> 6 0.010574279 0 SK I no Anterior female 22.523 84 135
Population-averaged (aka “marginal”) adjusted risk difference (see this vignette) can be obtained using the summary()
function:
comparisons(
mod,variables = "tx") |>
summary()
#> Term Contrast Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 tx SK - tPA 0.01108 0.002766 4.005 6.1955e-05 0.005658 0.0165
#>
#> Model type: glm
#> Prediction type: response
The comparisons()
function above computed the predicted probability of mortality (day30==1
) for each observed row of the data in two couterfactual cases: when tx
is “SK”, and when tx
is “tPA”. Then, it computed the differences between these two sets of predictions. Finally, it computed the population-average of risk differences.
Instead of risk differences, we could compute population-averaged (marginal) adjusted risk ratios:
comparisons(
mod,variables = "tx",
transform_pre = "lnratioavg",
transform_post = exp) |>
summary()
#> Term Contrast Effect Pr(>|z|) 2.5 % 97.5 %
#> 1 tx ln(mean(SK) / mean(tPA)) 1.177 9.8075e-05 1.085 1.278
#>
#> Model type: glm
#> Prediction type: response
#> Post-transformation: exp
Instead of estimating one-number summaries, we can focus on unit-level proportion differences using comparisons()
. This function applies the fitted logistic regression model to predict outcome probabilities for each patient, i.e., unit-level.
comparisons(mod, variables = "tx")
cmp <-
head(cmp)
#> rowid type term contrast comparison std.error statistic
#> 1 1 response tx SK - tPA 0.0010741928 0.0004966749 2.162768
#> 2 2 response tx SK - tPA 0.0008573104 0.0003799743 2.256233
#> 3 3 response tx SK - tPA 0.0017797796 0.0007784409 2.286339
#> 4 4 response tx SK - tPA 0.0011367499 0.0004999032 2.273940
#> 5 5 response tx SK - tPA 0.0013655083 0.0005934013 2.301155
#> 6 6 response tx SK - tPA 0.0024015964 0.0010127226 2.371426
#> p.value conf.low conf.high predicted predicted_hi
#> 1 0.03055900 0.0001007278 0.002047658 0.005769605 0.005769605
#> 2 0.02405605 0.0001125746 0.001602046 0.003742994 0.004600304
#> 3 0.02223446 0.0002540634 0.003305496 0.009589391 0.009589391
#> 4 0.02296960 0.0001569575 0.002116542 0.004970544 0.006107294
#> 5 0.02138288 0.0002024631 0.002528553 0.007343757 0.007343757
#> 6 0.01771961 0.0004166965 0.004386496 0.012975875 0.012975875
#> predicted_lo day30 tx Killip pmi miloc sex age pulse sysbp
#> 1 0.004695412 0 SK I no Anterior male 19.027 60 130
#> 2 0.003742994 0 tPA I no Inferior male 20.781 75 124
#> 3 0.007809612 0 SK I no Anterior male 20.969 85 135
#> 4 0.004970544 0 tPA I no Inferior male 20.984 90 129
#> 5 0.005978249 0 SK I no Anterior male 21.449 70 157
#> 6 0.010574279 0 SK I no Anterior female 22.523 84 135
Show the predicted probability for individual patients under both treatment alternatives.
plot(x = cmp$predicted_hi,
y = cmp$predicted_lo,
main = "Risk of Mortality",
xlab = "SK",
ylab = "tPA")
abline(0, 1)
Lastly, present the entire distribution of unit-level proportion differences and its mean and median.
hist(cmp$comparison,
breaks = 100,
main = "Distribution of unit-level contrasts",
xlab = "SK - tPA")
abline(v = mean(cmp$comparison), col = "red")
abline(v = median(cmp$comparison), col = "blue")
comparisons()
performed the following calculations under the hood:
gusto
d <-
$tx = "SK"
d predict(mod, newdata = d, type = "response")
predicted_hi <-
$tx = "tPA"
d predict(mod, newdata = d, type = "response")
predicted_lo <-
predicted_hi - predicted_lo comparison <-
The original dataset contains 30510 patients, thus comparisons()
generates an output with same amount of rows.
nrow(gusto)
#> [1] 30510
nrow(cmp)
#> [1] 30510