Unit-level contrasts in logistic regressions

Note: This vignette requires version 0.8.0 of marginaleffects, or the development version from Github.

This vignette replicates some of the analyses in this excellent blog post by Frank Harrell: Avoiding One-Number Summaries of Treatment Effects for RCTs with Binary Outcomes. Here, we show how one-number summaries and the entire distribution unit-level contrasts can be easily computed with comparisons().

Dr. Harrell discusses summaries from logistic regression models in the blog post above. He focus on a context in which one is interested in comparing two groups, such as in randomized controlled trials. He highlights potential pitfalls of presenting “one-number summmaries” — e.g., odds ratio and mean proportion difference — and proposes focusing on the entire distribution of proportion difference between groups.

For clarification, the following terms can be used interchangeably in the context of logistic regression where the covariate of interest is categorical:

Data

We focus on subset data from the GUSTO-I study, where patients were randomly assigned to accelerated tissue plasminogen activator (tPA) or streptokinase (SK).

Load libraries, data and fit full covariate-adjusted logistic model.

library(marginaleffects)
library(rms)

load(url(
"https://github.com/vincentarelbundock/modelarchive/raw/main/data-raw/gusto.rda"
))

gusto <- subset(gusto, tx %in% c("tPA", "SK"))
gusto$tx <- factor(gusto$tx, levels = c("tPA", "SK"))

mod <- glm(
    day30 ~ tx + rcs(age, 4) + Killip + pmin(sysbp, 120) + lsp(pulse, 50) +
    pmi + miloc + sex, family = "binomial",
    data = gusto)

One-Number Summaries

Population-averaged (aka “marginal”) proportion difference (see this vignette):

comparisons(
    mod,
    variables = "tx") |> 
    summary()
#>   Term Contrast  Effect Std. Error z value   Pr(>|z|)    2.5 % 97.5 %
#> 1   tx SK - tPA 0.01108   0.002766   4.005 6.1955e-05 0.005658 0.0165
#> 
#> Model type:  glm 
#> Prediction type:  response

The comparisons() function above computed predictions for each observed row of the data in two couterfactual cases: when tx is “SK”, and when tx is “tPA”. Then, it computed the differences between these two sets of predictions. Finally, it took the average of predicted differences in probabilities.

Now we want to compute population-averaged adjusted odds ratio.

Since odds ratios are non-collapsible, we cannot use the same strategy with them. Instead, we call transform_pre="lnoravg. Let hi be the vector of predicted probabilities when tx is “SK”, and let lo be the vector of predicted probabilities when tx is “tPA”. Then, the transform_pre="lnoravg" applies this function:

log((mean(hi)/(1 - mean(hi)))/(mean(lo)/(1 - mean(lo))))

Finally, we use transform_post=exp to exponentiate the results.

comparisons(
    mod,
    variables = "tx",
    transform_pre = "lnoravg",
    transform_post = exp) |>
    summary()
#>   Term                 Contrast Effect Std. Error z value   Pr(>|z|) 2.5 % 97.5 %
#> 1   tx ln(odds(SK) / odds(tPA))  1.192    0.04489   26.54 < 2.22e-16 1.104   1.28
#> 
#> Model type:  glm 
#> Prediction type:  response 
#> Post-transformation:  exp

Population-averaged (marginal) adjusted risk ratio (proportion):

comparisons(
    mod,
    variables = "tx",
    transform_pre = "lnratioavg",
    transform_post = exp) |>
    summary()
#>   Term                 Contrast Effect Std. Error z value   Pr(>|z|) 2.5 % 97.5 %
#> 1   tx ln(mean(SK) / mean(tPA))  1.177    0.04194   28.08 < 2.22e-16 1.095   1.26
#> 
#> Model type:  glm 
#> Prediction type:  response 
#> Post-transformation:  exp

Unit-level Summaries

Instead of estimating one-number summaries, we can focus on unit-level proportion differences using comparisons(). This function applies the fitted logistic regression model to predict outcome probabilities for each patient, i.e., unit-level.

cmp <- comparisons(mod, variables = "tx")

head(cmp)
#>   rowid     type term contrast   comparison    std.error statistic    p.value     conf.low   conf.high
#> 1     1 response   tx SK - tPA 0.0010741928 0.0004966749  2.162768 0.03055900 0.0001007278 0.002047658
#> 2     2 response   tx SK - tPA 0.0008573104 0.0003799743  2.256233 0.02405605 0.0001125746 0.001602046
#> 3     3 response   tx SK - tPA 0.0017797796 0.0007784409  2.286339 0.02223446 0.0002540634 0.003305496
#> 4     4 response   tx SK - tPA 0.0011367499 0.0004999032  2.273940 0.02296960 0.0001569575 0.002116542
#> 5     5 response   tx SK - tPA 0.0013655083 0.0005934013  2.301155 0.02138288 0.0002024631 0.002528553
#> 6     6 response   tx SK - tPA 0.0024015964 0.0010127226  2.371426 0.01771961 0.0004166965 0.004386496
#>     predicted predicted_hi predicted_lo day30  tx Killip pmi    miloc    sex    age pulse sysbp
#> 1 0.005769605  0.005769605  0.004695412     0  SK      I  no Anterior   male 19.027    60   130
#> 2 0.003742994  0.004600304  0.003742994     0 tPA      I  no Inferior   male 20.781    75   124
#> 3 0.009589391  0.009589391  0.007809612     0  SK      I  no Anterior   male 20.969    85   135
#> 4 0.004970544  0.006107294  0.004970544     0 tPA      I  no Inferior   male 20.984    90   129
#> 5 0.007343757  0.007343757  0.005978249     0  SK      I  no Anterior   male 21.449    70   157
#> 6 0.012975875  0.012975875  0.010574279     0  SK      I  no Anterior female 22.523    84   135

Show the predicted probability for individual patients under both treatment alternatives.

plot(x = cmp$predicted_hi,
     y = cmp$predicted_lo,
     main = "Risk of Mortality",
     xlab = "SK",
     ylab = "tPA")

abline(0, 1)

Lastly, present the entire distribution of unit-level proportion differences and its mean and median.

hist(cmp$comparison,
     breaks = 100,
     main = "Distribution of unit-level contrasts",
     xlab = "SK - tPA")
abline(v = mean(cmp$comparison), col = "red")
abline(v = median(cmp$comparison), col = "blue")

Appendix

comparisons() performed the following calculations under the hood:

d  <- gusto

d$tx = "SK"
predicted_hi <- predict(mod, newdata = d, type = "response")

d$tx = "tPA"
predicted_lo <- predict(mod, newdata = d, type = "response")

comparison <- predicted_hi - predicted_lo

The original dataset contains 30510 patients, thus comparisons() generates an output with same amount of rows.

nrow(gusto)
#> [1] 30510
nrow(cmp)
#> [1] 30510