The aim of this vignette is to illustrate the use/functionality of the glm_coef function. glm_coef can be used to display model coefficients with confidence intervals and p-values. The advantages and limitations of glm_coef are:
gee, glm and survreg.
We start by loading relevant packages and setting alignment in pander tables (as suggested in the Template of this package):
library(pubh, warn.conflicts = FALSE)
library(car, warn.conflicts = FALSE)
library(descr, warn.conflicts = FALSE)
library(multcomp, warn.conflicts = FALSE)
library(pander, warn.conflicts = FALSE)
library(visreg, warn.conflicts = FALSE)
set.alignment("right", row.names = "left", permanent = TRUE)
For continuous outcomes there is no need to exponentiate the results.
data(birthwt)
birthwt$smoke <- factor(birthwt$smoke, labels=c("Non-smoker", "Smoker"))
birthwt$race <- factor(birthwt$race > 1, labels=c("White", "Non-white"))
model_norm <- glm(bwt ~ smoke + race, data = birthwt)
Traditional output from the model:
pander(Anova(model_norm))
Term | LR Chisq | Df | Pr(>Chisq) |
---|---|---|---|
smoke | 15.83 | 1 | 6.917e-05 |
race | 18.49 | 1 | 1.706e-05 |
pander(summary(model_norm))
Parameter | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 3335 | 91.16 | 36.58 | 6.831e-87 |
smokeSmoker | -428.5 | 107.7 | -3.979 | 9.904e-05 |
raceNon-white | -452.1 | 105.1 | -4.3 | 2.751e-05 |
(Dispersion parameter for gaussian family taken to be 471136.9)
Null deviance: 99969656 on 188 degrees of freedom
Residual deviance: 87631472 on 186 degrees of freedom
Table of coefficients:
glm_coef(model_norm)
#> Estimate Std. Error Lower CI Upper CI Pr(>|t|)
#> (Intercept) 3334.82 162.77 3013.71 3655.93 < 0.001
#> smokeSmoker -428.49 126.34 -677.74 -179.24 < 0.001
#> raceNon-white -452.10 155.59 -759.05 -145.15 0.004
Once we know the order in which the parameters are displayed, we can add labels to our final table:
Note: Compare results using naive and robust standard errors.
pander(glm_coef(model_norm, labels=c("Constant", "Smoker - Non-smoker", "Non-white - White"),
    se.rob = FALSE), split.table=Inf,
    caption="Table of coefficients using naive standard errors.")
Parameter | Estimate | Std. Error | Lower CI | Upper CI | Pr(>\|t\|) |
---|---|---|---|---|---|
Constant | 3335 | 91.16 | 3155 | 3515 | < 0.001 |
Smoker - Non-smoker | -428.5 | 107.7 | -640.9 | -216.1 | < 0.001 |
Non-white - White | -452.1 | 105.1 | -659.5 | -244.7 | < 0.001 |
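The bounds in the naive table above look consistent with a Wald interval that uses a t quantile on the 186 residual degrees of freedom reported in the model summary. This is an inference from the printed numbers, not a description of glm_coef internals. A minimal sketch, with hypothetical variable names, that rebuilds the interval for the constant:

```r
# Values copied from the naive table above (already rounded there).
est <- 3335     # estimate for the constant
se  <- 91.16    # naive standard error
df  <- 186      # residual degrees of freedom from the model summary

# Assumed form of the interval: estimate +/- t quantile * standard error.
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se

round(ci)
```

The result should agree with the Lower CI and Upper CI columns up to the rounding of the inputs.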
pander(glm_coef(model_norm, labels=c("Constant", "Smoker - Non-smoker", "Non-white - White")),
    split.table=Inf, caption="Table of coefficients using robust standard errors.")
Parameter | Estimate | Std. Error | Lower CI | Upper CI | Pr(>\|t\|) |
---|---|---|---|---|---|
Constant | 3335 | 162.8 | 3014 | 3656 | < 0.001 |
Smoker - Non-smoker | -428.5 | 126.3 | -677.7 | -179.2 | < 0.001 |
Non-white - White | -452.1 | 155.6 | -759 | -145.2 | 0.004 |
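The naive and robust tables share the same estimates and differ only in the standard errors and whatever is derived from them. To make that concrete, a robust covariance matrix can be computed by hand. The sketch below uses the HC0 sandwich estimator on the built-in mtcars data; which robust estimator glm_coef applies is not stated here, so HC0 is an assumption chosen for illustration, and all object names are hypothetical.

```r
# Illustrative linear model on a built-in dataset (not the birthwt model).
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)   # design matrix, n x p
e <- residuals(fit)      # raw residuals, length n

# HC0 sandwich: (X'X)^-1  X' diag(e^2) X  (X'X)^-1
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)          # each row of X scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

se_naive  <- sqrt(diag(vcov(fit)))   # model-based standard errors
se_robust <- sqrt(diag(vcov_hc0))    # heteroscedasticity-consistent (HC0)

cbind(naive = se_naive, robust = se_robust)
```

The coefficients are untouched by this calculation; only the uncertainty attached to them changes, which is the pattern seen in the two tables.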
Effect plot:
visreg(model_norm, "smoke", by = "race", overlay = TRUE, band = FALSE, partial = FALSE,
rug = FALSE, ylab = "Birth weight (g)", xlab = "Smoking status")
For logistic regression we are interested in the odds ratios.
data(diet, package = "Epi")
model_binom <- glm(chd ~ fibre, data = diet, family = binomial)
pander(glm_coef(model_binom, labels = c("Constant", "Fibre intake (g/day)")), split.table=Inf,
caption="Parameter estimates from logistic regression.")
Parameter | OR | Std. Error | Lower CI | Upper CI | Pr(>\|z\|) |
---|---|---|---|---|---|
Constant | 0.95 | 0.59 | 0.3 | 3.01 | 0.934 |
Fibre intake (g/day) | 0.33 | 0.37 | 0.16 | 0.67 | 0.002 |
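Odds ratios and their intervals are usually obtained by working on the log-odds scale and exponentiating at the end. The sketch below applies that recipe to the fibre row of the table above, assuming that the reported standard error is on the log scale and that a normal quantile is used; both are assumptions, and the variable names are hypothetical.

```r
# Values copied from the fibre row of the table above.
or     <- 0.33   # odds ratio
se_log <- 0.37   # standard error, assumed to be on the log-odds scale

# Build the interval on the log scale, then back-transform.
log_or <- log(or)
ci <- exp(log_or + c(-1, 1) * qnorm(0.975) * se_log)

round(ci, 2)
```

Because the inputs are rounded to two decimals, the bounds can differ from the table in the last digit.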
Effect plot:
visreg(model_binom, "fibre", scale = "response", band = FALSE, rug = FALSE,
ylab = "P (CHD)", xlab = "Fibre (g/day)")
data(bdendo, package = "Epi")
levels(bdendo$gall) <- c("No GBD", "GBD")
levels(bdendo$est) <- c("No oestrogen", "Oestrogen")
model_clogit <- clogit(d ~ est * gall + strata(set), data = bdendo)
glm_coef(model_clogit)
#> OR Std. Error Lower CI Upper CI Pr(>|z|)
#> estOestrogen 14.88 14.88 4.49 49.36 < 0.001
#> gallGBD 18.07 18.07 3.20 102.01 0.001
#> estOestrogen:gallGBD 0.13 0.13 0.02 0.90 0.039
pander(glm_coef(model_clogit, labels = c("Oestrogen/No oestrogen", "GBD/No GBD",
"Oestrogen:GBD Interaction")),
split.table = Inf, caption = "Parameter estimates from conditional logistic regression.")
Parameter | OR | Std. Error | Lower CI | Upper CI | Pr(>\|z\|) |
---|---|---|---|---|---|
Oestrogen/No oestrogen | 14.88 | 14.88 | 4.49 | 49.36 | < 0.001 |
GBD/No GBD | 18.07 | 18.07 | 3.2 | 102 | 0.001 |
Oestrogen:GBD Interaction | 0.13 | 0.13 | 0.02 | 0.9 | 0.039 |
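With an interaction term in the model, the main-effect odds ratios apply only at the reference level of the other factor. A short worked example, using the point estimates from the table above and hypothetical variable names, shows how the three odds ratios combine on the multiplicative scale:

```r
# Point estimates copied from the table above.
or_est  <- 14.88   # oestrogen vs no oestrogen, among women without GBD
or_gall <- 18.07   # GBD vs no GBD, among women not taking oestrogen
or_int  <- 0.13    # multiplicative interaction term

# Oestrogen effect among women who do have GBD.
or_est_given_gbd <- or_est * or_int

# Both exposures compared with neither exposure.
or_both <- or_est * or_gall * or_int

round(c(est_given_gbd = or_est_given_gbd, both = or_both), 2)
```

These are point estimates only; their confidence intervals would need the covariance between the coefficients, which the table does not report.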
Effect plot:
visreg(model_clogit, "gall", by = "est", xlab="Gall bladder disease", ylab="P (cancer)",
overlay = TRUE, rug = FALSE, band = FALSE, partial = FALSE, trans = inv_logit)
library(ordinal, warn.conflicts = FALSE)
data(housing)
model_clm <- clm(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
glm_coef(model_clm)
#> Ordinal OR Lower CI Upper CI Std. Error Pr(>|Z|)
#> Low|Medium 0.61 0.48 0.78 0.12 < 0.001
#> Medium|High 2.00 1.56 2.55 0.13 < 0.001
#> InflMedium 1.76 1.44 2.16 0.10 < 0.001
#> InflHigh 3.63 2.83 4.66 0.13 < 0.001
#> TypeApartment 0.56 0.45 0.71 0.12 < 0.001
#> TypeAtrium 0.69 0.51 0.94 0.16 0.018
#> TypeTerrace 0.34 0.25 0.45 0.15 < 0.001
#> ContHigh 1.43 1.19 1.73 0.10 < 0.001
labs_ord <- c("Constant: Low/Medium satisfaction",
"Constant: Medium/High satisfaction",
"Perceived influence: Medium/Low",
"Perceived influence: High/Low",
"Accommodation: Apartment/Tower",
"Accommodation: Atrium/Tower",
"Accommodation: Terrace/Tower",
"Afforded: High/Low")
pander(glm_coef(model_clm, labels = labs_ord), split.table = Inf,
caption = "Parameter estimates on satisfaction of householders.")
Parameter | Ordinal OR | Lower CI | Upper CI | Std. Error | Pr(>\|Z\|) |
---|---|---|---|---|---|
Constant: Low/Medium satisfaction | 0.61 | 0.48 | 0.78 | 0.12 | < 0.001 |
Constant: Medium/High satisfaction | 2 | 1.56 | 2.55 | 0.13 | < 0.001 |
Perceived influence: Medium/Low | 1.76 | 1.44 | 2.16 | 0.1 | < 0.001 |
Perceived influence: High/Low | 3.63 | 2.83 | 4.66 | 0.13 | < 0.001 |
Accommodation: Apartment/Tower | 0.56 | 0.45 | 0.71 | 0.12 | < 0.001 |
Accommodation: Atrium/Tower | 0.69 | 0.51 | 0.94 | 0.16 | 0.018 |
Accommodation: Terrace/Tower | 0.34 | 0.25 | 0.45 | 0.15 | < 0.001 |
Afforded: High/Low | 1.43 | 1.19 | 1.73 | 0.1 | < 0.001 |
Note: In the previous table, parameter estimates and confidence intervals for Perceived influence and Accommodation were not adjusted for multiple comparisons. See the Poisson regression example for how to include adjusted parameters.
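The two "Constant" rows of the ordinal table are exponentiated thresholds. Assuming the cumulative logit parameterisation logit P(Y <= j) = theta_j - x'beta, they can be read as baseline cumulative odds, that is, for a householder at the reference level of every predictor. The sketch below, with hypothetical variable names, converts them into category probabilities:

```r
# Exponentiated thresholds copied from the table above.
cum_odds <- c(0.61, 2.00)   # odds of (<= Low) and of (<= Medium)

# Odds to cumulative probabilities.
cum_prob <- cum_odds / (1 + cum_odds)

# Differences of the cumulative probabilities give one probability per category.
probs <- diff(c(0, cum_prob, 1))
names(probs) <- c("Low", "Medium", "High")

round(probs, 3)
```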
library(nnet)
model_multi <- multinom(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
#> # weights: 24 (14 variable)
#> initial value 1846.767257
#> iter 10 value 1747.045232
#> final value 1735.041933
#> converged
glm_coef(model_multi)
#>
#> [1] "Medium vs Low"
#> Multinomial OR lower95ci upper95ci z value Pr(>|z|)
#> (Intercept) NA -0.76 -0.08 -2.42 0.015
#> InflMedium 1.56 1.18 2.06 3.15 0.002
#> InflHigh 1.94 1.35 2.80 3.57 < 0.001
#> TypeApartment 0.65 0.46 0.91 -2.53 0.012
#> TypeAtrium 1.14 0.74 1.77 0.59 0.556
#> TypeTerrace 0.51 0.34 0.77 -3.23 0.001
#> ContHigh 1.43 1.11 1.86 2.73 0.006
#>
#>
#> [1] "High vs Low"
#> Multinomial OR lower95ci upper95ci z value Pr(>|z|)
#> (Intercept) NA -0.45 0.17 -0.87 0.384
#> InflMedium 2.09 1.59 2.73 5.37 < 0.001
#> InflHigh 5.02 3.61 6.96 9.65 < 0.001
#> TypeApartment 0.48 0.35 0.65 -4.74 < 0.001
#> TypeAtrium 0.66 0.44 1.01 -1.93 0.054
#> TypeTerrace 0.24 0.16 0.36 -7.06 < 0.001
#> ContHigh 1.62 1.27 2.06 3.88 < 0.001
For Poisson regression we are interested in incidence rate ratios.
data(quine)
levels(quine$Eth) <- list(White = "N", Aboriginal = "A")
levels(quine$Sex) <- list(Male = "M", Female = "F")
model_pois <- glm(Days ~ Eth + Sex + Age, family = poisson, data = quine)
glm_coef(model_pois)
#> IRR Std. Error Lower CI Upper CI Pr(>|z|)
#> (Intercept) 11.53 0.28 6.63 20.06 < 0.001
#> EthAboriginal 1.70 0.21 1.14 2.54 0.01
#> SexFemale 0.90 0.18 0.63 1.28 0.556
#> AgeF1 0.80 0.32 0.43 1.48 0.475
#> AgeF2 1.42 0.26 0.85 2.36 0.18
#> AgeF3 1.35 0.28 0.78 2.32 0.284
The assumption of the Poisson model is that the mean is equal to the variance. Is that the case?
pander(estat(~ Days|Eth, data = quine, label = "Days of school absences"), split.table=Inf)
Variable | Eth | N | Min. | Max. | Mean | Median | SD | CV |
---|---|---|---|---|---|---|---|---|
Days of school absences | White | 77 | 0 | 69 | 12.18 | 7 | 13.56 | 1.11 |
Days of school absences | Aboriginal | 69 | 0 | 81 | 21.23 | 15 | 17.72 | 0.83 |
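Note: Look at the relative dispersion (coefficient of variation). For the variance to equal the mean, the CV would have to be 1/sqrt(mean): about 29% for the White group and 22% for the Aboriginal group, whereas the observed CVs are 111% and 83%.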
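The CV that a Poisson distribution would imply follows directly from its definition: if the variance equals the mean, the standard deviation is sqrt(mean) and the CV is 1/sqrt(mean). A small check using the group means from the table above (variable names are hypothetical):

```r
# Group means and observed CVs copied from the descriptive table above.
means       <- c(White = 12.18, Aboriginal = 21.23)
cv_observed <- c(White = 1.11,  Aboriginal = 0.83)

# CV implied by a Poisson distribution with the same mean.
cv_poisson <- 1 / sqrt(means)

round(rbind(poisson = cv_poisson, observed = cv_observed), 2)
```

In both groups the observed CV is several times larger than the Poisson value.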
More formally the following calculation should be close to 1:
deviance(model_pois) / df.residual(model_pois)
#> [1] 12.44646
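As a point of reference for the ratio above, the following sketch simulates one sample that really is Poisson and one that is over-dispersed, then computes the same statistic for each. The data are simulated, the parameter values are arbitrary, and the helper name dispersion_ratio is hypothetical.

```r
set.seed(1)
n <- 500

# Same mean in both samples; only the variance differs.
y_pois <- rpois(n, lambda = 15)            # variance equal to the mean
y_nb   <- rnbinom(n, mu = 15, size = 1)    # variance much larger than the mean

# Residual deviance divided by residual degrees of freedom
# for an intercept-only Poisson model.
dispersion_ratio <- function(y) {
  fit <- glm(y ~ 1, family = poisson)
  deviance(fit) / df.residual(fit)
}

ratio_pois <- dispersion_ratio(y_pois)
ratio_nb   <- dispersion_ratio(y_nb)

c(poisson = ratio_pois, negative_binomial = ratio_nb)
```

The Poisson sample should give a value near 1, while the negative binomial sample should give a value well above it.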
Thus, we have over-dispersion. One option is to use a negative binomial distribution.
model_negbin <- glm.nb(Days ~ Eth + Sex + Age, data = quine)
unadj <- glm_coef(model_negbin, labels=c("Constant",
"Race: Aboriginal/White",
"Sex: Female/Male",
"F1/Primary",
"F2/Primary",
"F3/Primary"))
Notice that age group is a factor with more than two levels and is significant:
pander(Anova(model_negbin))
Term | LR Chisq | Df | Pr(>Chisq) |
---|---|---|---|
Eth | 12.66 | 1 | 0.0003743 |
Sex | 0.1486 | 1 | 0.6999 |
Age | 9.484 | 3 | 0.0235 |
Thus, we want to report confidence intervals and \(p\)-values adjusted for multiple comparisons.
The unadjusted CIs:
pander(unadj, split.table=Inf, caption = "Parameter estimates with unadjusted CIs and p-values.")
Parameter | IRR | Std. Error | Lower CI | Upper CI | Pr(>\|z\|) |
---|---|---|---|---|---|
Constant | 12.24 | 0.27 | 7.28 | 20.58 | < 0.001 |
Race: Aboriginal/White | 1.76 | 0.2 | 1.19 | 2.62 | 0.005 |
Sex: Female/Male | 0.94 | 0.18 | 0.66 | 1.33 | 0.722 |
F1/Primary | 0.69 | 0.29 | 0.39 | 1.22 | 0.204 |
F2/Primary | 1.2 | 0.26 | 0.71 | 2.01 | 0.496 |
F3/Primary | 1.29 | 0.27 | 0.75 | 2.2 | 0.357 |
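Because the model is multiplicative on the count scale, an expected count for a given profile is the constant multiplied by the relevant incidence rate ratios. A short worked example with the point estimates from the table above (variable names are hypothetical):

```r
# Point estimates copied from the table above.
baseline       <- 12.24   # expected days absent: White, male, primary
irr_aboriginal <- 1.76
irr_female     <- 0.94

# Expected days absent for an Aboriginal girl in primary school.
expected_days <- baseline * irr_aboriginal * irr_female

round(expected_days, 1)
```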
Effect plot:
visreg(model_negbin, "Age", by = "Eth", scale = "response", rug = FALSE, band = FALSE)
We adjust for multiple comparisons:
model_glht <- glht(model_negbin, linfct = mcp(Age = "Tukey"))
age_glht <- xymultiple(model_glht, Exp = TRUE, plot = FALSE)
We can see the comparison graphically with:
xymultiple(model_glht, Exp = TRUE)
#> Comparison Ratio lwr upr Pr(>|Z|)
#> 1 F1 - F0 0.69 0.38 1.26 0.220
#> 2 F2 - F0 1.20 0.66 2.17 0.550
#> 3 F3 - F0 1.29 0.69 2.40 0.550
#> 4 F2 - F1 1.73 1.02 2.92 0.022
#> 5 F3 - F1 1.86 1.07 3.21 0.020
#> 6 F3 - F2 1.08 0.62 1.88 0.737
Parameter estimates on the effect of age group on the number of days absent from school. Bars represent 95% CIs adjusted by the method of Westfall for multiple comparisons.
We use this information to construct the final table:
final <- unadj
final[, 5] <- as.character(final[, 5])
age_glht[, 5] <- as.character(age_glht[, 5])
final[4:6, 3:5] <- age_glht[1:3, 3:5]
pander(final, split.table=Inf,
    caption = "Parameter estimates. CIs and p-values for age group were adjusted for multiple comparisons by the method of Westfall.")
Parameter | IRR | Std. Error | Lower CI | Upper CI | Pr(>\|z\|) |
---|---|---|---|---|---|
Constant | 12.24 | 0.27 | 7.28 | 20.58 | < 0.001 |
Race: Aboriginal/White | 1.76 | 0.2 | 1.19 | 2.62 | 0.005 |
Sex: Female/Male | 0.94 | 0.18 | 0.66 | 1.33 | 0.722 |
F1/Primary | 0.69 | 0.29 | 0.38 | 1.26 | 0.22 |
F2/Primary | 1.2 | 0.26 | 0.66 | 2.17 | 0.55 |
F3/Primary | 1.29 | 0.27 | 0.69 | 2.4 | 0.55 |
data(bladder)
bladder$times <- bladder$stop
bladder$rx <- factor(bladder$rx, labels=c("Placebo", "Thiotepa"))
model_surv <- survreg(Surv(times, event) ~ rx, data = bladder)
Using robust standard errors (default):
glm_coef(model_surv)
#> Survival time ratio Std. Error Lower CI Upper CI Pr(>|z|)
#> rxThiotepa 1.64 0.31 0.89 3.04 0.116
#> Scale 1.00 0.08 0.85 1.18 0.992
pander(glm_coef(model_surv, labels = c("Treatment: Thiotepa/Placebo", "Scale")),
split.table = Inf)
Parameter | Survival time ratio | Std. Error | Lower CI | Upper CI | Pr(>\|z\|) |
---|---|---|---|---|---|
Treatment: Thiotepa/Placebo | 1.64 | 0.31 | 0.89 | 3.04 | 0.116 |
Scale | 1 | 0.08 | 0.85 | 1.18 | 0.992 |
In this example the scale parameter is not statistically different from one, meaning that the hazard is constant; thus, we can use the exponential distribution:
model_exp <- survreg(Surv(times, event) ~ rx, data = bladder, dist = "exponential")
pander(glm_coef(model_exp, labels = "Treatment: Thiotepa/Placebo"),
split.table = Inf)
Parameter | Survival time ratio | Std. Error | Lower CI | Upper CI | Pr(>\|z\|) |
---|---|---|---|---|---|
Treatment: Thiotepa/Placebo | 1.64 | 0.33 | 0.85 | 3.16 | 0.139 |
Interpretation: Patients receiving Thiotepa survive on average 64% longer than those in the Placebo group.
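The step from a Weibull model with scale one to an exponential model rests on the fact that a Weibull distribution with shape 1 has a constant hazard (the survreg scale is the reciprocal of the Weibull shape). A numerical check in base R, with an arbitrary, hypothetical rate:

```r
t_grid <- seq(0.5, 50, by = 0.5)
lambda <- 0.05   # hypothetical event rate, for illustration only

# Hazard = density / survival function.
hazard <- function(dens, surv) dens / surv

# Weibull with shape 1: hazard should be flat and equal to lambda.
haz_shape1 <- hazard(
  dweibull(t_grid, shape = 1, scale = 1 / lambda),
  pweibull(t_grid, shape = 1, scale = 1 / lambda, lower.tail = FALSE)
)

# Weibull with shape 2: hazard increases with time.
haz_shape2 <- hazard(
  dweibull(t_grid, shape = 2, scale = 1 / lambda),
  pweibull(t_grid, shape = 2, scale = 1 / lambda, lower.tail = FALSE)
)

range(haz_shape1)
range(haz_shape2)
```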
Using naive standard errors:
pander(glm_coef(model_exp, se.rob = FALSE, labels = "Treatment: Thiotepa/Placebo"),
split.table = Inf)
Parameter | Survival time ratio | Std. Error | Lower CI | Upper CI | Pr(>\|z\|) |
---|---|---|---|---|---|
Treatment: Thiotepa/Placebo | 1.64 | 0.2 | 1.11 | 2.41 | 0.012 |
Effect plot:
visreg(model_exp, "rx", partial = FALSE, rug = FALSE, ylab = "Survival time",
xlab = "Treatment")
model_cox <- coxph(Surv(times, event) ~ rx, data = bladder)
pander(glm_coef(model_cox, labels = c("Treatment: Thiotepa/Placebo")), split.table = Inf)
Parameter | Hazard ratio | Std. Error | Lower CI | Upper CI | Pr(>\|z\|) |
---|---|---|---|---|---|
Treatment: Thiotepa/Placebo | 0.64 | 0.2 | 0.44 | 0.94 | 0.024 |
Interpretation: At any given time, patients receiving Thiotepa have a 36% lower hazard of dying than those in the Placebo group (hazard ratio 0.64).
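Ratios are easy to misread as percentages, so it can help to convert them explicitly. A ratio r corresponds to a change of 100 * (r - 1) percent relative to the reference group. The helper below is hypothetical and only wraps that arithmetic:

```r
# Percentage change implied by a ratio (time ratio, hazard ratio, IRR, ...).
pct_change <- function(ratio) 100 * (ratio - 1)

time_ratio   <- 1.64   # survival time ratio from the exponential model
hazard_ratio <- 0.64   # hazard ratio from the Cox model

pct_change(time_ratio)     # positive: longer survival time
pct_change(hazard_ratio)   # negative: lower hazard
```

A hazard ratio of 0.64 therefore corresponds to a reduction of 36%, while a time ratio of 1.64 corresponds to an increase of 64%.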
Effect plot:
visreg(model_cox, "rx", partial = FALSE, trans = exp, rug = FALSE,
ylab = "Hazard", xlab = "Treatment")
library(nlme, warn.conflicts = FALSE)
data(Orthodont)
model_lme <- lme(distance ~ Sex*I(age - mean(age, na.rm = TRUE)), random=~1|Subject,
method="ML", data=Orthodont)
glm_coef(model_lme)
#> Coeff Lower CI Upper CI SE DF
#> (Intercept) 24.97 24.03 24.03 0.48 79
#> SexFemale -2.32 -3.78 -3.78 0.75 25
#> I(age - mean(age, na.rm = TRUE)) 0.78 0.63 0.63 0.08 79
#> SexFemale:I(age - mean(age, na.rm = TRUE)) -0.30 -0.54 -0.54 0.12 79
#> t value Pr(>|t|)
#> (Intercept) 52.39 < 0.001
#> SexFemale -3.11 0.005
#> I(age - mean(age, na.rm = TRUE)) 10.06 < 0.001
#> SexFemale:I(age - mean(age, na.rm = TRUE)) -2.49 0.015
pander(glm_coef(model_lme, labels = c("Constant", "Sex: female-male", "Age (years)",
"Sex:Age interaction")), split.table=Inf)
Parameter | Coeff | Lower CI | Upper CI | SE | DF | t value | Pr(>\|t\|) |
---|---|---|---|---|---|---|---|
Constant | 24.97 | 24.03 | 24.03 | 0.48 | 79 | 52.39 | < 0.001 |
Sex: female-male | -2.32 | -3.78 | -3.78 | 0.75 | 25 | -3.11 | 0.005 |
Age (years) | 0.78 | 0.63 | 0.63 | 0.08 | 79 | 10.06 | < 0.001 |
Sex:Age interaction | -0.3 | -0.54 | -0.54 | 0.12 | 79 | -2.49 | 0.015 |
visreg(model_lme, "age", by = "Sex", overlay = TRUE, xlab = "Age (years)", ylab = "Distance (mm)")
library(gee, warn.conflicts = FALSE)
model_gee_norm <- gee(distance ~ Sex*I(age - mean(age, na.rm = TRUE)), id = Subject,
data = Orthodont, corstr = "AR-M")
#> (Intercept)
#> 24.9687500
#> SexFemale
#> -2.3210227
#> I(age - mean(age, na.rm = TRUE))
#> 0.7843750
#> SexFemale:I(age - mean(age, na.rm = TRUE))
#> -0.3048295
For GEE models, robust standard errors are used by default:
pander(glm_coef(model_gee_norm, labels = c("Constant", "Sex: female-male", "Age (years)",
"Sex:Age interaction")), split.table=Inf)
Parameter | Coeff | Lower CI | Upper CI | SE | Pr(>\|z\|) |
---|---|---|---|---|---|
Constant | 25.06 | 24.2 | 25.92 | 0.44 | < 0.001 |
Sex: female-male | -2.42 | -3.89 | -0.94 | 0.75 | 0.001 |
Age (years) | 0.77 | 0.56 | 0.98 | 0.1 | < 0.001 |
Sex:Age interaction | -0.29 | -0.53 | -0.05 | 0.12 | 0.02 |
data(Thall)
c1 <- cbind(Thall[, c(1:5)], count = Thall$y1)[, c(1:4, 6)]
c2 <- cbind(Thall[, c(1:4, 6)], count = Thall$y2)[, c(1:4, 6)]
c3 <- cbind(Thall[, c(1:4, 7)], count = Thall$y3)[, c(1:4, 6)]
c4 <- cbind(Thall[, c(1:4, 8)], count = Thall$y4)[, c(1:4, 6)]
epilepsy <- rbind(c1, c2, c3, c4)
model_gee <- gee(count ~ treat + base + I(age - mean(age, na.rm = TRUE)), id = factor(id),
data = epilepsy, family = poisson, corstr = "exchangeable", scale.fix = TRUE)
pander(glm_coef(model_gee, labels = c("Constant", "Treatment (Prograbide/Control)",
"Baseline count", "Age (years)")), split.table = Inf)
Parameter | Coeff | Exp(Coeff) | Lower CI | Upper CI | SE | Pr(>\|z\|) |
---|---|---|---|---|---|---|
Constant | 1.23 | NA | 1 | 1.45 | 0.11 | < 0.001 |
Treatment (Prograbide/Control) | -0.13 | 0.88 | 0.68 | 1.14 | 0.13 | 0.33 |
Baseline count | 0.02 | 1.02 | 1.02 | 1.02 | 0 | < 0.001 |
Age (years) | 0.02 | 1.02 | 1.01 | 1.04 | 0.01 | 0.003 |
Using glmer:
library(lme4, warn.conflicts = FALSE)
model_glmer <- glmer(count ~ treat + base + I(age - mean(age, na.rm = TRUE)) +
(1|id), data=epilepsy, family=poisson)
pander(glm_coef(model_glmer, labels = c("Constant", "Treatment (Prograbide/Control)",
"Baseline count", "Age (years)")), split.table = Inf)
Parameter | Coeff | Exp(Coeff) | Lower CI | Upper CI | SE | z value | Pr(>\|z\|) |
---|---|---|---|---|---|---|---|
Constant | 0.85 | NA | 0.53 | 1.18 | 0.16 | 5.21 | < 0.001 |
Treatment (Prograbide/Control) | -0.22 | 0.8 | 0.57 | 1.13 | 0.18 | -1.27 | 0.203 |
Baseline count | 0.03 | 1.03 | 1.02 | 1.03 | 0 | 8.7 | < 0.001 |
Age (years) | 0.01 | 1.01 | 0.99 | 1.04 | 0.01 | 0.91 | 0.364 |
Might we have over-dispersion?
pander(estat(~ count|treat, data = epilepsy, label = "Number of seizures"))
Variable | treat | N | Min. | Max. | Mean | Median | SD | CV |
---|---|---|---|---|---|---|---|---|
Number of seizures | Control | 112 | 0 | 76 | 8.8 | 5 | 12.09 | 1.37 |
Number of seizures | Prograbide | 124 | 0 | 102 | 8.31 | 4 | 14.48 | 1.74 |
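One quick way to read the table above is the variance-to-mean ratio, which equals 1 for a Poisson variable. The sketch below computes it from the reported means and standard deviations; the variable names are hypothetical.

```r
# Means and standard deviations copied from the descriptive table above.
means <- c(Control = 8.80,  Prograbide = 8.31)
sds   <- c(Control = 12.09, Prograbide = 14.48)

# Variance divided by mean; a Poisson variable would give 1.
var_to_mean <- sds^2 / means

round(var_to_mean, 1)
```

Ratios this far above 1 point towards over-dispersion in both treatment groups.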
Scaling the variance:
model_quasi <- gee(count ~ treat + base + I(age - mean(age, na.rm = TRUE)), id = factor(id),
data = epilepsy, family = quasi(variance = "mu^2", link = "log"),
corstr = "exchangeable")
pander(glm_coef(model_quasi, labels = c("Constant", "Treatment (Prograbide/Control)",
"Baseline count", "Age (years)")), split.table = Inf)
Parameter | Coeff | Exp(Coeff) | Lower CI | Upper CI | SE | Pr(>\|z\|) |
---|---|---|---|---|---|---|
Constant | 0.97 | NA | 0.8 | 1.15 | 0.09 | < 0.001 |
Treatment (Prograbide/Control) | -0.17 | 0.85 | 0.67 | 1.08 | 0.12 | 0.175 |
Baseline count | 0.03 | 1.03 | 1.03 | 1.03 | 0 | < 0.001 |
Age (years) | 0.02 | 1.02 | 1 | 1.03 | 0.01 | 0.041 |