Abstract
Why use frequentist methods when you can use the Bayesian framework, often in an even simpler way? Throughout this tutorial, we will explore many of the analyses you might want to run on your data.
In short, reasons to prefer this approach include reliability, better accuracy in noisy data, better estimation for small samples, less proneness to type I errors, the possibility of introducing prior knowledge into the analysis and, critically, the intuitiveness of the results and their straightforward interpretation (Andrews & Baguley, 2013; Etz & Vandekerckhove, 2016; Kruschke, 2010; Kruschke, Aguinis, & Joo, 2012; Wagenmakers et al., 2018). Indeed, in the frequentist view, effects are fixed (but unknown) and data are random, while Bayesian inference computes the probability of different effect values (the “posterior” distribution) given the observed data. Bayesian uncertainty can be summarized, for example, by giving a range of values on the posterior distribution that includes 95% of the probability (the 95% Credible Interval). To illustrate the difference, the Bayesian framework allows one to say “given the observed data, the effect has a 95% probability of falling within this range”, while the less straightforward frequentist alternative would be “there is a 95% probability that, when computing a confidence interval from data of this sort, the effect falls within this range”. In general, the frequentist approach has been associated with a focus on null hypothesis testing, and the misuse of p values has been shown to critically contribute to the reproducibility crisis of psychological science (Chambers, Feredoes, Muthukumaraswamy, Suresh, & Etchells, 2014; Szucs & Ioannidis, 2016). There is a general agreement that generalizing the Bayesian approach is one way of overcoming these issues (Benjamin et al., 2018; Etz & Vandekerckhove, 2016).
Now that we agree that the Bayesian framework is the right way to go, you might wonder what the Bayesian framework actually is. What’s all the fuss about?
Omitting the maths behind it, let’s illustrate with an example. Imagine two numeric variables, Y and X, whose correlation is r = -0.063 (p < .05). A Bayesian analysis would return the probability distribution of this effect (the posterior), which we can characterize using several indices: centrality (median or mean), dispersion (SD or Median Absolute Deviation, MAD), and so on. Let’s plot the posterior distribution of the possible correlation values that are compatible with our data.
Posterior probability distribution of the correlation between X and Y
In this example (based on real data), most of the posterior mass lies below zero, consistent with the negative correlation reported above.
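For intuition, here is a minimal sketch (with hypothetical draws standing in for real MCMC samples) of how such a posterior can be characterized and plotted:

# A minimal sketch: hypothetical draws standing in for real MCMC samples
posterior <- rnorm(4000, mean = -0.063, sd = 0.03)
median(posterior)                     # centrality
mad(posterior)                        # dispersion (Median Absolute Deviation)
quantile(posterior, c(0.025, 0.975))  # 95% Credible Interval
hist(posterior, breaks = 50, main = "Posterior of the correlation")  # shape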
Now that you’re familiar with posterior distributions, the core difference of the Bayesian framework, let’s practice!
Let’s start by taking a look at the dataset included within the psycho package.
library(rstanarm)
library(dplyr)
library(ggplot2)
library(psycho)
df <- psycho::affective
summary(df)
## Sex Age Birth_Season Salary Life_Satisfaction
## F:1000 Min. :18.00 Fall :288 <1000:514 Min. :1.000
## M: 251 1st Qu.:21.16 Spring:348 <2000:223 1st Qu.:4.000
## Median :22.97 Summer:332 2000+:128 Median :5.000
## Mean :26.91 Winter:283 NA's :386 Mean :4.847
## 3rd Qu.:27.54 3rd Qu.:6.000
## Max. :80.14 Max. :7.000
## Concealing Adjusting Tolerating
## Min. :0.000 Min. :0.000 Min. :0.500
## 1st Qu.:2.750 1st Qu.:2.750 1st Qu.:3.500
## Median :3.750 Median :3.750 Median :4.250
## Mean :3.743 Mean :3.802 Mean :4.157
## 3rd Qu.:4.750 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :7.000 Max. :6.750 Max. :7.000
The data include 5 continuous variables (age, life satisfaction and 3 affective styles) and 3 factors (sex, salary and season of birth).
Let’s start with something simple: a correlation. To simplify, a (Pearson’s) correlation is pretty much nothing more than a simple linear regression with standardized variables. Let’s see if there is a linear relationship between Life Satisfaction and the tendency to tolerate one’s emotions (Tolerating) using a Bayesian linear regression model.
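As a quick sanity check of this equivalence (a sketch with simulated data, separate from the tutorial’s analysis), the slope of a regression on standardized variables matches Pearson’s r:

# Sketch: with standardized variables, the regression slope equals Pearson's r
set.seed(42)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
cor(x, y)                         # Pearson correlation
coef(lm(scale(y) ~ scale(x)))[2]  # standardized slope: the same value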
# Let's fit our model
fit <- rstanarm::stan_glm(Life_Satisfaction ~ Tolerating, data=df)
Let’s check the results:
# Format the results using analyze()
results <- psycho::analyze(fit)
# We can extract a formatted summary table
summary(results, round = 2)
Variable | Median | MAD | CI_lower | CI_higher | MPE | Overlap |
---|---|---|---|---|---|---|
R2 | 0.02 | 0.01 | 0.01 | 0.04 | NA | NA |
(Intercept) | 4.06 | 0.15 | 3.83 | 4.32 | NA | NA |
Tolerating | 0.19 | 0.03 | 0.13 | 0.24 | 100 | 0.67 |
For each parameter of the model, the summary shows several characteristics of the posterior distribution: its median, its MAD, the credible interval limits (CI_lower and CI_higher), the maximum probability of effect (MPE, described below) and an overlap index.
It also returns the (unadjusted) R2 (which represents the percentage of variance of the outcome explained by the model). In the Bayesian framework, the R2 is also estimated with probabilities. As such, characteristics of its posterior distribution are returned.
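For reference, these R2 draws can also be inspected directly with rstanarm’s bayes_R2() function (a sketch, not psycho’s internal code):

# Sketch: inspect the posterior distribution of the R2 with rstanarm
r2_posterior <- rstanarm::bayes_R2(fit)  # one R2 value per posterior draw
median(r2_posterior)  # should be close to the R2 reported in the summary
mad(r2_posterior)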
We can also print a formatted version:
print(results)
## We fitted a Markov Chain Monte Carlo gaussian (link = identity) model (4 chains, each with iter = 2000; warmup = 1000; thin = 1; post-warmup = 1000) to predict Life_Satisfaction (formula = Life_Satisfaction ~ Tolerating). The model's priors were set as follows:
##
## ~ normal (location = (0), scale = (3.11))
##
##
## - The effect of Tolerating has a probability of 100% of being positive (Median = 0.19, MAD = 0.035, 90% CI [0.13, 0.24], O = 0.67%). It can be considered as small or very small with respective probabilities of 4.12% and 95.88%. The model explains about 2.34% of the outcome's variance (MAD = 0.0083, 90% CI [0.010, 0.037], adj. R2 = 0.020).
##
## The intercept is at 4.06 (MAD = 0.15, 90% CI [3.83, 4.32]). Within this model:
Note that print() also returns additional information, such as the 90% CI of the R2 and, for each parameter, the limits of the range defined by the MPE.
For now, let’s put aside the part dedicated to priors (we’ll come back to it in a later section) and rather interpret the part related to effects.
Full Bayesian mixed linear models are fitted using the rstanarm R wrapper for the stan probabilistic language (Gabry & Goodrich, 2016). Bayesian inference was done using Markov Chain Monte Carlo (MCMC) sampling. The prior distributions of all effects were set as weakly informative (mean = 0, SD = 3.11), meaning that we did not expect effects different from null in any particular direction. For each model and each coefficient, we will present several characteristics of the posterior distribution, such as its median (a robust estimate comparable to the beta from frequentist linear models), MAD (median absolute deviation, a robust equivalent of standard deviation) and the 90% credible interval. Instead of the p value as an index of effect existence, we also computed the maximum probability of effect (MPE), i.e., the maximum probability that the effect is different from 0 in the median’s direction. For our analyses, we will consider an effect as inconsistent (i.e., not probable enough) if its MPE is lower than 90% (however, beware not to fall in a p value-like obsession).
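As an illustration, here is a sketch (with hypothetical draws, not psycho’s internal code) of how an MPE-like index can be derived from the posterior:

# Sketch: computing an MPE-like index from posterior draws
posterior <- rnorm(4000, mean = 0.19, sd = 0.035)  # hypothetical MCMC draws
mpe <- max(mean(posterior > 0), mean(posterior < 0)) * 100
mpe  # probability (in %) that the effect is non-null in the median's direction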
The current model explains about 2.34% of life satisfaction variance. Within this model, a positive linear relationship between life satisfaction and tolerating exists with high probability (Median = 0.19, MAD = 0.035, 90% CI [0.13, 0.24], MPE = 100%).
To visualize the model, the neatest way is to extract a “reference grid” (i.e., a theoretical dataframe with balanced data).
refgrid <- df %>%
select(Tolerating) %>%
psycho::refdata(length.out=10)
predicted <- psycho::get_predicted(fit, newdata=refgrid)
predicted
Tolerating | Life_Satisfaction_Median | Life_Satisfaction_CI_5 | Life_Satisfaction_CI_95 |
---|---|---|---|
0.500000 | 4.151345 | 3.954587 | 4.386005 |
1.222222 | 4.289382 | 4.124491 | 4.478645 |
1.944444 | 4.427571 | 4.278317 | 4.556361 |
2.666667 | 4.564690 | 4.456243 | 4.665020 |
3.388889 | 4.702245 | 4.626350 | 4.781523 |
4.111111 | 4.839042 | 4.770260 | 4.902102 |
4.833333 | 4.974823 | 4.893558 | 5.052642 |
5.555556 | 5.112638 | 5.008735 | 5.222022 |
6.277778 | 5.249428 | 5.102698 | 5.384321 |
7.000000 | 5.387813 | 5.197429 | 5.554375 |
Our refgrid is made of equally spaced (balanced) predictor values. The prediction table also includes the median of the posterior prediction, as well as its 90% credible interval. Now, we can plot it as follows:
ggplot(predicted, aes(x=Tolerating, y=Life_Satisfaction_Median)) +
geom_line() +
geom_ribbon(aes(ymin=Life_Satisfaction_CI_5,
ymax=Life_Satisfaction_CI_95),
alpha=0.1)
When the predictor is categorical, such a model is traditionally called an ANOVA. Let’s run one by answering the following question: does the level of life satisfaction depend on the salary?
# Let's fit our model
fit <- rstanarm::stan_glm(Life_Satisfaction ~ Salary, data=df)
Let’s check the results:
# Format the results using analyze()
results <- psycho::analyze(fit)
# We can extract a formatted summary table
print(results)
## We fitted a Markov Chain Monte Carlo gaussian (link = identity) model (4 chains, each with iter = 2000; warmup = 1000; thin = 1; post-warmup = 1000) to predict Life_Satisfaction (formula = Life_Satisfaction ~ Salary). The model's priors were set as follows:
##
## ~ normal (location = (0, 0), scale = (3.61, 3.61))
##
##
## - The effect of Salary2000+ has a probability of 92.88% of being positive (Median = 0.20, MAD = 0.14, 90% CI [-0.031, 0.43], O = 46.75%). It can be considered as very small with a probability of 92.88%. The model explains about 0.42% of the outcome's variance (MAD = 0.0036, 90% CI [0, 0.011], adj. R2 = -0.0039).
##
## The intercept is at 4.76 (MAD = 0.065, 90% CI [4.66, 4.87]). Within this model:
## - The effect of Salary<2000 has a probability of 87.10% of being positive (Median = 0.13, MAD = 0.11, 90% CI [-0.049, 0.32], O = 56.22%). It can be considered as very small with a probability of 87.10%.
What interests us is the pairwise comparison between the groups. The get_contrasts function computes the estimated marginal means (least-squares means), i.e., the means of each group as estimated by the model, as well as the contrasts.
contrasts <- psycho::get_contrasts(fit, "Salary")
We can display the estimated means as follows:
contrasts$means
Level | Median | MAD | CI_lower | CI_higher |
---|---|---|---|---|
Salary <1000 | 4.76 | 0.07 | 4.66 | 4.87 |
Salary <2000 | 4.90 | 0.10 | 4.73 | 5.04 |
Salary 2000+ | 4.97 | 0.13 | 4.76 | 5.17 |
And the contrast comparisons as follows:
contrasts$contrasts
Contrast | Median | MAD | CI_lower | CI_higher | MPE |
---|---|---|---|---|---|
<1000 - <2000 | -0.13 | 0.11 | -0.32 | 0.05 | 87.10 |
<1000 - 2000+ | -0.20 | 0.14 | -0.43 | 0.03 | 92.88 |
<2000 - 2000+ | -0.07 | 0.16 | -0.33 | 0.19 | 67.60 |
As we can see, the only probable difference (MPE > 90%) is between Salary <1000 and Salary 2000+.
ggplot(contrasts$means, aes(x=Level, y=Median, group=1)) +
geom_line() +
geom_pointrange(aes(ymin=CI_lower, ymax=CI_higher)) +
ylab("Life Satisfaction") +
xlab("Salary")
Let’s see if we can predict sex with the tendency to flexibly adjust one’s emotional reactions (Adjusting). As Sex is a binary factor (with two levels), we have to fit a logistic model.
# Let's fit our model
fit <- rstanarm::stan_glm(Sex ~ Adjusting, data=df, family = "binomial")
First, let’s check our model:
# Format the results using analyze()
results <- psycho::analyze(fit)
## Warning in Ops.factor(ypredloo, y): '-' not meaningful for factors
## Warning in stats::var(y): Calling var(x) on a factor x is deprecated and will become an error.
## Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
# We can extract a formatted summary table
summary(results, round = 2)
Variable | Median | MAD | CI_lower | CI_higher | MPE | Overlap |
---|---|---|---|---|---|---|
R2 | 0.01 | 0.01 | 0.00 | 0.02 | NA | NA |
(Intercept) | -2.20 | 0.22 | -2.56 | -1.84 | NA | NA |
Adjusting | 0.21 | 0.05 | 0.13 | 0.29 | 100 | 4.65 |
It appears that the link between adjusting and sex is highly probable (MPE > 90%). But in which direction? To find out, we have to check which level of Sex corresponds to the intercept (the reference level).
## [1] "F" "M"
As female (“F”) is the first level, it is taken as the intercept. Based on our model, an increase of 1 on the Adjusting scale increases the log odds of being a male.
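To make this more concrete, here is a small sketch converting these log odds to probabilities with the logistic function, using the rounded medians from the summary table above:

# Sketch: convert log odds to probabilities (rounded medians from the table above)
intercept <- -2.20  # log odds of being male when Adjusting = 0
slope <- 0.21       # change in log odds per unit of Adjusting
plogis(intercept + slope * 0)  # ~0.10: probability of being male at Adjusting = 0
plogis(intercept + slope * 5)  # ~0.24: probability of being male at Adjusting = 5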
To visualize this type of model, we have to derive a reference grid.
refgrid <- df %>%
select(Adjusting) %>%
psycho::refdata(length.out=10)
predicted <- psycho::get_predicted(fit, newdata=refgrid)
Note that get_predicted automatically transforms the log odds ratios (the scale on which the model is expressed) into probabilities, which are easier to apprehend.
ggplot(predicted, aes(x=Adjusting, y=Sex_Median)) +
geom_line() +
geom_ribbon(aes(ymin=Sex_CI_5,
ymax=Sex_CI_95),
alpha=0.1) +
ylab("Probability of being a male")
We can nicely see the non-linear relationship between adjusting and the probability of being a male.
Let’s create a slightly more complex model, mixing a factor with a numeric predictor, to see whether life satisfaction is related to the tendency to suppress and conceal one’s emotional reactions (Concealing), and whether this relationship depends on sex.
# Let's fit our model
fit <- rstanarm::stan_glm(Life_Satisfaction ~ Concealing * Sex, data=df)
Let’s check our model:
# Format the results using analyze()
results <- psycho::analyze(fit)
# We can extract a formatted summary table
summary(results, round = 2)
Variable | Median | MAD | CI_lower | CI_higher | MPE | Overlap |
---|---|---|---|---|---|---|
R2 | 0.01 | 0.01 | 0.00 | 0.02 | NA | NA |
(Intercept) | 5.19 | 0.12 | 5.00 | 5.38 | NA | NA |
Concealing | -0.10 | 0.03 | -0.15 | -0.05 | 99.95 | 11.35 |
SexM | -0.66 | 0.33 | -1.16 | -0.12 | 97.82 | 30.84 |
Concealing:SexM | 0.18 | 0.07 | 0.06 | 0.30 | 99.28 | 22.16 |
Again, it is important to notice that the intercept (the baseline) corresponds here to Concealing = 0 and Sex = F. As we can see, there is, with high probability, a negative linear relationship between concealing and life satisfaction (for females only). Also, at the (theoretical) intercept (when concealing = 0), males have a lower life satisfaction. Finally, the interaction is also probable: when the participant is a male, the relationship between concealing and life satisfaction is markedly different (increased by 0.18). In other words, the relationship is of -0.10 + 0.18 = 0.08 in men.
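For the record, here is the small piece of arithmetic behind these simple slopes, using the rounded medians from the table above:

# Simple slopes implied by the coefficients above (rounded medians)
slope_female <- -0.10       # effect of Concealing when Sex = F (the reference level)
slope_male <- -0.10 + 0.18  # add the interaction coefficient for males
slope_male                  # 0.08: a slightly positive relationship in men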
How do we represent this type of model? Again, we have to generate a reference grid.
refgrid <- df %>%
select(Concealing, Sex) %>%
psycho::refdata(length.out=10)
predicted <- psycho::get_predicted(fit, newdata=refgrid)
predicted
Concealing | Sex | Life_Satisfaction_Median | Life_Satisfaction_CI_5 | Life_Satisfaction_CI_95 |
---|---|---|---|---|
0.0000000 | F | 5.190388 | 5.001200 | 5.381736 |
0.7777778 | F | 5.114082 | 4.960851 | 5.274063 |
1.5555556 | F | 5.037193 | 4.906996 | 5.157815 |
2.3333333 | F | 4.961762 | 4.863765 | 5.060087 |
3.1111111 | F | 4.885500 | 4.806135 | 4.963808 |
3.8888889 | F | 4.810217 | 4.732327 | 4.885447 |
4.6666667 | F | 4.734072 | 4.642445 | 4.826327 |
5.4444444 | F | 4.658615 | 4.533215 | 4.771824 |
6.2222222 | F | 4.582772 | 4.420314 | 4.722283 |
7.0000000 | F | 4.506502 | 4.318889 | 4.689379 |
0.0000000 | M | 4.534983 | 4.007624 | 4.968796 |
0.7777778 | M | 4.598149 | 4.195471 | 4.999753 |
1.5555556 | M | 4.660830 | 4.332912 | 4.985867 |
2.3333333 | M | 4.720981 | 4.461401 | 4.973491 |
3.1111111 | M | 4.783895 | 4.572836 | 4.965763 |
3.8888889 | M | 4.844840 | 4.688676 | 5.003795 |
4.6666667 | M | 4.908570 | 4.753125 | 5.065187 |
5.4444444 | M | 4.972404 | 4.783561 | 5.169266 |
6.2222222 | M | 5.034394 | 4.775300 | 5.286061 |
7.0000000 | M | 5.099924 | 4.763345 | 5.413600 |
As we can see, the reference grid is balanced in terms of factors and numeric predictors. Now, plotting it becomes very easy!
ggplot(predicted, aes(x=Concealing, y=Life_Satisfaction_Median, fill=Sex)) +
geom_line(aes(colour=Sex)) +
geom_ribbon(aes(fill=Sex,
ymin=Life_Satisfaction_CI_5,
ymax=Life_Satisfaction_CI_95),
alpha=0.1) +
ylab("Life Satisfaction")
We can see that the error band for males is larger, due to fewer observations.
The Mixed modelling framework allows estimated effects to vary by group at lower levels while estimating population-level effects through the specification of fixed (explanatory variables) and random (variance components) effects. Outperforming traditional procedures such as repeated measures ANOVA (Kristensen & Hansen, 2004), these models are particularly suited to cases in which experimental stimuli are heterogeneous (e.g., images) as the item-related variance, in addition to the variance induced by participants, can be accounted for (Baayen, Davidson, & Bates, 2008; Magezi, 2015). Moreover, mixed models can handle unbalanced data, nested designs, crossed random effects and missing data.
As for how to run this type of analysis, it is quite easy: everything said previously remains the same for mixed models, except that random effects are specified by adding + (1|random_term) to the formula. For example, we might want to treat the salary as a random effect (to “adjust”, so to speak, for the fact that the data are structured in groups). Let’s explore the relationship between the tendency to conceal emotions and age (adjusted for salary).
# Let's fit our model (it takes more time)
fit <- rstanarm::stan_lmer(Concealing ~ Age + (1|Salary), data=df)
Let’s check our model:
# Format the results using analyze()
results <- psycho::analyze(fit)
## Warning in ypredloo - y: longer object length is not a multiple of shorter
## object length
# We can extract a formatted summary table
summary(results, round = 2)
Variable | Median | MAD | CI_lower | CI_higher | MPE | Overlap |
---|---|---|---|---|---|---|
R2 | 0.00 | 0.00 | 0.00 | 0.01 | NA | NA |
(Intercept) | 3.95 | 0.17 | 3.61 | 4.25 | NA | NA |
Age | -0.01 | 0.00 | -0.02 | 0.00 | 91.2 | 46.61 |
As we can see, the linear relationship has only a moderate probability of being different from 0.
refgrid <- df %>%
select(Age) %>%
psycho::refdata(length.out=10)
# We name the predicted dataframe by adding '_linear' to keep it for further comparison (see next part)
predicted_linear <- psycho::get_predicted(fit, newdata=refgrid)
ggplot(predicted_linear, aes(x=Age, y=Concealing_Median)) +
geom_line() +
geom_ribbon(aes(ymin=Concealing_CI_5,
ymax=Concealing_CI_95),
alpha=0.1)
Relationships in the real world are often non-linear. For example, building on the previous relationship between concealing and age, we could apply a polynomial (second order) transformation to the predictor.
# Let's fit our model (it takes more time)
fit <- rstanarm::stan_lmer(Concealing ~ poly(Age, 2, raw=TRUE) + (1|Salary), data=df)
Let’s check our model:
# Format the results using analyze()
results <- psycho::analyze(fit)
## Warning in ypredloo - y: longer object length is not a multiple of shorter
## object length
# We can extract a formatted summary table
summary(results, round = 2)
Variable | Median | MAD | CI_lower | CI_higher | MPE | Overlap |
---|---|---|---|---|---|---|
R2 | 0.01 | 0.01 | 0.00 | 0.02 | NA | NA |
(Intercept) | 4.99 | 0.49 | 4.09 | 5.81 | NA | NA |
poly(Age, 2, raw = TRUE)1 | -0.07 | 0.03 | -0.11 | -0.02 | 99.2 | 24.10 |
poly(Age, 2, raw = TRUE)2 | 0.00 | 0.00 | 0.00 | 0.00 | 98.6 | 80.18 |
As we can see, both the linear relationship and the second order curvature are highly probable. However, when setting raw=TRUE in the formula, the coefficients become uninterpretable. So let’s visualize them.
The model visualization routine is similar to the previous ones.
refgrid <- df %>%
select(Age) %>%
psycho::refdata(length.out=20)
predicted_poly <- psycho::get_predicted(fit, newdata=refgrid)
ggplot(predicted_poly, aes(x=Age, y=Concealing_Median)) +
geom_line() +
geom_ribbon(aes(ymin=Concealing_CI_5,
ymax=Concealing_CI_95),
alpha=0.1)
As we can see, adding the polynomial degree changes the relationship. Since the model is very simple here, we can also add the actual data points to the plot (note, however, that they do not take the random effects into account), as well as plot the two models together. Also, let’s make the plot “dynamic” using plotly.
p <- ggplot() +
# Linear model
geom_line(data=predicted_linear,
aes(x=Age, y=Concealing_Median),
colour="blue",
size=1) +
geom_ribbon(data=predicted_linear,
aes(x=Age,
ymin=Concealing_CI_5,
ymax=Concealing_CI_95),
alpha=0.1,
fill="blue") +
# Polynomial Model
geom_line(data=predicted_poly,
aes(x=Age, y=Concealing_Median),
colour="red",
size=1) +
geom_ribbon(data=predicted_poly,
aes(x=Age,
ymin=Concealing_CI_5,
ymax=Concealing_CI_95),
fill="red",
alpha=0.1) +
# Actual data
geom_point(data=df, aes(x=Age, y=Concealing))
library(plotly) # To create interactive plots
ggplotly(p) # To transform a ggplot into an interactive plot
It’s good to take a few steps back and look at the bigger picture :)
One of the interesting aspects of the Bayesian framework is the possibility of adding prior expectations about the effect, to help model fitting and increase accuracy in noisy data or small samples.
As you might have noticed, we didn’t specify any priors in the previous analyses. In fact, we let the algorithm define and set weakly informative priors, designed to provide moderate regularization and help stabilize computation without biasing the direction of the effect. For example, a weakly informative prior for a standardized predictor (with mean = 0 and SD = 1) could be a normal distribution with mean = 0 and SD = 1. This means that the effect of this predictor is expected to be equally probable in either direction (as the distribution is symmetric around 0), with values close to 0 being more probable than values far from it.
While this prior doesn’t bias the direction of the Bayesian (MCMC) sampling, it suggests that an effect of 100 (i.e., located at 100 SDs from the mean, as our variables are standardized) is highly improbable, and that an effect close to 0 is more probable.
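To put numbers on this intuition, here is a quick sketch with base R:

# Under a normal(0, 1) prior, compare the plausibility of different effect sizes
dnorm(0, mean = 0, sd = 1)       # ~0.40: density at 0, the most plausible value
dnorm(100, mean = 0, sd = 1)     # ~0: an effect of 100 SDs is essentially ruled out
2 * pnorm(-3, mean = 0, sd = 1)  # ~0.003: prior probability that |effect| > 3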
To better play with priors, let’s start by standardizing our dataframe.
# Standardize (scale and center) the numeric variables
dfZ <- psycho::standardize(df)
Then, we can explicitly specify a weakly informative prior for all effects of the model.
# Let's fit our model
fit <- rstanarm::stan_glm(Life_Satisfaction ~ Tolerating,
data=dfZ,
prior=normal(location = 0, # Mean
scale = 1, # SD
autoscale=FALSE)) # Don't adjust scale automatically
Let’s plot the prior (the expectation) against the posterior (the estimated effect) distribution.
results <- psycho::analyze(fit)
# Extract the posterior
posterior <- results$values$effects$Tolerating$posterior
# Create a dataframe with the prior and posterior distributions and plot them.
data.frame(posterior = posterior,
prior = rnorm(length(posterior), 0, 1)) %>%
ggplot() +
geom_density(aes(x=posterior), fill="lightblue", alpha=0.5) +
geom_density(aes(x=prior), fill="blue", alpha=0.5) +
scale_y_sqrt() # Change the Y axis so the plot is less ugly
This plot is rather ugly, because our posterior is very precise (due to the large sample) compared to the prior.
Although the default priors tend to work well, prudent use of more informative priors is encouraged. It is important to underline that setting informative priors (if realistic) does not over-bias the analysis: it only “directs” the sampling. If the data are highly informative about the parameter values (enough to overwhelm the prior), a prudent informative prior (even one opposite to the observed effect) will yield results similar to a non-informative prior. In other words, you can’t change the results dramatically by tweaking the priors. But as the amount of data and/or the signal-to-noise ratio decreases, using a more informative prior becomes increasingly important. Of course, if you see someone using a prior with mean = 42 and SD = 0.0001, you should look at their results with caution…
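This can be checked empirically. Below is a sketch (a hypothetical check, not part of the original analysis) refitting the standardized model with a prior pointing in the opposite direction:

# Sketch: refit with a prior of opposite sign and compare the posteriors
fit_opposite <- rstanarm::stan_glm(Life_Satisfaction ~ Tolerating,
                                   data=dfZ,
                                   prior=normal(location = -1, # opposite direction
                                                scale = 1,
                                                autoscale=FALSE))
# With more than a thousand observations, the posterior median of Tolerating
# should remain very close to the one obtained with the zero-centered prior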
Anyway, see the official rstanarm documentation for details.
As Bayesian models usually generate a lot of samples (iterations), one could want to plot them as well, instead of (or along with) the posterior “summary”. This can be done quite easily by keeping all the iterations in get_predicted.
# Fit the model
fit <- rstanarm::stan_glm(Sex ~ Adjusting, data=df, family = "binomial")
# Generate a new refgrid
refgrid <- df %>%
select(Adjusting) %>%
psycho::refdata(length.out=10)
# Get predictions and keep iterations
predicted <- psycho::get_predicted(fit, newdata=refgrid, keep_iterations=TRUE)
# Reshape this dataframe to have iterations as factor
predicted <- predicted %>%
tidyr::gather(Iteration, Iteration_Value, starts_with("iter"))
# Plot iterations as well as the median prediction
ggplot(predicted, aes(x=Adjusting)) +
geom_line(aes(y=Iteration_Value, group=Iteration), size=0.3, alpha=0.01) +
geom_line(aes(y=Sex_Median), size=1) +
ylab("Male Probability\n")
This package helped you? Don’t forget to cite the various packages you used :)
You can cite psycho as follows:

Makowski, D. (2018). The psycho Package: An Efficient and Publishing-Oriented Workflow for Psychological Science. Journal of Open Source Software, 3(22), 470. https://doi.org/10.21105/joss.00470
Improve this vignette by modifying this file!