Applied Bayesian Analyses in R

Part 3

Sven De Maeyer

New example data WritingData.RData

  • Experimental study on Writing instructions

  • 2 conditions:

    • Control condition (Business as usual)
    • Experimental condition (Observational learning)

Your Turn

  • Open WritingData.RData

  • Estimate 3 models with SecondVersion as dependent variable

    • M1: fixed effect of FirstVersion_GM + a random intercept for Class ((1 | Class))
    • M2: M1 + a random slope for FirstVersion_GM ((1 + FirstVersion_GM | Class))
    • M3: M2 + a fixed effect of Experimental_condition
  • Compare the models on their fit

  • What do we learn?

  • Make a summary of the best fitting model (a code sketch of these steps follows below)
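Not the only possible solution, but a minimal sketch of what the three model calls and a fit comparison could look like. The loo-based comparison is one choice among several (WAIC or k-fold would work too), and the backend, cores, and seed settings simply mirror the M3 call shown later:

library(brms)

# Load the example data (assumes WritingData.RData sits in the working directory)
load("WritingData.RData")

# M1: fixed effect of FirstVersion_GM + random intercept for Class
M1 <- brm(
  SecondVersion ~ FirstVersion_GM + (1 | Class),
  data = WritingData, backend = "cmdstanr", cores = 4, seed = 1975
)

# M2: M1 + random slope for FirstVersion_GM
M2 <- brm(
  SecondVersion ~ FirstVersion_GM + (1 + FirstVersion_GM | Class),
  data = WritingData, backend = "cmdstanr", cores = 4, seed = 1975
)

# M3: M2 + fixed effect of Experimental_condition
M3 <- brm(
  SecondVersion ~ FirstVersion_GM + Experimental_condition +
    (1 + FirstVersion_GM | Class),
  data = WritingData, backend = "cmdstanr", cores = 4, seed = 1975
)

# Compare model fit with leave-one-out cross-validation
M1 <- add_criterion(M1, "loo")
M2 <- add_criterion(M2, "loo")
M3 <- add_criterion(M3, "loo")
loo_compare(M1, M2, M3)

# Summarize the best fitting model
summary(M3)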

Divergent transitions…

Exploring the parameter space…

Divergent transitions…

  • Something to worry about!

  • Essentially: the sampling of parameter values went wrong, so the sampler could not explore part of the posterior properly

  • Fixes:

    • sometimes fine-tuning the sampling algorithm (e.g., control = list(adapt_delta = 0.9)) works
    • sometimes you need more informative priors (a sketch follows after this list)
    • sometimes the model is just not a good model
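Purely to illustrate the second option: a hypothetical sketch of what tighter priors could look like in brms. The normal(0, 5) and exponential(1) values are placeholders, not recommendations; the fix actually used here follows on the next slide.

# Hypothetical example of more informative priors (values are placeholders)
M3_priors <- brm(
  SecondVersion ~ FirstVersion_GM + Experimental_condition +
    (1 + FirstVersion_GM | Class),
  data = WritingData,
  prior = c(
    prior(normal(0, 5), class = "b"),        # regression weights
    prior(exponential(1), class = "sd"),     # between-class SDs
    prior(exponential(1), class = "sigma")   # residual SD
  ),
  backend = "cmdstanr",
  cores = 4,
  seed = 1975
)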

Our fix here for model 3

M3 <- brm(
  SecondVersion ~ FirstVersion_GM + Experimental_condition + (1 + FirstVersion_GM |Class),
  data = WritingData,
  backend = "cmdstanr",
  cores = 4,
  control = list(adapt_delta = 0.9),
  seed = 1975 
)

Our fix here for model 3

Interpretation of results…

Different ways to summarize our results:

  • visually

  • credible intervals (eti & hdi)

  • rope + hdi rule

  • hypothesis tests

Visually summarizing the posterior distribution

Functions in bayesplot package

  • mcmc_areas() function

  • mcmc_areas_ridges() function

  • mcmc_intervals() function

The mcmc_areas() function

Plots the posterior distribution of each parameter, shaded with a credible interval that you can set manually via the prob argument:

mcmc_areas(
  M3,
  pars = c(
    "b_FirstVersion_GM",
    "b_Experimental_condition"
  ),
  prob = .89
)

The mcmc_areas() function

The mcmc_areas_ridges() function


Very similar to the previous one; only the spacing of the densities changes a bit…


Meanwhile, note how easily you can change the color scheme for bayesplot graphs with color_scheme_set()


color_scheme_set(scheme = "red")

mcmc_areas_ridges(
  M3,
  pars = c(
    "b_FirstVersion_GM",
    "b_Experimental_condition"
  ),
  prob = .89
)

The mcmc_areas_ridges() function

The mcmc_intervals() function

Summarizes the posterior as a horizontal bar with markers for two credible intervals (CIs).

Here we set one to a 50% and one to an 89% CI.

color_scheme_set(scheme = "gray")

mcmc_intervals(
  M3,
  pars = c(
    "b_FirstVersion_GM",
    "b_Experimental_condition"
  ),
  prob = .5,
  prob_outer = .89
)

The mcmc_intervals() function

Manually create visualizations


Power combo: as_draws_df() + ggplot2 + ggdist


What does as_draws_df() do?


posterior_PD <- as_draws_df(M3)
head(posterior_PD)

Manually create visualizations

# A draws_df: 6 iterations, 1 chains, and 49 variables
  b_Intercept b_FirstVersion_GM b_Experimental_condition sd_Class__Intercept
1         113              0.98                      4.7                 5.2
2         112              1.47                      5.1                 2.8
3         113              1.07                      3.5                 2.8
4         111              1.09                      4.7                 4.2
5         112              1.08                      4.0                 1.7
6         111              1.04                      5.0                 2.4
  sd_Class__FirstVersion_GM cor_Class__Intercept__FirstVersion_GM sigma
1                      0.93                                  0.77   7.4
2                      1.06                                  0.68   8.1
3                      0.88                                  0.79   7.1
4                      0.75                                  0.74   8.0
5                      0.94                                  0.78   7.1
6                      0.97                                  0.82   8.1
  r_Class[1,Intercept]
1                 1.84
2                -1.59
3                -0.60
4                 0.95
5                 0.81
6                 0.38
# ... with 41 more variables
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}

Use draws to create a plot using ggdist geoms

The ggdist package has a set of functions to visualize distributions

An example


Before we start, we set our own plot theme (not strictly necessary)


# Setting a plotting theme
theme_set(theme_linedraw() +
            theme(
              text = element_text(family = "Times", size = 14),
              panel.grid = element_blank()
              )
)

An example


We use posterior_PD as a starting point (our draws)

library(ggdist)

Plot <- ggplot(
  posterior_PD,
  aes(x = b_Experimental_condition)
  ) +
  stat_halfeye()

Plot + scale_y_continuous(name = "", breaks = NULL)

An example

Change the CIs


Change the CIs to 50% and 89%


Plot <- ggplot(
  posterior_PD,
  aes(x = b_Experimental_condition)
  ) +
  stat_halfeye(
    .width = c(.50,.89)
  )

Plot + scale_y_continuous(name = "", breaks = NULL)

Change the CIs

Use another visualization


Let’s make a dot plot with 100 dots (research suggests this format is interpreted best)…


Plot <- ggplot(
  posterior_PD,
  aes(x = b_Experimental_condition)
  ) +
  stat_dotsinterval(
    quantiles = 100,
    .width = c(.50,.89)
  )

Plot + scale_y_continuous(name = "", breaks = NULL)

Use another visualization

Plot two parameters each in a facet


We use pivot_longer(everything()) to stack the draws of multiple parameters


posterior_PD %>% 
  select(
    b_Experimental_condition, b_FirstVersion_GM
  ) %>% 
  pivot_longer(everything()) %>%
  ggplot(
    aes(x = value)
  ) +
  stat_halfeye(
    .width = c(.50,.89)
  ) +
facet_wrap(name ~ .) +
scale_y_continuous(name = "", breaks = NULL)

Plot two parameters each in a facet

Visualize calculated predictions based on posterior

Our example: 2 groups according to Experimental_condition

How do we visualize the posterior distribution of the mean for both groups?

posterior_PD %>% 
  select(
    b_Intercept, b_Experimental_condition
  ) %>% 
  mutate(
    Mean_Control_condition = b_Intercept,
    Mean_Experimental_condition = b_Intercept + b_Experimental_condition
  ) %>% 
  select(
    Mean_Control_condition, Mean_Experimental_condition
  ) %>% 
  pivot_longer(everything()) %>%
  ggplot(
    aes(x = value, color = name, fill = name)
  ) +
  stat_halfeye(
    .width = c(.50,.89),
    alpha = .40
  ) + 
  scale_y_continuous(name = "", breaks = NULL)

Visualize calculated predictions based on posterior

Hypothetical Outcome Plots (HOPs)

Code: see separate script called HOP_script.R

# A tibble: 6 × 3
   draw     X Pred1
  <int> <dbl> <dbl>
1     1 -15    95.8
2     1 -14.9  95.9
3     1 -14.8  96.0
4     1 -14.7  96.1
5     1 -14.6  96.2
6     1 -14.5  96.3
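The script itself is not reproduced here; the following is only a rough sketch of the underlying idea, producing a tibble like the one above. It uses the fixed effects from the posterior draws to compute a predicted regression line per draw and overlays a subset of them. The grid of X values, the 50 retained draws, and the seed are arbitrary choices:

library(brms)     # as_draws_df()
library(dplyr)
library(tidyr)
library(ggplot2)

# Rough sketch of the HOP idea (not the actual HOP_script.R):
# one predicted regression line per posterior draw
set.seed(1975)

HOP_data <- as_draws_df(M3) %>%
  slice_sample(n = 50) %>%                        # keep a random subset of draws
  mutate(draw = row_number()) %>%
  select(draw, b_Intercept, b_FirstVersion_GM) %>%
  crossing(X = seq(-15, 15, by = 0.1)) %>%        # grid of FirstVersion_GM values
  mutate(Pred1 = b_Intercept + b_FirstVersion_GM * X) %>%
  select(draw, X, Pred1)

ggplot(HOP_data, aes(x = X, y = Pred1, group = draw)) +
  geom_line(alpha = .2) +
  labs(x = "FirstVersion_GM", y = "Predicted SecondVersion")
# animating over `draw` (e.g., with gganimate) turns this static spaghetti plot into a true HOP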

Visualizing Random Effects

Plotting the residuals

To plot differences between classes we can use class-specific residuals:

head(as_draws_df(M3) %>% 
  select(ends_with(",Intercept]")) %>%
  select(1:3),
  5
)
# A tibble: 5 × 3
  `r_Class[1,Intercept]` `r_Class[2,Intercept]` `r_Class[3,Intercept]`
                   <dbl>                  <dbl>                  <dbl>
1                  1.84                  1.36                  -4.38  
2                 -1.59                  3.12                   0.0290
3                 -0.595                 2.02                  -0.347 
4                  0.945                 2.50                  -2.55  
5                  0.815                -0.0404                -1.59  

Plotting the residuals

as_draws_df(M3) %>% 
  select(ends_with(",Intercept]")) %>%            # class-specific intercept residuals
  pivot_longer(starts_with("r_Class")) %>% 
  mutate(u_j = value) %>%                         # u_j: deviation of class j from the overall intercept
  ggplot(aes(x = u_j, y = reorder(name, u_j))) +  # order classes by their residual
  stat_pointinterval(
    point_interval = mean_qi, 
    .width = .89, 
    size = 1/6) +
  scale_y_discrete(expression(italic(j)), breaks = NULL) +
  labs(x = expression(italic(u)[italic(j)])) +
  coord_flip()

Plotting the residuals

ICC estimation

head(
  as_draws_df(M3) %>%
    mutate(
      ICC = (sd_Class__Intercept^2/(sd_Class__Intercept^2 + sigma^2))) %>%
    select(sigma, sd_Class__Intercept, ICC), 
  5
  ) 
# A tibble: 5 × 3
  sigma sd_Class__Intercept    ICC
  <dbl>               <dbl>  <dbl>
1  7.36                5.24 0.336 
2  8.07                2.77 0.106 
3  7.10                2.76 0.131 
4  8.01                4.18 0.214 
5  7.11                1.71 0.0545

ICC estimation

as_draws_df(M3) %>%
  mutate(
    ICC = (sd_Class__Intercept^2/(sd_Class__Intercept^2 + sigma^2))
    ) %>%
  select(ICC) %>%                           
  ggplot(aes(x = ICC)) +                    
   stat_dotsinterval(
     quantiles = 100,
     .width = c(.50,.89)
   ) +
   scale_x_continuous("marginal posterior", 
                      breaks = seq(.00,.60,by =.05)) + 
   scale_y_continuous("ICC", breaks = NULL)

ICC estimation

HOP per higher level unit

Code: see separate script called HOP_MixedEffects_script.R

Reporting stats about the posterior

Credible Intervals


  • ETI: Equal Tailed Interval (cuts off an equal share of probability in each tail)


  • HDI: Highest Density Interval (the narrowest interval containing the requested probability mass)
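Both interval types can be computed with the bayestestR package; a minimal sketch (the 89% width simply mirrors the plots shown earlier and is just a choice):

library(bayestestR)

# 89% Equal Tailed Interval: cuts off 5.5% of the posterior in each tail
eti(M3, ci = .89)

# 89% Highest Density Interval: the narrowest interval holding 89% of the posterior
hdi(M3, ci = .89)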

Concept of ROPE

ROPE: Region Of Practical Equivalence

Set a region of parameter values that can be considered equivalent to having no effect

  • in standard effect sizes the advised default is a range of -0.1 to 0.1

  • this equals 1/2 of a small effect size (Cohen, 1988)

    • all parameter values in that range are treated as practically equivalent to no effect (a code sketch follows below)
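As a sketch of how this looks in bayestestR: for linear models, rope_range() derives the default range as roughly ±0.1 × the SD of the outcome, which is where the default range on the following slides comes from.

library(bayestestR)

# Default ROPE for M3: roughly +/- 0.1 * SD of the outcome (SecondVersion)
rope_range(M3)

# Share of the posterior that falls inside that region, per parameter
rope(M3, range = rope_range(M3))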

ROPE + HDI

ROPE + HDI rule


  • 95% HDI completely outside the ROPE → H0 rejected

  • 95% HDI completely inside the ROPE → H0 accepted

  • 95% HDI partially overlapping the ROPE → undecided

Applying the HDI + ROPE rule with the bayestestR package


We can use the equivalence_test() function of the bayestestR package


library(bayestestR)
equivalence_test(M3)
# Test for Practical Equivalence

  ROPE: [-1.68 1.68]

Parameter              |       H0 | inside ROPE |          95% HDI
------------------------------------------------------------------
Intercept              | Rejected |      0.00 % | [109.18, 113.43]
FirstVersion_GM        | Accepted |    100.00 % |     [0.48, 1.39]
Experimental_condition | Rejected |      0.00 % |     [1.71, 7.47]

Visualizing the HDI + ROPE rule


We visualize the result of equivalence_test() by piping it into plot()


equivalence_test(M3) %>%
  plot()

Probability of Direction (PD) with the parameters package

library(parameters)
model_parameters(
  M3,
  ci_method = "hdi",
  rope_range = c(-1.8, 1.8), # manually set ROPE: roughly +/- 0.1 * SD of the dependent variable
  test = c("rope", "pd")
  )
# Fixed Effects

Parameter              | Median |           95% CI |     pd | % in ROPE |  Rhat |     ESS
-----------------------------------------------------------------------------------------
(Intercept)            | 111.36 | [109.34, 113.58] |   100% |        0% | 1.000 | 1925.00
FirstVersion_GM        |   0.94 | [  0.49,   1.40] | 99.98% |      100% | 1.001 | 1718.00
Experimental_condition |   4.54 | [  1.77,   7.52] | 99.75% |     0.21% | 0.999 | 1814.00

# Sigma

Parameter | Median |       95% CI |   pd | % in ROPE |  Rhat |     ESS
----------------------------------------------------------------------
sigma     |   7.68 | [7.19, 8.20] | 100% |        0% | 1.001 | 5235.00

Outro

Some books

Some books

Some free online books

  • Bayes Rules!:

https://www.bayesrulesbook.com/

  • Or this book:

https://vasishth.github.io/bayescogsci/book/

Rens van de Schoot

In Nature Reviews

THE Podcast

If you like running - like I do - this could be a great companion on your run!

https://www.learnbayesstats.com/

Site on creating the graphs

There are many blogs and websites that you can consult if you want to find out more about making graphs.

One that I often fall back on is:


http://mjskay.github.io/tidybayes/

Questions?


Do not hesitate to contact me!


THANK YOU!