This is a simulated dataset based on an existing study by Frumuselu et al. (2015). The key question in that study was whether subtitles help foreign language acquisition. Spanish students (n = 36) watched episodes of the popular TV show "Friends" for half an hour each week, for 26 weeks. The students were assigned to 3 conditions:
English subtitled (condition “FL”)
Spanish subtitled (condition “MT”)
No subtitles (condition “NoSub”)
On 3 occasions the students took a fluency test:
Before the 26 weeks started
After 12 weeks
After the experiment
The dependent variable is a measure based on the number of words used in a scripted spontaneous interview with a test taker. The data is in long format: one row per student per occasion (36 students × 3 occasions = 108 observations).
If we visualize the dataset we get a first impression of the effect of the condition. In this exercise it is your task to do the proper Bayesian modelling and interpretation.
First we will build 4 alternative mixed-effects models. Each of the models has to take into account the fact that we have multiple observations per student, so we need a random effect for students in the model.
M0: only an intercept and a random effect for student
M1: M0 + fixed effect of occasion
M2: M1 + fixed effect of condition
M3: M2 + interaction effect between occasion and condition
Make use of the default priors of brms.
Once the models are estimated, compare them on their fit using leave-one-out cross-validation. Determine which model fits best; that model will be used to build our inferences.
Solution
The following code block shows how the models can be estimated and compared on their fit. Notice how I first create a set of dummy variables for the categories of the occasion and condition variables. This makes it a bit harder (more code writing) to define the models, but it generates some flexibility later on. For instance, when thinking about setting priors, I can set priors for each of the effects of these dummy variables. It also results in shorter parameter names in the resulting model objects.
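A minimal sketch of what that code block could look like is shown below. The data-frame name (SubtitleData) and the raw variable names (Fluency, Occasion, Condition) are assumptions on my part; the dummy-variable names, the grouping variable student, and the model names M0 to M3 match the ones used throughout the rest of the solution.

Code

library(brms)
library(tidyverse)

# Create dummy variables for the occasions and the conditions
# (assuming Occasion and Condition are coded as in the text)
SubtitleData <- SubtitleData %>%
  mutate(
    Occ2 = ifelse(Occasion == "Occ2", 1, 0),
    Occ3 = ifelse(Occasion == "Occ3", 1, 0),
    FL   = ifelse(Condition == "FL", 1, 0),
    MT   = ifelse(Condition == "MT", 1, 0)
  )

# M0: only an intercept and a random effect for student
M0 <- brm(Fluency ~ 1 + (1 | student), data = SubtitleData)

# M1: M0 + fixed effect of occasion
M1 <- brm(Fluency ~ 1 + Occ2 + Occ3 + (1 | student), data = SubtitleData)

# M2: M1 + fixed effect of condition
M2 <- brm(Fluency ~ 1 + Occ2 + Occ3 + FL + MT + (1 | student), data = SubtitleData)

# M3: M2 + interaction between occasion and condition
M3 <- brm(Fluency ~ 1 + Occ2 + Occ3 + FL + MT +
            Occ2:FL + Occ2:MT + FL:Occ3 + MT:Occ3 +
            (1 | student),
          data = SubtitleData)

# Compare the models with leave-one-out cross-validation
loo(M0, M1, M2, M3)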
Loading 'brms' package (version 2.22.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').
Attaching package: 'brms'
The following object is masked from 'package:rstan':
loo
The following object is masked from 'package:stats':
ar
Based on the model comparison we can conclude that the final model (M3) fits the data best. We will use this model in the next sections to build our inferences.
Note
When doing the loo comparison you might encounter a warning message saying that one or more observations show a Pareto k higher than 0.7. This means that the leave-one-out cross-validation might be biased. The warning message suggests 'moment matching' as a potential solution. You could try this. In my experience, the impact of one or two observations with a too-high Pareto k value is rather small or even negligible. But it is always better to double-check. Also, be aware that moment matching makes the loo estimation very slow!
2. Task 2
Now that we have established the best fitting model, it is our task to approach the model critically before delving into the interpretation of the results.
Apply the different steps of the WAMBS check-list template for the final model.
Subtask 2.1: what about the priors?
What are the default brms priors? Do they make sense? Do they generate impossible datasets? If necessary, specify your own (weakly informative) priors and approach them critically as well.
As you might notice, you will get an error message saying that sampling from the priors is not possible. This is because brms by default uses flat priors for the fixed effects, and it is impossible to sample from a flat (improper) prior.
To get the priors used by brms we use the get_prior() command.
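A sketch of that call, mirroring the model formula used above (the data-frame name is again an assumption):

Code

get_prior(Fluency ~ 1 + Occ2 + Occ3 + FL + MT +
            Occ2:FL + Occ2:MT + FL:Occ3 + MT:Occ3 +
            (1 | student),
          data = SubtitleData)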
                    prior     class      coef   group resp dpar nlpar lb ub       source
                   (flat)         b                                                default
                   (flat)         b        FL                                 (vectorized)
                   (flat)         b   FL:Occ3                                 (vectorized)
                   (flat)         b        MT                                 (vectorized)
                   (flat)         b   MT:Occ3                                 (vectorized)
                   (flat)         b      Occ2                                 (vectorized)
                   (flat)         b   Occ2:FL                                 (vectorized)
                   (flat)         b   Occ2:MT                                 (vectorized)
                   (flat)         b      Occ3                                 (vectorized)
 student_t(3, 104.9, 6.1) Intercept                                                default
     student_t(3, 0, 6.1)        sd                                   0            default
     student_t(3, 0, 6.1)        sd           student                 0       (vectorized)
     student_t(3, 0, 6.1)        sd Intercept student                 0       (vectorized)
     student_t(3, 0, 6.1)     sigma                                   0            default
As can be seen above, brms uses a flat prior for all the betas (fixed effects). That is something better avoided. More appropriate would be to come up with our own priors. Let's think about this. All the explanatory variables are dummy variables, so they quantify differences between groups of observations (based on time or condition).
As we have no prior idea about the directions of the effects of condition nor of the effect of time, we could use a prior distribution centred around 0 (most probability assigned to no effect).
Next, we have to think about the width of the prior. For instance, if we use a normal distribution to express our prior belief, we have to think about the sd of that normal distribution. In our case, the sd has to be high enough to assign some probability to very strong positive effects as well as to very strong negative effects. Here it is important that we know our data well: we need to know the scale of our dependent variable (fluency). This variable has a standard deviation of 7.1. Now we can use effect sizes as a frame of reference to determine what a large positive or negative effect implies on the scale of the fluency variable. Remember, an effect size of 0.8 (or higher) indicates a strong effect (based on the Cohen's d rules of thumb). So, on the scale of our fluency variable an effect of 5.7 (= 7.1 * 0.8) indicates a strong effect. If we use the value 5.7 as the sd of the priors for the effects of our dummy variables, this implies that we think the 95% most probable parameter values for the effects of the dummy variables are situated between -11.4 (= -2 * 5.7) and 11.4 (= 2 * 5.7). Visually the prior would look like this:
Code
# Setting a plotting theme
library(ggplot2)
library(ggtext) # to be able to change the fonts etc. in graphs

theme_set(theme_linedraw() +
            theme(text = element_text(family = "Times", size = 8),
                  panel.grid = element_blank(),
                  plot.title = element_markdown()))

Prior_betas <- ggplot() +
  stat_function(fun = dnorm,              # We use the normal distribution
                args = list(mean = 0, sd = 5.7),
                xlim = c(-15, 15)) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the effects of independent variables",
       subtitle = "N(0,5.7)")

Prior_betas
Notice that even effects of -11 and 11 (almost effect sizes of -2 and 2) still get a decent amount of probability in our prior density function.
Let’s set these priors and try to apply a pp_check(). Notice that I set the priors for all slopes (class = "b") at once.
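A sketch of how this can be done: define the prior with set_prior(), then estimate the model with sample_prior = "only" so that pp_check() draws from the prior predictive distribution (the object name M3_prior is an assumption):

Code

# N(0, 5.7) prior for all slopes at once
Priors <- set_prior("normal(0, 5.7)", class = "b")

# Estimate the model, sampling from the priors only
M3_prior <- brm(Fluency ~ 1 + Occ2 + Occ3 + FL + MT +
                  Occ2:FL + Occ2:MT + FL:Occ3 + MT:Occ3 +
                  (1 | student),
                data = SubtitleData,
                prior = Priors,
                sample_prior = "only")

# Compare datasets simulated from the priors (light blue) with the observed data
pp_check(M3_prior, ndraws = 100)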
The simulated datasets (light blue lines) cover the full range of the observed data! They don't generate extremely high or low observations, and from this check we also learn that we have set quite broad priors, as they result in big differences between the distributions of fluency across the datasets simulated from our model with these priors.
Time to apply these priors (that we now understand) to estimate the real model.
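A sketch, reusing the Priors object defined above; the resulting object M3b is the one used in all the checks below:

Code

M3b <- brm(Fluency ~ 1 + Occ2 + Occ3 + FL + MT +
             Occ2:FL + Occ2:MT + FL:Occ3 + MT:Occ3 +
             (1 | student),
           data = SubtitleData,
           prior = Priors)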
Subtask 2.2: did the model converge?
Perform different checks on the convergence of the model.
Possible solution
Let’s start by checking the trace-plots.
Code
library(ggmcmc)

Model_chains <- ggs(M3b)

Model_chains %>%
  filter(Parameter %in% c("b_Intercept",
                          "b_FL", "b_MT", "b_Occ2", "b_Occ3",
                          "b_Occ2:FL", "b_Occ2:MT",
                          "b_FL:Occ3", "b_MT:Occ3")) %>%
  ggplot(aes(x = Iteration, y = value, col = as.factor(Chain))) +
  geom_line() +
  facet_grid(Parameter ~ ., scale = 'free_y', switch = 'y') +
  labs(title = "Caterpillar Plots for the parameters", col = "Chains")
Looking at the trace-plots, we can conclude that all chains mixed very well. This is already a first indication of successful convergence.
Next, we can check the R-hat statistic for each of the parameters. With the following plot we get a visual overview of all the R-hat statistics (notice the large number of parameters, because we also have random effects in our model):
Note
In the code below you can see that I first create a vector called Rhats to use in the mcmc_rhat() function. To create the vector I call the brms::rhat() function. By explicitly writing brms:: before rhat() I make sure that the rhat() function from the brms package is used. I do this to avoid using another rhat() function that might be loaded by other packages and would give results incompatible with the mcmc_rhat() function.
Code
library(brms)
library(bayesplot)

# Extract the R-hat values for all parameters in the model
Rhats <- brms::rhat(M3b)

# Plot them with bayesplot
mcmc_rhat(Rhats, size = 2) +
  yaxis_text(hjust = 1)
None of the parameters shows a high R-hat statistic. They all are below the threshold of 1.05, indicating that all parameters converged well.
Time to get insight into the amount of autocorrelation. A first check is plotting the ratio of the effective sample size to the total sample size for each of the parameters. Remember that this ratio should be above 0.1 to be sure that the amount of autocorrelation is acceptable. The following code gives a visual overview of these ratios.
Code
mcmc_neff(neff_ratio(M3b)) +
  yaxis_text(hjust = 1)
From the plot we learn that for all the parameters the ratio is above 0.1. So, we can conclude that the amount of autocorrelation is not problematic for the estimation of any of the parameters.
Subtask 2.3: does the posterior distribution histogram have enough information?
Check if the posterior distribution histograms of the different parameters are informative enough to substantiate our inferences.
Possible solution
To evaluate this, we create histograms of the draws for our parameter values based on our model. We will apply this first to all main fixed effects.
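One way to do this is with the mcmc_hist() function from bayesplot (a sketch; the selection of parameters shown is up to you):

Code

library(bayesplot)

# Histograms of the posterior draws for the main fixed effects
mcmc_hist(as.array(M3b),
          pars = c("b_FL", "b_MT", "b_Occ2", "b_Occ3"))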
Here the conclusion is the same. These histograms show no problematic cases.
Subtask 2.4: how well does the model predict the observed data?
Perform posterior predictive checks based on the model.
Possible solutions
Code
pp_check(M3b)
Looking at the posterior predictive checks, we can see that the distributions of simulated data show a strong resemblance to the distribution of the observed data. So, the conclusion could be that our model is performing quite well.
Subtask 2.5: what about prior sensitivity of the results?
Finally, we have to check if the results of our model are not too dependent on the priors we specified in the model.
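This can be done with the priorsense package (a sketch, assuming the package is installed):

Code

library(priorsense)

# Power-scaling sensitivity analysis for the model with our custom priors
powerscale_sensitivity(M3b)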
We learn from this prior sensitivity analysis that there is a prior-data conflict.
Note
If we delve into the paper on the priorsense package we can read this paragraph:
Prior-data conflict (Evans & Moshonov, 2006; Nott, Wang, et al., 2020; Walter & Augustin, 2009) can arise due to intentionally or unintentionally informative priors disagreeing with, but not being dominated by, the likelihood. When this is the case, the posterior will be sensitive to power-scaling both the prior and the likelihood, as illustrated in Figure 5. When prior-data conflict has been detected, the modeller may wish to modify the model by using a less informative prior (e.g., Evans & Jang, 2011; Nott, Seah, et al., 2020) or using heavy-tailed distributions (e.g., Gagnon, 2022; O’Hagan & Pericchi, 2012).
Reference: Kallioinen, N., Paananen, T., Bürkner, P.-C., & Vehtari, A. (2023). Detecting and diagnosing prior and likelihood sensitivity with power-scaling. Statistics and Computing, 34(1), 57. https://doi.org/10.1007/s11222-023-10366-5
So, it might imply that our priors are "unintentionally informative". Let's see what happens if we stick to the default priors of brms. Given that the object called M3 contains the estimation of the model using the default priors, we can quickly learn about the prior sensitivity of this model.
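A sketch of the corresponding check:

Code

# The same power-scaling sensitivity check, now for the model with default priors
powerscale_sensitivity(M3)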
Looking at this prior sensitivity check we learn that using the default priors makes the model less sensitive to the priors used!
“What to do now?”
Well, in my opinion there is not a single best way to deal with this situation. What I would do first is compare the results based on both models (M3 and M3b). Let's print the parameter estimates of both models. To do this I rely on the tab_model() function from the very helpful sjPlot package (so, do not forget to install it if you want to use it!), as it allows me to show the results of both models side by side.
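A sketch of that call (the dv.labels argument only sets the column labels of the table):

Code

library(sjPlot)

# Show the parameter estimates of both models side by side
tab_model(M3, M3b,
          dv.labels = c("M3 (brms default priors)", "M3b (custom priors)"))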
Predictors                      M3 (brms default priors)        M3b (custom priors)
                                Estimates   CI (95%)            Estimates   CI (95%)
Intercept                       102.60      100.31 – 104.92     101.75      99.68 – 103.80
Occ2                            -0.29       -2.83 – 2.16        0.79        -1.58 – 3.12
FL                              -2.39       -5.77 – 0.73        -0.99       -3.96 – 1.94
MT                              -1.40       -4.61 – 1.85        -0.22       -3.14 – 2.78
Occ3                            2.50        -0.00 – 4.94        3.78        1.42 – 6.17
Occ2:FL                         7.08        3.47 – 10.59        5.30        1.87 – 8.66
Occ2:MT                         8.07        4.57 – 11.51        6.44        3.12 – 9.57
FL:Occ3                         17.15       13.53 – 20.66       14.87       11.46 – 18.20
MT:Occ3                         8.84        5.32 – 12.49        7.09        3.68 – 10.26

Random Effects
σ2                              5.16                            5.00
τ00                             46.69                           44.86
ICC                             0.10                            0.10
N                               36 student                      36 student
Observations                    108                             108
Marginal R2 / Conditional R2    0.709 / 0.816                   0.695 / 0.803
This table clearly demonstrates that the model using the default priors yields different results than the model with the priors we set ourselves!
If we think about this more profoundly, this doesn't have to come as a surprise! The priors we used for the effects of Occasion were maybe nonsense. Why not assume that students become more fluent over the course of 26 weeks? To be fair, I deliberately used these seemingly uninformative priors so that we would, for didactical reasons, run into this issue!
For the rest of the exercise we will stick to the results of the model with default priors. Another option could be to change our custom priors, mimicking the idea that we expect students to become more fluent over the course of 26 weeks. That's what I actually did in a blogpost I wrote about using the priorsense package, which can be found here: https://sdemaeyer.quarto.pub/posts/2024-02-PriorSense/PriorSensePost.html
3. Task 3
Now a more general task. Make different visualizations of the model results.
One of the possible visualizations could be a rather complex one. Remember, there are 3 conditions and 3 occasions. What I like to see is a plot showing the expected means for each of the conditions on each of the 3 occasions.
And what do we learn about the progress between Occ1 and Occ2 in each of the groups? And what about the progress between Occ2 and Occ3?
Possible solution
Let’s start tackling this challenge:
|“One of the possible visualizations could be a rather complex one. Remember, there are 3 conditions and 3 occasions. What I like to see is a plot showing the expected means for each of the conditions on each of the 3 occasions.”
Starting point: create an object containing the draws from the posterior based on model M3.
Code
posterior_M3 <- as_draws_df(M3)
The next step is calculating some predicted means, based on combinations of our dummy variables in the model. This is a ‘tricky’ step because we have to keep in mind that the model also contains interaction effects.
Code
posterior_M3 <- posterior_M3 %>%
  mutate(
    # calculate expected mean for Occ1 condition NoSub (that's just our intercept)
    Occ1_NoSub = b_Intercept,
    # calculate expected mean for Occ1 condition FL (add main effect of FL)
    Occ1_FL = b_Intercept + b_FL,
    # calculate expected mean for Occ1 condition MT (add main effect of MT)
    Occ1_MT = b_Intercept + b_MT,
    # calculate expected mean for Occ2 condition NoSub (add main effect of Occ2)
    Occ2_NoSub = b_Intercept + b_Occ2,
    # calculate expected mean for Occ2 condition FL (add main effects and interaction term)
    Occ2_FL = b_Intercept + b_Occ2 + b_FL + `b_Occ2:FL`,
    # calculate expected mean for Occ2 condition MT (add main effects and interaction term)
    Occ2_MT = b_Intercept + b_Occ2 + b_MT + `b_Occ2:MT`,
    # calculate expected mean for Occ3 condition NoSub (add main effect of Occ3)
    Occ3_NoSub = b_Intercept + b_Occ3,
    # calculate expected mean for Occ3 condition FL (add main effects and interaction term)
    Occ3_FL = b_Intercept + b_Occ3 + b_FL + `b_FL:Occ3`,
    # calculate expected mean for Occ3 condition MT (add main effects and interaction term)
    Occ3_MT = b_Intercept + b_Occ3 + b_MT + `b_MT:Occ3`
  )
Now that we have calculated these estimated means, we can use these columns to create plots.
Code
library(ggplot2)
library(ggdist)
library(tidyverse) # for select(), pivot_longer(), mutate() and case_when()

posterior_M3 %>%
  # First: select only the relevant columns of our posterior draws object
  select(Occ1_NoSub, Occ1_FL, Occ1_MT,
         Occ2_NoSub, Occ2_FL, Occ2_MT,
         Occ3_NoSub, Occ3_FL, Occ3_MT) %>%
  # Second: pivot everything longer to get two columns:
  # a column called `name` containing the name of the parameter
  # a column called `value` containing the value for that parameter in that draw
  pivot_longer(everything()) %>%
  # Now create a new variable that indicates the Occasion, to use as the x-axis,
  # and a new variable that indicates the group (NoSub, MT, or FL) to make groups in the plot
  mutate(
    Occasion = case_when(
      name == "Occ1_NoSub" | name == "Occ1_FL" | name == "Occ1_MT" ~ "Occ1",
      name == "Occ2_NoSub" | name == "Occ2_FL" | name == "Occ2_MT" ~ "Occ2",
      name == "Occ3_NoSub" | name == "Occ3_FL" | name == "Occ3_MT" ~ "Occ3"
    ),
    Group = case_when(
      name == "Occ1_NoSub" | name == "Occ2_NoSub" | name == "Occ3_NoSub" ~ "No subtitles",
      name == "Occ1_FL"    | name == "Occ2_FL"    | name == "Occ3_FL"    ~ "Foreign language",
      name == "Occ1_MT"    | name == "Occ2_MT"    | name == "Occ3_MT"    ~ "Mother tongue"
    )
  ) %>%
  # Time to create the plot
  ggplot(aes(x = Occasion, y = value,
             group = Group,
             fill = Group)) +        # identify the experimental groups by the fill color
  stat_halfeye(.width = c(.5, .89),
               alpha = .4)           # make the fill color of the density plots more transparent
The next challenge:
|“And what do we learn about the progress between Occ1 and Occ2 in each of the groups? And what about the progress between Occ2 and Occ3? Is progress between two occasions different in each of the conditions?”
These questions are analogous to tests on estimated marginal means and post-hoc testing making use of contrasts in the frequentist realm.
We can first answer these questions by visualising the posterior probability distributions of the estimated differences between occasions in each of the conditions. To make these plots we rely again on the draws from our posterior probability distribution. As a starting point we use the posterior_M3 object created above.
Code
# Start by creating difference scores between occasions based on the draws
posterior_M3 <- posterior_M3 %>%
  mutate(
    # Difference between occasions for condition NoSub
    Occ2_Occ1_NoSub = Occ2_NoSub - Occ1_NoSub,
    Occ3_Occ2_NoSub = Occ3_NoSub - Occ2_NoSub,
    # Difference between occasions for condition FL
    Occ2_Occ1_FL = Occ2_FL - Occ1_FL,
    Occ3_Occ2_FL = Occ3_FL - Occ2_FL,
    # Difference between occasions for condition MT
    Occ2_Occ1_MT = Occ2_MT - Occ1_MT,
    Occ3_Occ2_MT = Occ3_MT - Occ2_MT
  )

# Create the visualisation
posterior_M3 %>%
  # select only the relevant columns for creating the plot
  select(Occ2_Occ1_NoSub, Occ3_Occ2_NoSub,
         Occ2_Occ1_FL, Occ3_Occ2_FL,
         Occ2_Occ1_MT, Occ3_Occ2_MT) %>%
  # do the pivot_longer to create one vector of values and a vector of parameter names
  pivot_longer(everything()) %>%
  # Create variables that help identify groups of parameters to color the distributions
  mutate(
    Occasion = case_when(
      name == "Occ2_Occ1_NoSub" | name == "Occ2_Occ1_FL" | name == "Occ2_Occ1_MT" ~ "Progress Occ1 Occ2",
      name == "Occ3_Occ2_NoSub" | name == "Occ3_Occ2_FL" | name == "Occ3_Occ2_MT" ~ "Progress Occ2 Occ3"
    ),
    Group = case_when(
      name == "Occ2_Occ1_NoSub" | name == "Occ3_Occ2_NoSub" ~ "No subtitles",
      name == "Occ2_Occ1_FL"    | name == "Occ3_Occ2_FL"    ~ "Foreign language",
      name == "Occ2_Occ1_MT"    | name == "Occ3_Occ2_MT"    ~ "Mother tongue"
    )
  ) %>%
  ggplot(aes(x = value, y = Group)) +
  # Let's plot interval visualisations
  stat_pointinterval(.width = c(.5, .89)) +
  scale_y_discrete(name = "") +          # Drop the label of the y-scale
  scale_x_continuous(name = "Fluency") + # Name the x-scale
  facet_wrap(. ~ Occasion)
From this visualisation we learn that the progress between occasion 1 and 2 is rather similar in both subtitle groups (the intervals strongly overlap). The students in the condition without subtitles make smaller progress between occasion 1 and occasion 2 than the students in both subtitling conditions.
Focussing on the progress between occasion 2 and 3, we see that the progress of the students in the foreign language subtitling condition stands out from the progress made in the two other conditions.
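Is the progress between occasion 1 and 2 different in the two subtitling conditions? We can quantify this with the hypothesis() function, applied directly to our data frame of posterior draws. A sketch of the call that produces the output below (the exact hypothesis string is reconstructed from the truncated output):

Code

hypothesis(posterior_M3,
           "(Occ2_Occ1_MT) - (Occ2_Occ1_FL) > 0")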
Hypothesis Tests for class :
                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Occ2_Occ1_MT)-(O... > 0      0.98       1.8    -1.98     3.96        2.4      0.71
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
If we wrap this hypothesis() part within a plot() function, we get a visualization of the posterior probability distribution of the difference in progress for the two subtitling conditions.
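A sketch, wrapping the same call as above:

Code

plot(hypothesis(posterior_M3,
                "(Occ2_Occ1_MT) - (Occ2_Occ1_FL) > 0"))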
This visualization shows that a great amount of the credible estimates of this difference in progress between both subtitling conditions is situated around zero, with substantial probability for both a negative and a positive difference. In other words, the evidence does not allow us to state with confidence whether the two subtitling conditions differ in the fluency progress they induced between occasion 1 and occasion 2.
Now, let’s focus on the difference between occasion 2 and occasion 3: how much evidence is there that this progress is stronger in the condition of foreign language subtitling compared to the condition of mother tongue subtitling?
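Again a sketch with hypothesis(), producing the output below (the hypothesis string is reconstructed from the truncated output):

Code

hypothesis(posterior_M3,
           "(Occ3_Occ2_FL) - (Occ3_Occ2_MT) > 0")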
Hypothesis Tests for class :
                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Occ3_Occ2_FL)-(O... > 0      9.26      1.77     6.31    12.15        Inf         1    *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Here we can see that we have a lot of supporting evidence to conclude that the improvement between occasion 2 and 3 in fluency is stronger for the students in the foreign language condition than for the students in the mother tongue condition.
Frumuselu, Anca Daniela, Sven De Maeyer, Vincent Donche, and María del Mar Gutiérrez Colon Plana. 2015. "Television Series Inside the EFL Classroom: Bridging the Gap Between Teaching and Learning Informal Language Through Subtitles." Linguistics and Education 32 (December): 107–17. https://doi.org/10.1016/j.linged.2015.10.001.