This is a simulated dataset based on an existing study by Frumuselu et al. (2015). The key question in that study was whether subtitles help foreign language acquisition. Spanish students (n = 36) watched episodes of the popular TV show "Friends" for half an hour each week, for 26 weeks. The students were assigned to 3 conditions:
English subtitled (condition “FL”)
Spanish subtitled (condition “MT”)
No subtitles (condition “NoSub”)
On 3 occasions the students took a fluency test:
Before the 26 weeks started
After 12 weeks
After the experiment
The dependent variable is a measure based on the number of words used in a scripted spontaneous interview with a test taker. The data is in long format: one row per student per occasion (36 students × 3 occasions = 108 observations).
If we visualize the dataset we get a first impression of the effect of the condition. In this exercise it is your task to do the proper Bayesian modelling and interpretation.
First we will build 4 alternative mixed-effects models. Each of the models has to take into account the fact that we have multiple observations per student, so we need a random effect for students in the model.
M0: only an intercept and a random effect for student
M1: M0 + fixed effect of occasion
M2: M1 + fixed effect of condition
M3: M2 + interaction effect between occasion and condition
Make use of the default priors of brms.
Once the models are estimated, compare them on their fit using leave-one-out cross-validation. Determine which model fits best; that model will be used to build our inferences.
Solution
The following code block shows how the models can be estimated and compared on their fit. Notice how I first create a set of dummy variables for the categories of the occasion and condition variables. This makes it a bit harder (more code writing) to define the models, but it generates some flexibility later on. For instance, when thinking about setting priors, I can set priors for each of the effects of these dummy variables. It also results in shorter parameter names in the resulting model objects.
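A minimal sketch of what that code block could look like is shown below. The data-frame name (SubtitleData) and the raw variable names (Fluency, Occasion, Condition) are assumptions on my part; the dummy-variable names, the grouping variable student, and the model names M0 to M3 match the ones used throughout the rest of the solution.

Code

library(brms)
library(tidyverse)

# Create dummy variables for the occasions and the conditions
# (assuming Occasion and Condition are coded as in the text)
SubtitleData <- SubtitleData %>%
  mutate(
    Occ2 = ifelse(Occasion == "Occ2", 1, 0),
    Occ3 = ifelse(Occasion == "Occ3", 1, 0),
    FL   = ifelse(Condition == "FL", 1, 0),
    MT   = ifelse(Condition == "MT", 1, 0)
  )

# M0: only an intercept and a random effect for student
M0 <- brm(Fluency ~ 1 + (1 | student), data = SubtitleData)

# M1: M0 + fixed effect of occasion
M1 <- brm(Fluency ~ 1 + Occ2 + Occ3 + (1 | student), data = SubtitleData)

# M2: M1 + fixed effect of condition
M2 <- brm(Fluency ~ 1 + Occ2 + Occ3 + FL + MT + (1 | student), data = SubtitleData)

# M3: M2 + interaction between occasion and condition
M3 <- brm(Fluency ~ 1 + Occ2 + Occ3 + FL + MT +
            Occ2:FL + Occ2:MT + FL:Occ3 + MT:Occ3 +
            (1 | student),
          data = SubtitleData)

# Compare the models with leave-one-out cross-validation
loo(M0, M1, M2, M3)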
Loading 'brms' package (version 2.22.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').
Attaching package: 'brms'
The following object is masked from 'package:rstan':
loo
The following object is masked from 'package:stats':
ar
Based on the model comparison we can conclude that the final model (M3) fits the data best. We will use this model in the next sections to build our inferences.
Note
When doing the loo comparison you might encounter a warning message saying that one or more observations show a Pareto k higher than 0.7. This means that the leave-one-out cross-validation might be biased. The warning message suggests 'moment matching' as a potential solution. You could try this. In my experience, the impact of one or two observations with a too-high Pareto k value is rather small or even negligible. But it is always better to double-check. Also, be aware that moment matching makes the loo estimation very slow!
2. Task 2
Now that we have established the best fitting model, it is our task to approach the model critically before delving into the interpretation of the results.
Apply the different steps of the WAMBS check-list template for the final model.
Subtask 2.1: what about the priors?
What are the default brms priors? Do they make sense? Do they generate impossible datasets? If necessary, specify your own (weakly informative) priors and approach them critically as well.
As you might notice, you will get an error message saying that sampling from the priors is not possible. This is because brms by default uses flat priors for the fixed effects, and it is impossible to sample from a flat (improper) prior.
To get the priors used by brms we use the get_prior() command.
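A sketch of that call, mirroring the model formula used above (the data-frame name is again an assumption):

Code

get_prior(Fluency ~ 1 + Occ2 + Occ3 + FL + MT +
            Occ2:FL + Occ2:MT + FL:Occ3 + MT:Occ3 +
            (1 | student),
          data = SubtitleData)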
                    prior     class      coef   group resp dpar nlpar lb ub       source
                   (flat)         b                                                default
                   (flat)         b        FL                                 (vectorized)
                   (flat)         b   FL:Occ3                                 (vectorized)
                   (flat)         b        MT                                 (vectorized)
                   (flat)         b   MT:Occ3                                 (vectorized)
                   (flat)         b      Occ2                                 (vectorized)
                   (flat)         b   Occ2:FL                                 (vectorized)
                   (flat)         b   Occ2:MT                                 (vectorized)
                   (flat)         b      Occ3                                 (vectorized)
 student_t(3, 104.9, 6.1) Intercept                                                default
     student_t(3, 0, 6.1)        sd                                   0            default
     student_t(3, 0, 6.1)        sd           student                 0       (vectorized)
     student_t(3, 0, 6.1)        sd Intercept student                 0       (vectorized)
     student_t(3, 0, 6.1)     sigma                                   0            default
As can be seen above, brms uses a flat prior for all the betas (fixed effects). That is something better avoided. More appropriate would be to come up with our own priors. Let's think about this. All the explanatory variables are dummy variables, so they quantify differences between groups of observations (based on time or condition).
As we have no prior idea about the directions of the effects of condition nor of the effect of time, we could use a prior distribution centred around 0 (most probability assigned to no effect).
Next, we have to think about the width of the prior. For instance, if we use a normal distribution to express our prior belief, we have to think about the sd of that normal distribution. In our case, the sd has to be high enough to assign some probability to very strong positive effects as well as to very strong negative effects. Here it is important that we know our data well: we need to know the scale of our dependent variable (fluency). This variable has a standard deviation of 7.1. Now we can use effect sizes as a frame of reference to determine what a large positive or negative effect implies on the scale of the fluency variable. Remember, an effect size of 0.8 (or higher) indicates a strong effect (based on the Cohen's d rules of thumb). So, on the scale of our fluency variable an effect of 5.7 (= 7.1 * 0.8) indicates a strong effect. If we use the value 5.7 as the sd of the priors for the effects of our dummy variables, this implies that we think the 95% most probable parameter values for the effects of the dummy variables are situated between -11.4 (= -2 * 5.7) and 11.4 (= 2 * 5.7). Visually the prior would look like this:
Code
# Setting a plotting theme
library(ggplot2)
library(ggtext) # to be able to change the fonts etc. in graphs

theme_set(theme_linedraw() +
            theme(text = element_text(family = "Times", size = 8),
                  panel.grid = element_blank(),
                  plot.title = element_markdown()))

Prior_betas <- ggplot() +
  stat_function(fun = dnorm,              # We use the normal distribution
                args = list(mean = 0, sd = 5.7),
                xlim = c(-15, 15)) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the effects of independent variables",
       subtitle = "N(0,5.7)")

Prior_betas
Notice that even effects of -11 and 11 (almost effect sizes of -2 and 2) still get a decent amount of probability in our prior density function.
Let’s set these priors and try to apply a pp_check(). Notice that I set the priors for all slopes (class = "b") at once.
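A sketch of how this can be done: define the prior with set_prior(), then estimate the model with sample_prior = "only" so that pp_check() draws from the prior predictive distribution (the object name M3_prior is an assumption):

Code

# N(0, 5.7) prior for all slopes at once
Priors <- set_prior("normal(0, 5.7)", class = "b")

# Estimate the model, sampling from the priors only
M3_prior <- brm(Fluency ~ 1 + Occ2 + Occ3 + FL + MT +
                  Occ2:FL + Occ2:MT + FL:Occ3 + MT:Occ3 +
                  (1 | student),
                data = SubtitleData,
                prior = Priors,
                sample_prior = "only")

# Compare datasets simulated from the priors (light blue) with the observed data
pp_check(M3_prior, ndraws = 100)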
The simulated datasets (light blue lines) cover the full range of the observed data! They don't generate extremely high or low observations, and from this check we also learn that we have set quite broad priors, as they result in big differences between the distributions of fluency across the datasets simulated from our model with these priors.
Time to apply these priors (that we now understand) to estimate the real model.
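A sketch, reusing the Priors object defined above; the resulting object M3b is the one used in all the checks below:

Code

M3b <- brm(Fluency ~ 1 + Occ2 + Occ3 + FL + MT +
             Occ2:FL + Occ2:MT + FL:Occ3 + MT:Occ3 +
             (1 | student),
           data = SubtitleData,
           prior = Priors)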
Subtask 2.2: did the model converge?
Perform different checks on the convergence of the model.
Possible solution
Let’s start by checking the trace-plots.
Code
library(ggmcmc)

Model_chains <- ggs(M3b)

Model_chains %>%
  filter(Parameter %in% c("b_Intercept",
                          "b_FL", "b_MT", "b_Occ2", "b_Occ3",
                          "b_Occ2:FL", "b_Occ2:MT",
                          "b_FL:Occ3", "b_MT:Occ3")) %>%
  ggplot(aes(x = Iteration, y = value, col = as.factor(Chain))) +
  geom_line() +
  facet_grid(Parameter ~ ., scale = 'free_y', switch = 'y') +
  labs(title = "Caterpillar Plots for the parameters", col = "Chains")
Looking at the trace-plots, we can conclude that all chains mixed very well. This is already a first indication of successful convergence.
Next, we can check the R-hat statistic for each of the parameters. With the following plot we get a visual overview of all the R-hat statistics (notice the large number of parameters, because we also have random effects in our model):
Note
In the code below you can see that I first create a vector called Rhats to use in the mcmc_rhat() function. To create the vector I call the brms::rhat() function. By explicitly writing brms:: before rhat() I make sure that the rhat() function from the brms package is used. I do this to avoid using another rhat() function that might be loaded by other packages and would give results incompatible with the mcmc_rhat() function.
Code
library(brms)
library(bayesplot)

# Extract the R-hat values for all parameters in the model
Rhats <- brms::rhat(M3b)

# Plot them with bayesplot
mcmc_rhat(Rhats, size = 2) +
  yaxis_text(hjust = 1)
None of the parameters shows a high R-hat statistic. They all are below the threshold of 1.05, indicating that all parameters converged well.
Time to get insight into the amount of autocorrelation. A first check is plotting the ratio of the effective sample size to the total sample size for each of the parameters. Remember that this ratio should be above 0.1 to be sure that the amount of autocorrelation is acceptable. The following code gives a visual overview of these ratios.
Code
mcmc_neff(neff_ratio(M3b)) +
  yaxis_text(hjust = 1)
From the plot we learn that for all the parameters the ratio is above 0.1. So, we can conclude that the amount of autocorrelation is not problematic for the estimation of any of the parameters.
Subtask 2.3: does the posterior distribution histogram have enough information?
Check if the posterior distribution histograms of the different parameters are informative enough to substantiate our inferences.
Possible solution
To evaluate this, we create histograms of the draws for our parameter values based on our model. We will apply this first to all main fixed effects.
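One way to do this is with the mcmc_hist() function from bayesplot (a sketch; the selection of parameters shown is up to you):

Code

library(bayesplot)

# Histograms of the posterior draws for the main fixed effects
mcmc_hist(as.array(M3b),
          pars = c("b_FL", "b_MT", "b_Occ2", "b_Occ3"))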
Here the conclusion is the same. These histograms show no problematic cases.
Subtask 2.4: how well does the model predict the observed data?
Perform posterior predictive checks based on the model.
Possible solutions
Code
pp_check(M3b)
Looking at the posterior predictive checks, we can see that the distributions of simulated data show a strong resemblance to the distribution of the observed data. So, the conclusion could be that our model is performing quite well.
Subtask 2.5: what about prior sensitivity of the results?
Finally, we have to check if the results of our model are not too dependent on the priors we specified in the model.
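This can be done with the priorsense package (a sketch, assuming the package is installed):

Code

library(priorsense)

# Power-scaling sensitivity analysis for the model with our custom priors
powerscale_sensitivity(M3b)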
We learn from this prior sensitivity analysis that there is a prior-data conflict.
Note
If we delve into the paper on the priorsense package we can read this paragraph:
Prior-data conflict (Evans & Moshonov, 2006; Nott, Wang, et al., 2020; Walter & Augustin, 2009) can arise due to intentionally or unintentionally informative priors disagreeing with, but not being dominated by, the likelihood. When this is the case, the posterior will be sensitive to power-scaling both the prior and the likelihood, as illustrated in Figure 5. When prior-data conflict has been detected, the modeller may wish to modify the model by using a less informative prior (e.g., Evans & Jang, 2011; Nott, Seah, et al., 2020) or using heavy-tailed distributions (e.g., Gagnon, 2022; O’Hagan & Pericchi, 2012).
Reference: Kallioinen, N., Paananen, T., Bürkner, P.-C., & Vehtari, A. (2023). Detecting and diagnosing prior and likelihood sensitivity with power-scaling. Statistics and Computing, 34(1), 57. https://doi.org/10.1007/s11222-023-10366-5
So, it might imply that our priors are "unintentionally informative". Let's see what happens if we stick to the default priors of brms. Given that the object called M3 contains the estimation of the model using the default priors, we can quickly learn about the prior sensitivity of this model.
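A sketch of the corresponding check:

Code

# The same power-scaling sensitivity check, now for the model with default priors
powerscale_sensitivity(M3)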
Looking at this prior sensitivity check we learn that using the default priors makes the model less sensitive to the priors used!
“What to do now?”
Well, in my opinion there is not a single best way to deal with this situation. What I would do first is compare the results based on both models (M3 and M3b). Let's print the parameter estimates of both models. To do this I rely on the tab_model() function from the very helpful sjPlot package (so, do not forget to install it if you want to use it!), as it allows me to show the results of both models side by side.
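A sketch of that call (the dv.labels argument only sets the column labels of the table):

Code

library(sjPlot)

# Show the parameter estimates of both models side by side
tab_model(M3, M3b,
          dv.labels = c("M3 (brms default priors)", "M3b (custom priors)"))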
Predictors                      M3 (brms default priors)        M3b (custom priors)
                                Estimates   CI (95%)            Estimates   CI (95%)
Intercept                       102.60      100.31 – 104.92     101.75      99.68 – 103.80
Occ2                            -0.29       -2.83 – 2.16        0.79        -1.58 – 3.12
FL                              -2.39       -5.77 – 0.73        -0.99       -3.96 – 1.94
MT                              -1.40       -4.61 – 1.85        -0.22       -3.14 – 2.78
Occ3                            2.50        -0.00 – 4.94        3.78        1.42 – 6.17
Occ2:FL                         7.08        3.47 – 10.59        5.30        1.87 – 8.66
Occ2:MT                         8.07        4.57 – 11.51        6.44        3.12 – 9.57
FL:Occ3                         17.15       13.53 – 20.66       14.87       11.46 – 18.20
MT:Occ3                         8.84        5.32 – 12.49        7.09        3.68 – 10.26

Random Effects
σ2                              5.16                            5.00
τ00                             46.69                           44.86
ICC                             0.10                            0.10
N                               36 student                      36 student
Observations                    108                             108
Marginal R2 / Conditional R2    0.709 / 0.816                   0.695 / 0.803
This table clearly demonstrates that the model using the default priors yields different results than the model with the priors we set ourselves!
If we think about this more profoundly, this doesn't have to come as a surprise! The priors we used for the effects of Occasion were maybe nonsense. Why not assume that students become more fluent over the course of 26 weeks? To be fair, I deliberately used these seemingly uninformative priors so that we would, for didactical reasons, run into this issue!
For the rest of the exercise we will stick to the results of the model with default priors. Another option could be to change our custom priors, mimicking the idea that we expect students to become more fluent over the course of 26 weeks. That's what I actually did in a blogpost I wrote about using the priorsense package, which can be found here: https://sdemaeyer.quarto.pub/posts/2024-02-PriorSense/PriorSensePost.html
3. Task 3
Now a more general task. Make different visualizations of the model results.
One of the possible visualizations could be a rather complex one. Remember, there are 3 conditions and 3 occasions. What I like to see is a plot showing the expected means for each of the conditions on each of the 3 occasions.
And what do we learn about the progress between Occ1 and Occ2 in each of the groups? And what about the progress between Occ2 and Occ3?
Possible solution
Let’s start tackling this challenge:
|“One of the possible visualizations could be a rather complex one. Remember, there are 3 conditions and 3 occasions. What I like to see is a plot showing the expected means for each of the conditions on each of the 3 occasions.”
Starting point: create an object containing the draws from the posterior based on model M3.
Code
posterior_M3 <- as_draws_df(M3)
The next step is calculating some predicted means, based on combinations of our dummy variables in the model. This is a ‘tricky’ step because we have to keep in mind that the model also contains interaction effects.
Code
posterior_M3 <- posterior_M3 %>%
  mutate(
    # calculate expected mean for Occ1 condition NoSub (that's just our intercept)
    Occ1_NoSub = b_Intercept,
    # calculate expected mean for Occ1 condition FL (add main effect of FL)
    Occ1_FL = b_Intercept + b_FL,
    # calculate expected mean for Occ1 condition MT (add main effect of MT)
    Occ1_MT = b_Intercept + b_MT,
    # calculate expected mean for Occ2 condition NoSub (add main effect of Occ2)
    Occ2_NoSub = b_Intercept + b_Occ2,
    # calculate expected mean for Occ2 condition FL (add main effects and interaction term)
    Occ2_FL = b_Intercept + b_Occ2 + b_FL + `b_Occ2:FL`,
    # calculate expected mean for Occ2 condition MT (add main effects and interaction term)
    Occ2_MT = b_Intercept + b_Occ2 + b_MT + `b_Occ2:MT`,
    # calculate expected mean for Occ3 condition NoSub (add main effect of Occ3)
    Occ3_NoSub = b_Intercept + b_Occ3,
    # calculate expected mean for Occ3 condition FL (add main effects and interaction term)
    Occ3_FL = b_Intercept + b_Occ3 + b_FL + `b_FL:Occ3`,
    # calculate expected mean for Occ3 condition MT (add main effects and interaction term)
    Occ3_MT = b_Intercept + b_Occ3 + b_MT + `b_MT:Occ3`
  )
Now that we have calculated these estimated means, we can use these columns to create plots.
Code
library(ggplot2)
library(ggdist)
library(tidyverse) # for select(), pivot_longer(), mutate() and case_when()

posterior_M3 %>%
  # First: select only the relevant columns of our posterior draws object
  select(Occ1_NoSub, Occ1_FL, Occ1_MT,
         Occ2_NoSub, Occ2_FL, Occ2_MT,
         Occ3_NoSub, Occ3_FL, Occ3_MT) %>%
  # Second: pivot everything longer to get two columns:
  # a column called `name` containing the name of the parameter
  # a column called `value` containing the value for that parameter in that draw
  pivot_longer(everything()) %>%
  # Now create a new variable that indicates the Occasion, to use as the x-axis,
  # and a new variable that indicates the group (NoSub, MT, or FL) to make groups in the plot
  mutate(
    Occasion = case_when(
      name == "Occ1_NoSub" | name == "Occ1_FL" | name == "Occ1_MT" ~ "Occ1",
      name == "Occ2_NoSub" | name == "Occ2_FL" | name == "Occ2_MT" ~ "Occ2",
      name == "Occ3_NoSub" | name == "Occ3_FL" | name == "Occ3_MT" ~ "Occ3"
    ),
    Group = case_when(
      name == "Occ1_NoSub" | name == "Occ2_NoSub" | name == "Occ3_NoSub" ~ "No subtitles",
      name == "Occ1_FL"    | name == "Occ2_FL"    | name == "Occ3_FL"    ~ "Foreign language",
      name == "Occ1_MT"    | name == "Occ2_MT"    | name == "Occ3_MT"    ~ "Mother tongue"
    )
  ) %>%
  # Time to create the plot
  ggplot(aes(x = Occasion, y = value,
             group = Group,
             fill = Group)) +        # identify the experimental groups by the fill color
  stat_halfeye(.width = c(.5, .89),
               alpha = .4)           # make the fill color of the density plots more transparent
The next challenge:
|“And what do we learn about the progress between Occ1 and Occ2 in each of the groups? And what about the progress between Occ2 and Occ3? Is progress between two occasions different in each of the conditions?”
These questions are analogous to tests on estimated marginal means and post-hoc testing making use of contrasts in the frequentist realm.
We can first answer these questions by visualising the posterior probability distributions of the estimated differences between occasions in each of the conditions. To make these plots we rely again on the draws from our posterior probability distribution. As a starting point we use the posterior_M3 object created above.
Code
# Start by creating difference scores between occasions based on the draws
posterior_M3 <- posterior_M3 %>%
  mutate(
    # Difference between occasions for condition NoSub
    Occ2_Occ1_NoSub = Occ2_NoSub - Occ1_NoSub,
    Occ3_Occ2_NoSub = Occ3_NoSub - Occ2_NoSub,
    # Difference between occasions for condition FL
    Occ2_Occ1_FL = Occ2_FL - Occ1_FL,
    Occ3_Occ2_FL = Occ3_FL - Occ2_FL,
    # Difference between occasions for condition MT
    Occ2_Occ1_MT = Occ2_MT - Occ1_MT,
    Occ3_Occ2_MT = Occ3_MT - Occ2_MT
  )

# Create the visualisation
posterior_M3 %>%
  # select only the relevant columns for creating the plot
  select(Occ2_Occ1_NoSub, Occ3_Occ2_NoSub,
         Occ2_Occ1_FL, Occ3_Occ2_FL,
         Occ2_Occ1_MT, Occ3_Occ2_MT) %>%
  # do the pivot_longer to create one vector of values and a vector of parameter names
  pivot_longer(everything()) %>%
  # Create variables that help identify groups of parameters to color the distributions
  mutate(
    Occasion = case_when(
      name == "Occ2_Occ1_NoSub" | name == "Occ2_Occ1_FL" | name == "Occ2_Occ1_MT" ~ "Progress Occ1 Occ2",
      name == "Occ3_Occ2_NoSub" | name == "Occ3_Occ2_FL" | name == "Occ3_Occ2_MT" ~ "Progress Occ2 Occ3"
    ),
    Group = case_when(
      name == "Occ2_Occ1_NoSub" | name == "Occ3_Occ2_NoSub" ~ "No subtitles",
      name == "Occ2_Occ1_FL"    | name == "Occ3_Occ2_FL"    ~ "Foreign language",
      name == "Occ2_Occ1_MT"    | name == "Occ3_Occ2_MT"    ~ "Mother tongue"
    )
  ) %>%
  ggplot(aes(x = value, y = Group)) +
  # Let's plot interval visualisations
  stat_pointinterval(.width = c(.5, .89)) +
  scale_y_discrete(name = "") +          # Drop the label of the y-scale
  scale_x_continuous(name = "Fluency") + # Name the x-scale
  facet_wrap(. ~ Occasion)
From this visualisation we learn that the progress between occasion 1 and 2 is rather similar in both subtitle groups (the intervals strongly overlap). The students in the condition without subtitles make smaller progress between occasion 1 and occasion 2 than the students in both subtitling conditions.
Focussing on the progress between occasion 2 and 3, we see that the progress of the students in the foreign language subtitling condition stands out from the progress made in the two other conditions.
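Is the progress between occasion 1 and 2 different in the two subtitling conditions? We can quantify this with the hypothesis() function, applied directly to our data frame of posterior draws. A sketch of the call that produces the output below (the exact hypothesis string is reconstructed from the truncated output):

Code

hypothesis(posterior_M3,
           "(Occ2_Occ1_MT) - (Occ2_Occ1_FL) > 0")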
Hypothesis Tests for class :
                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Occ2_Occ1_MT)-(O... > 0      0.98       1.8    -1.98     3.96        2.4      0.71
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
If we wrap this hypothesis() part within a plot() function, we get a visualization of the posterior probability distribution of the difference in progress for the two subtitling conditions.
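A sketch, wrapping the same call as above:

Code

plot(hypothesis(posterior_M3,
                "(Occ2_Occ1_MT) - (Occ2_Occ1_FL) > 0"))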
This visualization shows that a great amount of the credible estimates of this difference in progress between both subtitling conditions is situated around zero, with substantial probability for both a negative and a positive difference. In other words, the evidence does not allow us to state with confidence whether the two subtitling conditions differ in the fluency progress they induced between occasion 1 and occasion 2.
Now, let’s focus on the difference between occasion 2 and occasion 3: how much evidence is there that this progress is stronger in the condition of foreign language subtitling compared to the condition of mother tongue subtitling?
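Again a sketch with hypothesis(), producing the output below (the hypothesis string is reconstructed from the truncated output):

Code

hypothesis(posterior_M3,
           "(Occ3_Occ2_FL) - (Occ3_Occ2_MT) > 0")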
Hypothesis Tests for class :
                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Occ3_Occ2_FL)-(O... > 0      9.26      1.77     6.31    12.15        Inf         1    *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Here we can see that we have a lot of supporting evidence to conclude that the improvement between occasion 2 and 3 in fluency is stronger for the students in the foreign language condition than for the students in the mother tongue condition.
Frumuselu, Anca Daniela, Sven De Maeyer, Vincent Donche, and María del Mar Gutiérrez Colon Plana. 2015. "Television Series Inside the EFL Classroom: Bridging the Gap Between Teaching and Learning Informal Language Through Subtitles." Linguistics and Education 32 (December): 107–17. https://doi.org/10.1016/j.linged.2015.10.001.