This dataset is simulated and based on the study by Frumuselu et al. (2015). The key question in that study was whether subtitles help in foreign language acquisition. Spanish students (n = 36) watched episodes of the popular TV show “Friends” for half an hour each week for 26 weeks. The students were assigned to one of three conditions:
English subtitled (condition “FL”)
Spanish subtitled (condition “MT”)
No subtitles (condition “NoSub”)
On three occasions the students took a fluency test:
Before the 26 weeks started
After 12 weeks
After the experiment
The dependent variable is a measure based on the number of words used in a scripted spontaneous interview with a test taker. The data is structured as follows:
Visualizing the dataset gives a first impression of the effect of condition. In this exercise, your task is to carry out the proper Bayesian modelling and interpretation.
1. Task 1
First, we build four alternative mixed-effects models. Each model has to take into account that we have multiple observations per student, so every model needs a random effect for student.
M0: only an intercept and a random effect for student
M1: M0 + fixed effect of occasion
M2: M1 + fixed effect of condition
M3: M2 + interaction effect between occasion and condition
Use the default brms priors.
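The four model specifications above could be sketched in brms as follows. The column names `Fluency`, `Occasion`, `Condition` and `Student` are assumptions (the text does not give the variable names), and `fit_models()` is a hypothetical helper; adapt both to your data.

```r
# Assumed column names (not given in the text):
# Fluency (outcome), Occasion, Condition, Student (grouping variable).
model_formulas <- list(
  M0 = Fluency ~ 1 + (1 | Student),
  M1 = Fluency ~ 1 + Occasion + (1 | Student),
  M2 = Fluency ~ 1 + Occasion + Condition + (1 | Student),
  M3 = Fluency ~ 1 + Occasion * Condition + (1 | Student)
)

# Hypothetical helper: fit all four models with the brms default priors
# (no `prior` argument is passed). `data` is assumed to be in long format,
# one row per student per occasion. Extra arguments (chains, cores, ...)
# are forwarded to brms::brm().
fit_models <- function(data, seed = 1975, ...) {
  lapply(model_formulas, function(f) {
    brms::brm(formula = f, data = data, seed = seed, ...)
  })
}
```

A call such as `fits <- fit_models(my_data, cores = 4)` would return a named list of four fitted models, where `my_data` stands for your own data frame.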
Once the models are estimated, compare their fit using leave-one-out cross-validation. Determine which model fits best; that model will be used to build our inferences.
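One way to run this comparison is sketched below. It assumes the fitted models are stored in a named list of brmsfit objects; `compare_with_loo()` and `best_model()` are hypothetical helper names.

```r
# `fits` is assumed to be a named list of brmsfit objects,
# e.g. list(M0 = ..., M1 = ..., M2 = ..., M3 = ...).
compare_with_loo <- function(fits) {
  loos <- lapply(fits, brms::loo)  # one "loo" object per model
  loo::loo_compare(loos)           # rows sorted from highest to lowest elpd
}

# The first row of the comparison table is the model with the highest
# expected log predictive density (its elpd_diff is 0).
best_model <- function(comparison) {
  rownames(comparison)[1]
}
```

When reading the table, it helps to set each `elpd_diff` against its `se_diff`: a difference that is small relative to its standard error is weak evidence for preferring the more complex model.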
2. Task 2
Now that we have established the best-fitting model, our task is to examine it critically before delving into the interpretation of the results.
Apply the different steps of the WAMBS checklist template to the final model.
Subtask 2.1: what about the priors?
What are the default brms priors? Do they make sense? Do they generate impossible datasets? If necessary, specify your own (weakly informative) priors and approach them critically as well.
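A minimal sketch of this prior check follows. The prior scales are placeholders that must be adapted to the scale of the fluency measure, and all helper names are hypothetical. To my knowledge, brms uses flat priors for the regression coefficients by default, and sampling from the prior alone requires proper priors, so the prior predictive step is shown with custom priors.

```r
# List the default priors brms would use for a model (nothing is fitted).
default_priors <- function(formula, data) {
  brms::get_prior(formula, data = data)
}

# Hypothetical weakly informative priors. The scales are placeholders and
# must be adapted to the actual scale of the outcome.
weak_priors <- function(b_scale = 10, sd_scale = 10) {
  c(
    brms::set_prior(sprintf("normal(0, %s)", b_scale), class = "b"),
    brms::set_prior(sprintf("student_t(3, 0, %s)", sd_scale), class = "sd")
  )
}

# Prior predictive simulation: draw from the priors only, ignoring the
# likelihood, so that simulated datasets reflect the priors alone.
prior_only_fit <- function(formula, data, priors, ...) {
  brms::brm(formula, data = data, prior = priors,
            sample_prior = "only", ...)
}
```

Passing the result of `prior_only_fit()` to `brms::pp_check(..., ndraws = 100)` overlays datasets simulated from the priors on the observed data, which shows whether the priors generate impossible values such as strongly negative word counts.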
Subtask 2.2: did the model converge properly?
Perform different checks on the convergence of the model.
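The convergence checks could look like the sketch below, which combines a numerical summary with trace plots. The thresholds (R-hat above 1.01, effective sample size below 400) are commonly used rules of thumb rather than values taken from this exercise, and the helper names are hypothetical.

```r
# R-hat and effective sample sizes for every parameter of a brmsfit.
convergence_table <- function(fit) {
  draws <- brms::as_draws_array(fit)  # keeps the chain structure
  posterior::summarise_draws(draws, "rhat", "ess_bulk", "ess_tail")
}

# Names of the parameters that miss the (assumed) thresholds.
# `tab` needs the columns variable, rhat, ess_bulk and ess_tail.
flag_convergence <- function(tab, rhat_max = 1.01, ess_min = 400) {
  bad <- which(tab$rhat > rhat_max |
                 tab$ess_bulk < ess_min |
                 tab$ess_tail < ess_min)
  tab$variable[bad]
}

# Visual check: the chains should overlap and show no trends.
trace_plots <- function(fit) {
  brms::mcmc_plot(fit, type = "trace")
}
```

An empty result from `flag_convergence(convergence_table(fit))` together with well-mixed trace plots suggests that the chains have converged.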
Subtask 2.3: does the posterior distribution histogram have enough information?
Check if the posterior distribution histograms of the different parameters are informative enough to substantiate our inferences.
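A short sketch for this step is given below; the helper names are hypothetical. With the brms defaults (4 chains, 2000 iterations, half of them warmup) each histogram rests on 4000 draws. If the histograms look ragged or show gaps, refitting with more iterations is the usual remedy.

```r
# Number of post-warmup draws behind each histogram
# (the defaults mirror the brms defaults).
n_post_warmup <- function(chains = 4, iter = 2000, warmup = iter / 2) {
  chains * (iter - warmup)
}

# Histograms of the posterior draws. By default only the regression
# coefficients (names starting with "b_") are shown.
posterior_histograms <- function(fit, pattern = "^b_") {
  brms::mcmc_plot(fit, type = "hist", variable = pattern, regex = TRUE)
}
```

Calling `posterior_histograms(fit, pattern = "^sd_|^sigma")` would show the variance components instead.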
Subtask 2.4: how well does the model predict the observed data?
Perform posterior predictive checks based on the model.
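The posterior predictive checks could be sketched as follows. The grouping variable `Condition` is an assumed column name, and `ppc_plots()` and `ppc_pvalue()` are hypothetical helpers.

```r
# Graphical checks: overall density overlay, and the mean per condition.
ppc_plots <- function(fit, ndraws = 100) {
  list(
    density = brms::pp_check(fit, type = "dens_overlay", ndraws = ndraws),
    by_condition = brms::pp_check(fit, type = "stat_grouped", stat = "mean",
                                  group = "Condition", ndraws = ndraws)
  )
}

# Posterior predictive p-value for a test statistic: the share of
# replicated datasets whose statistic is at least as large as observed.
# `yrep` is a draws-by-observations matrix, e.g. from
# brms::posterior_predict(fit); `y` is the observed outcome vector.
ppc_pvalue <- function(y, yrep, stat = mean) {
  mean(apply(yrep, 1, stat) >= stat(y))
}
```

Values of `ppc_pvalue()` close to 0 or 1 indicate that the model fails to reproduce that aspect of the data, for example `ppc_pvalue(y, yrep, stat = max)` for the largest observed score.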
Subtask 2.5: what about prior sensitivity of the results?
Finally, we have to check that the results of our model do not depend too heavily on the priors we specified.
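One simple way to do this check is to refit the final model under alternative priors and look at how far the estimates move. The sketch below assumes a named list of alternative prior specifications; the percentage shift is just one possible summary, and the helper names are hypothetical.

```r
# Refit the same model under each prior specification in `prior_list`
# (a named list of brmsprior objects; its content is up to you).
fit_prior_variants <- function(formula, data, prior_list, ...) {
  lapply(prior_list, function(p) {
    brms::brm(formula, data = data, prior = p, ...)
  })
}

# Posterior means of the fixed effects of one fitted model.
fixed_estimates <- function(fit) {
  brms::fixef(fit)[, "Estimate"]
}

# Percentage shift of the alternative estimates relative to the reference.
# Unstable when a reference estimate is close to zero.
pct_change <- function(reference, alternative) {
  100 * (alternative - reference) / reference
}
```

Small shifts across reasonable alternative priors suggest that the conclusions are driven by the data rather than by the priors.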
3. Task 3
Now a more general task. Make different visualizations of the model results.
One possible visualization is a rather complex one. Remember, there are 3 conditions and 3 occasions. What I would like to see is a plot showing the expected means for each condition on each of the 3 occasions.
And what do we learn about the progress between Occ1 and Occ2 in each of the groups? And what about the progress between Occ2 and Occ3? Does progress between two occasions differ across conditions?
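The expected means and the progress between occasions can be derived from posterior draws, as in the sketch below. It assumes the columns are named `Occasion` and `Condition` and that the occasion levels are labelled `Occ1`, `Occ2` and `Occ3`; all helper names are hypothetical.

```r
# All nine condition-by-occasion cells (labels assumed to match the data).
condition_grid <- function() {
  expand.grid(Occasion = c("Occ1", "Occ2", "Occ3"),
              Condition = c("FL", "MT", "NoSub"),
              stringsAsFactors = FALSE)
}

# Draws of the expected mean per cell (draws x cells matrix).
# re_formula = NA leaves out the student-level random effects.
expected_means <- function(fit, grid = condition_grid()) {
  brms::posterior_epred(fit, newdata = grid, re_formula = NA)
}

# Draws of the progress from occasion `from` to occasion `to`,
# one column per condition.
progress_draws <- function(epred, grid, from, to) {
  conds <- unique(grid$Condition)
  sapply(conds, function(cnd) {
    epred[, grid$Condition == cnd & grid$Occasion == to] -
      epred[, grid$Condition == cnd & grid$Occasion == from]
  })
}

# Median and 95% interval for each column of a draws matrix.
summarise_columns <- function(x) {
  t(apply(x, 2, quantile, probs = c(0.025, 0.5, 0.975)))
}
```

For the plot itself, `brms::conditional_effects(fit, effects = "Occasion:Condition")` shows the expected mean for each condition on each occasion. To see whether progress differs between conditions, subtract two columns of `progress_draws()` from each other and summarise the result.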
Frumuselu, Anca Daniela, Sven De Maeyer, Vincent Donche, and María del Mar Gutiérrez Colon Plana. 2015. “Television Series Inside the EFL Classroom: Bridging the Gap Between Teaching and Learning Informal Language Through Subtitles.” Linguistics and Education 32 (December): 107–17. https://doi.org/10.1016/j.linged.2015.10.001.