discovr: Comparing several means (GLM 1)
Overview
Usage: This tutorial accompanies Discovering Statistics Using R and RStudio (Field 2023) by Andy Field. It contains material from the book so there are some copyright considerations but I offer them under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Tl;dr: you can use this tutorial for teaching and non-profit activities but please don't meddle with it or claim it as your own work.
Welcome to the discovr
space pirate academy
Hi, welcome to discovr space pirate academy. Well done on embarking on this brave mission to planet s, which is a bit like Mars, but a less red and more hostile environment. That's right, more hostile than a planet without water. Fear not though, the fact you are here means that you can master , and before you know it you'll be as brilliant as our pirate leader Mae Jemstone (she's the badass with the gun). I am the space cat-det, and I will pop up to offer you tips along your journey.
On your way you will face many challenges, but follow Mae's system to keep yourself on track:
- This icon flags materials for teleporters. That's what we like to call the new cat-dets, you know, the ones who have just teleported into the academy. This material is the core knowledge that everyone arriving at space academy must learn and practice. For accessibility, these sections will also be labelled with (1).
- Once you have been at space pirate academy for a while, you get your own funky visor. It has various modes. My favourite is the one that allows you to see everything as a large plate of tuna. More important, sections marked for cat-dets with visors goes beyond the core material but is still important and should be studied by all cat-dets. However, try not to be disheartened if you find it difficult. For accessibility, these sections will also be labelled with (2).
- Those almost as brilliant as Mae (because no-one is quite as brilliant as her) get their own space suits so that they can go on space pirate adventures. They get to shout RRRRRR really loudly too. Actually, everyone here gets to should RRRRRR really loudly. Try it now. Go on. It feels good. Anyway, this material is the most advanced and you can consider it optional unless you are a postgraduate cat-det. For accessibility, these sections will also be labelled with (3).
It's not just me that's here to help though, you will meet other characters along the way:
- aliens love dropping down onto the planet and probing humanoids. Unfortunately you'll find them probing you quite a lot with little coding challenges. Helps is at hand though.
- bend-R is our coding robot. She will help you to try out bits of by writing the code for you before you encounter each coding challenge.
- we also have our friendly alien bugs that will, erm, help you to avoid bugs in your code by highlighting common mistakes that even Mae Jemstone sometimes makes (but don't tell her I said that or my tuna supply will end).
Also, use hints and solutions to guide you through the exercises (Figure 1).

By for now and good luck - you'll be amazing!
Workflow
Before attempting this tutorial it's a good idea to work through this tutorial on how to install, set up and work within and
.
The tutorials are self-contained (you practice code in code boxes). However, so you get practice at working in
I strongly recommend that you create an Quarto document within an
project and practice everything you do in the tutorial in the Quarto document, make notes on things that confused you or that you want to remember, and save it. Within this Quarto document you will need to load the relevant packages and data.
Packages
This tutorial uses the following packages:
BayesFactor
(Morey and Rouder 2022)broom
(Robinson, Hayes, and Couch 2022)effectsize
(Ben-Shachar, Lüdecke, and Makowski 2020; Ben-Shachar et al. 2022)emmeans
(Lenth 2022) is used bymodelbased
here
(Müller 2020)ggfortify
(Horikoshi and Tang 2022; Tang, Horikoshi, and Li 2016)Hmisc
(Harrell 2022) is loaded byggplot2
knitr
(Xie 2022)modelbased
(Makowski et al. 2022)parameters
(Lüdecke et al. 2020; Lüdecke et al. 2022)sandwich
(Zeileis and Lumley 2022; Zeileis 2006) is automatically loaded byparameters
It also uses these tidyverse
packages (Wickham 2022; Wickham et al. 2019): dplyr
(Wickham et al. 2022), forcats
(Wickham 2021), ggplot2
(Wickham 2016), and readr
(Wickham, Hester, and Bryan 2022), `.
Coding style
There are (broadly) two styles of coding:
Verbose: Using this style you declare the package when using a function:
package::function()
. For example, if I want to use themutate()
function from the packagedplyr
, I will typedplyr::mutate()
. If you adopt verbose style, you don't need to load packages at the start of your Quarto document (although see below for some exceptions).Concise: Using this style you load all of the packages at the start of your Quarto document using
library(package_name)
, and then refer to functions without their package. For example, if I want to use themutate()
function from the packagedplyr
, I will uselibrary(dplyr)
in my first code chunk and type the function asmutate()
when I use it subsequently.
Coding style is a personal choice. The Google style guide and tidyverse style guide recommend a verbose style, and I use it in teaching materials for two reasons (1) it helps you to remember which functions come from which packages, and (2) it prevents clashes resulting from using functions from different packages that have the same name. However, even with this style it makes sense to load tidyverse
because the dplyr
and ggplot2
packages contain functions that are often used within other functions and in these cases the verbose style is difficult to read. Also, no-one wants to write ggplot2::
before every function from ggplot2
.
You can use either style in this tutorial because all packages are pre-loaded. If working outside of the tutorial, load the tidyverse
package (and any others if you're using a concise style) at the beginning of your Quarto document:
library(tidyverse)
Data
To work outside of this tutorial you need to download the following data files:
Set up an project in the way that I recommend in this tutorial, and save the data files to the folder within your project called data. Place this code in the first code chunk in your Quarto document:
puppy_tib <- here::here("data/puppies.csv") |>
readr::read_csv() |>
dplyr::mutate(
dose = forcats::as_factor(dose)
)
This code reads in the data and converts the variable dose to a factor (categorical variable).
A puppy-tastic example (1)
The main example in this tutorial is about puppies. Here's my dog, Milton, when he was a puppy:

If that doesn't make you feel better about this tutorial then nothing will because lots of people believe that puppy therapy is good for stress, and that's what this example is all about. Puppy therapy is a form of animal-assisted therapy, in which puppy contact is introduced into the therapeutic process. Despite a common belief that puppy therapy is effective in reducing stress, the evidence base is pretty mixed. Imagine we ran a study in which we randomized people into three groups: (1) a control group in which people had no puppy contact; (2) 15 minutes of puppy therapy (a low-dose group); and (3) 30 minutes of puppy contact (a high-dose group). The dependent variable was a measure of happiness ranging from 0 (as unhappy as I can possibly imagine being) to 10 (as happy as I can possibly imagine being).
The design of this study mimics a very simple randomized controlled trial (as used in pharmacological, medical and psychological intervention trials) because people are randomized into a control group or groups containing the active intervention (in this case puppies, but in other cases a drug or a surgical procedure). We'd predict that any form of puppy therapy should be better than the control (i.e. higher happiness scores) but also formulate a dose-response hypothesis that as exposure time increases (from 0 minutes to 15 and 30) happiness will increase too.
Alien coding challenge
View the data in puppy_tib.
puppy_tib
Note that there are three variables: the participant's id, which is a character variable (note the
The variable dose is a factor (categorical variable), so having read the data file and converted it to a factor it's a good idea to check that the levels of dose are in the order that we want: Control, 15 minutes, 30 minutes.
Code example
To check the levels of a factor we use the levels()
function. For example, to check the levels of the variable dose within the tibble puppy_tib we'd execute
levels(puppy_tib$dose)
Because I have set up the data within this tutorial you should see that the levels are listed in the order that we want them when you execute the code. If they were in the wrong order, we could use fct_relevel
to change the order:
puppy_tib <- puppy_tib |>
dplyr::mutate(
dose = forcats::fct_relevel(dose, "No puppies", "15 mins", "30 mins")
)
The model (1)
We can include categorical predictors using dummy coding (there are other forms of coding too, we'll look at one of those later, contrast coding, later).
Group | Dummy 1 (long) | Dummy 2 (short) |
---|---|---|
No puppies (control) | 0 | 0 |
15 minutes | 0 | 1 |
30 minutes | 1 | 0 |
Table 1 shows dummy coding for this example. Every person has their group membership coded by a unique combination of 0s and 1s across the two dummy variables. Someone in the no puppy condition would have a 0 for both dummy variables, someone who had 15 minutes of therapy would have a 0 on the dummy variable labelled long and a 1 on the dummy variable labelled short, and someone who received 30-minutes of puppy therapy would have a 1 on the dummy variable labelled long and a 0 on the dummy variable labelled short. The two dummy variables represent the length of the dose of puppy therapy: one compares the 30-minute dose against no puppies (long) and the other compares the 15-minute does against no puppies (short).
As revision from the chapter, the model we're fitting is:
\[ \text{happiness}_i = \hat{b}_0 + \hat{b}_1\text{long}_i+ \hat{b}_2\text{short}_i + e_i \] in which a persons happiness is predicted from knowing their group code (i.e., the numeric code for the long and short dummy variables in Table 1) and the estimates of the parameters for these effects and the intercept (\(b_0\)).
Exploring the data (1)
Alien coding challenge
Use what you already know to create a violin plot with error bars with dose on the x-axis and happiness scores on the y-axis.
# Start by initializing the plot replace the placeholders with the name of the tibble and variable names):
ggplot2::ggplot(tibble_name, aes(var_for_x_axis, var_for_y_axis))
# Now add the violin geom
# Add the violin geom:
ggplot2::ggplot(puppy_tib, aes(dose, happiness)) +
geom_violin()
# Use stat_summary() to add the error bars
# Use stat_summary() to add the error bars (it's fine to use mean_cl_normal):
ggplot2::ggplot(puppy_tib, aes(dose, happiness)) +
geom_violin() +
stat_summary(fun.data = "mean_cl_boot")
# Add axis labels, then sensible breaks for the y-axis and, finally, a minimal theme
# Solution
ggplot2::ggplot(puppy_tib, aes(dose, happiness)) +
geom_violin() +
stat_summary(fun.data = "mean_cl_boot") +
labs(x = "Dose of puppies", y = "Happiness (0-10)") +
scale_y_continuous(breaks = 1:7) +
theme_minimal()
Alien coding challenge
Use what you already know to compute the mean and a 95% confidence interval of happiness scores split by the therapy group to which a person belonged.
# Start by piping the tibble into the group_by function to group output by dose:
puppy_tib |>
dplyr::group_by(dose)
# Now pipe the results into the summarize() function
# Pipe the results into the summarize() function
puppy_tib |>
dplyr::group_by(dose) |>
dplyr::summarize()
# Within summarize(), use the mean() function to create a variable that is the mean happiness score
# Use the mean() function to create a variable that is the mean happiness score:
puppy_tib |>
dplyr::group_by(dose) |>
dplyr::summarize(
mean = mean(happiness, na.rm = TRUE)
)
# Add two more rows that use mean_cl_normal to calculate the lower and upper boundary of the 95% confidence interval
# Solution
puppy_tib |>
dplyr::group_by(dose) |>
dplyr::summarize(
mean = mean(happiness, na.rm = TRUE),
`95% CI lower` = mean_cl_normal(happiness)$ymin,
`95% CI upper` = mean_cl_normal(happiness)$ymax
) |>
knitr::kable(digits = 3) # to round values
Fit the model (1)
Dummy coding in (1)
automatically dummy codes categorical predictors. If the categorical predictor is a factor, then it uses the first level of the variable as its reference category. For the variable dose (which is a factor) the levels are ordered No puppies, 15 minutes and 30 minutes. As such, dummy variable 1 will represent the no puppies group vs. the 15-minute group. will label this variable as dose15 mins, which tells us that this dummy variable represents the part of dose that compares the 15-minute group to the no puppies group. Dummy variable 2 will represent the no puppies group vs. the 30-minute group. will label this variable as dose30 mins, which tells us that this dummy variable represents the part of dose that compares the 30-minute group to the no puppies group.
For these data, these default comparisons make sense (we compare everything to the no puppies group) but there will be times when the default creation of dummy variables and the order of your factor levels groups do not combine to make the comparisons that you want. You may need to change the order of factor levels, or specify bespoke contrasts (see later).
Tip: Character variables
You can use character variables (rather than factors) as predictors in linear models and will try to do something sensible with them. Specifically, it orders the categories alphabetically and creates dummy variables using the first category as the reference category. However, it's good practice to convert character variables to factors yourself and specify the order of levels so that you are in control of what goes into the model.
Using lm()
(1)
To fit the model we use the lm()
function, because we are fitting a linear model with a categorical predictor. We've used this function before, just to recap it takes the following general form (I've retained only the key options):
lm(outcome ~ predictor(s), data, subset, na.action = na.fail)
Based on previous tutorials we could do the following 4 steps:
- Create a model called puppy_lm using
lm()
- Summarize the fit of the model using
broom::glance()
- Obtain the model parameters and confidence intervals using
broom::tidy()
- Plot diagnostics using
plot()
This is, basically, what we do except that when we're using the linear model to compare means, we often want an overall test of each predictor. We don't get this with broom::glance()
(we get only the overall fit). Instead we can use the anova()
function to get overall effects of the model and then pipe this into parameters::parameters()
to get an effect size (\(\omega^2\)) for each predictor.
Code example
To get overall (omnibus) tests of categorical predictors we can use the following general code:
anova(my_model) |>
parameters::parameters(omega_squared = "raw")
In which we replace my_model with the name of a model created using lm()
. By default we will get the overall effect size (\(\omega^2\)), but we will see in future tutorials that you can instead get a version of this effect size that adjusts for other predictors known as partial \(\omega^2\), by changing omega_squared = "raw" to omega_squared = "partial".
Alien coding challenge
Using the sample code and what you have already learnt, use the code box below to:
- Create a model called puppy_lm using
lm()
that predicts happiness from dose. - Summarize the overall fit of predictors by adapting the sample code
- Obtain the model parameters and confidence intervals using
tidy()
# Create a model called puppy_lm using lm:
puppy_lm <- lm(xxx ~ xxx, data = xxxx)
# Create a model called puppy_lm using lm:
puppy_lm <- lm(happiness ~ dose, data = puppy_tib)
# Now adapt the sample code to get the summary of the model
# Adapt the sample code to get the summary of the model
anova(puppy_lm) |>
parameters::parameters(omega_squared = "raw")
# Obtain the model parameters and confidence intervals using `tidy()`
# Solution
puppy_lm <- lm(happiness ~ dose, data = puppy_tib, na.action = na.exclude)
anova(puppy_lm) |>
parameters::parameters(omega_squared = "raw") |>
knitr::kable(digits = 3) # to round values
broom::tidy(puppy_lm, conf.int = TRUE) |>
knitr::kable(digits = 3) # to round values
The test of whether the group means are the same is represented by the F-statistic for the effect of dose, which is significant, F(2, 12) = 5.12, p = 0.025. Given that our model represents the group means, this F tells us that using group means to predict happiness scores is significantly better than using the mean of all scores: in other words, the group means are significantly different
Moving onto the parameter estimates, \(\hat{b}_0\) (the value in the column labelled estimate and the row labelled (intercept)) is equal to the mean of the baseline category (the no puppies group), \(\hat{b}_0\) = 2.20 (check your earlier summary statistics!). The b-value for the first dummy variable (labelled dose15 mins) is equal to the difference between the means of the 15-minute group and the control group (3.20 \(−\) 2.20 = 1.00).
The b-value for the second dummy variable (labelled dose30 mins) is equal to the difference between the means of the 30-minute group and the control group (5.00 \(−\) 2.20 = 2.80). These values demonstrate how dummy coding partitions the variance in happiness scores to compare specific group means. We can see from the significance values of the associated t-tests that the difference between the 30-minute group and the control group is significant because p = 0.008, which is less than 0.05; however, the difference between the 15-minute and the control group is not (p = 0.282).
Report
Overall, happiness was significantly different across the three therapy groups, F(2, 12) = 5.12, p = 0.025. Happiness was significantly different to zero in the no puppies group, \(\hat{b}\) = 2.20 [0.83, 3.57], t = 3.51, p = 0.004, was not significantly higher in the 15-minute therapy group compared to the no puppy control, \(\hat{b}\) = 1.00 [-0.93, 2.93], t = 1.13, p = 0.282, but was significantly higher the 15-minute therapy group compared to the no puppy control, \(\hat{b}\) = 2.80 [0.87, 4.73], t = 3.16, p = 0.008. A 30-minutes dose of puppies, therefore, appears to improve happiness compared to no puppies but a 15-minutes does does not.
Diagnostic plots (1)
As with any linear model, we can use the plot()
function to produce diagnostic plots from the model. We have used this function in previous tutorials. Remember that it takes this general form:
plot(my_model, which = numbers_of_the_plots_you_want)
You place the name of your model into the function and use which to request specific plots. We use plots 1 and 3 to explore linearity and homoscedasticity, plot 2 is a Q-Q plot to look for normality in residuals, and plot 4 shows each case's Cook's distance so is useful for identifying influential cases. Therefore, a reasonable set of plots is to look at plots 1, 3, 2, and 4 from the plot()
function.
Alien coding challenge
Obtain plots 1, 3, 2 and 4 (in that order) for the model puppy_lm.
plot(puppy_lm, which = c(1, 3, 2, 4))
# or to get a nicely formatted plots
# library(ggfortify) # outside of this tutorial you'll need this
ggplot2::autoplot(puppy_lm,
which = c(1, 3, 2, 4),
colour = "#5c97bf",
smooth.colour = "#ef4836",
alpha = 0.5,
size = 1) +
theme_minimal()
The Residual vs. fitted and Scale-location plots show points that are equally spread for the three groups, which implies that residual variances are similar across groups. The dots look odd because they are aligned in columns, but this reflects the fact there are only 3 possible values of the predictor variable (no puppies, 15 minutes and 30 minutes). The Q-Q plot tells us about the normality of residuals in the model. We want our residuals to be normally distributed which means that the dots on the graph should cling lovingly to the diagonal line. Ours look like they have had a bit of an argument with the diagonal line, which suggests that we may not be able to assume normality of errors and should perhaps use a robust model (which will be explained sooner than you might like). The plot of Cooks distances shows that they are all well below 1, so no influential cases to worry about.
Contrast coding (2)
You might be quite bored by now. At the beginning I tried to motivate you with a photo of my dog, Milton. I don't know whether it worked, but I'm going to ramp up the cute factor with a video of the first time I met Milton because you can never have too many puppies.
Now were all in a state of peaceful, inner, puppy calm and tranquillity, let's destroy it with contrast coding.
Remember that we predicted that any form of puppy therapy should be better (i.e. higher happiness scores) than the no puppies condition and that as exposure time increases happiness will increase too (a dose-response hypothesis). In the chapter we discovered that we can operationalize these hypotheses as two dummy variables using the contrast coding in Table 2.
Group | Dummy 1 (No puppies vs. puppies) | Dummy 2 (15 mins vs 30 mins) |
---|---|---|
No puppies (control) | -2/3 | 0 |
15 minutes | 1/3 | -1/2 |
30 minutes | 1/3 | 1/2 |
As revision from the chapter, when using this coding scheme the model we're fitting is:
\[ \text{happiness}_i = \hat{b}_0 + \hat{b}_1\text{contrast 1}_i+ \hat{b}_2\text{contrast 2}_i + e_i \] In which the variables contrast 1 and contrast 2 are the dummy variables that represents the no puppies group compared to all other groups (contrast 1), and the difference between the 15-minute group and the 30-minute group (contrast 2).
Code example
Use the contrast()
function to set the contrast attribute of a variable. It takes the general form:
contrasts(predictor_variable) <- contrast_instructions
The contrast_instructions can be a set of weights for the contrasts that you want to do, or a built in contrast. To set the contrasts in Table 2, we need to tell what weights to assign to each group. The weights we want to assign for contrast 1 are \(−2/3\) (no puppies), \(1/3\) (15-minute group) and \(1/3\) (30-minute group). We can collect these values into an object in the usual way. Lets call the object puppy_vs_none to remind us what it represents:
puppy_vs_none <- c(-2/3, 1/3, 1/3)
The object puppy_vs_none indicates that the first group has a weight of \(-2/3\), and the second and third groups a weight of \(1/3\). The order of the numbers is important because it corresponds to the order of the factor levels for the predictor variable. In the puppy data, remember that the order of levels of dose is no puppies, 15-minutes and 30-minutes. As such, puppy_vs_none contains the weights for no puppies, 15-minutes and 30-minutes, in that order.
We can do the same for the second contrast. We know from Table 2 that the weights for contrast 2 are: 0 (no puppies), −1/2 (15-minute group) and 1/2 (30-minute group). Remembering that the first weight we enter will be for the no puppies group, we enter the value 0 as the first weight, then \(-1/2\) for the 15-minute group and finally \(1/2\) for the 30-minute group. Lets call this object short_vs_long to remind us that it compares the means of the shorter puppy therapy session against the mean of the longer one:
short_vs_long <- c(0, -1/2, 1/2)
Having created these variables we need to bind them together as columns using cbind()
, and set them as the contrast attached to our predictor variable, dose:
contrasts(puppy_tib$dose) <- cbind(puppy_vs_none, short_vs_long)
This command sets the contrast property of dose to contain the weights for the two contrasts that we just created. To make sure we've done everything correctly we can check the contrasts for dose variable by executing:
contrasts(puppy_tib$dose)
The weights have been converted to decimals, but you can see that they are set up as they should be (it might help to look back at Table 2).
Alien coding challenge
Use the code from the example to set the contrasts in Table 2 for dose.
# Set the weights for the first contrast using the example code:
puppy_vs_none <- c(-2/3, 1/3, 1/3)
# Now set the weights for the second contrast
# Set the weights for the second contrast:
short_vs_long <- c(0, -1/2, 1/2)
# Next, set the two contrasts to the dose variable
# Set the two contrasts to the dose variable:
contrasts(puppy_tib$dose) <- cbind(puppy_vs_none, short_vs_long)
# Finally, view the weights to check they are set correctly
# Put it all together:
puppy_vs_none <- c(-2/3, 1/3, 1/3)
short_vs_long <- c(0, -1/2, 1/2)
contrasts(puppy_tib$dose) <- cbind(puppy_vs_none, short_vs_long)
contrasts(puppy_tib$dose) # This line prints the contrast weights so we can check them
Tip: Built in contrasts
has several functions for setting built-in contrasts. In each case n is the number of factor levels/group. For the puppy data \(n = 3\).
contr.treatment(n, base = x)
: Each category is compared to a user-defined baseline category, x. For the puppy example, to compare each group to the 30-minute group we'd use
contr.treatment(3, base = 3)
contr.SAS(n)
: Each category is compared to the last category.contr.helmert(n)
Each category (except the first) is compared to the mean effect of all previous categories. For the puppy data this would give give us a first contrast that compares the second category (15-minutes) to the previous category (no puppies), and a second contrast that compares the third category (30-minutes) to the combined effect of 15-minutes and no puppies (the previous categories).
You set these contrasts in the usual way. For example, to set a Helmert contrast for dose we'd execute:
contrasts(puppy_tib$dose) <- contr.helmert(3)
Alien coding challenge
Having set the contrasts for dose we can fit the model using exactly the same code as before. Do this in the code box.
Tip: rounding values in tables
Remember that you can round numeric values in tables by piping the table into kable()
. For example, if we want to round the numbers in the object created by broom::tidy(puppy_lm, conf.int = TRUE)
to three decimal places, we could use:
broom::tidy(puppy_lm, conf.int = TRUE) |>
knitr::kable(digits = 3)
puppy_lm <- lm(happiness ~ dose, data = puppy_tib)
anova(puppy_lm) |>
parameters::parameters(omega_squared = "raw") |>
knitr::kable(digits = 3)
broom::tidy(puppy_lm, conf.int = TRUE) |>
knitr::kable(digits = 3)
The first table has the test of whether the group means are the same represented by the F-statistic for the effect of dose. This table is identical to when we fitted the model using dummy coding. That's because, overall, the model hasn't changed (we're still predicting happiness from the 3 group means), it's only how we decompose the effect of dose that has changed.
The table of parameter estimates is different to before. Notice that the contrasts are helpfully labelled because we named them when we set them up. The first contrast compares the combined puppy therapy groups to the no puppies control. The difference in the mean happiness for anyone having any duration of puppy therapy compared to those having no puppy therapy was 1.90, and if we assume this sample is one of the 95% that yields confidence intervals containing the population values then this difference could be anything between 0.23 (a fairly small difference given happiness was measured on a 10-point scale) and 3.57 (a fairly large shift along the 10-point scale). The observed difference of 1.90 is statistically significantly different from 0 as shown by the t-test, which has a p = 0.029. This contrast suggests that happiness was significantly higher in those exposed to puppies, than those who were not.
The second contrast shows that the mean happiness across the people having 30-minutes of puppy therapy was 1.80 higher than those having 15 minutes. Again, if we assume this sample is one of the 95% that yields confidence intervals containing the population values then this difference could be anything between -0.13 (people who have 30 minutes of puppy therapy are less happy than those having 15 minutes) and 3.73 (people having 30 minutes of puppy therapy are a fair bit happier than those having 15 minutes). The observed difference of 1.80 is not statistically significantly different from 0 as shown by the t-test, which has a p = 0.065. This contrast suggests that happiness was statistically comparable in those receiving 15- and 30-minutes of puppy therapy.
Report
Overall, happiness was significantly different across the three therapy groups, F(2, 12) = 5.12, p = 0.025. Happiness was significantly different to zero in the no puppies group, \(\hat{b}\) = 3.47 [2.68, 4.26], t = 9.57, p < 0.001. Happiness was significantly higher for those that had any puppy therapy compared to the no puppy control, \(\hat{b}\) = 1.90 [0.23, 3.57], t = 2.47, p = 0.029, but was not significantly different in the 30-minute therapy group compared to the 15-minute group, \(\hat{b}\) = 1.80 [-0.13, 3.73], t = 2.03, p = 0.065. A dose of puppies, therefore, appears to improve happiness compared to no puppies but the duration of therapy did not have a significant impact.
Trend analysis (polynomial contrasts) (2)
Our groups have a meaningful order, so we might have wanted to conduct trend analysis. Had we wanted to do this we first make sure that the levels of dose were in the correct order (no puppies, 15-minutes and 30-minutes) and if not re-ordered them.
Code example
With the factor levels in the correct order we would set the contrast using:
contrasts(puppy_tib$dose) <- contr.poly(3)
The 3 tells contr.poly()
how many groups there are in the predictor variable.
Alien coding challenge
Having set the contrast we we'd recreate the model using the same code as before (the only difference Id suggest is naming the resulting model puppy_trend. Try setting a polynomial contrast and then refitting the model.
contrasts(puppy_tib$dose) <- contr.poly(3)
puppy_trend <- lm(happiness ~ dose, data = puppy_tib)
anova(puppy_trend) |>
parameters::parameters(omega_squared = "raw") |>
knitr::kable(digits = 3) #only necessary to round values
broom::tidy(puppy_trend, conf.int = TRUE) |>
knitr::kable(digits = 3) #only necessary to round values
The resulting Output breaks down the experimental effect to see whether it can explained by either a linear (dose.L) or a quadratic (dose.Q) relationship in the means First, lets look at the linear component. This contrast tests whether the means increase across groups in a linear way. For the linear trend the t-statistic is 1.98 and this value is significant at p = 0.008. Therefore, we can say that as the dose of puppy therapy increased from nothing to 15 minutes to 30 minutes, happiness increased proportionately. The quadratic trend tests whether the pattern of means is curvilinear (i.e., is represented by a curve that has one bend). The error bars on the violin plot, which we created in the Exploring data section, suggest that the means cannot be represented by a curve and the results for the quadratic trend bear this out. The t-statistic for the quadratic trend is non-significant, p = 0.612, which is not very significant at all.
Post hoc tests (2)
Were going to use the estimate_contrasts()
function from the modelbased
package (Makowski et al. 2022), which builds upon the emmeans
package (Lenth 2022) to provide more user-friendly experience.
Code example
To get post hoc tests for a linear model, you place the linear model into the function. In general, then:
modelbased::estimate_contrasts(my_model,
contrast = "predictor_variable",
adjust = "holm",
ci = 0.95)
In which my_model is replaced by the name of your linear model (puppy_lm in our case). Some of the key arguments are:
- contrast: replace "predictor_variable" with the name of the variable that you want post hoc tests for. In this case we'd set contrast = "dose".
- adjust: what adjustment to apply for the number of tests conducted. By default Holm's adjustment is applied, which we discussed earlier. You can change it to "bonferroni", "tukey", "hochberg", "hommel", "BH" (the Benjamini and Hochberg adjustment discussed earlier), "BY", "fdr" or "none" (to apply no adjustment, which I don't recommend).
- ci: Sets the width of the confidence interval. The default of 0.95 for a 95% interval is consistent with common practice.
Therefore, using the defaults (which are sensible) we could get post hoc tests by executing:
modelbased::estimate_contrasts(puppy_lm, contrast = "dose")
# If you prefer a Bonferroni adjustment then execute:
modelbased::estimate_contrasts(puppy_lm, contrast = "dose", adjust = "bonferroni")
Alien coding challenge
Generate the code to get post hoc tests for the puppy example using a Holm post hoc test. Round the values to 3 decimal places.
modelbased::estimate_contrasts(puppy_lm, contrast = "dose") |>
knitr::kable(digits = 3) #to round values
The first row of the output compares the 15-minute and the 30-minute puppy therapy groups and reveals a non-significant difference (p = 0.130 is greater than 0.05). The duration of puppy therapy didn't significantly affect mean happiness. The second row compares the no puppies group to the 15-minute group and also reveals a non-significant difference (p = 0.282 is greater than 0.05). The final row compares the no puppies group to the 30-minute group where there is a significant difference (p = 0.025 is less than 0.05).
The output also shows us the mean difference in happiness between each pair of groups and the confidence interval for that difference. For example, we can see that comparing the no puppies group to the 30 minute group the difference in mean happiness was -2.80. The direction of the value reflects the order of the groups: because no puppies is listed as Level1 and 30 mins as Level2 the minus sign tells us that happiness was lower in the no puppies group. If we assume that this sample is one of the 95% that yields a confidence interval containing the population value, then this difference in mean happiness might be as large in magnitude as 5.27 (a very large effect given happiness is measured on a 10-point scale) or as small as 0.33 (virtually no difference).
Cohen's d (3)
We can use the effectsize
package (Ben-Shachar, Lüdecke, and Makowski 2020; Ben-Shachar et al. 2022), which we met in discovr_09
, to calculate Cohen's d. Our samples are small (n < 20) so we'll actually use Hedge's g (hedges_g()
), which is a bias corrected version of d for small samples. Because we have three groups, and we want to quantify the effect between all pairs of groups we are going to need to filter the data to include only the groups we want to compare before we pass it into hedges_g()
. So, for each effect size we'd adapt this code
my_tib |>
dplyr::filter(predictor == "first_group" | predictor == "second_group") |>
effectsize::hedges_g(outcome ~ predictor,
data = _,
pooled_sd = TRUE,
paired = FALSE)
and we'd replace my_tib with the name of our tibble (in this case puppy_tib), we'd replace predictor with the predictor variable, in this case dose, and we'd replace first_group and second_group with the values of the levels that we want to compare. Finally, note that within hedges_g()
we assign the output of the pipe to the data argument using data = _ and replace outcome with the name of the outcome variable (happiness) and predictor with the predictor variable, in this case dose.
Code example
Let's adapt this code to compare the control group with the 15-minute therapy group. First let's see what the labels are for these groups by executing
levels(puppy_tib$dose)
## [1] "No puppies" "15 mins" "30 mins"
These values give us the text that we need to use as first_group and second_group in the code. This gives us
puppy_tib |>
dplyr::filter(dose == "No puppies" | dose == "15 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _)
Tip
Remember that if the default options of a function are the ones you want to use then you can omit them. In this case, if we want to use the pooled standard deviation then we can omit the default value of pooled_sd = TRUE and because we do not have paired data we can omit paired = FALSE.
Alien coding challenge
Generate the code to get Cohen's d for all pairs of groups in the puppy example.
# Us ethe sample code to compare the control group with the 15-minute therapy group
puppy_tib |>
dplyr::filter(dose == "No puppies" | dose == "15 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _)
# Now adapt the code to compare the control group with the 30-minute therapy group
# Us ethe sample code to compare the control group with the 15-minute therapy group
puppy_tib |>
dplyr::filter(dose == "No puppies" | dose == "15 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _)
# Now adapt the code to compare the control group with the 30-minute therapy group
puppy_tib |>
dplyr::filter(dose == "No puppies" | dose == "30 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _)
# Now adapt the code to compare the 15- and 30-minute therapy groups
# Us ethe sample code to compare the control group with the 15-minute therapy group
puppy_tib |>
dplyr::filter(dose == "No puppies" | dose == "15 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _)
# Now adapt the code to compare the control group with the 30-minute therapy group
puppy_tib |>
dplyr::filter(dose == "No puppies" | dose == "30 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _)
# Now adapt the code to compare the 15- and 30-minute therapy groups
puppy_tib |>
dplyr::filter(dose == "15 mins" | dose == "30 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _)
#round the values (optional)
puppy_tib |>
dplyr::filter(dose == "No puppies" | dose == "15 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _) |>
knitr::kable(digits = 3)
puppy_tib |>
dplyr::filter(dose == "No puppies" | dose == "30 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _)|>
knitr::kable(digits = 3)
puppy_tib |>
dplyr::filter(dose == "15 mins" | dose == "30 mins") |>
effectsize::hedges_g(happiness ~ dose, data = _)|>
knitr::kable(digits = 3)
Report
Participants were significantly more happy after 30-minutes of puppy therapy compared to no puppies, \(M_{\text{difference}}\) = -2.80 [-5.27, -0.33], t = -3.16, p = 0.025, \(\hat{g}\) = -1.74 [-3.11, -0.31]. The effect size was suspiciously large. There was no significant difference in happiness between those exposed for 15-minutes compared to no puppies, \(M_{\text{difference}}\) = -1.00 [-3.47, 1.47], t = -1.13, p = 0.282, \(\hat{g}\) = -0.69 [-1.84, 0.49] although the effect was large. Also, there was no significant difference in happiness between those exposed for 15-minutes compared to 30-minutes, \(M_{\text{difference}}\) = -1.80 [-4.27, 0.67], t = -2.03, p = 0.130, \(\hat{g}\) = -1.12 [-2.34, 0.15] although the difference was greater than a standard deviation.
Robust models (2)
Heteroscedasticity-consistent methods (2)
Welchs F is a robust variant of the F-statistic which can be obtained using the oneway.test()
function which comes as part of .
Code example
The format of this function is the same as lm()
:
oneway.test(outcome ~ predictor, data = my_data)
Alien coding challenge
Adapt the code example to obtain Welch's F for the puppy data.
oneway.test(happiness ~ dose, data = puppy_tib)
Note that the error degrees of freedom have been adjusted—you should remember this when you report the values. For these data, Welchs F(2, 7.94) = 4.23, p = .054, which is just about non-significant. If we were using this test it would imply that the mean happiness did not differ significantly across different puppy therapy groups.
We can get robust parameter estimates using lmRob()
and robust tests of these parameters using model_parameters()
as described discovr_08 and the book chapter.
Alien coding challenge
Remember that we fit the model in the same way but replacing lm()
with robust::lmRob
. Adapt your earlier code for fitting a non-robust model to create and view an object called puppy_rob.
Tip: Contrasts
If you ran the trend analysis earlier on and are working outside of this tutorial, you will need to re-set the contrast for dose to match the planned comparisons by executing:
contrasts(puppy_tib$dose) <- cbind(puppy_vs_none, short_vs_long)
Within the tutorial you don't need to do this (it has been done for you).
puppy_rob <- robust::lmRob(happiness ~ dose, data = puppy_tib)
summary(puppy_rob)
The bottom of the output shows significance tests of bias. These tests suggest that bias in the original model is not problematic (because the p-value for these tests are not significant) but given we have a tiny sample size (5 per group) you can take these tests with a pinch of salt. More important, the robust parameter estimates are identical to the original ones, the standard errors haven't changed much and the profile of significance is the same (compare with your earlier output). In short, the robust estimates and test confirm the findings from the non-robust model.
Alien coding challenge
Remember from discovr_08 that to get a summary of an existing model like puppy_lm that uses heteroscedasticity-consistent standard errors (i.e. robust significance tests and confidence intervals), we put the model into model_parameters()
and set vcov = "HC4". Try this in the code box:
parameters::model_parameters(puppy_lm, vcov = "HC4") |>
knitr::kable(digits = 3)
When we fit the model with heteroscedasticity-consistent standard errors, the significance tests change but we are again left with a profile of results that show that happiness is significantly lower for the no puppy group than both puppy groups combined, but not significantly different after 30-minutes compared to 15.
Methods based on trimmed means (3)
A different approach is to use these functions from the WRS2 package (Mair and Wilcox 2019):
t1way()
: is a test of trimmed meanslincon()
: post hoc tests for trimmed meanst1waybt()
: a variant oft1way
that also uses a bootstrapmcppb20()
: post hoc tests fort1waybt()
Code example
These functions have a lot of arguments in common, so lets look at all of their general forms:
t1way(outcome ~ predictor, data = my_tibble, tr = 0.2, nboot = 100)
lincon(outcome ~ predictor, data = my_tibble, tr = 0.2)
t1waybt(outcome ~ predictor, data = my_tibble, tr = 0.2, nboot = 599)
mcppb20(outcome ~ predictor, data = my_tibble, tr = 0.2, nboot = 599)
All of the functions take the same input as lm()
, that is you specify the predictor and outcome, and the tibble containing the data. They all have an argument tr that is the proportion of trimming to be done and defaults to 20% (0.2 as a proportion). Some of them also have a nboot argument that controls the number of bootstrap samples. The defaults are fine although I usually increase the number of bootstrap samples to 1000.
Tip
You wouldn't normally conduct all of these tests; you'd either do t1way()
with lincon()
or t1waybt()
with mcppb20()
.
Alien coding challenge
Adapt the sample code to fit a robust model of 20% trimmed means with post hoc tests to the puppy data.
WRS2::t1way(happiness ~ dose, data = puppy_tib, nboot = 1000)
WRS2::lincon(happiness ~ dose, data = puppy_tib)
The overall test suggests a non-significant difference between trimmed means across the therapy groups, \(F_t\) (2, 4) = 3, p = .16. The post hoc tests (which we should ignore anyway because the overall effect is not significant) show no significant differences between pairs of trimmed means. The value of psyhat (\(\hat{\psi}\)) and its confidence intervals reflect the difference between trimmed means. Note that the confidence intervals are corrected for the number of tests, but the p-values are not. As such, we should ascertain significance from whether or not the confidence intervals cross zero. In this case they all do, which implies that none of the groups are significantly different. This is different to what we found when we did not trim the means (see the earlier model).
Bayes factors (3)
Like in previous tutorials (discovr_08, discovr_09) we can use the BayesFactor
package (Morey and Rouder 2022). In this scenario we use the anovaBF()
function.
Code example
the anovaBF()
function has basically the same format as most of the other functions in this tutorial:
my_model <- BayesFactor::anovaBF(formula = outcome ~ predictor,
data = my_tib,
rscaleFixed = "medium",
whichModels = "withmain")
The function uses default priors that can be specified as a number or as "medium" (the default), "wide", and "ultrawide". These labels correspond to r scale values of 1/2, \(^\sqrt{2}/_2\), and 1.
Alien coding challenge
Adapt the sample code to obtain a BayesFactor for the puppy data using the default prior.
puppy_bf <- BayesFactor::anovaBF(formula = happiness ~ dose, data = puppy_tib, rscaleFixed = "medium")
puppy_bf
The Bayes factor compares the full model (predicting happiness from dose and the intercept) to the null model (predicting happiness from only the intercept). It, therefore, quantifies the overall effect of dose. The Bayes Factor is 3.07, which means that the data are 3.07 times more likely under the alternative hypothesis (dose of puppy therapy has an effect) than under the null (dose of puppy therapy has no effect). This value is not strong evidence, but nevertheless suggests we should shift our belief about puppy therapy towards it being effective by a factor of about 3.
Transfer task (2)
There goes my hero ... watch him as he goes (to hospital)
Children wearing superhero costumes are more likely to harm themselves because of the unrealistic impression of invincibility that these costumes could create. For example, children have reported to hospital with severe injuries because of trying to initiate flight without having planned for landing strategies (Davies et al. 2007). I can relate to the imagined power that a costume bestows upon you; indeed, I have been known to dress up as Fisher by donning a beard and glasses and trailing a goat around on a lead in the hope that it might make me more knowledgeable about statistics. Imagine we had data (hero_tib) about the severity of injury (on a scale from 0, no injury, to 100, death) for children reporting to the accident and emergency department at hospitals, and information on which superhero costume they were wearing (hero): Spiderman, Superman, the Hulk or a teenage mutant ninja turtle. Fit a model with planned contrasts to test the hypothesis that those wearing costumes of flying superheroes (Superman and Spiderman) have more severe injuries.
Tip
There is a detailed solution to this task at https://www.discovr.rocks/solutions/alex/alex_11/#task-113.
To get you into the mood for hulk-related data analysis, Figure 3 shows my wife and I on the Hulk rollercoaster at the islands of adventure in Orlando, on our honeymoon.

Not in the mood yet? OK, let's ramp up the 'weird' with a photo of some complete strangers reading one of my textbooks on the same Hulk roller-coaster that my wife and I rode (not at the same time as the book readers I should add). Nothing says 'I love your textbook' like taking it on a stomach-churning high speed ride. I dearly wish that reading my books on roller coasters would become a 'thing'.

To get you started ... to test our main hypotheses we need to first enter the codes for the contrasts in Table 3. The first contrast tests the main prediction but the other two are necessary to break down the variation in injury severity completely. Contrast 1 tests hypothesis 1: flying superhero costumes (Superman and Spiderman) will lead to more severe injuries. In the table, the numbers assigned to the groups are the number of groups in the opposite chunk divided by the number of groups that have non-zero codes, and we randomly assigned one chunk to be a negative value. [Note that the number of groups in the opposite chunk divided by the number of groups that have non-zero codes will give you a weight of magnitude \(\frac{2}{4}\), but this value is the same as \(\frac{1}{2}\)].
Group | Flying vs. non | Superman vs. Spiderman | Hulk vs. Ninja turtle |
---|---|---|---|
Superman | 1/2 | 1/2 | 0 |
Spiderman | 1/2 | -1/2 | 0 |
Hulk | -1/2 | 0 | 1/2 |
Ninja Turtle | -1/2 | 0 | -1/2 |
Alien coding challenge
Use the code box to set the contrasts and fit the model.
# Start by defining the contrasts:
flying_vs_not <- c(x, x, x, x)
super_vs_spider <- c(x, x, x, x)
hulk_vs_ninja <- c(x, x, x, x)
# Now assign these variables to the contrast attached to the variable hero
# Assign these variables to the contrast attached to the variable hero
flying_vs_not <- c(1/2, 1/2, -1/2, -1/2)
super_vs_spider <- c(1/2, -1/2, 0, 0)
hulk_vs_ninja <- c(0, 0, 1/2, -1/2)
contrasts(hero_tib$hero) <- cbind(flying_vs_not, super_vs_spider, hulk_vs_ninja)
# Now fit the model with lm():
hero_lm <- lm(xxxxxx)
# Fit the model with lm()
hero_lm <- lm(injury ~ hero, data = hero_tib)
# Now print the summary table using anova() and parameters():
anova(xxxx) |>
parameters::parameters(xxxxxx)
# print the summary table using anova() and parameters()
anova(hero_lm) |>
parameters::parameters(omega_squared = "raw")
# Next, print the parameter estimates using tidy()
# Optionally, round the values to 3 decimal places
# Next, print the parameter estimates using tidy()
# Optionally, round the values to 3 decimal places
broom::tidy(hero_lm, conf.int = TRUE) |>
dplyr::mutate(
dplyr::across(where(is.numeric), ~round(3))
)
# Put it all together:
flying_vs_not <- c(1/2, 1/2, -1/2, -1/2)
super_vs_spider <- c(1/2, -1/2, 0, 0)
hulk_vs_ninja <- c(0, 0, 1/2, -1/2)
contrasts(hero_tib$hero) <- cbind(flying_vs_not, super_vs_spider, hulk_vs_ninja)
hero_lm <- lm(injury ~ hero, data = hero_tib)
anova(hero_lm) |>
parameters::parameters(omega_squared = "raw") |>
knitr::kable(digits = 3)
broom::tidy(hero_lm, conf.int = TRUE) |>
knitr::kable(digits = 3)
Alien coding challenge
Use the code box to get a robust F-statistic and robust tests of the parameter estimates.
oneway.test(injury ~ hero, data = hero_tib)
parameters::model_parameters(hero_lm, vcov = "HC3") |>
knitr::kable(digits = 3)
A message from Mae Jemstone:
Sometimes we want to compare groups or categories. I remember a few years back the directors of the space pirate academy wanted to see whether our students were happier with our courses than students at other pirate academies. We didn't really need to test this because in the dead of night I whoosh! through space dropping test tubes of raw borglesnark (the smelliest plant in the galaxy) onto the other pirate academies to make sure their students are miserable. I digress. Anyway, you now know how to compare group means using the linear model. Lucky for us the process is very similar to fitting any other linear model, so we have had some useful practice of the skills we have acquired up to now. The main additional consideration is defining contrasts. That's tricky, but you seem to have risen to the challenge. Brilliant stuff! I'm off to cultivate some borglesnark.
Resources
Statistics
- The tutorials typically follow examples described in detail in Field (2023). That book covers the theoretical side of the statistical models, and has more depth on conducting and interpreting the models in these tutorials.
- If any of the statistical content doesn't make sense, you could try my more introductory book An adventure in statistics (Field 2016).
- There are free lectures and screencasts on my YouTube channel.
- There are free statistical resources on my websites www.discoveringstatistics.com and milton-the-cat.rocks.
- R for data science by Wickham and Grolemund (2017) is an open-access book by the creator of the tidyverse (Hadley Wickham). It covers the tidyverse and data management.
- ModernDive is an open-access textbook on and
.
cheat sheets.
list of online resources.
Acknowledgement
I'm extremely grateful to Allison Horst for her very informative blog post on styling learnr tutorials with CSS and also for sending me a CSS template file and allowing me to adapt it. Without Allison, these tutorials would look a lot worse (but she can't be blamed for my colour scheme).
References
Ben-Shachar, Mattan S., Daniel Lüdecke, and Dominique Makowski. 2020. “effectsize: Estimation of Effect Size Indices and Standardized Parameters.” Journal of Open Source Software 5 (56). The Open Journal: 2815. doi:10.21105/joss.02815.
Ben-Shachar, Mattan S., Dominique Makowski, Daniel Lüdecke, Indrajeet Patil, and Brenton M. Wiernik. 2022. Effectsize: Indices of Effect Size and Standardized Parameters. https://easystats.github.io/effectsize/.
Field, Andy P. 2016. An Adventure in Statistics: The Reality Enigma. London: Sage.
———. 2023. Discovering Statistics Using R and RStudio. Second. London: Sage.
Harrell, Frank E, Jr. 2022. Hmisc: Harrell Miscellaneous. https://hbiostat.org/R/Hmisc/.
Horikoshi, Masaaki, and Yuan Tang. 2022. Ggfortify: Data Visualization Tools for Statistical Analysis Results. https://github.com/sinhrks/ggfortify.
Lenth, Russell V. 2022. Emmeans: Estimated Marginal Means, Aka Least-Squares Means. https://github.com/rvlenth/emmeans.
Lüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, and Dominique Makowski. 2020. “Extracting, Computing and Exploring the Parameters of Statistical Models Using R.” Journal of Open Source Software 5 (53): 2445. doi:10.21105/joss.02445.
Lüdecke, Daniel, Dominique Makowski, Mattan S. Ben-Shachar, Indrajeet Patil, Søren Højsgaard, and Brenton M. Wiernik. 2022. Parameters: Processing of Model Parameters. https://easystats.github.io/parameters/.
Mair, Patrick, and Rand Wilcox. 2019. “Robust Statistical Methods in R Using the Wrs2 Package.” Behavior Research Methods, May. doi:10.3758/s13428-019-01246-w.
Makowski, Dominique, Daniel Lüdecke, Mattan S. Ben-Shachar, and Indrajeet Patil. 2022. Modelbased: Estimation of Model-Based Predictions, Contrasts and Means. https://easystats.github.io/modelbased/.
Morey, Richard D., and Jeffrey N. Rouder. 2022. BayesFactor: Computation of Bayes Factors for Common Designs. https://richarddmorey.github.io/BayesFactor/.
Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.
Robinson, David, Alex Hayes, and Simon Couch. 2022. Broom: Convert Statistical Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.
Tang, Yuan, Masaaki Horikoshi, and Wenxuan Li. 2016. “Ggfortify: Unified Interface to Visualize Statistical Result of Popular R Packages.” The R Journal 8 (2): 474–85. doi:10.32614/RJ-2016-060.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. 2nd ed. New York, NY: Springer.
———. 2021. Forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
———. 2022. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, and G. Grolemund. 2017. R for Data Science. Sebastopol, CA: O’Reilly.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. doi:10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2022. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2022. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Xie, Yihui. 2022. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Zeileis, Achim. 2006. “Object-Oriented Computation of Sandwich Estimators.” Journal of Statistical Software 16 (9): 1–16. doi:10.18637/jss.v016.i09.
Zeileis, Achim, and Thomas Lumley. 2022. Sandwich: Robust Covariance Matrix Estimators. https://sandwich.R-Forge.R-project.org/.