GLM 6: covariates

# The general linear model: comparing means adjusted for other predictors (ANCOVA)

## Overview

This tutorial is one of a series that accompanies Discovering Statistics Using IBM SPSS Statistics (Field 2017) by me, Andy Field. These tutorials contain abridged sections from the book (so there are some copyright considerations).

• Who is the tutorial aimed at?
  • Students enrolled on my Discovering Statistics module at the University of Sussex, or anyone reading my textbook Discovering Statistics Using IBM SPSS Statistics (Field 2017).
• What is covered?
  • This tutorial develops the material from the previous tutorial to look at comparing means adjusted for other predictors using IBM SPSS Statistics. This use of the GLM is often referred to as analysis of covariance (ANCOVA).
  • This tutorial does not teach the background theory: it is assumed you have either attended my lecture or read the relevant chapter in my book (or someone else’s).
  • The aim of this tutorial is to augment the theory that you already know by guiding you through fitting linear models using IBM SPSS Statistics and asking you questions to test your knowledge along the way.
  • The main tutorial follows the example described in detail in Field (2017), so there’s a thorough account in there.
• You can access free lectures and screencasts on my YouTube channel.
• There are more statistical resources on my website www.discoveringstatistics.com

## Puppy love

The main example in this tutorial is from Field (2017) and extends the example about puppy therapy from previous tutorials. On the assumption that you can’t have enough puppies in your life, here’s another picture of my dog looking cute to help you to deal with the psychological trauma of this statistics tutorial.

The previous tutorials focussed on an example in which a researcher tested the efficacy of puppy therapy by randomly assigning people to one of three groups: (1) a control group; (2) 15 minutes of puppy therapy (a low-dose group); and (3) 30 minutes of puppy contact (a high-dose group). The dependent variable was a measure of happiness ranging from 0 (as unhappy as I can possibly imagine being) to 10 (as happy as I can possibly imagine being); see Hoagwood et al. (2017). Let’s think about things other than puppy therapy that might influence happiness ratings in this experiment. The obvious one is how much you like dogs: a dog phobic is going to be about as happy after puppy therapy as I would be after tarantula therapy. If we measure such variables we can adjust the means in the different therapy groups for the influence they have on the outcome variable by including them in the linear model. This scenario is often referred to as analysis of covariance (ANCOVA), and the variables that you adjust for are called covariates.

The researchers who conducted the puppy therapy study in the previous tutorials realized that a participant’s love of dogs would influence whether puppy therapy affected their happiness. Therefore, they repeated the study on different participants, but included a self-report measure of love of puppies from 0 (I am a weird person who hates puppies, please be deeply suspicious of me) to 7 (puppies are the best thing ever, one day I might marry one). The data are in the file puppy_love.sav, which contains the variables Dose (1 = control, 2 = 15 minutes, 3 = 30 minutes), Happiness (the person’s happiness on a scale from 0 to 10), and Puppy_love (the love of puppies from 0 to 7); see Figure 2.

## The model

As a reminder, the model we fitted in the previous tutorials (in which we just compared means across the three groups without adjusting for the love of puppies) involved predicting happiness from two dummy variables: one representing the difference between the 15-minute group and the control (Short) and one representing the difference between the 30-minute group and the control (Long):

$\text{Happiness}_i = b_0 + b_1\text{Long}_i+ b_2\text{Short}_i + ε_i$

If we want to adjust the group means using a covariate then we simply extend the model to include this covariate. Our covariate is Puppy love so we add this into the model and assign it a parameter ($b_3$). The model we’re fitting is, therefore:

$\text{Happiness}_i = b_0 + b_1\text{Long}_i+ b_2\text{Short}_i + b_3\text{Puppy love}_i + ε_i$
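To make the structure of this model concrete, here is a minimal Python sketch (not part of the SPSS workflow) that dummy-codes Dose, builds the design matrix and estimates the four parameters by least squares. The function names (`dummy_code`, `solve`, `fit_ols`), the toy records and the "true" parameter values are all assumptions made up for illustration; they are not the values in puppy_love.sav.

```python
def dummy_code(dose):
    """Return (long, short) dummy variables for a Dose code.

    Coding follows the tutorial: 1 = control, 2 = 15 minutes, 3 = 30 minutes.
    The control group is the baseline, so it scores 0 on both dummies.
    """
    return (1.0 if dose == 3 else 0.0, 1.0 if dose == 2 else 0.0)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares estimates from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up (dose, puppy_love) records: NOT the data in puppy_love.sav.
toy = [(1, 1), (1, 3), (1, 5), (2, 2), (2, 4), (2, 6), (3, 0), (3, 3), (3, 7)]
true_b = [2.0, 3.0, 1.0, 0.5]  # hypothetical b0, b1 (Long), b2 (Short), b3 (Puppy love)

# Each row is [intercept, Long, Short, Puppy love], matching the equation above.
design = [[1.0, *dummy_code(dose), float(love)] for dose, love in toy]
# Noise-free outcome, so the fit should recover true_b up to rounding error.
happiness = [sum(b * x for b, x in zip(true_b, row)) for row in design]

estimates = fit_ols(design, happiness)
print([round(b, 6) for b in estimates])
```

Because the toy outcome is generated without error, the estimates should match the made-up parameters almost exactly; with real data they would be the values that minimise the squared residuals.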

## Fitting the model using SPSS Statistics

We could create dummy variables for the Dose variable and fit the model through the Analyze > Regression > Linear… menu in SPSS Statistics, but it is more common to use the Analyze > General Linear Model > Univariate… menu, which has the benefit that the dummy coding is done automatically. The following video shows how to fit the model using SPSS Statistics.

## Interpreting ANCOVA

### Interpreting the main effect of Dose

If, unlike me, you selected the option for homogeneity tests then your output will contain Levene’s test. For various reasons explained in my book (and lectures) I would ignore this test. (Ideally you’d inspect residuals from the model as covered in an earlier tutorial.) The covariate (Puppy_love) significantly predicts happiness because p = 0.035, which is less than the (typical) criterion of 0.05. Therefore, the person’s happiness is significantly influenced by their love of puppies. The effect of Dose tells us that when the means are adjusted for the love of puppies, the effect of puppy therapy is significant (p = 0.027, which is less than 0.05).

To interpret the main effect of Dose we look at the adjusted values of the group means (Output 2). From these adjusted means you can see that happiness increased across the three doses.
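As a rough illustration of what "adjusted" means here: an adjusted group mean is the model's prediction for that group when the covariate is held at its overall mean. The sketch below assumes the model from earlier in this tutorial; the function name `adjusted_means` and all the parameter values are hypothetical and chosen for easy arithmetic, not taken from the SPSS output.

```python
def adjusted_means(b0, b_long, b_short, b_cov, covariate_mean):
    """Predicted happiness for each group with the covariate fixed at its mean."""
    baseline = b0 + b_cov * covariate_mean  # control group prediction
    return {
        "control": baseline,
        "15 minutes": baseline + b_short,
        "30 minutes": baseline + b_long,
    }


# Hypothetical parameter estimates and covariate mean (not from Output 2).
means = adjusted_means(b0=1.0, b_long=2.0, b_short=1.5, b_cov=0.5, covariate_mean=3.0)
print(means)  # {'control': 2.5, '15 minutes': 4.0, '30 minutes': 4.5}
```

Every group is evaluated at the same covariate value, so differences between the adjusted means reflect only the dummy-variable parameters.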

Output 3 shows the contrasts that we specified. The first compares level 2 (15 minutes) against level 1 (control), and the second compares level 3 (30 minutes) against level 1 (control). For each contrast the output displays a difference value, standard error, significance value and 95% confidence interval. These results show that both the 15-minute group (contrast 1, p = 0.045) and 30-minute group (contrast 2, p = 0.010) had significantly different happiness compared to the control group.

Output 4 shows the Sidak corrected post hoc comparisons. The bottom table shows the bootstrapped significance and confidence intervals for these tests and because these will be robust we’ll interpret this table (your values will differ because of how bootstrapping works). There is a significant difference between the control group and both the 15- (p = 0.003) and 30-minute (p = 0.021) groups. The 30- and 15-minute groups did not significantly differ (p = 0.558). It is interesting that the significant difference between the 15-minute and control groups when bootstrapped (p = 0.003) is not present for the normal post hoc tests (p = 0.130). This anomaly could reflect properties of the data that have biased the non-robust version of the post hoc test.
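For readers curious about what a Šidák correction does to a p-value, the standard formula is 1 − (1 − p)^m for m comparisons. The sketch below applies it to a made-up unadjusted p-value; the function name `sidak_adjust` is hypothetical, and it is an assumption (not something shown in the output) that SPSS's adjusted values follow exactly this computation.

```python
from math import comb


def sidak_adjust(p, n_comparisons):
    """Šidák-adjusted p-value: 1 - (1 - p) ** m for m comparisons."""
    return 1.0 - (1.0 - p) ** n_comparisons


n_groups = 3                   # control, 15 minutes, 30 minutes
n_pairs = comb(n_groups, 2)    # number of pairwise comparisons among the groups
unadjusted_p = 0.05            # hypothetical value, not taken from Output 4

print(n_pairs, round(sidak_adjust(unadjusted_p, n_pairs), 4))  # prints: 3 0.1426
```

The adjusted value is always at least as large as the unadjusted one, which is why a comparison can be significant before correction but not after.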

### Interpreting the covariate

To interpret the significant effect of the love of puppies you should plot a scatterplot of love of puppies (X-axis) against happiness (Y-axis). I’ve produced such a plot below (and you can try to produce one yourself) - how would you interpret this plot?

## Homogeneity of regression slopes

To test the assumption of homogeneity of regression slopes (i.e. that the relationship between the covariate and outcome variable is consistent across levels of the categorical predictor) we need to re-fit the model but include the interaction between the covariate and predictor variable (the Dose × Puppy_love interaction). This video explains how:

Output 5 shows the main summary table for the model including the interaction term. The effects of the dose of puppy therapy and love of puppies are still significant, but so is the covariate by predictor interaction (Dose × Puppy_love, p = 0.028). This significant interaction implies that the assumption of homogeneity of regression slopes is not realistic, which raises concerns about the main analysis.
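An informal way to see what this assumption is about is to compute the slope of happiness on love of puppies separately within each dose group and compare them. The sketch below does that for made-up records; it is only an illustration, not the interaction test that SPSS reports, and the names `slope` and `slopes_by_group` and the data are hypothetical.

```python
def slope(x, y):
    """Least-squares slope of y on x for a single group."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def slopes_by_group(records):
    """Map each dose code to the slope of happiness on puppy love in that group."""
    groups = {}
    for dose, love, happy in records:
        xs, ys = groups.setdefault(dose, ([], []))
        xs.append(love)
        ys.append(happy)
    return {dose: slope(xs, ys) for dose, (xs, ys) in groups.items()}


# Made-up (dose, puppy_love, happiness) records: NOT the data in puppy_love.sav.
toy = [
    (1, 1, 2.0), (1, 3, 4.0), (1, 5, 6.0),  # control: happiness rises with puppy love
    (2, 2, 4.0), (2, 4, 5.0), (2, 6, 6.0),  # 15 minutes: shallower positive slope
    (3, 1, 7.0), (3, 3, 6.0), (3, 5, 5.0),  # 30 minutes: negative slope
]

group_slopes = slopes_by_group(toy)
print(group_slopes)  # {1: 1.0, 2: 0.5, 3: -0.5}
```

When the within-group slopes differ as much as they do in this toy data, a single common slope for the covariate misrepresents at least one group, and that is the situation the interaction term is designed to detect.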

## Unguided example

Let’s look at a second example from Field (2017). A marketing manager tested the benefit of soft drinks for curing hangovers. He took 15 people and got them drunk. The next morning as they awoke, dehydrated and feeling as though they’d licked a camel’s sandy feet clean with their tongue, he gave five of them water to drink, five of them Lucozade (a glucose-based drink) and the remaining five a leading brand of cola (this variable is called drink). He measured how well they felt (on a scale from 0 = I feel like death to 10 = I feel really full of beans and healthy) two hours later (this variable is called well). He also measured how drunk each person got the night before on a scale of 0 = as sober as a member of the straight edge movement to 10 = flapping about on the floor like a haddock out of water (this variable is called drunk). The data are in the file hangover_cure.sav, which contains the variables drink (1 = water, 2 = Lucozade, 3 = Cola), well (how well the person felt from 0 to 10), and drunk (how drunk the person got the night before from 0 to 10). Fit a model to see whether people felt better after different drinks when adjusting for how drunk they were the night before. Use contrasts that compare each group to the control (water).
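By analogy with the puppy therapy model, one way to write the model for this example is shown below. Lucozade and Cola are hypothetical names for dummy variables that compare each drink to water; SPSS Statistics creates its own coding automatically, so you won’t see these names in the output.

$\text{Well}_i = b_0 + b_1\text{Lucozade}_i+ b_2\text{Cola}_i + b_3\text{Drunk}_i + ε_i$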

## Next tutorial

The next tutorial will look at analysing factorial designs using the general linear model.

## References

Field, Andy P. 2017. Discovering Statistics Using IBM SPSS Statistics: And Sex and Drugs and Rock ’N’ Roll. 5th ed. London: Sage.

Hoagwood, K. E., M. Acri, M. Morrissey, and R. Peth-Pierce. 2017. “Animal-Assisted Therapies for Youth with or at Risk for Mental Health Problems: A Systematic Review.” Applied Developmental Science 21 (1): 1–13. doi:10.1080/10888691.2015.1134267.