This tutorial is one of a series that accompanies *Discovering Statistics Using IBM SPSS Statistics* (Field 2017) by me, Andy Field. These tutorials contain abridged sections from the book (so there are some copyright considerations).^{1}

- Who is the tutorial aimed at?
  - Students enrolled on my *Discovering Statistics* module at the University of Sussex, or anyone reading my textbook *Discovering Statistics Using IBM SPSS Statistics* (Field 2017).
- What is covered?
  - This tutorial develops the material from the previous tutorial to look at using categorical predictors in the linear model using *IBM SPSS Statistics*. We will look at dummy coding and the linear model as applied to independent experimental designs (i.e., experiments in which different entities participate in different experimental conditions).
  - This tutorial *does not* teach the background theory: it is assumed you have either attended my lecture or read the relevant chapter in my book (or someone else’s).
  - The aim of this tutorial is to augment the theory that you already know by guiding you through fitting linear models using IBM SPSS Statistics and asking you questions to test your knowledge along the way.
- Want more information?
  - The main tutorial follows the example described in detail in Field (2017), so there’s a thorough account in there.
  - You can access free lectures and screencasts on my YouTube channel.
  - There are more statistical resources on my website www.discoveringstatistics.com

The main example in this tutorial is from Field (2017) and is about puppies. Here’s my dog, Milton, when he was a puppy:

If that doesn’t make you feel better about this tutorial then nothing will because lots of people believe that puppy therapy is good for stress, and that’s what this example is all about. Puppy therapy is a form of animal-assisted therapy, in which puppy contact is introduced into the therapeutic process. Despite a common belief that puppy therapy is effective in reducing stress, the evidence base is pretty mixed. A review of animal-assisted therapy in childhood mental health found that of 24 studies, 8 found positive effects, 10 showed mixed findings, and 6 concluded that there was no effect (Hoagwood et al. 2017). Imagine we ran a study in which we randomized people into three groups: (1) a control group (this could be a treatment as usual, a no treatment or ideally some kind of placebo group – for example, if our hypothesis was specifically about puppies we could give people in this group a cat disguised as a dog); (2) 15 minutes of puppy therapy (a low-dose group); and (3) 30 minutes of puppy contact (a high-dose group). The dependent variable was a measure of happiness ranging from 0 (as unhappy as I can possibly imagine being) to 10 (as happy as I can possibly imagine being). The design of this study mimics a very simple randomized controlled trial (as used in pharmacological, medical and psychological intervention trials) because people are randomized into a control group or groups containing the active intervention (in this case puppies, but in other cases a drug or a surgical procedure). We’d predict that any form of puppy therapy should be better than the control (i.e. higher happiness scores) but also formulate a dose-response hypothesis that as exposure time increases (from 0 minutes to 15 and 30) happiness will increase too. The data are in the file puppies_dummy.sav and are shown in Figure 2.

The data editor has five variables/columns:

- **Person**: an ID number
- **Dose**: codes the group to which the individual belonged (I have coded 1 = control, 2 = 15 minutes and 3 = 30 minutes)
- **Happiness**: the person’s happiness score (0-10)
- **dummy1**: the first of two dummy variables that code the three groups using 0s and 1s. This variable has codes 1 = 30 mins, 0 = everything else. It represents the 30-minute group compared to the control.
- **dummy2**: the second of two dummy variables that code the three groups using 0s and 1s. This variable has codes 1 = 15 mins, 0 = everything else. It represents the 15-minute group compared to the control.
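As a rough illustration of how the two dummy variables relate to **Dose**, here is a minimal Python sketch. Python is not part of this tutorial, and the function name `dummy_code` is hypothetical; the sketch only assumes the coding described above (1 = control, 2 = 15 minutes, 3 = 30 minutes).

```python
def dummy_code(dose):
    """Map a Dose code (1, 2 or 3) to the pair (dummy1, dummy2)."""
    if dose not in (1, 2, 3):
        raise ValueError("Dose must be 1 (control), 2 (15 mins) or 3 (30 mins)")
    dummy1 = 1 if dose == 3 else 0  # 1 = 30 mins, 0 = everything else
    dummy2 = 1 if dose == 2 else 0  # 1 = 15 mins, 0 = everything else
    return dummy1, dummy2


labels = {1: "control", 2: "15 mins", 3: "30 mins"}
for dose, label in labels.items():
    d1, d2 = dummy_code(dose)
    print(f"Dose = {dose} ({label}): dummy1 = {d1}, dummy2 = {d2}")
```

The control group is the only one coded 0 on both dummy variables, which is what makes it the baseline against which the other two groups are compared.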

We can include categorical predictors using dummy coding (there are other forms of coding too, for example contrast coding, which we’ll cover in the next tutorial). One source of confusion is that SPSS has different menu structures when the goal of the linear model is to compare means. This can create the impression that when you fit a model to compare means it is a different model to the one you fit when predicting an outcome from continuous variables. It isn’t: both are the same linear model.

As revision from the lecture/chapter, the model we’re fitting is:

\[
\text{Happiness}_i = b_0 + b_1\text{Long}_i + b_2\text{Short}_i + \varepsilon_i
\]

in which the variables **Long** and **Short** are the dummy variables named **dummy1** and **dummy2** in the data file. The first variable (**Long**) represents the difference between the 30-minute group and the control group, whereas the second variable (**Short**) represents the difference between the 15-minute group and the control group.
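To see why the *b*s in this model are differences between group means, the model can be fitted by ordinary least squares outside SPSS. The Python sketch below does this with the normal equations. The raw scores are an assumption: they are illustrative values chosen to reproduce the group means and standard deviations reported in this tutorial, not values read from puppies_dummy.sav, and the helper `solve` is hypothetical.

```python
# Illustrative scores (an assumption): chosen to reproduce the group means
# and standard deviations reported in this tutorial, not read from the data file.
happiness = {
    "control": [3, 2, 1, 1, 4],
    "15 mins": [5, 2, 4, 2, 3],
    "30 mins": [7, 4, 5, 3, 6],
}

# Dummy codes (Long, Short) for each group.
codes = {"control": (0, 0), "15 mins": (0, 1), "30 mins": (1, 0)}

# Build design-matrix rows [1, Long, Short] and the outcome vector.
X, y = [], []
for group, scores in happiness.items():
    long_code, short_code = codes[group]
    for score in scores:
        X.append([1.0, float(long_code), float(short_code)])
        y.append(float(score))


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Ordinary least squares via the normal equations (X'X) b = X'y.
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]
b0, b1, b2 = solve(XtX, Xty)

print(f"b0 = {b0:.2f}, b1 (Long) = {b1:.2f}, b2 (Short) = {b2:.2f}")
```

With these scores, \(b_0\) is the control group mean (2.2), \(b_1\) is the 30-minute mean minus the control mean (5.0 − 2.2 = 2.8), and \(b_2\) is the 15-minute mean minus the control mean (3.2 − 2.2 = 1.0).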


The following video shows how to fit the model using *SPSS Statistics*.

The first part of the output (Output 2) shows the descriptive statistics for each group. I have also plotted the means and 95% confidence intervals in Figure 3. The line that joins the means seems to indicate a linear trend in that, as the dose of puppy therapy increases, so does the mean level of happiness.

Output 3 shows the main ANOVA summary table. The table is divided into between-group effects (effects due to the model – the experimental effect) and within-group effects (this is the unsystematic variation in the data).
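The split into between-group and within-group variation can be reconstructed from the group means, sample sizes and variances reported later in this tutorial. The following Python sketch is a hand calculation under that assumption, not SPSS output, so small rounding differences from Output 3 are possible.

```python
# Group summaries reported in this tutorial: (mean, sample size, variance).
groups = {
    "control": (2.2, 5, 1.7),
    "15 mins": (3.2, 5, 1.7),
    "30 mins": (5.0, 5, 2.5),
}

n_total = sum(n for _, n, _ in groups.values())
k = len(groups)
grand_mean = sum(m * n for m, n, _ in groups.values()) / n_total

# Between-group (model) sum of squares: group means around the grand mean.
ss_model = sum(n * (m - grand_mean) ** 2 for m, n, _ in groups.values())
# Within-group (residual) sum of squares: recovered from each group's variance.
ss_resid = sum((n - 1) * var for _, n, var in groups.values())

df_model, df_resid = k - 1, n_total - k
ms_model, ms_resid = ss_model / df_model, ss_resid / df_resid
f_ratio = ms_model / ms_resid

print(f"SS_M = {ss_model:.2f}, df = {df_model}, MS_M = {ms_model:.2f}")
print(f"SS_R = {ss_resid:.2f}, df = {df_resid}, MS_R = {ms_resid:.2f}")
print(f"F({df_model}, {df_resid}) = {f_ratio:.2f}")
```

The *F*-statistic is the ratio of the two mean squares: systematic variation explained by group membership relative to unsystematic variation within groups.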


Output 4 shows the Welch and the Brown-Forsythe F-statistics. If you’re interested in how these values are calculated then look at my SPSS book (note that the error degrees of freedom have been adjusted and you should remember this when you report the values). Based on whether the observed *p* is less than 0.05, the Welch *F* yields a non-significant result (*p* = 0.054) whereas Brown-Forsythe is significant (*p* = 0.026). This is confusing, but only if you like to imbue the 0.05 threshold with magical powers and engage in black-and-white thinking of the sort that people who use p-values so often do.
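For readers who want a feel for where the Welch *F* and its adjusted error degrees of freedom come from, here is a minimal Python sketch using the standard textbook form of Welch's statistic and the group summaries reported in this tutorial. It is an assumption that SPSS follows exactly this form, and the sketch does not compute the *p*-value (that needs the *F*-distribution, which is outside the Python standard library).

```python
# Group summaries reported in this tutorial: (mean, sample size, variance).
groups = [(2.2, 5, 1.7), (3.2, 5, 1.7), (5.0, 5, 2.5)]
k = len(groups)

# Each group is weighted by n / variance, so noisier groups count for less.
weights = [n / var for _, n, var in groups]
w_total = sum(weights)
weighted_mean = sum(w * m for w, (m, _, _) in zip(weights, groups)) / w_total

# Weighted between-group variation.
numerator = sum(
    w * (m - weighted_mean) ** 2 for w, (m, _, _) in zip(weights, groups)
) / (k - 1)

# Correction term that also determines the adjusted error degrees of freedom.
lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, (_, n, _) in zip(weights, groups))
denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

welch_f = numerator / denominator
df_model = k - 1
df_error = (k ** 2 - 1) / (3 * lam)

print(f"Welch F({df_model}, {df_error:.3f}) = {welch_f:.2f}")
```

Note that the error degrees of freedom are no longer a whole number, which is the adjustment mentioned above.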

A useful measure of effect size is Cohen’s *d*, which is the difference between two means divided by some estimate of the standard deviation of those means:

\[ \hat{d} = \frac{\bar{X_1}-\bar{X_2}}{s} \]

I have put a hat on the *d* to remind us that we’re really interested in the effect size in the population, but because we can’t measure that directly, we estimate it from the samples (the hat means ‘estimate of’). By dividing by the standard deviation we are expressing the difference in means in standard deviation units (a bit like a *z*-score). The standard deviation is a measure of ‘error’ or ‘noise’ in the data, so *d* is effectively a signal-to-noise ratio. However, if we’re using two means, then there will be a standard deviation associated with each of them, so which one should we use? There are three choices:

- If one of the groups is a control group it makes sense to use that group’s standard deviation to compute *d* (the argument being that the experimental manipulation might affect the standard deviation of the experimental group, so the control group SD is a ‘purer’ measure of natural variation in scores).
- Sometimes we assume that group variances (and therefore standard deviations) are equal (homogeneity of variance) and if they are we can pick a standard deviation from either of the groups because it won’t matter.
- We use what’s known as a ‘pooled estimate’, which is the square root of the weighted average of the two group variances. This is given by the following equation:

\[ s_p = \sqrt{\frac{(N_1-1) s_1^2+(N_2-1) s_2^2}{N_1+N_2-2}} \]

Say we wanted to estimate *d* for the difference between the 30-minute group and the control. Output 1 shows us the means, sample sizes and standard deviations for the groups:

- Control: *M* = 2.2, *N* = 5, *s* = 1.304, \(s^2\) = 1.7
- 15-mins: *M* = 3.2, *N* = 5, *s* = 1.304, \(s^2\) = 1.7
- 30-mins: *M* = 5.0, *N* = 5, *s* = 1.581, \(s^2\) = 2.5

We have a logical control group so *d* is simply:

\[ \hat{d}_\text{30 mins vs control} = \frac{5.0-2.2}{1.304} = 2.15 \]

Calculate *d* for the comparison between the 15-minute group and control, and the 30-minute group compared to the 15-minute group.

For the comparison between the 15-minute group and control, we can again use the control group standard deviation:

\[ \hat{d}_\text{15 mins vs control} = \frac{3.2-2.2}{1.304} = 0.77 \]

For the comparison between the 15- and 30-minute groups, there is no obvious control so we could use the pooled estimate:

\[
\begin{aligned}
s_p &= \sqrt{\frac{(N_1-1) s_1^2+(N_2-1) s_2^2}{N_1+N_2-2}} \\
&= \sqrt{\frac{(5-1)2.5+(5-1)1.7}{5+5-2}} \\
&= \sqrt{\frac{16.8}{8}} \\
&= \sqrt{2.1} \\
&= 1.449
\end{aligned}
\]

which gives a *d* of:

\[ \hat{d}_\text{30 mins vs 15 mins} = \frac{5.0-3.2}{1.45} = 1.24 \]
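The three hand calculations above can be checked with a few lines of Python. This is only a sketch: the function names `pooled_sd` and `cohens_d` are hypothetical, and the inputs are the group summaries listed earlier.

```python
from math import sqrt


def pooled_sd(n1, var1, n2, var2):
    """Pooled standard deviation: square root of the weighted average variance."""
    return sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))


def cohens_d(mean1, mean2, sd):
    """Difference between two means expressed in standard deviation units."""
    return (mean1 - mean2) / sd


control_sd = 1.304  # standard deviation of the control group

# Comparisons against the control use the control group's standard deviation.
d_30_vs_control = cohens_d(5.0, 2.2, control_sd)
d_15_vs_control = cohens_d(3.2, 2.2, control_sd)

# With no obvious control, use the pooled estimate from the two group variances.
s_p = pooled_sd(5, 2.5, 5, 1.7)
d_30_vs_15 = cohens_d(5.0, 3.2, s_p)

print(f"30 mins vs control: d = {d_30_vs_control:.2f}")
print(f"15 mins vs control: d = {d_15_vs_control:.2f}")
print(f"30 mins vs 15 mins: s_p = {s_p:.3f}, d = {d_30_vs_15:.2f}")
```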

Let’s look at a second example from Field (2017) (Smart Alex task). Children wearing superhero costumes are more likely to harm themselves because of the unrealistic impression of invincibility that these costumes could create. For example, children have reported to hospital with severe injuries because of trying ‘to initiate flight without having planned for landing strategies’ (Davies et al. 2007). I can relate to the imagined power that a costume bestows upon you; indeed, I have been known to dress up as Fisher by donning a beard and glasses and trailing a goat around on a lead in the hope that it might make me more knowledgeable about statistics. Imagine we had data about the severity of **injury** (on a scale from 0, no injury, to 100, death) for children reporting to the accident and emergency department at hospitals, and information on which superhero costume they were wearing (**hero**): Superman ( = 1), Spiderman (= 2), the Hulk (= 3) or a teenage mutant ninja turtle (= 4). Fit a model with planned contrasts to test the hypothesis that different costumes give rise to more severe injuries.

To get you into the mood for hulk-related data analysis, here is a photo of my wife and I on the Hulk roller-coaster in Florida on our honeymoon (we also went on the Spiderman ride, which was *incredible*). You may well resort to similar facial expressions during this example:

There are detailed answers to this task on the companion website of my SPSS textbook.


Say we wanted to estimate *d* for the effect of Superman costumes compared to Ninja Turtle costumes. The means, sample size and standard deviation for these two groups are (these should be in your SPSS output):

- Superman: *M* = 60.33, *N* = 6, *s* = 17.85, \(s^2\) = 318.62
- Ninja Turtle: *M* = 26.25, *N* = 8, *s* = 8.16, \(s^2\) = 66.50

Neither group is a natural control (you would need a ‘no costume’ condition really), but if we decided that Ninja Turtle (for some reason) was a control (perhaps because Turtles don’t fly but supermen do) then *d* is simply:

\[ \hat{d}_\text{superman vs ninja} = \frac{60.33-26.25}{8.16} = 4.18 \]

In other words, the mean injury severity for people wearing Superman costumes is about 4 standard deviations greater than for those wearing Ninja Turtle costumes. This is a pretty huge effect. Let’s have a look at using the pooled estimate.

\[
\begin{aligned}
s_p &= \sqrt{\frac{(N_1-1) s_1^2+(N_2-1) s_2^2}{N_1+N_2-2}} \\
&= \sqrt{\frac{(6-1)318.62+(8-1)66.50}{6+8-2}} \\
&= \sqrt{\frac{2058.6}{12}} \\
&= 13.10
\end{aligned}
\]

When the group standard deviations are different, this pooled estimate can be useful; however, it changes the meaning of *d* because we’re now comparing the difference between means against all of the background ‘noise’ in the measure, not just the noise that you would expect to find in normal circumstances. Using this estimate of the standard deviation we get:

\[ \hat{d}_\text{superman vs ninja} = \frac{60.33-26.25}{13.10} = 2.60 \]

Notice that *d* is smaller now; the injury severity for Superman costumes is about 2.6 standard deviations greater than for Ninja Turtle costumes (which is still pretty big).

Compute Cohen’s *d* for the effect of Superman costumes on injury severity compared to Hulk and Spiderman costumes. Try using both the standard deviation of the control (the non-Superman costume) and also the pooled estimate.

| Comparison | Mean 1 | Mean 2 | *d* (control SD) | *d* (pooled SD) |
|---|---|---|---|---|
| Superman v Spiderman | 60.33 | 41.63 | 1.53 | 1.26 |
| Superman v Hulk | 60.33 | 35.38 | 1.86 | 1.62 |
| Superman v Ninja | 60.33 | 26.25 | 4.18 | 2.60 |
| Spiderman v Hulk | 41.63 | 35.38 | 0.47 | 0.49 |
| Spiderman v Ninja | 41.63 | 26.25 | 1.88 | 1.48 |
| Hulk v Ninja | 35.38 | 26.25 | 1.12 | 0.82 |
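The Superman versus Ninja Turtle row of the table can be checked from the summaries given earlier; the other rows need the Spiderman and Hulk standard deviations from your own SPSS output, which are not reproduced in this tutorial. The Python sketch below also compares the pooled variance with a simple unweighted average, to show what the weighting does when group sizes differ. The variable names are hypothetical.

```python
from math import sqrt

# Summaries reported in this tutorial for the two groups.
superman = {"mean": 60.33, "n": 6, "sd": 17.85, "var": 318.62}
ninja = {"mean": 26.25, "n": 8, "sd": 8.16, "var": 66.50}

diff = superman["mean"] - ninja["mean"]

# Option 1: treat Ninja Turtle as the control and use its standard deviation.
d_control = diff / ninja["sd"]

# Option 2: pooled estimate, weighting each variance by its degrees of freedom.
pooled_var = (
    (superman["n"] - 1) * superman["var"] + (ninja["n"] - 1) * ninja["var"]
) / (superman["n"] + ninja["n"] - 2)
s_p = sqrt(pooled_var)
d_pooled = diff / s_p

# For comparison only: an unweighted average ignores the unequal group sizes.
unweighted_var = (superman["var"] + ninja["var"]) / 2

print(f"d using control SD: {d_control:.2f}")
print(f"pooled SD = {s_p:.2f}, d using pooled SD: {d_pooled:.2f}")
print(f"pooled variance = {pooled_var:.2f}, unweighted average = {unweighted_var:.2f}")
```

Because the Ninja Turtle group is larger, its smaller variance gets more weight, so the pooled variance sits below the unweighted average of the two variances.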

The next tutorial will follow up both of the examples in this tutorial to look at how to pick apart the differences in group means using contrast coding and *post hoc* tests.

Davies, P., J. Surridge, L. Hole, and L. Munro-Davies. 2007. “Superhero-Related Injuries in Paediatrics: A Case Series.” *Archives of Disease in Childhood* 92 (3): 242–43. doi:10.1136/adc.2006.109793.

Field, Andy P. 2017. *Discovering Statistics Using IBM SPSS Statistics: And Sex and Drugs and Rock ’N’ Roll*. 5th ed. London: Sage.

Hoagwood, K. E., M. Acri, M. Morrissey, and R. Peth-Pierce. 2017. “Animal-Assisted Therapies for Youth with or at Risk for Mental Health Problems: A Systematic Review.” *Applied Developmental Science* 21 (1): 1–13. doi:10.1080/10888691.2015.1134267.

^{1} This tutorial is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Basically, you can use it for teaching and non-profit activities but not meddle with it.