GLM 10: categorical outcomes

The general linear model: categorical outcomes

Overview

This tutorial is one of a series that accompanies Discovering Statistics Using IBM SPSS Statistics (Field 2017) by me, Andy Field. These tutorials contain abridged sections from the book (so there are some copyright considerations).1

  • Who is the tutorial aimed at?
    • Students enrolled on my Discovering Statistics module at the University of Sussex, or anyone reading my textbook Discovering Statistics Using IBM SPSS Statistics (Field 2017)
  • What is covered?
    • This tutorial looks at situations where we want to predict a binary outcome (i.e., an outcome consisting of two categories).
    • This tutorial does not teach the background theory: it is assumed you have either attended my lecture or read the relevant chapter in my book (or someone else’s)
    • The aim of this tutorial is to augment the theory that you already know by guiding you through fitting linear models using IBM SPSS Statistics and asking you questions to test your knowledge along the way.
  • Want more information?
    • The main tutorial follows a lecture delivered at Christmas and so has a Christmas theme rather than following the example in Field (2017), but that chapter will flesh things out for you.
    • You can access free lectures and screencasts on my YouTube channel
    • There are more statistical resources on my website www.discoveringstatistics.com

A Christmas disaster

Let’s begin with a Christmas tale. A year ago Santa was resting in his workshop studying his nice and naughty lists. He noticed a name on the naughty list in bold, upper case letters. It said ANDY FIELD OF UNIVERSITY OF SUSSEX. He went to look up the file of this Andy Field character. He stared into his snow globe, and as the mists cleared he saw a sad, lonely, friendless character walking across campus. Under one arm was a box of chocolates, under the other a small pink hippo. As he walked the campus he enticed the young girls and boys around him to follow him by offering chocolate. Like the Pied Piper, he led them to a large hall. Once inside, the boys’ and girls’ eyes glistened in anticipation of more chocolate. Instead he unleashed a monologue about the general linear model of such fearsome tedium that Santa began to wonder how anyone could have grown to be so soulless and cruel.

Santa dusted off his sleigh and whizzed through the night sky to the Sussex campus. Once there he confronted the evil fiend that he had seen in his globe. “You’ve been a naughty boy,” he said. “I give you a choice. Give up teaching statistics, or I will be forced to let the Krampus pay you a visit.”

Andy looked sad, “But I love statistics,” he said to Santa, “It’s cool.”

Santa pulled out a candy cane; from it emerged a screen. Just as he was about to instruct the screen to call the Krampus, an incoming message appeared:

What was Santa to do? How could he find out what determines whether presents get delivered or not? He panicked. Just then, Santa heard a sad little voice. “I can help you,” it said. “How?” replied Santa. “My students,” said Andy, “they can save Christmas. All they need are some data.”

With that, Santa looked into his candy screen at the elves who had called him, and turned to Andy. “Tell them what you need.”

Andy discovered that to deliver presents Santa uses a large team of elves, and that at each house they usually consume treats. The treats might be Christmas pudding, or sometimes mulled wine. He also discovered that they consume different quantities. Sometimes nothing is consumed, but other times there might be 1, 2, 3 or even 4 pieces of pudding or glasses of mulled wine. The elves transmitted a log file of 400 of the previous year’s deliveries. It was called santas_log.sav and you can see it in Figure 1. It contains the following variables:

  • id: Name of the elf doing the delivery
  • quantity: How many treats the elf ate before attempting the delivery
  • treat: which kind of treat was consumed (Christmas pudding or mulled wine)
  • delivered: were the presents delivered (Delivered or not delivered)
Figure 1: santas_log.sav

The model

The model we’re fitting is described by the following equation:

\[ P(\text{delivery}_i) = \frac{1}{1 + e^{-(b_0 + b_1\text{treat}_i + b_2\text{quantity}_i + b_3(\text{treat}\times\text{quantity})_i)}} \]

In other words we can predict the probability of presents being delivered from the type of treat consumed, the quantity consumed, and the interaction of those two variables. We can re-arrange this equation to express it differently. It’s exactly the same equation, just re-arranged:

\[ \ln\bigg(\frac{P(\text{delivery}_i)}{1-P(\text{delivery}_i)}\bigg) = b_0 + b_1\text{treat}_i + b_2\text{quantity}_i + b_3(\text{treat}\times\text{quantity})_i \]

In other words we predict the log odds of delivery from the standard equation for a linear model.
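To see why these really are the same equation, write \(P_i\) as shorthand for \(P(\text{delivery}_i)\) and \(z_i\) for the linear model on the right-hand side, then rearrange step by step:

\[ \begin{aligned} P_i &= \frac{1}{1 + e^{-z_i}} \\ 1 + e^{-z_i} &= \frac{1}{P_i} \\ e^{-z_i} &= \frac{1 - P_i}{P_i} \\ e^{z_i} &= \frac{P_i}{1 - P_i} \\ z_i &= \ln\bigg(\frac{P_i}{1 - P_i}\bigg) \end{aligned} \]

The quantity \(P_i/(1 - P_i)\) is the odds of delivery, which is why the rearranged model predicts the log odds.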

Odds and odds ratios

Odds

Table 1 shows the number of successful and unsuccessful deliveries split by whether pudding or mulled wine was consumed (we’re ignoring quantity to keep things simple). The odds of an event (e.g., delivery) are the ratio of the number of times the event occurs to the number of times it does not occur:

\[ \begin{aligned} \text{odds} &= \frac{\text{number of times event occurs}}{\text{number of times event does not occur}} \\ \text{odds}_\text{delivery} &= \frac{\text{number of times presents are delivered}}{\text{number of times presents are not delivered}} \end{aligned} \]

Table 1: Classification table for delivery and type of treat

              Not delivered   Delivered   Sum
Pudding                  28         150   178
Mulled wine             122         100   222
Sum                     150         250   400
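
For example, plugging the frequencies from Table 1 into these equations gives the odds of delivery for each type of treat:

\[ \begin{aligned} \text{odds}_\text{delivery after pudding} &= \frac{150}{28} = 5.36 \\ \text{odds}_\text{delivery after wine} &= \frac{100}{122} = 0.82 \end{aligned} \]

So after pudding, presents were 5.36 times more likely to be delivered than not delivered, whereas after mulled wine the odds of delivery were slightly worse than even.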

The odds ratio

The odds ratio expresses the change in odds. In this case, we could compute the change in the odds of delivery as we change from pudding as a treat to mulled wine. To do this we compute the odds of delivery in the wine condition and divide by the odds of delivery in the pudding condition. The resulting odds ratio would be:

\[ \text{odds ratio} = \frac{\text{odds}_\text{delivery after wine}}{\text{odds}_\text{delivery after pudding}} \]
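
Plugging in the odds that we just calculated:

\[ \text{odds ratio} = \frac{0.82}{5.36} \approx 0.15 \]

In other words, the odds of delivery after wine are about 0.15 times the odds of delivery after pudding: wine is associated with much worse odds of delivery.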

We will interpret these values as we encounter them later on.

A model with one predictor

Fitting the model using IBM SPSS Statistics

To fit the model we use the Analyze > Regression > Binary Logistic … menu. The following video shows how.
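
If you prefer syntax to the menus shown in the video, the dialog choices paste to something along these lines (a sketch of pasted syntax; the exact subcommands depend on the options you tick):

* Predict delivery from treat alone (indicator coding, first category as reference).
LOGISTIC REGRESSION VARIABLES delivered
  /METHOD=ENTER treat
  /CONTRAST (treat)=Indicator(1)
  /PRINT=CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

The Indicator(1) contrast makes the first category (pudding, coded 0) the reference category, which matches the parameter coding discussed below.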

Model summary

Output 1 tells us both how we coded our outcome variable (it reminds us that 0 = not delivered, and 1 = delivered) and how SPSS has coded the categorical predictor (the parameter codings for treat). We chose indicator coding and so the coding is the same as the values in the data editor (0 = pudding, 1 = mulled wine). The parameter codes are important for interpreting the direction of effects.

Output 1

Output 2 shows the \(R^2\) values for the model, which can be interpreted in much the same way as for a linear model with a continuous outcome. In this case, the Nagelkerke value of 0.215 tells us that by including treat as a predictor, the model explains 21.5% of the variation in deliveries. (Strictly speaking, because the outcome consists of two categories we shouldn’t talk about ‘variance’ in the statistical sense, because variance doesn’t make sense for nominal outcome variables. That’s why I used the word ‘variation’.)

Output 2

Model parameters

Output 3 shows the model parameters. We can see that \(b_0\) is 1.678, which means that in the condition coded as zero (i.e., the pudding condition) the log odds of delivery were 1.678. What does this mean? Earlier we calculated the odds of delivery after pudding as 5.36, and this is the value labelled Exp(B) by SPSS. The log odds is simply the natural logarithm of this value, \(\ln(5.36) = 1.678\). Because no-one’s brain can think in terms of logs, we tend to interpret Exp(B) so that we can think in terms of odds. For \(b_0\), then, the odds of delivery in the baseline group were 5.36: in the pudding condition, 5.36 times more presents were delivered than not.

The b for the effect of treat tells us how the odds in the baseline (pudding) group change as the variable treat changes by 1 unit. A change of 1 unit in this case is a change in that variable from 0 to 1; in other words, a change from pudding to mulled wine. The value of -1.877 tells us that as we move from pudding to mulled wine the log odds of delivery decrease by 1.877. In other words, the odds of delivery are going down (they are worse after wine than pudding). Again, we can look at Exp(B) for a clearer interpretation. The value is 0.15, which matches the value of the odds ratio that we calculated earlier. Look back to that section and you’ll see that this value means that the odds of delivery after wine are 0.15 times the odds of delivery after pudding. The odds of delivery are about 6-7 times smaller after wine than pudding.
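
To see where the ‘6-7 times’ comes from, take the reciprocal of the odds ratio, which converts a decrease in odds into the equivalent multiplicative increase:

\[ \text{Exp(B)} = e^{-1.877} = 0.153, \qquad \frac{1}{0.153} \approx 6.5 \]

That is, the odds of delivery after pudding are about 6.5 times the odds of delivery after wine.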

Output 3

Assuming the current sample is one of the 95% for which the confidence interval contains the true value, the population value of the odds ratio for treat lies between 0.094 and 0.248. However, our sample could be one of the 5% that produces a confidence interval that ‘misses’ the population value. The important thing is that the interval doesn’t contain 1 (both limits are less than 1). The value of 1 is important because it is the threshold at which the direction of the effect changes. Think about what the odds ratio represents: values greater than 1 mean that as the predictor variable increases, so do the odds of (in this case) delivery, but values less than 1 mean that as the predictor variable increases, the odds of delivery decrease. If the value is 1 it means that the odds of delivery are identical for pudding and wine; in other words, there is no effect at all of treat.

If the confidence interval contains 1 then the population value might be one that suggests that the type of treat increases the odds of delivery, or decreases it or doesn’t change it. For our confidence interval, the fact that both limits are below 1 suggests that the direction of the relationship that we have observed reflects that of the population (i.e., it’s likely that the odds of delivery after wine really are worse than after pudding). If the upper limit had been above 1 then it would tell us that there is a chance that in the population the odds of delivery are actually higher after wine than pudding, or that the type of treat makes no difference at all.

Now we have the basic understanding of what the parameters mean, let’s fit the full model.

A model with several predictors

Fitting the model using IBM SPSS Statistics

The full model included not only treat but also the effect of quantity and the treat × quantity interaction. This video shows how to add these variables to the model that we have already fitted:
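
If you prefer syntax, the additions paste to something like this sketch (the interaction term is the one built with the >a*b> button in the dialog; exact subcommands will vary with the options you choose):

* Full model: treat, quantity and their interaction; save diagnostics and request the classification plot.
LOGISTIC REGRESSION VARIABLES delivered
  /METHOD=ENTER treat quantity treat*quantity
  /CONTRAST (treat)=Indicator(1)
  /SAVE=COOK SRESID
  /CLASSPLOT
  /PRINT=CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

The /SAVE and /CLASSPLOT lines request the Cook’s distances, studentized residuals and classification plot that we inspect later in this tutorial.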

Model summary

Output 4 shows the \(R^2\) values for the model. The Nagelkerke value has increased from 0.215 to 0.400, which tells us that by including treat, quantity and their interaction, the model explains 40% of the variation in deliveries. When we included only treat the value was 0.215, so by including quantity and the interaction we have increased the \(R^2\) by \(0.400 - 0.215 = 0.185\). In other words, quantity and the interaction term collectively explain an extra 18.5% of the variation.

Output 4

Model parameters

Output 5 shows the model parameters. We can see that \(b_0\) is 1.83, which means that when all predictors are zero (i.e., the pudding condition with zero treats consumed, which also makes the interaction term zero) the log odds of delivery were 1.83. As before, we can look to the odds ratio, Exp(B), for interpretation: it tells us that the odds of delivery were 6.23.
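
If you find odds hard to think about, you can always convert them to a probability: for odds of \(o\), the probability is \(o/(1 + o)\). For this baseline condition:

\[ P(\text{delivery}) = \frac{6.23}{1 + 6.23} = 0.86 \]

So in the pudding condition with nothing consumed, presents had about an 86% chance of being delivered.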

The bs for the main effects of treat and quantity are both quite close to zero, indicating very small effects. If we inspect the Exp(B) values, they are both close to 1: for treat it is 1.22, and for quantity it is 0.92. Remember that an odds ratio of 1 represents ‘no effect’, so these effect sizes suggest that, when considered alone, the type of treat and the quantity consumed did not have a large impact on deliveries. This is backed up by the non-significant p-values of 0.701 and 0.629 respectively.

However, the interaction effect is significant (p < 0.001), and the b-value suggests that as the interaction term (treat × quantity) increases by one unit, the log odds of delivery decrease by 1.028 units. The corresponding odds ratio is much less than 1 (0.358) and the confidence interval for it ranges from 0.228 to 0.563. It is important that this interval doesn’t contain 1 because it suggests that (assuming this sample is one of the 95% for which the confidence interval contains the population value) the population value is not 1. In other words, the true effect reflects a decrease in the odds of delivery as the interaction term increases. To unpick what this value of 0.358 means, let’s have a look at what happens when we fit models that predict deliveries from quantity alone, but fit separate models for mulled wine and Christmas pudding.

Output 5

Understanding the interaction term

See whether you can use split file (Data > Split File …) to split the file by treat. This will mean that all subsequent analyses are performed separately for Christmas pudding and mulled wine. Having done that, go back to the logistic regression menu (Analyze > Regression > Binary Logistic …) and fit a model that predicts delivery from quantity alone (i.e., remove the effect of treat and the interaction). If you get stuck, watch this video:
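
In syntax, the steps look roughly like this sketch (SPLIT FILE needs the data sorted by the grouping variable first):

* Analyse pudding and mulled wine deliveries separately.
SORT CASES BY treat.
SPLIT FILE SEPARATE BY treat.
LOGISTIC REGRESSION VARIABLES delivered
  /METHOD=ENTER quantity
  /PRINT=CI(95).
SPLIT FILE OFF.

Remember to switch split file off afterwards, otherwise everything else you run will also be split by treat.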

Output 6 shows the effect of quantity of Christmas pudding consumed on delivery of presents. The b is close to zero (-0.081), the odds ratio is close to 1 (0.922), and p = 0.629, all of which suggest that quantity has no real effect when the treat consumed is pudding. This is shown by the red line in the graph, which plots the percentage of presents delivered after different quantities of treats. The red line maps the trend for pudding and is fairly flat, as you’d expect given that b is close to zero (although the b in the table is not the slope of the red line, it should map onto the general trend).

Output 6

Output 7 shows the effect of quantity of mulled wine consumed on delivery of presents. The b is negative (-1.109), the odds ratio is close to 0 (0.33), and p < 0.001, all of which suggest that quantity has a negative effect on deliveries: as the quantity of mulled wine increases, the log odds of delivery decrease. This is shown by the green line in the graph, which plots the percentage of presents delivered after different quantities of treats. The green line maps the trend for mulled wine and slopes down steeply, as you’d expect given the negative b and the odds ratio well below 1. (Again, the b in the table is not the slope of the green line, but it should map onto the general trend.)

Output 7

The interaction effect, therefore, reflects the fact that the effect of quantity on delivery is significantly different for Christmas pudding and mulled wine. In other words, the values of the bs are significantly different: -0.081 is significantly different from -1.109. As a rough approximation it means that in the graph, the red and green lines have different slopes.

Interestingly, if we calculate the difference between the bs we get \(-1.109 - (-0.081) = -1.028\). And if we exponentiate this value we get the odds ratio for the interaction effect: \(e^{-1.028} = 0.358\) (look back at Output 5). So, the odds ratio for the interaction is the odds ratio for the difference in the effect of one predictor across levels of the other. In this case, it’s the odds ratio for the difference between the effect of the quantity of Christmas pudding on delivery and the effect of the quantity of mulled wine on delivery.

Classification plots

Output 8 displays the classification plot, which is a histogram of the predicted probabilities of presents being delivered. If the model perfectly fitted the data, then this histogram would show all the cases for which the event occurred on the right-hand side (shown as D for delivered), and all the cases for which the event didn’t occur on the left-hand side (shown as N for not delivered). In this example, all the cases where presents were delivered should appear on the right and all those where presents were not delivered should appear on the left.

As a rule of thumb, the more the cases cluster at each end of the graph, the better. Such a plot would show that when the outcome did occur, the predicted probability of it occurring is high (i.e., close to 1), and that when the event didn’t occur, the predicted probability of it occurring is low (i.e., close to 0). This situation represents a model that correctly predicts the observed outcome data. If lots of points cluster in the centre of the plot, then for many cases the model is predicting a probability of around 0.5; in other words, there is close to a 50:50 chance that these cases are predicted correctly by the model. You could predict these cases as accurately as the model by tossing a coin.

In Output 8, ‘delivered’ cases are predicted relatively well by the model (the probabilities are generally not that close to 0.5), but ‘not delivered’ cases are not quite as well predicted: quite a few more Ns appear on the right-hand side than Ds on the left. In general, though, predictions from the model are quite accurate.

Output 8

Casewise diagnostics

Output 9 shows the cases with studentized residuals greater than 2. You apply the same rules as for any linear model. We’re looking for no more than 5% of cases (we had 400 cases, so 5% would be 20) to have absolute values greater than 1.96. Here we have 9, which is well within what we’d expect. The cases with values of 3.329 might warrant further inspection, but again, only 3 cases out of 400 is probably not too alarming. If you check the Cook’s distances that have been saved in the data file, you will also see that these are all well below 1.

Output 9

Robust models

If you choose to bootstrap the model, you’ll see Output 10 (remember that you need to switch off any variables you have asked to save, like standardized residuals). The bootstrap gives us robust confidence intervals for the raw bs. Looking at the effect of the interaction, it is still highly significant, which reassures us about our conclusions from the regular model.

Output 10

Next tutorial

This is the last tutorial on the module. Hooray and well done!

References

Field, Andy P. 2017. Discovering Statistics Using IBM SPSS Statistics: And Sex and Drugs and Rock ’n’ Roll. 5th ed. London: Sage.


  1. This tutorial is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License; basically, you can use it for teaching and non-profit activities but not meddle with it.

Andy Field