This tutorial is one of a series that accompanies Discovering Statistics Using IBM SPSS Statistics (Field 2017) by me, Andy Field. These tutorials contain abridged sections from the book (so there are some copyright considerations).¹
First, let's see whether you understood the lecture and book chapter with a quiz, because everyone likes to start a tutorial with a quiz. Or is that chocolate? "Everyone likes to start a tutorial with chocolate?" does sound plausible, but I get so confused between the two. No, I'm pretty sure it's quizzes that people like, not chocolate, so here goes:
We continue the example from Field (2017) used in previous tutorials, which predicts physical and downloaded album sales (outcome variable) from the amount (in thousands of pounds/dollars/euros/whatever currency you use) spent promoting the album before release (Adverts), airplay of songs from the album in the week before release (Airplay), and how attractive people found the band's image out of 10 (Image). The data are in the file album_sales.sav and, although you'll be familiar with the data set if you have done the previous tutorials, it looks like this:
Figure 1: The data in IBM SPSS Statistics
As revision from previous tutorials, the model we're fitting is:
\[ \text{Sales}_i = b_0 + b_1\text{Advertising}_i + b_2\text{Airplay}_i + b_3\text{Image}_i + \varepsilon_i \] We fit the model hierarchically in two blocks, with advertising budget entered in the first block and the other two predictors entered in a second block. We're going to fit the model as we have done in previous tutorials, but this time look at options to give us diagnostic information about the model. The following video recaps how to fit this model using SPSS Statistics and how to ask for diagnostic information.
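If you prefer syntax to clicking through dialog boxes, the sketch below reproduces roughly what the dialogs paste: the two-block hierarchical model plus the diagnostic options used in this tutorial. Treat it as a sketch rather than gospel: I'm assuming album_sales.sav sits in your working directory (adjust the path if not), and the casewise cut-off of 2 is a choice you can change.

* Sketch: hierarchical model for album sales with diagnostic options.
* Assumes album_sales.sav is in the working directory - edit the path if not.
GET FILE='album_sales.sav'.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA CHANGE
  /DEPENDENT Sales
  /METHOD=ENTER Adverts                          /* Block 1: advertising budget */
  /METHOD=ENTER Airplay Image                    /* Block 2: airplay and image */
  /SCATTERPLOT=(*ZRESID, *ZPRED)                 /* zresid vs. zpred plot */
  /PARTIALPLOT ALL                               /* one partial plot per predictor */
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID)  /* histogram and P-P plot */
  /CASEWISE PLOT(ZRESID) OUTLIERS(2)             /* flag cases with |zresid| > 2 */
  /SAVE ZRESID COOK.                             /* save zresid and Cook's distance */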
We requested several plots of the residuals from the model. Remember from the lecture that we can use these plots to identify potential problems with linearity and with non-spherical errors (i.e., errors that are not independent or not homoscedastic). The figure below (taken from Field (2017)) shows that we're looking for a random scatter of dots. Curvature in the plot indicates a lack of linearity, and a funnel shape (residuals that 'fan out') indicates heteroscedasticity.
Figure 2: Residual plots
Let's look at the plots in our output to see whether we can see any of these patterns. The first plot shows the standardized predicted values from the model (zpred) against the standardized residuals from the model (zresid). The plots included here should match your SPSS output (if they don't, then one of us has fitted the model incorrectly), but I have added an overlay to help you to interpret them.
Figure 3: zpred vs. zresid for the model
Now let's look at the partial plots to see whether we can see any of these patterns. There are three plots: one for each predictor. Again I have added an overlay on each plot to help you to interpret them.
Figure 4: Partial plots for the model
Finally, even though normality is not a major concern (especially with a sample size of 200), we can check whether the residuals are normally distributed using the histogram and P-P plot that we requested.
Figure 5: Normality plots for the model
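If you want an extra check beyond what the regression procedure produces, and you saved the standardized residuals, you can feed them into the explore procedure. This is optional, and it assumes SPSS gave the saved residuals its default name of ZRE_1 (check the variable name in the data editor):

* Optional: histogram and normal probability plot of the saved residuals.
* Assumes the default save name ZRE_1.
EXAMINE VARIABLES=ZRE_1
  /PLOT HISTOGRAM NPPLOT
  /STATISTICS NONE.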
As a bare minimum we should check the model for outliers using the standardized residuals and use Cook's distance to identify influential cases. Field (2017) describes a much wider battery of values that you can use to check for these things, so if you're starting to get the stats bug (?!) then check that out. Otherwise, we can apply the following heuristics (there's a syntax sketch for applying them after the list):

- In an average sample we'd expect about 5% of standardized residuals to have absolute values greater than 1.96, about 1% to have absolute values greater than 2.58, and hardly any to have absolute values greater than about 3; noticeably more than this suggests the model is a poor fit.
- Cases with a Cook's distance greater than 1 may be exerting undue influence on the model (Cook and Weisberg 1982).
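Here's that syntax sketch. It assumes you ticked the options to save the standardized residuals and that SPSS gave them its default name of ZRE_1; the FREQUENCIES tables then tell you what proportion of cases fall beyond each cut-off, which you can compare to the 5% and 1% heuristics above:

* Flag residuals beyond each heuristic cut-off (default save name assumed).
COMPUTE abs_gt_196 = (ABS(ZRE_1) > 1.96).
COMPUTE abs_gt_258 = (ABS(ZRE_1) > 2.58).
COMPUTE abs_gt_3 = (ABS(ZRE_1) > 3).
FREQUENCIES VARIABLES=abs_gt_196 abs_gt_258 abs_gt_3.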
We asked for a summary of standardized residuals when we selected these options:
Figure 9: Requesting casewise diagnostics for the model
The resulting output (which should match that in your viewer window) is:
Output 1: Casewise diagnostic summary for the model
When attempting the following quiz remember that there were 200 cases in total and that the absolute value is the value when you ignore the plus or minus sign. The standardized residuals are labelled Std. Residual in the output:
For Cook's distance, it is a matter of screening the column in the data editor in which you saved these values and noting any cases with values greater than 1 (or letting syntax do the screening for you; see the sketch below Figure 10).
Figure 10: Values of Cook's distance in the data editor
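If you'd rather not eyeball 200 rows of the data editor, this sketch (again assuming SPSS's default save name of COO_1) lists any cases whose Cook's distance exceeds 1; if nothing is listed, no case crossed the threshold:

* List any cases with a Cook's distance greater than 1.
* Assumes the default save name COO_1.
TEMPORARY.
SELECT IF (COO_1 > 1).
LIST.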
The next output contains the bootstrap confidence intervals for each model parameter. These bootstrap confidence intervals² and significance values do not rely on assumptions of normality or homoscedasticity, so they give us an accurate estimate of the population value of b for each predictor (assuming our sample is one of the 95% with confidence intervals that contain the population value).
Output 2: Bootstrapped model parameter estimates
The quiz told you about the confidence interval for image. For the remaining predictors, assuming that each confidence interval is one of the 95% that contain the population parameter, the confidence intervals tell us that:
To sum up, the bootstrapped statistics tell us that advertising, b = 0.09 [0.07, 0.10], p = 0.001, airplay, b = 3.37 [2.77, 3.97], p = 0.001, and the band’s image, b = 11.09 [6.26, 15.28], p = 0.001, all significantly predict album sales. Basically we interpret bootstrap confidence intervals and significance tests in the same way as regular ones but the bootstrapped ones should be robust to violations of the assumptions of the model.
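In case you want to reproduce this without clicking through the Bootstrap dialog box, the sketch below shows the syntax equivalent. It assumes you have the (optional) IBM SPSS Bootstrapping module; the BOOTSTRAP command must immediately precede the procedure it applies to, the BCa intervals and 1000 samples are my choices (percentile intervals are the default), and, as the footnote says, your estimates won't exactly match mine:

* Sketch: bootstrap the model parameters (requires the Bootstrapping module).
* CITYPE and NSAMPLES are my choices; the defaults are PERCENTILE and 1000.
BOOTSTRAP
  /SAMPLING METHOD=SIMPLE
  /VARIABLES TARGET=Sales INPUT=Adverts Airplay Image
  /CRITERIA CILEVEL=95 CITYPE=BCA NSAMPLES=1000
  /MISSING USERMISSING=EXCLUDE.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF CI(95)
  /DEPENDENT Sales
  /METHOD=ENTER Adverts
  /METHOD=ENTER Airplay Image.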
We'll use the same unguided example as in the last tutorial. To recap, the data are in the file socialanxietyregression.sav. This file contains three variables of interest to us:
In the previous tutorial we fitted a hierarchical linear model with two blocks:
Use what you have learned to fit this model, but this time save diagnostic information. Use the output to answer these questions. (If you get stuck setting the model up, there's a syntax sketch below.)
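This sketch necessarily assumes variable names, because I want you to open the file and check for yourself: spai (social anxiety), shame and obq are placeholders, so swap in the actual variable names from socialanxietyregression.sav and check the block order against the previous tutorial:

* Sketch only: spai, shame and obq are placeholder variable names.
GET FILE='socialanxietyregression.sav'.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA CHANGE
  /DEPENDENT spai
  /METHOD=ENTER shame
  /METHOD=ENTER obq
  /SCATTERPLOT=(*ZRESID, *ZPRED)
  /PARTIALPLOT ALL
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID)
  /CASEWISE PLOT(ZRESID) OUTLIERS(2)
  /SAVE ZRESID COOK.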
The plots that you should get are displayed below so you can check your output.
Figure 11: Plots from the social anxiety model (zpred vs. zresid, partial plot for shame, partial plot for OCD, and P-P plot of the standardized residuals)
To answer these questions remember that the sample size was 134.
The nature of bootstrapping means that I can't ask questions about the specific values. Here are some general questions:
The next tutorial will look at using categorical predictors in the linear model.
These are useful resources for understanding some of the concepts in this tutorial. They are not written or hosted by me, so I take no responsibility for whether they work. If they are working, though, you might find them useful.
Cook, R. D., and S. Weisberg. 1982. Residuals and Influence in Regression. New York: Chapman & Hall.
Field, Andy P. 2017. Discovering Statistics Using IBM SPSS Statistics: And Sex and Drugs and Rock 'N' Roll. 5th ed. London: Sage.
Steketee, G., R. Frost, N. Amir, M. Bouvard, C. Carmin, D. A. Clark, J. Cottraux, et al. 2001. "Development and Initial Validation of the Obsessive Beliefs Questionnaire and the Interpretation of Intrusions Inventory." Behaviour Research and Therapy 39 (8): 987–1006.
Tangney, J. P., R. Dearing, P. E. Wagner, and R. Gramzow. 2000. The Test of Self-Conscious Affect-3 (TOSCA-3). Fairfax, VA: George Mason University.
Turner, S. M., D. C. Beidel, and C. V. Dancu. 1996. Social Phobia and Anxiety Inventory: Manual. Toronto: Multi-Health Systems Inc.
¹ This tutorial is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License: basically, you can use it for teaching and non-profit activities but not meddle with it.
² Because of how bootstrapping works, the values in your output will be different from mine, and will be different again if you re-run the analysis.