CHAPTER SIXTEEN
Regression

NOTE TO INSTRUCTORS

This chapter includes a number of complex concepts that may seem intimidating to students. Encourage students to focus on the big picture through some of the discussion questions and classroom activities. You can ease students' concerns about multiple regression by describing it as similar to simple linear regression, except that researchers examine multiple independent variables rather than only one.

OUTLINE OF RESOURCES

I. Simple Linear Regression
   Discussion Question 16-1
   Discussion Question 16-2
   Classroom Activity 16-1: Make It Your Own
   Discussion Question 16-3
   Discussion Question 16-4
   Discussion Question 16-5
   Classroom Activity 16-2: Finding the Regression Line
II. Interpretation and Prediction
   Classroom Activity 16-3: Make It Your Own
   Discussion Question 16-6
   Discussion Question 16-7
III. Multiple Regression
   Discussion Question 16-8
   Discussion Question 16-9
IV. Next Steps: Structural Equation Modeling (SEM)
   Classroom Activity 16-4: Careers in Prediction
   Classroom Activity 16-5: SEM in Context
   Additional Readings
   Online Resources

V. Handouts
   Handout 16-1: Finding the Regression Line
   Handout 16-2: Careers in Prediction
   Handout 16-3: Examining SEM in Context

CHAPTER GUIDE

I. Simple Linear Regression

1. Simple linear regression is a statistical tool that enables us to predict an individual's score on the dependent variable from his or her score on one independent variable.
2. Regression allows us to make quantitative predictions that more precisely explain relations among variables.

> Discussion Question 16-1
What is simple linear regression, and why is it useful?
Simple linear regression is a tool that allows us to make predictions. It is useful as an extension of correlation that allows us to quantify the relationship between variables with greater precision and accuracy.

3. Because simple linear regression finds the equation for a line, the data must be linearly related for us to use it.
4. We can use z scores when making these predictions. Specifically, the formula is ẑ_Y = (r_XY)(z_X). The first z score is for the dependent variable and the second is for the independent variable. The ^ (hat) symbol signals that the z score is predicted rather than observed.
5. The tendency for scores that are particularly high or low to drift toward the mean over time is known as regression to the mean.
6. Usually, we want to predict a raw score from a raw score. We first convert the raw score on one variable to a z score, then predict a z score for the second variable, and finally convert that predicted z score to a raw score.

> Discussion Question 16-2
How would we predict a raw score from a raw score?
To predict a raw score from a raw score, we must first transform the raw score into a z score. Then we multiply that z score by the correlation coefficient to get the predicted z score for the second variable. Finally, we convert that predicted z score back into a raw score, which is our prediction.
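The three-step procedure above can be sketched in a few lines of Python. This is a minimal illustration only; the means, standard deviations, and correlation below are hypothetical values, not data from the textbook.

```python
# Predict a raw Y score from a raw X score via z scores,
# following the chapter's formula z-hat_Y = (r_XY)(z_X).

def predict_raw(x, mean_x, sd_x, r_xy, mean_y, sd_y):
    z_x = (x - mean_x) / sd_x       # step 1: convert raw X to a z score
    z_y_hat = r_xy * z_x            # step 2: predicted z score for Y
    return z_y_hat * sd_y + mean_y  # step 3: convert the predicted z back to a raw Y

# Hypothetical example: X has M = 50, SD = 10; Y has M = 100, SD = 15; r = .60
print(predict_raw(60, 50, 10, 0.60, 100, 15))  # 109.0
```

Note that the prediction is less extreme in z-score terms than the original score (X sits at z = 1.0, but the predicted Y sits at ẑ = 0.60): this is regression to the mean at work.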

Classroom Activity 16-1
Make It Your Own
Use your students' weight and height as measures for this exercise, or use height and age if you think using weight would be a sensitive issue. Have your students anonymously submit their weight and height or height and age. Load these data into SPSS and run the analysis as a correlation and a simple regression.

7. The intercept is the predicted value for Y when X is equal to 0, which is the point at which the line crosses, or intercepts, the y-axis.
8. The slope is the amount that Y is predicted to increase for an increase of 1 in X.

> Discussion Question 16-3
What is the difference between the intercept and the slope? Why do we calculate them in simple linear regression?
The intercept is the predicted value for Y when X is equal to 0, whereas the slope is the amount that Y is predicted to increase for an increase of 1 unit in X. We calculate them in simple linear regression because together they give us a raw-score regression equation for predicting the raw score for Y.

9. Both the intercept and the slope are needed to write the equation of the line: Ŷ = a + b(X).
10. To calculate the intercept, we calculate the z score for X when X = 0, using the formula z_X = (X − M_X)/SD_X. We then use the z-score regression equation, ẑ_Y = (r_XY)(z_X), to calculate the predicted z score for Y. Finally, we convert this z score to the predicted raw score for Y, using the formula Ŷ = ẑ_Y(SD_Y) + M_Y.
11. To calculate the slope, we repeat the steps used to calculate the intercept, but with an X of 1 rather than 0. We then determine the change in Ŷ as X increases from 0 to 1. It is important to include the appropriate sign based on whether Ŷ increases or decreases.

> Discussion Question 16-4
How do we calculate the slope of the regression line? How is it different from calculating the intercept?
We calculate the slope of the regression line by first calculating a z score for X when X = 1, using the formula z_X = (X − M_X)/SD_X. We then use the z-score regression equation, ẑ_Y = (r_XY)(z_X), to calculate the predicted z score for Y, and convert this z score to the predicted raw score for Y, using the formula Ŷ = ẑ_Y(SD_Y) + M_Y.
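The intercept-and-slope procedure in items 10 and 11 can be verified with a short sketch. The descriptive statistics are hypothetical, and the helper function name is my own, not the textbook's.

```python
def predict_y(x, mean_x, sd_x, r_xy, mean_y, sd_y):
    """z-score route to a predicted raw Y: raw X -> z_X -> z-hat_Y -> raw Y-hat."""
    z_x = (x - mean_x) / sd_x
    return (r_xy * z_x) * sd_y + mean_y

# Hypothetical statistics: X has M = 50, SD = 10; Y has M = 100, SD = 15; r = .60
stats = dict(mean_x=50, sd_x=10, r_xy=0.60, mean_y=100, sd_y=15)

a = predict_y(0, **stats)      # intercept: predicted Y when X = 0
b = predict_y(1, **stats) - a  # slope: change in Y-hat as X goes from 0 to 1

print(a, round(b, 4))  # intercept 55.0, slope 0.9
```

As a check on the arithmetic, the slope agrees with the shortcut b = (r)(SD_Y/SD_X) = (0.60)(15/10) = 0.9.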

The calculation differs from the intercept calculation only in that we substitute an X of 1 for an X of 0.

12. With both the intercept and slope calculated, we can use the formula Ŷ = a + b(X) to predict the raw score for Y.
13. If we find at least three predicted values for Y, we can use these values to draw the regression line. This is also known as the line of best fit.
14. A negative slope means that the regression line starts in the upper left of the graph and ends in the lower right. A positive slope means that the regression line starts in the lower left of the graph and ends in the upper right.

> Discussion Question 16-5
How can you tell whether a slope is positive or negative?
You can tell whether a slope is positive or negative by drawing the regression line through the points on a graph corresponding to pairs of X and Ŷ scores. A negative slope means that the line looks like it's going downhill as we move from left to right, while a positive slope means that the line looks like it's going uphill as we move from left to right.

Classroom Activity 16-2
Finding the Regression Line
Have students use the data created in Classroom Activity 16-3, Creating Correlations, from the previous chapter. Have students use those data to determine the regression line. Handout 16-1, found at the end of this chapter, can be used to aid in this process.

15. The standardized regression coefficient (also known as the beta weight), a standardized version of the slope in a regression equation, is the predicted change in the dependent variable, in standard deviations, for a 1-standard-deviation increase in the independent variable.
16. The standardized regression coefficient is symbolized by β, pronounced "beta." It is calculated using the formula β = (b)(√SS_X/√SS_Y).

II. Interpretation and Prediction

1. The number that best describes how far away, on average, the data points are from the line of best fit is called the standard error of the estimate. In other words, it is a statistic indicating the typical distance between a regression line and the actual data points.
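The β formula can be checked concretely with a small hypothetical data set; the sketch below also shows that, in simple linear regression, the beta weight equals the Pearson correlation coefficient.

```python
import math

# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

mx, my = sum(x) / len(x), sum(y) / len(y)
ss_x = sum((xi - mx) ** 2 for xi in x)                   # sum of squares for X
ss_y = sum((yi - my) ** 2 for yi in y)                   # sum of squares for Y
sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of cross-products

b = sp / ss_x                                 # raw-score slope
beta = b * math.sqrt(ss_x) / math.sqrt(ss_y)  # beta = (b)(sqrt(SS_X)/sqrt(SS_Y))
r = sp / math.sqrt(ss_x * ss_y)               # Pearson correlation

print(round(beta, 4), round(r, 4))  # the two values are identical
```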

Classroom Activity 16-3
Make It Your Own
In this activity, use SAT scores and overall GPAs to demonstrate simple regression. Again, collect the data anonymously from the students. Have the students frame the research question for a correlation and for a simple regression. After running the analysis, have the students discuss the results. Your data may well suffer from a restricted range, but that is a good point for class discussion, because real data are messy.

2. The proportionate reduction in error is a statistic that quantifies how much more accurate our predictions are when we use the regression line instead of the mean as a prediction tool.

> Discussion Question 16-6
Why do you think we would use the mean as a basis of comparison with the regression line? Why would we use the mean instead of some other number from our sample?
We use the mean as a basis of comparison with the regression line because, with limited information, the mean is a fair predictor. By using the mean, we can calculate the coefficient of determination and measure how much more accurate our predictions are when we use the regression line rather than the mean. We use the mean instead of some other number from the sample because the mean is involved in calculating the regression equation and, as a result, we can quantify the improvement in prediction that results from using the regression line over the mean.

3. If we were to subtract the mean score of the sample from each person's score, square each value, and sum all of the values, we would obtain the sum of squares total (SS_total). This is the error that results if we predict the mean as the score for each person.
4. We want our regression equation to be a substantial improvement over just using the mean as our prediction.
5. To determine how much better our regression equation predicts than the mean, we plug each X value into the regression equation.
6. To find the sum of squared errors, or SS_error, we subtract each predicted score from the actual score, square these errors, and sum them.
7. To find the amount of error we've reduced, we subtract the sum of squared errors from the sum of squares total. This number is divided by the sum of squares total to obtain a proportion.
8. The proportionate reduction in error is symbolized as r² and is calculated using the formula r² = (SS_total − SS_error)/SS_total.
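The steps above can be verified numerically. This is a minimal sketch with hypothetical scores; for brevity it computes the slope and intercept with the algebraic shortcuts b = SP/SS_X and a = M_Y − bM_X rather than the chapter's z-score route, which gives the same line.

```python
# Proportionate reduction in error: r^2 = (SS_total - SS_error) / SS_total.
# Hypothetical paired scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

mx, my = sum(x) / len(x), sum(y) / len(y)
ss_x = sum((xi - mx) ** 2 for xi in x)
sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
b = sp / ss_x                     # slope
a = my - b * mx                   # intercept
y_hat = [a + b * xi for xi in x]  # predictions from the regression line

ss_total = sum((yi - my) ** 2 for yi in y)                  # error when predicting the mean
ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error when predicting with the line
r2 = (ss_total - ss_error) / ss_total

print(round(r2, 4))  # about 0.7273: roughly 73% less error than predicting the mean
```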

> Discussion Question 16-7
What is the difference between SS_total and SS_error? What is the purpose of calculating them?
SS_total represents the error in prediction when we use the mean, whereas SS_error represents the error in prediction when we use the regression equation. The purpose of calculating them is to quantify how much error we've reduced by using the regression equation instead of the mean.

9. We could also calculate r² by squaring the correlation coefficient.

III. Multiple Regression

1. An orthogonal variable is an independent variable that makes a separate and distinct contribution to the prediction of a dependent variable, as compared with another independent variable.
2. Multiple regression is a statistical technique that includes two or more predictor variables in a prediction equation.
3. Multiple regression is more widely used than simple linear regression because most dependent variables are best explained by more than one independent variable.

> Discussion Question 16-8
Why is multiple regression an improvement over simple linear regression?
Multiple regression is an improvement over simple linear regression because it provides greater predictive power by incorporating two or more predictor variables into the regression equation.

4. Compared to using averages, multiple regression represents a significant advance in our ability to predict human behavior.
5. When calculating the proportionate reduction in error for multiple regression, the symbol is R² rather than r², to indicate that the statistic is based on more than one independent variable.
6. In stepwise multiple regression, computer software determines the order in which independent variables are included in the equation.
7. Stepwise multiple regression is frequently used because it is the default in many software programs and is useful in the absence of a clear predictive theory.
8. Another approach is hierarchical multiple regression, in which the researcher adds independent variables to the equation in an order determined by theory.
9. In order to use hierarchical multiple regression, we need a specific predictive theory to test.
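For instructors who want to show what the software is doing under the hood, a two-predictor regression can be solved by hand from the normal equations. The scores below are hypothetical, and in practice this step is left to software such as SPSS; this sketch solves the 2 × 2 system with Cramer's rule.

```python
# Multiple regression with two predictors: Y-hat = a + b1*X1 + b2*X2.
# Hypothetical scores.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [3, 2, 6, 5, 8]

n = len(y)
m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

# Centered sums of squares and cross-products
s11 = sum((v - m1) ** 2 for v in x1)
s22 = sum((v - m2) ** 2 for v in x2)
s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))

# Solve the two normal equations with Cramer's rule
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det  # slope for predictor 1
b2 = (s2y * s11 - s1y * s12) / det  # slope for predictor 2
a = my - b1 * m1 - b2 * m2          # intercept

print(round(a, 3), round(b1, 3), round(b2, 3))
```

Note that b1 and b2 are partial slopes: each describes the predicted change in Y for a 1-unit change in one predictor while holding the other constant.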

> Discussion Question 16-9
What is the difference between stepwise and hierarchical multiple regression? When would you want to use one technique rather than the other?
In stepwise multiple regression, the computer software determines the order of variable entry, while in hierarchical multiple regression the researcher determines the order of variable entry in light of theory. Stepwise regression can be used in the absence of theory, such as in model building, while hierarchical regression can be used to test a specific theory.

Classroom Activity 16-4
Careers in Prediction
The chapter refers to many opportunities for using prediction within certain careers. In this activity, students will expand on this topic using Handout 16-2. The goal of this activity is for students to observe the relevance and usefulness of regression in their daily experience.

IV. Next Steps: Structural Equation Modeling (SEM)

1. Structural equation modeling (SEM) is one of several statistical techniques, and one of the most sophisticated, that quantify how well sample data fit a theoretical model that hypothesizes a set of relations among multiple variables.
2. When using SEM, statisticians refer to a statistical (or theoretical) model: a hypothesized network of relations, often portrayed graphically, among multiple variables.
3. When creating a model that hypothesizes the relations among the factors being tested, we draw paths that describe the connections between pairs of variables in the statistical model. We can then conduct a path analysis to examine the hypothesized model: a series of regression analyses that quantify the paths at each succeeding step in the model.
4. In SEM, variables that we observe and measure directly are called manifest variables.
5. In contrast, latent variables are the ideas that we want to research but cannot measure directly. We try to observe such variables indirectly, using appropriate measurement tools.
6. When encountering a model such as an SEM diagram, it is important first to figure out what variables the researcher is studying. Next, look at the numbers to see which variables are related, and at the signs of the numbers to see the direction of each relation.
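A toy path analysis can make the idea of quantifying paths concrete. The sketch below estimates the simple chain model X → M → Y as a series of regressions on standardized scores, so each path coefficient reduces to a correlation. The variable names and scores are hypothetical, and the sketch assumes a model with no direct X → Y path; real SEM software does far more, including assessing overall model fit.

```python
import math

def pearson_r(u, v):
    """Pearson correlation; for standardized variables in a simple chain model,
    this is the path coefficient from one variable to the next."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    sp = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    ss_u = sum((a - mu) ** 2 for a in u)
    ss_v = sum((b - mv) ** 2 for b in v)
    return sp / math.sqrt(ss_u * ss_v)

# Hypothetical manifest variables for the chain X -> M -> Y
x = [1, 2, 3, 4, 5]  # predictor
m = [2, 2, 4, 5, 5]  # hypothesized mediator
y = [1, 3, 4, 4, 6]  # outcome

path_xm = pearson_r(x, m)     # path from X to M
path_my = pearson_r(m, y)     # path from M to Y
indirect = path_xm * path_my  # indirect effect of X on Y through M

print(round(path_xm, 3), round(path_my, 3), round(indirect, 3))
```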

Classroom Activity 16-5
SEM in Context
In this activity, students will try to understand how path analysis is used in context. To do this, students will download or be given copies of the article: Kim, Y. M., & Neff, J. A. (2010). Direct and indirect effects of parental influence upon adolescent alcohol use: A structural equation modeling analysis. Journal of Child & Adolescent Substance Abuse, 19(3), 244–260. Students will use Handout 16-3 in their analysis of the article.

Additional Readings
Harrell, F. E. (2001). Regression modeling strategies. New York: Springer. Beyond discussing regression, this book also explores when and how to use this statistic. It is geared toward graduate students and researchers.
Cohen, J., & Cohen, P. (2002). Applied multiple regression/correlation analysis for the behavioral sciences. Mahwah, NJ: Lawrence Erlbaum. This book is data oriented and presents an excellent nonmathematical approach to data analysis. It is aimed at least at a graduate-level course in statistics, but it is also an invaluable reference for those wanting more depth in this area.

Online Resources
The following site provides simulations or demonstrations for almost all topics found in the textbook, as well as additional information about each topic: http://onlinestatbook.com/stat_sim/index.html. The "regression by eye" simulation is a good support for students as they learn to grasp regression visually.
The following is the award-winning Web Experimental Psychology Lab site, home of the Magic experiment: http://www.psychologie.uzh.ch/sowi/ulf/lab/webexppsylab.html. There are a number of fun experiments that your students can explore, including ranking probability terms and learning via tutorial dialogues.

HANDOUT 16-1: FINDING THE REGRESSION LINE
Directions: For this exercise, use the data obtained from Classroom Exercise 15-3, Creating Correlations, in Chapter 15 to answer the questions below.
1. What is the intercept for these data?
2. What is the slope for these data?
3. What is the equation of the regression line for these data?
4. Using the regression line, predict Y using any relevant value for X (except 0 or 1).

HANDOUT 16-2: CAREERS IN PREDICTION
Directions: Answer the questions below to explore the relevance of prediction in everyday experience by examining how prediction is used in certain jobs.
1. Brainstorm a list of careers that could use prediction and write them below. Use the authors' suggestions in the textbook chapter to help you with this, but be sure to develop additional ideas. You may also want to use online or newspaper job listings to help you decide how prediction could be useful in these careers.
2. For each job listed above, discuss why you think prediction could be useful and how prediction would be used in this context.

HANDOUT 16-3: EXAMINING SEM IN CONTEXT
Directions: For this exercise, you will need the article: Kim, Y. M., & Neff, J. A. (2010). Direct and indirect effects of parental influence upon adolescent alcohol use: A structural equation modeling analysis. Journal of Child & Adolescent Substance Abuse, 19(3), 244–260. Read the article, and answer the questions below to help you understand how SEM is used in psychological research.
1. Summarize the research in the space below. What were the researchers' hypotheses? What were their methods? What were their conclusions?
2. Draw the authors' model in the space below and include their findings from SEM.
3. Interpret the findings from SEM. What do their findings actually mean?