CORRELATION AND REGRESSION
Topics: Linear Regression Defined; Regression Equation; The Slope or b; The Y-Intercept or a; What Value of the Y-Variable Should be Predicted When r = 0?; The Regression Line; The Point of Averages; Residuals; Extrapolation, Restricted Range, and Lurking Variables
Tutorials: Obtaining a linear regression analysis in Excel 2007
➊ The stronger the correlation, the more accurately one variable can be predicted from another variable. ➋ By using the linear regression equation, we can predict scores for one variable (the Y-variable) from scores on a second variable (the X-variable). The linear regression equation assumes that the statistical relationship between the two variables follows a straight line, known as the regression line.
➊ The regression equation consists of four parts: the predicted value for the Y-variable, or $y'$; the slope of the regression line, or $b$; the known value of the X-variable, or $x_i$; and the value for the y-intercept, or $a$. $y' = b x_i + a$
$y' = b x_i + a$ ➊ The slope of the regression line, or $b$: has the same sign (+ or -) as the correlation coefficient $r$; is a function of the strength of the correlation and the ratio of the standard deviations of the Y and X variables. $b = r \frac{SD_y}{SD_x}$
$y' = b x_i + a$ ➊ The value for the y-intercept, or $a$: is the point where the regression line crosses the y-axis; is the predicted value of y when the x-variable equals zero. This value may sometimes be a strange value, but remember it's a predicted value.
$a = \bar{Y} - b\bar{X}$ ➊ The y-intercept equals the overall mean for the y-variable ($\bar{Y}$) minus the slope of the regression equation ($b$) times the overall mean for the x-variable ($\bar{X}$).
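The slope and intercept formulas above can be tried on a small data set. The sketch below is only an illustration: the heights are made-up numbers (not the data behind the slides), the helper names `pearson_r` and `predict` are hypothetical, and population standard deviations are assumed for both variables.

```python
from statistics import mean, pstdev

# Hypothetical heights in inches, made up for illustration (not from the slides).
fathers = [65, 67, 68, 70, 72, 74]   # x-variable
sons = [67, 68, 70, 70, 73, 74]      # y-variable

def pearson_r(xs, ys):
    """Correlation coefficient as the mean product of z-scores (population SDs)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))

r = pearson_r(fathers, sons)
b = r * pstdev(sons) / pstdev(fathers)   # slope: b = r * SDy / SDx
a = mean(sons) - b * mean(fathers)       # intercept: a = mean(Y) - b * mean(X)

def predict(x):
    """Predicted value y' for a known x: y' = b * x + a."""
    return b * x + a

print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}")
print(f"predicted son's height for a 69-inch father: {predict(69):.2f}")
```

Because the ratio of standard deviations is the same whether population or sample SDs are used (as long as both use the same kind), the slope does not depend on that choice.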
➊ If the correlation is zero, the value for the slope is zero and the regression line is flat (i.e., horizontal). ➋ If $b = 0$, then the y-intercept formula simplifies to $a = \bar{Y}$, which means the regression equation simplifies to $y' = \bar{Y}$. Why?
➊ If there is no correlation between two variables, the best prediction for either variable is its mean. ➋ On average, the mean is closer (in squared distance) to all values in a distribution than any other score. In other words, if the mean is used to predict each score in a data set, the average squared error in prediction will be smaller than if some other score from the distribution were used.
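A quick numerical check of this claim, as a sketch under stated assumptions: the scores are invented, error is measured as average squared error, and `mean_squared_error` is a hypothetical helper written for this illustration.

```python
from statistics import mean

# Hypothetical scores, made up for illustration.
scores = [2, 4, 4, 6, 7, 8]

def mean_squared_error(guess, values):
    """Average squared distance between one fixed guess and every value."""
    return mean((v - guess) ** 2 for v in values)

m = mean(scores)
error_using_mean = mean_squared_error(m, scores)

# Compare against using each actual score as the single prediction.
for candidate in sorted(set(scores)):
    err = mean_squared_error(candidate, scores)
    print(f"guess {candidate}: average squared error = {err:.3f}")
print(f"guess mean ({m:.3f}): average squared error = {error_using_mean:.3f}")
```

No score in the list should beat the mean on this measure. If error were measured as average absolute distance instead, the median would be the best single guess.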
➊ What values make the regression line? The values predicted by the regression equation create the regression line: $y' = b x_i + a$. These predicted points all fall on the regression line.
➊ Represents a central point inside the points of a scatterplot. The points in a scatterplot can be thought of as regressing to this central point. ➋ Is the best-fitting line and is also known as the least-squares line. Imagine the different angles at which you could plot a straight line through a scatterplot. The line that results in the smallest sum of squared vertical distances from all points is the regression line.
Regression Equation The blue line is the regression line. The points that make this line are the predicted values from the regression equation.
➊ Every linear regression line passes through the point of averages. The point of averages is located at the intersection of the overall mean for the x-variable and the overall mean for the y-variable. ➋ Points predicted closer to the point of averages are, on average, more accurate than points predicted farther away from it.
Regression Equation The black dot represents the point of averages, where the overall means for the x-variable (Father's Height, 69 inches) and the y-variable (Son's Height, 71.5 inches) intersect. This point is always found on a linear regression line.
➊ The regression line can be plotted using Excel; however, you can also plot this line using two points: the point of averages and the y-intercept. ➋ You can also plot the regression line by plugging values of the x-variable into the regression equation and solving for the predicted value of the y-variable. Remember, the regression line is made up of all the predicted values of the y-variable, or $y'$.
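Both statements, that the line passes through the point of averages and that it can be drawn from two points, can be checked numerically. This is a minimal sketch on made-up heights; `fit_line` is a hypothetical helper that uses the sum-of-products form of the slope, which is algebraically the same as $b = r \frac{SD_y}{SD_x}$.

```python
from statistics import mean

# Hypothetical heights in inches, made up for illustration.
fathers = [65, 67, 68, 70, 72, 74]
sons = [67, 68, 70, 70, 73, 74]

def fit_line(xs, ys):
    """Least-squares slope and intercept (same b as r * SDy / SDx)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

b, a = fit_line(fathers, sons)
x_bar, y_bar = mean(fathers), mean(sons)

# The prediction at the mean of x should equal the mean of y.
prediction_at_x_bar = b * x_bar + a

# Slope of the straight line drawn through (0, a) and (x_bar, y_bar).
slope_from_two_points = (y_bar - a) / (x_bar - 0)

print(f"prediction at mean x: {prediction_at_x_bar:.4f}, mean y: {y_bar:.4f}")
print(f"slope from two points: {slope_from_two_points:.4f}, b: {b:.4f}")
```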
➊ The term residuals refers to the amount of error in prediction. In other words, the regression equation produces a predicted value for the y-variable. The difference between the predicted value of Y and the real value of Y is known as error or the residual. Excel can calculate the residuals for each predicted score; however, if we were to obtain the residuals by hand, the formula used is: Formula for Residuals: $y - y'$
Regression Equation The vertical distance between each real point and the regression line is a residual, or error in prediction. The sum of the residuals is always equal to zero.
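To see the residuals and their zero sum concretely, here is a short sketch. The heights are invented for illustration, and the residual is taken as the real value minus the predicted value.

```python
from statistics import mean

# Hypothetical heights in inches, made up for illustration.
fathers = [65, 67, 68, 70, 72, 74]
sons = [67, 68, 70, 70, 73, 74]

# Least-squares slope and intercept.
mx, my = mean(fathers), mean(sons)
b = sum((x - mx) * (y - my) for x, y in zip(fathers, sons)) / sum((x - mx) ** 2 for x in fathers)
a = my - b * mx

predicted = [b * x + a for x in fathers]
residuals = [y - y_hat for y, y_hat in zip(sons, predicted)]   # real minus predicted

for x, y, y_hat, e in zip(fathers, sons, predicted, residuals):
    print(f"x = {x}, y = {y}, y' = {y_hat:.2f}, residual = {e:+.2f}")
print(f"sum of residuals = {sum(residuals):.10f}")
```

The sum comes out as zero up to floating-point rounding, because the least-squares line with an intercept balances positive and negative errors.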
➊ Residuals can help identify outliers. When a residual is very large, it may indicate an outlier. Outliers can have the effect of increasing or decreasing the slope of the regression line. This means that outliers can also increase or decrease the correlation between two variables. Depending on the size of the outlier, a researcher may want to run the regression analysis with and without the outlier to see how much the score affects the results.
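Running the analysis with and without a suspected outlier might look like the sketch below. The data and the added outlier (a 75-inch father with a 62-inch son) are invented, and `slope_and_r` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean

def slope_and_r(xs, ys):
    """Return the least-squares slope b and the correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical heights in inches, made up for illustration.
fathers = [65, 67, 68, 70, 72, 74]
sons = [67, 68, 70, 70, 73, 74]

without_outlier = slope_and_r(fathers, sons)
# One invented outlier: a tall father with an unusually short son.
with_outlier = slope_and_r(fathers + [75], sons + [62])

print(f"without outlier: b = {without_outlier[0]:.3f}, r = {without_outlier[1]:.3f}")
print(f"with outlier:    b = {with_outlier[0]:.3f}, r = {with_outlier[1]:.3f}")
```

In this made-up case a single point pulls both the slope and the correlation down sharply, which is the kind of change the with-and-without comparison is meant to reveal.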
➊ The regression equation attempts to predict the mean of the y-variable at each value of the x-variable. WHY? Suppose you have three fathers who are each 74 inches tall (or 6'2"). Each of these fathers has a son who is a different height. The value of the x-variable entered into the regression equation will be the same for each of these three fathers. What value for the sons' heights should the equation try to predict?
Regression Equation What height should be predicted for the three sons who each have a father who is 74 inches tall? The regression equation will try to predict the average height of the sons (y-variable) at each height of the fathers (x-variable).
➊ What is meant by extrapolation? Predicting values beyond the range of the data used to develop the regression equation. ➋ What is meant by limited range? When the regression equation is based on a very narrow range of data compared to the true range of the data in the population. ➌ What is meant by lurking variables? Other variables that can account for the correlation between two variables.
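One way to guard against accidental extrapolation is to refuse predictions for x-values outside the range used to fit the line. The sketch below is only one possible approach: the data are invented, and `predict_within_range` is a hypothetical helper, not a feature of Excel or of any library.

```python
from statistics import mean

# Hypothetical heights in inches, made up for illustration.
fathers = [65, 67, 68, 70, 72, 74]
sons = [67, 68, 70, 70, 73, 74]

# Least-squares slope and intercept.
mx, my = mean(fathers), mean(sons)
b = sum((x - mx) * (y - my) for x, y in zip(fathers, sons)) / sum((x - mx) ** 2 for x in fathers)
a = my - b * mx

def predict_within_range(x, x_min=min(fathers), x_max=max(fathers)):
    """Return y' = b * x + a, or raise if x lies outside the observed x-range."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "a prediction here would be an extrapolation"
        )
    return b * x + a

print(f"{predict_within_range(70):.2f}")
try:
    predict_within_range(50)
except ValueError as err:
    print("refused:", err)
```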
➊ The correlation coefficient can be obtained by hand using the following formula: $r = b \frac{SD_x}{SD_y}$
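As a check that this formula agrees with the slope formula, the sketch below recovers r from b and compares it with r computed directly. The data are made up, and population standard deviations are assumed for both variables.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical heights in inches, made up for illustration.
fathers = [65, 67, 68, 70, 72, 74]
sons = [67, 68, 70, 70, 73, 74]

mx, my = mean(fathers), mean(sons)
sxy = sum((x - mx) * (y - my) for x, y in zip(fathers, sons))
sxx = sum((x - mx) ** 2 for x in fathers)
syy = sum((y - my) ** 2 for y in sons)

b = sxy / sxx                                       # least-squares slope
r_direct = sxy / sqrt(sxx * syy)                    # r computed directly
r_from_slope = b * pstdev(fathers) / pstdev(sons)   # r = b * SDx / SDy

print(f"r computed directly: {r_direct:.4f}")
print(f"r from the slope:    {r_from_slope:.4f}")
```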
End of Chapter 3 Part 2