Name ______ Date ______ Per ______

Key Vocabulary: response variable, explanatory variable, independent variable, dependent variable, scatterplot, positive association, negative association, linear correlation, r-value, regression line, mathematical model, least-squares regression line, ŷ (y-hat), SSM, SSE, r², coefficient of determination, residuals, residual plot, influential observation

Calculator Skills: seq(x,x,min,max,scl); x̄, s_x, ȳ, s_y; 2-Var Stats; sum; Clear All Lists; residual plot; Diagnostic On

Use a separate sheet of paper to answer the questions if more space is needed.

3.1 Scatterplots
1. (p.118) Sir Francis Galton (______ - ______), an English statistician related to American, invented the words ______ and ______.
2. What is the difference between a response variable and an explanatory variable?
3. How are response and explanatory variables related to dependent and independent variables?
4. Is it proper to use the terms response variable and explanatory variable if the explanatory variable does not actually cause the response variable?
5. What is the order of tasks involved in examining relationships between two variables? (p.122)
6. A scatterplot shows the relationship between two variables measured on the individuals.
7. True or false: In a scatterplot, each point represents one individual; the x-coordinate of the point represents the value of one variable and the y-coordinate represents the value of another variable measured on that same individual.
8. Suppose that someone has math scores for the children in one classroom, and English scores for a second set of children in another classroom. The person asks you about making a scatterplot for these data. What would you say?
9. Which variable always appears on the horizontal axis of a scatterplot?
10. When describing a scatterplot, to what three aspects of the pattern should you refer?
11. True or false: In describing the form of a scatterplot, it is important to say whether the graph appears to be linear or not.
12. In describing the form of a scatterplot, what term do you use if the values tend to fall into two or more groups that are separated from one another by gaps?
13. In describing the direction of a scatterplot, when there is a positive or negative slope, we say that the variables are positively or negatively ______.
14. True or false: In describing the strength of a scatterplot, we look at the amount of scatter in the data points, that is, how close the points lie to a simple form such as a line.
15. Explain the difference between a positive association and a negative association (using the definition on p. 135).
16. When you are drawing a scatterplot, what symbols should you use on the axes if the origin of the graph is not at zero?
17. What are three other tips for drawing scatterplots properly?
18. Suppose that you want your scatterplot to reflect the influence of a particular categorical variable, in addition to the relationship of the two quantitative variables that are plotted. For example, suppose you want to graph the relation between entertainment violence and real-life violence for males and females on the same graph, in a way that displays the relationship separately for males and females. What should you do?
19. A common problem in constructing a scatterplot occurs when two or more individuals have exactly the same values for each of the two variables. What should you do in that case?

3.2 Correlation
1. Which is a better method for judging the strength of a linear relationship: simply to look at the graph, or to use a calculated numerical statistic that summarizes the strength of the linear relationship? Explain why.
2. What does correlation measure?
3. We've used the Greek letter μ to represent a population mean and x-bar to represent a sample mean; the Greek letter σ to represent the population standard deviation and s to represent the sample standard deviation. What letter does our book use to designate what is called the correlation?
4. Given that the letter above, used for the correlation coefficient, is in our own alphabet and not the Greek alphabet, do you think it refers to a sample statistic or a population parameter?
5. Would you guess that there is some other Greek letter that refers to the population value of the correlation coefficient?
6. When you look at the formula for the sample correlation coefficient that your text gives, you see $\dfrac{x_i - \bar{x}}{s_x}$ and $\dfrac{y_i - \bar{y}}{s_y}$. Can you give a simpler name to these expressions?
7. What is the meaning of a positive and negative sign associated with the correlation coefficient?
8. True or false: Correlation makes a distinction between explanatory and response variables.
9. Suppose one person calculates the correlation of IQ score of some individuals with number of boxing matches fought, testing the hypothesis that boxing (the explanatory variable) affects IQ (the response variable). A second person, using the same data set, also calculates the correlation of the number of fights with IQ score, only this person thinks of IQ as the explanatory variable and number of fights as the response variable. Do they get the same correlation, or different ones?
10. Explain why two variables must both be quantitative in order to find the correlation between them.
11. Suppose someone codes race as follows: 0 = Caucasian, 1 = African American, 2 = Asian, 3 = Hispanic, 4 = American Indian, 5 = Other. Then someone calculates a correlation between race and a reading test score for a sample of kids. Do you have a problem with this? If so, what's your problem?
12. True or false: A correlation coefficient has units.
13. Melinda computes a correlation between the height of mothers and their daughters. Larry is looking at the computations and says, "You blew it! You have the height of mothers measured in centimeters, and the height of the daughters measured in inches!" Does Melinda need to do anything to fix her correlation coefficient, and if so, what?
14. What range of values is possible for the correlation coefficient?
15. What is true about the relationship between two variables if the r-value is:
    a. Near 0?
    b. Near 1?
    c. Near -1?
    d. Exactly 1?
    e. Exactly -1?
16. What sort of correlation coefficient do you find when two variables have a very strong linear relationship, and when the first gets greater, the second gets smaller?
17. Suppose the data points are two variables collected for all the days of 2006. For each of those days, imagine that we know (variable 1) the number of words Mrs. O. spoke in that day, and (variable 2) the peak barometric pressure for that day in Caracas, Venezuela. About what would you guess the correlation between these two variables to be? Why?
18. True or false: Correlation measures the strength of relationships other than just linear.
19. Suppose there are two variables which, when graphed in a scatterplot, form an almost perfect u-shaped parabola. Would the strong relationship between these variables imply a high correlation coefficient (meaning close to 1 or -1)? Why or why not?
20. Does the correlation coefficient resemble the median and IQR in being fairly resistant to outliers, or resemble the mean and standard deviation in being heavily influenced by outliers (i.e. non-resistant)?
21. Someone practices guessing correlation coefficients from scatterplots using an applet on the internet. Why should the person not get too confident of his or her guessing power given scatterplots of real-life data? (Read p. 144 and look at Figure 3.8 on p. 141.)
22. In attempting to give a more complete description of a set of data involving two variables, someone wants to give a measure of center and spread as well as the correlation coefficient. Assuming the person has made a good decision to use the correlation coefficient, what measure of center and spread would be most consistent with the correlation coefficient: the mean and standard deviation or the median and IQR?
23. The women in a corporation think that they are being discriminated against in their salaries. A management spokesman says to them, "Look at this plot. The first data point is the average salary for men who have worked here one year, put into an ordered pair with the salary for women who have worked here one year. The second ordered pair is the average salary for men and women with two years' experience, and so forth. The correlation between men's salaries and women's salaries is .95! That's almost a perfect correlation! You women have nothing to complain about!" Is this argument valid? Why or why not?

3.3 Least-Squares Regression
1. Finish this statement: A regression line is a straight line that ______
2. The least-squares regression line (abbreviated: ______) is one way to try to fit a ______ to two-variable data that shows a linear trend.
3. Because we use a regression line to ______ y-values from given x-values, we want a regression line that makes the distances of the points in a scatterplot to the regression line as ______ as possible.
4. Why is this regression model called a least-squares regression line?
5. In other words, this is a line that minimizes the total ______ in the squares.
6. The equation for a LSRL is ŷ = ______
7. True or false: This is the same equation (using the same letters) we use for lines in algebra.
8. Why do we use ŷ instead of y?
9. The slope of a LSRL is b = ______.
10. The intercept of a LSRL can be found by a = ______.
11. Under STAT-CALC in your graphing calculator, find the correct LinReg command for lines in statistics. It is NOT 4: LinReg(ax + b) but ______: ______
12. When copying down a LSRL from the calculator, don't forget to write ______ instead of just y =. (Mrs. O. forgets this a lot; please gently remind her!)
13. Interpreting the slope is important: think of it as a rate of change. That is, the amount of change in ______ when ______ increases by one unit.
14. The intercept of the regression line is the value of ŷ when x = ______.
15. Once you have a LSRL, how do you find a predicted value of y for a given x-value?
16. Suppose that someone measures height as a function of weight for a bunch of human adults, and gets a regression equation predicting height as a function of weight. Why is the y-intercept of the equation not as meaningful or important as the slope, or as the equation as a whole?
17. Look at both computer outputs on p.156. It is very important that you can find the slope and the y-intercept of a regression line from these. Use p. 155 to help you identify them. (Ignore all the other statistics for now.)
18. Suppose you have a regression equation output from a computer and you are asked to plot the line by hand. How would you do it?
19. Computer outputs for r² say ______.
20. While r is called the correlation coefficient, r² is called the ______ of ______.
21. Finish this statement: r² is the fraction of ______
22. The r² value shows how much of the variation in one variable can be accounted for by the linear relationship with the other variable. If r² = 0.95, what can be concluded about the relationship between x and y?
23. True or false: In a regression line, like a correlation coefficient, you get the same numbers (slopes and intercepts) no matter which variable is considered the explanatory variable and which is considered the response variable.
24. True or false: If two variables are perfectly correlated, then the slope of the LSRL and the correlation coefficient r are the same.
25. Every LSRL passes through the point (______, ______)
26. When you see a correlation r, square it to get a better feel for the strength of the association. Read the paragraph on p. 165. In the r² scale, a correlation of .7 is about halfway between 0 and 1 because r² would equal ______.
27. Define residual and give the formula written in words and symbols.
28. The mean of the least-squares residuals is always ______. (It might be approximate due to ______.)
29. If a LSRL fits the data well, what do you see on the residual plot?
30. True or false: A curved pattern on a residual plot means the data is not linear. (Recall: a curved pattern on a normal probability plot shows that the data is not very ______.)
31. An outlier in a scatterplot is any observation that lies outside the overall ______ of the other observations in any direction. It will have a large residual if it is an outlier in the ______ direction.
32. An influential point is an observation that has a ______ effect on the calculations of least-squares regression. These are generally outliers in the ______ direction and may not have large residuals.
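
Optional check for sections 3.2 and 3.3: the short Python sketch below is one way to verify hand or calculator work. It is not part of the textbook. It assumes the standard definitions (correlation as the average product of standardized values, slope b = r·s_y/s_x, intercept a = ȳ − b·x̄), which you should confirm against your text. The data set and all function names are made up for illustration.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def lsrl(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x (assumed standard formulas)."""
    b = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Made-up illustrative data, not taken from the textbook.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(xs, ys)
a, b = lsrl(xs, ys)

# Residual = observed y minus predicted y-hat.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print("r =", round(r, 4), " r^2 =", round(r * r, 4))
print("y-hat =", round(a, 4), "+", round(b, 4), "* x")
print("mean residual =", round(mean(residuals), 10))
print("y-hat at x-bar =", round(a + b * mean(xs), 4), " y-bar =", mean(ys))
```

Try swapping `xs` and `ys`, or multiplying one list by 2.54 to change its units, and compare what happens to `r` with what happens to the slope and intercept.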