Homework Linear Regression

Problems should be worked out in your notebook.

1. Following are the mean heights of Kalama children:

Age (months)  18    19    20    21    22    23    24    25    26    27    28    29
Height (cm)   76.1  77.0  78.1  78.2  78.8  79.7  79.9  81.1  81.2  81.8  82.8  83.5

a) Sketch a scatter plot.
b) Describe the pattern of the scatterplot.
c) What is the correlation coefficient? Interpret in terms of the problem.
d) Calculate and interpret the slope.
e) Calculate and interpret the y-intercept.
f) Write the equation of the regression line. Draw the regression line.
g) Predict the height of a 32-month-old child.
h) Make a residual plot and comment on whether a linear model is appropriate.

2. The average prices (in dollars) per ounce of gold and silver for the years 1986 through 1994 are given below.

Year    1986  1987  1988  1989  1990  1991  1992  1993  1994
Gold    368   478   438   383   385   363   345   361   389
Silver  5.47  7.01  6.53  5.50  4.82  4.04  3.94  4.30  5.30

a. What is the explanatory variable? Explain.
b. Find the regression line for gold predicting silver.
c. Interpret the slope and y-intercept.
d. What is the correlation coefficient? Interpret.
e. Find the regression line for silver predicting gold.
f. Interpret the slope and y-intercept.
g. What is the correlation coefficient? Interpret. Compare your answer to part d.
h. What is the coefficient of determination? Interpret.

3. Good runners take more steps per second as they speed up. Here are the average numbers of steps per second for a group of top female runners at different speeds. The speeds are in feet per second.

Speed (ft/s)      15.86  16.88  17.50  18.62  19.97  21.06  22.11
Steps per second  3.05   3.12   3.17   3.25   3.36   3.46   3.55

a) You want to predict steps per second from running speed. Which is the explanatory variable? Make a scatterplot of the data with this goal in mind.
b) Describe the pattern of the scatterplot.
c) What is the correlation coefficient? Interpret in terms of the problem.
d) Calculate and interpret the slope.
e) Calculate and interpret the y-intercept.
f) Write the equation of the regression line. Draw the regression line.
g) If you need to cover 20 ft/s to win a race, predict the steps per second you'll need to maintain.
h) Make a residual plot and comment on whether a linear model is appropriate.
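
The calculations that Problems 1 through 3 ask for (slope, y-intercept, correlation coefficient, a prediction, and residuals) can be checked with a short script. What follows is only a minimal sketch using the runner data from Problem 3; the helper name `least_squares` is made up for this illustration, and a calculator's LinReg output should agree with it.

```python
from math import sqrt

# Data from Problem 3: running speed (ft/s) and steps per second.
speed = [15.86, 16.88, 17.50, 18.62, 19.97, 21.06, 22.11]
steps = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46, 3.55]


def least_squares(x, y):
    """Return (intercept, slope, r) for the least-squares line of y on x.

    Hypothetical helper written for this worksheet, not a library function.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


intercept, slope, r = least_squares(speed, steps)

# Prediction at 20 ft/s, which lies inside the observed range of speeds.
predicted_at_20 = intercept + slope * 20

# Residual = observed value minus value predicted by the line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(speed, steps)]

print(f"steps = {intercept:.4f} + {slope:.4f} * speed, r = {r:.4f}")
print(f"predicted steps per second at 20 ft/s: {predicted_at_20:.3f}")
for xi, e in zip(speed, residuals):
    print(f"speed {xi:5.2f}  residual {e:+.4f}")
```

Plotting the printed residuals against speed gives the residual plot; the residuals of a least-squares line always sum to zero, so it is their pattern, not their total, that says whether a line is appropriate.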

4. Car dealers across North America use the Red Book to help them determine the value of used cars that their customers trade in when purchasing new cars. The book lists on a monthly basis the amount paid at recent used-car auctions and indicates the values according to condition and optional features, but does not inform the dealers as to how odometer readings affect the trade-in value. In an experiment to determine whether the odometer reading should be included, ten 3-year-old cars of the same make, condition, and options are randomly selected. The trade-in value (in $100) and mileage (in 1000s of miles) are shown below.

Odometer  59  92  61  72  52  67  88  62  95  83
Trade-in  37  31  43  39  41  39  35  40  29  33

a) Describe the pattern of the scatterplot.
b) Find the sample regression line for determining how the odometer reading affects the trade-in value of the car.
c) Interpret the slope in terms of the problem.
d) Calculate and interpret the correlation coefficient.
e) Calculate and interpret the coefficient of determination.
f) Predict the trade-in value of a car with 60,000 miles.
g) What would be the odometer reading of a car with a trade-in value of $4200?
h) Make a residual plot and comment on whether a linear model is appropriate.
i) What is the residual for the car with 92,000 miles on the odometer?

5. In one of the Boston city parks there has been a problem with muggings in the summer months. A police cadet took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.

x  10  15  16  1  4  6  18  12  14  7
y  5   2   1   9  7  8  1   5   3   6

a) Sketch a scatter plot. Describe the pattern of the scatterplot.
b) What is the regression line?
c) What is the correlation coefficient? Interpret in terms of the problem.
d) Interpret the slope in terms of the problem.
e) Find the coefficient of determination and interpret in terms of the problem.
f) Predict the number of muggings if there are 9 police officers on duty.

6. Each of the following statements contains a blunder. Explain in each case what is wrong.

a. There is a high correlation between the gender of American workers and their income.
b. We found a high correlation (r = 1.09) between students' ratings of faculty teaching and ratings made by other faculty members.
c. The correlation between planting rate and yield of corn was found to be r = 0.23 bushel.
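
When thinking about what a correlation coefficient can and cannot be, it may help to see how r behaves numerically. The sketch below reuses the officer and mugging counts from Problem 5 and recomputes r after changing the units of both variables and after swapping x and y; the function name `correlation` is an assumption made for this illustration.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation of two equal-length lists of numbers.

    Hypothetical helper written for this worksheet.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Data from Problem 5: officers on duty (x) and reported muggings (y).
officers = [10, 15, 16, 1, 4, 6, 18, 12, 14, 7]
muggings = [5, 2, 1, 9, 7, 8, 1, 5, 3, 6]

r = correlation(officers, muggings)

# Same data in different units: tens of officers, muggings per weekday.
r_rescaled = correlation([v * 10 for v in officers], [v / 7 for v in muggings])

# Same data with the roles of x and y exchanged.
r_swapped = correlation(muggings, officers)

print(f"r = {r:.4f}, rescaled = {r_rescaled:.4f}, swapped = {r_swapped:.4f}")
```

All three printed values agree, which is worth keeping in mind when reading the statements in Problem 6.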

7. Foal weight at birth is an indicator of health, so it is of interest to breeders of thoroughbred horses. Is foal weight related to the weight of the mare? The accompanying data are from the article "Suckling Behavior Does Not Measure Milk Intake in Horses" (Animal Behavior [1999]).

Observation  Mare weight (kg)  Foal weight (kg)
1            556               129
2            638               119
3            588               132
4            550               123.5
5            580               112
6            642               113.5
7            568               95
8            642               104
9            556               104
10           616               93.5
11           549               108.5
12           504               95
13           515               117.5
14           551               128
15           594               127.5

a) Describe the pattern of the scatterplot.
b) Find the equation of the regression line.
c) Interpret the slope in terms of the problem.
d) Interpret the y-intercept in terms of the problem.
e) Calculate and interpret the correlation coefficient.
f) Calculate and interpret the coefficient of determination.

8. The scatterplot shows the advertised prices (in thousands of dollars) plotted against ages (in years) for a random sample of Plymouth Voyagers on several dealers' lots. A computer printout showing the results of fitting a straight line to the data by the method of least squares gives:

Price = 12.37 - 1.13 Age
R-sq = 75.5%

[Scatter plot: Plymouth Voyagers. Vertical axis: Price_1000 (2 to 14). Horizontal axis: Age_in_years (2 to 10).]

a) Find the correlation coefficient for the relationship between price and age of Voyagers based on these data.
b) What is the slope of the regression line? Interpret it in the context of these data.
c) How will the size of the correlation coefficient change if the 10-year-old Voyager is removed from the data set? Explain.
d) How will the slope of the LSRL change if the 10-year-old Voyager is removed from the data?

9. One measure of the success of knee surgery is postsurgical range of motion for the knee joint. Postsurgical range of motion was recorded for 12 patients who had surgery following a knee dislocation. The age of each patient was also recorded (Reconstruction American Journal of Sports Medicine). The average age was 25.83 years with a standard deviation of 7.578 years.
The average range of motion was 130.1 degrees with a standard deviation of 11.927 degrees. The correlation coefficient was r = 0.5534.

a) If we use age to try to predict the range of motion, what is the slope? What is the y-intercept? Interpret the two in the context of the problem.
b) Use the regression line to predict the range of motion of someone 32 years of age.
c) Use the regression line to predict the range of motion of someone 50 years of age. Do you feel this is an accurate prediction? Explain your thoughts.

10. Newsweek gave the following 1994 average weekly earnings from allowances, chores, work, and gifts for children of ages 4 through 12.

Age       4      5      6      7       8       9       10      11      12
Earnings  $5.87  $7.42  $7.62  $10.63  $10.65  $10.69  $12.01  $13.79  $20.19

a. Construct a scatter plot. Describe the pattern of the scatterplot.
b. Interpret the slope in terms of the problem.
c. Find the coefficient of determination and interpret in terms of the problem.
d. Find the correlation coefficient and interpret in terms of the problem.
e. Predict the weekly earnings of a child who is age 16. Do you think this is a good prediction? Explain.

11. The paper "A Cross-National Relationship between Sugar Consumption and Major Depression?" (Depression and Anxiety [2002]) concluded that there was a strong correlation (r = 0.9444) between refined sugar consumption (calories per person per day) and annual rate of major depression (cases per 100 people) based on data from 6 countries. The average sugar consumption was 340.83 calories per person per day with a standard deviation of 110.56 calories, while the annual rate of depression was 4.26 cases with a standard deviation of 1.338 cases.

a) What is the slope of the regression line of annual rate of depression based on sugar consumption? What is the y-intercept? Interpret the two in the context of the problem.
b) Use the regression line to predict the depression rate of the United States if the average person consumes 300 calories per day.
c) New Zealand's depression rate is 5.7 annual cases per 100 people. Use the model to find the possible sugar consumption. Does the regression line allow us to make this prediction? Explain.

12. How quickly can athletes return to their sport following injuries requiring surgery?
The paper "Arthroscopic Distal Clavicle Resection for Isolated Atraumatic Osteolysis in Weight Lifters" (American Journal of Sports Medicine, 1998) found, among 10 weight lifters, a moderate positive (r = 0.55) linear relationship between a lifter's age and the number of days after arthroscopic shoulder surgery before the lifter was able to return to the sport. The average age of the weight lifters was 30.4 years with a standard deviation of 2.875 years. The average number of days before being able to return to their sport was 3.2 days with a standard deviation of 1.398 days.

a. Determine the line to predict the number of days based on the age of the weight lifter.
b. Determine the coefficient of determination and interpret in terms of the problem.
c. Given that the lifters' ages ranged from 26 to 34 years, predict the number of days for a 28-year-old lifter. Do you feel this prediction is accurate? Explain.
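
Problems 9, 11, and 12 give only summary statistics, so the line has to come from the relationships slope = r(s_y / s_x) and intercept = y-bar - slope(x-bar). The sketch below applies those two formulas to the numbers stated in Problem 12; the function name `line_from_summary` is invented for this illustration.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Least-squares intercept and slope from means, SDs, and r.

    Hypothetical helper: slope = r * s_y / s_x, and the line passes
    through the point (x_bar, y_bar).
    """
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Summary statistics as stated in Problem 12 (x = age, y = days).
intercept, slope = line_from_summary(
    x_bar=30.4, s_x=2.875, y_bar=3.2, s_y=1.398, r=0.55
)

# Coefficient of determination is the square of r.
r_squared = 0.55 ** 2

# Age 28 is inside the stated age range of 26 to 34 years.
predicted_days_at_28 = intercept + slope * 28

print(f"days = {intercept:.3f} + {slope:.4f} * age")
print(f"r squared = {r_squared:.4f}")
print(f"predicted days at age 28: {predicted_days_at_28:.2f}")
```

The same function can be reused for Problems 9 and 11 by substituting their means, standard deviations, and r; whether a prediction is trustworthy still depends on whether the x value lies within the range of the data.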

13. Success in hunting varies greatly among species of animals. Lions, who hunt singly, are rarely successful in more than 10 percent of their hunts. Wild African dogs, who hunt in packs, are among the most efficient of all hunters, succeeding at a rate of over 90 percent of their hunts. In the early 1960s, researcher Jane Goodall discovered that chimpanzees were not solely vegetarian in their diets, as had previously been thought. This discovery spurred a tremendous amount of primate research. Some of the latest primatology research has been done on chimpanzees to find out if larger hunting parties increase the chances of a successful hunt. The results of one such research project are summarized in the table for the number of chimpanzees in the hunting party versus the percentage of successful hunts.

Number of Chimps    1   2   3   4   5   6   7   8   9   10  12  13  14  15  16
Percent of Success  20  30  28  42  40  58  45  62  65  63  75  75  78  75  82

a. Construct a scatter plot.
b. Determine the regression line.
c. Interpret the y-intercept. Does the interpretation make sense in this context?
d. Interpret the slope.
e. Find the correlation coefficient and interpret in terms of the problem.
f. Find the coefficient of determination and interpret in terms of the problem.
g. Sketch the residual plot. Interpret in terms of the problem.

14. The following is a table of the number of registered automatic weapons (in thousands) of selected states and their corresponding murder rates.

Weapons  11.6  8.3   3.6   0.6  6.9   2.5  2.4  2.6
Rates    13.1  10.6  10.1  4.4  11.5  6.6  3.6  5.3

a. Determine the regression line.
b. Predict the number of weapons for a state with a rate of 8.5.
c. Predict the murder rate for a state with 10,000 registered automatic weapons.

15. The following output data from MINITAB shows the height of girls (in cm) based on the number of years old.

Predictor  Coef    Stdev   t-ratio  p
Constant   76.61   1.188   64.52    0.000
Age(yrs)   6.3661  0.1672  38.02    0.000

s = 1.518    R-sq = 99.5%

a) What is the equation of the least squares line? Interpret the slope.
b) Find the correlation coefficient and coefficient of determination. Interpret in the context of the problem.
c) Predict the height of a 3-year-old girl.
d) Predict the age if a girl is 135 cm.
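
Reading a printout like the one in Problem 15 comes down to three steps: the Coef column gives the intercept and slope, R-sq is the square of r, and r takes the sign of the slope. The sketch below walks through those steps with the printed values; the variable and function names are chosen for this illustration and are not MINITAB terms.

```python
from math import copysign, sqrt

# Values read from the MINITAB output in Problem 15.
constant = 76.61     # "Constant" row, Coef column (y-intercept, cm)
slope = 6.3661       # "Age(yrs)" row, Coef column (cm per year)
r_squared = 0.995    # R-sq = 99.5%

# r is the square root of R-sq, carrying the sign of the slope.
r = copysign(sqrt(r_squared), slope)


def predict_height(age_years):
    """Height (cm) predicted by the printed least-squares line."""
    return constant + slope * age_years


def age_for_height(height_cm):
    """Age at which the printed line reaches a given height.

    This solves the height-on-age line for age; it is not the same
    as fitting a separate regression of age on height.
    """
    return (height_cm - constant) / slope


print(f"r = {r:.4f}")
print(f"predicted height at age 3: {predict_height(3):.2f} cm")
print(f"age where the line reaches 135 cm: {age_for_height(135):.2f} years")
```

Because the printout does not show the ages in the sample, the script cannot tell whether age 3 or a height of 135 cm falls inside the observed data; that judgment is left to the reader.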

16. Women made significant gains in the 1970s in terms of their acceptance into professions that had been traditionally populated by men. To measure just how big these gains were, we will compare the percentage of professional degrees awarded to women in 1973-1974 to the percentage awarded in 1978-1979 for selected fields of study.

Field                 Degrees in 73-74 (%)  Degrees in 78-79 (%)
Dentistry             2.0                   11.9
Law                   11.5                  28.5
Medicine              11.2                  23.1
Optometry             4.2                   13.0
Osteopathic medicine  2.8                   15.7
Podiatry              1.1                   7.2
Theology              5.5                   13.1
Veterinary medicine   11.2                  28.9

a) What is the regression line?
b) Interpret the slope in terms of the problem.
c) Find the coefficient of determination and interpret in terms of the problem.
d) Sketch the residual plot. Interpret.
e) Find the residual for optometry.
f) Find the residual for veterinary medicine. Did the regression line over or under predict? Explain.

17. Shells of mollusks function as both part of the skeletal system and as protective armor. It has been argued that many features of these shells were the result of natural selection in the constant battle against predators. The paper "Postmortem Changes in Strength of Gastropod Shells" included a scatter plot of data on x = shell height (cm) and y = breaking strength (newtons). The least squares line for a sample of 38 hermit crab shells was y-hat = -275.1 + 244.9x.

a. What are the slope and intercept of this line?
b. When shell height increases by 1 cm, by how much does breaking strength tend to change?
c. What breaking strength would you predict when shell height is 2 cm?
d. Does this approximate linear relationship appear to hold for shell heights as small as 1 cm? Explain your thoughts.

18. Given the following data sets, find the regression line. Sketch the residual plot and comment on the likelihood of the regression line being a good model.

x  2   3   4    5    6    7    8    9
y  86  96  103  110  115  120  130  131

x  3   6   8   9   11  14  18  20
y  19  22  39  50  75  87  96  125

19. The data come from a study of ice cream consumption that spanned the springs and summers of three years. The ice cream consumption (pints per capita per year), family income of consumers ($1000 per year), and the temperature (degrees Fahrenheit) are listed below.

Consumption Income Temperature
20. 07 19. 45 20. 44 221. 2111. 17. 89 17. 00 14. 98 1399. 1331. 18. 25 1331. 1398. 18. 72 17. 78 18. 25 1918. 1851. 17. 78 1851. 41 56 63 68 69 65 61 47 32 24

a. Complete two scatter plots with consumption being the response variable for each plot.
b. Find the two regression lines.
c. Interpret the slopes.
d. Interpret the coefficients of determination.
e. Sketch and interpret both residual plots.
f. Which do you think is the better predictor of consumption? Explain.
g. Predict the consumption for a temperature of 53 degrees.
h. Predict the consumption for an income of $17,500.
i. Predict the income and temperature for 3 gallons a year.

20. People with diabetes measure their fasting plasma glucose (FPG; measured in units of milligrams per milliliter) after fasting for at least 8 hours. Another measurement, made at regular medical checkups, is called HbA. This is roughly the percent of red blood cells that have a glucose molecule attached. It measures average exposure to glucose over a period of several months. The table below gives data on both HbA and FPG for 18 diabetics five months after they had completed a diabetes education class.

Subject  HbA (%)  FPG (mg/ml)
1        6.1      141
2        6.3      158
3        6.4      112
4        6.8      153
5        7.0      134
6        7.1      95
7        7.5      96
8        7.7      78
9        7.9      148
10       8.7      172
11       9.4      200
12       10.4     271
13       10.6     103
14       10.7     172
15       10.7     359
16       11.2     145
17       13.7     147
18       19.3     255

a) Sketch a scatter plot. Describe the scatterplot. Subject 15 is an outlier in the y direction. Subject 18 is an outlier in the x direction.
b) Find the correlation and the regression line for all 18 subjects.
c) Find the correlation and the regression line when only subject 15 is removed.
d) Find the correlation and the regression line when only subject 18 is removed.
e) Are either or both of these points influential for the correlation? Explain why r changes in opposite directions when we remove each of these points.
f) Is either Subject 15 or Subject 18 strongly influential for the least-squares line?
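
Refitting after dropping one observation, as Problem 20 asks, is tedious by hand. The sketch below shows one way to organize the three fits, using the HbA and FPG values from the table in Problem 20; the helper names `fit` and `fit_without` are assumptions made for this illustration.

```python
from math import sqrt

# Data from Problem 20, subjects 1 through 18 in order.
hba = [6.1, 6.3, 6.4, 6.8, 7.0, 7.1, 7.5, 7.7, 7.9,
       8.7, 9.4, 10.4, 10.6, 10.7, 10.7, 11.2, 13.7, 19.3]
fpg = [141, 158, 112, 153, 134, 95, 96, 78, 148,
       172, 200, 271, 103, 172, 359, 145, 147, 255]


def fit(x, y):
    """Return (intercept, slope, r) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / sqrt(sxx * syy)


def fit_without(subject):
    """Refit after dropping one subject (numbered from 1, as in the table)."""
    keep = [i for i in range(len(hba)) if i != subject - 1]
    return fit([hba[i] for i in keep], [fpg[i] for i in keep])


all_fit = fit(hba, fpg)
without_15 = fit_without(15)
without_18 = fit_without(18)

for label, (a, b, r) in [("all 18 subjects", all_fit),
                         ("without subject 15", without_15),
                         ("without subject 18", without_18)]:
    print(f"{label:20s} FPG = {a:7.2f} + {b:6.2f} * HbA   r = {r:.4f}")
```

Comparing the three printed lines side by side shows how far each removal moves r and the slope, which is the evidence parts e) and f) ask you to interpret.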