Homework: Linear Regression
Problems should be worked out in your notebook.

1. Following are the mean heights of Kalama children:

Age (months):  18    19    20    21    22    23    24    25    26    27    28    29
Height (cm):   76.1  77.0  78.1  78.2  78.8  79.7  79.9  81.1  81.2  81.8  82.8  83.5

a) Sketch a scatter plot.
b) Describe the pattern of the scatterplot.
c) What is the correlation coefficient? Interpret it in terms of the problem.
d) Calculate and interpret the slope.
e) Calculate and interpret the y-intercept.
f) Write the equation of the regression line. Draw the regression line.
g) Predict the height of a 32-month-old child.
h) Make a residual plot and comment on whether a linear model is appropriate.

2. The average prices (in dollars) per ounce of gold and silver for the years 1986 through 1994 are given below.

Year:    1986  1987  1988  1989  1990  1991  1992  1993  1994
Gold:    368   478   438   383   385   363   345   361   389
Silver:  5.47  7.01  6.53  5.50  4.82  4.04  3.94  4.30  5.30

a. What is the explanatory variable? Explain.
b. Find the regression line for gold predicting silver.
c. Interpret the slope and y-intercept.
d. What is the correlation coefficient? Interpret.
e. Find the regression line for silver predicting gold.
f. Interpret the slope and y-intercept.
g. What is the correlation coefficient? Interpret. Compare your answer to part d.
h. What is the coefficient of determination? Interpret.

3. Good runners take more steps per second as they speed up. Here are the average numbers of steps per second for a group of top female runners at different speeds. The speeds are in feet per second.

Speed (ft/s):      15.86  16.88  17.50  18.62  19.97  21.06  22.11
Steps per second:  3.05   3.12   3.17   3.25   3.36   3.46   3.55

a) You want to predict steps per second from running speed. Which is the explanatory variable? Make a scatterplot of the data with this goal in mind.
b) Describe the pattern of the scatterplot.
c) What is the correlation coefficient? Interpret it in terms of the problem.
d) Calculate and interpret the slope.
e) Calculate and interpret the y-intercept.
f) Write the equation of the regression line. Draw the regression line.
g) If you need to cover 20 ft/s to win a race, predict the steps per second you'll need to maintain.
h) Make a residual plot and comment on whether a linear model is appropriate.
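For checking hand or calculator work on Problems 1-3, here is a minimal sketch in Python using only the standard library. The helper `least_squares` is our own illustrative function, not part of any package; it applies the usual formulas b = Sxy/Sxx, a = ȳ - b·x̄ and r = Sxy/√(Sxx·Syy). The Problem 2 lines illustrate that swapping the roles of x and y changes the regression line but not r.

```python
from math import sqrt


def least_squares(x, y):
    """Return (slope, intercept, r) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    r = s_xy / sqrt(s_xx * s_yy)
    return slope, intercept, r


# Problem 1: age (months) vs. mean height (cm) of Kalama children
age = list(range(18, 30))
height = [76.1, 77.0, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5]
b1, a1, r1 = least_squares(age, height)
pred_32 = a1 + b1 * 32  # part g: 32 months lies outside the observed 18-29
residuals = [yi - (a1 + b1 * xi) for xi, yi in zip(age, height)]
print(f"P1: height-hat = {a1:.2f} + {b1:.4f}(age), r = {r1:.4f}, height at 32 mo = {pred_32:.1f}")

# Problem 2: each direction gives a different line but the same r
gold = [368, 478, 438, 383, 385, 363, 345, 361, 389]
silver = [5.47, 7.01, 6.53, 5.50, 4.82, 4.04, 3.94, 4.30, 5.30]
b_s, a_s, r_s = least_squares(gold, silver)  # gold predicting silver
b_g, a_g, r_g = least_squares(silver, gold)  # silver predicting gold
print(f"P2: silver-hat = {a_s:.3f} + {b_s:.5f}(gold), r = {r_s:.4f}, r^2 = {r_s ** 2:.4f}")
print(f"P2: gold-hat = {a_g:.2f} + {b_g:.3f}(silver), r = {r_g:.4f}")

# Problem 3: running speed (ft/s) predicting steps per second
speed = [15.86, 16.88, 17.50, 18.62, 19.97, 21.06, 22.11]
steps = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46, 3.55]
b3, a3, r3 = least_squares(speed, steps)
pred_20 = a3 + b3 * 20
print(f"P3: steps-hat = {a3:.3f} + {b3:.4f}(speed), r = {r3:.4f}, steps at 20 ft/s = {pred_20:.2f}")
```

Plotting the `residuals` list against `age` gives the Problem 1 residual plot; the residuals from a least-squares line with an intercept always sum to zero.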

4. Car dealers across North America use the Red Book to help them determine the value of used cars that their customers trade in when purchasing new cars. The book lists, on a monthly basis, the amounts paid at recent used-car auctions and indicates the values according to condition and optional features, but it does not tell dealers how odometer readings affect the trade-in value. In an experiment to determine whether the odometer reading should be included, ten 3-year-old cars of the same make, condition, and options are randomly selected. The trade-in value (in hundreds of dollars) and mileage (in thousands of miles) are shown below.

Odometer (1000s of miles):  59  92  61  72  52  67  88  62  95  83
Trade-in value ($100s):     37  31  43  39  41  39  35  40  29  33

a) Describe the pattern of the scatterplot.
b) Find the sample regression line for determining how the odometer reading affects the trade-in value of the car.
c) Interpret the slope in terms of the problem.
d) Calculate and interpret the correlation coefficient.
e) Calculate and interpret the coefficient of determination.
f) Predict the trade-in value of a car with 60,000 miles.
g) What would be the odometer reading of a car with a trade-in value of $4200?
h) Make a residual plot and comment on whether a linear model is appropriate.
i) What is the residual for the car with 92,000 miles on the odometer?

5. In one of the Boston city parks there has been a problem with muggings in the summer months. A police cadet took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.

x (officers on duty):  10  15  16  1  4  6  18  12  14  7
y (muggings):          5   2   1   9  7  8  1   5   3   6

a) Sketch a scatter plot. Describe the pattern of the scatterplot.
b) What is the regression line?
c) What is the correlation coefficient? Interpret it in terms of the problem.
d) Interpret the slope in terms of the problem.
e) Find the coefficient of determination and interpret it in terms of the problem.
f) Predict the number of muggings if there are 9 police officers on duty.

6. Each of the following statements contains a blunder. Explain in each case what is wrong.
a. There is a high correlation between the gender of American workers and their income.
b. We found a high correlation (r = 1.09) between students' ratings of faculty teaching and ratings made by other faculty members.
c. The correlation between planting rate and yield of corn was found to be r = .23 bushel.

7. Foal weight at birth is an indicator of health, so it is of interest to breeders of thoroughbred horses. Is foal weight related to the weight of the mare? The accompanying data are from the article "Suckling Behavior Does Not Measure Milk Intake in Horses" (Animal Behavior [1999]).

Observation:       1    2    3    4      5    6      7    8    9    10    11     12   13     14   15
Mare weight (kg):  556  638  588  550    580  642    568  642  556  616   549    504  515    551  594
Foal weight (kg):  129  119  132  123.5  112  113.5  95   104  104  93.5  108.5  95   117.5  128  127.5

a) Describe the pattern of the scatterplot.
b) Find the equation of the regression line.
c) Interpret the slope in terms of the problem.
d) Interpret the y-intercept in terms of the problem.
e) Calculate and interpret the correlation coefficient.
f) Calculate and interpret the coefficient of determination.

8. The scatterplot shows the advertised prices (in thousands of dollars) plotted against ages (in years) for a random sample of Plymouth Voyagers on several dealers' lots. A computer printout showing the results of fitting a straight line to the data by the method of least squares gives:

Price = 12.37 - 1.13 Age    R-sq = 75.5%

a) Find the correlation coefficient for the relationship between price and age of Voyagers based on these data.
b) What is the slope of the regression line? Interpret it in the context of these data.
c) How will the size of the correlation coefficient change if the 10-year-old Voyager is removed from the data set? Explain.
d) How will the slope of the LSRL change if the 10-year-old Voyager is removed from the data? Explain.

[Scatter plot: Plymouth Voyagers, Price_1000 vs. Age_in_years]

9. One measure of the success of knee surgery is postsurgical range of motion for the knee joint. Postsurgical range of motion was recorded for 12 patients who had surgery following a knee dislocation. The age of each patient was also recorded (Reconstruction, American Journal of Sports Medicine). The average age was 25.83 years, with a standard deviation of 7.578 years. The average range of motion was 130.1 degrees, with a standard deviation of 11.927 degrees. The correlation coefficient was r = .5534.

a) If we use age to predict the range of motion, what is the slope? What is the y-intercept? Interpret both in the context of the problem.
b) Use the regression line to predict the range of motion of someone 32 years of age.
c) Use the regression line to predict the range of motion of someone 50 years of age. Do you feel this is an accurate prediction? Explain your thoughts.
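Problems 8 and 9 give summary output rather than raw data. A short sketch, assuming only the standard least-squares relationships: r carries the sign of the slope with |r| = √(R-sq), the slope is b = r·s_y/s_x, and the line passes through (x̄, ȳ), so a = ȳ - b·x̄. The variable names are our own.

```python
from math import copysign, sqrt

# Problem 8: the printout gives Price = 12.37 - 1.13 Age and R-sq = 75.5%.
# r has the sign of the slope, and |r| is the square root of R-sq.
slope_8 = -1.13
r_sq_8 = 0.755
r_8 = copysign(sqrt(r_sq_8), slope_8)
print(f"P8: r = {r_8:.3f}")

# Problem 9: build the line from summary statistics alone.
x_bar, s_x = 25.83, 7.578      # age (years)
y_bar, s_y = 130.1, 11.927     # postsurgical range of motion (degrees)
r_9 = 0.5534
b_9 = r_9 * s_y / s_x          # slope = r * s_y / s_x
a_9 = y_bar - b_9 * x_bar      # intercept: line passes through (x_bar, y_bar)
print(f"P9: motion-hat = {a_9:.2f} + {b_9:.4f}(age)")
for age in (32, 50):
    print(f"P9: predicted range of motion at age {age} = {a_9 + b_9 * age:.1f} degrees")
```

Age 50 is more than three standard deviations above the mean age of 25.83 years, so the Problem 9(c) prediction is very likely an extrapolation beyond the ages in the sample.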

10. Newsweek gave the following 1994 average weekly earnings from allowances, chores, work, and gifts for children ages 4 through 12.

Age:       4      5      6      7       8       9       10      11      12
Earnings:  $5.87  $7.42  $7.62  $10.63  $10.65  $10.69  $12.01  $13.79  $20.19

a. Construct a scatter plot. Describe the pattern of the scatterplot.
b. Interpret the slope in terms of the problem.
c. Find the coefficient of determination and interpret it in terms of the problem.
d. Find the correlation coefficient and interpret it in terms of the problem.
e. Predict the weekly earnings of a child who is age 16. Do you think this is a good prediction? Explain.

11. The paper "A Cross-National Relationship between Sugar Consumption and Major Depression?" (Depression and Anxiety [2002]) concluded that there was a strong correlation (r = .9444) between refined sugar consumption (calories per person per day) and annual rate of major depression (cases per 100 people), based on data from 6 countries. The average sugar consumption was 340.83 calories per person per day, with a standard deviation of 110.56 calories, while the average annual rate of depression was 4.26 cases per 100 people, with a standard deviation of 1.338 cases.

a) What is the slope of the regression line for predicting the annual rate of depression from sugar consumption? What is the y-intercept? Interpret both in the context of the problem.
b) Use the regression line to predict the depression rate of the United States if the average person consumes 300 calories per day.
c) New Zealand's depression rate is 5.7 annual cases per 100 people. Use the model to find the possible sugar consumption. Does the regression line allow us to make this prediction? Explain.

12. How quickly can athletes return to their sport following injuries requiring surgery? The paper "Arthroscopic Distal Clavicle Resection for Isolated Atraumatic Osteolysis in Weight Lifters" (American Journal of Sports Medicine, 1998) found a moderate positive linear relationship (r = .55) between a lifter's age and the number of days after arthroscopic shoulder surgery before the lifter was able to return to the sport, for 10 weight lifters. The average age of the weight lifters was 30.4 years, with a standard deviation of 2.875 years. The average number of days before returning to the sport was 3.2 days, with a standard deviation of 1.398 days.

a. Determine the line to predict the number of days based on the age of the weight lifter.
b. Determine the coefficient of determination and interpret it in terms of the problem.
c. Given that the lifters ranged from 26 to 34 years old, predict the number of days for a 28-year-old lifter. Do you feel this prediction is accurate? Explain.
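Problems 11 and 12 also supply only means, standard deviations, and r. The plain-Python sketch below (no imports; variable names are our own) applies the same summary-statistic formulas. For Problem 11(c) it contrasts solving the depression-on-sugar line for sugar with fitting the sugar-on-depression line, whose slope is r·s_x/s_y; the two answers differ whenever |r| < 1, which is why a line fitted to predict y is not meant to be run backwards.

```python
# Problem 11: annual depression rate (y) from sugar consumption (x)
r = 0.9444
x_bar, s_x = 340.83, 110.56    # calories per person per day
y_bar, s_y = 4.26, 1.338       # annual cases per 100 people

b = r * s_y / s_x              # slope from summary statistics
a = y_bar - b * x_bar          # intercept: line passes through (x_bar, y_bar)
print(f"P11: rate-hat = {a:.3f} + {b:.5f}(sugar)")
print(f"P11: predicted rate at 300 calories = {a + b * 300:.2f} cases per 100 people")

# Part c: inverting the rate-on-sugar line vs. fitting the sugar-on-rate line
solved = (5.7 - a) / b                               # solve 5.7 = a + b*x for x
reverse = x_bar + (r * s_x / s_y) * (5.7 - y_bar)    # sugar-on-rate line at y = 5.7
print(f"P11c: inverted line gives {solved:.1f}; sugar-on-rate line gives {reverse:.1f}")

# Problem 12: days to return (y) from lifter age (x)
r12 = 0.55
b12 = r12 * 1.398 / 2.875
a12 = 3.2 - b12 * 30.4
print(f"P12: days-hat = {a12:.3f} + {b12:.4f}(age), r^2 = {r12 ** 2:.4f}")
print(f"P12: predicted days for a 28-year-old = {a12 + b12 * 28:.2f}")
```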

13. Success in hunting varies greatly among species of animals. Lions, which hunt singly, are rarely successful in more than 10 percent of their hunts. Wild African dogs, which hunt in packs, are among the most efficient of all hunters, succeeding in over 90 percent of their hunts. In the early 1960s, researcher Jane Goodall discovered that chimpanzees were not solely vegetarian, as had previously been thought. This discovery spurred a tremendous amount of primate research. Some of the latest primatology research has examined whether larger hunting parties increase chimpanzees' chances of a successful hunt. The results of one such research project are summarized in the table below: the number of chimpanzees in the hunting party versus the percentage of successful hunts.

Number of chimps:    1   2   3   4   5   6   7   8   9   10  12  13  14  15  16
Percent of success:  20  30  28  42  40  58  45  62  65  63  75  75  78  75  82

a. Construct a scatter plot.
b. Determine the regression line.
c. Interpret the y-intercept. Does the interpretation make sense in this context?
d. Interpret the slope.
e. Find the correlation coefficient and interpret it in terms of the problem.
f. Find the coefficient of determination and interpret it in terms of the problem.
g. Sketch the residual plot. Interpret it in terms of the problem.

14. The following table gives the number of registered automatic weapons (in thousands) in selected states and their corresponding murder rates.

Weapons (thousands):  11.6  8.3   3.6   0.6  6.9   2.5  2.4  2.6
Murder rate:          13.1  10.6  10.1  4.4  11.5  6.6  3.6  5.3

a. Determine the regression line.
b. Predict the number of weapons for a state with a rate of 8.5.
c. Predict the murder rate for a state with 10,000 registered automatic weapons.

15. The following MINITAB output is for a regression of girls' height (in cm) on age (in years).

Predictor   Coef     Stdev    t-ratio   p
Constant    76.61    1.188    64.52     0.000
Age(yrs)    6.3661   0.1672   38.02     0.000

s = 1.518   R-sq = 99.5%

a) What is the equation of the least squares line? Interpret the slope.
b) Find the correlation coefficient and coefficient of determination. Interpret them in the context of the problem.
c) Predict the height of a 3-year-old girl.
d) Predict the age of a girl who is 135 cm tall.

16. Women made significant gains in the 1970s in their acceptance into professions that had traditionally been populated by men. To measure how big these gains were, we will compare the percentage of professional degrees awarded to women in 1973-1974 with the percentage awarded in 1978-1979 for selected fields of study.

Field                   Degrees in 73-74   Degrees in 78-79
Dentistry               2.0%               11.9%
Law                     11.5%              28.5%
Medicine                11.2%              23.1%
Optometry               4.2%               13.0%
Osteopathic medicine    2.8%               15.7%
Podiatry                1.1%               7.2%
Theology                5.5%               13.1%
Veterinary medicine     11.2%              28.9%

a) What is the regression line?
b) Interpret the slope in terms of the problem.
c) Find the coefficient of determination and interpret it in terms of the problem.
d) Sketch the residual plot. Interpret.
e) Find the residual for optometry.
f) Find the residual for veterinary medicine. Did the regression line over- or under-predict? Explain.

17. Shells of mollusks function both as part of the skeletal system and as protective armor. It has been argued that many features of these shells were the result of natural selection in the constant battle against predators. The paper "Postmortem Changes in Strength of Gastropod Shells" included a scatter plot of data on x = shell height (cm) and y = breaking strength (newtons). The least squares line for a sample of 38 hermit crab shells was ŷ = -275.1 + 244.9x.

a. What are the slope and intercept of this line?
b. When shell height increases by 1 cm, by how much does breaking strength tend to change?
c. What breaking strength would you predict when shell height is 2 cm?
d. Does this approximate linear relationship appear to hold for shell heights as small as 1 cm? Explain your thoughts.

18. For each of the following data sets, find the regression line. Sketch the residual plot and comment on how likely the regression line is to be a good model.

Data set 1
x:  2   3   4    5    6    7    8    9
y:  86  96  103  110  115  120  130  131

Data set 2
x:  3   6   8   9   11  14  18  20
y:  19  22  39  50  75  87  96  125
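Residuals for Problems 16 and 18 can be checked with the plain-Python sketch below; `fit_line` and `residuals` are hypothetical helper names, not library functions. A residual is observed minus predicted, so a positive value means the line under-predicted and a negative value means it over-predicted. Reading the signs of the residuals in order of x is a crude stand-in for looking at the residual plot: a long run of one sign followed by a run of the other suggests curvature.

```python
def fit_line(x, y):
    """Least-squares slope and intercept, computed by hand (no third-party packages)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def residuals(x, y):
    """Observed minus predicted for each point."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Problem 16: percent of degrees awarded to women, 1973-74 (x) vs. 1978-79 (y)
fields = ["Dentistry", "Law", "Medicine", "Optometry", "Osteopathic medicine",
          "Podiatry", "Theology", "Veterinary medicine"]
pct_73 = [2.0, 11.5, 11.2, 4.2, 2.8, 1.1, 5.5, 11.2]
pct_78 = [11.9, 28.5, 23.1, 13.0, 15.7, 7.2, 13.1, 28.9]
res16 = dict(zip(fields, residuals(pct_73, pct_78)))
for name in ("Optometry", "Veterinary medicine"):
    # positive residual: the line under-predicted; negative: it over-predicted
    print(f"P16: residual for {name} = {res16[name]:+.2f}")

# Problem 18: residual signs, read left to right in x, mimic the residual plot
set1 = ([2, 3, 4, 5, 6, 7, 8, 9], [86, 96, 103, 110, 115, 120, 130, 131])
set2 = ([3, 6, 8, 9, 11, 14, 18, 20], [19, 22, 39, 50, 75, 87, 96, 125])
for label, (x, y) in (("set 1", set1), ("set 2", set2)):
    slope, intercept = fit_line(x, y)
    signs = "".join("+" if e > 0 else "-" for e in residuals(x, y))
    print(f"P18 {label}: y-hat = {intercept:.2f} + {slope:.3f}x, residual signs {signs}")
```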

19. The data come from a study of ice cream consumption that spanned the springs and summers of three years. The ice cream consumption (pints per capita per year), family income of consumers ($1000 per year), and temperature (degrees Fahrenheit) are listed below.

Consumption Income Temperature 20. 07 19. 45 20. 44 221. 2111. 17. 89 17. 00 14. 98 1399. 1331. 18. 25 1331. 1398. 18. 72 17. 78 18. 25 1918. 1851. 17. 78 1851. 41 56 63 68 69 65 61 47 32 24

a. Complete two scatter plots, with consumption as the response variable in each plot.
b. Find the two regression lines.
c. Interpret the slopes.
d. Interpret the coefficients of determination.
e. Sketch and interpret both residual plots.
f. Which do you think is the better predictor of consumption? Explain.
g. Predict the consumption for a temperature of 53 degrees.
h. Predict the consumption for an income of $17,500.
i. Predict the income and temperature for 3 gallons a year.

20. People with diabetes measure their fasting plasma glucose (FPG, measured in units of milligrams per milliliter) after fasting for at least 8 hours. Another measurement, made at regular medical checkups, is called HbA. This is roughly the percent of red blood cells that have a glucose molecule attached. It measures average exposure to glucose over a period of several months. The table below gives data on both HbA and FPG for 18 diabetics five months after they had completed a diabetes education class.

Subject   HbA (%)   FPG (mg/ml)
1         6.1       141
2         6.3       158
3         6.4       112
4         6.8       153
5         7.0       134
6         7.1       95
7         7.5       96
8         7.7       78
9         7.9       148
10        8.7       172
11        9.4       200
12        10.4      271
13        10.6      103
14        10.7      172
15        10.7      359
16        11.2      145
17        13.7      147
18        19.3      255

a) Sketch a scatter plot. Describe the scatterplot. Subject 15 is an outlier in the y direction. Subject 18 is an outlier in the x direction.
b) Find the correlation and the regression line for all 18 subjects.
c) Find the correlation and the regression line when only Subject 15 is removed.
d) Find the correlation and the regression line when only Subject 18 is removed.
e) Are either or both of these points influential for the correlation? Explain why r changes in opposite directions when we remove each of these points.
f) Is either Subject 15 or Subject 18 strongly influential for the least-squares line?