REGRESSION ANALYSIS ON THE COMPUTER


CHAPTER 27 Inferences for Regression

REGRESSION ANALYSIS ON THE COMPUTER

All statistics packages make a table of results for a regression. These tables differ slightly from one package to another, but all are essentially the same. We've seen two examples of such tables already.

All packages offer analyses of the residuals. With some, you must request plots of the residuals as you request the regression. Others let you find the regression first and then analyze the residuals afterward. Either way, your analysis is not complete if you don't check the residuals with a histogram or Normal probability plot and a scatterplot of the residuals against x or the predicted values.

You should, of course, always look at the scatterplot of your two variables before computing a regression.

Regressions are almost always found with a computer or calculator. The calculations are too long to do conveniently by hand for data sets of any reasonable size. No matter how the regression is computed, the results are usually presented in a table that has a standard form. Here's a portion of a typical regression results table, along with annotations showing where the numbers come from:

Activity: Regression on the Computer. How fast is the universe expanding? And how old is it? A prominent astronomer used regression to astound the scientific community. Read the story, analyze the data, and interactively learn about each of the numbers in a typical computer regression output table.
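The entries of a regression results table can be computed directly from the data. The following is a minimal Python sketch, using only the standard library, of how a package might arrive at the coefficients, s, R squared, the standard errors, and the t-ratios for a one-predictor regression. The function name `simple_regression` and the five data points are hypothetical and chosen only for illustration; P-values are left out because they require the t distribution, which the standard library does not provide.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y on x, returning the usual table entries."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)

    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept ("Constant")
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                          # degrees of freedom
    s = math.sqrt(sse / df)             # standard deviation of the residuals
    se_b1 = s / math.sqrt(sxx)          # SE(b1)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)  # SE(b0)

    return {
        "b0": b0, "se_b0": se_b0, "t_b0": b0 / se_b0,
        "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1,
        "s": s, "df": df, "r2": 1 - sse / syy,
    }

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = simple_regression(x, y)
for name, value in fit.items():
    print(f"{name:>6}: {value:.4f}")
```

Each t-ratio is simply the coefficient divided by its standard error, and the degrees of freedom are n - 2, which is the same arithmetic that the annotations on a regression table describe.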
Dependent variable is: %BF                          (y-variable)
R squared = 67.8%                                   (R²)
s = 4.713 with 250 - 2 = 248 degrees of freedom     (s_e; df = n - 2)

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Constant    -42.7341      2.717       -15.7     ≤ 0.0001
waist       1.69997       0.0743      22.9      ≤ 0.0001

Annotations: "Constant" may be called "Intercept"; its coefficient is b0 and its standard error is SE(b0). "waist" is the x-variable; its coefficient is b1 and its standard error is SE(b1). Each t-ratio is the coefficient divided by its standard error: t = b0 / SE(b0) and t = b1 / SE(b1). The Prob column gives the P-values (two-tailed).

The regression table gives the coefficients (once you find them in the middle of all this other information), so we can see that the regression equation is

%BF = -42.73 + 1.70 Waist

and that the R² for the regression is 67.8%. (Is accounting for 68% of the variation in %Body Fat good enough to be useful? Is a prediction ME of more than 9% good enough? Health professionals might not be satisfied.)

The column of t-ratios gives the test statistics for the respective null hypotheses that the true values of the coefficients are zero. The corresponding P-values are also usually reported.

EXERCISES

1. Hurricane predictions. In Chapter 7 we looked at data from the National Oceanic and Atmospheric Administration about their success in predicting hurricane tracks. Here is a scatterplot of the error (in nautical miles) for predicting hurricane locations 72 hours in the future vs. the year in which the prediction (and the hurricane) occurred:

[Scatterplot: 72-hour Error (nautical miles) vs. Year (since 1970)]

In Chapter 7 we could describe this relationship only in general terms. Now we can learn more. Here is the regression analysis:

Dependent variable is: 72Error
R squared = 58.5%
s = 75.38 with 36 - 2 = 34 degrees of freedom

Variable          Coefficient   SE(Coeff)   t-ratio   Prob
Intercept         453.223       24.61       18.4      ≤ 0.0001
Year since 1970   -8.3784       1.209       -6.92     ≤ 0.0001

a) Explain in context what the regression says.
b) State the hypothesis about the slope (both numerically and in words) that describes how hurricane prediction quality has changed.
c) Assuming that the assumptions for inference are satisfied, perform the hypothesis test and state your conclusion in context.
d) Explain what R-squared means in context.

2. Drug use. The European School Study Project on Alcohol and Other Drugs, published in 1995, investigated the use of marijuana and other drugs. Data from 11 countries are summarized in the following scatterplot and regression analysis. They show the association between the percentage of a country's ninth graders who report having smoked marijuana and who have used other drugs such as LSD, amphetamines, and cocaine.

[Scatterplot: Other (% use) vs. Marijuana (% use in 9th grade)]

Dependent variable is: Other
R-squared = 87.3%
s = 3.853 with 11 - 2 = 9 degrees of freedom

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Intercept   -3.0678       2.204       -1.39     0.1974
Marijuana   0.6153        0.0784      7.85      < 0.0001

a) Explain in context what the regression says.
b) State the hypothesis about the slope (both numerically and in words) that describes how use of marijuana is associated with other drugs.
c) Assuming that the assumptions for inference are satisfied, perform the hypothesis test and state your conclusion in context.
d) Explain what R-squared means in context.
e) Do these results indicate that marijuana use leads to the use of harder drugs? Explain.

3. Movie budgets. How does the cost of a movie depend on its length? Data on the cost (millions of dollars) and the running time (minutes) for major release films of 2005 are summarized in these plots and computer output:

[Plots: Budget ($M) vs. Run Time (minutes); residuals vs. Predicted; residuals vs. Nscores]

Dependent variable is: Budget($M)
R squared = 15.4%
s = 32.95 with 120 - 2 = 118 degrees of freedom

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Intercept   -31.3869      17.12       -1.83     0.0693
Run Time    0.7144        0.1541      4.64      ≤ 0.0001

a) Explain in context what the regression says.
b) The intercept is negative. Discuss its value, taking note of the P-value.
c) The output reports s = 32.95. Explain what that means in this context.
d) What's the value of the standard error of the slope of the regression line?
e) Explain what that means in this context.

4. House prices. How does the price of a house depend on its size? Data from Saratoga, New York, on 1064 randomly selected houses that had been sold include data on price ($1000s) and size (1000s ft²), producing the following graphs and computer output:

[Plots: Price ($1000s) vs. Size (1000 ft²); residuals ($1000s) vs. Predicted; histogram of residuals (# of Houses)]

Dependent variable is: Price
R squared = 59.5%
s = 53.79 with 1064 - 2 = 1062 degrees of freedom

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Intercept   -3.11686      4.688       -0.665    0.5063
Size        94.4539       2.393       39.5      ≤ 0.0001

a) Explain in context what the regression says.
b) The intercept is negative. Discuss its value, taking note of its P-value.
c) The output reports s = 53.79. Explain what that means in this context.
d) What's the value of the standard error of the slope of the regression line?
e) Explain what that means in this context.

5. Movie budgets: the sequel. Exercise 3 shows computer output examining the association between the length of a movie and its cost.
a) Check the assumptions and conditions for inference.
b) Find a 95% confidence interval for the slope and interpret it in context.

6. Second home. Exercise 4 shows computer output examining the association between the sizes of houses and their sale prices.
a) Check the assumptions and conditions for inference.
b) Find a 95% confidence interval for the slope and interpret it in context.

7. Hot dogs. Healthy eating probably doesn't include hot dogs, but if you are going to have one, you'd probably hope it's low in both calories and sodium. In its July 2007 issue, Consumer Reports listed the number of calories and sodium content (in milligrams) for 13 brands of all-beef hot dogs it tested. Examine the association, assuming that the data satisfy the conditions for inference.

Dependent variable is: Sodium
R squared = 60.5%
s = 59.66 with 13 - 2 = 11 degrees of freedom

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Constant    90.9783       77.69       1.17      0.2663
Calories    2.29959       0.5607      4.10      0.0018

a) State the appropriate hypotheses about the slope.
b) Test your hypotheses and state your conclusion in the proper context.

8. Cholesterol 2007. Does a person's cholesterol level tend to change with age? Data collected from 146 adults aged 45 to 62 produced the regression analysis shown. Assuming that the data satisfy the conditions for inference, examine the association between age and cholesterol level.

Dependent variable is: Chol
s = 46.16

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Intercept   194.232       13.55       14.3      ≤ 0.0001
Age         0.771639      0.2574      3.00      0.0056

a) State the appropriate hypothesis for the slope.
b) Test your hypothesis and state your conclusion in the proper context.

9. Second frank. Look again at Exercise 7's regression output for the calorie and sodium content of hot dogs.
a) The output reports s = 59.66. Explain what that means in this context.
b) What's the value of the standard error of the slope of the regression line?
c) Explain what that means in this context.

10. More cholesterol. Look again at Exercise 8's regression output for age and cholesterol level.
a) The output reports s = 46.16. Explain what that means in this context.
b) What's the value of the standard error of the slope of the regression line?
c) Explain what that means in this context.

11. Last dog. Based on the regression output seen in Exercise 7, create a 95% confidence interval for the slope of the regression line and interpret your interval in context.

12. Cholesterol, finis. Based on the regression output seen in Exercise 8, create a 95% confidence interval for the slope of the regression line and interpret it in context.

13. Marriage age 2003. The scatterplot suggests a decrease in the difference in ages at first marriage for men and women since 1975. We want to examine the regression to see if this decrease is significant.

[Scatterplot: Men - Women (years) vs. Year, 1975 to 2005]

Dependent variable is: Men - Women
R squared = 65.6%
s = 0.1869 with 28 - 2 = 26 degrees of freedom

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Intercept   61.8067       8.468       7.30      ≤ 0.0001
Year        -0.02996      0.0043      -7.04     ≤ 0.0001

a) Write appropriate hypotheses.
b) Here are the residuals plot and a histogram of the residuals. Do you think the conditions for inference are satisfied? Explain.

[Plots: residuals (yr) vs. Predicted (yr); histogram of residuals (# of Years)]

c) Test the hypothesis and state your conclusion about the trend in age at first marriage.

14. Used cars 2007. Classified ads in a newspaper offered several used Toyota Corollas for sale. Listed below are the ages of the cars and the advertised prices.

Age (yr)   Advertised Price ($)
1          13990
1          13495
3          12999
4          9500
4          10495
5          8995
5          9495
6          6999
7          6950
7          7850
8          6999
8          5995
10         4950
10         4495
13         2850

a) Make a scatterplot for these data.
b) Do you think a linear model is appropriate? Explain.
c) Find the equation of the regression line.
d) Check the residuals to see if the conditions for inference are met.

15. Marriage age 2003, again. Based on the analysis of marriage ages since 1975 given in Exercise 13, give a 95% confidence interval for the rate at which the age gap is closing. Explain what your confidence interval means.

16. Used cars 2007, again. Based on the analysis of used car prices you did for Exercise 14, create a 95% confidence interval for the slope of the regression line and explain what your interval means in context.

17. Fuel economy. A consumer organization has reported test data for 50 car models. We will examine the association between the weight of the car (in thousands of pounds) and the fuel efficiency (in miles per gallon). Here are the scatterplot, summary statistics, and regression analysis:

[Scatterplot: Fuel efficiency (mpg) vs. Weight (1000 lb)]

Variable   Count   Mean     StdDev
MPG        50      25.02    4.83394
wt/1000    50      2.8878   0.511656

Dependent variable is: MPG
R-squared = 75.6%
s = 2.413 with 50 - 2 = 48 df

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Intercept   48.7393       1.976       24.7      ≤ 0.0001
Weight      -8.21362      0.6738      -12.2     ≤ 0.0001

[Plots: residuals (mpg) vs. Predicted (mpg); histogram of residuals (mpg)]

a) Is there strong evidence of an association between the weight of a car and its gas mileage? Write an appropriate hypothesis.
b) Are the assumptions for regression satisfied?
c) Test your hypothesis and state your conclusion.

18. SAT scores. How strong was the association between student scores on the Math and Verbal sections of the old SAT? Scores on each ranged from 200 to 800 and were widely used by college admissions offices. Here are summaries and plots of the scores for a graduating class at Ithaca High School:

Variable   Count   Mean      Median   StdDev    Range   IntQRange
Verbal     162     596.296   610      99.5199   490     140
Math       162     612.099   630      98.1343   440     150

Dependent variable is: Math
R-squared = 46.9%
s = 71.75 with 162 - 2 = 160 df

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Intercept   209.554       34.35       6.10      ≤ 0.0001
Verbal      0.675075      0.0568      11.9      ≤ 0.0001

[Plots: Math vs. Verbal; residuals vs. Predicted; histogram of residuals (# of Students)]

a) Is there evidence of an association between Math and Verbal scores? Write an appropriate hypothesis.
b) Discuss the assumptions for inference.
c) Test your hypothesis and state an appropriate conclusion.

19. Fuel economy, part II. Consider again the data in Exercise 17 about the gas mileage and weights of cars.
a) Create a 95% confidence interval for the slope of the regression line.
b) Explain in this context what your confidence interval means.

20. SATs, part II. Consider the high school SAT scores data from Exercise 18.
a) Find a 90% confidence interval for the slope of the true line describing the association between Math and Verbal scores.
b) Explain in this context what your confidence interval means.

21. *Fuel economy, part III. Consider again the data in Exercise 17 about the gas mileage and weights of cars.
a) Create a 95% confidence interval for the average fuel efficiency among cars weighing 2500 pounds, and explain what your interval means.
b) Create a 95% prediction interval for the gas mileage you might get driving your new 3450-pound SUV, and explain what that interval means.

22. *SATs again. Consider the high school SAT scores data from Exercise 18 once more.
a) Find a 90% confidence interval for the mean SAT-Math score for all students with an SAT-Verbal score of 500.
b) Find a 90% prediction interval for the Math score of the senior class president if you know she scored 710 on the Verbal section.

23. Cereal. A healthy cereal should be low in both calories and sodium. Data for 77 cereals were examined and judged acceptable for inference. The 77 cereals had between 50 and 160 calories per serving and between 0 and 320 mg of sodium per serving. Here's the regression analysis:

Dependent variable is: Sodium
R-squared = 9.0%
s = 80.49 with 77 - 2 = 75 degrees of freedom

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Intercept   21.4143       51.47       0.416     0.6786
Calories    1.29357       0.4738      2.73      0.0079

a) Is there an association between the number of calories and the sodium content of cereals? Explain.
b) Do you think this association is strong enough to be useful? Explain.

24. Brain size. Does your IQ depend on the size of your brain? A group of female college students took a test that measured their verbal IQs and also underwent an MRI scan to measure the size of their brains (in 1000s of pixels). The scatterplot and regression analysis are shown, and the assumptions for inference were satisfied.

[Scatterplot: Verbal IQ vs. Brain Size (1000 pixels)]

Dependent variable is: IQ_Verbal
R-squared = 6.5%

Variable    Coefficient   SE(Coeff)
Intercept   24.1835       76.38
Size        0.098842      0.0884

a) Test an appropriate hypothesis about the association between brain size and IQ.
b) State your conclusion about the strength of this association.

25. Another bowl. Further analysis of the data for the breakfast cereals in Exercise 23 looked for an association between Fiber content and Calories by attempting to construct a linear model. Here are several graphs. Which of the assumptions for inference are violated? Explain.

[Plots: Fiber (g) vs. Calories; residuals (g) vs. Predicted (g); histogram of residuals (# of Cereals)]

26. Winter. The output shows an attempt to model the association between average January Temperature (in degrees Fahrenheit) and Latitude (in degrees north of the equator) for 59 U.S. cities. Which of the assumptions for inference do you think are violated? Explain.

[Plots: Mean Jan. Temp (°F) vs. Latitude; residuals (°F) vs. Predicted (°F); histogram of residuals (# of Cities)]

27. Acid rain. Biologists studying the effects of acid rain on wildlife collected data from 163 streams in the Adirondack Mountains. They recorded the pH (acidity) of the water and the BCI, a measure of biological diversity. Here's a scatterplot of BCI against pH:

[Scatterplot: BCI vs. pH]

And here is part of the regression analysis:

Dependent variable is: BCI
R-squared = 27.1%
s = 140.4 with 163 - 2 = 161 degrees of freedom

Variable    Coefficient   SE(Coeff)
Intercept   2733.37       187.9
pH          -197.694      25.57

a) State the null and alternative hypotheses under investigation.
b) Assuming that the assumptions for regression inference are reasonable, find the t- and P-values.
c) State your conclusion.

28. El Niño. Concern over the weather associated with El Niño has increased interest in the possibility that the climate on earth is getting warmer. The most common theory relates an increase in atmospheric levels of carbon dioxide (CO2), a greenhouse gas, to increases in temperature. Here is part of a regression analysis of the mean annual CO2 concentration in the atmosphere, measured in parts per million (ppm), at the top of Mauna Loa in Hawaii and the mean annual air temperature over both land and sea across the globe, in degrees Celsius. The scatterplots and residuals plots indicated that the data were appropriate for inference.

Dependent variable is: Temp
R-squared = 33.4%
s = .89 with 37 - 2 = 35 degrees of freedom

Variable    Coefficient   SE(Coeff)
Intercept   15.3066       0.3139
CO2         0.004         0.0009

a) Write the equation of the regression line.
b) Is there evidence of an association between CO2 level and global temperature?
c) Do you think predictions made by this regression will be very accurate? Explain.

29. Ozone. The Environmental Protection Agency is examining the relationship between the ozone level (in parts per million) and the population (in millions) of U.S. cities. Part of the regression analysis is shown.

Dependent variable is: Ozone
R-squared = 84.4%
s = 5.454 with 16 - 2 = 14 df

Variable    Coefficient   SE(Coeff)
Intercept   18.892        2.395
Pop         6.65          1.91

a) We suspect that the greater the population of a city, the higher its ozone level. Is the relationship significant? Assuming the conditions for inference are satisfied, test an appropriate hypothesis and state your conclusion in context.
b) Do you think that the population of a city is a useful predictor of ozone level? Use the values of both R² and s in your explanation.

30. Sales and profits. A business analyst was interested in the relationship between a company's sales and its profits. She collected data (in millions of dollars) from a random sample of Fortune 500 companies and created the regression analysis and summary statistics shown. The assumptions for regression inference appeared to be satisfied.

            Profits    Sales
Count       79         79
Mean        209.839    4178.29
Variance    635,172    49,163,000
Std Dev     796.977    7011.63

Dependent variable is: Profits
R-squared = 66.2%
s = 466.2

Variable    Coefficient   SE(Coeff)
Intercept   -176.644      61.16
Sales       0.092498      0.0075

a) Is there a significant association between sales and profits? Test an appropriate hypothesis and state your conclusion in context.
b) Do you think that a company's sales serve as a useful predictor of its profits? Use the values of both R² and s in your explanation.

31. Ozone, again. Consider again the relationship between the population and ozone level of U.S. cities that you analyzed in Exercise 29.
a) Give a 90% confidence interval for the approximate increase in ozone level associated with each additional million city inhabitants.
*b) For the cities studied, the mean population was 1.7 million people. The population of Boston is approximately 0.6 million people. Predict the mean ozone level for cities of that size with an interval in which you have 90% confidence.

32. More sales and profits. Consider again the relationship between the sales and profits of Fortune 500 companies that you analyzed in Exercise 30.
a) Find a 95% confidence interval for the slope of the regression line. Interpret your interval in context.
*b) Last year the drug manufacturer Eli Lilly, Inc., reported gross sales of $9 billion (that's $9,000 million). Create a 95% prediction interval for the company's profits, and interpret your interval in context.

33. Start the car! In October 2002, Consumer Reports listed the price (in dollars) and power (in cold cranking amps) of auto batteries. We want to know if more expensive batteries are generally better in terms of starting power. Here are several software displays:

Dependent variable is: Power
R-squared = 25.2%
s = 116.0 with 33 - 2 = 31 degrees of freedom

Variable    Coefficient   SE(Coeff)   t-ratio   Prob
Intercept   384.594       93.55       4.11      0.0003
Cost        4.14649       1.282       3.23      0.0029

[Plots: Power (cold cranking amps) vs. Cost ($); residuals (cold cranking amps) vs. Predicted (cold cranking amps); residuals (cold cranking amps) vs. Normal Scores]

a) How many batteries were tested?
b) Are the conditions for inference satisfied? Explain.
c) Is there evidence of an association between the cost and cranking power of auto batteries? Test an appropriate hypothesis and state your conclusion.
d) Is the association strong? Explain.
e) What is the equation of the regression line?
f) Create a 90% confidence interval for the slope of the true line.
g) Interpret your interval in this context.

34. Crawling. Researchers at the University of Denver Infant Study Center wondered whether temperature might influence the age at which babies learn to crawl. Perhaps the extra clothing that babies wear in cold weather would restrict movement and delay the age at which they started crawling. Data were collected on 208 boys and 206 girls. Parents reported the month of the baby's birth and the age (in weeks) at which their child first crawled. The table gives the average Temperature (°F) when the babies were 6 months old and average Crawling Age (in weeks) for each month of the year. Make the plots and compute the analyses necessary to answer the following questions.

Birth Month   6-Month Temperature   Average Crawling Age
Jan.          66                    29.84
Feb.          73                    30.52
Mar.          72                    29.70
April         63                    31.84
May           52                    28.58
June          39                    31.44
July          33                    33.64
Aug.          30                    32.82
Sept.         33                    33.83
Oct.          37                    33.35
Nov.          48                    33.38
Dec.          57                    32.32

a) Would this association appear to be weaker, stronger, or the same if data had been plotted for individual babies instead of using monthly averages? Explain.
b) Is there evidence of an association between Temperature and Crawling Age? Test an appropriate hypothesis and state your conclusion. Don't forget to check the assumptions.
c) Create and interpret a 95% confidence interval for the slope of the true relationship.

35. Body fat. Do the data shown in the table below indicate an association between Waist size and %Body Fat?
a) Test an appropriate hypothesis and state your conclusion.
*b) Give a 95% confidence interval for the mean %Body Fat found in people with 40-inch Waists.

Waist (in.)   Weight (lb)   Body Fat (%)
32            175           6
36            181           21
38            200           15
33            159           6
39            196           22
40            192           31
41            205           32
35            173           21
38            187           25
38            188           30
33            188           10
40            240           20
36            175           22
32            168           9
44            246           38
33            160           10
41            215           27
34            159           12
34            146           10
44            219           28

36. Body fat, again. Use the data from Exercise 35 to examine the association between Weight and %Body Fat.
a) Find a 90% confidence interval for the slope of the regression line of %Body Fat on Weight.
b) Interpret your interval in context.
*c) Give a 95% prediction interval for the %Body Fat of an individual who weighs 165 pounds.

37. Grades. The data set below shows midterm scores from an Introductory Statistics course.

First Name    Midterm 1   Midterm 2   Homework
Timothy       82          30          61
Karen         96          68          72
Verena        57          82          69
Jonathan      89          92          84
Elizabeth     88          86          84
Patrick       93          81          71
Julia         90          83          79
Thomas        83          21          51
Marshall      59          62          58
Justin        89          57          79
Alexandra     83          86          78
Christopher   95          75          77
Justin        81          66          66
Miguel        86          63          74
Brian         81          86          76
Gregory       81          87          75
Kristina      98          96          84
Timothy       50          27          20
Jason         91          83          71
Whitney       87          89          85
Alexis        90          91          68
Nicholas      95          82          68
Amandeep      91          37          54
Irena         93          81          82
Yvon          88          66          82
Sara          99          90          77
Annie         89          92          68
Benjamin      87          62          72
David         92          66          78
Josef         62          43          56
Rebecca       93          87          80
Joshua        95          93          87
Ian           93          65          66
Katharine     92          98          77
Emily         91          95          83
Brian         92          80          82
Shad          61          58          65
Michael       55          65          51
Israel        76          88          67
Iris          63          62          67
Mark          89          66          72
Peter         91          42          66
Catherine     90          85          78
Christina     75          62          72
Enrique       75          46          72
Sarah         91          65          77
Thomas        84          70          70
Sonya         94          92          81
Michael       93          78          72
Wesley        91          58          66
Mark          91          61          79
Adam          89          86          62
Jared         98          92          83
Michael       96          51          83
Kathryn       95          95          87
Nicole        98          89          77
Wayne         89          79          44
Elizabeth     93          89          73
John          74          64          72
Valentin      97          96          80
David         94          90          88
Marc          81          89          62
Samuel        94          85          76
Brooke        92          90          86

a) Fit a model predicting the second midterm score from the first.
b) Comment on the model you found, including a discussion of the assumptions and conditions for regression. Is the coefficient for the slope statistically significant?
c) A student comments that because the P-value for the slope is very small, Midterm 2 is very well predicted from Midterm 1. So, he reasons, next term the professor can give just one midterm. What do you think?

38. Grades? The professor teaching the Introductory Statistics class discussed in Exercise 37 wonders whether performance on homework can accurately predict midterm scores.
a) To investigate it, she fits a regression of the sum of the two midterm scores on homework scores. Fit the regression model.
b) Comment on the model including a discussion of the assumptions and conditions for regression. Is the coefficient for the slope statistically significant?
c) Do you think she can accurately judge a student's performance without giving the midterms? Explain.

39. Strike two. Remember the Little League instructional video discussed in Chapter 25? Ads claimed it would improve the performances of Little League pitchers. To test this claim, 20 Little Leaguers threw 50 pitches each,

and we recorded the number of strikes. After the players participated in the training program, we repeated the test. The table shows the number of strikes each player threw before and after the training. A test of paired differences failed to show that this training improves ability to throw strikes. Is there any evidence that the effectiveness of the video (After − Before) depends on the player's initial ability to throw strikes (Before)? Test an appropriate hypothesis and state your conclusion. Propose an explanation for what you find.

Number of Strikes (out of 50)
Before   After     Before   After
28       35        33       33
29       36        33       35
30       32        34       32
32       28        34       30
32       30        34       33
32       31        35       34
32       32        36       37
32       34        36       33
32       35        37       35
33       36        37       32

40. All the efficiency money can buy. A sample of 84 model-2004 cars from an online information service was examined to see how fuel efficiency (as highway mpg) relates to the cost (Manufacturer's Suggested Retail Price in dollars) of cars. Here are displays and computer output:

[Figure: scatterplot of Efficiency (highway mpg) vs. Manufacturer's Suggested Retail Price]
[Figure: residuals vs. Predicted]

Dependent variable is: Highway MPG
R squared = 30.1%
s = 5.298 with 84 − 2 = 82 degrees of freedom

Variable   Coefficient    SE(Coeff)   t-ratio   P-value
Constant   33.581         1.299       25.5      ≤0.0001
MSRP       −2.16543e-4    0.0000      −5.95     ≤0.0001

a) State what you want to know, identify the variables, and give the appropriate hypotheses.
b) Check the assumptions and conditions.
c) If the conditions are met, complete the analysis.

41. Education and mortality. The software output below is based on the mortality rate (deaths per 100,000 people) and the education level (average number of years in school) for 58 U.S. cities.

Variable    Count   Mean      StdDev
Mortality   58      942.501   61.8490
Education   58      11.0328   0.793480

Dependent variable is: Mortality
R-squared = 41.0%
s = 47.92 with 58 − 2 = 56 degrees of freedom

Variable    Coefficient   SE(Coeff)
Intercept   1493.26       88.48
Education   −49.9202      8.000

[Figure: scatterplot of Mortality (age-adjusted, deaths/100,000) vs. Median Education (yr)]
[Figure: residuals vs. Predicted]

a) Comment on the assumptions for inference.
b) Is there evidence of a strong association between the level of Education in a city and the Mortality rate? Test an appropriate hypothesis and state your conclusion.

c) Can we conclude that getting more education is likely (on average) to prolong your life? Why or why not?
d) Find a 95% confidence interval for the slope of the true relationship.
e) Explain what your interval means.
*f) Find a 95% confidence interval for the average Mortality rate in cities where the adult population completed an average of 12 years of school.

42. Property assessments. The software outputs below provide information about the Size (in square feet) of 18 homes in Ithaca, New York, and the city's assessed Value of those homes.

Variable   Count   Mean      StdDev    Range
Size       18      2003.39   264.727   890
Value      18      60946.7   5527.62   19710

Dependent variable is: Value
R-squared = 32.5%
s = 4682 with 18 − 2 = 16 degrees of freedom

Variable    Coefficient   SE(Coeff)
Intercept   37108.8       8664
Size        11.8987       4.290

[Figure: scatterplot of Assessed Value ($) vs. Size (sq ft)]
[Figure: residuals ($) vs. Predicted ($)]
[Figure: residuals ($) vs. Normal Scores]

a) Explain why inference for linear regression is appropriate with these data.
b) Is there a significant association between the Size of a home and its assessed Value? Test an appropriate hypothesis and state your conclusion.
c) What percentage of the variability in assessed Value is explained by this regression?
d) Give a 90% confidence interval for the slope of the true regression line, and explain its meaning in the proper context.
e) From this analysis, can we conclude that adding a room to your house will increase its assessed Value? Why or why not?
*f) The owner of a home measuring 2100 square feet files an appeal, claiming that the $70,200 assessed Value is too high. Do you agree? Explain your reasoning.

JUST CHECKING Answers
1. A high t-ratio of 3.27 indicates that the slope is different from zero, that is, that there is a linear relationship between height and mouth size. The small P-value says that a slope this large would be very unlikely to occur by chance if, in fact, there was no linear relationship between the variables.
2. Not really. The R² for this regression is only 15.3%, so height doesn't account for very much of the variability in mouth size.
3. The value of s tells the standard deviation of the residuals. Mouth sizes have a mean of 60.3 cubic centimeters. A standard deviation of 15.7 in the residuals indicates that the errors made by this regression model can be quite large relative to what we are estimating. Errors of 15 to 30 cubic centimeters would be common.
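
Several of the exercises above ask for the same two calculations from a line of regression output: a t-ratio for the slope (coefficient divided by its standard error, on n − 2 degrees of freedom) and a confidence interval of the form b1 ± t* × SE(b1). The following is a minimal sketch of that arithmetic, assuming only the Python standard library. The function names (t_cdf, t_critical, slope_inference) are illustrative and not from the text, and the t-distribution is handled by plain numerical integration and bisection rather than a statistics package. The example call uses the Size coefficient and standard error as printed in the Exercise 42 output.

```python
import math


def t_cdf(t, df, steps=2000):
    """Student's t CDF, using Simpson's rule on the density from 0 to |t|."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += pdf(i * h) * (4 if i % 2 else 2)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(conf, df):
    """Two-sided critical value t* for a given confidence level, by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def slope_inference(b1, se, n, conf=0.95):
    """Return the t-ratio, two-sided P-value, t*, and confidence interval for a slope."""
    df = n - 2  # simple regression: n observations minus two estimated parameters
    t_ratio = b1 / se
    p_value = 2 * (1 - t_cdf(abs(t_ratio), df))
    t_star = t_critical(conf, df)
    return t_ratio, p_value, t_star, (b1 - t_star * se, b1 + t_star * se)


# Example: Size coefficient and SE as printed in the Exercise 42 output, n = 18 homes.
t_ratio, p_value, t_star, interval = slope_inference(11.8987, 4.29, n=18, conf=0.90)
print(f"t-ratio = {t_ratio:.2f}, P-value = {p_value:.4f}, t* = {t_star:.3f}")
print(f"90% CI for slope: ({interval[0]:.2f}, {interval[1]:.2f})")
```

The sketch covers only the arithmetic. Whether the resulting interval or P-value can be trusted still depends on checking the assumptions and conditions that the exercises ask about.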