Regression Equation. November 29, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

Similar documents
Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Chapter 3 CORRELATION AND REGRESSION

Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Section 3.2 Least-Squares Regression

Chapter 1: Exploring Data

Homework #3. SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

3.2 Least- Squares Regression

Chapter 3: Examining Relationships

3.2A Least-Squares Regression

M15_BERE8380_12_SE_C15.6.qxd 2/21/11 8:21 PM Page Influence Analysis 1

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

Multiple Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

Math 075 Activities and Worksheets Book 2:

Chapter 3: Describing Relationships

Conditional Distributions and the Bivariate Normal Distribution. James H. Steiger

HW 3.2: page 193 #35-51 odd, 55, odd, 69, 71-78

STATISTICS INFORMED DECISIONS USING DATA

bivariate analysis: The statistical analysis of the relationship between two variables.

CHAPTER 3 Describing Relationships

3.4 What are some cautions in analyzing association?

Lab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups.

AP Statistics Practice Test Ch. 3 and Previous

CHILD HEALTH AND DEVELOPMENT STUDY

Chapter 3 Review. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Unit 1 Exploring and Understanding Data

Key Concept. Chapter 9 Inferences from Two Samples. Requirements. November 18, S9.5_3 Comparing Variation in Two Samples

12.1 Inference for Linear Regression. Introduction

Properties of the F Distribution. Key Concept. Chapter 9 Inferences from Two Samples. Requirements

Pitfalls in Linear Regression Analysis

INTERPRET SCATTERPLOTS

STAT 201 Chapter 3. Association and Regression

6. Unusual and Influential Data

Homework Linear Regression Problems should be worked out in your notebook

A response variable is a variable that. An explanatory variable is a variable that.

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016

Relationships. Between Measurements Variables. Chapter 10. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.

STATISTICS & PROBABILITY

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.

UF#Stats#Club#STA#2023#Exam#1#Review#Packet# #Fall#2013#

1.4 - Linear Regression and MS Excel

Simple Linear Regression the model, estimation and testing

(a) 50% of the shows have a rating greater than: impossible to tell

AP Statistics. Semester One Review Part 1 Chapters 1-5

Chapter 4: Scatterplots and Correlation

Math 124: Module 2, Part II

Frequency Distributions

(a) 50% of the shows have a rating greater than: impossible to tell

MEASURES OF ASSOCIATION AND REGRESSION

SCATTER PLOTS AND TREND LINES

Simple Linear Regression

q3_2 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

Frequency Distributions

Chapter 2 Organizing and Summarizing Data. Chapter 3 Numerically Summarizing Data. Chapter 4 Describing the Relation between Two Variables

Math MidTerm Exam & Math Final Examination STUDY GUIDE Spring 2011

Multiple Regression Analysis

STT 200 Test 1 Green Give your answer in the scantron provided. Each question is worth 2 points.

CHAPTER TWO REGRESSION

Elementary Algebra Sample Review Questions

WELCOME! Lecture 11 Thommy Perlinger

Lab 5a Exploring Correlation

2016 ACP IM-ITE Regression Based Individual Analyses

STATISTICS 201. Survey: Provide this Info. How familiar are you with these? Survey, continued IMPORTANT NOTE. Regression and ANOVA 9/29/2013

Understandable Statistics

Lesson 1: Distributions and Their Shapes

Multiple Choice Questions

Regression. Regression lines CHAPTER 5

Regression CHAPTER SIXTEEN NOTE TO INSTRUCTORS OUTLINE OF RESOURCES

How Faithful is the Old Faithful? The Practice of Statistics, 5 th Edition 1

Examining Relationships Least-squares regression. Sections 2.3

Introduction to regression

CHAPTER ONE CORRELATION

Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies throughout the world

F1: Introduction to Econometrics

Ordinary Least Squares Regression

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

Statistical Methods Exam I Review

14.1: Inference about the Model

MAT Intermediate Algebra - Final Exam Review Textbook: Beginning & Intermediate Algebra, 5th Ed., by Martin-Gay

M 140 Test 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

1 Version SP.A Investigate patterns of association in bivariate data

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc.

ANALYZING BIVARIATE DATA

10. LINEAR REGRESSION AND CORRELATION

TEACHING REGRESSION WITH SIMULATION. John H. Walker. Statistics Department California Polytechnic State University San Luis Obispo, CA 93407, U.S.A.

Class 7 Everything is Related

EXECUTIVE SUMMARY DATA AND PROBLEM

Chapter 1 Where Do Data Come From?

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60

Semester 1 Final Scientific calculators are allowed, NO GRAPHING CALCULATORS. You must show all your work to receive full credit.

IAPT: Regression. Regression analyses

Chapter 3, Section 1 - Describing Relationships (Scatterplots and Correlation)

5 To Invest or not to Invest? That is the Question.

7) Briefly explain why a large value of r 2 is desirable in a regression setting.

Find the coordinates of the labeled points. 1) 1) Identify the quadrant with the given condition. 4) The first coordinate is positive.

Sample Math 71B Final Exam #1. Answer Key

Statistical Tools in Biology

Transcription:

MAT 155 Statistical Analysis Dr. Claude Moore Cape Fear Community College Chapter 10 Correlation and Regression 10 1 Review and Preview 10 2 Correlation 10 3 Regression 10 4 Variation and Prediction Intervals 10 5 Multiple Regression 10 6 Modeling Key Concept In part 1 of this section we find the equation of the straight line that best fits the paired sample data. That equation algebraically describes the relationship between two variables. The best fitting straight line is called a regression line and its equation is called the regression equation. In part 2, we discuss marginal change, influential points, and residual plots as tools for analyzing correlation and regression results. Part 1: Basic Concepts of Regression The regression equation expresses a relationship between x (called the explanatory variable, predictor variable or independent variable) and y ^ (called the response variable or dependent variable). The typical equation of a straight line y = mx + b is expressed in the form y ^ = b 0 + b 1 x, where b 0 is the y intercept and b 1 is the slope. Definitions Regression Equation Given a collection of paired data, the regression equation algebraically describes the relationship between the two variables. Regression Line The graph of the regression equation is called the regression line (or line of best fit, or least squares line). 1

y intercept of regression equation Slope of regression equation Notation for Regression Equation Population Parameter Sample Statistic β 0 b 0 β 1 b 1 Requirements 1. The sample of paired (x, y) data is a random sample of quantitative data. 2. Visual examination of the scatterplot shows that the points approximate a straight line pattern. 3. Any outliers must be removed if they are known to be errors. Consider the effects of any outliers that are not known errors. Equation of the regression line y = β 0 + β 1 x y = b 0 + b 1 x Formula 10 3 Formula 10 4 Formulas for b 0 and b 1 (slope) calculators or computers can compute these values Alternate forms to calculate b 1 and b 0: (y intercept) Special Property The regression line fits the sample points best. Rounding the y intercept b 0 and the Slope b 1 Round to three significant digits. If you use the formulas 10 3 and 10 4, do not round intermediate values. 2

Refer to the sample data given in Table 10 1 in the Chapter Problem. Use technology to find the equation of the regression line in which the explanatory variable (or x variable) is the cost of a slice of pizza and the response variable (or y variable) is the corresponding cost of a subway fare. Requirements are satisfied: simple random sample; scatterplot approximates a straight line; no outliers Here are results from four different technologies technologies All of these technologies show that the regression equation can be expressed as y ^= 0.0346 +0.945x, where ^y is the predicted cost of a subway fare and x is the cost of a slice of pizza. We should know that the regression equation is an estimate of the true regression equation. This estimate is based on one particular set of sample data, but another sample drawn from the same population would probably lead to a slightly different equation. Graph the regression equation (from the preceding Example) on the scatterplot of the pizza/subway fare data and examine the graph to subjectively determine how well the regression line fits the data. On the next slide is the Minitab display of the scatterplot with the graph of the regression line included. We can see that the regression line fits the data quite well. 3

Using the Regression Equation for Predictions 1. Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well. 2. Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables (as described in Section 10 2). Using the Regression Equation for Predictions Strategy for Predicting Values of Y 3. Use the regression line for predictions only if the data do not go much beyond the scope of the available sample data. (Predicting too far beyond the scope of the available sample data is called extrapolation, and it could result in bad predictions.) 4. If the regression equation does not appear to be useful for making predictions, the best predicted value 4

Using the Regression Equation for Predictions If the regression equation is not a good model, the best predicted value of y is simply y, the mean of the y values. Remember, this strategy applies to linear patterns of points in a scatterplot. If the scatterplot shows a pattern that is not a straightline pattern, other methods apply, as described in Section 10 6. Part 2: Beyond the Basics of Regression Definitions In working with two variables related by a regression equation, the marginal change in a variable is the amount that it changes when the other variable changes by exactly one unit. The slope b 1 in the regression equation represents the marginal change in y that occurs when x changes by one unit. Definitions In a scatterplot, an outlier is a point lying far away from the other data points. Paired sample data may include one or more influential points, which are points that strongly affect the graph of the regression line. Consider the pizza subway fare data from the Chapter Problem. The scatterplot located to the left on the next slide shows the regression line. If we include this additional pair of data: x = 2.00,y = 20.00 (pizza is still $2.00 per slice, but the subway fare is $ 20.00 which means that people are paid $20 to ride the subway), this additional point would be an influential point because the graph of the regression line would change considerably, as shown by the regression line located to the right. 5

Compare the two graphs and you will see clearly that the addition of that one pair of values has a very dramatic effect on the regression line, so that additional point is an influential point. The additional point is also an outlier because it is far from the other points. Definition Residuals For a pair of sample x and y values, the residual is the difference between the observed sample value of y and the y value that is predicted by using the regression equation. That is, residual = observed y predicted y = y y^ 6

Definitions A straight line satisfies the least squares property if the sum of the squares of the residuals is the smallest sum possible. A residual plot is a scatterplot of the (x, y) values after each of the y coordinate values has been replaced by the residual value y y ^ (where y ^ denotes the predicted value of y). That is, a residual plot is a graph of the points (x, y y). ^ Residual Plot Analysis When analyzing a residual plot, look for a pattern in the way the points are configured, and use these criteria: The residual plot should not have an obvious pattern that is not a straight line pattern. The residual plot should not become thicker (or thinner) when viewed from left to right. Residuals Plot Pizza/Subway Residual Plots 7

Residual Plots Residual Plots Complete Regression Analysis 1. Construct a scatterplot and verify that the pattern of the points is approximately a straight line pattern without outliers. (If there are outliers, consider their effects by comparing results that include the outliers to results that exclude the outliers.) 2. Construct a residual plot and verify that there is no pattern (other than a straight line pattern) and also verify that the residual plot does not become thicker (or thinner). Recap In this section we have discussed: The basic concepts of regression. Rounding rules. Using the regression equation for predictions. Interpreting the regression equation. Outliers Residuals and least squares. Residual plots. 3. Use a histogram and/or normal quantile plot to confirm that the values of the residuals have a distribution that is approximately normal. 4. Consider any effects of a pattern over time. 8

Making Predictions. In Exercises 5 8, use the given data to find the best predicted value of the response variable. Be sure to follow the prediction procedure summarized in Figure 10 5. (p. 544) 551/6. Heights of Mothers and Daughters A sample of eight mother daughter pairs of subjects was obtained, and their heights (in inches) were measured. The linear correlation coefficient is 0.693 and the regression equation is ^y = 69.0 0.0849x, where x represents the height of the mother (based on data from the National Health Examination Survey). The mean height of the mothers is 63.1 in. and the mean height of the daughters is 63.3 in. Find the best predicted height of a daughter given that the mother has a height of 60 in. Making Predictions. In Exercises 5 8, use the given data to find the best predicted value of the response variable. Be sure to follow the prediction procedure summarized in Figure 10 5. (p. 544) 552/8. Supermodel Heights and Weights Heights (in inches) and weights (in pounds) are obtained from a random sample of nine supermodels (Alves, Avermann, Hilton, Dyer, Turlington, Hall, Campbell, Mazza, and Hume). The linear correlation coefficient is 0.360 and the equation of the regression line is ^y = 31.8 + 1.23x, where x represents height. The mean of the nine heights is 69.3 in. and the mean of the nine weights is 117 lb. What is the best predicted weight of a supermodel with a height of 72 in.? Finding the Equation of the Regression Line and Making Predictions. Exercises 13 28 use the same data sets as Exercises 13 28 in Section 10 2. In each case, find the regression equation, letting the first variable be the predictor (x) variable. Find the indicated predicted value by following the prediction procedure summarized in Figure 10 5. 552/14. CPI and Subway Fare Find the best predicted cost of subway fare when the Consumer Price Index (CPI) is 182.5 (in the year 2000). CPI 30.2 48.3 112.3 162.2 191.9 197.8 Subway Fare 0.15 0.35 1.00 1.35 1.50 2.00 ^ Finding the Equation of the Regression Line and Making Predictions. Exercises 13 28 use the same data sets as Exercises 13 28 in Section 10 2. In each case, find the regression equation, letting the first variable be the predictor (x) variable. Find the indicated predicted value by following the prediction procedure summarized in Figure 10 5. 553/16. Heights of Presidents and Runners Up Find the best predicted height of runner up Goldwater, given that the height of the winning presidential candidate Johnson is 75 in. Is the predicted height of Goldwater close to his actual height of 72 in.? Winner 69.5 73 73 74 74.5 74.5 71 71 Runner Up 72 69.5 70 68 74 74 73 76 ^ 9

Finding the Equation of the Regression Line and Making Predictions. Exercises 13 28 use the same data sets as Exercises 13 28 in Section 10 2. In each case, find the regression equation, letting the first variable be the predictor (x) variable. Find the indicated predicted value by following the prediction procedure summarized in Figure 10 5. 553/20. Commuters and Parking Spaces The Metro North Station of Greenwich, CT has 2804 commuters. Find the best predicted number of parking spots at that station. Is the predicted value close to the actual value of 1274? Commuters 3453 1350 1126 3120 2641 277 579 2532 Parking Spots 1653 676 294 950 1216 179 466 1454 Finding the Equation of the Regression Line and Making Predictions. Exercises 13 28 use the same data sets as Exercises 13 28 in Section 10 2. In each case, find the regression equation, letting the first variable be the predictor (x) variable. Find the indicated predicted value by following the prediction procedure summarized in Figure 10 5. 553/22. New Car Mileage Ratings Find the best predicted new mileage rating of a Jeep Grand Cherokee given that the old rating is 19 mi/gal. Is the predicted value close to the actual value of 17 mi/gal? Old 16 27 17 33 28 24 18 22 20 29 21 New 15 24 15 29 25 22 16 20 18 26 19 ^ ^ Finding the Equation of the Regression Line and Making Predictions. Exercises 13 28 use the same data sets as Exercises 13 28 in Section 10 2. In each case, find the regression equation, letting the first variable be the predictor (x) variable. Find the indicated predicted value by following the prediction procedure summarized in Figure 10 5. 554/24. Costs of Televisions Find the best predicted quality score of a Hitachi television with a price of $ 1900. Is the predicted quality score close to the actual quality score of 56? Price 2300 1800 2500 2700 2000 1700 1500 2700 Quality Score 74 73 70 66 63 62 52 68 Finding the Equation of the Regression Line and Making Predictions. Exercises 13 28 use the same data sets as Exercises 13 28 in Section 10 2. In each case, find the regression equation, letting the first variable be the predictor (x) variable. Find the indicated predicted value by following the prediction procedure summarized in Figure 10 5. 554/26. Crickets and Temperature Find the best predicted temperature (in F) at a time when a cricket chirps 3000 times in one minute. What is wrong with this predicted value? Chirps in 1 min 882 1188 1104 864 1200 1032 960 900 Temperature ( F) 69.7 93.3 84.3 76.3 88.6 82.6 71.6 79.6 ^ 10