Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Similar documents
Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Chapter 3 CORRELATION AND REGRESSION

Section 3.2 Least-Squares Regression

Chapter 3: Examining Relationships

3.2 Least- Squares Regression

CHAPTER 3 Describing Relationships

STAT 201 Chapter 3. Association and Regression

Regression Equation. November 29, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

AP Statistics Practice Test Ch. 3 and Previous

Math 075 Activities and Worksheets Book 2:

Lecture 10: Chapter 5, Section 2 Relationships (Two Categorical Variables)

Chapter 3 Review. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

HW 3.2: page 193 #35-51 odd, 55, odd, 69, 71-78

Chapter 3: Describing Relationships

12.1 Inference for Linear Regression. Introduction

Chapter 1: Exploring Data

STATISTICS INFORMED DECISIONS USING DATA

Example The median earnings of the 28 male students is the average of the 14th and 15th, or 3+3

Unit 1 Exploring and Understanding Data

3.2A Least-Squares Regression

Chapter 4: Scatterplots and Correlation

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.

10/4/2007 MATH 171 Name: Dr. Lunsford Test Points Possible

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Chapter 3, Section 1 - Describing Relationships (Scatterplots and Correlation)

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

INTERPRET SCATTERPLOTS

Homework #3. SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

Examining Relationships Least-squares regression. Sections 2.3

14.1: Inference about the Model

Relationships. Between Measurements Variables. Chapter 10. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.

How Faithful is the Old Faithful? The Practice of Statistics, 5 th Edition 1

7) Briefly explain why a large value of r 2 is desirable in a regression setting.

REVIEW PROBLEMS FOR FIRST EXAM

Chapter 14. Inference for Regression Inference about the Model 14.1 Testing the Relationship Signi!cance Test Practice

bivariate analysis: The statistical analysis of the relationship between two variables.

Homework 2 Math 11, UCSD, Winter 2018 Due on Tuesday, 23rd January

1.4 - Linear Regression and MS Excel

CHAPTER ONE CORRELATION

MATH 2560 C F03 Elementary Statistics I LECTURE 6: Scatterplots (Continuation).

Practice First Midterm Exam

(a) 50% of the shows have a rating greater than: impossible to tell

IAPT: Regression. Regression analyses

NORTH SOUTH UNIVERSITY TUTORIAL 2

Homework Linear Regression Problems should be worked out in your notebook

Lab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups.

Business Statistics Probability

MEASURES OF ASSOCIATION AND REGRESSION

Reminders/Comments. Thanks for the quick feedback I ll try to put HW up on Saturday and I ll you

Scatter Plots and Association

Math 124: Module 2, Part II

Statistical Methods Exam I Review

Correlation and regression

M 140 Test 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

UF#Stats#Club#STA#2023#Exam#1#Review#Packet# #Fall#2013#

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60

6. Unusual and Influential Data

Further Mathematics 2018 CORE: Data analysis Chapter 3 Investigating associations between two variables

Eating and Sleeping Habits of Different Countries

Chapter 4: More about Relationships between Two-Variables Review Sheet

Introduction to regression

Lecture 12 Cautions in Analyzing Associations

STATISTICS & PROBABILITY

3. For a $5 lunch with a 55 cent ($0.55) tip, what is the value of the residual?

(a) 50% of the shows have a rating greater than: impossible to tell

Understandable Statistics

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016

Lecture 12A: Chapter 9, Section 1 Inference for Categorical Variable: Confidence Intervals

APPENDIX N. Summary Statistics: The "Big 5" Statistical Tools for School Counselors

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

CHILD HEALTH AND DEVELOPMENT STUDY

Regression. Lelys Bravo de Guenni. April 24th, 2015

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

Chapter 4: More about Relationships between Two-Variables

3.4 What are some cautions in analyzing association?

STATISTICS 201. Survey: Provide this Info. How familiar are you with these? Survey, continued IMPORTANT NOTE. Regression and ANOVA 9/29/2013

Test 1C AP Statistics Name:

This means that the explanatory variable accounts for or predicts changes in the response variable.

Multiple Regression Analysis

Simple Linear Regression the model, estimation and testing

Chapter 2--Norms and Basic Statistics for Testing

Multiple Choice Questions

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

STAT 135 Introduction to Statistics via Modeling: Midterm II Thursday November 16th, Name:

Observational studies; descriptive statistics

STAT 503X Case Study 1: Restaurant Tipping

SCATTER PLOTS AND TREND LINES

MULTIPLE REGRESSION OF CPS DATA

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

Statistics and Probability

What is Data? Part 2: Patterns & Associations. INFO-1301, Quantitative Reasoning 1 University of Colorado Boulder

Correlation & Regression Exercises Chapters 14-15

Still important ideas

Variability. After reading this chapter, you should be able to do the following:

Regression. Regression lines CHAPTER 5

CCM6+7+ Unit 12 Data Collection and Analysis

Ordinary Least Squares Regression

Welcome to OSA Training Statistics Part II

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Regression CHAPTER SIXTEEN NOTE TO INSTRUCTORS OUTLINE OF RESOURCES

Transcription:

Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression Equation of Regression Line; Residuals Effect of Explanatory/Response Roles Unusual Observations Sample vs. Population Time Series; Additional Variables Cengage Learning Elementary Statistics: Looking at the Big Picture 1

Looking Back: Review 4 Stages of Statistics Data Production (discussed in Lectures 1-4) Displaying and Summarizing Single variables: 1 cat,1 quan (discussed Lectures 5-8) Relationships between 2 variables: Categorical and quantitative (discussed in Lecture 9) Two categorical (discussed in Lecture 10) Two quantitative Probability Statistical Inference Elementary Statistics: Looking at the Big Picture L12.2

Review Relationship between 2 quantitative variables Display with scatterplot Summarize: Form: linear or curved Direction: positive or negative Strength: strong, moderate, weak If form is linear, correlation r tells direction and strength. Also, equation of least squares regression line lets us predict a response for any explanatory value x. Elementary Statistics: Looking at the Big Picture L12.3

Least Squares Regression Line Summarize linear relationship between explanatory (x) and response (y) values with line that minimizes sum of squared prediction errors (called residuals). Slope: predicted change in response y for every unit increase in explanatory value x Intercept: where best-fitting line crosses y-axis (predicted response for x=0?) Elementary Statistics: Looking at the Big Picture L12.4

Example: Least Squares Regression Line Background: Car-buyer used software to regress price on age for 14 used Grand Am s. Question: What do the slope (-1,288) and intercept (14,690) tell us? Response: Slope: For each additional year in age, predict price Intercept: Best-fitting line Elementary Statistics: Looking at the Big Picture L12.6

Example: Extrapolation Background: Car-buyer used software to regress price on age for 14 used Grand Am s. Question: Should we predict a new Grand Am to cost $14,690-$1,288(0)=$14,690? Response: Elementary Statistics: Looking at the Big Picture L12.8

Definition Extrapolation: using the regression line to predict responses for explanatory values outside the range of those used to construct the line. Elementary Statistics: Looking at the Big Picture L12.9

Example: More Extrapolation Background: A regression of 17 male students weights (lbs.) on heights (inches) yields the equation Question: What weight does the line predict for a 20-inch-long infant? Response: Elementary Statistics: Looking at the Big Picture L12.11

Expressions for slope and intercept Consider slope and intercept of the least squares regression line Slope: so if x increases by a standard deviation, predict y to increase by r standard deviations Intercept: so when predict the line passes through the point of averages Elementary Statistics: Looking at the Big Picture L12.13

Example: Individual Summaries on Scatterplot Background: Car-buyer plotted price vs. age for 14 used Grand Ams [(4, 13,000), (8, 4,000), etc.] Question: Guess the means and sds of age and price? Response: Age has approx. mean yrs, sd yrs; price has approx. mean $, sd $. Elementary Statistics: Looking at the Big Picture L12.17

Definitions Residual: error in using regression line to predict y given x. It equals the vertical distance observed minus predicted which can be written s: denotes typical residual size, calculated as Note: s just averages out the residuals Elementary Statistics: Looking at the Big Picture L12.18

Example: Considering Residuals Background: Car-buyer regressed price on age for 14 used Grand Ams [(4, 13,000), (8, 4,000), etc.]. Question: What does s = 2,175 tell us? Response: Regression line predictions not perfect: x=4predict = actual y=13,000prediction error = x=8predict = actual y=4,000prediction error = Typical size of 14 prediction errors is (dollars) Elementary Statistics: Looking at the Big Picture L12.20

Example: Considering Residuals Typical size of 14 prediction errors is s = 2,175 (dollars): Some points vertical distance from line more, some less; 2,175 is typical distance. Elementary Statistics: Looking at the Big Picture L12.21

Example: Residuals and their Typical Size s Background: For a sample of schools, regressed average Math SAT on average Verbal SAT average Math SAT on % of teachers w. advanced degrees Question: How are s = 7.08 (left) and s = 26.2 (right) consistent with the values of the correlation r? Response: On left ; relation is and typical error size is (only 7.08). A Closer Look: If output reports R-sq, take its square root (+ or - depending on slope) to find r. Elementary Statistics: Looking at the Big Picture L12.23

Example: Residuals and their Typical Size s Background: For a sample of schools, regressed average Math SAT on average Verbal SAT Smaller sbetter predictions average Math SAT on % of teachers w. advanced degrees Looking Back: r based on averages is overstated; strength of relationship for individual students would be less. Question: How are s = 7.08 (left) and s = 26.2 (right) consistent with the values of the correlation r? Response: On right r = ; relation is and typical error size is (26.2). Elementary Statistics: Looking at the Big Picture L12.25

Example: Typical Residual Size s close to or 0 Background: Scatterplots show relationships Price per kilogram vs. price per lb. for groceries Students final exam score vs. (number) order handed in Regression line approx. same as line at average y-value. Questions: Which has s = 0? Which has s close to? Responses: Plot on left has s = : no prediction errors. Plot on right: s close to. (Regressing on x doesn t help; regression line is approximately horizontal.) Elementary Statistics: Looking at the Big Picture L12.27

Example: Typical Residual Size s close to Background: 2008-9 Football Season Scores Regression Analysis: Steelers versus Opponents The regression equation is Steelers = 23.5-0.053 Opponents S = 9.931 Descriptive Statistics: Steelers Variable N Mean Median TrMean StDev SE Mean Steelers 19 22.74 23.00 22.82 9.66 2.22 Variable Minimum Maximum Q1 Q3 Steelers 6.00 38.00 14.00 31.00 Question: Since s = 9.931 and expect r close to 0 or 1? Response: r must be close to = 9.66 are very close, do you Elementary Statistics: Looking at the Big Picture L12.29

Explanatory/Response Roles in Regression Our choice of roles, explanatory or response, does not affect the value of the correlation r, but it does affect the regression line. Elementary Statistics: Looking at the Big Picture L12.30

Example: Regression Line when Roles are Switched Background: Compare regression of y on x (left) and regression of x on y (right) for same 4 points: Question: Do we get the same line regressing y on x as we do regressing x on y? Context needed; Response: The lines are very different. consider variables Regressing y on x: slope and their roles Regressing x on y: slope before regressing. Elementary Statistics: Looking at the Big Picture L12.32

Definitions Outlier: (in regression) point with unusually large residual Influential observation: point with high degree of influence on regression line. Elementary Statistics: Looking at the Big Picture L12.33

Example: Outliers and Influential Observations Background: Exploring relationship between orders for new planes and fleet size. (r=+0.69) Question: Are Southwest and JetBlue outliers or influential? Response: Southwest: (omit itslope changes a lot) JetBlue: (large residual; omit itr increases to +0.97) Elementary Statistics: Looking at the Big Picture L12.35

Example: Outliers and Influential Observations Background: Exploring relationship between orders for new planes and fleet size. (r = +0.69) Question: How does Minitab classify Southwest and JetBlue? Response: Southwest: (marked in Minitab) JetBlue: (marked in Minitab) Influential observations tend to be extreme in horizontal direction. Elementary Statistics: Looking at the Big Picture L12.37

Definitions Slope : how much response y changes in general (for entire population) for every unit increase in explanatory variable x Intercept : where the line that best fits all explanatory/response points (for entire population) crosses the y-axis Looking Back: Greek letters often refer to population parameters. Elementary Statistics: Looking at the Big Picture L12.38

Line for Sample vs. Population Sample: line best fitting sampled points: predicted response is Population: line best fitting all points in population from which given points were sampled: mean response is A larger sample helps provide more evidence of a relationship between two quantitative variables in the general population. Elementary Statistics: Looking at the Big Picture L12.39

Example: Role of Sample Size Background: Relationship between ages of students mothers and fathers both have r = +0.78, but sample size is over 400 (on left) or just 5 (on right): Question: Which plot provides more evidence of strong positive relationship in population? Response: Plot on Can believe configuration on occurred by chance. Elementary Statistics: Looking at the Big Picture L12.41

Time Series If explanatory variable is time, plot one response for each time value and connect the dots to look for general trend over time, also peaks and troughs. Elementary Statistics: Looking at the Big Picture L12.42

Example: Time Series Background: Time series plot shows average daily births each month in year 2000 in the U.S.: Question: Where do you see a peak or a trough? Response: Trough in, peak in Elementary Statistics: Looking at the Big Picture L12.44

Example: Time Series Background: Time series plot of average daily births in U.S. Questions: How can we explain why there are Conceptions in U.S.: fewer in July, more in December? Conceptions in Europe: more in summer, fewer in winter? Response: A Closer Look: Statistical methods can t always explain why, but at least they help understand what is going on. Elementary Statistics: Looking at the Big Picture L12.46

Additional Variables in Regression Confounding Variable: Combining two groups that differ with respect to a variable that is related to both explanatory and response variables can affect the nature of their relationship. Multiple Regression: More advanced treatments consider impact of not just one but two or more quantitative explanatory variables on a quantitative response. Elementary Statistics: Looking at the Big Picture L12.47

Example: Additional Variables Background: A regression of phone time (in minutes the day before) and weight shows a negative relationship. Questions: Do heavy people talk on the phone less? Do light people talk more? Response: is confounding variableregress separately for no relationship Elementary Statistics: Looking at the Big Picture L12.49

Example: Multiple Regression Background: We used a car s age to predict its price. Question: What additional quantitative variable would help predict a car s price? Response: Elementary Statistics: Looking at the Big Picture L12.51

Lecture Summary (Regression) Equation of regression line Interpreting slope and intercept Extrapolation Residuals: typical size is s Line affected by explanatory/response roles Outliers and influential observations Line for sample or population; role of sample size Time series Additional variables Elementary Statistics: Looking at the Big Picture L12.52