Chapter 3: Describing Relationships

Similar documents
Section 3.2 Least-Squares Regression

Chapter 3: Examining Relationships

3.2 Least- Squares Regression

CHAPTER 3 Describing Relationships

3.2A Least-Squares Regression

Chapter 1: Exploring Data

This means that the explanatory variable accounts for or predicts changes in the response variable.

Chapter 3 CORRELATION AND REGRESSION

Chapter 4: Scatterplots and Correlation

1.4 - Linear Regression and MS Excel

STAT 201 Chapter 3. Association and Regression

STATISTICS & PROBABILITY

Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Examining Relationships Least-squares regression. Sections 2.3

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

STATISTICS INFORMED DECISIONS USING DATA

Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Unit 1 Exploring and Understanding Data

Understandable Statistics

AP Statistics Practice Test Ch. 3 and Previous

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

SCATTER PLOTS AND TREND LINES

STATS Relationships between variables: Correlation

A response variable is a variable that. An explanatory variable is a variable that.

How Faithful is the Old Faithful? The Practice of Statistics, 5 th Edition 1

Still important ideas

Chapter 3, Section 1 - Describing Relationships (Scatterplots and Correlation)

The North Carolina Health Data Explorer

Business Statistics Probability

1 Version SP.A Investigate patterns of association in bivariate data

Simple Linear Regression the model, estimation and testing

Reminders/Comments. Thanks for the quick feedback I ll try to put HW up on Saturday and I ll you

AP Statistics. Semester One Review Part 1 Chapters 1-5

Pitfalls in Linear Regression Analysis

INTERPRET SCATTERPLOTS

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

6. Unusual and Influential Data

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies throughout the world

12.1 Inference for Linear Regression. Introduction

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

bivariate analysis: The statistical analysis of the relationship between two variables.

Homework #3. SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

Unit 8 Day 1 Correlation Coefficients.notebook January 02, 2018

Population. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis:

AP Stats Chap 27 Inferences for Regression

Chapter 4: More about Relationships between Two-Variables

Statistics for Psychology

LINEAR REGRESSION TI-83 QUICK REFERENCE

Regression Equation. November 29, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60

Chapter 4. More On Bivariate Data. More on Bivariate Data: 4.1: Transforming Relationships 4.2: Cautions about Correlation

Lesson 1: Distributions and Their Shapes

AP STATISTICS 2010 SCORING GUIDELINES

Math 075 Activities and Worksheets Book 2:

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

Chapter 4: More about Relationships between Two-Variables Review Sheet

CHAPTER TWO REGRESSION

Chapter 14. Inference for Regression Inference about the Model 14.1 Testing the Relationship Signi!cance Test Practice

STATISTICS 201. Survey: Provide this Info. How familiar are you with these? Survey, continued IMPORTANT NOTE. Regression and ANOVA 9/29/2013

Statistics and Probability

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

IAPT: Regression. Regression analyses

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

Lecture 12 Cautions in Analyzing Associations

Introduction to regression

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Caffeine & Calories in Soda. Statistics. Anthony W Dick

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016

MATH 2560 C F03 Elementary Statistics I LECTURE 6: Scatterplots (Continuation).

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604

Further Mathematics 2018 CORE: Data analysis Chapter 3 Investigating associations between two variables

Dr. Allen Back. Sep. 30, 2016

14.1: Inference about the Model

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

An Introduction to Statistical Thinking Dan Schafer Table of Contents

Chapter 3 Review. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Chapter 1: Explaining Behavior

Probability and Statistics. Chapter 1

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

Exemplar for Internal Assessment Resource Mathematics Level 3. Resource title: Sport Science. Investigate bivariate measurement data

Biostatistics II

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Biology 345: Biometry Fall 2005 SONOMA STATE UNIVERSITY Lab Exercise 5 Residuals and multiple regression Introduction

isc ove ring i Statistics sing SPSS

10. LINEAR REGRESSION AND CORRELATION

STAT 135 Introduction to Statistics via Modeling: Midterm II Thursday November 16th, Name:

Section 3 Correlation and Regression - Teachers Notes

3.4 What are some cautions in analyzing association?

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Conditional Distributions and the Bivariate Normal Distribution. James H. Steiger

CHILD HEALTH AND DEVELOPMENT STUDY

M 140 Test 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

Muscle Size & Strength

Relationships. Between Measurements Variables. Chapter 10. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.

IAS 3.9 Bivariate Data

Introduction to Statistical Data Analysis I

Measuring the User Experience

Multiple Linear Regression Analysis

Transcription:

Chapter 3: Describing Relationships Objectives: Students will: Construct and interpret a scatterplot for a set of bivariate data. Compute and interpret the correlation, r, between two variables. Demonstrate an understanding of the basic properties of the correlation r. Explain the meaning of a least squares regression line. Given a bivariate data set, construct and interpret a regression line. Demonstrate an understanding of how one measures the quality of a regression line as a model for bivariate data. AP Outline Fit: I. Exploring Data: Describing patterns and departures from patterns (20% 30%) D. Exploring bivariate data 1. Analyzing patterns in scatterplots 2. Correlation and linearity 3. Least-squares regression line 4. Residual plots, outliers, and influential points What you will learn: A. Data 1. Recognize whether each variable is quantitative or categorical 2. Indentify the explanatory and response variables in situations where one variable explains or influences another B. Scatterplots 1. Make a scatterplot to display the relationship between two quantitative variables. Place the explanatory variable (if any) on the horizontal scale of the plot 2. Add a categorical variable to a scatterplot by using a different plotting symbol or color 3. Describe the direction, form and strength of the overall pattern of a scatterplot. In particular, recognize positive or negative association and linear (straight-line) patterns. Recognize outliers in a scatterplot C. Correlation 1. Using a calculator, find the correlation r between two quantitative variables 2. Know the basic properties of correlation: a. r measures the strength and direction of linear relationships only b. -1 r 1 always c. r = ± 1 only for perfect straight line relations d. r moves away from 0 toward ±1 as the linear relationship gets stronger D. Regression Lines 1. Explain what the slope, b, and the y-intercept, a, mean in the regression line equation 2. Using a calculator, find the least-squares regression line for predicting values of a response variable, y, from an explanatory variable, x, from the data 3. Find the slope and intercept of the least-squares regression line for the means and standard deviations of x and y and their correlation 4. Use the regression line to predict y for a given x. Recognize extrapolation and be aware of its dangers E. Assessing Model Quality 1. Calculate the residuals and plot them against the explanatory variable x or against other variables. Recognize unusual patterns 2. Use r² to describe how much of the variation in one variable can be accounted for by a straight-line relationship with another variable 3. Recognize outliers and potentially influential observations from a scatterplot with the regression line drawn on it F. Interpreting Correlation and Regression 1. Understand that both r and the least-squares regression line can be strongly influenced by a few extreme observations 2. Recognize possible lurking variables that may explain the correlation between two variables x and y 1

Section 3.1: Scatterplots and Correlation Chapter 3: Describing Relationships Objectives: Students will be able to: Distinguish between explanatory and response variables for quantitative data Make a scatterplot to display the relationship between two quantitative variables Describe the direction, form, and strength of a relationship displayed in a scatterplot and identify unusual features Interpret the correlation Understand the basic properties of correlation, including how the correlation is influenced by outliers Distinguish correlation from causation Vocabulary: Bivariate data data that has two variables involved with each point Categorical Variables variables to which arithmetic operations make no sense Correlation (r) for only linear associations, it measures the direction and strength of the associaton Cluster a group of points distinct from other points in the scatterplot Explanatory variable a variable that may help predict or explain changes in a response variable Negative Association when above average values of one variable tend to accompany below average values of the other variable; decreasing left to right; negative slope No Association knowing the value of one variable does not help us predict the value of the other variable Outlier an individual value that falls outside the overall pattern of the relationship Positive Association when above average values of one variable tend to accompany above average values of the other variable (and when below average values also tend to occur together); increasing left to right; positive slope Response variable a variable that is measures an outcome of a study Scatterplot shows the relationship between two quantitative variables measured on the same individuals Scatterplot Direction positive (increasing left to right) or negative (decreasing left to right) association Scatterplot Form drawing a single line to represent the data (linear, curved, exponential, etc) Scatterplot Strength how closely the points follow a clear form (weak, moderately weak, moderately strong, strong) Key Concepts: Statistical Analysis: Plot the data and add numerical summaries Look for overall patterns and deviations from those patterns When there s a regular overall pattern, use a simplified model to describe it Scatter plots should be described by Direction positive association (positive slope left to right) negative association (negative slope left to right) Form linear straight line, curved quadratic, cubic, etc, exponential, etc Strength of the form weak moderate (either weak or strong) strong Outliers (any points not conforming to the form) Clusters (any sub-groups not conforming to the form) Linear Correlation Coefficient, r 1 r = ------ n 1 Σ (x i x) (y i y) ---------- ---------- s x s y Where x is the sample mean of the explanatory variable s x is the sample standard deviation for x y is the sample mean of the response variable s y is the sample standard deviation for y n is the number of individuals in the sample 2

Example 1: Identify the explanatory and response variable in each setting: A) In a study, adult volunteers drank different numbers of cans of beer. Thirty minutes later, a police officer measured their blood alcohol levels. B) The National Student Loan Survey provides data on the amount of debt for recent college graduates, their current income, and how stressed the feel about college debt. A sociologist looks at the data with the goal of using amount of debt and income to explain the stress caused by college debt. Example 2: Describe each of these scatterplots: 3

Example 3: Describe the following scatterplots: Chapter 3: Describing Relationships Example 4: Describe the scatterplot to the right: Example 5: Describe the scatterplot to the left: Example 6: Draw a scatterplot and find r for the following data: 1 2 3 4 5 6 7 8 9 10 11 12 x 3 2 2 4 5 15 22 13 6 5 4 1 y 0 1 2 1 2 9 16 5 3 3 1 0 Example 7: Match the r values to the Scatterplots on page 2 of the notes 1) r = -0.99 2) r = -0.7 3) r = -0.3 4) r = 0 5) r = 0.5 6) r = 0.9 Homework: pg. 104-09, probs 1, 3, 5, 19, 23 4

Section 3.2: Least-Squares Regression Chapter 3: Describing Relationships Objectives: Students will be able to: Make predictions using regression lines, keeping in mind the dangers of extrapolation Calculate and interpret a residual Interpret the slope and y-intercept of a regression line Determine the equation of a least-squares regression line using technology or computer output Construct and interpret residual plots to assess whether a regression model is appropriate Interpret the standard deviation of the residuals and r² and use these values to assess how well a least-squares regression line models the relationship between two variables Describe how the least-squares regression line, standard deviation of the residuals, and r² are influenced by outliers Find the slope and y-intercept of the least-squares regression line from the means and standard deviations of x and y and their correlation Vocabulary: b 0 the y-intercept, the predicted value of y when x = 0 b 1 the slope, the amount by which the predicted value of y changes when x increases by 1 unit Coefficient of Determination (r 2 ) measures the percentage of total variation in the response variable that is explained by the least-squares regression line. Extrapolation using a regression line for prediction far outside the interval of x values used to obtain the line. Influential Observation observation that significantly affects the value of the slope Least-squares regression line line that makes the sum of the squared residuals as small as possible Regression Line a line that describes how a response variable y changes as an explanatory variable x changes; expressed in the form y = b 0 + b 1 x, where y-hat is the predicted value for y, given a value of x Residual difference between the actual value of y and the value of y predicted by the regression line; residual = actual y predicted y = y y Residual plot a scatterplot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis Standard deviation of the residuals, s measures the size of a typical residual; the typical distance between the actual y values and the predicted y values Key Concepts: Least Squares Regression Line LSR line minimizes the square of the residuals (difference between predicted and observed). 5

Example 1: a) Describe the scatterplot b) Guess at the line of best fit c) c) Using your calculator do the scatterplot for this data, checking it against the plot in your notes d) d) Again using your calculator (1-VarStats) calculate the LS regression line using the formula (r = -0.7786) e) Now use you calculator to calculate the LS regression line, r and r² f) Calculate r² using the formulas Residuals Part Two: Describe the residuals in the following scatterplots and state whether they represent good linear fits A) B) C) 6

Key Concepts: Least Squares Regression Facts: The distinction between explanatory and response variable is essential in regression There is a close connection between correlation and the slope of the LS line The LS line always passes through the point (x-bar, y-bar) The square of the correlation, r², is the fraction of variation in the values of y that is explained by the LS regression of y on x Outliers vs. Influential Observations: Outlier is an observation that lies outside the overall pattern of the other observations Outliers in the Y direction will have large residuals. but may not influence the slope of the regression line Outliers in the X direction are often influential observations Influential observation is one that if by removing it, it would markedly change the result of the regression calculation Since lurking variables are often misused by AP Stats students, using the words extraneous variables is a better bet. It does not carry the specific meanings that lurking does to the AP Stat test readers. Other Items Remember association does not mean causation! Instances of Rocky Mt spotted fever and drownings reported per month are highly correlated, but completely without causation Individual versus Summarized Data: When we looked at individual values, they had much broader spreads (variances) than when we looked at the distributions of x-bar Same is true with correlations based on averaged data strong correlations may exist between averages, but individuals will have much greater variances Correlations based on averages are usually too high when applied to individuals. Example 1: Does the age at which a child begins to talk predict later score on a test of metal ability? A study of the development of 21 children recorded the age in months at which they spoke their first word and their later Gesell Adaptive Score (GAS). Child Age GAS Child Age GAS Child Age GAS 1 15 95 8 11 100 15 11 102 2 26 71 9 8 104 16 10 100 3 10 83 10 20 94 17 12 105 4 9 91 11 7 113 18 42 57 5 15 102 12 9 96 19 17 121 6 20 87 13 10 83 20 11 86 7 18 93 14 11 84 21 10 100 a) What is the equation of the LS regression line used to model this data? b) What is the interpretation of this data? 7

c) Are there any outliers? d) Are there any influential observations? Homework: pg. 138-44, probs: 37, 41, 55, 59, 67 8

Chapter 3: Review Objectives: Students will be able to: Summarize the chapter Define the vocabulary used Know and be able to discuss all sectional knowledge objectives Complete all sectional construction objectives Successfully answer any of the review exercises Construct and interpret a scatterplot for a set of bivariate data. Compute and interpret the correlation, r, between two variables. Demonstrate an understanding of the basic properties of the correlation r. Explain the meaning of a least squares regression line. Given a bivariate data set, construct and interpret a regression line. Demonstrate an understanding of how one measures the quality of a regression line as a model for bivariate data. Vocabulary: None new TI-83 Instructions for Scatter Plots Enter explanatory variable in L1 Enter response variable in L2 Press 2 nd y= for StatPlot, select 1: Plot1 Turn plot1 on by highlighting ON and enter Highlight the scatter plot icon and enter Press ZOOM and select 9: ZoomStat TI-83 Instructions for Linear Correlation, r With explanatory variable in L1 and response variable in L2 Turn diagnostics on by Go to catalog (2 nd 0) Scroll down and when diagnosticon is highlighted, hit enter twice Press STAT, highlight CALC and select 4: LinReg (ax + b) and hit enter twice Read r value (last line) TI-83 Instructions for Linear Regression and Plotting the Regression Line 2 nd 0 (Catalog); scroll down to DiagnosticON and press Enter twice (like Catalog help do once) Enter X data into L1 and Y data into L2 Define a scatterplot using L1 and L2 Use ZoomStat to see the data properly Press STAT, choose CALC, scroll to LinReg(a+bx) Enter LinReg(a+bx)L1,L2,Y1 Y1 is found under VARS / Y-VARS / 1: function TI-83 Instructions for Residuals and SSE After getting the scatterplot (plot1) and the LS regression line as before Define L3 = Y1(L1) [remember how we got Y1!!] Define L4 = L2 L3 [actual predicted] Turn off Plot1 and deselect the regression eqn (Y=) With Plot2, plot L1 as x and L4 as y Use 1-VarStat L4 to find sum of residuals squared Homework: R2.7-9 9

5 Minute Reviews: Lesson 3-1a: 1. Describe each scatterplot 2. Identify the explanatory and response variables A study observes a large group of people over a 10-year period. The goal is to see if overweight and obese people are more likely to die during the study than people who weigh less. Such studies can be misleading because obese people are more likely to be inactive and poor. 3. Could we conclude that increase weight causes greater risk of dying if the study reveals a strong positive correlation? Lesson 3-1b: 1. Are correlations are resistant to outliers? 2. When we change units of measure for one of the variables in a correlation, we have to recalculate the correlation, r. 3. Perfect negative correlation is. 4. An extremely strong correlation proves a variable caused a reaction in another variable. 5. How do we get the linear regression line to display on the scatter plot? 10

Residua Residual Residual Residua Chapter 3: Describing Relationships Lesson 3-2a: 1. Label the following graph with interpolation and extrapolation areas 2. A regression line is a mathematical or statistical relationship? 3. Write the definition of residuals: 4. What does the Least-square regression line minimize? Lesson 3-2b: A. 1. Describe each residual plot B. C. D. Explanatory Explanatory Explanatory Explanatory 2. What does a positive residual mean? A negative residual? 3. Define what r 2 is. 11