Chapter 3: Examining Relationships

Name ____ Date ____ Per ____

Key Vocabulary: response variable, explanatory variable, independent variable, dependent variable, scatterplot, positive association, negative association, linear correlation, r-value, regression line, mathematical model, least-squares regression line, ŷ (y-hat), SSM, SSE, r² (coefficient of determination), residuals, residual plot, influential observation

Calculator Skills: seq(x,x,min,max,scl); x̄, s_x, ȳ, s_y; 2-Var Stats; sum; Clear All Lists; residual plot; Diagnostic On

Use a separate sheet of paper to answer the questions, if more space is needed.

3.1 Scatterplots

1. (p.118) Sir Francis Galton (____ - ____), an English statistician related to American, invented the words ____ and ____.
2. What is the difference between a response variable and an explanatory variable?
3. How are response and explanatory variables related to dependent and independent variables?
4. Is it proper to use the terms response variable and explanatory variable if the explanatory variable does not actually cause the response variable?
5. What is the order of tasks involved in examining relationships between two variables? (p.122)
6. A scatterplot shows the relationship between two ____ variables measured on the ____ individuals.

7. True or false: In a scatterplot, each point represents one individual; the x-coordinate of the point represents the value of one variable and the y-coordinate represents the value of another variable measured on that same individual.
8. Suppose that someone has math scores for the children in one classroom, and English scores for a second set of children in another classroom. The person asks you about making a scatterplot for these data. What would you say?
9. Which variable always appears on the horizontal axis of a scatterplot?
10. When describing a scatterplot, to what three aspects of the pattern should you refer?
11. True or false: In describing the form of a scatterplot, it is important to say whether the graph appears to be linear or not.
12. In describing the form of a scatterplot, what term do you use if the values tend to fall into two or more groups that are separated from one another by gaps?
13. In describing the direction of a scatterplot, when there is a positive or negative slope, we say that the variables are positively or negatively ____.
14. True or false: In describing the strength of a scatterplot, we look at the amount of scatter in the data points, that is, how close the points lie to a simple form such as a line.
15. Explain the difference between a positive association and a negative association (using the definition on p. 135).
16. When you are drawing a scatterplot, what symbols should you use on the axes if the origin of the graph is not at zero?
17. What are three other tips for drawing scatterplots properly?
18. Suppose that you want your scatterplot to reflect the influence of a particular categorical variable, in addition to the relationship of the two quantitative variables that are plotted. For example, suppose you want to graph the relation between entertainment violence and real-life violence for males and females on the same graph, in such a way that displays the relationship separately for males and females. What should you do?
19. A common problem in constructing a scatterplot occurs when two or more individuals have exactly the same values for each of the two variables. What should you do in that case?

3.2 Correlation

1. Which is a better method for judging the strength of a linear relationship: simply to look at the graph, or to use a calculated numerical statistic that summarizes the strength of the linear relationship? Explain why.
2. What does correlation measure?
3. We've used the Greek letter μ to represent a population mean and x-bar to represent a sample mean, the Greek letter σ to represent the population standard deviation, and s to represent the sample standard deviation. What letter does our book use to designate what is called the correlation?

4. Given that the letter above for the correlation coefficient is in our own alphabet and not the Greek alphabet, do you think it refers to a sample statistic or a population parameter?
5. Would you guess that there is some other Greek letter that refers to the population value of the correlation coefficient?
6. When you look at the formula for the sample correlation coefficient that your text gives, you see $\frac{x_i - \bar{x}}{s_x}$ and $\frac{y_i - \bar{y}}{s_y}$. Can you give a simpler name to these expressions?
7. What is the meaning of a positive and negative sign associated with the correlation coefficient?
8. True or false: Correlation makes a distinction between explanatory and response variables.
9. Suppose one person calculates the correlation of IQ score of some individuals with number of boxing matches fought, testing the hypothesis that boxing (the explanatory variable) affects IQ (the response variable). A second person, using the same data set, also calculates the correlation of the number of fights with IQ score, only this person thinks of IQ as the explanatory variable and number of fights as the response variable. Do they get the same correlation, or different ones?
10. Explain why two variables must both be quantitative in order to find the correlation between them.
11. Suppose someone codes race as follows: 0 = Caucasian, 1 = African American, 2 = Asian, 3 = Hispanic, 4 = American Indian, 5 = Other. Then someone calculates a correlation between race and a reading test score for a sample of kids. Do you have a problem with this? If so, what's your problem?
12. True or false: A correlation coefficient has units.
13. Melinda computes a correlation between the height of mothers and their daughters. Larry is looking at the computations and says, "You blew it! You have the height of mothers measured in centimeters, and the height of the daughters measured in inches!" Does Melinda need to do anything to fix her correlation coefficient, and if so, what?
14. What range of values is possible for the correlation coefficient?
15. What is true about the relationship between two variables if the r-value is:
    a. Near 0?
    b. Near 1?
    c. Near -1?
    d. Exactly 1?
    e. Exactly -1?
16. What sort of correlation coefficient do you find when two variables have a very strong linear relationship, and when the first gets greater, the second gets smaller?
17. Suppose the data points are two variables collected for all the days of 2006. For each of those days, imagine that we know (variable 1) the number of words Mrs. O. spoke in that day, and (variable 2) the peak barometric pressure for that day in Caracas, Venezuela. About what would you guess the correlation between these two variables to be? Why?

18. True or false: Correlation measures the strength of relationships other than just linear.
19. Suppose there are two variables which, when graphed in a scatterplot, form an almost perfect u-shaped parabola. Would the strong relationship between these variables imply a high correlation coefficient (meaning close to 1 or -1)? Why or why not?
20. Does the correlation coefficient resemble the median and IQR in being fairly resistant to outliers, or resemble the mean and standard deviation in being heavily influenced by outliers (i.e. non-resistant)?
21. Someone practices guessing correlation coefficients from scatterplots using an applet on the internet. Why should the person not get too confident of his or her guessing power given scatterplots of real-life data? (Read p. 144 and look at Figure 3.8 on p. 141.)
22. In attempting to give a more complete description of a set of data involving two variables, someone wants to give a measure of center and spread as well as the correlation coefficient. Assuming the person has made a good decision to use the correlation coefficient, what measure of center and spread would be most consistent with the correlation coefficient: the mean and standard deviation or the median and IQR?
23. The women in a corporation think that they are being discriminated against in their salaries. A management spokesman says to them, "Look at this plot. The first data point is the average salary for men who have worked here 1 year, put into an ordered pair with the salary for women who have worked here one year. The second ordered pair is the average salary for men and women with two years' experience, and so forth. The correlation between men's salaries and women's salaries is 0.95! That's almost a perfect correlation! You women have nothing to complain about!" Is this argument valid? Why or why not?

3.3 Least-Squares Regression

1. Finish this statement: A regression line is a straight line that ____
2. The least-squares regression line (abbreviated: ____) is one way to try to fit a ____ to two-variable data that shows a linear trend.
3. Because we use a regression line to ____ y-values from given x-values, we want a regression line that makes the distances of the points in a scatterplot to the regression line as ____ as possible.
4. Why is this regression model called a least-squares regression line?
5. In other words, this is a line that minimizes the total ____ in the squares.
6. The equation for a LSRL is ŷ = ____
7. True or false: This is the same equation (using the same letters) we use for lines in algebra.

8. Why do we use ŷ instead of y?
9. The slope of a LSRL is b = ____.
10. The intercept of a LSRL can be found by a = ____.
11. Under STAT-CALC in your graphing calculator, find the correct LinReg command for lines in statistics. It is NOT 4: LinReg(ax + b) but ____: ____
12. When copying down a LSRL from the calculator, don't forget to write ____ instead of just y =. (Mrs. O. forgets this a lot. Please gently remind her!)
13. Interpreting the slope is important: think of it as a rate of change. That is the amount of change in ____ when ____ increases by one unit.
14. The intercept of the regression line is the value of ŷ when x = ____.
15. Once you have a LSRL, how do you find a predicted value of y for a given x-value?
16. Suppose that someone measures height as a function of weight for a bunch of human adults, and gets a regression equation predicting height as a function of weight. Why is the y-intercept of the equation not as meaningful or important as the slope, or as the equation as a whole?
17. Look at both computer outputs on p.156. It is very important that you can find the slope and the y-intercept of a regression line from these. Use p. 155 to help you identify them. (Ignore all the other statistics for now.)
18. Suppose you have a regression equation output from a computer and you are asked to plot the line by hand. How would you do it?
19. Computer outputs for r² say ____.
20. While r is called the correlation coefficient, r² is called the ____ of ____.
21. Finish this statement: r² is the fraction of ____
22. The r² value shows how much of the variation in one variable can be accounted for by the linear relationship with the other variable. If r² = 0.95, what can be concluded about the relationship between x and y?
23. True or false: In a regression line, like a correlation coefficient, you get the same numbers (slopes and intercepts) no matter which variable is considered the explanatory variable and which is considered the response variable.

24. True or false: If two variables are perfectly correlated, then the slope of the LSRL and the correlation coefficient r are the same.
25. Every LSRL passes through the point (____, ____).
26. When you see a correlation r, square it to get a better feel for the strength of the association. Read the paragraph on p. 165. In the r² scale, a correlation of 0.7 is about halfway between 0 and 1 because r² would equal ____.
27. Define residual and give the formula written in words and symbols.
28. The mean of the least-squares residuals is always ____. (It might be approximate due to ____.)
29. If a LSRL fits the data well, what do you see on the residual plot?
30. True or false: A curved pattern on a residual plot means the data is not linear. (Recall: a curved pattern on a normal probability plot shows that the data is not very ____.)
31. An outlier in a scatterplot is any observation that lies outside the overall ____ of the other observations in any direction. It will have a large residual if it is an outlier in the ____ direction.
32. An influential point is an observation that has a ____ effect on the calculations of least-squares regression. These are generally outliers in the ____ direction and may not have large residuals.
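
Checking your answers numerically (optional). Once you have worked through 3.2 and 3.3 by hand, a short Python sketch like the one below may help you test your answers about units, swapping the two variables, residuals, and r². This is only an illustration under stated assumptions: the heights are made up, the function names (mean, sample_sd, correlation, lsrl) are hypothetical helpers and not calculator commands, and the line is written as ŷ = a + bx with a as the intercept and b as the slope, matching the letters used in the questions above.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r: sum of the products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    b = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Made-up heights in inches (for illustration only, not real data).
mothers_in = [62, 64, 65, 67, 70]
daughters_in = [63, 65, 64, 68, 69]

# Correlation with the variables in either order, and with mixed units.
r = correlation(mothers_in, daughters_in)
r_swapped = correlation(daughters_in, mothers_in)
mothers_cm = [2.54 * h for h in mothers_in]
r_mixed_units = correlation(mothers_cm, daughters_in)

# Least-squares line and its residuals (observed y minus predicted y).
a, b = lsrl(mothers_in, daughters_in)
residuals = [y - (a + b * x) for x, y in zip(mothers_in, daughters_in)]

print("r =", round(r, 4), "| swapped:", round(r_swapped, 4), "| cm vs in:", round(r_mixed_units, 4))
print("y-hat =", round(a, 3), "+", round(b, 3), "* x")
print("mean residual =", round(mean(residuals), 10))
print("r squared =", round(r ** 2, 4))
```

Try changing the data, or swapping which list is passed as xs in lsrl, and compare what changes and what stays the same with your answers to 3.2 #8, #12, #13 and 3.3 #23, #25, #28.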