Math 075 Activities and Worksheets Book 2:


Math 075 Activities and Worksheets Book 2: Linear Regression

Name:

Scatterplots: Intro to Correlation

Represent two numerical variables on a scatterplot and informally describe how the data points are distributed and any apparent relationship that exists between the two variables (e.g., between time spent on homework and grade level).

Write "positive correlation", "negative correlation", or "no correlation" to describe each relationship.

[Scatterplots 1 through 6 are not reproduced in this transcription.]

7. Use the given data to make a (year, units of CDs) scatter plot.
8. What kind of correlation is there between the year and the number of CDs sold?
9. Use the given data to make a (year, units of cassettes) scatter plot.
10. What kind of correlation is there between the year and the number of cassettes sold?
11. The scatter plot to the right shows the average traffic volume and average vehicle speed on I-80 for 50 days in 2009. Which statement best describes the relationship between average traffic volume and average vehicle speed shown on the scatter plot?
   A. As traffic volume increases, vehicle speed increases.
   B. As traffic volume increases, vehicle speed decreases.
   C. As traffic volume increases, vehicle speed increases at first, then decreases.
   D. As traffic volume increases, vehicle speed decreases at first, then increases.

Understanding Scatterplots

Match each description for a set of measurements (A and B) to a scatterplot, and briefly explain your reasoning. Each graph in this packet can only be used once.

[Scatterplot 1 and Scatterplot 2 are not reproduced in this transcription.]

1. If x = city miles per gallon and y = highway miles per gallon for 10 cars, describe which scatterplot is likely the correct graph. Explain your reasoning.
   a. What does a dot represent?
2. If x = sodium (milligrams/serving) and y = Consumer Reports quality rating for 10 salted peanut butters, describe which scatterplot is likely the correct graph. Explain your reasoning.
   a. What does a dot represent?

These scatterplots show body measurements for 34 adults who are physically active. Some measurements are a girth, which is a measure of length around a body part. Match each description to a scatterplot (A, B, or C). Briefly explain your reasoning.

[Scatterplots A, B, and C are not reproduced in this transcription.]

3. x = forearm girth (cm), y = bicep girth (cm). The bicep is above the elbow.
   a. What does a dot represent?
4. x = calf girth (cm), y = bicep girth (cm). The calf is below the knee.
   a. What does a dot represent?
5. x = age (years), y = bicep girth (cm).
   a. What does a dot represent?

Match each description of a set of measurements to a scatterplot. Then describe what a dot represents in each graph.

6. x = average outdoor temperature and y = heating costs of a residence during the winter
7. x = height (inches) and y = shoe size for a random sample of adults
8. x = height (inches) and y = score on an intelligence test for a random sample of teenagers (ages 15-17)

Lines have been added to some of the scatterplots used in Lesson 3.1.1 to summarize the relationship between each ingredient and the Consumer Reports rating for breakfast cereals. You will learn more about summary lines in future lessons.

9. Which ingredients (sugar, protein, and/or fat) are negatively associated with ratings?
10. Which of the negative associations is the strongest?

House Prices: Correlation

Your lab report should include a well-written response to each of the following questions and all relevant supporting graphs and analyses performed using StatCrunch. Submit your assignment through CANVAS by uploading it as a document (in either Word or PDF format). Remember to put your name on the document itself.

Regression: Find the data set house prices.txt in the class StatCrunch group and load it into StatCrunch. The data set contains information collected on houses that were sold in a suburban community: each house's price (the price at which the house sold, in thousands of dollars), its size (in square feet), and other characteristics that are usually recorded when a house is on the market. In task #1 we want to investigate the relationship between the price of a house and its size.

1. Which variable is the explanatory variable and which variable is the response variable when investigating the relationship between the price of a house and its size?
2. Construct a scatterplot for the data. Graph -> Scatter Plot -> For x variable, select the explanatory variable. For y variable, select the response variable. Click Compute. Copy the graph below:
3. How can we describe the nature of the relationship between house price and size from the scatterplot? (Think form, direction, strength.) Do you notice any outliers or deviations from the general pattern?
4. Compute the correlation for house price and size. Stat -> Summary Stats -> Correlation. Choose the two variable names for which you want to calculate the correlation. Click Compute.
5. What does the correlation you found say about the nature of their relationship? (Think about what the correlation measures.)
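As a rough illustration of what the correlation in question 4 measures, the following Python sketch computes Pearson's r by hand. The sizes and prices are made-up values, not the house prices.txt data, and the function name pearson_r is a hypothetical helper rather than anything StatCrunch provides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: size in square feet, price in thousands of dollars.
sizes = [1100, 1400, 1600, 2000, 2400]
prices = [150, 190, 200, 260, 310]

r = pearson_r(sizes, prices)
print(round(r, 3))
```

A value of r near +1 or -1 indicates a strong linear relationship, and the sign gives the direction. The value does not depend on the units used for size or price.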

Diamond Linear Regression Worksheet

An article in the Journal of Statistics Education reported the price of diamonds of different sizes in Singapore dollars (SGD). The data set below is consistent with that data, with prices adjusted to 2004 US dollars. Open the Diamond data set in our StatCrunch group and answer the questions.

1. What is the response variable and what is the explanatory variable in this model?
   - Explanatory variable:
   - Response variable:
2. Explain your choices in Number 1.
3. Construct a scatterplot. How do you describe the relationship that you observe?
   form:
   direction:
   strength:
4. Find the least-squares regression line that describes the price of a diamond in relation to its carat size.
5. What is the slope, with units? Interpret the slope using a complete sentence.

6. What is the y-intercept, with units? Interpret the y-intercept using a complete sentence.
7. Using the regression equation, estimate the cost of a 0.32-carat diamond.
8. Nick bought a 0.32-carat diamond that is included in the data set. What is his residual? What does that mean? Did Nick overpay or underpay?
9. Calculate the correlation. What does the correlation say about the nature of the relationship between diamond size and price?
10. How much of the variability in price does the carat size explain? What number are you using for your answer?
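The calculations behind questions 4 through 10 can be sketched in Python as follows. This is a minimal sketch using made-up carat sizes and prices, not the Diamond data set; the helper name least_squares and the price of 900 for the 0.32-carat diamond are assumptions for illustration only.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: carat size and price in US dollars.
carats = [0.20, 0.25, 0.30, 0.32, 0.35]
prices = [400, 600, 800, 900, 1000]

slope, intercept = least_squares(carats, prices)

# Predicted price for a 0.32-carat diamond, and the residual for a
# (hypothetical) diamond of that size that actually sold for 900.
predicted = intercept + slope * 0.32
residual = 900 - predicted  # residual = actual - predicted

# r^2: the fraction of the variability in price explained by carat size.
mean_price = sum(prices) / len(prices)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(carats, prices))
sst = sum((y - mean_price) ** 2 for y in prices)
r_squared = 1 - sse / sst

print(round(slope, 1), round(intercept, 1), round(residual, 1), round(r_squared, 3))
```

A positive residual means the actual price is above the line's prediction, and a negative residual means it is below.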

Residual Plots:

11. Construct a residual plot.
   a. Go to Stat > Regression > Simple Linear. Select the appropriate explanatory and response variables.
   b. On the editing page, scroll down to Graphs. Under Graphs, scroll down and highlight Residual vs X-values. Click Compute.
   c. Which of the graphs (a)-(f) above does your residual plot look like? Notice that the red line must be at y = 0.
   d. If a linear model is a good fit, the graph will look like (a), scattered everywhere! Do you think that a linear model is a good fit for our data? Why or why not?

Now construct a histogram of the residuals under the same menu.

12. If a linear model is a good fit, the histogram of the residuals should be approximately normal and centered around zero, like the one below. Do you think a linear model is a good fit for our data? Why or why not?
13. What is the standard error Se? To find it, go to Stat > Regression > Simple Linear and select the appropriate explanatory and response variables. The output should include an "Estimate of error standard deviation". This is the Se.
14. Se represents approximately the average distance that the observed values fall from the regression line. Write a statement in the context of the data we are working with.
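For readers who want to see where a number like Se comes from, here is a small Python sketch. It assumes the usual simple-linear-regression definition, the square root of the sum of squared residuals divided by n - 2, and uses made-up data; the function name standard_error is hypothetical.

```python
import math

def standard_error(xs, ys):
    """Standard error of regression: sqrt(SSE / (n - 2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = actual value minus the value predicted by the line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (n - 2))

# Hypothetical data, small enough to check by hand.
xs = [0, 1, 2, 3]
ys = [0, 1, 1, 2]

se = standard_error(xs, ys)
print(round(se, 4))
```

Se is measured in the units of the response variable, which is why the interpretation sentence should mention those units.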

USPS Postal Linear Regression Classwork

To decide whether there is an association between the year of a postal increase and the new postal rate for first-class mail, data were gathered from the United States Postal Service. In 1981, the United States Postal Service changed its rates on March 22 and November 1. This information is shown in the data set below. Find it on StatCrunch and load it.

1. Choose an appropriate year representation for t = 0. We do not want to use such big numbers for our model. Make a new column titled Years since 1970 and enter the appropriate numbers.
2. What is the response variable and what is the explanatory variable in this model?
   - Response variable:
   - Explanatory variable:
3. Explain your choices in Number 2.
4. Construct a scatterplot. How do you describe the relationship that you observe?
   form:
   direction:
   strength:
5. Find the least-squares regression line that describes the relationship between the year and the postal rate.

6. What is the slope, with units? Interpret the slope using a complete sentence.
7. What is the y-intercept, with units? Interpret the y-intercept using a complete sentence.
8. Using the regression equation, estimate the cost of a postage stamp in 1977. (Hint: you're not plugging in 1977!)
9. The actual postage stamp cost $0.13. What is the residual? (Remember, the residual is the actual value minus the predicted value.)
10. Calculate the correlation. What does the correlation say about the nature of the relationship between years since 1970 and the postage rate?
11. How much of the variability in postage rate does the year explain? What number are you using for your answer?
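The hint in question 8 and the residual in question 9 can be sketched in Python as follows. The intercept and slope here are placeholders, not the coefficients from the USPS data; only the base year 1970 and the actual 1977 rate of $0.13 come from the worksheet.

```python
BASE_YEAR = 1970  # t = 0 corresponds to 1970, as in the "Years since 1970" column

def years_since_base(year):
    """Recode a calendar year as years since the base year."""
    return year - BASE_YEAR

# Hypothetical regression coefficients (dollars, and dollars per year).
intercept = 0.05
slope = 0.01

t = years_since_base(1977)          # plug in t, not 1977
predicted = intercept + slope * t   # predicted rate in dollars
actual = 0.13                       # actual 1977 rate given in question 9
residual = actual - predicted       # residual = actual - predicted

print(t, round(predicted, 2), round(residual, 2))
```

With the real coefficients from StatCrunch substituted in, the same three steps (recode the year, predict, subtract) answer both questions.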

Residual Plots:

12. Construct a residual plot.
   a. Go to Stat > Regression > Simple Linear. Select the appropriate explanatory and response variables.
   b. On the editing page, scroll down to Graphs. Under Graphs, scroll down and highlight Residual vs X-values. Click Compute.
   c. Which of the graphs (a)-(f) above does your residual plot look like? Notice that the red line must be at y = 0.
   d. If a linear model is a good fit, the graph will look like (a), scattered everywhere! Do you think that a linear model is a good fit for our data? Why or why not?

Now construct a histogram of the residuals under the same menu.

13. If a linear model is a good fit, the histogram of the residuals should be approximately normal and centered around zero, like the one below. Do you think a linear model is a good fit for our data? Why or why not?
14. What is the standard error Se? To find it, go to Stat > Regression > Simple Linear and select the appropriate explanatory and response variables. The output should include an "Estimate of error standard deviation". This is the Se.
15. Se represents approximately the average distance that the observed values fall from the regression line. Write a statement in the context of the data we are working with.

Olympics Long Jump

The data set below contains the winning jump lengths (in meters) for the Olympic men's long jump. Open the data set in our StatCrunch group and answer the questions.

1. What is the response variable and what is the explanatory variable in this model?
   - Explanatory variable:
   - Response variable:
2. Explain your choices in Number 1.
3. Construct a scatterplot. How do you describe the relationship that you observe?
   form:
   direction:
   strength:
4. Find the least-squares regression line that describes the length of a winning jump in relation to the year.
5. What is the slope, with units? Interpret the slope using a complete sentence.

6. There is no data for 1940. Search online to find out whether there were Olympics in 1940, and explain why there's no data.
7. If there had been Olympics in 1940, predict what the winning long jump would have been using your regression model. Does this number seem reasonable?
8. Is it okay to predict the future? Predict what the winning long jump will be in 2180. Does this number seem reasonable? Why or why not?
9. Calculate the correlation. What does the correlation say about the nature of the relationship between the winning long jump distance and the year?
10. What is the standard error Se? To find it, go to Stat > Regression > Simple Linear and select the appropriate explanatory and response variables. The output should include an "Estimate of error standard deviation". This is the Se. What does this number tell us?
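Questions 7 and 8 contrast a prediction inside the range of the observed years with one far outside it. The Python sketch below shows one way to flag that difference. The list of years and the coefficients are hypothetical stand-ins for the StatCrunch data and output, and is_extrapolation is an illustrative helper name.

```python
def is_extrapolation(x, observed_xs):
    """Return True when x lies outside the range of the observed x-values."""
    return x < min(observed_xs) or x > max(observed_xs)

def predict(intercept, slope, x):
    """Value predicted by the line y = intercept + slope * x."""
    return intercept + slope * x

# Hypothetical Olympic years in the data set (note that 1940 is missing).
years = [1928, 1932, 1936, 1948, 1952, 1956]

# Hypothetical coefficients: meters, and meters per year.
intercept, slope = -19.0, 0.014

for year in (1940, 2180):
    kind = "extrapolation" if is_extrapolation(year, years) else "within observed range"
    print(year, round(predict(intercept, slope, year), 2), kind)
```

A prediction for a year between observed years relies on a pattern the data actually show, while a prediction for 2180 assumes the same linear trend continues for centuries.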

Fatalities Worksheet: Linear Regression

We are going to analyze the association between the number of drunk driving fatalities and the years after 1980.

1. What would be your explanatory and response variables in this analysis?
   Explanatory variable:
   Response variable:
2. Create a fitted line plot that describes the linear association between the years after 1980 and the number of drunk driving fatalities. Be sure to choose the appropriate explanatory and response variables. Be sure to label the axes correctly, including the appropriate units being measured.

[Minitab fitted line plot. Vertical axis: Drunk Driving Fatal Accidents; horizontal axis: Yr Since 80. Fitted line: Drunk Driving Fatal Accidents = 19141 - 357.3 Yr Since 80. S = 1227.20, R-Sq = 81.6%, R-Sq(adj) = 80.8%.]

3. What is the equation of the least-squares regression line?
4. What is the slope of your model, including units? Write a sentence that interprets this slope.
5. What is the intercept of your model? Write a sentence to interpret your intercept in context.

6. What is the correlation coefficient of your linear model? What does this value tell you about the strength of the linear relationship?
7. Find the predicted number of drunk driving fatalities in 1992. Show work.
8. Find the estimated residual (error) for the number of drunk driving fatalities in 1992. (Remember, the residual is the actual value minus the predicted value.)
9. What is the Coefficient of Determination (r²)? Write a sentence to interpret it.
10. What is the Standard Error of Regression (Se)? Write a sentence to interpret it.
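The arithmetic for questions 6 through 8 can be checked with a short Python sketch that uses the fitted line and R-Sq printed on the Minitab plot. The function names are illustrative, and the actual 1992 count is not given in this transcription, so the residual helper takes it as an argument rather than assuming a value.

```python
import math

# Values reported on the worksheet's Minitab fitted line plot.
INTERCEPT = 19141.0   # predicted fatalities when Yr Since 80 = 0
SLOPE = -357.3        # change in predicted fatalities per year
R_SQUARED = 0.816     # R-Sq = 81.6%

def predict_fatalities(year):
    """Predicted drunk driving fatalities for a calendar year."""
    years_since_1980 = year - 1980
    return INTERCEPT + SLOPE * years_since_1980

def residual(actual, predicted):
    """Residual = actual value minus predicted value."""
    return actual - predicted

predicted_1992 = predict_fatalities(1992)  # uses Yr Since 80 = 12

# The correlation r is the square root of r^2, with the sign of the slope.
r = math.copysign(math.sqrt(R_SQUARED), SLOPE)

print(round(predicted_1992, 1), round(r, 3))
```

Because the slope is negative, r is negative as well: a value near -0.9 indicates a fairly strong negative linear relationship.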

Play Ball!! And Do Statistics!!!

Objective: You will play soccer today, represent the data on a scatterplot, and analyze the data.

Materials:
- Soccer board and spinner
- 1 large paper clip per group
- A ball, such as a penny

Directions:
1. Flip your penny to decide who goes first.
2. Put the penny on the dark line in the center of the game board.
3. Players take turns (one goes, then the other person goes) by spinning the paper clip on the spinner and moving that many yards toward his/her opponent's goal (each line on the game board represents ten yards).
   a. Keep track of the total number of yards for both players in the table below.
   b. For instance, if the first person spins 10, the first data point would be (1, 10). If the second person spins 20, the next data point would be (2, 30), since it's the TOTAL yards.
4. Each time a player gets to his/her opponent's goal, s/he scores one point.

Collecting Data: Collect the following data and stop after 20 turns (each person will have 10 kicks of the ball/spins).

Turn:                 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20
Total yards traveled:

Keep track of your scores!

Now let's do some statistics:

1. Enter the Turn and Total yards traveled into StatCrunch.
2. Which variable is the explanatory variable and which variable is the response variable when investigating the relationship between turns and total yards? Why did you choose this way?
3. Construct a scatterplot for the data. Graph -> Scatter Plot -> For x variable, select the explanatory variable. For y variable, select the response variable. Click Compute. Copy the graph below:
4. How can we describe the nature of the relationship between Turn and Total yards traveled from the scatterplot? (Think form, direction, strength.) Do you notice any outliers or deviations from the general pattern?
5. Compute the correlation for Turn and Total yards traveled. Stat -> Summary Stats -> Correlation. Choose the two variable names for which you want to calculate the correlation. Click Compute.
6. What does the correlation you found say about the nature of their relationship? (Think about what the correlation measures.) Is it a strong or weak correlation?
7. Find the Least Squares Regression Line (LSRL) by going to Stat -> Regression -> Simple Linear. For x variable, select the explanatory variable. For y variable, select the response variable. Click Compute. Report the least-squares regression line. (You don't need to give me the rest of the output, just the estimated regression equation.)
8. What is the slope, including the units? Then write a statement interpreting the slope.
9. What is the y-intercept, including the units? Then write a statement interpreting the y-intercept.
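The (Turn, Total yards traveled) pairs entered in step 1 are running totals of the spins. A minimal Python sketch of that bookkeeping is below. The spin values are hypothetical, chosen so that the first two match the worked example in the directions, and cumulative_points is an illustrative helper name.

```python
from itertools import accumulate

def cumulative_points(spins):
    """Turn a list of per-turn spins into (turn, total yards so far) pairs."""
    totals = accumulate(spins)
    return list(zip(range(1, len(spins) + 1), totals))

# Hypothetical spins, in yards. The first two match the example
# in the directions: 10, then 20, giving (1, 10) and (2, 30).
spins = [10, 20, 10, 30, 20]

points = cumulative_points(spins)
print(points)
```

Because each total includes every earlier spin, the totals can never decrease from one turn to the next, which is worth keeping in mind when describing the direction of the scatterplot.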


StatCrunch CW/HW: Linear Regression Refresher

Your lab report should include a well-written response to each of the following questions and all relevant supporting graphs and analyses performed using StatCrunch. Submit your assignment through CANVAS by uploading it as a document (in either Word or PDF format). Remember to put your name on the document itself.

Linear Regression: Open the Amount in Savings ($) data set in the StatCrunch group. The savings account was opened in 1990. The ordered pairs give the number of years since 1990 and the amount of money in the savings account.

1. What is the response variable and what is the explanatory variable in this model?
   - Response variable:
   - Explanatory variable:
2. Explain your choices in Number 1.
3. Construct a scatterplot. Post the scatterplot below.
4. How do you describe the relationship that you observe?
   form:
   direction:
   strength:

5. Find the least-squares regression line that describes the linear relationship.
6. What is the slope, with units? Interpret the slope using a complete sentence.
7. What is the y-intercept, with units? Interpret the y-intercept using a complete sentence.
8. Calculate the correlation. What does the correlation say about the nature of the relationship between the two variables we are looking at?
9. How much of the variability in the amount in savings does the number of years since 1990 explain? What number are you using for your answer?

Residual Plots:

10. Construct a residual plot.
   a. Go to Stat > Regression > Simple Linear. Select the appropriate explanatory and response variables.
   b. On the editing page, scroll down to Graphs. Under Graphs, scroll down and highlight Residual vs X-values. Click Compute.
   c. Which of the graphs (a)-(f) above does your residual plot look like? Notice that the red line must be at y = 0. Post the graph below:
   d. If a linear model is a good fit, the graph will look like (a), scattered everywhere! Do you think that a linear model is a good fit for our data? Why or why not?

11. Now construct a histogram of the residuals under the same menu. Post the graph below.
12. If a linear model is a good fit, the histogram of the residuals should be approximately normal and centered around zero. Do you think a linear model is a good fit for our data? Why or why not?
13. What is the standard error Se? To find it, go to Stat > Regression > Simple Linear and select the appropriate explanatory and response variables. The output should include an "Estimate of error standard deviation". This is the Se.
14. Se represents approximately the average distance that the observed values fall from the regression line. Write a statement in the context of the data we are working with.