Correlation & Regression Exercises Chapters 14-15


1. Which of these are true and which are false? Explain why the false statements are wrong.
   a. If the slope of the line is 1, then the correlation must also be 1.
   b. A correlation of 0.8 means that 80% of the points in the scatterplot lie above the regression line.
   c. If the correlation between two lists of numbers is zero, then there can be no relationship between them.
   d. For all of the books in the Library of Congress, the correlation between the thickness of the books (in inches) and their number of pages would be positive.
   e. For all of the cars registered in the state of Ohio, the correlation between their fuel efficiency (in miles per gallon) and their weight (in pounds) would be positive.
   f. If the correlation between height (in inches) and weight (in pounds) for a group of people is 0.7, then the correlation between their heights (in centimeters) and their weights (in kilograms) will still be 0.7.
   g. If the correlation between two variables is negative, then high values of one variable tend to be associated with low values of the other variable.

2. If the standard deviation of x is equal to the standard deviation of y, then the slope of the regression line relating x and y will be:
   A. equal to 1.
   B. equal to the correlation.
   C. equal to the mean of x.
   D. equal to the standard deviation of x.
   Explain your choice.

3. Consider the following two correlations:
   I. The correlation between weight (in pounds) and height (in inches) for all of the babies born in Brooklyn, N.Y. this year.
   II. The correlation between weight (in kilograms) and height (in centimeters) for all of the babies born in Brooklyn, N.Y. this year.
   There are about 2.2 pounds to the kilogram and about 2.54 centimeters to the inch. Which statement is true?
   A. Correlation I is larger.
   B. Correlation II is larger.
   C. Correlations I and II are equal.
   D. There is not enough information to tell which correlation is larger.
   Explain your choice.
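Exercises 1f, 2, and 3 turn on two properties that can be checked numerically: how the correlation responds to a change of units, and how the least-squares slope relates to the correlation and the two standard deviations. The sketch below is one way to experiment with both. The height and weight values are invented for illustration and are not taken from any data set in these exercises, and the helper names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def least_squares_slope(xs, ys):
    """Slope of the least-squares line for predicting y from x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical measurements, invented for this illustration only.
heights_in = [19.0, 19.5, 20.0, 20.5, 21.0, 21.5]
weights_lb = [6.1, 7.4, 6.9, 7.8, 8.6, 8.2]

# Same babies, metric units (conversion factors quoted in exercise 3).
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w / 2.2 for w in weights_lb]

r_imperial = pearson_r(heights_in, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)
print("r in inches/pounds:      ", r_imperial)
print("r in centimeters/kilos:  ", r_metric)

# Compare the fitted slope with r * (sd of y) / (sd of x).
slope = least_squares_slope(heights_in, weights_lb)
slope_from_r = r_imperial * sample_sd(weights_lb) / sample_sd(heights_in)
print("least-squares slope:     ", slope)
print("r * sd(y) / sd(x):       ", slope_from_r)
```

Comparing the printed pairs, and then asking what happens to the second pair when the two standard deviations are equal, is a useful check on your written explanations.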

4. A realtor took a random sample of records of sales of homes from the files maintained by the Albuquerque Board of Realtors. From this sample he recorded the amount paid in real estate taxes (in dollars) and the sales price of the home (in thousands of dollars). From this information the following output was created.
   a. What is the correlation between the sales price and the taxes paid?
   b. We know that a home sold for a price of $180,000. Use the least squares line presented above to predict the average taxes for homes that sold at this price.
   c. We know that a home sold for $400,000. Would it be appropriate to predict the average taxes for a home that sold at this price? If so, make the prediction. If not, explain why not.

5. Over 400 students in a statistics class were asked their GPA in high school and their GPA so far in college. The results were analyzed, giving the following regression output:
   a. One student in the class had a high school GPA of 3.2. What would you predict for her GPA at Ohio State? Show your work.
   b. How do you interpret the GPA coefficient of 0.501958 given in the output?

6. A survey of homes in Whitehall, Ohio recorded the market value (market) of the home, the size of the home in square feet (sqft), and the number of bedrooms (bed) that each home had. Two resulting regression outputs are given below:

   The regression equation is
   market = - 34778 + 94.2 sqft

   Predictor      Coef    StDev       T       P
   Constant     -34778     4860   -7.16   0.000
   sqft         94.201    3.507   26.86   0.000

   S = 28294   R-Sq = 75.2%   R-Sq(adj) = 75.1%

   The regression equation is
   market = - 4580 + 29845 bed

   Predictor      Coef    StDev       T       P
   Constant      -4580    14032   -0.33   0.744
   bed           29845     4479    6.66   0.000

   S = 52150   R-Sq = 15.7%   R-Sq(adj) = 15.4%

   a. What is the correlation between the number of bedrooms and the market value?
   b. We know that a home in Whitehall has three bedrooms. What market value would we predict for this home?
   c. If you can choose either square footage or number of bedrooms to use as a predictor of market value, which would make the better predictor? Explain your reasoning.

7. A study measures the average annual snowfall (in inches) for 10 cities over the last decade along with the greatest Earth movement (on the Richter scale) over this same time period. The study included data from five cities in California's San Francisco Bay Area and five cities from Canada's province of Ontario. The study found a very strong negative correlation between the two variables. Does this mean that heavy snowfall will prevent earthquakes? Explain your answer briefly (identify the type of spurious argument involved and draw a picture to illustrate).

8. As the price of gasoline increases, many people are considering purchasing gasoline-electric hybrids. Are hybrids really different from other cars? We can use scatterplots and correlation to explore the relationship of variables and see how hybrids fall in these groups. Use the computer software of your choice to open the data set Cars2006.
   a. Create a scatterplot between the highway and city gas mileage. Highlight the hybrid cars like the Toyota Prius, Honda Insight, Toyota Highlander Hybrid, Lexus RX400H, and Ford Escape Hybrid. Do these cars appear as outliers in the scatterplot?
   b. If the Toyota Prius and Honda Insight were removed from the scatterplot, would the correlation increase or decrease? Explain.
   c. Create a scatterplot between the highway mileage and engine displacement. Do the hybrids appear as outliers in this scatterplot? Explain.
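Reading a regression output such as the ones in exercise 6 usually comes down to two small calculations: plugging a value into the fitted equation, and recovering the correlation from R-Sq together with the sign of the slope. The sketch below walks through both for the square-footage model only. The coefficients and R-Sq are copied from that output; the 1500 square foot home is a hypothetical input chosen for illustration, and the variable names are assumptions.

```python
import math

# Values copied from the "market = - 34778 + 94.2 sqft" output in exercise 6.
intercept = -34778.0
slope = 94.201
r_squared = 0.752  # R-Sq = 75.2%

def predict_market(sqft):
    """Predicted market value from the fitted line: intercept + slope * sqft."""
    return intercept + slope * sqft

# Hypothetical home size, not taken from the exercise.
hypothetical_sqft = 1500
predicted = predict_market(hypothetical_sqft)

# In simple linear regression, r is the square root of R-Sq,
# carrying the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"Predicted market value for {hypothetical_sqft} sqft: {predicted:.1f}")
print(f"Correlation between sqft and market value: {r:.3f}")
```

The same two steps apply to the bedroom model; only the numbers read from the output change.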

9. Below is a scatterplot of the relationship between the Infant Mortality Rate and the Percent of Juveniles Not Enrolled in School for each of the 50 states plus the District of Columbia. The correlation is 0.73. If the District of Columbia (identified by the X) had been left out of the data set, then the correlation between these two variables for the 50 states would:
   A. be higher than 0.73.
   B. not change at all.
   C. be lower than 0.73.
   Pick one and explain briefly.

10. The director of admissions in a small college administered a newly designed entrance test to 100 students selected at random from the upcoming freshman class. The purpose of this study was to determine whether students' grade point average (GPA) at the end of the freshman year can be predicted from the entrance test score. At the end of the year when all the data are available, what would be the graph you would use to display the data?
   A. A histogram of the entrance test scores.
   B. A histogram of the GPAs.
   C. A scatterplot with GPA on the y-axis and the entrance test scores on the x-axis.
   D. A scatterplot with the entrance test scores on the y-axis and GPA on the x-axis.

11. Climatologists can estimate the amount of rainfall in California on a year-by-year basis over the last two thousand years by looking at the distance between the rings in very old redwood trees that have recently fallen (the idea being that the tree would grow faster, and hence the rings would be farther apart, for years with more rainfall). In this situation, which of the two variables below should be plotted on the Y-axis of a scatterplot and which should be plotted on the X-axis? Explain why.
   Variable 1: The distance between the rings
   Variable 2: The amount of rainfall

12. In an observational study, correlation does not imply causation because
   A. a high correlation may result from both X and Y being related to an unknown confounding variable.
   B. a high correlation may result from an outlier in the scatterplot.
   C. correlations may be negative.
   D. the regression line may have a steep slope.

13. The scatterplot and regression output below describe the relationship between the gestation age (in weeks) and the birth weight (in grams) of 100 low birth weight infants born in Boston.

   [Scatterplot: birthwt (grams) on the vertical axis versus gestage (weeks) on the horizontal axis]

   a. Is it appropriate to use the regression method to estimate the birth weight of an infant born with a gestational age of 30 weeks? If so, explain why and make the estimate, showing your work. If not, explain why not.
   b. Is it appropriate to use the regression method to estimate the birth weight of an infant born with a gestational age of 40 weeks? If so, explain why and make the estimate, showing your work. If not, explain why not.

14. Correlations will give a deceiving impression of the strength of an association
   A. when the pattern of points in the scatterplot is not linear.
   B. when the X and Y variables have a negative association.
   C. when the standard deviations of both X and Y are large.
   D. all of the above
   Pick one and explain.

15. Match the correlation to the situation. Your choices are 0.94, 0.22, and -0.73.
   a. The correlation between the size of the home loan and the purchase price of the house.
   b. The correlation between the weight of the infant and the length of time they stay at the hospital after birth.
   c. The correlation between a college student's grade on an English class final exam and the student's score on the math part of the ACT college entrance exam.
   Explain your reasoning for deciding which correlation goes with which situation.

16. The correlation between the price of the dinner and the tips left by customers at a restaurant is 0.96. True or False: If every customer decided to give one dollar less in tips, then this correlation would go down.

17. The correlation between the ages of a group of students and the ages of their fathers is 0.8. Two years later all of the ages of the students and of their fathers would have increased by 2 years. True or False: The correlation here would still be 0.8.
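Exercises 16 and 17 both ask what happens to a correlation when the same constant is added to (or subtracted from) every value of a variable. One way to build intuition before writing an explanation is to try it on a small made-up data set. The ages below are hypothetical and were invented for this sketch, as was the helper name `pearson_r`.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ages in years, invented for illustration.
student_ages = [18, 19, 20, 21, 22, 19]
father_ages = [45, 50, 47, 52, 55, 44]

r_now = pearson_r(student_ages, father_ages)

# Two years later every age has increased by 2.
r_later = pearson_r([a + 2 for a in student_ages],
                    [a + 2 for a in father_ages])

# Shift only one variable, as in exercise 16 (every value reduced by 1).
r_one_shifted = pearson_r(student_ages, [a - 1 for a in father_ages])

print("r now:                 ", r_now)
print("r two years later:     ", r_later)
print("r with one list shifted:", r_one_shifted)
```

Because the deviations from the mean are what enter the formula, it is worth checking in the code why a shift cannot change them.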

18. Put the following four correlations in order from lowest to highest (be sure to remember that negative numbers are lower than positive numbers). Explain your reasoning.
   A. The correlation between the ages of all the husband and wife pairs in Ohio.
   B. The correlation between the weights of all the husbands and wives in Ohio.
   C. The correlation between the number of questions wrong and the number of questions right for all the students taking a test.
   D. The correlation between the weight and the miles per gallon of all the cars in Ohio.

19. The weights of 148 sets of twins born at the MetroHealth Medical Center in Cleveland, Ohio were recorded for a full year. How strongly is the weight of the first born associated with the weight of the twin? A scatterplot is shown below (all weights are in kilograms).
   a. The correlation between the weight of the first born and the weight of the twin is about
      A. -0.87
      B. -0.36
      C. 0.36
      D. 0.87
      Explain your choice.
   b. If the weights of the first born had been measured in pounds instead of kilograms, then:
      A. the value of the median weight of the first born twin would ____.
      B. the value of the standard deviation of the weights of the first born would ____.
      C. the value of the correlation between the two twins' weights would ____.
      Fill in the blanks from the possible choices listed below (Note: There are about 2.2 pounds in one kilogram). You may use an answer more than once.
      (1) be multiplied by 2.2
      (2) be divided by 2.2
      (3) stay the same
      (4) be multiplied by 2.2 times the correlation

20. The correlation between X and Y is -0.8. This says that
   A. Larger than average values of X are associated with smaller than average values of Y.
   B. Larger than average values of X are associated with negative values of Y.
   C. X tends to cause Y not to happen.
   D. This is not possible. A correlation cannot be negative.
   Explain your choice.

21. The height, in cm, and length of the middle metacarpal bone, in mm, of 10 skeletons were measured. (The metacarpal bones are in the hand between the wrist and fingers.) The scatter diagram is given below.
   a. If the height and metacarpal length of the skeletons had been measured in inches instead of centimeters and millimeters, then the correlation between stature and metacarpal length for these 10 skeletons would go up, go down, or stay the same. Pick one and explain.
   b. One of these skeletons (identified by the X) had a metacarpal size of 52 mm and a height of 183 cm. If the height of this skeleton had been misrecorded as 153 cm, then the correlation between stature and metacarpal length for these 10 skeletons (including the misrecorded value) would go up, go down, or stay the same. Pick one and explain.

   Using the data in the scatterplot above (i.e., without the error mentioned in part b), a researcher gets the following output for the regression of stature on metacarpal length:

   Dependent variable is: stature
   No Selector
   R squared = 78.5%   R squared (adjusted) = 75.8%
   s = 3.983 with 10 - 2 = 8 degrees of freedom

   Source        Sum of Squares   df   Mean Square   F-ratio
   Regression    463.208           1   463.208       29.2
   Residual      126.892           8   15.8615

   Variable      Coefficient   s.e. of Coeff   t-ratio   prob
   Constant      93.9906       14.62           6.43      0.0002
   metacarpal    1.70736       0.3159          5.40      0.0006

   c. A new metacarpal bone, which is 45 mm long, is found at an archeological dig. An investigator wants to use the data from the 10 skeletons mentioned above to make a prediction about the height of the person this new metacarpal bone came from. For the new metacarpal bone that was found, you would expect it to come from a skeleton that was ____ cm tall. Fill in the blank and explain.
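The pieces of the regression output in exercise 21 are tied together by a few standard identities for simple linear regression: R squared is the regression sum of squares over the total, s is the square root of the residual mean square, the F-ratio is the ratio of the two mean squares, and each t-ratio is a coefficient divided by its standard error. The sketch below recomputes these from the numbers printed in the output as a way of learning to read the table. It is an illustrative check, not part of the original exercise, and the variable names are assumptions.

```python
import math

# Numbers copied from the regression output in exercise 21.
ss_regression = 463.208
ss_residual = 126.892
df_regression = 1
df_residual = 8
slope_coefficient = 1.70736
slope_std_error = 0.3159

# R squared: share of the total sum of squares explained by the regression.
r_squared = ss_regression / (ss_regression + ss_residual)

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# s is the square root of the residual mean square.
s = math.sqrt(ms_residual)

# F-ratio compares the two mean squares; t-ratio is coefficient / s.e.
f_ratio = ms_regression / ms_residual
t_ratio = slope_coefficient / slope_std_error

print(f"R squared: {100 * r_squared:.1f}%")
print(f"s:         {s:.3f}")
print(f"F-ratio:   {f_ratio:.1f}")
print(f"t-ratio:   {t_ratio:.2f}")
```

With a single predictor, the square of the slope's t-ratio should agree with the F-ratio up to rounding in the printed values, which gives one more consistency check on the table.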

22. Each year, g3 Mystery Shopping, a market research company based in Sylvania, Ohio, conducts a study of the drive-thru windows of the national fast-food restaurant chains. In one part of the study, a g3 Mystery Shopping employee orders a main item, a side item, and a drink at a drive-thru window (for example, a sandwich, fries, and a soft drink) and then keeps track of how long it takes to be served. The time, in seconds, reported for each chain is then a summary of visits to that chain's locations nationwide. Below are a scatterplot and a regression output using the times for 24 chains that were evaluated in both the 1998 and 1999 surveys.

   Dependent variable is: 1999 time
   No Selector
   26 total cases of which 2 are missing
   R squared = 80.5%   R squared (adjusted) = 79.7%
   s = 18.53 with 24 - 2 = 22 degrees of freedom

   Source        Sum of Squares   df   Mean Square   F-ratio
   Regression    31265.4           1   31265.4       91.0
   Residual      7556.36          22   343.471

   Variable      Coefficient   s.e. of Coeff   t-ratio   prob
   Constant      33.8521       18.25           1.85      0.0771
   1998 time     0.791209      0.0829          9.54      0.0001

   [Scatterplot: 1999 time on the vertical axis versus 1998 time on the horizontal axis]

   a. One chain took 180 seconds to serve customers in 1998. What would you predict as the time for that chain to serve drive-thru customers in 1999? Show your work.
   b. The correlation between the 1998 times and the 1999 times was ____.
   c. The Steak 'n Shake chain did poorly in this survey, taking 361 seconds to serve drive-thru customers in 1998 and 340 seconds in 1999. If Steak 'n Shake was not included in the survey, then the correlation between the 1998 and 1999 values would have been
      A. lower than the answer to part b above.
      B. higher than the answer to part b above.
      C. unchanged from the answer to part b above.
      Pick one and explain.

23. The correlation between the height and the age of a group of students in 2012 was 0.08. In 2013 the ages of the students had, of course, all gone up by one year, but none of the heights had changed. True or False: For this group, the correlation between height and age would thus be greater than 0.08 in 2013.

24. Subjects taking part in an experimental test of a new drug have a blood test taken before the experiment begins to be sure that a variety of tests are within normal limits. Two of the tests measure the amount of hemoglobin (the protein in the red blood cells that carries the oxygen from the lungs to the body's tissues) and the Red Blood Cell Count (the number of red blood cells per milliliter). Since hemoglobin is carried in the red blood cells, it stands to reason that the more red blood cells a person has, the higher their hemoglobin levels will be. A researcher carries out a regression analysis to study this relationship. Here is the output from this analysis:

   Dependent variable is: Hemoglobin
   No Selector
   1357 total cases of which 29 are missing
   R squared = 64.6%   R squared (adjusted) = 64.6%
   s = 0.8240 with 1328 - 2 = 1326 degrees of freedom

   Source        Sum of Squares   df     Mean Square   F-ratio
   Regression    1642.43             1   1642.43       2419
   Residual      900.334          1326   0.678985

   Variable       Coefficient   s.e. of Coeff   t-ratio   prob
   Constant       2.66655       0.2451          10.9      0.0001
   Red Blood C    2.47531       0.0503          49.2      0.0001

   [Scatterplot: Hemoglobin on the vertical axis versus Red Blood Cell Count on the horizontal axis]

   a. One man had a red blood cell count of 5.5 per ml. Based on this output, what hemoglobin level would you predict for this man? Show your calculations.
   b. Explain what aspects of the scatterplot above help you know that your calculations in part a were appropriate.
   c. What is the correlation between the hemoglobin level and the red blood cell count?

25. In searching for the causes of a disease, a researcher discovers that high levels of a certain protein are always present in subjects with the disease and that the protein is found at low levels in healthy subjects. Is this strong evidence that the protein causes the disease? Explain why or why not.

26. Which of these are true and which are false? Explain why the false statements are wrong.
   a. If two variables have a correlation of 0.3 and you add 0.1 to every value of both variables, then the correlation will become 0.4.
   b. A correlation of 0.5 means that half of the points in the scatterplot fall in a linear pattern.
   c. If the correlation is close to negative one, then the regression line will have a negative slope.
   d. The square of the correlation coefficient tells you the percentage of the variability in Y that is explained by knowing X.
   e. An outlier in the scatterplot can heavily influence a regression line.
   f. An outlier that falls right on the regression line can still have an important effect on the correlation.
   g. The regression line is not appropriate for making predictions when X and Y have a nonlinear relationship.
   h. If there is a linear pattern to the data, then linear regression can be appropriately used for extrapolation.
   i. If there is a non-linear relationship between x and y, then the correlation will always be zero.
   j. The correlation r measures both the direction and strength of a straight-line relationship.
   k. When the correlation between two variables is nearly negative 1, then there is a cause-and-effect relationship between them.
   l. A correlation close to negative one indicates there were no outliers on the scatterplot.
   m. If there is a linear pattern to the data and there are no outliers driving the regression, then linear regression can be appropriately used for predictions within the range of the data.
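Several statements in exercise 26 (and 1c earlier) concern what the correlation can and cannot detect when the relationship is curved. A small experiment can help when writing the explanations. The sketch below computes r for two made-up curved relationships, one symmetric about zero and one not; the data and the helper name are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Case 1: y = x^2 with x values symmetric about zero (hypothetical data).
xs_symmetric = [-3, -2, -1, 0, 1, 2, 3]
ys_symmetric = [x ** 2 for x in xs_symmetric]
r_symmetric = pearson_r(xs_symmetric, ys_symmetric)

# Case 2: the same curve, but only over positive x values.
xs_positive = [1, 2, 3, 4, 5, 6, 7]
ys_positive = [x ** 2 for x in xs_positive]
r_positive = pearson_r(xs_positive, ys_positive)

print("r for y = x^2 on symmetric x:", r_symmetric)
print("r for y = x^2 on positive x: ", r_positive)
```

In both cases y is determined exactly by x, so comparing the two values of r shows how much the correlation depends on the shape of the pattern rather than on how tightly the variables are related.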

EESEE Exercises

The following exercises make use of stories in the Electronic Encyclopedia of Statistics Examples and Exercises, or EESEE (pronounced ee-zee). EESEE is included in the StatsPortal materials that accompany the textbook under the Resources tab. You will find the specific stories listed alphabetically by title.

27. EESEE Story: Hubble Recession Velocity. In 1929 the astronomer Edwin Hubble investigated the relationship between the distance from Earth (in millions of light years) and the recession velocity (in kilometers/sec) of 24 galaxies. Read through this story's protocol. Hubble theorized that the relationship should be approximately linear. The data from Hubble's investigation are in the data file called Hubble.
   a. Suppose you find a galaxy with a recession velocity of 500 km/sec and want to estimate that galaxy's distance from Earth using Hubble's data. Which variable would you choose to be the y variable, and which would you choose to be the x variable? Explain.
   b. Use the computer to fit the regression line for distance on recession velocity. How far away from Earth do you predict the galaxy from part a to be?

28. EESEE Story: Nutrition and Breakfast Cereals. This story details a study of the nutritional content of popular breakfast cereals. Read through the introduction to this story and open the data file called Cereals. These data give the nutritional information from the box labels of 77 brands of breakfast cereal.
   a. Examine the amount of sodium in the cereals. Make a histogram and describe its shape. Calculate the values needed for the five-number summary and make a boxplot. Does the five-number summary do a good job of describing the distribution in this case?
   b. Examine the relationship between the amount of potassium in the cereals and the amount of dietary fiber. What is the correlation?
   c. Suppose a new breakfast cereal comes on the market with 300 milligrams of potassium per serving. Would it be appropriate to use the regression method to predict the amount of dietary fiber in a serving of this cereal? If yes, what is the prediction? If no, explain why not.

29. EESEE Story: Blood Alcohol Content. How much does drinking beer increase the alcohol content of your blood? Read the introduction and protocol for this story. This question was addressed in an experiment at the Drackett Towers dormitory on The Ohio State University campus just before the State of Ohio raised the drinking age to 21. Sixteen students volunteered to take part in the experiment. Before the experiment each of the subjects blew into a Breathalyzer to show that their blood alcohol content (BAC) was at the zero mark. The student volunteers then drank a varying number of 12 ounce beers (between one and nine). How much each student drank was assigned by drawing tickets from a bowl. About 30 minutes later, an officer from the OSU Police Department measured their BAC using the Breathalyzer machine. Data from this experiment are in the data file called bloodalc. Details of the variables are given in the results section of the story.
   a. Suppose you want to estimate how a person's Blood Alcohol Content is affected by the number of beers they drink. Make a scatterplot of BAC versus beers. Which variable did you choose to be the y variable and which did you choose to be the x variable? Explain.
   b. Use the computer to find the correlation between BAC and beers. Is the correlation coefficient an appropriate measure of the strength of the association between BAC and beers? Explain briefly.
   c. Use the computer to fit the regression line for BAC on beers. If a student drinks five beers, on average what do you predict the student's BAC will be? Show your work.
   d. Would the regression method be as accurate for predicting BAC for a person who drinks 15 beers? Explain.

Online Problems

30. To understand the ideas of this section, try the Correlation and Regression applet in the StatsPortal website (you can find the collection of applets under the Resources tab). Read through the directions to the applet. Notice that you can add points to the scatterplot just by clicking. The correlation of the points will appear in the upper-left corner. To clear the points and start again, just click on the Clear button.
   a. Create a scatter of points in the lower-left corner that has a correlation that is near zero.
   b. Now add a single point to your scatterplot in the upper-right corner. Click and drag the point to different places on the scatterplot. How much can you change the correlation by manipulating this single point?
   c. Clear your scatterplot using the trash icon. Create a new scatterplot that has a straight line of points and a correlation that is near 1.
   d. Now add a single point to your scatterplot in the upper-right corner. Click and drag the point to different places on the scatterplot. How much can you change the correlation by manipulating this single point?
   e. Based on what you have learned above, an outlier in a scatterplot can:
      A. increase the correlation.
      B. decrease the correlation.
      C. either increase or decrease the correlation.
      D. have no influence on the correlation.
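If the applet in exercise 30 is not available, parts a and b can be imitated in a few lines of code. The sketch below is a rough stand-in, not a reproduction of the applet: it starts from a small made-up grid of points with no linear trend, then adds a single extra point at two different positions and recomputes the correlation each time. The coordinates and helper name are hypothetical.

```python
import math

def pearson_r(points):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# A 3-by-3 grid in the "lower-left corner": no linear trend (invented data).
cloud = [(x, y) for y in (1, 2, 3) for x in (1, 2, 3)]
r_cloud = pearson_r(cloud)

# Add one extra point in the upper-right corner.
r_upper_right = pearson_r(cloud + [(10, 10)])

# "Drag" the same point down to the lower-right instead.
r_lower_right = pearson_r(cloud + [(10, 0)])

print("cloud only:             ", r_cloud)
print("plus point at (10, 10): ", r_upper_right)
print("plus point at (10, 0):  ", r_lower_right)
```

Trying other positions for the extra point, or starting from a tight straight line as in parts c and d, extends the experiment to the rest of the exercise.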

31. What is the best line that fits a pattern of points? The least squares line is the line that minimizes the sum of the squared vertical distances between the points and the line. How well can you determine this line? You can use the Correlation and Regression applet in the StatsPortal website to experiment with determining this line.
   a. Open the applet. Create a scatterplot that has a linear pattern and a correlation around 0.7.
   b. Click the Draw your own line radio button. The next two points you create when you click on the plot will then form your line. You can change your line by dragging one of these two endpoints. The resulting relative sum of squares for your line is shown on the left. A value of 1 is for the best line possible, so, for example, the value of 1.12 in the picture below indicates that the green line drawn has a sum of squares 12% higher than that of the least-squares line. Try moving the line you have created to try to reduce the relative sum of squares.
   c. When you have the line that you think is best, you can click on Show least-squares line to see the actual best fit line. How did your line compare?
   d. Draw a new set of points and see if you can draw a line close to the least squares line on your first attempt. Was the line you drew centered correctly? Was its slope too steep or too shallow?
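The "relative sum of squares" described in exercise 31b can be computed by hand for any candidate line: it is the sum of squared vertical distances for that line divided by the same quantity for the least-squares line. The sketch below shows one way to do this, assuming that is how the applet defines the quantity. The points and the guessed line are hypothetical, and the function names are invented for this illustration.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical scatter of points, invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 3.8, 5.9, 5.4]

best_intercept, best_slope = fit_least_squares(xs, ys)
sse_best = sum_squared_errors(xs, ys, best_intercept, best_slope)

# A hand-drawn guess at the line (hypothetical): y = 1.0 + 1.0 * x.
guess_intercept, guess_slope = 1.0, 1.0
sse_guess = sum_squared_errors(xs, ys, guess_intercept, guess_slope)

relative_ss_guess = sse_guess / sse_best
relative_ss_best = sse_best / sse_best

print("least-squares line: intercept", best_intercept, "slope", best_slope)
print("relative sum of squares for the guess:", relative_ss_guess)
```

Changing `guess_intercept` and `guess_slope` and watching the ratio move toward 1 mirrors dragging the endpoints of the line in the applet.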

32. How well can you match correlations to their scatterplots? To find out, try the following online applet: www.stat.illinois.edu/courses/stat100/cuwu/games.html
   a. Click the New Plots button. The applet will present four scatterplots and four correlations.
   b. Examine each plot and pick the correlation coefficient that matches the scatterplot. When you have made your guesses, click the Answers button to find out if you are correct.
   c. You may continue to generate new plots. Try to achieve a streak of at least 20 in a row.