Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsoned.co.uk

© Pearson Education Limited 2014

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

ISBN 10: 1-292-02395-3
ISBN 13: 978-1-292-02395-3

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Printed in the United States of America

4 ASSESS YOUR UNDERSTANDING

VOCABULARY AND SKILL BUILDING

1. What is meant by a marginal distribution? What is meant by a conditional distribution?

2. Refer to Table 9. Is constructing a conditional distribution by level of education different from constructing a conditional distribution by employment status? If they are different, explain the difference.

3. Explain why we use the term association rather than correlation when describing the relation between two variables in this section.

4. Explain the idea behind Simpson's Paradox.

In Problems 5 and 6,
(a) Construct a frequency marginal distribution.
(c) Construct a conditional distribution by x.
(d) Draw a bar graph of the conditional distribution found in part (c).

5.
      x1    x2    x3
y1    20    25    30
y2    30    25    50

6.
      x1    x2    x3
y1    35    25    20
y2    65    75    80

APPLYING THE CONCEPTS

7. Made in America In a recent Harris Poll, a random sample of adult Americans (18 years and older) was asked, "When you see an ad emphasizing that a product is 'Made in America,' are you more likely to buy it, less likely to buy it, or neither more nor less likely to buy it?" The results of the survey, by age group, are presented in the contingency table below.

                               18–34   35–44   45–54   55+     Total
More likely                    238     329     360     402     1329
Less likely                    22      6       22      16      66
Neither more nor less likely   282     201     164     118     765
Total                          542     536     546     536     2160

Source: The Harris Poll

(a) How many adult Americans were surveyed? How many were 55 and older?
(c) What proportion of Americans are more likely to buy a product when the ad says "Made in America"?
(d) Construct a conditional distribution of likelihood to buy "Made in America" by age. That is, construct a conditional distribution treating age as the explanatory variable.
(e) Draw a bar graph of the conditional distribution found in part (d).
(f) Write a couple of sentences explaining any relation between likelihood to buy and age.

8. Desirability Traits In a recent Harris Poll, a random sample of adult Americans (18 years and older) was asked, "Given a choice of the following, which one would you most want to be?" Results of the survey, by gender, are given in the contingency table.

         Richer    Thinner   Smarter   Younger   None of these   Total
Male     520       158       159       181       102             1120
Female   425       300       144       81        92              1042
Total    945       458       303       262       194             2162

Source: The Harris Poll

(a) How many adult Americans were surveyed? How many males were surveyed?
(c) What proportion of adult Americans want to be richer?
(d) Construct a conditional distribution of desired trait by gender. That is, construct a conditional distribution treating gender as the explanatory variable.
(e) Draw a bar graph of the conditional distribution found in part (d).
(f) Write a couple of sentences explaining any relation between desired trait and gender.

9. Party Affiliation Is there an association between party affiliation and gender? The following data represent the gender and party affiliation of registered voters based on a random sample of 802 adults.

              Female   Male
Republican    105      115
Democrat      150      103
Independent   150      179

Source: Star Tribune Minnesota Poll

(a) Construct a frequency marginal distribution.
(c) What proportion of registered voters consider themselves to be Independent?
(d) Construct a conditional distribution of party affiliation by gender.
(e) Draw a bar graph of the conditional distribution found in part (d).
(f) Is gender associated with party affiliation? If so, how?

10. Feelings on Abortion The Pew Research Center for the People and the Press conducted a poll in which it asked about the availability of abortion. The table is based on the results of the survey.

                               High School or Less   Some College   College Graduate
Generally available            90                    72             113
Allowed, but more limited      51                    60             77
Illegal, with few exceptions   125                   94             69
Never permitted                51                    14             17

Source: Pew Research Center for the People and the Press

(a) Construct a frequency marginal distribution.
(c) What proportion of college graduates feel that abortion should never be permitted?
(d) Construct a conditional distribution of people's feelings about the availability of abortion by level of education.
(e) Draw a bar graph of the conditional distribution found in part (d).
(f) Is level of education associated with opinion on the availability of abortion? If so, how?

11. Health and Happiness The General Social Survey asks questions about one's happiness and health. One would think that health plays a role in one's happiness. Use the data in the table to determine whether healthier people tend to also be happier. Treat level of health as the explanatory variable.

                Poor    Fair    Good     Excellent   Total
Not too happy   696     1,386   1,629    732         4,443
Pretty happy    950     3,817   9,642    5,195       19,604
Very happy      350     1,382   4,520    5,095       11,347
Total           1,996   6,585   15,791   11,022      35,394

Source: General Social Survey

12. Happy in Your Marriage? The General Social Survey asks questions about one's happiness in marriage. Is there an association between gender and happiness in marriage? Use the data in the table to determine if gender is associated with happiness in marriage. Treat gender as the explanatory variable.

                Male     Female   Total
Very happy      7,609    7,942    15,551
Pretty happy    3,738    4,447    8,185
Not too happy   259      460      719
Total           11,606   12,849   24,455

Source: General Social Survey

13. Smoking Is Healthy? Could it be that smoking actually increases survival rates among women? The following data represent the 20-year survival status and smoking status of 1314 English women who participated in a cohort study from 1972 to 1992.

        Smoking Status
        Smoker (S)   Nonsmoker (NS)   Total
Dead    139          230              369
Alive   443          502              945
Total   582          732              1314

Source: David R. Appleton et al. "Ignoring a Covariate: An Example of Simpson's Paradox." American Statistician 50(4), 1996

(a) What proportion of the smokers was dead after 20 years? What proportion of the nonsmokers was dead after 20 years? What does this imply about the health consequences of smoking?

The data in the table above do not take into account a variable that is strongly related to survival status, age. The data shown next give the survival status of women and their age at the beginning of the study. For example, 14 women who were 35 to 44 at the beginning of the study were smokers and dead after 20 years.

Age group   18–24       25–34       35–44       45–54       55–64       65–74       75 or older
            S     NS    S     NS    S     NS    S     NS    S     NS    S     NS    S     NS
Dead        2     1     3     5     14    7     27    12    51    40    29    101   13    64
Alive       53    61    121   152   95    114   103   66    64    81    7     28    0     0

(b) Determine the proportion of 18- to 24-year-old smokers who were dead after 20 years. Determine the proportion of 18- to 24-year-old nonsmokers who were dead after 20 years.
(c) Repeat part (b) for the remaining age groups to create a conditional distribution of survival status by smoking status for each age group.
(d) Draw a bar graph of the conditional distribution from part (c).
(e) Write a short report detailing your findings.

14. Treating Kidney Stones Researchers conducted a study to determine which of two treatments, A or B, is more effective in the treatment of kidney stones. The results of their experiment are given in the table.

                Treatment A   Treatment B   Total
Effective       273           289           562
Not effective   77            61            138
Total           350           350           700

Source: C. R. Charig, D. R. Webb, S. R. Payne, and O. E. Wickham. "Comparison of Treatment of Renal Calculi by Operative Surgery, Percutaneous Nephrolithotomy, and Extracorporeal Shock Wave Lithotripsy." British Medical Journal 292(6524): 879–882.

(a) Which treatment appears to be more effective? Why?

The data in the table above do not take into account the size of the kidney stone. The data shown next indicate the effectiveness of each treatment for both large and small kidney stones. The counts for each treatment add up to the totals in the table above.

                  Small Stones      Large Stones
                  A       B         A       B
Effective         81      234       192     55
Not effective     6       36        71      25

(b) Determine the proportion of small kidney stones that were effectively dealt with using treatment A. Determine the proportion of small kidney stones that were effectively dealt with using treatment B.
(c) Repeat part (b) for the large stones to create a conditional distribution of effectiveness by treatment for each stone size.
(d) Draw a bar graph of the conditional distribution from part (c).
(e) Write a short report detailing your findings.

Technology Step-By-Step
Contingency Tables and Association

MINITAB
1. Enter the values of the row variable in column C1 and the corresponding values of the column variable in C2. The frequency for the cell is entered in C3. For example, for the data in Table 9, C1 would contain employment status, C2 the level of education, and C3 the frequency of each cell.
2. Select the Stat menu and highlight Tables. Then select Descriptive Statistics...
3. In the cell "For Rows:" enter C1. In the cell "For Columns:" enter C2. In the cell "Frequencies" enter C3. Click the Options button and make sure the radio button for "Display marginal statistics for Rows and columns" is checked. Click OK. Click the Categorical Variables button and then select the summaries you desire. Click OK twice.

StatCrunch
1. Enter the contingency table into the spreadsheet. The first column should be the row variable. For example, for the data in Table 9, the first column would be employment status. Each subsequent column would be the counts of each category of the column variable. For the data in Table 9, enter the counts for each level of education. Title each column (including the first column indicating the row variable).
2. Select Stat, highlight Tables, select Contingency, then highlight with summary.
3. Select the column variables. Then select the label of the row variable. For example, the data in Table 9 has four column variables (Did Not Finish High School, and so on) and the row label is employment status. Click Next>.
4. Decide what values you want displayed. Typically, we choose row percent and column percent for this section. Click Calculate.
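The same marginal and conditional distributions can also be computed in a general-purpose language. The following is a minimal Python sketch, not part of the MINITAB or StatCrunch instructions, that uses the party-affiliation counts from Problem 9. The function names (`row_totals`, `column_totals`, `relative`, `conditional_by_column`) are illustrative choices, not part of any statistics package.

```python
# Counts from Problem 9: party affiliation (rows) by gender (columns).
counts = {
    "Republican": {"Female": 105, "Male": 115},
    "Democrat": {"Female": 150, "Male": 103},
    "Independent": {"Female": 150, "Male": 179},
}


def row_totals(table):
    """Frequency marginal distribution of the row variable."""
    return {row: sum(cells.values()) for row, cells in table.items()}


def column_totals(table):
    """Frequency marginal distribution of the column variable."""
    totals = {}
    for cells in table.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return totals


def relative(distribution):
    """Convert a frequency distribution into relative frequencies."""
    total = sum(distribution.values())
    return {key: n / total for key, n in distribution.items()}


def conditional_by_column(table):
    """Distribution of the row variable within each column category."""
    col_tot = column_totals(table)
    return {
        col: {row: table[row][col] / col_tot[col] for row in table}
        for col in col_tot
    }


if __name__ == "__main__":
    print("Row marginal (relative):", relative(row_totals(counts)))
    print("Column marginal (relative):", relative(column_totals(counts)))
    for gender, dist in conditional_by_column(counts).items():
        print(gender, {party: round(p, 3) for party, p in dist.items()})
```

Conditioning on the column variable here treats gender as the explanatory variable, which matches the "column percent" choice described in the StatCrunch steps.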

REVIEW

Summary

In this chapter we looked at describing the relation between two quantitative variables (Sections 1 to 3) and between two qualitative variables (Section 4).

The first step in describing the relation between two quantitative variables is to draw a scatter diagram. The explanatory variable is plotted on the horizontal axis and the corresponding response variable on the vertical axis. The scatter diagram can be used to discover whether the relation between the explanatory and the response variables is linear. In addition, for linear relations, we can judge whether the linear relation shows positive or negative association.

A numerical measure for the strength of linear relation between two quantitative variables is the linear correlation coefficient. It is a number between -1 and 1, inclusive. Values of the correlation coefficient near -1 are indicative of a negative linear relation between the two variables. Values of the correlation coefficient near +1 indicate a positive linear relation between the two variables. If the correlation coefficient is near 0, then little linear relation exists between the two variables.

Be careful! Just because the correlation coefficient between two quantitative variables indicates that the variables are linearly related, it does not mean that a change in one variable causes a change in a second variable. It could be that the correlation is the result of a lurking variable.

Once a linear relation between the two variables has been discovered, we describe the relation by finding the least-squares regression line. This line best describes the linear relation between the explanatory and response variables. We can use the least-squares regression line to predict a value of the response variable for a given value of the explanatory variable.

The coefficient of determination, $R^2$, measures the percent of variation in the response variable that is explained by the least-squares regression line. It is a measure between 0 and 1, inclusive. The closer $R^2$ is to 1, the more explanatory value the line has.

Whenever a least-squares regression line is obtained, certain diagnostics must be performed. These include verifying that the linear model is appropriate, verifying that the residuals have constant variance, and checking for outliers and influential observations.

Section 4 introduced methods that allow us to describe any association that might exist between two qualitative variables. This is done through contingency tables. Both marginal and conditional distributions allow us to describe the effect one variable might have on the other variable in the study. We also construct bar graphs to see the association between the two variables in the study. Again, just because two qualitative variables are associated does not mean that a change in one variable causes a change in a second variable. We also looked at Simpson's Paradox, which represents situations in which an association between two variables inverts or goes away when a third (lurking) variable is introduced into the analysis.
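The reversal described by Simpson's Paradox can be checked numerically. The sketch below is an illustration only: it uses the smoking and survival counts from Problem 13 in the Section 4 exercises, and compares the proportion dead among smokers and nonsmokers overall and within each age group. The names `by_age`, `death_proportion`, and `aggregate` are hypothetical helpers introduced for this example.

```python
# (dead, alive) counts after 20 years, by age group at the start of the study
# and smoking status (S = smoker, NS = nonsmoker); counts from Problem 13.
by_age = {
    "18-24": {"S": (2, 53), "NS": (1, 61)},
    "25-34": {"S": (3, 121), "NS": (5, 152)},
    "35-44": {"S": (14, 95), "NS": (7, 114)},
    "45-54": {"S": (27, 103), "NS": (12, 66)},
    "55-64": {"S": (51, 64), "NS": (40, 81)},
    "65-74": {"S": (29, 7), "NS": (101, 28)},
    "75 or older": {"S": (13, 0), "NS": (64, 0)},
}


def death_proportion(dead, alive):
    """Conditional proportion dead within one group."""
    return dead / (dead + alive)


def aggregate(groups, status):
    """Sum (dead, alive) over all age groups, ignoring the lurking variable."""
    dead = sum(cells[status][0] for cells in groups.values())
    alive = sum(cells[status][1] for cells in groups.values())
    return dead, alive


# Ignoring age: one conditional distribution by smoking status.
overall = {s: death_proportion(*aggregate(by_age, s)) for s in ("S", "NS")}

# Accounting for age: one conditional distribution per age group.
within = {
    age: {s: death_proportion(*cells[s]) for s in ("S", "NS")}
    for age, cells in by_age.items()
}

if __name__ == "__main__":
    print("Overall:", {s: round(p, 3) for s, p in overall.items()})
    for age, props in within.items():
        print(age, {s: round(p, 3) for s, p in props.items()})
```

Comparing `overall` with the entries of `within` shows how the direction of an association can change once the lurking variable (age) is held fixed.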
Vocabulary

Bivariate data
Response variable
Explanatory variable
Predictor variable
Scatter diagram
Positively associated
Negatively associated
Linear correlation coefficient
Correlation matrix
Lurking variable
Residual
Least-squares regression line
Slope
y-intercept
Outside the scope of the model
Coefficient of determination
Deviation
Total deviation
Explained deviation
Unexplained deviation
Residual plot
Constant error variance
Outlier
Influential observation
Contingency (or two-way) table
Row variable
Column variable
Cell
Marginal distribution
Conditional distribution
Simpson's Paradox

Formulas

Correlation Coefficient

\[ r = \frac{\sum \left( \dfrac{x_i - \bar{x}}{s_x} \right) \left( \dfrac{y_i - \bar{y}}{s_y} \right)}{n - 1} \]

Equation of the Least-Squares Regression Line

\[ \hat{y} = b_1 x + b_0 \]

where
$\hat{y}$ is the predicted value of the response variable,
$b_1 = r \cdot \dfrac{s_y}{s_x}$ is the slope of the least-squares regression line, and
$b_0 = \bar{y} - b_1 \bar{x}$ is the y-intercept of the least-squares regression line.

Coefficient of Determination, $R^2$

\[ R^2 = \frac{\text{explained variation}}{\text{total variation}} = 1 - \frac{\text{unexplained variation}}{\text{total variation}} \]

\[ R^2 = r^2 \quad \text{for the least-squares regression model } \hat{y} = b_1 x + b_0 \]
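The formulas above translate directly into code. The following Python sketch is one possible implementation using only the standard library. It applies the formulas to the fat and calorie data from Review Exercise 2, and the function names are illustrative choices. The unexplained variation is computed as the sum of squared residuals and the total variation as the sum of squared deviations of y from its mean.

```python
from math import isclose
from statistics import mean, stdev


def correlation(x, y):
    """Linear correlation coefficient r from the z-score formula."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    products = (((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)


def least_squares_line(x, y):
    """Slope b1 = r * s_y / s_x and intercept b0 = y_bar - b1 * x_bar."""
    b1 = correlation(x, y) * stdev(y) / stdev(x)
    b0 = mean(y) - b1 * mean(x)
    return b1, b0


def coefficient_of_determination(x, y):
    """R^2 = 1 - (unexplained variation) / (total variation)."""
    b1, b0 = least_squares_line(x, y)
    y_bar = mean(y)
    total = sum((yi - y_bar) ** 2 for yi in y)
    unexplained = sum((yi - (b1 * xi + b0)) ** 2 for xi, yi in zip(x, y))
    return 1 - unexplained / total


# Fat content (g) and calories from Review Exercise 2.
fat = [20, 39, 27, 29, 26, 47, 35, 38]
calories = [430, 750, 480, 540, 510, 760, 690, 632]

r = correlation(fat, calories)
b1, b0 = least_squares_line(fat, calories)
r_squared = coefficient_of_determination(fat, calories)

if __name__ == "__main__":
    print(f"r = {r:.3f}")
    print(f"y-hat = {b1:.3f}x + {b0:.3f}")
    print(f"R^2 = {r_squared:.3f}")
    # For the least-squares line, R^2 and r^2 agree up to rounding error.
    print("R^2 equals r^2:", isclose(r_squared, r ** 2, rel_tol=1e-9))
```

Computing $R^2$ from the variation ratio rather than by squaring $r$ gives an independent check that the two expressions agree for the least-squares line.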

Objectives

Section  You should be able to...                                                            Example    Review Exercises
1        1 Draw and interpret scatter diagrams                                               1, 3       2(b), 3(a), 6(a), 13(a)
         2 Describe the properties of the linear correlation coefficient                                18
         3 Compute and interpret the linear correlation coefficient                          2, 3       2(c), 3(b), 13(b)
         4 Determine whether a linear relation exists between two variables                  4          2(d), 3(c)
         5 Explain the difference between correlation and causation                          5          14, 17
2        1 Find the least-squares regression line and use the line to make predictions       2, 3       1(a), 1(b), 4(a), 4(d), 5(a), 5(c), 6(d), 12(a), 13(c), 19(c)
         2 Interpret the slope and y-intercept of the least-squares regression line                     1(c), 1(d), 4(c), 5(b), 19(b)
         3 Compute the sum of squared residuals                                              4          6(f), 6(g)
3        1 Compute and interpret the coefficient of determination                            1          1(e), 10(a), 11(a)
         2 Perform residual analysis on a regression model                                   2–5        7–9, 10(b) and (c), 11(b) and (c), 13(d) and (e), 19(d)
         3 Identify influential observations                                                 6          10(d), 10(e), 11(d), 12(b), 19(e)
4        1 Compute the marginal distribution of a variable                                   1 and 2    15(b)
         2 Use the conditional distribution to identify association among categorical data   3–5        15(d), 15(e), 15(f)
         3 Explain Simpson's Paradox                                                         6          16

Review Exercises

1. Basketball Spreads In sports betting, Las Vegas sports books establish winning margins for a team that is favored to win a game. An individual can place a wager on the game and will win if the team bet upon wins after accounting for the spread. For example, if Team A is favored by 5 points and wins the game by 7 points, then a bet on Team A is a winning bet. However, if Team A wins the game by only 3 points, then a bet on Team A is a losing bet. For NCAA Division I basketball games, a least-squares regression with explanatory variable home team Las Vegas spread, x, and response variable home team winning margin, y, is $\hat{y} = 1.007x - 0.012$.

Source: Justin Wolfers. "Point Shaving: Corruption in NCAA Basketball"

(a) Predict the winning margin if the home team is favored by 3 points.
(b) Predict the winning margin (of the visiting team) if the visiting team is favored by 7 points (this is equivalent to the home team being favored by -7 points).
(c) Interpret the slope.
(d) Interpret the y-intercept.
(e) The coefficient of determination is 0.39. Interpret this value.

2. Fat and Calories in Cheeseburgers A nutritionist was interested in developing a model that describes the relation between the amount of fat (in grams) in cheeseburgers at fast-food restaurants and the number of calories. She obtains the following data from the Web sites of the companies.

Sandwich (Restaurant)                             Fat Content (g)   Calories
Quarter-pound Single with Cheese (Wendy's)        20                430
Whataburger (Whataburger)                         39                750
Cheeseburger (In-n-Out)                           27                480
Big Mac (McDonald's)                              29                540
Quarter-pounder with cheese (McDonald's)          26                510
Whopper with cheese (Burger King)                 47                760
Jumbo Jack (Jack in the Box)                      35                690
Double Steakburger with cheese (Steak 'n Shake)   38                632

Source: Each company's Web site

(a) The researcher wants to use fat content to predict calories. Which is the explanatory variable?
(b) Draw a scatter diagram of the data.
(c) Compute the linear correlation coefficient between fat content and calories.
(d) Does a linear relation exist between fat content and calories in fast-food restaurant sandwiches?