Further Mathematics 2018 CORE: Data analysis Chapter 3 Investigating associations between two variables


Chapter 3: Investigating associations between two variables

Extract from Study Design

Key knowledge
- correlation coefficient, r, its interpretation, the issue of correlation and cause and effect

Key skills
- construct scatterplots and use them to identify and describe associations between two numerical variables
- calculate the correlation coefficient, r, and interpret it in the context of the data
- answer statistical questions that require a knowledge of the associations between pairs of variables
- determine the equation of the least squares line, giving the coefficients correct to a required number of decimal places or significant figures as specified
- distinguish between correlation and causation

Chapter sections and questions to be completed
3A Response and explanatory variables: 1, 2
3B Investigating associations between categorical variables: 1, 2, 3, 4
3C Investigating the associations between a numerical and a categorical variable: 1, 2, 3
3D Investigating associations between two numerical variables: 1, 2, 3, 4, 5
3E How to interpret a scatterplot: 1, 2, 3
3F Calculating the correlation coefficient: 1, 3
3G The coefficient of determination: 1, 2, 3
3H Correlation and causality: 1, 2, 3, 4, 5, 6, 7
3I Which graph?: 1, 2
Chapter 3 Review: All questions

Table of Contents

Chapter 3 Investigating associations between two variables
  Extract from Study Design
    Key knowledge
    Key skills
  3A. Response and explanatory variables
    Identifying response and explanatory variables
    Example 1: Identifying the response and explanatory variables
    Example 2: Identifying the response and explanatory variables
  3B. Investigating associations between categorical variables
    Using a two-way frequency table to investigate an association
    Example 3: Identifying & describing associations between 2 categorical variables
    Example 4: Identifying & describing associations between 2 categorical variables (no association)
    Example 5
  3C. Investigating the association between a numerical and a categorical variable
    Using parallel box plots to identify and describe associations
    Example 6: Using parallel dot plots to identify and describe associations
    Example 7: Using a back-to-back stem plot to identify and describe associations
  3D. Investigating associations between two numerical variables
    How to construct a scatterplot (CAS calculator)
  3E. How to interpret a scatterplot
    Direction and outliers
    Example 8: Direction of association
    Form
    Example 9: Form of an association
    Strength of a linear relationship: the correlation coefficient
    Guidelines for classifying strength of a linear association
  3F. Calculating the correlation coefficient
    Calculating r using a CAS calculator
  3G. The coefficient of determination
    Calculating the coefficient of determination
    Interpreting the coefficient of determination
    Example 10
  3H. Correlation and causality
    Non-causal explanations
    Example: Ice cream sales
    Correlation does not imply causality
    Video
    Example: Causation
  3I. Which graph?

3A. Response and explanatory variables

Bivariate data is data in which two variables are recorded for each individual so that the association between them can be studied.

Identifying response and explanatory variables

The response variable is dependent on the explanatory variable. It is plotted on the y-axis.

The explanatory variable is used to explain the changes that might be observed in the response variable. It is plotted on the x-axis.

Example 1: Identifying the response and explanatory variables

We wish to investigate the question, "Does the time it takes a student to get to school depend on their mode of transport?" The variables here are time and mode of transport. Which is the response variable (RV) and which is the explanatory variable (EV)?

Example 2: Identifying the response and explanatory variables

Can we predict people's height (in cm) from their wrist measurement? The variables in this investigation are height and wrist measurement. Which is the response variable (RV) and which is the explanatory variable (EV)?

The roles depend on how the question is asked. If the question were instead "Can we predict people's wrist measurement from their height?", height would be the explanatory variable and wrist measurement would be the response variable.

Note: The explanatory variable is sometimes called the independent variable (IV) and the response variable the dependent variable (DV).

3B. Investigating associations between categorical variables

If two variables are related or linked in some way, we say they are associated.

Using a two-way frequency table to investigate an association

A two-way frequency table is used to investigate the association between two categorical variables.
- The categories of the response variable form the rows.
- The categories of the explanatory variable form the columns.

Example 3: Identifying & describing associations between 2 categorical variables

A survey was conducted with 100 people. As part of this survey, people were asked whether or not they supported banning mobile phones in cinemas. The results are summarised in the table. Is there an association between support for banning mobile phones in cinemas and the sex of the respondent? Write a brief response quoting appropriate percentages.

Example 4: Identifying & describing associations between 2 categorical variables (no association)

In the same survey, people were asked whether or not they supported Sunday racing. The results are summarised in the table. Is there an association between support for Sunday racing and the sex of the respondent? Write a brief response quoting appropriate percentages.

Note: As a rule of thumb, a difference of at least 5% between the percentages is required before the difference is classified as significant.
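The percentage comparison used in Examples 3 and 4 can be sketched in Python. The survey tables are not reproduced in this transcription, so the counts below are hypothetical, and the function name `column_percentages` is an illustrative choice rather than anything from the workbook. The sketch converts each column (each category of the explanatory variable) to percentages of its own total and then compares one row across the columns.

```python
# Hypothetical counts for 100 respondents; the real survey table is not
# reproduced in this transcription, so these numbers are assumptions.
counts = {
    "Male": {"For": 22, "Against": 28},
    "Female": {"For": 37, "Against": 13},
}


def column_percentages(table):
    """Express every cell as a percentage of its own column total."""
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: 100 * n / total for row, n in cells.items()}
    return result


percentages = column_percentages(counts)

# Compare the same response category across the explanatory categories.
difference = abs(percentages["Male"]["For"] - percentages["Female"]["For"])

for column, cells in percentages.items():
    print(column, {row: f"{p:.1f}%" for row, p in cells.items()})
print(f"Difference in support: {difference:.1f} percentage points")
```

With these assumed counts the difference is well above the 5% rule of thumb, so a written response would report an association and quote the two percentages.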

Example 5 [1]

A survey was conducted with 1000 males under 50 years old. As part of this survey, they were asked to rate their interest in sport as high, medium or low. Their age group was also recorded as under 18, 19-25, 26-35 or 36-50. The results are displayed in the table.

a) Which is the explanatory variable, interest in sport or age group?
b) Is there an association between interest in sport and age group? Write a brief response quoting appropriate percentages.

[1] https://seniormaths.cambridge.edu.au/lessonsection/lesson.action#/resources/video/112756/

3C. Investigating the association between a numerical and a categorical variable

Using parallel box plots to identify and describe associations

Parallel box plots are used to display the relationship between numerical data and categorical data, e.g. salary (numerical data) vs age group (categorical data).

The box plots are compared by looking at how the distribution changes between categories in terms of shape, centre and spread. If there is no association between the variables, the distributions will be similar for all groups.

A report on the association can be based on:
- comparing medians
- comparing IQRs and/or ranges
- comparing shapes

Note: Any one of these reports by itself can be used to claim an association between salary and age. However, using all three gives a more complete description of the relationship.
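As a rough illustration of comparing centre and spread between groups, the following Python sketch computes the median, IQR and range for two age groups. The salary figures are invented for the example, the helper name `summarise` is hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the data, which is one common school convention and may differ slightly from what a particular calculator reports.

```python
from statistics import median


def summarise(values):
    """Return the median, IQR and range, using medians of the two halves as quartiles."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd number of values the middle value belongs to neither half.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    q1, q3 = median(lower), median(upper)
    return {"median": median(data), "IQR": q3 - q1, "range": data[-1] - data[0]}


# Hypothetical salaries in thousands of dollars (assumed data, not from the workbook).
salaries = {
    "20-29 years": [28, 32, 35, 38, 40, 44, 50],
    "50-65 years": [30, 45, 60, 72, 85, 100, 130],
}

summaries = {group: summarise(values) for group, values in salaries.items()}
for group, stats in summaries.items():
    print(group, stats)
```

If the medians or the spreads differ noticeably between the groups, as they do for this made-up data, that difference is the evidence quoted in a written report of the association.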

Example 6: Using parallel dot plots to identify and describe associations

The parallel dot plot below displays the distribution of the number of sit-ups performed by 15 people before and after they had completed a gym program. Do the parallel dot plots support the contention that the number of sit-ups performed is associated with completing the gym program? Write a brief explanation that compares medians.

Note: Because it is often difficult to clearly identify the shape of a distribution with a small amount of data, we usually confine ourselves to comparing medians when using dot plots and back-to-back stem plots.

Example 7: Using a back-to-back stem plot to identify and describe associations

The back-to-back stem plot shows the distribution of life expectancy (in years) for 13 countries in 2010 and 1970. Do the back-to-back stem plots support the contention that life expectancy is increasing over time? Write a brief explanation based on your comparison of the two medians.

3D. Investigating associations between two numerical variables

A scatterplot compares two numerical variables.
- The response variable is on the y-axis.
- The explanatory variable is on the x-axis.

How to construct a scatterplot (CAS calculator)

3E. How to interpret a scatterplot

When looking at a scatterplot, the first thing to do is to decide if there is a clear pattern.

In the example opposite, there is no clear pattern in the points. The points are just scattered randomly across the plot. Conclude that there is no association.

For the three examples opposite, there is a clear (but different) pattern in each set of points. Conclude that there is an association.

Having found a clear pattern, there are several things we look for in the pattern of points:
- direction and outliers (if any)
- form
- strength

Direction and outliers

No association: there is a random scatter of points in the scatterplot of height against age for this group of footballers, so there is no association between the variables. There is an outlier: the footballer who is 201 cm tall.

Positive association: there is a clear pattern in the scatterplot of weight v. height for footballers, so the two variables are associated. The points go upwards as you move to the right. This is a positive association between the variables: tall players tend to be heavy and vice versa. There are no outliers in this scatterplot.

Negative association: the scatterplot of working hours against university participation rates for 15 countries shows a clear pattern, so the two variables are associated. In this case the points go downwards as you move to the right. There is a negative association between the variables: countries with high working hours tend to have low university participation rates and vice versa. There are no outliers.

Example 8: Direction of association

Classify each of the following scatterplots as exhibiting positive, negative or no association. Where there is an association, describe its direction in terms of the variables in the scatterplot and explain what it means for the variables involved.

Form

We look for a pattern in the points that has a linear form. If the points in the scatterplot appear to fluctuate randomly around a straight line, the scatterplot is said to have a linear form.

Example 9: Form of an association

Classify the form of the association in each scatterplot as linear or non-linear.

Strength of a linear relationship: the correlation coefficient

The strength of a linear association is an indication of how closely the points in the scatterplot fit a straight line. The correlation coefficient, r, measures the strength of a linear relationship. The correlation coefficient is a value between -1 and +1.

Guidelines for classifying strength of a linear association
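The workbook obtains r from a CAS calculator. As a hedged illustration of what that calculation does, the sketch below computes Pearson's correlation coefficient directly from the standard formula r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The height and weight values are hypothetical and the function name `pearson_r` is an illustrative choice.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: heights (cm) and weights (kg) of five people.
heights = [170, 175, 180, 185, 190]
weights = [68, 74, 77, 85, 90]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")

# Points lying exactly on a falling straight line give r = -1 (up to rounding).
print(pearson_r([1, 2, 3, 4], [7, 5, 3, 1]))
```

The value always lies between -1 and +1; the sign gives the direction of the association and the size indicates how closely the points follow a straight line.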

3F. Calculating the correlation coefficient

Calculating r using a CAS calculator

3G. The coefficient of determination

Calculating the coefficient of determination

Numerically, the coefficient of determination = $r^2$. Thus, if the correlation between weight and height is $r = 0.8$, then the

coefficient of determination = $r^2 = 0.8^2 = 0.64$, or $0.64 \times 100\% = 64\%$

Interpreting the coefficient of determination

The coefficient of determination (as a percentage) tells us the percentage of the variation in the response variable that is explained by the variation in the explanatory variable.

Let's look at the relationship between weight and height. The coefficient of determination is 0.64 or 64%. The coefficient of determination tells us that:

64% of the variation in people's weight is explained by the variation in their height.

Example 10

For the relationship described by this scatterplot, the coefficient of determination = 0.5210. Determine the value of the correlation coefficient, r.

Example 11

Carbon monoxide (CO) levels in the air and traffic volume are linearly related, with:

$r_{\text{CO level, traffic volume}} = +0.985$

Determine the value of the coefficient of determination, write it in percentage terms and interpret. In this relationship, traffic volume is the explanatory variable.

Example 12

Scores on tests of verbal and mathematical ability are linearly related, with:

$r_{\text{mathematical, verbal}} = +0.275$

Determine the value of the coefficient of determination, write it in percentage terms, and interpret. In this relationship, verbal ability is the explanatory variable.
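The arithmetic in this section can be checked with a few lines of Python. This is only a sketch: the helper name `r_squared_percent` is hypothetical, and for Example 10 the code recovers only the size of r, because the sign has to be read from the direction of the scatterplot, which is not reproduced in this transcription.

```python
from math import sqrt


def r_squared_percent(r):
    """Coefficient of determination, r squared, expressed as a percentage."""
    return 100 * r ** 2


# Worked example from the text: weight and height, r = 0.8.
print(f"r = 0.8    -> {r_squared_percent(0.8):.1f}%")

# Example 11: CO level and traffic volume, r = +0.985.
print(f"r = +0.985 -> {r_squared_percent(0.985):.1f}%")

# Example 12: mathematical and verbal ability, r = +0.275.
print(f"r = +0.275 -> {r_squared_percent(0.275):.1f}%")

# Example 10 works backwards: the square root gives the size of r only.
# Whether r is positive or negative depends on the direction of the scatterplot.
magnitude_of_r = sqrt(0.5210)
print(f"|r| = {magnitude_of_r:.4f}")
```

Each percentage is then read as the share of the variation in the response variable explained by the variation in the explanatory variable, for example the share of the variation in CO level explained by the variation in traffic volume.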

3H. Correlation and causality

There is a strong relationship between the number of Nobel prizes a country has won and the number of IKEA stores in that country (r = 0.82). The scatterplot below shows the association between the two variables.

Does this mean that one way to increase the number of Australian Nobel prize winners is to build more IKEA stores?

Although identifying a high degree of correlation between two variables may be interesting and can often flag the need for further, more detailed investigation, it in no way gives us any basis to comment on whether or not one variable causes particular values in another variable.

Correlation is a statistical measure that describes the size and direction of the relationship between two variables.

Causation states that one event is the result of the occurrence of the other event (or variable). This is also referred to as cause and effect, where one event is the cause and makes another event happen, this being the effect. An example of a cause and effect relationship could be an alarm going off (cause: happens first) and a person waking up (effect: happens later).

It is also important to realise that a high correlation does not imply causation. For example, smoking could be highly correlated with alcoholism without necessarily being the cause of it; correlation and causation are different things.

Testing causality

One way to test for causality is experimentally, where a controlled study is the most effective. This involves splitting the sample or population and making one part a control group (e.g. one group gets a placebo and the other gets some form of medication).

Another way is via an observational study, which also compares against a control variable, but in which the researcher has no control over the experiment (e.g. smokers and non-smokers who develop lung cancer; the researcher has no control over whether they develop lung cancer or not).

Non-causal explanations

Although we may observe a strong correlation between two variables, this does not necessarily mean that a causal association exists. In some cases the correlation between two variables can be explained by a common response: both variables respond to a third variable, and it is this third variable that provides the association.

Example: Ice cream sales

These graphs show a correlation between ice cream sales and (from left to right) drowning deaths, forest fires and shark attacks. It may appear that to save lives we should ban ice cream, but clearly there is another reason. In this case, the reason is that all these things occur in summer, so it is temperature which is the cause.

Correlation does not imply causality

Video

To help you with this concept, you should watch the video The Question of Causation, which can be accessed through the link below. It is well worth 15 minutes of your time.
http://cambridge.edu.au/redirect/?id=6103

Example: Causation

Return to the scatterplot of the number of Nobel prizes a country has won and the number of IKEA stores in that country. Comment on the relationship by completing the sentence:

The ____ between the number of ____ a country has and the number of ____ in that country, does ____ imply that the ____ of the ____ is the ____ of ____.
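The ice cream example can be illustrated with made-up numbers. In the Python sketch below, ice cream sales and drownings are each written down so that they rise with temperature, and neither is computed from the other. All values are hypothetical and chosen only to show the effect; `pearson_r` is an illustrative helper implementing the standard Pearson formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical weekly records (assumed data): both quantities follow temperature.
temperature = [15, 18, 21, 24, 27, 30, 33]               # degrees Celsius
ice_cream_sales = [155, 175, 213, 237, 274, 296, 330]    # units sold
drownings = [2, 3, 3, 5, 5, 7, 8]                        # number of incidents

r_sales_drownings = pearson_r(ice_cream_sales, drownings)
print(f"ice cream sales vs drownings: r = {r_sales_drownings:.2f}")
print(f"temperature vs ice cream sales: r = {pearson_r(temperature, ice_cream_sales):.2f}")
print(f"temperature vs drownings: r = {pearson_r(temperature, drownings):.2f}")
```

Ice cream sales and drownings come out strongly correlated even though neither influences the other in this made-up data; temperature is the common factor behind both.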

3I. Which graph?

The following guidelines may assist in deciding which is the most appropriate graph to display data.