STATISTICS INFORMED DECISIONS USING DATA

1 STATISTICS INFORMED DECISIONS USING DATA Fifth Edition Chapter 4 Describing the Relation between Two Variables

2 4.1 Scatter Diagrams and Correlation Learning Objectives 1. Draw and interpret scatter diagrams 2. Describe the properties of the linear correlation coefficient 3. Compute and interpret the linear correlation coefficient 4. Determine whether a linear relation exists between two variables 5. Explain the difference between correlation and causation

3 4.1 Scatter Diagrams and Correlation Draw and Interpret Scatter Diagrams (1 of 6) The response variable is the variable whose value can be explained by the value of the explanatory or predictor variable. A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.

4 4.1 Scatter Diagrams and Correlation Draw and Interpret Scatter Diagrams (2 of 6) EXAMPLE Drawing and Interpreting a Scatter Diagram The data shown to the right are based on a study for drilling rock. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So, depth at which drilling begins is the explanatory variable, x, and time (in minutes) to drill five feet is the response variable, y. Draw a scatter diagram of the data. [Table columns: Depth at Which Drilling Begins, x (in feet); Time to Drill 5 Feet, y (in minutes)] Source: Penner, R., and Watts, D.G. Mining Information. The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.

5 4.1 Scatter Diagrams and Correlation Draw and Interpret Scatter Diagrams (3 of 6)

6 4.1 Scatter Diagrams and Correlation Draw and Interpret Scatter Diagrams (4 of 6) Various Types of Relations in a Scatter Diagram

7 4.1 Scatter Diagrams and Correlation Draw and Interpret Scatter Diagrams (5 of 6) Two variables that are linearly related are positively associated when above-average values of one variable are associated with above-average values of the other variable and below-average values of one variable are associated with below-average values of the other variable. That is, two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases.

8 4.1 Scatter Diagrams and Correlation Draw and Interpret Scatter Diagrams (6 of 6) Two variables that are linearly related are negatively associated when above-average values of one variable are associated with below-average values of the other variable. That is, two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.

9 4.1 Scatter Diagrams and Correlation Describe the Properties of the Linear Correlation Coefficient (1 of 6) The linear correlation coefficient or Pearson product moment correlation coefficient is a measure of the strength and direction of the linear relation between two quantitative variables. The Greek letter ρ (rho) represents the population correlation coefficient, and r represents the sample correlation coefficient. We present only the formula for the sample correlation coefficient.

10 4.1 Scatter Diagrams and Correlation Describe the Properties of the Linear Correlation Coefficient (2 of 6) Sample Linear Correlation Coefficient

11 4.1 Scatter Diagrams and Correlation Describe the Properties of the Linear Correlation Coefficient (3 of 6) Properties of the Linear Correlation Coefficient 1. The linear correlation coefficient is always between −1 and 1, inclusive. That is, −1 ≤ r ≤ 1. 2. If r = +1, then a perfect positive linear relation exists between the two variables. 3. If r = −1, then a perfect negative linear relation exists between the two variables. 4. The closer r is to +1, the stronger the evidence is of a positive association between the two variables. 5. The closer r is to −1, the stronger the evidence is of a negative association between the two variables.

12 4.1 Scatter Diagrams and Correlation Describe the Properties of the Linear Correlation Coefficient (4 of 6) 6. If r is close to 0, then little or no evidence exists of a linear relation between the two variables. So r close to 0 does not imply no relation, just no linear relation. 7. The linear correlation coefficient is a unitless measure of association. So the unit of measure for x and y plays no role in the interpretation of r. 8. The correlation coefficient is not resistant. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient.
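The formula slide for the sample correlation coefficient did not survive in this transcription. As a rough sketch, the code below assumes the standard z-score form, r = Σ[((x_i − x̄)/s_x)((y_i − ȳ)/s_y)] / (n − 1), and uses it to illustrate properties 7 and 8. The function name `sample_correlation` and the data are hypothetical, not taken from the slides.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """Sample linear correlation coefficient r (assumed standard z-score form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum of products of z-scores, divided by n - 1.
    z_products = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                     for x, y in zip(xs, ys))
    return z_products / (n - 1)

# Hypothetical data, not the drilling data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)

# Property 7: r is unitless, so rescaling x (say, feet to inches) leaves it unchanged.
r_rescaled = sample_correlation([12 * v for v in x], y)

# Property 8: r is not resistant; one observation off the pattern can change it a lot.
r_with_outlier = sample_correlation(x + [6], y + [-10])
```

With these made-up numbers, r is about 0.77, the rescaled value agrees with it, and the single added observation is enough to make the coefficient negative.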

13 4.1 Scatter Diagrams and Correlation Describe the Properties of the Linear Correlation Coefficient (5 of 6)

14 4.1 Scatter Diagrams and Correlation Describe the Properties of the Linear Correlation Coefficient (6 of 6)

15 4.1 Scatter Diagrams and Correlation Compute and Interpret the Linear Correlation Coefficient (1 of 5) EXAMPLE Determining the Linear Correlation Coefficient Determine the linear correlation coefficient of the drilling data. [Table columns: Depth at Which Drilling Begins, x (in feet); Time to Drill 5 Feet, y (in minutes)]

16 4.1 Scatter Diagrams and Correlation Compute and Interpret the Linear Correlation Coefficient (2 of 5)

17 4.1 Scatter Diagrams and Correlation Compute and Interpret the Linear Correlation Coefficient (3 of 5)

18 4.1 Scatter Diagrams and Correlation Compute and Interpret the Linear Correlation Coefficient (4 of 5) IN-CLASS ACTIVITY Correlation Randomly select six students from the class and have them determine their at-rest pulse rates and then discuss the following: 1. When determining each at-rest pulse rate, would it be better to count beats for 30 seconds and multiply by 2 or count beats for 1 full minute? Explain. What are some other ways to find the at-rest pulse rate? Do any of these methods have an advantage? 2. What effect will physical activity have on pulse rate? 3. Do you think the at-rest pulse rate will have any effect on the pulse rate after physical activity? If so, how? If not, why not? Have the same six students jog in place for 3 minutes and then immediately determine their pulse rates using the same technique as for the at-rest pulse rates.

19 4.1 Scatter Diagrams and Correlation Compute and Interpret the Linear Correlation Coefficient (5 of 5) 4. Draw a scatter diagram for the pulse data using the at-rest data as the explanatory variable. 5. Comment on the relationship, if any, between the two variables. Is this consistent with your expectations? 6. Based on the graph, estimate the linear correlation coefficient for the data. Then compute the correlation coefficient and compare it to your estimate.

20 4.1 Scatter Diagrams and Correlation Determine whether a Linear Relation Exists between Two Variables (1 of 2) Testing for a Linear Relation Step 1 Determine the absolute value of the correlation coefficient. Step 2 Find the critical value in Table II for the given sample size. Step 3 If the absolute value of the correlation coefficient is greater than the critical value, we say a linear relation exists between the two variables. Otherwise, no linear relation exists.

21 4.1 Scatter Diagrams and Correlation Determine whether a Linear Relation Exists between Two Variables (2 of 2) EXAMPLE Does a Linear Relation Exist? Determine whether a linear relation exists between time to drill five feet and depth at which drilling begins. Comment on the type of relation that appears to exist between time to drill five feet and depth at which drilling begins. [Table II Critical Values for Correlation Coefficient] The correlation between drilling depth and time to drill is 0.773. The critical value for n = 12 observations is 0.576. Since 0.773 > 0.576, there is a positive linear relation between time to drill five feet and depth at which drilling begins.
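The three-step test can be sketched in a few lines. This is only an illustration: the lookup holds just the two Table II critical values quoted in this chapter (0.576 for n = 12 and 0.514 for n = 15), the value r = 0.773 is the drilling correlation quoted in the coefficient-of-determination example, and the names `CRITICAL_VALUES` and `linear_relation` are hypothetical.

```python
# Critical values quoted in this chapter (Table II): n = 12 -> 0.576, n = 15 -> 0.514.
# Other sample sizes are deliberately left out rather than guessed.
CRITICAL_VALUES = {12: 0.576, 15: 0.514}

def linear_relation(r, n):
    """Return 'positive', 'negative', or 'none' following Steps 1-3."""
    if n not in CRITICAL_VALUES:
        raise KeyError(f"no critical value recorded for n = {n}")
    # Step 1: absolute value of r. Step 2: look up the critical value.
    # Step 3: a linear relation exists only if |r| exceeds the critical value.
    if abs(r) > CRITICAL_VALUES[n]:
        return "positive" if r > 0 else "negative"
    return "none"

# Drilling data: r = 0.773 with n = 12 observations.
drilling_result = linear_relation(0.773, 12)
```

The sign of r is used only after the comparison, to name the direction of the relation.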

22 4.1 Scatter Diagrams and Correlation Explain the Difference between Correlation and Causation (1 of 8) According to data obtained from the Statistical Abstract of the United States, the percentage of the female population with a bachelor's degree and the percentage of births to unmarried mothers since 1990 are highly correlated. Does this mean that a higher percentage of females with bachelor's degrees causes a higher percentage of births to unmarried mothers?

23 4.1 Scatter Diagrams and Correlation Explain the Difference between Correlation and Causation (2 of 8) Certainly not! The correlation exists only because both percentages have been increasing since 1990. It is this relation that causes the high correlation. In general, time series data (data collected over time) may have high correlations because each variable is moving in a specific direction over time (both going up or down over time; one increasing, while the other is decreasing over time). When data are observational, we cannot claim a causal relation exists between two variables. We can only claim causality when the data are collected through a designed experiment.

24 4.1 Scatter Diagrams and Correlation Explain the Difference between Correlation and Causation (3 of 8) Another way that two variables can be related even though there is not a causal relation is through a lurking variable. A lurking variable is related to both the explanatory and response variable. For example, ice cream sales and crime rates have a very high correlation. Does this mean that local governments should shut down all ice cream shops? No! The lurking variable is temperature. As air temperatures rise, both ice cream sales and crime rates rise.

25 4.1 Scatter Diagrams and Correlation Explain the Difference between Correlation and Causation (4 of 8) EXAMPLE Lurking Variables in a Bone Mineral Density Study Because colas tend to replace healthier beverages and colas contain caffeine and phosphoric acid, researchers Katherine L. Tucker and associates wanted to know whether cola consumption is associated with lower bone mineral density in women. The table lists the typical number of cans of cola consumed in a week and the femoral neck bone mineral density for a sample of 15 women. The data were collected through a prospective cohort study. [Table 4 columns: Number of Colas per Week; Bone Mineral Density (g/cm²)]

26 4.1 Scatter Diagrams and Correlation Explain the Difference between Correlation and Causation (5 of 8) EXAMPLE Lurking Variables in a Bone Mineral Density Study The figure on the next slide shows the scatter diagram of the data. The critical value for correlation with n = 15 from Table II in Appendix A is 0.514. Because the absolute value of the correlation between number of colas per week and bone mineral density is greater than 0.514, we conclude a negative linear relation exists between number of colas consumed and bone mineral density. Can the authors conclude that an increase in the number of colas consumed causes a decrease in bone mineral density? Identify some lurking variables in the study.

27 4.1 Scatter Diagrams and Correlation Explain the Difference between Correlation and Causation (6 of 8)

28 4.1 Scatter Diagrams and Correlation Explain the Difference between Correlation and Causation (7 of 8) EXAMPLE Lurking Variables in a Bone Mineral Density Study In prospective cohort studies, data are collected on a group of subjects through questionnaires and surveys over time. Therefore, the data are observational. So the researchers cannot claim that increased cola consumption causes a decrease in bone mineral density. Some lurking variables in the study that could confound the results are: body mass index, height, smoking, alcohol consumption, calcium intake, and physical activity.

29 4.1 Scatter Diagrams and Correlation Explain the Difference between Correlation and Causation (8 of 8) EXAMPLE Lurking Variables in a Bone Mineral Density Study The authors were careful to say that increased cola consumption is associated with lower bone mineral density because of potential lurking variables. They never stated that increased cola consumption causes lower bone mineral density.

30 4.2 Least-squares Regression Learning Objectives 1. Find the least-squares regression line and use the line to make predictions 2. Interpret the slope and the y-intercept of the least-squares regression line 3. Compute the sum of squared residuals

31 4.2 Least-squares Regression EXAMPLE Finding an Equation that Describes Linearly Related Data (1 of 2) Using the following sample data: [Table columns: x, y]

32 4.2 Least-squares Regression EXAMPLE Finding an Equation that Describes Linearly Related Data (2 of 2) (b) Graph the equation on the scatter diagram.

33 4.2 Least-squares Regression Find the Least-Squares Regression Line and Use the Line to Make Predictions (1 of 7) The difference between the observed value of y and the predicted value of y is the error, or residual. Using the line from the last example, and the predicted value at x = 3: residual = observed y − predicted y = 0.45

34 4.2 Least-squares Regression Find the Least-Squares Regression Line and Use the Line to Make Predictions (2 of 7) Least-Squares Regression Criterion

35 4.2 Least-squares Regression Find the Least-Squares Regression Line and Use the Line to Make Predictions (3 of 7) The Least-Squares Regression Line The equation of the least-squares regression line is given by

36 4.2 Least-squares Regression Find the Least-Squares Regression Line and Use the Line to Make Predictions (4 of 7) The Least-Squares Regression Line
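The equation slides above lost their formulas in transcription. The sketch below assumes the standard form of the least-squares regression line, ŷ = b1·x + b0, with slope b1 = r·(s_y/s_x) and intercept b0 = ȳ − b1·x̄. The function name `least_squares_line` and the data are hypothetical, not the drilling data.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (b1, b0) for y_hat = b1 * x + b0, using b1 = r * s_y / s_x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = s_xy / sqrt(s_xx * s_yy)            # linear correlation coefficient
    s_x = sqrt(s_xx / (n - 1))              # sample standard deviation of x
    s_y = sqrt(s_yy / (n - 1))              # sample standard deviation of y
    b1 = r * s_y / s_x                      # slope
    b0 = y_bar - b1 * x_bar                 # y-intercept
    return b1, b0

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b0 = least_squares_line(x, y)

# Prediction and residual (observed y minus predicted y) at x = 3.
predicted_at_3 = b1 * 3 + b0
residual_at_3 = y[2] - predicted_at_3
```

For these numbers the line is ŷ = 0.6x + 2.2, so the prediction at x = 3 is 4.0 and the residual there is 5 − 4.0 = 1.0. A positive residual means the observed value is above the line.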

37 4.2 Least-squares Regression Find the Least-Squares Regression Line and Use the Line to Make Predictions (5 of 7) EXAMPLE Finding the Least-squares Regression Line Using the drilling data: (a) Find the least-squares regression line. (b) Predict the drilling time if drilling starts at 130 feet. (c) Is the observed drilling time at 130 feet above, or below, average? (d) Draw the least-squares regression line on the scatter diagram of the data. [Table columns: Depth at Which Drilling Begins, x (in feet); Time to Drill 5 Feet, y (in minutes)]

38 4.2 Least-squares Regression Find the Least-Squares Regression Line and Use the Line to Make Predictions (6 of 7) (c) The observed drilling time is 6.93 minutes, which is less than the predicted drilling time. The drilling time of 6.93 minutes is below average.

39 4.2 Least-squares Regression Find the Least-Squares Regression Line and Use the Line to Make Predictions (7 of 7)

40 4.2 Least-squares Regression Interpret the Slope and the y-intercept of the Least-Squares Regression Line (1 of 3) Interpretation of Slope: For each additional foot of depth at which we start drilling, the time to drill five feet increases, on average, by the slope of the regression line (in minutes).

41 4.2 Least-squares Regression Interpret the Slope and the y-intercept of the Least-Squares Regression Line (2 of 3) Interpretation of the y-intercept: To interpret the y-intercept, we must first ask two questions: 1. Is 0 a reasonable value for the explanatory variable? 2. Do any observations near x = 0 exist in the data set? A value of 0 is reasonable for the drilling data (this indicates that drilling begins at the surface of Earth). The smallest observation in the data set is x = 35 feet, which is reasonably close to 0. So, interpretation of the y-intercept is reasonable: it is the predicted time, in minutes, to drill five feet when we begin drilling at the surface of Earth.

42 4.2 Least-squares Regression Interpret the Slope and the y-intercept of the Least-Squares Regression Line (3 of 3) If the least-squares regression line is used to make predictions based on values of the explanatory variable that are much larger or much smaller than the observed values, we say the researcher is working outside the scope of the model. Never use a least-squares regression line to make predictions outside the scope of the model because we cannot be sure the linear relation continues to exist.
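One way to guard against working outside the scope of the model is to refuse predictions for x-values outside the observed range. This is a deliberately strict simplification of the "much larger or much smaller" wording above. The function name `predict_within_scope`, the slope, the intercept, and the data are all hypothetical.

```python
def predict_within_scope(x_new, observed_xs, b1, b0):
    """Predict y_hat = b1 * x_new + b0 only if x_new lies in the observed x-range."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"x = {x_new} is outside the observed range [{lowest}, {highest}]; "
            "the linear relation may not continue to hold there"
        )
    return b1 * x_new + b0

# Hypothetical fitted line y_hat = 0.6x + 2.2 on x-values from 1 to 5.
observed_x = [1, 2, 3, 4, 5]
inside = predict_within_scope(3, observed_x, 0.6, 2.2)
```

A call such as `predict_within_scope(10, observed_x, 0.6, 2.2)` raises `ValueError` instead of returning a number that the data cannot support.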

43 4.2 Least-squares Regression Compute the Sum of Squared Residuals To illustrate the fact that the sum of squared residuals for a least-squares regression line is less than the sum of squared residuals for any other line, use the regression by eye applet.
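The same fact can be checked numerically without the applet. The sketch below compares the sum of squared residuals for a least-squares line with that of one competing line. The data are hypothetical; for them the least-squares line works out to ŷ = 0.6x + 2.2, and the competing line is simply the one through the first and last points.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the line y_hat = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares line for these data: slope 6/10 = 0.6, intercept 4 - 0.6 * 3 = 2.2.
ssr_least_squares = sum_squared_residuals(x, y, 0.6, 2.2)

# A competing line through the first point (1, 2) and the last point (5, 5).
ssr_other_line = sum_squared_residuals(x, y, 0.75, 1.25)
```

Here the least-squares line gives a sum of squared residuals of 2.4, while the line through the endpoints gives 3.875.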

44 4.3 Diagnostics on the Least-squares Regression Line Learning Objectives 1. Compute and interpret the coefficient of determination 2. Perform residual analysis on a regression model 3. Identify influential observations

45 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (1 of 18) The coefficient of determination, R², measures the proportion of total variation in the response variable that is explained by the least-squares regression line. The coefficient of determination is a number between 0 and 1, inclusive. That is, 0 ≤ R² ≤ 1. If R² = 0, the line has no explanatory value. If R² = 1, the line explains 100% of the variation in the response variable.

46 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (2 of 18) The data to the right are based on a study for drilling rock. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So, depth at which drilling begins is the predictor variable, x, and time (in minutes) to drill five feet is the response variable, y. [Table columns: Depth at Which Drilling Begins, x (in feet); Time to Drill 5 Feet, y (in minutes)] Source: Penner, R., and Watts, D.G. Mining Information. The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.

47 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (3 of 18)

48 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (4 of 18) [Software output: Sample Statistics (mean and standard deviation of Depth and of Time); Correlation Between Depth and Time; Regression Analysis (the regression equation for Time in terms of Depth)]

49 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (5 of 18) Suppose we were asked to predict the time to drill an additional 5 feet, but we did not know the current depth of the drill. What would be our best guess?

50 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (6 of 18) Suppose we were asked to predict the time to drill an additional 5 feet, but we did not know the current depth of the drill. What would be our best guess? ANSWER: The mean time to drill an additional 5 feet: 6.99 minutes

51 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (7 of 18) Now suppose that we are asked to predict the time to drill an additional 5 feet if the current depth of the drill is 160 feet? ANSWER:

52 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (8 of 18)

53 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (9 of 18) The difference between the observed value of the response variable and the mean value of the response variable is called the total deviation and is equal to y − ȳ.

54 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (10 of 18) The difference between the predicted value of the response variable and the mean value of the response variable is called the explained deviation and is equal to ŷ − ȳ.

55 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (11 of 18) The difference between the observed value of the response variable and the predicted value of the response variable is called the unexplained deviation and is equal to y − ŷ.

56 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (12 of 18) Total Deviation = Unexplained Deviation + Explained Deviation

57 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (13 of 18) Total Deviation = Unexplained Deviation + Explained Deviation

58 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (14 of 18) Total Variation = Unexplained Variation + Explained Variation

59 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (15 of 18) To determine R² for the linear regression model, simply square the value of the linear correlation coefficient.

60 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (16 of 18) EXAMPLE Determining the Coefficient of Determination Find and interpret the coefficient of determination for the drilling data. Because the linear correlation coefficient, r, is 0.773, we have that R² = 0.773² = 0.5975 = 59.75%. So, 59.75% of the variability in drilling time is explained by the least-squares regression line.
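The decomposition of variation and the shortcut R² = r² can be checked side by side. The sketch below uses hypothetical data, not the drilling data, and treats each "variation" as the sum of the corresponding squared deviations, which is the usual convention and an assumption here. The last line reproduces the drilling arithmetic from r = 0.773.

```python
from math import sqrt

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares line and its predictions.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b1 * xi + b0 for xi in x]

# Sums of squared deviations.
total_variation = sum((yi - y_bar) ** 2 for yi in y)                    # (y - y_bar)^2
explained_variation = sum((yh - y_bar) ** 2 for yh in y_hat)            # (y_hat - y_bar)^2
unexplained_variation = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) # (y - y_hat)^2

r_squared = explained_variation / total_variation
r = s_xy / sqrt(s_xx * s_yy)

# Drilling example from the slide: r = 0.773.
drilling_r_squared = round(0.773 ** 2, 4)
```

For the made-up data the total variation of 6 splits into 3.6 explained and 2.4 unexplained, so R² = 0.6, which matches r². The drilling value rounds to 0.5975.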

61 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (17 of 18) [Tables: Data Set A, Data Set B, and Data Set C, each with columns X and Y] Draw a scatter diagram for each of these data sets. The variance of y is the same for each data set.

62 4.3 Diagnostics on the Least-squares Regression Line Compute and Interpret the Coefficient of Determination (18 of 18) Data Set A: 99.99% of the variability in y is explained by the least-squares regression line. Data Set B: 94.7% of the variability in y is explained by the least-squares regression line. Data Set C: 9.4% of the variability in y is explained by the least-squares regression line.

63 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (1 of 14) Residuals play an important role in determining the adequacy of the linear model. In fact, residuals can be used for the following purposes: To determine whether a linear model is appropriate to describe the relation between the predictor and response variables. To determine whether the variance of the residuals is constant. To check for outliers.

64 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (2 of 14) If a plot of the residuals against the predictor variable shows a discernable pattern, such as a curve, then the response and predictor variable may not be linearly related.

65 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (3 of 14)

66 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (4 of 14) EXAMPLE Is a Linear Model Appropriate? A chemist has a 1000-gram sample of a radioactive material. She records the amount of radioactive material remaining in the sample every day for a week and obtains the following data. [Table columns: Day; Weight (in grams)]

67 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (5 of 14) Linear correlation coefficient: −0.994

68 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (6 of 14)

69 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (7 of 14) Linear model not appropriate
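The chemist's measurements were not recovered in this transcription, so the sketch below substitutes hypothetical decay data (a 1000-gram sample losing 10% per day) to show how the correlation can be close to −1 while the residuals still follow a clear curved pattern. All variable names are illustrative.

```python
from math import sqrt

# Hypothetical decay data (10% lost per day), standing in for the slide's table.
days = list(range(8))
weights = [1000 * 0.9 ** d for d in days]

n = len(days)
d_bar = sum(days) / n
w_bar = sum(weights) / n
s_dd = sum((d - d_bar) ** 2 for d in days)
s_ww = sum((w - w_bar) ** 2 for w in weights)
s_dw = sum((d - d_bar) * (w - w_bar) for d, w in zip(days, weights))

# Correlation and least-squares line.
r = s_dw / sqrt(s_dd * s_ww)
b1 = s_dw / s_dd
b0 = w_bar - b1 * d_bar

# Residual = observed weight - predicted weight, in day order.
residuals = [w - (b1 * d + b0) for d, w in zip(days, weights)]

# Sign of each residual, e.g. "+" when the point lies above the line.
signs = "".join("+" if e > 0 else "-" for e in residuals)
```

Even though r is roughly −0.99 for these numbers, the residuals are positive at both ends and negative in the middle (`++----++`). That U-shaped pattern in a residual plot is the signal that a linear model is not appropriate.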

70 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (8 of 14) If a plot of the residuals against the explanatory variable shows the spread of the residuals increasing or decreasing as the explanatory variable increases, then a strict requirement of the linear model is violated. This requirement is called constant error variance. The statistical term for constant error variance is homoscedasticity.

71 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (9 of 14)

72 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (10 of 14) A plot of residuals against the explanatory variable may also reveal outliers. These values will be easy to identify because the residual will lie far from the rest of the plot.

73 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (11 of 14)

74 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (12 of 14) EXAMPLE Residual Analysis Draw a residual plot of the drilling time data. Comment on the appropriateness of the linear least-squares regression model.

75 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (13 of 14)

76 4.3 Diagnostics on the Least-squares Regression Line Perform Residual Analysis on a Regression Model (14 of 14) Boxplot of Residuals for the Drilling Data

77 4.3 Diagnostics on the Least-squares Regression Line Identify Influential Observations (1 of 8) An influential observation is an observation that significantly affects the least-squares regression line's slope and/or y-intercept, or the value of the correlation coefficient.

78 4.3 Diagnostics on the Least-squares Regression Line Identify Influential Observations (2 of 8) Influential observations typically exist when the point is an outlier relative to the values of the explanatory variable. So, Case 3 is likely influential.

79 4.3 Diagnostics on the Least-squares Regression Line Identify Influential Observations (3 of 8) Influence is affected by two factors: (1) the relative vertical position of the observation (residuals) and (2) the relative horizontal position of the observation (leverage).
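A simple way to see both factors at work is to refit the line with and without a candidate point. The sketch below uses hypothetical data and a hypothetical helper, `slope_and_intercept`. It adds one point that has high leverage and a large residual, and another that has the same leverage but lies exactly on the original line.

```python
def slope_and_intercept(xs, ys):
    """Least-squares slope and y-intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    return b1, y_bar - b1 * x_bar

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope_without, intercept_without = slope_and_intercept(x, y)

# Added point far to the right (high leverage) and far below the trend (large residual).
slope_with, intercept_with = slope_and_intercept(x + [15], y + [2])

# Added point equally far to the right but exactly on the original line (zero residual).
on_line_y = slope_without * 15 + intercept_without
slope_on_line, intercept_on_line = slope_and_intercept(x + [15], y + [on_line_y])
```

With these numbers the first added point flips the slope from 0.6 to about −0.11, so it is influential. The second leaves the line unchanged, which shows that leverage alone does not make a point influential.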

80 4.3 Diagnostics on the Least-squares Regression Line Identify Influential Observations (4 of 8) EXAMPLE Influential Observations Suppose an additional data point is added to the drilling data. At a depth of 300 feet, it took minutes to drill 5 feet. Is this point influential?

81 4.3 Diagnostics on the Least-squares Regression Line Identify Influential Observations (5 of 8)

82 4.3 Diagnostics on the Least-squares Regression Line Identify Influential Observations (6 of 8)

83 4.3 Diagnostics on the Least-squares Regression Line Identify Influential Observations (7 of 8)

84 4.3 Diagnostics on the Least-squares Regression Line Identify Influential Observations (8 of 8) As with outliers, influential observations should be removed only if there is justification to do so. When an influential observation occurs in a data set and its removal is not warranted, there are two courses of action: (1) Collect more data so that additional points near the influential observation are obtained, or (2) Use techniques that reduce the influence of the influential observation (such as a transformation or different method of estimation - e.g. minimize absolute deviations). These techniques are beyond the scope of this text.

85 4.4 Contingency Tables and Association Learning Objectives 1. Compute the marginal distribution of a variable 2. Use the conditional distribution to identify association among categorical data 3. Explain Simpson's Paradox

86 4.4 Contingency Tables and Association Example: Data Information A professor at a community college in New Mexico conducted a study to assess the effectiveness of delivering an introductory statistics course via traditional lecture-based method, online delivery (no classroom instruction), and hybrid instruction (online course with weekly meetings) methods. The grades students received in each of the courses were tallied. [Table: delivery method (Traditional, Online, Hybrid) by course grade (A, B, C, D, F)]

87 4.4 Contingency Tables and Association Compute the Marginal Distribution of a Variable (1 of 3) A marginal distribution of a variable is a frequency or relative frequency distribution of either the row or column variable in the contingency table.

88 4.4 Contingency Tables and Association Compute the Marginal Distribution of a Variable (2 of 3) EXAMPLE Determining Frequency Marginal Distributions A professor at a community college in New Mexico conducted a study to assess the effectiveness of delivering an introductory statistics course via traditional lecture-based method, online delivery (no classroom instruction), and hybrid instruction (online course with weekly meetings) methods. The grades students received in each of the courses were tallied. Find the frequency marginal distributions for course grade and delivery method. [Table: delivery method (Traditional, Online, Hybrid, Total) by course grade (A, B, C, D, F, Total)]

89 4.4 Contingency Tables and Association Compute the Marginal Distribution of a Variable (3 of 3) EXAMPLE Determining Relative Frequency Marginal Distributions Determine the relative frequency marginal distribution for course grade and delivery method. [Table: delivery method (Traditional, Online, Hybrid, Total) by course grade (A, B, C, D, F), with relative frequency margins]
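The counts from the professor's study were not recovered in this transcription, so the sketch below uses invented counts purely to show the mechanics. Frequency marginal distributions are the row and column totals, and relative frequency marginal distributions divide those totals by the grand total. All names and numbers are hypothetical.

```python
# Hypothetical counts (NOT the study's data): rows are delivery methods, columns are grades.
table = {
    "Traditional": {"A": 10, "B": 15, "C": 12, "D": 8, "F": 5},
    "Online":      {"A": 8,  "B": 10, "C": 9,  "D": 7, "F": 6},
    "Hybrid":      {"A": 12, "B": 16, "C": 14, "D": 5, "F": 3},
}
grades = ["A", "B", "C", "D", "F"]

# Frequency marginal distribution of delivery method (row totals).
method_totals = {method: sum(row.values()) for method, row in table.items()}

# Frequency marginal distribution of course grade (column totals).
grade_totals = {g: sum(row[g] for row in table.values()) for g in grades}

grand_total = sum(method_totals.values())

# Relative frequency marginal distributions.
method_relative = {m: t / grand_total for m, t in method_totals.items()}
grade_relative = {g: t / grand_total for g, t in grade_totals.items()}
```

Each relative frequency marginal distribution sums to 1, which is a quick check that no cell was missed.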

90 4.4 Contingency Tables and Association Use the Conditional Distribution to Identify Association among Categorical Data (1 of 4) A conditional distribution lists the relative frequency of each category of the response variable, given a specific value of the explanatory variable in the contingency table.

91 4.4 Contingency Tables and Association Use the Conditional Distribution to Identify Association among Categorical Data (2 of 4) EXAMPLE Determining a Conditional Distribution Construct a conditional distribution of course grade by method of delivery. Comment on any type of association that may exist between course grade and delivery method. It appears that students in the hybrid course are more likely to pass (A, B, or C) than the other two methods. [Tables: frequency and conditional distributions of course grade (A, B, C, D, F) by delivery method (Traditional, Online, Hybrid)]
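A conditional distribution of course grade by delivery method divides each row of the contingency table by that row's total. The sketch below again uses invented counts, chosen only so that the hybrid row shows the highest pass rate, as the slide describes. They are not the study's data, and the names are hypothetical.

```python
# Hypothetical counts (NOT the study's data): rows are delivery methods, columns are grades.
table = {
    "Traditional": {"A": 10, "B": 15, "C": 12, "D": 8, "F": 5},
    "Online":      {"A": 8,  "B": 10, "C": 9,  "D": 7, "F": 6},
    "Hybrid":      {"A": 12, "B": 16, "C": 14, "D": 5, "F": 3},
}

# Conditional distribution of grade given delivery method: each count / its row total.
conditional = {}
for method, row in table.items():
    row_total = sum(row.values())
    conditional[method] = {grade: count / row_total for grade, count in row.items()}

# Proportion passing (A, B, or C) within each delivery method.
pass_rate = {
    method: dist["A"] + dist["B"] + dist["C"]
    for method, dist in conditional.items()
}
best_method = max(pass_rate, key=pass_rate.get)
```

Comparing the rows of `conditional`, rather than the raw counts, removes the effect of unequal class sizes. If the conditional distributions differ noticeably between methods, that suggests an association between grade and delivery method.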

92 4.4 Contingency Tables and Association Use the Conditional Distribution to Identify Association among Categorical Data (3 of 4) EXAMPLE Drawing a Bar Graph of a Conditional Distribution Using the results of the previous example, draw a bar graph that represents the conditional distribution of method of delivery by grade earned.

93 4.4 Contingency Tables and Association Use the Conditional Distribution to Identify Association among Categorical Data (4 of 4) The following contingency table shows the survival status and demographics of passengers on the ill-fated Titanic. [Table: survival status (Survived, Died) by demographic (Men, Women, Boys, Girls)] Draw a conditional bar graph of survival status by demographic characteristic.

94 4.4 Contingency Tables and Association Explain Simpson's Paradox (1 of 6) EXAMPLE Illustrating Simpson's Paradox Insulin-dependent (or Type 1) diabetes is a disease that results in the permanent destruction of insulin-producing beta cells of the pancreas. Type 1 diabetes is lethal unless treatment with insulin injections replaces the missing hormone. Individuals with insulin-independent (or Type 2) diabetes can produce insulin internally. The data shown in the table below represent the survival status of 902 patients with diabetes by type over a 5-year period. [Table: survival status (Survived, Died) by diabetes type (Type 1, Type 2, Total)]

95 4.4 Contingency Tables and Association Explain Simpson's Paradox (2 of 6) EXAMPLE Illustrating Simpson's Paradox [Table: survival status (Survived, Died) by diabetes type (Type 1, Type 2, Total)]

96 4.4 Contingency Tables and Association Explain Simpson's Paradox (3 of 6) However, Type 2 diabetes is usually contracted after the age of 40. If we account for the variable age and divide our patients into two groups (those 40 or younger and those over 40), we obtain the data in the table below. [Table: survival status (Survived, Died) by diabetes type (Type 1, Type 2, Total) and age group (≤ 40, > 40)]

97 4.4 Contingency Tables and Association Explain Simpson's Paradox (4 of 6) [Table: survival status (Survived, Died) by diabetes type (Type 1, Type 2, Total) and age group (≤ 40, > 40)]

98 4.4 Contingency Tables and Association Explain Simpson's Paradox (5 of 6) [Table: survival status (Survived, Died) by diabetes type (Type 1, Type 2, Total) and age group (≤ 40, > 40)]

99 4.4 Contingency Tables and Association Explain Simpson's Paradox (6 of 6) Simpson's Paradox describes a situation in which an association between two variables inverts or goes away when a third variable is introduced to the analysis.
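The reversal can be reproduced with a small made-up table. The counts below are hypothetical and are NOT the 902-patient diabetes data, whose values were not recovered in this transcription. They are chosen only so that most Type 1 patients are 40 or younger and most Type 2 patients are over 40. The name `death_rate` is also hypothetical.

```python
# Hypothetical counts: (number who died, number of patients) by type and age group.
counts = {
    "Type 1": {"40 or younger": (5, 100), "over 40": (10, 20)},
    "Type 2": {"40 or younger": (1, 50),  "over 40": (80, 200)},
}

def death_rate(died, patients):
    """Proportion of patients who died."""
    return died / patients

# Death rates ignoring age (aggregate the two age groups).
overall = {}
for diabetes_type, groups in counts.items():
    died = sum(d for d, _ in groups.values())
    patients = sum(p for _, p in groups.values())
    overall[diabetes_type] = death_rate(died, patients)

# Death rates within each age group.
by_age = {
    diabetes_type: {age: death_rate(d, p) for age, (d, p) in groups.items()}
    for diabetes_type, groups in counts.items()
}
```

Ignoring age, Type 1 looks safer (12.5% died versus 32.4% for Type 2). Within each age group the direction inverts: 5% versus 2% among those 40 or younger, and 50% versus 40% among those over 40. Age is the third variable that reverses the association.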


Chapter 3 CORRELATION AND REGRESSION

Chapter 3 CORRELATION AND REGRESSION CORRELATION AND REGRESSION TOPIC SLIDE Linear Regression Defined 2 Regression Equation 3 The Slope or b 4 The Y-Intercept or a 5 What Value of the Y-Variable Should be Predicted When r = 0? 7 The Regression

More information

STAT 201 Chapter 3. Association and Regression

STAT 201 Chapter 3. Association and Regression STAT 201 Chapter 3 Association and Regression 1 Association of Variables Two Categorical Variables Response Variable (dependent variable): the outcome variable whose variation is being studied Explanatory

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Name Date Per Key Vocabulary: response variable explanatory variable independent variable dependent variable scatterplot positive association negative association linear correlation r-value regression

More information

Simple Linear Regression the model, estimation and testing

Simple Linear Regression the model, estimation and testing Simple Linear Regression the model, estimation and testing Lecture No. 05 Example 1 A production manager has compared the dexterity test scores of five assembly-line employees with their hourly productivity.

More information

Section 3.2 Least-Squares Regression

Section 3.2 Least-Squares Regression Section 3.2 Least-Squares Regression Linear relationships between two quantitative variables are pretty common and easy to understand. Correlation measures the direction and strength of these relationships.

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

STATISTICS & PROBABILITY

STATISTICS & PROBABILITY STATISTICS & PROBABILITY LAWRENCE HIGH SCHOOL STATISTICS & PROBABILITY CURRICULUM MAP 2015-2016 Quarter 1 Unit 1 Collecting Data and Drawing Conclusions Unit 2 Summarizing Data Quarter 2 Unit 3 Randomness

More information

Chapter 3: Describing Relationships

Chapter 3: Describing Relationships Chapter 3: Describing Relationships Objectives: Students will: Construct and interpret a scatterplot for a set of bivariate data. Compute and interpret the correlation, r, between two variables. Demonstrate

More information

Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression! Equation of Regression Line; Residuals! Effect of Explanatory/Response Roles! Unusual Observations! Sample

More information

NORTH SOUTH UNIVERSITY TUTORIAL 2

NORTH SOUTH UNIVERSITY TUTORIAL 2 NORTH SOUTH UNIVERSITY TUTORIAL 2 AHMED HOSSAIN,PhD Data Management and Analysis AHMED HOSSAIN,PhD - Data Management and Analysis 1 Correlation Analysis INTRODUCTION In correlation analysis, we estimate

More information

Examining Relationships Least-squares regression. Sections 2.3

Examining Relationships Least-squares regression. Sections 2.3 Examining Relationships Least-squares regression Sections 2.3 The regression line A regression line describes a one-way linear relationship between variables. An explanatory variable, x, explains variability

More information

Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression Equation of Regression Line; Residuals Effect of Explanatory/Response Roles Unusual Observations Sample

More information

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0% Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3 Capstone #2: Review of Chapter 4 Capstone #3: Review of

More information

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60 M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points 1-10 10 11 3 12 4 13 3 14 10 15 14 16 10 17 7 18 4 19 4 Total 60 Multiple choice questions (1 point each) For questions

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Homework #3. SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

Homework #3. SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. Homework #3 Name Due Due on on February Tuesday, Due on February 17th, Sept Friday 28th 17th, Friday SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. Fill

More information

Further Mathematics 2018 CORE: Data analysis Chapter 3 Investigating associations between two variables

Further Mathematics 2018 CORE: Data analysis Chapter 3 Investigating associations between two variables Chapter 3: Investigating associations between two variables Further Mathematics 2018 CORE: Data analysis Chapter 3 Investigating associations between two variables Extract from Study Design Key knowledge

More information

3.2 Least- Squares Regression

3.2 Least- Squares Regression 3.2 Least- Squares Regression Linear (straight- line) relationships between two quantitative variables are pretty common and easy to understand. Correlation measures the direction and strength of these

More information

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

AP Statistics Practice Test Ch. 3 and Previous

AP Statistics Practice Test Ch. 3 and Previous AP Statistics Practice Test Ch. 3 and Previous Name Date Use the following to answer questions 1 and 2: A researcher measures the height (in feet) and volume of usable lumber (in cubic feet) of 32 cherry

More information

INTERPRET SCATTERPLOTS

INTERPRET SCATTERPLOTS Chapter2 MODELING A BUSINESS 2.1: Interpret Scatterplots 2.2: Linear Regression 2.3: Supply and Demand 2.4: Fixed and Variable Expenses 2.5: Graphs of Expense and Revenue Functions 2.6: Breakeven Analysis

More information

CHILD HEALTH AND DEVELOPMENT STUDY

CHILD HEALTH AND DEVELOPMENT STUDY CHILD HEALTH AND DEVELOPMENT STUDY 9. Diagnostics In this section various diagnostic tools will be used to evaluate the adequacy of the regression model with the five independent variables developed in

More information

CHAPTER 3 Describing Relationships

CHAPTER 3 Describing Relationships CHAPTER 3 Describing Relationships 3.1 Scatterplots and Correlation The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Reading Quiz 3.1 True/False 1.

More information

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points. Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points. 1. The bell-shaped frequency curve is so common that if a population has this shape, the measurements are

More information

bivariate analysis: The statistical analysis of the relationship between two variables.

bivariate analysis: The statistical analysis of the relationship between two variables. bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for

More information

1.4 - Linear Regression and MS Excel

1.4 - Linear Regression and MS Excel 1.4 - Linear Regression and MS Excel Regression is an analytic technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear

More information

Chapter 4: More about Relationships between Two-Variables Review Sheet

Chapter 4: More about Relationships between Two-Variables Review Sheet Review Sheet 4. Which of the following is true? A) log(ab) = log A log B. D) log(a/b) = log A log B. B) log(a + B) = log A + log B. C) log A B = log A log B. 5. Suppose we measure a response variable Y

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016 STAB22H3 Statistics I, LEC 01 and LEC 02 Duration: 1 hour and 45 minutes Last Name: First Name:

More information

Understandable Statistics

Understandable Statistics Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement

More information

Unit 8 Day 1 Correlation Coefficients.notebook January 02, 2018

Unit 8 Day 1 Correlation Coefficients.notebook January 02, 2018 [a] Welcome Back! Please pick up a new packet Get a Chrome Book Complete the warm up Choose points on each graph and find the slope of the line. [b] Agenda 05 MIN Warm Up 25 MIN Notes Correlation 15 MIN

More information

M 140 Test 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

M 140 Test 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75 M 140 est 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDI! Problem Max. Points Your Points 1-10 10 11 10 12 3 13 4 14 18 15 8 16 7 17 14 otal 75 Multiple choice questions (1 point each) For questions

More information

STATS Relationships between variables: Correlation

STATS Relationships between variables: Correlation STATS 1060 Relationships between variables: Correlation READINGS: Chapter 7 of your text book (DeVeaux, Vellman and Bock); on-line notes for correlation; on-line practice problems for correlation NOTICE:

More information

Statistics for Psychology

Statistics for Psychology Statistics for Psychology SIXTH EDITION CHAPTER 12 Prediction Prediction a major practical application of statistical methods: making predictions make informed (and precise) guesses about such things as

More information

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months? Medical Statistics 1 Basic Concepts Farhad Pishgar Defining the data Population and samples Except when a full census is taken, we collect data on a sample from a much larger group called the population.

More information

12.1 Inference for Linear Regression. Introduction

12.1 Inference for Linear Regression. Introduction 12.1 Inference for Linear Regression vocab examples Introduction Many people believe that students learn better if they sit closer to the front of the classroom. Does sitting closer cause higher achievement,

More information

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS Circle the best answer. This scenario applies to Questions 1 and 2: A study was done to compare the lung capacity of coal miners to the lung

More information

Chapter 3 Review. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Chapter 3 Review. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question. Name: Class: Date: Chapter 3 Review Multiple Choice Identify the choice that best completes the statement or answers the question. Scenario 3-1 The height (in feet) and volume (in cubic feet) of usable

More information

Correlation and regression

Correlation and regression PG Dip in High Intensity Psychological Interventions Correlation and regression Martin Bland Professor of Health Statistics University of York http://martinbland.co.uk/ Correlation Example: Muscle strength

More information

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data TECHNICAL REPORT Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data CONTENTS Executive Summary...1 Introduction...2 Overview of Data Analysis Concepts...2

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

More information

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions

More information

Chapter 14: More Powerful Statistical Methods

Chapter 14: More Powerful Statistical Methods Chapter 14: More Powerful Statistical Methods Most questions will be on correlation and regression analysis, but I would like you to know just basically what cluster analysis, factor analysis, and conjoint

More information

Section 3 Correlation and Regression - Teachers Notes

Section 3 Correlation and Regression - Teachers Notes The data are from the paper: Exploring Relationships in Body Dimensions Grete Heinz and Louis J. Peterson San José State University Roger W. Johnson and Carter J. Kerk South Dakota School of Mines and

More information

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA Data Analysis: Describing Data CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA In the analysis process, the researcher tries to evaluate the data collected both from written documents and from other sources such

More information

Regression Equation. November 29, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

Regression Equation. November 29, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions MAT 155 Statistical Analysis Dr. Claude Moore Cape Fear Community College Chapter 10 Correlation and Regression 10 1 Review and Preview 10 2 Correlation 10 3 Regression 10 4 Variation and Prediction Intervals

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys Multiple Regression Analysis 1 CRITERIA FOR USE Multiple regression analysis is used to test the effects of n independent (predictor) variables on a single dependent (criterion) variable. Regression tests

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Welcome to OSA Training Statistics Part II

Welcome to OSA Training Statistics Part II Welcome to OSA Training Statistics Part II Course Summary Using data about a population to draw graphs Frequency distribution and variability within populations Bell Curves: What are they and where do

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement

More information

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you? WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Assoc. Prof Dr Sarimah Abdullah Unit of Biostatistics & Research Methodology School of Medical Sciences, Health Campus Universiti Sains Malaysia Regression Regression analysis

More information

CHAPTER ONE CORRELATION

CHAPTER ONE CORRELATION CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

IAPT: Regression. Regression analyses

IAPT: Regression. Regression analyses Regression analyses IAPT: Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a student project

More information

3.2A Least-Squares Regression

3.2A Least-Squares Regression 3.2A Least-Squares Regression Linear (straight-line) relationships between two quantitative variables are pretty common and easy to understand. Our instinct when looking at a scatterplot of data is to

More information

14.1: Inference about the Model

14.1: Inference about the Model 14.1: Inference about the Model! When a scatterplot shows a linear relationship between an explanatory x and a response y, we can use the LSRL fitted to the data to predict a y for a given x. However,

More information

Biostatistics 2 - Correlation and Risk

Biostatistics 2 - Correlation and Risk BROUGHT TO YOU BY Biostatistics 2 - Correlation and Risk Developed by Pfizer January 2018 This learning module is intended for UK healthcare professionals only. PP-GEP-GBR-0957 Date of preparation Jan

More information

2 Assumptions of simple linear regression

2 Assumptions of simple linear regression Simple Linear Regression: Reliability of predictions Richard Buxton. 2008. 1 Introduction We often use regression models to make predictions. In Figure?? (a), we ve fitted a model relating a household

More information

3.4 What are some cautions in analyzing association?

3.4 What are some cautions in analyzing association? 3.4 What are some cautions in analyzing association? Objectives Extrapolation Outliers and Influential Observations Correlation does not imply causation Lurking variables and confounding Simpson s Paradox

More information

Chapter 1 Where Do Data Come From?

Chapter 1 Where Do Data Come From? Chapter 1 Where Do Data Come From? Understanding Data: The purpose of this class; to be able to read the newspaper and know what the heck they re talking about! To be able to go to the casino and know

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

Relationships. Between Measurements Variables. Chapter 10. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Relationships. Between Measurements Variables. Chapter 10. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc. Relationships Chapter 10 Between Measurements Variables Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc. Thought topics Price of diamonds against weight Male vs female age for dating Animals

More information

MULTIPLE REGRESSION OF CPS DATA

MULTIPLE REGRESSION OF CPS DATA MULTIPLE REGRESSION OF CPS DATA A further inspection of the relationship between hourly wages and education level can show whether other factors, such as gender and work experience, influence wages. Linear

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

Chapter 4: More about Relationships between Two-Variables

Chapter 4: More about Relationships between Two-Variables 1. Which of the following scatterplots corresponds to a monotonic decreasing function f(t)? A) B) C) D) G Chapter 4: More about Relationships between Two-Variables E) 2. Which of the following transformations

More information

Regression CHAPTER SIXTEEN NOTE TO INSTRUCTORS OUTLINE OF RESOURCES

Regression CHAPTER SIXTEEN NOTE TO INSTRUCTORS OUTLINE OF RESOURCES CHAPTER SIXTEEN Regression NOTE TO INSTRUCTORS This chapter includes a number of complex concepts that may seem intimidating to students. Encourage students to focus on the big picture through some of

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

10. LINEAR REGRESSION AND CORRELATION

10. LINEAR REGRESSION AND CORRELATION 1 10. LINEAR REGRESSION AND CORRELATION The contingency table describes an association between two nominal (categorical) variables (e.g., use of supplemental oxygen and mountaineer survival ). We have

More information

Conditional Distributions and the Bivariate Normal Distribution. James H. Steiger

Conditional Distributions and the Bivariate Normal Distribution. James H. Steiger Conditional Distributions and the Bivariate Normal Distribution James H. Steiger Overview In this module, we have several goals: Introduce several technical terms Bivariate frequency distribution Marginal

More information

Lecture 12 Cautions in Analyzing Associations

Lecture 12 Cautions in Analyzing Associations Lecture 12 Cautions in Analyzing Associations MA 217 - Stephen Sawin Fairfield University August 8, 2017 Cautions in Linear Regression Three things to be careful when doing linear regression we have already

More information

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process Research Methods in Forest Sciences: Learning Diary Yoko Lu 285122 9 December 2016 1. Research process It is important to pursue and apply knowledge and understand the world under both natural and social

More information

TEACHING REGRESSION WITH SIMULATION. John H. Walker. Statistics Department California Polytechnic State University San Luis Obispo, CA 93407, U.S.A.

TEACHING REGRESSION WITH SIMULATION. John H. Walker. Statistics Department California Polytechnic State University San Luis Obispo, CA 93407, U.S.A. Proceedings of the 004 Winter Simulation Conference R G Ingalls, M D Rossetti, J S Smith, and B A Peters, eds TEACHING REGRESSION WITH SIMULATION John H Walker Statistics Department California Polytechnic

More information

Results. Example 1: Table 2.1 The Effect of Additives on Daphnia Heart Rate. Time (min)

Results. Example 1: Table 2.1 The Effect of Additives on Daphnia Heart Rate. Time (min) Notes for Alphas Line graphs provide a way to map independent and dependent variables that are both quantitative. When both variables are quantitative, the segment that connects every two points on the

More information

Caffeine & Calories in Soda. Statistics. Anthony W Dick

Caffeine & Calories in Soda. Statistics. Anthony W Dick 1 Caffeine & Calories in Soda Statistics Anthony W Dick 2 Caffeine & Calories in Soda Description of Experiment Does the caffeine content in soda have anything to do with the calories? This is the question

More information

Undertaking statistical analysis of

Undertaking statistical analysis of Descriptive statistics: Simply telling a story Laura Delaney introduces the principles of descriptive statistical analysis and presents an overview of the various ways in which data can be presented by

More information

Introduction. Lecture 1. What is Statistics?

Introduction. Lecture 1. What is Statistics? Lecture 1 Introduction What is Statistics? Statistics is the science of collecting, organizing and interpreting data. The goal of statistics is to gain information and understanding from data. A statistic

More information

Multiple Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Multiple Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Multiple Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Multiple Regression 1 / 19 Multiple Regression 1 The Multiple

More information

Unit 3 Lesson 2 Investigation 4

Unit 3 Lesson 2 Investigation 4 Name: Investigation 4 ssociation and Causation Reports in the media often suggest that research has found a cause-and-effect relationship between two variables. For example, a newspaper article listed

More information

Class 7 Everything is Related

Class 7 Everything is Related Class 7 Everything is Related Correlational Designs l 1 Topics Types of Correlational Designs Understanding Correlation Reporting Correlational Statistics Quantitative Designs l 2 Types of Correlational

More information

Lecture (chapter 12): Bivariate association for nominal- and ordinal-level variables

Lecture (chapter 12): Bivariate association for nominal- and ordinal-level variables Lecture (chapter 12): Bivariate association for nominal- and ordinal-level variables Ernesto F. L. Amaral April 2 4, 2018 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015.

More information

5 To Invest or not to Invest? That is the Question.

5 To Invest or not to Invest? That is the Question. 5 To Invest or not to Invest? That is the Question. Before starting this lab, you should be familiar with these terms: response y (or dependent) and explanatory x (or independent) variables; slope and

More information

Chapter 7: Descriptive Statistics

Chapter 7: Descriptive Statistics Chapter Overview Chapter 7 provides an introduction to basic strategies for describing groups statistically. Statistical concepts around normal distributions are discussed. The statistical procedures of

More information

(a) 50% of the shows have a rating greater than: impossible to tell

(a) 50% of the shows have a rating greater than: impossible to tell q 1. Here is a histogram of the Distribution of grades on a quiz. How many students took the quiz? What percentage of students scored below a 60 on the quiz? (Assume left-hand endpoints are included in

More information

Introduction to regression

Introduction to regression Introduction to regression Regression describes how one variable (response) depends on another variable (explanatory variable). Response variable: variable of interest, measures the outcome of a study

More information

FORM C Dr. Sanocki, PSY 3204 EXAM 1 NAME

FORM C Dr. Sanocki, PSY 3204 EXAM 1 NAME PSYCH STATS OLD EXAMS, provided for self-learning. LEARN HOW TO ANSWER the QUESTIONS; memorization of answers won t help. All answers are in the textbook or lecture. Instructors can provide some clarification

More information










