Statistical Significance, Effect Size, and Practical Significance Eva Lawrence Guilford College October, 2017


Definitions

Descriptive statistics: Statistical analyses used to describe characteristics of a sample.

Inferential statistics: Statistical analyses used to draw conclusions about a population based on a sample. Inferential statistics provide the information needed to determine whether results are statistically significant.

Effect size: The strength or magnitude of an effect or relationship.

Practical significance: The usefulness or everyday impact of results.

Inferential Statistics

Descriptive statistics describe only the sample, whereas inferential statistics are used to draw conclusions about a population based on a sample. There are many different types of inferential statistics that you will learn about in later chapters, including t tests, chi-square tests, correlational analyses, and ANOVAs. When you calculate any inferential statistic in SPSS, you will obtain a p value. The p value indicates the probability that your results are due to chance, or error, alone. Researchers usually choose p < .05 as a reasonable amount of error, meaning that there is less than a 5% chance that the results are due to error alone.

[Figure: a number line of p values from .0001 to .11, divided at the .05 cutoff]

p < .05: A p value less than .05 indicates statistically significant results. Reject the null hypothesis. There is a chance of a Type I error, and the exact probability of that error is indicated by the p value. For example, p = .02 indicates a 2% chance of a Type I error.

p >= .05: A p value of .05 or greater indicates that results are not statistically significant. Retain the null hypothesis. There is a chance of a Type II error.

Note: Researchers sometimes use a more stringent criterion for statistical significance, such as p < .01, to further reduce the chance of a Type I error. On rare occasions, researchers might use a less stringent criterion, such as p < .10, allowing a higher probability of a Type I error.
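The logic above can be illustrated outside SPSS. The following sketch uses scipy's independent-samples t test on made-up data (the group means, SDs, and sizes are assumptions for illustration, not from the handout):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two hypothetical groups of 30 exam-style scores each (made-up data)
group_a = rng.normal(loc=50, scale=10, size=30)
group_b = rng.normal(loc=57, scale=10, size=30)

# Independent-samples t test; scipy returns the t statistic and the p value
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Apply the usual p < .05 criterion
if p_value < 0.05:
    print(f"p = {p_value:.3f}: statistically significant, reject the null")
else:
    print(f"p = {p_value:.3f}: not statistically significant, retain the null")
```

The decision rule is the same one described above: compare the obtained p value to the chosen criterion and either reject or retain the null hypothesis.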

Reporting and interpreting p values

Round your p values to two decimal places, except in cases where the third decimal place provides important information about your results, such as when rounding would change the interpretation of the results. For example, report p = .049, because this indicates statistically significant results, whereas rounding to p = .05 would indicate that the results are not statistically significant. Report the exact p value, except when SPSS reports a p value of .000. This does NOT mean there is no chance of error; the error rate is simply beyond three decimal places. In this situation, report p < .001.

Examples:

If SPSS reports p = .034, you report p = .03. Interpretation: Results are statistically significant. Reject the null hypothesis. There is a 3% chance of a Type I error.

If SPSS reports p = .341, you report p = .34. Interpretation: Results are not statistically significant. Retain the null hypothesis. There is a chance of a Type II error.

If SPSS reports p = .065, you report p = .07. Interpretation: Results are not statistically significant. Retain the null hypothesis. There is a chance of a Type II error.

If SPSS reports p = .000, you report p < .001. Interpretation: Results are statistically significant. Reject the null hypothesis. There is less than a 0.1% chance of a Type I error.

If SPSS reports p = .046, you report p = .046. Interpretation: Results are statistically significant. Reject the null hypothesis. There is a 4.6% chance of a Type I error.

*What if p = .05 or just slightly higher? In such cases, you might say that the results approach significance and report the exact p value. Some researchers will use this language of approaching significance with p values up to p = .10, although others disagree with this practice.
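The reporting rules above can be captured in a small helper function. This is a hypothetical sketch (report_p is not an SPSS feature; it simply encodes the rounding conventions described in this section):

```python
def report_p(p):
    """Format a p value per the reporting rules above (assumed helper)."""
    # SPSS's ".000" means the value is below three decimal places
    if p < 0.001:
        return "p < .001"
    # Keep the third decimal only when rounding to two would flip the
    # interpretation (e.g., .046 or .049 rounding up to a nonsignificant .05)
    if round(p, 2) == 0.05 and p < 0.05:
        return f"p = {p:.3f}".replace("0.", ".")
    # Otherwise round to two decimal places
    return f"p = {p:.2f}".replace("0.", ".")
```

For example, report_p(0.034) gives "p = .03", while report_p(0.046) keeps the third decimal and gives "p = .046".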

The theory behind statistical significance

Statistical significance testing is based on probability theory: you test your results against a population in which the null hypothesis is true. This population distribution is normally distributed. For example, a t test is an inferential statistic used to compare the means of two groups. The null hypothesis is that there is no difference between the groups; for a t test, the null hypothesis is that t = 0.

[Figure: a normal distribution centered at t = 0, with t getting further from 0 in either direction]

The alternative hypothesis is that your sample mean is different from your population mean. Therefore, you want t to be far from 0, typically at least 2 SDs away, so that your sample falls in the extremes of the population representing the null hypothesis. Why 2 SDs away? Because about 95% of scores fall within 2 SDs of the mean. If your t is more than 2 SDs above or below 0, there is less than a 5% probability (p) that it comes from this null-hypothesis population distribution.

In statistical significance testing, the areas at the extremes of the population representing the null hypothesis are called the regions of rejection. When you set your criterion for statistical significance at p < .05, the regions of rejection represent scores more than 2 SDs above and below the mean. If your score falls in a region of rejection, you can reject your null hypothesis. This is a two-tailed test because the regions of rejection are at both tails: +2 and -2 SDs from the mean.

[Figure: a normal distribution centered at t = 0, with regions of rejection beyond -t crit and +t crit]

Even with a directional alternative hypothesis, where you might use a one-tailed test (with only one region of rejection, at either the positive or the negative extreme), most researchers still run a two-tailed test because it is more stringent. The critical value of t (t crit) marks the cutoff scores for the regions of rejection. In order for an observed t to be considered statistically significant, it must be equal to or more extreme than t crit. The t crit depends on sample size: the larger your sample, the smaller the t crit required for the results to be considered statistically significant.
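The dependence of t crit on sample size can be seen directly. This sketch uses scipy's t distribution (an assumption, since the handout works in SPSS) to print the two-tailed cutoff for a one-sample t test at several sample sizes:

```python
from scipy import stats

alpha = 0.05
for n in (10, 30, 100):
    df = n - 1                               # degrees of freedom, one-sample t test
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-tailed cutoff: 2.5% in each tail
    print(f"n = {n:3d}: reject the null if |t| >= {t_crit:.3f}")
```

As the section notes, the cutoff shrinks as the sample grows, approaching the familiar value of about 2 (1.96 for a very large sample).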

Effect Sizes

Whereas the results of inferential statistics tell you whether your results met the criteria for statistical significance, the effect size gives you information about the strength of the effect or relationship you examined. You can have a strong but not statistically significant effect, and you can have a weak but statistically significant effect. In other words, the results of inferential statistics and effect sizes can vary independently, and it is therefore important that you report both. Two basic types of effect size are the proportion of variance accounted for and a standardized difference score (most often Cohen's d, although there are others).

Proportion of Variance Accounted For

Recall that the standard deviation (SD) summarizes the degree to which scores differ from the mean (M). Variance is the standard deviation squared (SD^2), which is important to know if you are calculating statistics by hand. For our purposes, however, think of variance as providing the same type of information as the standard deviation: both tell us how much scores vary from the mean. When we examine the relationship between variables, or the effect of one variable on another, the proportion of variance accounted for tells us how much variance in one variable is accounted for by the other. There are different statistics for the proportion of variance accounted for, based on the types of variables you are examining. If you have two variables that are both interval or ratio, it is r^2, the coefficient of determination. If one of your variables is nominal and dichotomous and the other is interval or ratio, it is r pb^2, the squared point-biserial correlation. An example might help make this clearer: a professor examining final exam scores might wonder if those who studied more did better on the exam. In proportion-of-variance terms, the professor wonders how much of the variance in final exam scores is accounted for by study time.
If there is no relationship between study time and exam score, then 0% of the variance in one variable is accounted for by the other. If there is a perfect relationship, then 100% of the variance is accounted for.

[Figure: two Venn diagrams of Study Time and Exam Score; non-overlapping circles show 0% of the variance in exam score accounted for by study time, and completely overlapping circles show 100%]

These extremes (0% vs. 100%) are both unlikely. We would expect at least some relationship between studying and scores, but we would not expect the relationship to be perfect. In addition to study time, exam scores usually depend on many different factors, such as overall performance and attention in the class, test-taking skills, stress levels, how well one slept, etc. Measurement error is also a factor in test scores, as it is with all scores. This example extends to other constructs: there are multiple factors impacting any score, especially in the social sciences, where we are interested in constructs such as love, motivation, and personality. Consequently, for r^2 and r pb^2, Cohen (1988) suggested that in the social sciences about 25% of the variance accounted for is a large effect, about 9% is moderate, and about 1% is small.

[Figure: three Venn diagrams of Study Time and Exam Score showing a small effect (1% of the variance in exam score accounted for by study time), a moderate effect (9%), and a large effect (25%)]
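The proportion of variance accounted for can be computed directly from a correlation. This is a minimal sketch with made-up study-time and exam-score data (the numbers are assumptions for illustration, not from the professor's example):

```python
import numpy as np

# Hypothetical data: hours studied and final exam scores for eight students
study_time = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
exam_score = np.array([62.0, 65.0, 70.0, 68.0, 75.0, 80.0, 78.0, 85.0])

# Pearson correlation between the two interval variables
r = np.corrcoef(study_time, exam_score)[0, 1]

# Coefficient of determination: proportion of variance in exam
# scores accounted for by study time
r_squared = r ** 2
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
```

The resulting r^2 would then be judged against Cohen's (1988) benchmarks: about .01 small, .09 moderate, .25 large.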

Standardized Difference Score

The simple difference between the means of two groups can provide important information about the magnitude of the difference in the sample, but it is not easy to evaluate the size of that difference beyond the sample. Consequently, it is better to calculate a standardized difference score that takes error into account and that is comparable across samples. Cohen's d is a standardized difference score expressed in standard deviation units. It is a useful and commonly used effect size when comparing the means of two groups. For example: a professor examines the difference in final exam scores between those who studied less than six hours and those who studied six hours or more. If there were absolutely no difference between the groups, Cohen's d would be 0 and the distributions would completely overlap:

[Figure: two completely overlapping distributions; exam scores for students who studied < 6 hours equal exam scores for students who studied >= 6 hours]

If there were three standard deviations between the means of the two groups, Cohen's d would be 3.

[Figure: two distributions whose means are 3 standard deviations apart]

Note. The images on this and the next page are from Magnussen (n.d.). The site is interactive; check it out for yourself: http://rpsychologist.com/d3/cohend/
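Cohen's d can be computed from two samples using the pooled standard deviation. This is a minimal sketch with made-up exam scores (the six-hour cutoff and the scores themselves are assumptions for illustration):

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    # Pool the two sample variances, weighting by degrees of freedom
    pooled_sd = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    # Mean difference expressed in standard deviation units
    return (np.mean(group1) - np.mean(group2)) / pooled_sd

# Hypothetical exam scores by study time
studied_more = np.array([78.0, 85.0, 80.0, 88.0, 82.0])  # studied >= 6 hours
studied_less = np.array([70.0, 75.0, 72.0, 78.0, 74.0])  # studied < 6 hours

d = cohens_d(studied_more, studied_less)
print(f"Cohen's d = {d:.2f}")
```

If the two groups were identical, d would be 0, matching the completely overlapping distributions above; each additional standard deviation of separation between the means adds 1 to d.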

When interpreting Cohen's d, Cohen (1988) recommended treating about .80 as large, about .50 as moderate, and about .20 as small.

A small effect: Cohen's d of about .20. The difference between the means of the two groups is .2 standard deviations.

A moderate effect: Cohen's d of about .50. The difference between the means of the two groups is half a standard deviation.

A large effect: Cohen's d of about .80. The difference between the means of the two groups is .8 standard deviations.

Overview of Effect Size Interpretations (Cohen, 1988)

Interpretation    | Proportion of Variance Accounted For (r^2 or r pb^2) | Cohen's d
Small/Weak        | 1%                                                   | .20
Medium/Moderate   | 9%                                                   | .50
Large/Strong      | 25%                                                  | .80

Practical Significance

Practical significance refers to the usefulness of the results or findings of a study. In other words: how do the results affect or apply to daily life? Even if we find statistical significance, the difference we find may not be noticeable or may not noticeably affect people's lives. On the other hand, a study that did not find statistical significance or a large effect size may still yield important, practical results. A classic example is found in the medical literature. A study comparing heart attacks among those who took aspirin versus a placebo was stopped before completion because the preliminary results were so clearly in favor of aspirin's benefits. The effect size was minuscule, with aspirin accounting for only a tenth of a percentage point in the reduction of heart attacks (Rosenthal, 1990). But the difference in outcomes revealed that those who took aspirin were slightly less likely to have a heart attack and much less likely to die from one, and there were no clear health problems resulting from aspirin (Steering Committee of the Physicians' Health Study Research Group, 1989). When we are talking about a life-and-death situation, even a very small difference can be meaningful. An example that might hit closer to home is the impact of time spent studying on exam scores. If it turned out that those who studied at least six hours for an exam scored a full letter grade higher than those who studied less than six hours, those results would have practical significance for students regardless of whether the difference in test scores was statistically significant. On the other hand, if time spent studying had no meaningful impact on grades, one might decide the difference is not practically important. This assessment of practical significance would be the same whether or not the results were statistically significant, and whether the effect size was large or small.
There are no clear-cut guidelines for evaluating the practical significance of results. Evaluating practical significance requires a broader perspective and a consideration of context. Interpretations of practical significance are usually included in the Discussion section of a research report.

Evaluation                                      | Where to report
Descriptive Statistics                          | Results Section (and Method Section, as applicable)
Confidence Intervals (optional but recommended) | Results Section
Inferential Statistics                          | Results Section
Effect Sizes                                    | Results Section
Practical Significance                          | Discussion Section