HYPOTHESIS TESTING

Hypothesis
- A statement about the relationship between variables that makes a falsifiable prediction
- Relationship can be correlational (as one variable changes, the other changes too) or causal (change in one causes change in the other)
- Prediction predicts future events
- Falsifiable predictions can be clearly shown to be false through observations anyone can make

Hypothesis
Potential hypotheses?
- "People are generally charitable"
  - Not a viable hypothesis (only one variable)
- "People who are religious are generally charitable"
  - What is "generally charitable"?
- Viable hypothesis: "Religious people give more money to charities than non-religious people"
- Must be ___ and ___
Hypothesis Testing
- Hypothesis: Increasing general arousal will increase memory functioning
- Group #1 studies while listening to low-tempo music
- Group #2 studies while listening to high-tempo music
- Both groups are then tested for memory

Example
- Is this a big enough difference to be sure arousal is having a real effect?

Example
- What if there are only ___ people in each condition?
- What if there are only ___ people in each condition?
Example
- What if there was a larger difference?

Significant Difference?
- What if there is a large overlap between groups?
  - I.e., high within-group variability

Significant Difference?
- What if there is little overlap between groups?
  - I.e., low within-group variability
Statistical Significance
- Significance: When an effect is greater than could be reasonably expected from chance alone
- Effect is more likely to be significant if
  - It is found in a large sample of participants
  - The effect itself is large
    - Large group difference, high correlation, etc.
  - There is little overlap between the groups
    - The groups are very different from one another

Affecting Statistical Significance
[Figure: panels labeled N = 1 and N = 2, with one panel marked "Likely a more trustworthy group difference"]
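The three factors listed above (sample size, effect size, and overlap between groups) can be illustrated with a small sketch. It assumes Welch's t statistic as the measure of how far a group difference stands out from chance; the slides do not name a specific test, and the function name `welch_t` and all scores below are made up for illustration.

```python
import math

def welch_t(mean1, mean2, sd1, sd2, n1, n2):
    """Welch's t statistic for two independent groups, from summary statistics."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / standard_error

# Hypothetical memory scores (illustrative numbers, not taken from the slides).
baseline = welch_t(70, 74, sd1=10, sd2=10, n1=10, n2=10)
more_people = welch_t(70, 74, sd1=10, sd2=10, n1=100, n2=100)   # larger sample
bigger_effect = welch_t(70, 82, sd1=10, sd2=10, n1=10, n2=10)   # larger difference
less_overlap = welch_t(70, 74, sd1=2, sd2=2, n1=10, n2=10)      # less within-group variability

for label, t in [("baseline", baseline), ("more people", more_people),
                 ("bigger effect", bigger_effect), ("less overlap", less_overlap)]:
    print(f"{label:>14}: t = {t:.2f}")
```

Each change on its own pushes t further from zero than the baseline, which is why the same raw difference can be non-significant in one study and significant in another.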
Let's do some hypothesis testing
- Hypothesis = Men perceive their bodies more positively than women
- Experiment:
  - Have women and men rate their own level of attractiveness
  - Have strangers rate these women and men in attractiveness
- Measure: What is the difference in attractiveness between self- vs. other-ratings?

Results
- Attractiveness rated on a 1–5 scale
- Subtracting other- from self-ratings
[Figure: self-ratings minus other-ratings for each group; axis annotations "higher than" and "lower than" mark the direction of the difference]
Results
- Is this effect real?
  - I.e., is this group difference more than could be expected by chance alone?
- Statistics are used to calculate the likelihood that this difference could be produced by random chance alone
  - Yields a probability (p) value
  - p = probability that group difference in data is just due to luck (i.e., effect is not real)
- Random chance effects can be caused by many things
  - E.g., May have just happened to get women with abnormally low self-esteem and/or men with abnormally high self-esteem
    - I.e., ___
  - E.g., Something about the procedure (e.g., instructions, measures) may have made women underestimate their attractiveness or made men overestimate their attractiveness
    - I.e., experiment ___
- However, if p ≤ .05
  - I.e., there is only a 5% probability (or less) that the effect found was due to chance
  - Scientists are willing to conclude that the effect is likely real
  - Why scientists use the phrase "with confidence"
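One way to see where a p value comes from is a permutation test, sketched below. This is an assumption for illustration: the slides do not say which statistic was used. Strictly speaking, the sketch estimates how often a difference at least as large as the observed one appears when group labels are shuffled at random, i.e., when there is no real effect. The function name `permutation_p_value` and the ratings are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a two-sided p value by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random relabeling: what chance alone would produce
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical self-rating minus other-rating scores (made-up data).
women = [-0.8, -0.5, -1.1, -0.3, -0.9, -0.6, -0.4, -1.0]
men = [0.4, 0.7, 0.1, 0.9, 0.5, 0.3, 0.8, 0.2]

p = permutation_p_value(women, men)
print(f"p = {p:.4f}")
print("significant" if p <= 0.05 else "not significant")
```

With these made-up data the groups do not overlap at all, so almost no random relabeling reproduces the observed gap and p falls well below .05.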
Significant differences can be visualized through error bars
- No Overlap = Significant Difference
- Overlap = No Significant Difference
[Figure: self-ratings minus other-ratings for each group, shown with error bars]
- Error bars show our ___ as to the real means of groups
  - Estimated population means, given 95% confidence
- So, if we are 95% sure the groups don't overlap
  - Then we are 95% confident there is a significant difference between groups
  - I.e., less than 5% chance they are the same
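The error-bar check above can be sketched numerically. The example assumes 95% intervals built from the normal approximation (mean ± 1.96 standard errors); for small samples a t-based multiplier would be somewhat wider. The helper names `ci95` and `intervals_overlap` and the data are hypothetical. Non-overlap is a conservative visual rule: intervals that overlap slightly can still correspond to a significant difference in a formal test.

```python
import math
from statistics import mean, stdev

def ci95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    m = mean(sample)
    margin = 1.96 * stdev(sample) / math.sqrt(len(sample))
    return (m - margin, m + margin)

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical self-rating minus other-rating scores (made-up data).
women = [-0.8, -0.5, -1.1, -0.3, -0.9, -0.6, -0.4, -1.0]
men = [0.4, 0.7, 0.1, 0.9, 0.5, 0.3, 0.8, 0.2]

women_ci, men_ci = ci95(women), ci95(men)
print(f"women: {women_ci[0]:.2f} to {women_ci[1]:.2f}")
print(f"men:   {men_ci[0]:.2f} to {men_ci[1]:.2f}")
print("error bars overlap:", intervals_overlap(women_ci, men_ci))
```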
Overview: Hypothesis Testing
- Form hypotheses: A testable prediction
  - Predicting significant effect (i.e., group difference, correlation, etc.)
- Collect data
- Conduct significance testing
  - If p ≤ .05, effect likely isn't due to chance
    - Significant effect, i.e., a real effect
  - If p > .05, findings may be due to chance
    - Non-significant effect

Forms of Error
                                In reality:          In reality:
  Study finds that              effect does exist    effect does not exist
  Significant effect found      Accurate             Type I Error
  No significant effect found   Type II Error        Accurate

Not a Perfect System
- There is always some likelihood of error in hypothesis testing
- Type I Error
  - When an effect in the data is just due to random chance, but is still found to be statistically significant
  - If p = .05, there is still a 5% chance the effect is due to luck (i.e., not real)
  - Can be caused by ___ differences between groups
  - Can be caused by ___ in sampling or measurement that introduces confounds
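The Type I error rate can be checked by simulation. The sketch below is an illustration under stated assumptions, not a method from the slides: both groups are drawn from the same population (so no real effect exists), scores are normal with a known standard deviation of 1, and a two-sided z-test is used. The names `z_test_p` and `false_positive_rate` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

def z_test_p(group_a, group_b, sd=1.0):
    """Two-sided p value for a difference in means, assuming a known SD."""
    standard_error = sd * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_studies=2000, n_per_group=20, alpha=0.05, seed=1):
    """Share of simulated null studies that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Both groups come from the same population: any difference is luck.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p(a, b) <= alpha:
            hits += 1
    return hits / n_studies

rate = false_positive_rate()
print(f"Share of null studies found significant: {rate:.3f}")
```

Under these assumptions the share should land close to .05, which is the sense in which a .05 criterion still lets roughly 1 in 20 chance effects through.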
Not a Perfect System
- Type II Error
  - When real effects aren't detected as significant
  - Can be caused by having too few participants to make an effect statistically significant
  - E.g., Hypothesis: Students who read 1 extra book per month for 1 year will score higher on their SATs than students who don't
    - Control Group (no extra books): 6
    - Experimental Group (extra books): 62
    - Would need over ___ participants to find this small effect significant

Not a Perfect System
- This is why statistical significance is always expressed as a ___
- Using 95% confidence is a good rule of thumb for minimizing both Type-I and Type-II error
  - 95% confidence is a high criterion to surpass, so most effects that surpass that criterion are real effects
    - I.e., minimizing Type I Error
  - If we set the confidence criterion higher, many more real effects wouldn't be found significant
    - I.e., minimizing Type II Error
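The trade-off described above can be made concrete with a power calculation, where power is 1 minus the Type II error rate. This is a minimal sketch assuming a two-sided z-test with equal group sizes and a standardized effect size (difference in means divided by the standard deviation); the function name `power_two_group_z` and the effect size of 0.1 are illustrative assumptions, not values from the slides.

```python
import math
from statistics import NormalDist

def power_two_group_z(effect_size, n_per_group, alpha=0.05):
    """Probability of detecting a real effect with a two-sided z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

small_effect = 0.1  # hypothetical small standardized effect

for n in (20, 200, 2000):
    print(f"n per group = {n:>4}: power = {power_two_group_z(small_effect, n):.2f}")

# A stricter criterion lowers power, i.e., raises the Type II error rate.
print(f"alpha = .01, n = 2000: power = "
      f"{power_two_group_z(small_effect, 2000, alpha=0.01):.2f}")
```

Under these assumptions a small effect is rarely detected with a few dozen participants per group, and tightening the criterion from .05 to .01 misses more real effects at any sample size.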