One-Way ANOVAs

We have already discussed the t-test. The t-test is used to compare the means of two groups to determine whether there is a statistically significant difference between them. The t-test allows us to determine the probability of making a Type I error (rejecting the null hypothesis when it is true). If the probability (p-value) of obtaining the t-value that you did is less than alpha (generally set at .05), then we reject the null hypothesis and conclude that there is a difference between conditions, with at least 95% confidence that the difference between the two conditions is not due simply to chance.

Take a detour with me. Imagine your instructor stated that she could obtain a 6 on the throw of a single die and then she did it! Would that be impressive? What would the probability of doing that be? Now suppose that she said she could obtain at least one 6 on the throw of a pair of dice. Is this more or less impressive than using a single die? What is the probability that she could get a 6 with two dice? What if she said that she could get at least one 6 by throwing three dice? What is the probability now?

We have the same problem when we want to make more than one comparison of condition means within a study. For example, we may want to include three conditions in our study: A, B and C. I am interested in looking at the difference between the means of conditions A and B, between the means of conditions A and C, and between the means of conditions B and C. Each new comparison we make is like adding an additional die to the example above. If we set alpha at .05, we have a 5% chance of making a Type I error with every comparison we make. The probability of making at least one Type I error in this study is therefore not 5%; with three comparisons it is close to 15%.

Continuing with the example we have been using on Marital Status and Happiness Ratings: the dependent variable in this example is happiness ratings.
Assume that I interviewed 20 married, 20 single and 20 divorced persons from Grant County for this study and obtained their responses to the Happiness Questionnaire. My independent variable is Marital Status, and it has three levels: single, married, and divorced. Below are Descriptive Statistics for this study. They are presented in the same format that would be produced if you analyzed the results using a computer statistical analysis program called Statistical Package for the Social Sciences (SPSS). As we discuss ANOVAs you will need to learn to interpret results from tables presented in the SPSS format. The Descriptive Statistics table that follows contains the means, standard deviations and sample sizes (N) for the total sample as well as for each of the conditions.
Descriptive Statistics
Dependent Variable: Happiness rating

Marriage Status   Mean     Std. Deviation   N
Single            5.4000   1.6670           20
Married           6.6500   1.7252           20
Divorced          4.9500   1.7614           20
Total             5.6667   1.8381           60

Note: Even though the numbers in the table are not rounded, you should always round to two decimal places when you report them.

To determine if there are statistically significant differences among the mean happiness ratings for these three conditions, I could do three t-tests to compare all possible combinations of these groups (i.e., I could compare single to married, single to divorced, and married to divorced). Remember that for each t-test we have a 5% chance of making a Type I error. If I do three t-tests, I roughly triple that chance, leaving close to a 15% chance of making at least one Type I error in the entire set of comparisons. If I found a significant difference for any or for all of these comparisons, I could not say that I was 95% confident that the difference is not due to chance. I could only be about 85% confident. In science, that is not good enough!

Fortunately, there is a method for comparing more than two means. The procedure is called an Analysis of Variance (ANOVA). When we have one independent variable (with three or more levels) we use a procedure called a One-way ANOVA. There are other variations of this test that can be used for factorial designs (designs with more than one independent variable). For example, if we did a study with two independent variables, we would use a Two-way ANOVA to analyze the results. What do you think they would call the analysis used for a study with five independent variables?

A One-way ANOVA can be used to compare any number of condition means and still maintain the probability of making a Type I error at 5%. An ANOVA is called an omnibus test because it looks at the amount by which the whole set of means differ from each other and determines whether that pattern of differences is likely to have occurred less than 5% of the time by chance alone.
If the entire set does not have a p value of less than .05, then no single comparison can either. Recall that in hypothesis testing we decide whether or not to reject the null hypothesis. If we reject the null, we conclude that the scientific (alternative) hypothesis is the more tenable conclusion. The two hypotheses for the ANOVA are:

Scientific Hypothesis: There are differences between at least two of the groups.
Null Hypothesis: There are no differences among the groups.
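The inflation of Type I error from repeated t-tests follows the same arithmetic as the dice detour. The sketch below is only an illustration: it assumes the comparisons are independent (pairwise t-tests on the same groups are not exactly independent), and `prob_at_least_one` is a hypothetical helper name, not something SPSS provides.

```python
def prob_at_least_one(p_single, n_tries):
    """Chance that an event with probability p_single occurs at least once
    in n_tries independent tries."""
    return 1 - (1 - p_single) ** n_tries

# Dice detour: chance of at least one 6 when throwing 1, 2 or 3 dice
dice = {n: prob_at_least_one(1 / 6, n) for n in (1, 2, 3)}

# Three t-tests, each run with alpha = .05
familywise = prob_at_least_one(0.05, 3)

for n, p in dice.items():
    print(f"{n} dice: probability of at least one 6 = {p:.3f}")
print(f"3 t-tests: probability of at least one Type I error = {familywise:.3f}")
```

Under the independence assumption, the figure for three tests comes out slightly below the 15% obtained by simply adding 5% three times. Either way it is well above the 5% we want.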
The logic of this test is simple (the mathematics is more complex, but the computer does it). Here is the basic idea.

Review: When we discussed measures of dispersion much earlier in the term, we talked about the range, deviation scores (the score minus the mean), the variance and the standard deviation. The range is not a good estimate of the dispersion of scores because it is greatly influenced by extreme scores. Deviation scores, when summed, equal zero, so they too are useless as a measure of the dispersion of scores in a sample distribution. To get around this problem we square all the deviation scores (this makes them all positive numbers); then we can sum them and divide by the sample size to obtain the mean squared amount by which scores in the distribution deviate from the mean. This mean squared deviation is called the variance of the distribution. Since most of us have difficulty thinking in squared amounts, it is generally more useful to think about the standard deviation of a distribution. The standard deviation is the square root of the variance and can be thought of as the mean amount by which the scores in the distribution vary from the distribution mean.

Why are we talking about variances instead of standard deviations? The problem with the standard deviation is that it is a square root. We cannot simply add or subtract square roots without squaring them first. For example, $\sqrt{9} + \sqrt{9} \neq \sqrt{18}$. Since variances are easier to work with mathematically, we use them for this analysis. Keep in mind, however, that a variance is just a measure of the dispersion of scores. The larger the variance, the more spread out the scores are in the distribution.

First, using the Happiness and Marital Status study example, we start with the assumption that people in general differ from each other in happiness. I expect that married people show the same variability in Happiness that single people do and that divorced people do.
In other words, I might expect that marriage shifts the Happiness ratings of the entire group but does not affect how much variability in happiness there is within the group. If there are differences between my groups that are due to the independent variable (Marital Status), I expect the means of the groups to differ but not the variances.

We refer to the amount of variation that is associated, in general, with the dependent variable as Within Groups Variance. It is the amount that the scores of individuals within the same condition would be expected to vary from the mean of their condition. It does not have anything to do with variation due to the level of the IV. It can be thought of as variance that is due to random variation between individuals, or what statisticians call Error. Having three groups (conditions), I have three estimates of this Within Groups Variance. Using the average of these three Within Groups Variance estimates gives a better estimate of the general variation of happiness in the population.

We make a second assumption: that we can use Within Groups Variance to estimate the amount of variation we would expect to find Between Groups if the independent variable has no effect (if the null is true). Therefore, we assume that Within Groups Variance gives a good estimate of Error variance. This estimate of error variance is called the Mean Squared Error (MSE).

That leaves just one last step: measuring the amount of variance between groups. The way this is computed is complex, but for this class we do not need to worry about that; SPSS will do the math for us. If we repeatedly measured samples of the same size drawn from the same population, over and over again, we would not expect to get exactly the same mean each time. (Remember the distribution of means we discussed when we discussed t-tests.) Similarly, if we sample three levels of our IV, even if they do not differ from each other on the DV, we would not expect to get exactly the same means for each condition. The means will vary simply due to random variation. However, they might also vary due to differences in the level of the IV. What is important is that you understand that Between Groups Variance is due to both error variation and variation due to the independent variable. Between Groups Variance reflects how much the condition means vary around the mean of the entire set of scores in the study.

Between Groups Variance = Random Variance + Variance Due to the Independent Variable

Assume for a moment that there is no effect of the independent variable. This means that the independent variable adds zero variance to what we would expect to find due to chance. If there are no differences between our groups, then we would expect our Within Groups Variance and our Between Groups Variance to be equal.
What would we expect the ratio of these two variance estimates to be if there were no effect of the IV?
$$\frac{\text{Random Variance} + \text{Variance Due to the Independent Variable}}{\text{Random Variance}} = \frac{\text{Random Variance} + 0}{\text{Random Variance}} = 1$$

We would expect to find a ratio of one. The amount that the ratio differs from 1 can be attributed to variation due to the IV. So, if the IV has an effect, the ratio of Between Groups Variance to Within Groups Variance will be greater than one (i.e., the numerator will be greater than the denominator). This is called an F ratio.

Because we are only using estimates of variances, we do not expect that the ratio will always be exactly one even when the IV does not have an effect. How much the F ratio needs to be above 1 for us to be 95% sure that there really is an effect of the independent variable can be determined using probability theory. Again (lucky us!), the probability of making a Type I error is calculated for us by the computer program. SPSS provides the following type of output.

Tests of Between-Subjects Effects
Dependent Variable: Happiness rating

Source           Sum of Squares   df   Mean Square   F       Sig.
Marital Status   31.033           2    15.517        5.255   .008
Error            168.300          57   2.953
Total            2126.000         60

F is the ratio of the Mean Square due to Marital Status (Between Groups Variance) to the Mean Squared Error (Within Groups Variance). Sig. (in the final column) is the probability (p value) of making a Type I error. Because p is less than .05, we reject the null hypothesis and conclude that there is a significant difference between at least two of the means.

We would report this result by stating that a One-way ANOVA determined that Happiness ratings significantly differ among Marital Status groups (F(2, 57) = 5.26, p = .008). The numbers in the parentheses following the letter F are the degrees of freedom (df) associated with this analysis. The first is the degrees of freedom between groups and the second is the degrees of freedom within groups. They are related to the number of conditions in the study and the number of participants. They must be reported in APA reports.
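As a check on the SPSS output, the F ratio and its degrees of freedom can be reproduced from the means, standard deviations and sample sizes in the Descriptive Statistics table. This is a minimal sketch using the standard one-way ANOVA formulas for the sums of squares; the variable names are illustrative, and the p value is left to SPSS because computing it requires the F distribution.

```python
# Summary statistics copied from the Descriptive Statistics table
groups = {
    "Single":   {"mean": 5.40, "sd": 1.6670, "n": 20},
    "Married":  {"mean": 6.65, "sd": 1.7252, "n": 20},
    "Divorced": {"mean": 4.95, "sd": 1.7614, "n": 20},
}

k = len(groups)                                   # number of conditions
total_n = sum(g["n"] for g in groups.values())    # number of participants
grand_mean = sum(g["mean"] * g["n"] for g in groups.values()) / total_n

# Between groups: how far each condition mean lies from the grand mean
ss_between = sum(g["n"] * (g["mean"] - grand_mean) ** 2 for g in groups.values())
# Within groups: pooled variation of scores around their own condition mean
ss_within = sum((g["n"] - 1) * g["sd"] ** 2 for g in groups.values())

df_between = k - 1          # conditions minus one
df_within = total_n - k     # participants minus conditions

ms_between = ss_between / df_between
ms_error = ss_within / df_within      # Mean Squared Error (MSE)
f_ratio = ms_between / ms_error

print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

The sums of squares and mean squares computed this way agree with the SPSS table to within rounding of the reported means and standard deviations.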
Degrees of freedom are always reported in parentheses after the letter F and are always reported in the order Between Groups df, Within Groups df, separated by a comma.

A significant result for a One-way ANOVA allows us to reject the null hypothesis and conclude that the scientific hypothesis is the most tenable conclusion. (Remember, the scientific hypothesis is that there are differences between at least two of the groups.) The One-way ANOVA does not tell us which groups differ from each other. To determine that, we need to make individual comparisons. When the F ratio is significant, SPSS continues the analysis by running post hoc (follow-up) Multiple Comparisons of the sets of means to determine which means are significantly different from each other. These are very much like doing t-tests between all combinations of the means.

Why can we do them now? Having found a significant F ratio from the One-way ANOVA, we know that the level of Type I error is limited to 5% for the entire set of group comparisons. Therefore, it is safe to go ahead and do the three separate comparisons. Since the pattern of differences we found was unlikely to occur (p < .05) if we selected three samples randomly from a population, we can be assured that any significant differences we find in the multiple comparisons are not due to having done multiple tests (like throwing the dice three times) but are actually due to the effects of the IV.

There are several ways of doing these post hoc Multiple Comparisons. I have had SPSS do a Least Significant Difference (LSD) test. Looking at the Multiple Comparisons table below, the difference between each pair of means is listed in the third column, and the p value (Sig.) in the last column. The LSD multiple comparisons analysis determined that married people rate themselves as significantly happier than single people (p = .025) and than divorced people (p = .003), whereas single and divorced people do not differ on Happiness ratings.

Multiple Comparisons
Dependent Variable: Happiness rating
LSD

(I) Marital Status   (J) Marital Status   Mean Difference (I-J)   Std. Error   Sig.
Single               Married              -1.2500                 .5434        .025
                     Divorced               .4500                 .5434        .411
Married              Single                1.2500                 .5434        .025
                     Divorced              1.7000                 .5434        .003
Divorced             Single                -.4500                 .5434        .411
                     Married              -1.7000                 .5434        .003

If you were answering an exam question, I would be looking for the following. First, a statement about the One-way ANOVA: if the ANOVA is not significant, do not go on and interpret the LSDs. If the One-way ANOVA is significant, then you need to give a FULL interpretation of the LSD multiple comparisons. If there are three conditions, you need to make three comparisons. If there are four conditions, you need to make six comparisons. If there are five conditions, you need to make ten comparisons.
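The number of comparisons needed for k conditions is the number of ways to pair them up, k(k - 1)/2. The sketch below lists the pairs for the three Marital Status groups and, assuming the usual LSD standard error formula sqrt(MSE × (1/n_i + 1/n_j)), reproduces the Mean Difference and Std. Error columns of the Multiple Comparisons table. The variable names are illustrative, and the p values are left to SPSS.

```python
from itertools import combinations
from math import comb, sqrt

# Values copied from the Descriptive Statistics and ANOVA tables
means = {"Single": 5.40, "Married": 6.65, "Divorced": 4.95}
n_per_group = 20
mse = 2.953  # Mean Squared Error from the ANOVA table

# Number of pairwise comparisons for 3, 4 and 5 conditions
n_comparisons = {k: comb(k, 2) for k in (3, 4, 5)}
print(n_comparisons)

# Standard error of the difference between any two condition means
std_error = sqrt(mse * (1 / n_per_group + 1 / n_per_group))

# Every pair of conditions with its mean difference (I - J)
pairs = [
    (a, b, round(means[a] - means[b], 4))
    for a, b in combinations(means, 2)
]
for a, b, diff in pairs:
    print(f"{a} vs {b}: difference = {diff:.4f}, std. error = {std_error:.4f}")
```

SPSS prints each pair twice (once in each direction), which is why the table has six rows for three comparisons.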
In our case: The One-way ANOVA was significant (F(2, 57) = 5.26, p = .008). The LSD multiple comparisons analysis determined that married people rate themselves as significantly happier (M = 6.65, s = 1.73) than single people (M = 5.40, s = 1.67; p = .025) and than divorced people (M = 4.95, s = 1.76; p = .003), whereas single and divorced people do not differ on Happiness ratings.

Within Subject Designs

When the design of the study is Within Subjects, the post hoc Multiple Comparisons would be paired t-tests instead of LSDs. We will see an example of this in the concept checks. They are interpreted in the same manner but are displayed in tables a little differently.