Utilizing t-test and One-Way Analysis of Variance to Examine Group Differences August 31, 2016

Transcription:

Good afternoon, everyone. My name is Stan Orchowsky and I'm the research director for the Justice Research and Statistics Association. It's 2:00 PM here in Washington DC, and it's my pleasure to welcome you today to our webinar on comparing group means, utilizing t-tests and one-way analysis of variance, to examine group differences. This is the latest in our series of webinars designed for training and technical assistance on statistical analysis for criminal justice research. As always, I am joined by Erin Farley, who's a research associate with JRSA. Before we go any further, I want to thank our partners at the Bureau of Justice Statistics for helping to make this webinar possible. Before we launch into today's webinar, I wanted to cover a few logistical items. We are recording today's webinar session for future playback. The link to the recording will be posted on the JRSA website, but you can also access the recording with the same link that you used to log on just now... to the webinar. And that's usually available later on this afternoon. Today's webinar is being audio-cast, via both the speakers on your computer and by phone. If you have speakers on your computer, or headphones, and are not a presenter, we recommend listening to the webinar using your computer speakers or headphones. To access the audio conference, select "audio" from the top menu bar, and then select "audio conference". Once the window appears, you can view the teleconference call-in information, or join the audio conference via your computer. If you have questions during the presentation, or would like to communicate with JRSA staff, submit them using the chat feature on the right side of your screen. You can select "host" from the drop-down menu next to the text box. Today's session is scheduled for an hour. If you have technical difficulties, or get disconnected during the session, you can reconnect using the same link that you used to join initially. 
You can also call Webex tech support at 1-866-229-3239. Justice Research and Statistics Association Webinar Page 1 of 25

In the last few minutes of today's webinar, we're going to ask you to complete a short survey. The information you provide will help us to plan and improve future webinars and meet our reporting requirements. Alright, so we're going to start off with talking about t-tests and we're going to start off with Erin. Good afternoon, welcome everybody. So, before jumping into a t-test example, we wanted to take a moment to review hypothesis testing and significance levels. Hypothesis testing is a common form of statistical inference, also referred to as statistical testing, and refers to the formal procedures used by researchers to accept or reject statistical hypotheses. And, a statistical hypothesis is an assumption about a population parameter. A population parameter is a statistical measure for a given population. For example, the mean and variance of a population are examples of population parameters. The best way to determine whether a statistical hypothesis is true would be to examine the entire population. However, since that's often impractical, researchers typically examine a random sample from the population. This entails statistical inference, generalizing from a sample to a population. If sample data are not consistent with the statistical hypothesis, the hypothesis is rejected. There are two types of statistical hypotheses... The null, and the alternative. The null hypothesis assumes that there is no difference in the population, while the alternative hypothesis assumes there is a difference. The alternative hypothesis can be one- or two-sided. Here, if you are stating that the difference is either larger or smaller than the null, this would be a one-sided, or a one-tailed test. Or, if you are saying that there is just going to be a difference in either direction, this would be a two-sided, or two-tailed test. The decision... if you pick one or the other... will impact how you interpret the test statistics. 
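As an illustrative aside (not part of the webinar materials), the accept/reject logic just described reduces to a one-line rule against the conventional .05 significance level:

```python
ALPHA = 0.05  # conventional significance level in criminal justice research

def decide(p_value, alpha=ALPHA):
    """Two-tailed decision rule: reject the null when p falls below alpha.
    Rejecting a true null is a Type I error; alpha is the probability of that."""
    return "reject null" if p_value < alpha else "fail to reject null"

print(decide(0.001))  # a significant difference
print(decide(0.20))   # no evidence of a difference
```

Note that at exactly p = .05 this rule fails to reject, since rejection requires the p-value to fall strictly below alpha.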
For these examples, our alternative hypothesis will assume that there is a difference in either direction. And so, we will be focusing on the two-tailed results. Hypothesis testing usually has one of two outcomes; you either accept, or depending on the terminology, you might fail to reject the null hypothesis, or you reject the null hypothesis. Two types of errors can result from a hypothesis test. Type I errors occur when the researcher rejects a null hypothesis when it is true. And the probability of committing a type I error is called the significance level, or the alpha. A type II error occurs when the researcher fails to reject the null hypothesis... a null hypothesis that is false. So just a quick review of significance levels. The p-value represents the probability of finding the observed results when the null hypothesis is true; usually indicating no difference. The most common p-value utilized in research commonly conducted by criminal justice researchers is .05. And this represents or means a less than one in twenty chance of being wrong. So, if the p-value is greater than .05, you would fail to reject the null hypothesis, because the results indicate that there is no difference or association. However, if the p-value is smaller than .05, you would reject the null hypothesis, because the result indicates that there is a difference or an association. Okay. So, an introduction to t-tests. T-tests are utilized to compare the means of two samples, to determine if there is sufficient evidence that the means significantly differ. This test can be utilized to compare sample means, to see if there is evidence to infer that the means of the corresponding populations also differ. There are three types of t-tests. Independent sample t-tests, which compare means across two unrelated samples. The second, paired sample t-tests, which compare the means across two conditions, but the observations are not independent of one another. And then, the third... a one-sample t-test, which compares the mean of the sample to a given number. So, we are going to cover all three of these in today's webinar.

So, just to provide a very brief description of the data that I will be utilizing in the t-test examples... It is drawn from a program, the Community Mediation Maryland re-entry program. This provides mediation services to prisoners in Maryland, and it provides two neutral mediators... a prisoner can have up to three two-hour sessions. What is most important here is to understand that the individuals are interviewed while they are incarcerated, prior to their mediation experiences. And then, they are also followed up with, and interviewed after they are released, and information is collected regarding their intake, their case management, their mediation sessions, their post-release follow-up interview, and their criminal history data, as well as criminal activity post release. So, specifically, there were 282 individuals that did participate in the mediation program, and these individuals were matched to comparison groups. So you might see the sample size fluctuate a little bit, because it may include the larger sample of both the individuals who participated in the mediation program and the comparison groups. But there are a couple of examples which focus just on those that were in the program. Okay. So, first example: independent samples t-test. So here is the formula related to this test, and you can see that it is really the difference between the two groups, divided by the standard error of the difference between the two groups. And this formula and this test compares the means of two different samples. 
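As a sketch of the formula just described, the independent-samples t statistic with a pooled (equal-variance) standard error can be computed directly; the numbers below are invented purely for illustration:

```python
import math
from statistics import mean, variance

def independent_t(sample1, sample2):
    """Pooled-variance (equal variances assumed) independent-samples t:
    the difference between the group means divided by the standard error
    of that difference. Returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1), variance(sample2)  # sample variances (n - 1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, n1 + n2 - 2

# Invented data for two small groups
t, df = independent_t([1, 2, 3, 4], [2, 3, 4, 5])
```

With a helper like this, the "equal variances assumed" line of SPSS's output can be checked by hand; the degrees of freedom are the sum of the two sample sizes minus 2, as discussed in the transcript.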
There are requirements for the measures that you utilize to conduct this test. These include one dichotomous variable, often referred to, or identified, as the independent variable. Then, the second is an interval or ratio variable, also referred to as a continuous variable... and that is often identified as the dependent variable. 

There are a few assumptions to be aware of when running a t-test, and they include: that the data are continuous and normally distributed, and that the variances of the two populations are equal. That the two samples are independent. And that the sample is representative of the population... the larger population. If the variance of the two populations is not equal, the results on the test will indicate this and provide an alternative tool. And when I say "the results on the test", I'm referring to an SPSS test. They will provide an alternate tool to assess the significance of the difference in the means. And this test is called the unequal variance test. Here, I have also stated two hypotheses; the null, and the alternative hypothesis. So here, the null hypothesis would be that the population means from two unrelated groups are equal. And for this exercise, the groups are... gender is the measure; males and females. Alternatively, in other examples, it may be participants in an experimental versus control group. That is something to remember as well. But here, the example is gender; males versus females. And the dependent measure is actually arrest/conviction rates during the post period. The alternative hypothesis is that the population means from the two unrelated groups are not equal.

So, the next question... How do we run this type of test in SPSS? So here we have provided a number of snapshots to be able to show you how to go about running this example in SPSS, or something similar to it. So here, you would go to "analyze"... which you can see at the upper left-hand corner of the picture. Then you would go to "compare means" and when you click on that, you will actually see a list of types of tests, and it includes one-sample, independent, and paired t-tests. If you click on independent t-test... this box that you see right before you... that will pop up, and then you would transfer the dependent variable, which again, in this situation, is the arrest/conviction rate... into the test variable box. And then you would transfer the independent variable, gender, into the grouping variable box. You can see that below at the bottom with the two question marks within parentheses, because that indicates that you now need to define the groups. So you would click on the "define groups" button and input your values. Here the values are "0" for female, and "1" for male. Now, you can also use other values like "1" and "2". In addition, if you have more than two groups, but you're only interested in comparing two... so you might have three, for example, but you're only interested in comparing two, you can actually select those respective numbers as well. So, if you had groups that were labeled "1", "2", "3", if you were interested, you could actually put group one as "1", and group two as "3", or something like that... just to focus on the two groups that you are interested in comparing. So, once you do that, you press "Continue"... and then you press "OK", and then you will receive a similar output as you see here.

This slide presents the associated output... And the first table is titled "Group Statistics", and this presents the descriptive statistics, including the mean, the sample size, the standard deviation, and the standard error. So here you can see that there are 75 females, and that they had a mean arrest/conviction rate during the post period of .51... and that there were 488 males... and they had a mean conviction rate of .35. So just to clarify any possible confusion... you may have remembered that I said approximately 282, I believe, individuals actually participated in the mediation program. Obviously, here we have over 500, so this is including the larger sample; those who both participated and did not participate in the mediation program. So here, we're just comparing gender and the post-arrest conviction rate. So, once you review group statistics, you can move to the next table... and that is titled the "Independent Samples Test." 
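The descriptives in the "Group Statistics" table just described (N, mean, standard deviation, standard error of the mean) can be verified by hand for any column of numbers; this is an illustrative sketch, not JRSA code:

```python
import math
from statistics import mean, stdev

def group_statistics(values):
    """The four columns of SPSS's "Group Statistics" table:
    N, mean, sample standard deviation, and standard error of the mean."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"N": n, "mean": mean(values), "sd": sd, "se": sd / math.sqrt(n)}

# Invented scores for one group
stats = group_statistics([2.0, 4.0, 6.0, 8.0])
```

The standard error is simply the standard deviation divided by the square root of N, which is why larger groups have smaller standard errors.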
And here, it is important to remember that the independent t-test assumes that the variances of the two groups that you are measuring are equal in the population. If your variance is unequal, this can affect the type I error rate. The assumption of homogeneity of variance, or equal variance, can be tested using Levene's Test for Equality of Variances, which is produced in SPSS. And you can actually see that, to the left underneath, within the second table... the title "Levene's Test for Equality of Variances." This test provides an f statistic, and a significance value, or p-value. Just as a side note, the f statistic is a ratio of the sample variances, and it is completely arbitrary which sample is actually labeled "1" or "2"... So you really just want to focus on the p-value. If the p-value is less than .05, it is statistically significant. We then have unequal variance, and we have violated the assumption of homogeneity of variance. This can be corrected by not using a pooled variance... a pooled estimate for the error term for the t statistic. This sort of gets back into the equation, referencing the equation... And really, what you need to know is that, if you are violating this assumption, you want to utilize a different method of calculating the t statistic and the significance level, and that would be the equal variance not assumed, versus the equal variance assumed. They are calculated differently. So, if Levene's test is significant, and the group variance is unequal, then you want to refer to the equal variance not assumed.

So here... let me catch up... looking at Levene's test, you can actually see that the result is not significant. So that means that we can then go forward and continue to look at the output and the results associated with equal variances assumed. And this significance level shows us that it is .001, the p-value, below our .05 threshold, indicating that the difference between the means is statistically significant. So I want to take a moment to just review some of the information presented in this output. For example, the t statistic... you can see again that there are two statistics that are calculated based on what you're referencing... And, again, we're just referencing that first line. And this is the ratio of the mean of the difference to the standard error of the difference under the two assumptions. So, what does that mean? Making it a little bit easier to understand, you can actually see here on the slide I put up the calculation: the mean difference, divided by the standard error of the difference... the result is the t statistic. So that's how you get the t statistic. 
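For the "equal variances not assumed" line, SPSS drops the pooled estimate and uses an unpooled standard error with adjusted degrees of freedom (the Welch-Satterthwaite approximation). A hedged sketch of that calculation, with invented data:

```python
import math
from statistics import mean, variance

def welch_t(sample1, sample2):
    """'Equal variances not assumed' t statistic: unpooled standard error,
    with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    a = variance(sample1) / n1  # each group's variance of the mean
    b = variance(sample2) / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (mean(sample1) - mean(sample2)) / se, df

t, df = welch_t([1, 2, 3, 4], [2, 3, 4, 5])
```

When the two groups happen to have equal sizes and equal variances, this matches the pooled ("equal variances assumed") statistic exactly; it differs, and is the safer choice, when Levene's test is significant.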

The degrees of freedom for the equal variance route is the sum of the two sample sizes, minus 2. Again, significance level, we reviewed that. Then the 95 percent confidence interval: these are the lower and upper bounds of the confidence interval for the mean difference. A confidence interval for the mean specifies a range of values within which the unknown population parameter, in this case the mean, may lie. So I also wanted to go ahead and provide an alternative example really quickly, of a situation in which Levene's test for equality of variances is significant. So here, I've already highlighted in red... I want to draw your attention to the [inaudible 00:19:57] and the significance level... and here, you can see that the p-value is .001, indicating that this Levene's test is significant. And this indicates that the variance of our two populations is not equal, and if you look above in the syntax, you can see that I'm still using gender. So the two groups I'm comparing remain males versus females... But I am just looking at post-arrests. So, it is a different variable that I have selected to examine, and this variable indicates the variance between the two groups is not equal. So then, we would not use that first row. We would actually go to equal variance not assumed. And then to our significance level, which is .044... And it is just a different method; it's not as strong as the equal variances assumed. And if you take a moment, you can compare the two values and see, for example, the significance level changes from .013 to .044, a slight adjustment, but the outcome is still the same... It is below our .05 threshold, so we would say that the means between these groups are statistically different. And, if we hop back up to the group statistics, we can see that the mean for female arrests is 2.48 and for males, it's 2.01. And that difference is significant.

So how would you report these findings? Here are two examples... they're written slightly differently. The first one states, "An independent samples t-test was performed, comparing the mean arrest conviction rate of males to females. As predicted, there was a significant difference between the two groups, with females having significantly higher arrest/incarceration rates during the post period than males." Then, for example two, you can say it slightly differently: "An independent samples t-test indicates that females had a significantly higher number of post arrests than males." 

What is important is that it includes both the mean and the standard deviation and the sample size for each group. And then, at the end, it includes the t statistic; within the parentheses, that is the degrees of freedom, and then the p-value. So, it's a lowercase "t", then within the parentheses the degrees of freedom, then equals the t statistic, and then the p-value. So, we also wanted to provide an opportunity to present how you would do something like this in Excel. So, what you do need in Excel to conduct this type of analysis is an analysis pack. You could actually do it without the analysis pack... it would be step by step, which takes a little bit of time, and so it is a lot easier if you have the data analysis pack available to you, to follow that method. So what you would do to check, if you do not know if you have it or not, is you would go to your Microsoft icon in the left-hand corner and you would click on that... Then you would go to Excel "Options" and then this box would pop up. You would go down, and if you see [inaudible 00:23:44] you can see how "Add-ins" is highlighted in orange. You would click on "Add-ins" and this list would pop up. And it's very hard to see... I realize, but that little red arrow is pointing to a label that says "Analysis ToolPak." And so, you would click on that... You would select that, and essentially add it on to your Excel. And the way that you can tell if you do or do not have it is, if you go to the "Data" tab in your Excel program and look all the way to the far right, you will actually see a new box titled "Data Analysis." Forgive me. This is a very small slide picture, but I wanted to give you all a sense of where you would see it on your screen. Alright, so let's assume everybody goes and downloads that and has that... What would you do next? You would double-click on "Data Analysis" and this screen would pop up. 
And here you could do a variety of different types of tests. You can see there's t-test for paired two sample for means. There is t-test for two samples assuming equal variance. There's another one for assuming unequal variance, and then below that is a z test. 

So, the thing here is that you need to first determine if your variance is equal or unequal, before you select a t-test option. Whereas in SPSS, you could run it, and again, that Levene's test would actually tell you which results to utilize. So here we first have to take one additional step. And to do that, we would still be working in that data analysis box. You just scroll up a little bit, and you will see something titled "F-Test Two-Sample for Variances". So you would click on that... Then something like this will pop up... and again forgive me, this is a really big screen, but what I wanted to do is just give you a sense of, again, what we look at and what we're highlighting, since these are just screen shots. You can see, for the input, there's "Variable 1 Range" and "Variable 2 Range". And really, what you do is you just click... it's an empty box... and you would first click on that... and then you take your mouse, and you just drag over the first column of the data. So here, the first column is actually males. There are a total of 40 observations, so you would just highlight all of those values associated with males... put that in "Variable 1 Range"... so you can see, column B goes all the way down to row 42. And then for females, they would be "Variable 2 Range"; you again highlight the relevant data, and it will pop up if you click that "Variable 2" box, highlight the data, and then it will scroll up and you'll see it pop in there. So you can confirm and make sure that you are highlighting all the data. If you've ever cut and pasted in Excel, you know you have that little outline that sort of blinks a little bit... You'll see that as you do this. So you'll be able to see exactly what you are, or are not, highlighting. And then you just press "OK". And you will receive a new tab... it will pop up, and it will look something like this. 
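The statistic behind this two-sample variance test is simply the ratio of the two sample variances; a minimal illustrative sketch with invented data (the p-value Excel reports additionally requires the F distribution, which is omitted here):

```python
from statistics import variance

def f_ratio(sample1, sample2):
    """F statistic for the two-sample variance test: the ratio of the two
    sample variances, plus numerator and denominator degrees of freedom.
    Which sample goes on top is arbitrary; the p-value is what matters."""
    f = variance(sample1) / variance(sample2)
    return f, len(sample1) - 1, len(sample2) - 1

# Invented data: the second group is more spread out than the first
f, df1, df2 = f_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

An F close to 1 suggests the variances are similar (proceed with the equal-variance t-test); a p-value below .05 suggests they are not.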
And so, this looks a little familiar. Again, you're getting your descriptive statistics, you're getting your mean, your variance, you can see there are 40 observations. Your degrees of freedom. There's your f statistic, but again, we're not really interested in the f statistic, we're interested in the p-value. And so, the probability p equals .389; this is above .05... and so, we assume that the variances are equal... if it is below, you assume that the variance is not equal... Again, I'm sort of repeating what I had mentioned earlier when looking at the SPSS table. It is, again, above .05, so we are proceeding with the t-test, and assuming that there is equal variance. So, once we have done that, we go back to the data analysis, and we select the "Two Sample Assuming Equal Variance" option that I showed earlier... and once again, you do the same thing. You just drag... you click on the... where the values... you can see where the values are right now, but that will be an empty box. So you click there, and then you just go and highlight all the relevant values for males, and then you do the same thing for variable 2, and then you press okay. And so, this is the output that you will see when you do that. Again, mean, variance, number of observations... 40. Pooled variance, which is associated, again, with equal variance. Hypothesized mean difference [inaudible 00:29:02] degrees of freedom. There's your t statistic. And, if we go down, you can see the p-value is .00170432, so for simplification, .002. That is well below our .05 threshold, and we can say that the mean between the two groups is statistically different.

Okay... So, I'm going to follow that same format through the next two examples, but hopefully speed up a little bit, as the information that's presented is very similar; once you start looking at this, you'll be able to recognize and understand you are looking at similar information. So, what is the difference between independent and paired sample t-tests? A paired sample t-test compares means that are from the same individual or another variable of interest. So usually it represents two different times or different, but related, conditions or units. So the best thing to think about, a really good example, is the pre-/post-test. A paired sample t-test is also referred to as a "correlated or matched sample t-test." Here is the equation associated with a paired sample t-test. 
Here, you have the mean difference of the paired data, divided by the standard error of the difference. And here, just below that, you can see the calculation for the standard error of difference. 
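A sketch of the paired-samples formula just described: the mean of the within-pair differences divided by their standard error (the standard deviation of the differences over the square root of n). The pre/post numbers below are invented, purely for illustration:

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired-samples t statistic: mean within-pair difference divided by
    the standard error of the differences (sd / sqrt(n)).
    Returns (t, degrees of freedom)."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1    # df is the number of pairs minus 1

# Invented pre/post scores, e.g. a conflict measure for three people
t, df = paired_t([1, 2, 3], [2, 4, 6])
```

Because the test operates on the differences, each person serves as their own control, which is what distinguishes it from the independent-samples t-test.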

As with the independent t-test, a paired sample t-test also has a number of assumptions associated with it. The dependent variable is, again, interval or ratio. The dependent variable should be approximately normally distributed. Now I say only "approximately" because the test is actually quite robust to violations of normality... meaning that the assumption can be violated and still provide valid results. Again, the sample should be representative of the population. There should be no significant outliers... And then, below that, you can also see that I've included the null and the alternative hypothesis. So, for a paired sample t-test, the null would be that the paired population means are equal. And then, for the alternative hypothesis, it would be that the paired population means are not equal. In this example, I'm utilizing scores on a measure based on an interview... let me just make sure... I want to make sure I'm on the right slide. Okay. So, for this example, I'm actually using... as I mentioned... scores on a measure. It was a statement with a 5-point Likert scale response option, regarding perceptions of resolving issues of conflict... with "1" being "strongly disagree" to "5" being "strongly agree." So again, the question is, how do you run a test like this in SPSS? You would again go to "Analyze", "Compare Means", and select the "Paired Samples" t-test option, and you will see something like this pop up. Now you have the "Variable 1" and the "Variable 2" option, and really, what you're going to do is just drag and drop here. It's quite easy... drag and drop. And so here, you can see the first variable is conflict, the pre-mediation measure... and then "Variable 2" is the post-conflict, or the second conflict measure, which is the post-mediation score. Then you press okay. And then, in your output, you will see three tables with results. 
The first table is titled "Paired Samples Statistics" and presents the descriptive statistics, including again the valid number of cases, the mean, the standard deviation, and the standard error of the mean. So here, we can see that the mean for the pre-conflict measure was 4.01, and the post was 4.31... The paired samples correlation is the second box, and that shows the bivariate correlation coefficient, with a two-tailed test of significance, for each pair of variables entered. 

For the pre- and post-conflict measures, we see that the two measures do not correlate. Now, usually when you're doing any type of analysis, or advanced analysis, you start with a number of diagnostic tests, which oftentimes include bivariate correlations. So, if you ran this independently early on, you would be seeing the same thing. This is just really a bivariate correlation. The third table is the paired samples test, which is what we are most interested in. And again, I've included the example of how to calculate the t statistic. Here we see that the mean difference between the two measures is -.298. You also have the standard error mean; this is the estimated standard deviation of the sample mean. We again have the 95% confidence interval of the difference, the lower and upper bounds of the confidence interval for the mean difference. A confidence interval for the mean, again, specifies the range of values within which the unknown population parameter, in this case the mean, may lie. Then we have the test statistic, the degrees of freedom, and the two-tailed significance. So for this two-tailed test, the p-value is .001, below our .05 threshold... meaning that there is a significant difference, in particular an increase, because we see that the mean changed from 4.01 to 4.31 in the conflict score between the pre- and post-mediation. So how would you report these findings? Here is one example of how to do that. A paired samples t-test indicates that the conflict scores reported during the post-mediation period... here again, in parentheses, the mean and the standard deviation... were significantly higher than those reported during the pre period... mean, standard deviation... And then, followed by a lowercase "t" with, in the parentheses, the degrees of freedom... which equals the t statistic, -3.471, and then the p-value. So, the p was less than .01. So, how would you run this in Excel? Very similar... 
as I showed earlier, I skipped one picture, which again, you would go to that "Data Analysis" option on the far right. You double-click, and up pops a box with all the different types of analysis that you can conduct, or utilize... and one of those would be a paired t-test. You double-click on that... and this is the box that you would see. 

So once again, you're following the same procedure. You're just going to click on where that "Variable 1 Range" is, and there, obviously, before you fill it in... it will be empty. So you click there to indicate that's what you're filling. You're going to highlight the column of data, which is actually the same exact data that I used before... just changing the purpose of it. So again, it's 40 cases, for illustrative purposes. You would highlight variable 1... you would highlight variable 2... you would press okay. And then the results would pop up in another tab, and it would look something like this. So this should all look familiar, based on what we ran for the independent t-test. And if you scroll down, you can see where it's highlighted... what is of interest is specifically that significance, that p-value of .002. So once again, that is below our threshold of .05, so we would say that the difference is significant. Just to take a moment to look at the results a little bit more, you can see your mean for variable 1 is 2.65, and variable 2 is 3.52. So, being able to produce these analyses in Excel, if you have the data analysis pack, is quite easy to do. The next example is the one sample t-test. This tests whether the mean of a distribution differs significantly from a pre-set or known value... which is usually the population mean. It compares the average of the sample, which is observed, to the population, which is expected. So what you could do, for example, if you were doing perceptual research on defendants, or some group similar to that, and you had a score for a sample... but you were comparing them to maybe the results in another state, in your state, or at a national level, you could... you're selecting that state or national value as your pre-set known value. And you're comparing that... 
you want to find out if your sample is significantly different from that selected value. So once again, we have assumptions for running this t-test, and some of them are similar to the other ones. Again, the dependent variable is interval/ratio, and the dependent variable should be approximately normally distributed... so again, it is a little bit flexible in terms of violating that assumption. The sample is representative of the population. And so, I believe I might have 

referenced SRS in the earlier slides as well, and maybe did not explain that. That is referencing the methodology of simple random sampling. And then, also, no outliers. Here is the equation for the one sample t-test, for anyone who is interested in calculating it by hand. It's the sample mean, minus the population mean, divided by the standard error of the mean. The null hypothesis for a one sample t-test would be that there is no significant difference between the population mean and the sample mean. And the alternative hypothesis, again, is that there would be a difference. So, how do you calculate this in SPSS? Here, you follow the same path that I mentioned earlier. You just find your measure... your continuous or interval/ratio measure that you're interested in. And here, I'm reusing conflict... that conflict score during the post period. And then, I selected a test value... quite honestly, a quite arbitrary value of 4, because, if you remember, this conflict measure that I'm utilizing is based on a 5-point Likert scale. So I selected the value "4." Let's just say that that is the nationally accepted standard, or the goal, for what we want individuals to be reaching, in terms of their perception of resolving conflict. So, to what level or degree is this group different from that? Just to clarify, these two slides are out of order. So this one obviously just presents the table. This is how you would fill it in... the red arrows indicating... and then, this is what the output would look like. If you look at the syntax in the upper left-hand corner, I did want to highlight, you can see that "4," that known value, indicated as the test value. So, the one sample statistics, that first table, again presents descriptive information. You can see the sample size, which is 241. The mean is 4.33, and you have your standard deviation and your standard error mean. 
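The one-sample equation just described... sample mean minus the known population mean, divided by the standard error of the mean... can be worked out by hand in Python. The scores here are made up; the test value of 4 mirrors the example:

```python
import math
from statistics import mean, stdev

# Hypothetical conflict scores on a 5-point scale
scores = [5, 4, 4, 5, 3, 4, 5, 4]
test_value = 4   # the pre-set, known value we are comparing against

se = stdev(scores) / math.sqrt(len(scores))   # standard error of the mean
t_stat = (mean(scores) - test_value) / se     # one-sample t statistic
df = len(scores) - 1

print(t_stat, df)
```

As in SPSS, the p-value would then come from comparing t to a t distribution with n − 1 degrees of freedom.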
Now, the question is, does that mean of 4.33 significantly differ from 4? So then we would go to the second table... the one sample test, and we would look at the significance level. And that is .001. So that, again, is below our .05 threshold, and we would say that 4.33... the average score 

of our group is significantly higher than the known value, or the test value, of 4. How would you report these results? You would say something to the effect of "a one sample t-test was conducted on the conflict scores of defendants to determine if the mean was significantly different from 4, the expected mean for defendants in general. Results indicated that the sample mean of 4.33"... and now you see your standard deviation included... "was significantly different from 4." You have a lowercase "t"; in parentheses is the degrees of freedom... equals 6.41, that is the t statistic, and then your p-value... is less than .001. So how would you calculate this in Excel? Now, this is a little interesting because, if you look at your data analysis options, there is no one sample t-test. So what you actually need to do is fiddle with it a little bit... you select your two-sample "Assuming Unequal Variances" option... But, before you do that, you want to take that value "4," and you want to actually create an additional column... consider this the comparison column... and you want to run it all the way down next to all of your valid scores for your sample. So in this example, I have 20 observations, and you can see that whole column is just all 4s, and the scores are the column in the middle. Once you do that, you can again do the usual selection process... highlighting your scores would be variable 1, highlighting the comparison values would be variable 2... pressing "okay"... and you're going to see output similar to this. If you look at your means, you can see that that first group has a mean score of 3.5... and then variable 2, not surprisingly, has a mean of 4, and no variance. There are 20 observations. And then, if you scroll down and look at the p-value, you can see that the p-value is .076, and that would not be statistically significant, because that is above our .05 threshold. 
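The Excel workaround works because a constant comparison column contributes zero variance, so the unequal-variance (Welch) two-sample formula collapses to the one-sample formula. A quick Python check of that claim, with hypothetical scores:

```python
import math
from statistics import mean, variance

scores = [5, 4, 4, 5, 3, 4, 5, 4]    # hypothetical sample
constant = [4] * len(scores)          # the "comparison column" of all 4s

n = len(scores)

# One-sample t against the test value 4
t_one = (mean(scores) - 4) / math.sqrt(variance(scores) / n)

# Welch two-sample t against the constant column; variance(constant) is 0,
# so the second term under the square root vanishes
t_welch = (mean(scores) - mean(constant)) / math.sqrt(
    variance(scores) / n + variance(constant) / n
)

print(t_one, t_welch)   # identical
```

The two statistics agree exactly, which is why the trick of filling a column with the test value gives a valid one-sample result.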
Okay, so there may be a number of questions about t-tests, so we're going to hold off on those, if you can. And I'm going to pass it over to Stan to discuss and present information on ANOVAs. 

Thanks, Erin. So we're using a slightly different dataset here. We're using an old experimental evaluation of drug testing and treatment interventions for probationers, with multiple levels of the independent variable. So, what do we do when we have more than two levels of the independent variable? Well, we could use multiple t-tests, with all possible combinations being tested. But the problem with that is that the estimated alpha, or p-value, is going to be inflated because of doing these multiple tests. And the estimate is shown right there. So, for example, if you recall, as Erin mentioned... we're looking at p-values that fall into the tails of this normal distribution, with the 95% falling in the middle, and then 2 1/2 percent in either tail. So, if we want to set a probability level of .05, and we're actually doing those three comparisons from the three levels of the independent variable, the overall alpha value is actually inflated to .14, rather than .05. So, we need some way of holding that value down to somewhere reasonable, where we want it to be. And that's where one way analysis of variance comes in. So, one way ANOVAs test whether each treatment group mean is significantly different from the overall mean. Another way of saying that is testing whether or not there are significant differences among the group means. And again, we're controlling for the overall alpha level. So the concept here is partitioning variability. The observed variability in the data is divided out into two components: the between-group component, which is due to the treatment effects... the experimental effects... and the within-group component, which is due to individual variation, measurement error, sampling error, and so forth and so on. So ideally, the between-group variability should be large relative to the within-group variability. And, in theory, the within-group variability should be near zero. 
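The .14 figure comes from the familywise error formula 1 − (1 − α)^m, where m is the number of comparisons, under the simplifying assumption that the tests are independent. A quick check for three pairwise tests among three groups:

```python
alpha = 0.05   # per-test Type I error rate
m = 3          # number of pairwise comparisons among three groups

# Probability of at least one false positive across the m tests,
# assuming the tests are independent
familywise = 1 - (1 - alpha) ** m

print(round(familywise, 3))   # 0.143, i.e. the roughly .14 quoted above
```

With four groups (six pairwise tests, as in the Maricopa example later), the same formula inflates the familywise rate to about .26, which is why controlling the overall alpha matters even more there.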
But in reality, that rarely happens. So, the other concept that we need here is the concept of sums of squares, which is the sum of squared differences around the means. 

So we have a sums of squares sub b, for between, or "t" for treatment, and a sums of squares within, or a sums of squares for error, yes. So, this is a very simple little illustration here. Imagine we're looking at probation procedures and looking at outcomes for domestic violence offenders. So we're going to look at three different types of probation... regular probation, versus an intensive supervision probation, versus an intensive supervision probation with probation officers trained in some batterer treatment regimen. And the dependent variable will be victims' ratings of change in offenders' behavior, with higher ratings indicating more positive change. And imagine 24 subjects randomly assigned to those three treatment conditions. The data would look something like that. So you can see, we've got the annual [inaudible 00:50:59] if you look at the right-hand column at the bottom... The total n is 24... that's 24 subjects, eight in each of three groups. k, which is the number of groups... that's three. For each of these, I've given you the sum of [inaudible 00:51:15]... these are the terms that we'll need for the formula. The sum of the x's, which is the sum of the values in each of the three columns... that's the 37, 38, 62... the sum of the x-squares, and then the average for each of those columns. The sum of the x-squares is the sum of each x, squared. That's why, in the example in the top left, 4 squared is 16... so if you square each of those eight entries and then add them up, you'll get 199. And then finally, G, which is the grand mean... the average of all of the 24 scores... which is 5.71. So, I just wanted to point out that one of the things you can do here, when you're looking at this, is kind of an eyeball test. If you take a look at those group means, and think about what they might mean, you can see that the "ISP + Batterer" treatment mean is a lot higher than either the ISP mean or the traditional probation mean. 
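The slide's column summaries are enough to reproduce part of the arithmetic in Python. This sketch uses only the values quoted above (group sums 37, 38, 62 with n = 8 each, and column a's sum of x-squares, 199); the within sums of squares for columns b and c would need their own sums of x-squares, which aren't reproduced in the transcript:

```python
# Column summaries from the slide: sum of x per group, n = 8 per group
group_sums = [37, 38, 62]
n = 8
N = n * len(group_sums)                       # 24 subjects total

group_means = [s / n for s in group_sums]     # 4.625, 4.75, 7.75
grand_mean = sum(group_sums) / N              # 5.708..., the 5.71 on the slide

# Between-groups SS: n times the squared deviation of each group mean
# from the grand mean, summed over groups
ss_between = sum(n * (m - grand_mean) ** 2 for m in group_means)

# Within SS for column a alone, from its sum of x-squares (199) and its sum (37)
ss_within_a = 199 - 37 ** 2 / n

print(round(ss_between, 2), ss_within_a)
```

Notice how the between-groups term is driven almost entirely by the 7.75 mean for the "ISP + Batterer" group, which is exactly what the eyeball test suggests.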
So just by eyeballing it, you can get a sense that maybe there's not a whole lot of difference between a and b, but there is a pretty big difference between a and c, and between b and c. So what we're essentially going to do with the ANOVA is test the statistical significance of those apparent differences. 

So this is what this looks like... this is just the top part, reproducing those numbers from the previous table. There's the formula for sums of squares within, for the "a" column. You can see how that works, and then that is repeated for b and for c... for each of the treatment groups... and then all of those are summed together to get the error... the sums of squares for error. The sums of squares between is conceptualized, as I said earlier, as one of two different things. Either the difference between each group mean and the grand mean, and you can see how that would be calculated there, according to the formula. Or, as the difference between each group mean and the other group means. So, taking each pair of group means... a minus b, a minus c, and b minus c... and working that all out through that formula. If you do all of that math, you end up with these values. So, a sums of squares between, a sums of squares within, and a sums of squares total, which is the sum of those two sums of squares. Degrees of freedom for the sums of squares between is k minus one, the number of groups minus one, so that's two in this case. For the sums of squares within, degrees of freedom is total minus between, which is 21 in this case, and total is n minus one. So then, what we're going to do is calculate an F statistic. In order to do that, we're going to first calculate a mean square, which is the sums of squares for each of those... the between and the within... divided by the degrees of freedom for each. That gives you a mean square. And then the F statistic is the mean square for the treatment, divided by the mean square for the error... with degrees of freedom two and 21: the degrees of freedom between, and the degrees of freedom within. So, back to the example... the dataset that we had up before. Here's a summary of what we've got from that Maricopa County data. So this is four treatment conditions. 
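That last step... sums of squares to mean squares to F... can be written out directly. The sums of squares below are hypothetical placeholders (the slide's actual totals aren't reproduced in the transcript); the degrees of freedom match the example, k − 1 = 2 and N − k = 21:

```python
# Hypothetical sums of squares, for illustration only
ss_between = 50.0
ss_within = 63.0

df_between = 2    # k - 1, with k = 3 groups
df_within = 21    # N - k, with N = 24 subjects

ms_between = ss_between / df_between   # mean square for treatment
ms_within = ss_within / df_within      # mean square for error

f_stat = ms_between / ms_within        # F with (2, 21) degrees of freedom

print(f_stat)
```

The resulting F would then be compared against an F distribution with (2, 21) degrees of freedom to get the p-value.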
It's a standard probation condition with no drug testing, a standard probation condition with random monthly drug tests... a standard probation condition with scheduled drug tests twice a week, and then drug court. And the dependent variable that I created here is the number of face-to-face contacts, which includes contacts at the office, work, school, home visits, in the community, and other, in the first six months of treatment. So, here's how you would run this in SPSS... it looks very similar to what you just saw... Under "Analyze," you see, under "Compare Means," 

we go down to the bottom, and you'll see the one way ANOVA alternative there. The procedure has you specify a dependent variable list, which in this case is "contacts," and what's called a "factor," and in this case, the factor is called "Track," and those are the four probation treatment conditions. You'll see that, next to the dependent list, you have three options there... "Contrasts," "Post Hoc," and "Options." In this example, I clicked on "Options" and then clicked on "Descriptive." You will need that in order to get the actual values of the means for each of the treatment groups. So, here's the output that results from doing that... You can see the means in that column... and then, the ANOVA table below. So you have the between sums of squares, the within sums of squares, the mean square, which is the sums of squares divided by degrees of freedom, the F, which is the ratio of the two mean squares, and then the significance value. So, that F value obviously is very large, so there's a statistically significant difference. What you don't know from this is what's different from what. You just know that you've controlled for that overall alpha level, and that something is going on. So one or more of the means of these groups are different from the grand mean, or one or more of these groups are different from each other. So, how do you determine what's different from what? Well, the answer is, you run one or more of what are known as post hoc significance tests... post hoc meaning "after the fact." These multiple comparison tests, as they're called, are used to determine which treatment groups differ from each other. They're used when the overall F test is significant, and all of them work in basically one of two ways. They either compare pairs of means, controlling for the overall alpha level... those are known as pairwise comparisons... or they identify homogeneous subsets among the group means. 
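Putting the pieces together, the whole one-way ANOVA that SPSS runs behind that dialog can be sketched as a small pure-Python function; the three groups fed to it here are made up for illustration:

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N

    # Partition the variability: between-groups and within-groups sums of squares
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

    df_between, df_within = k - 1, N - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data for three treatment conditions
f, db, dw = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f, db, dw)
```

A large F here, just as in the SPSS table, says only that something differs; pinning down which groups differ is the job of the post hoc tests described next.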
And there are roughly, I think, 17 or so of these tests available in SPSS, and it's beyond our ability to go into the details of all of them. As a matter of fact, I would point out it's now 2:58, so we're going to go a little bit over our allotted hour, for those of you who want to hang in there with us. 

So, before we do that, though, let's go back to my point earlier, which is trying to eyeball the means themselves, and see if we can take our best guess as to what is significantly different, or might be significantly different, from what. So if you look at this set of four means, here's what it looks like to me. It looks like... it looks like my line didn't go high enough. It looks like regular probation and random testing are pretty close to each other, and maybe not terribly different from each other. Both of those appear to be lower than scheduled testing, which has the highest number of contacts, which you would certainly expect, given that the scheduled testing includes the drug tests. And then, the drug court number of face-to-face contacts seems to be lower than... certainly than the scheduled testing, and than the regular probation and random testing combined. Before we move on, I also wanted to point out that the assumptions for the ANOVA are exactly the same as the assumptions that Erin mentioned for the t-test. And if you look at that column of standard deviations, you can see that those assumptions have been wildly violated in this example, particularly for the scheduled testing, where the standard deviation, which is the square root of the variance, is much, much higher than in any of the other groups, and the drug court one is lower than the others. And I suspect that's due to the fact that folks didn't always show up... or not all folks showed up for all scheduled testings. So we will talk a little bit about the assumptions there. So, here's how you generate those post hoc tests. There's an option, which you can see by the red arrow, for using those, and up pops the screen of all of the many post hoc tests that are available to you. So, I want to just look at, for the moment, Tukey's test, which you see there, because I want to show you what the homogeneous groups output looks like. 
So, when you specify that test, this is what you get. So this looks very similar to our eyeball test. You'll see that the drug court sits in its own column. The scheduled testing sits in its own column, and then the regular probation and the random testing sit together in the same column. 

So, in terms of homogeneity, the random testing and the regular probation seem to be part of the same homogeneous subset, whereas the other two seem to be part of their own subsets. So that's what the homogeneity tests look like. This is what the pairwise tests would look like. We're going to look at two here... we're going to look at the Tukey honestly significant difference test, and the Tamhane T2 test, so you can get an idea of the effect of the violation of the assumption of equal variances. So this is the output that's generated from those two procedures, the Tukey at the top, and the Tamhane at the bottom. These are the significance values... these are repetitive, so the ones that are circled, those five values, are the ones we're concerned about. And as you can see, there is no significant difference between the regular probation and the random testing, and then every other pairwise test is significantly different. The Tamhane test, which doesn't assume equal variances, shows a similar result... different values, but a similar result. I'll give you a moment to read this. This is how you might go about reporting the results from those analyses... Before I go to the Excel example, I just want to... well, never mind. I will go to the Excel example. Okay. So, you can do this in Excel. There's your data analysis component that Erin talked about. If you go in there and look... you can see that we've highlighted the comparison there for you. If you click on... oh, before we go... the Excel example is a tad cumbersome, because you have to have the data organized either by rows or by columns. So if you're comparing those four groups, then you have to have the four groups... and in this case, the way it makes most sense is to have each treatment group in a different column. So you have the regular probation, random testing, scheduled, and drug court... 
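Tukey's HSD needs studentized-range critical values, which aren't in the Python standard library, but the q statistic it compares against those values is simple for balanced groups: the absolute difference between two group means, divided by the square root of MS-within over n. A sketch with made-up groups (you would look each q up against a studentized-range critical value for k groups and N − k error degrees of freedom):

```python
import math
from itertools import combinations
from statistics import mean

# Hypothetical balanced groups (n = 3 observations each)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
n = 3
k = len(groups)
N = n * k

# Within-groups mean square from the usual ANOVA partition
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
ms_within = ss_within / (N - k)

# Tukey q statistic for every pair of group means;
# pairs whose q exceeds the critical value differ significantly
q_stats = {
    (i, j): abs(mean(groups[i]) - mean(groups[j])) / math.sqrt(ms_within / n)
    for i, j in combinations(range(k), 2)
}
print(q_stats)
```

The pattern mirrors the homogeneous-subsets logic: pairs with small q (here, the first two groups) fall into the same subset, while pairs with large q get their own.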
each in its own column, and then the observations are in the rows. So the data have to be organized either that way, or the other way, in order for the test to work, which can take some doing, depending on how your data are set up. 