The Single-Sample t Test and the Paired-Samples t Test


CHAPTER 9
The Single-Sample t Test and the Paired-Samples t Test

BEFORE YOU GO ON
- You should know the six steps of hypothesis testing (Chapter 7).
- You should know how to determine a confidence interval for a z statistic (Chapter 8).
- You should understand the concept of effect size and know how to calculate Cohen's d for a z test (Chapter 8).

CHAPTER OUTLINE
The t Distributions
  Estimating Population Standard Deviation from the Sample
  Calculating Standard Error for the t Statistic
  Using Standard Error to Calculate the t Statistic
The Single-Sample t Test
  The t Table and Degrees of Freedom
  The Six Steps of the Single-Sample t Test
  Calculating a Confidence Interval for a Single-Sample t Test
  Calculating Effect Size for a Single-Sample t Test
The Paired-Samples t Test
  Distributions of Mean Differences
  The Six Steps of the Paired-Samples t Test
  Calculating a Confidence Interval for a Paired-Samples t Test
  Calculating Effect Size for a Paired-Samples t Test

Holiday Weight Gain and Two-Group Studies. Two-group studies indicate that the average holiday weight gain by college students is less than many people believe: only about 1 pound.

MASTERING THE CONCEPT 9-1: There are three types of t tests. We use a single-sample t test when we are comparing a sample mean to a population mean but do not know the population standard deviation. We use a paired-samples t test when we are comparing two samples and every participant is in both samples (a within-groups design). We use an independent-samples t test when we are comparing two samples and every participant is in only one sample (a between-groups design).

Upon arriving at college, many undergraduate students are faced with what might seem like incessant warnings about weight gain: the dreaded "freshman 15," the waistline expansions resulting from spring break excess, and the pounds put on over the annual winter holidays. In North America, as in many other parts of the world, the winter holiday season is a time when family food traditions take center stage. These holiday foods usually are readily available, beautifully presented, and high in calories. Popular wisdom suggests that many Americans add 5 to 7 pounds to their body weight over the holiday season. But before/after studies suggest a far more modest increase: a weight gain of just over 1 pound (Hull, Radley, Dinger, & Fields, 2006; Roberts & Mayer, 2000; Yanovski et al., 2000). A 1-pound weight gain over the holidays might not seem so bad, but weight gained over the holidays tends to stay (Yanovski et al., 2000). The data provide other insights about holiday weight gain. For example, female students at the University of Oklahoma gained a little less than 1 pound, male students a little more than 1 pound, and students who were already overweight gained an average of 2.2 pounds (Hull et al., 2006).
The fact that researchers used two groups in their study (students before the holidays and students after the holidays) is significant for this chapter. The versatility of the t distributions allows us to compare two groups: we can compare one sample to a population when we don't know all of the details about the parameters, and we can compare two samples to each other. There are two ways to compare two samples: we can use a within-groups design (as when the same people are weighed before and after the holidays) or a between-groups design (as when different people are in the preholiday sample and the postholiday sample). Whether we use a within-groups design or a between-groups design to collect the data for two groups, we use a t test. For a within-groups design, we use a paired-samples t test. Because the steps for a paired-samples t test are similar to those for a single-sample t test, we learn about these two tests in this chapter. For a between-groups design, we use an independent-samples t test. Because the calculations for an independent-samples t test are a bit different from those for the first two types of t tests, we learn about that test in Chapter 10.

The t Distributions

When we compare the average weight of a sample of people before the holidays to the average weight of a sample of people after the holidays, we are concerned about whether the samples are fair representations of the larger populations. The t test, based on the t distributions, tells us how confident we can be that what we have learned from our samples generalizes to the larger populations.
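The decision rules just described can be sketched as a small helper function. This is a hypothetical illustration, not part of any statistics library; the function name and parameters are invented for this sketch.

```python
def choose_t_test(num_samples: int, within_groups: bool = False) -> str:
    """Pick a t test using the decision rules described above.

    num_samples: 1 when comparing one sample mean to a known population
    mean (population standard deviation unknown); 2 when comparing two
    samples. within_groups: True when every participant is in both
    samples (e.g., the same students weighed before and after).
    """
    if num_samples == 1:
        return "single-sample t test"
    if within_groups:
        return "paired-samples t test"    # within-groups design
    return "independent-samples t test"   # between-groups design

# A holiday-weight study that weighs the same people before and after:
print(choose_t_test(2, within_groups=True))  # paired-samples t test
```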

The t distributions are used when we don't have enough information to use the z distribution. Specifically, we have to use a t distribution when we don't know the population standard deviation or when we compare two samples to one another. As Figure 9-1 demonstrates, there are many t distributions, one for each possible sample size. As the sample size gets smaller, we become less certain about what the population distribution really looks like, and the t distributions become flatter and more spread out. However, as the sample size gets bigger, the t distributions begin to merge with the z distribution, because we gain confidence as more and more participants are added to our study.

FIGURE 9-1: The Wider and Flatter t Distributions. [The figure overlays the standard normal z distribution with t distributions for 30, 8, and 2 individuals.] For smaller samples, the t distributions are wider and flatter than the z distribution. As the sample size increases, however, the t distributions approach the shape of the z distribution. In this figure, the t distribution most similar to the z distribution is that for a sample of approximately 30 individuals. This makes sense because a distribution derived from a larger sample size would be more likely to be similar to that of the entire population than one derived from a smaller sample size.

MASTERING THE CONCEPT 9-2: We use a t distribution instead of a z distribution when sampling requires us to estimate the population standard deviation from the sample standard deviation, or when we compare two samples to one another.

Estimating Population Standard Deviation from the Sample

Before we can conduct a t test, we have to estimate the standard deviation. To do this, we use the standard deviation of the sample data to estimate the standard deviation of the entire population. Estimating the standard deviation is the only practical difference between conducting a z test with the z distribution and conducting a t test with a t distribution. Here is the standard deviation formula that we have used up until now with a sample:

SD = √(Σ(X − M)² / N)

We need to make a correction to this formula to account for the fact that there is likely to be some level of error when we're estimating the population standard deviation from a sample. Specifically, any given sample is likely to have somewhat less spread than does the entire population. One tiny alteration of this formula leads to the slightly larger standard deviation of the population that we estimate from the standard deviation of the sample. Instead of dividing by N, we divide by (N − 1) to get the mean of the squared deviations. Subtraction is the key. Dividing by a slightly smaller number, (N − 1), instead of by N increases the value of the result. For example, if the sum of squared deviations was 90 and N was 10, dividing by N gives 9; dividing by (N − 1) = (10 − 1) = 9 gives 10, a slightly larger value. So the formula for estimating the standard deviation of the population from the standard deviation of the sample is:

MASTERING THE FORMULA: s = √(Σ(X − M)² / (N − 1))
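The effect of the N versus (N − 1) denominator can be checked numerically. Below is a minimal sketch using only Python's standard library; the data are the fictional interruption times used later in this chapter, and the function names are invented for the illustration.

```python
import math

def stdev_population_formula(scores):
    """SD = sqrt(sum((X - M)^2) / N): describes the sample itself."""
    m = sum(scores) / len(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

def stdev_corrected(scores):
    """s = sqrt(sum((X - M)^2) / (N - 1)): estimates the population."""
    m = sum(scores) / len(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / (len(scores) - 1))

# Dividing by the smaller (N - 1) always yields a slightly larger value,
# mirroring the 90/10 = 9 versus 90/9 = 10 illustration in the text
# (those figures are the mean squared deviations, before the square root).
scores = [8, 12, 16, 12, 14]
print(stdev_population_formula(scores))  # ~2.653 (divides by N = 5)
print(stdev_corrected(scores))           # ~2.966 (divides by N - 1 = 4)
```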

Notice that we call this standard deviation s instead of SD. It still uses a Latin rather than a Greek letter because it is a statistic (from a sample) rather than a parameter (from a population). From now on, we will calculate the standard deviation in this way (because we will be using the sample standard deviation to estimate the population standard deviation), and we will be calling our standard deviation s. This formula marks an important step in conducting a t test.

Let's apply the new formula for standard deviation to an everyday situation that many of us can relate to: multitasking. Researchers conducted a study in which employees were observed at one of two high-tech companies for over 1000 hours (Mark, Gonzalez, & Harris, 2005). The employees spent just 11 minutes, on average, on one project before an interruption. Moreover, after each interruption, they needed an average of 25 minutes to get back to the original project! So even though a person who is busy multitasking appears to be productive, maybe the underlying reality is that multitasking actually reduces overall productivity.

Multitasking. If multitasking reduces productivity in a sample, we can statistically determine the likelihood that multitasking reduces productivity among a much larger population.

How can we use a t test to determine the effects of multitasking on productivity? Suppose you were a manager at one of these firms and decided to reserve a period from 1:00 to 3:00 each afternoon during which employees could not interrupt one another, although they might still be interrupted by phone calls or e-mails from people outside the company. To test your intervention, you observe five employees during these periods and develop a score for each: the time he or she spent on a selected task before being interrupted. Here are your fictional data: 8, 12, 16, 12, and 14 minutes.

In this case, we are treating 11 minutes as the population mean, but we do not know the population standard deviation. As a key step in conducting a t test, we need to estimate the standard deviation of the population from the sample.

EXAMPLE 9.1

To calculate our estimated standard deviation for the population, there are two steps.

STEP 1. Calculate the sample mean. Even though we are given a population mean (i.e., 11), we use the sample mean to calculate the corrected standard deviation for the sample. The mean for these 5 sample scores is:

M = (8 + 12 + 16 + 12 + 14) / 5 = 12.4

STEP 2. Use this sample mean in the corrected formula for the standard deviation:

s = √(Σ(X − M)² / (N − 1))

Remember, the easiest way to calculate the numerator under the square root sign is by first organizing our data into columns, as shown here:

X     X − M    (X − M)²
8     −4.4     19.36
12    −0.4     0.16
16    3.6      12.96
12    −0.4     0.16
14    1.6      2.56

Thus, the numerator is:

Σ(X − M)² = 19.36 + 0.16 + 12.96 + 0.16 + 2.56 = 35.2

And given a sample size of 5, the corrected standard deviation is:

s = √(Σ(X − M)² / (N − 1)) = √(35.2 / (5 − 1)) = √8.8 = 2.97

A Simple Correction: N − 1. When estimating variability, subtracting one person from a sample of four makes a big difference. Subtracting one person from a sample of thousands makes only a small difference.

Calculating Standard Error for the t Statistic

After we make the correction, we have an estimate of the standard deviation of the distribution of scores, but not an estimate of the spread of a distribution of means, the standard error. As we did with the z distribution, we need to make our spread smaller to reflect the fact that a distribution of means is less variable than a distribution of scores. We do this in exactly the same way that we adjusted for the z distribution: we divide s by √N. The formula for the standard error as estimated from a sample, therefore, is:

MASTERING THE FORMULA: s_M = s / √N

Notice that we have replaced σ with s because we are using the corrected standard deviation from the sample rather than the actual standard deviation from the population.

EXAMPLE 9.2

Here's how we would convert our corrected standard deviation of 2.97 (from the data above on minutes before an interruption) to a standard error. Our sample size was 5, so we divide by the square root of 5:

s_M = s / √N = 2.97 / √5 = 1.33

So the appropriate standard deviation for the distribution of means (that is, its standard error) is 1.33. Just as the central limit theorem predicts, the standard error for the distribution of sample means is smaller than the standard deviation of sample scores (1.33 < 2.97). (Note: This step leads to one of the most common mistakes that we see among our students. Because we have implemented a correction when calculating s, students want to implement an extra correction here by dividing by √(N − 1). Do not do this! We still divide by √N in this step. We are making our standard deviation smaller to reflect the size of the sample; there is no need for a further correction to the standard error.)

Using Standard Error to Calculate the t Statistic

Once we know how to estimate the population standard deviation from our sample and then use that estimate to calculate standard error, we have all the tools necessary to conduct a t test. The simplest type of t test is the single-sample t test. We introduce the formula for that t statistic here, and in the next section we go through all six steps for a single-sample t test. The formula to calculate the t statistic for a single-sample t test is identical to that for the z statistic, except that it uses the estimated standard error rather than the actual standard error of the population of means. So the t statistic indicates the distance of a sample mean from a population mean in terms of the standard error.
That distance is expressed numerically as the estimated number of standard errors between the two means. Here is the formula for the t statistic for a distribution of means:

MASTERING THE FORMULA 9-3: The formula for the t statistic is: t = (M − μ_M) / s_M. It only differs from the formula for the z statistic in that we use s_M instead of σ_M, because we're using the sample to estimate the standard error rather than using the actual population standard error.

Note that the denominator is the only difference between this formula for the t statistic and the formula used to compute the z statistic for a sample mean. The corrected denominator makes the t statistic smaller and thereby reduces the probability of observing an extreme t statistic. That is, a t statistic is not as extreme as a z statistic; in scientific terms, it's more conservative.

EXAMPLE 9.3

The t statistic for our sample of 5 scores representing minutes until interruptions is:

t = (M − μ_M) / s_M = (12.4 − 11) / 1.33 = 1.05

As part of the six steps of hypothesis testing, this t statistic, 1.05, can help us make an inference about whether the communication ban from 1:00 to 3:00 affected the average number of minutes until an interruption.
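Examples 9.1 through 9.3 can be verified end to end with a short script (a sketch using only Python's standard library). One caveat: carrying full precision gives t = 1.06; the text's 1.05 results from dividing by the standard error after rounding it to 1.33.

```python
import math

# Fictional data from Examples 9.1-9.3: minutes on task before an
# interruption, compared to the population mean of 11 minutes.
scores = [8, 12, 16, 12, 14]
mu = 11
n = len(scores)

m = sum(scores) / n                                          # sample mean
s = math.sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))   # corrected s
s_m = s / math.sqrt(n)                                       # standard error
t = (m - mu) / s_m                                           # t statistic

print(round(m, 2), round(s, 2), round(s_m, 2), round(t, 2))
# 12.4 2.97 1.33 1.06
```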

As with the z distribution, statisticians have developed t tables that include probabilities under any area of the t curve. We provide you with a t table for many different sample sizes in Appendix B. The t table includes only the percentages of most interest to researchers: those indicating the extreme scores that suggest large differences between groups.

CHECK YOUR LEARNING

Reviewing the Concepts
> The t distributions are used when we do not know the population standard deviation or are comparing only two groups.
> The two groups may be a sample and a population, or the two groups may be two samples as part of a within-groups design or a between-groups design.
> Because we do not know the population standard deviation, we must estimate it, and estimating invites the possibility of more error.
> The formula for the t statistic for a single-sample t test is the same as the formula for the z statistic for a distribution of means, except that we use the estimated standard error in the denominator rather than the actual standard error for the population.

Clarifying the Concepts
9-1 What is the t statistic?
9-2 Briefly describe the three different t tests.

Calculating the Statistics
9-3 Calculate the standard deviation for a sample (SD) and as an estimate of the population (s) using the following data: 6, 3, 7, 6, 4, 5.
9-4 Calculate the standard error for t for the data given in Check Your Learning 9-3.

Applying the Statistics
9-5 In our discussion of a study on multitasking (Mark et al., 2005), we imagined a follow-up study in which five employees were observed following a communication ban from 1:00 to 3:00. For each of the five employees, one task was selected. Let's now examine the time until work on that task was resumed. The fictional data for the 5 employees were 20, 19, 27, 24, and 18 minutes until work on the given task was resumed. Remember that the original research showed it took 25 minutes, on average, for an employee to return to a task after being interrupted.
a. What distribution will be used in this situation? Explain your answer.
b. Determine the appropriate mean and standard deviation (or standard error) for this distribution. Show all your work; use symbolic notation and formulas where appropriate.
c. Calculate the t statistic.

Solutions to these Check Your Learning questions can be found in Appendix D.

The Single-Sample t Test

A before/after comparison of weight change over the holidays is only one of the interesting comparisons we can make using the t statistic. There might also be regional differences in how much people weigh. For example, the t statistic can be used to compare the average weight from a sample of people in a particular region with the national average. To answer that kind of question, we now demonstrate how to conduct a single-sample t test that uses a distribution of means. A single-sample t test is a hypothesis test in which we compare data from one sample to a population for which we know the mean but not the standard deviation. (Recall that the t statistic indicates the distance of a sample mean from a population mean in terms of the standard error.) The only thing we

need to know to use a single-sample t test is the population mean. We begin with the single-sample t test because understanding it will help us when using the more sophisticated t tests that let us compare two samples.

The t Table and Degrees of Freedom

When we use the t distributions, we use the t table. There are different t distributions for every sample size, so we must take sample size into account when using the t table. However, we do not look up our actual sample size on the table. Rather, we look up degrees of freedom: the number of scores that are free to vary when estimating a population parameter from a sample. The phrase "free to vary" refers to the number of scores that can take on different values if we know a given parameter.

MASTERING THE FORMULA 9-4: The formula for degrees of freedom for a single-sample t test is: df = N − 1. To calculate degrees of freedom, we subtract 1 from the sample size.

EXAMPLE 9.4

For example, the manager of a baseball team needs to assign nine players to particular spots in the batting order but only has to make eight decisions (N − 1). Why? Because only one option remains after making the first eight decisions. So before the manager makes any decisions, there are N − 1 = 9 − 1 = 8 degrees of freedom. After the first decision, there are N − 1 = 8 − 1 = 7 degrees of freedom, and so on.

As in the baseball example, there is always one score that cannot vary once all of the others have been determined. For example, if we know that the mean of four scores is 6 and we know that three of the scores are 2, 4, and 8, then the last score must be 10. So the degrees of freedom is the number of scores in the sample minus 1; there is always one score that cannot vary. Degrees of freedom is written in symbolic notation as df, which is always italicized. The formula for degrees of freedom for a single-sample t test, therefore, is:

df = N − 1

This is one key piece of information to keep in mind as we work with the t table. In the behavioral sciences, the degrees of freedom usually correspond to how many people are in the study or how many observations we make. Table 9-1 is an excerpt from a t table; an expanded table is included in Appendix B. Consider the relation between degrees of freedom and the cutoff, or critical value, needed to declare statistical significance. In the column corresponding to a one-tailed test at a p level of 0.05 with only 1 degree of freedom, the critical t value is 6.314.

TABLE 9-1. Excerpt from the t Table. When conducting hypothesis testing, we use the t table to determine critical values for a given p level, based on the degrees of freedom and whether the test is one- or two-tailed. (Degrees of freedom is the number of scores that are free to vary when estimating a population parameter from a sample.)

       One-Tailed Tests           Two-Tailed Tests
df     0.10    0.05    0.01      0.10    0.05     0.01
1      3.078   6.314   31.821    6.314   12.706   63.657
2      1.886   2.920   6.965     2.920   4.303    9.925
3      1.638   2.353   4.541     2.353   3.182    5.841
4      1.533   2.132   3.747     2.132   2.776    4.604
5      1.476   2.015   3.365     2.015   2.571    4.032

With only 1 degree of freedom, the two means have to be extremely far apart and the standard deviation has to be very small in order to declare that a statistically significant difference exists. But with 2 degrees of freedom, the critical t value drops to 2.920. With 2 degrees of freedom, the two means don't have to be quite so far apart or the standard deviation so small; that is, it is easier to reach the critical value of 2.920 needed to declare that there is a statistically significant difference. We're more confident with two observations than with just one. Now notice what happens when we add another observation, moving from 2 to 3 degrees of freedom: the critical t value needed to declare statistical significance once again decreases, from 2.920 to 2.353. Our level of confidence in our observation increases with each additional observation; at the same time, the critical value decreases, becoming closer and closer to the related cutoff on the z distribution.

The t distributions become closer to the z distribution as sample size increases. When the sample size is large enough, the standard deviation of a sample is more likely to be equal to the standard deviation of the population. At large enough sample sizes, in fact, the t distribution is identical to the z distribution. Most t tables include a sample size of infinity (∞) to indicate a very large sample size (a sample size of infinity itself is, of course, impossible). The t statistics at extreme percentages for very large sample sizes are identical to the z statistics at the very same percentages. Check it out for yourself by comparing the z and t tables in Appendix B. For example, the z statistic for the 95th percentile (a percentage between the mean and the z statistic of 45%) is between 1.64 and 1.65; at a sample size of infinity, the t statistic for the 95th percentile is 1.645.
MASTERING THE CONCEPT 9-3: As sample size increases, the t distributions more and more closely approximate the z distribution. You can think of the z statistic as a single-blade Swiss Army knife and the t statistic as a multiblade Swiss Army knife that includes the single blade that is the z statistic.

Let's remind ourselves why the t statistic merges with the z statistic as sample size increases. The underlying principle is easy to understand: more observations lead to greater confidence. Thus, more participants in a study (if they are a representative sample) correspond to increased confidence that we are making an accurate observation. So don't think of the t distributions as completely separate from the z distribution. Rather, think of the z statistic as a single-blade Swiss Army knife and the t statistic as a multiblade Swiss Army knife that still includes the single blade that is the z statistic.

Let's determine the cutoffs, or critical t value(s), for two research studies. For the first study, you may use the excerpt in Table 9-1. The second study requires the full t table in Appendix B.

EXAMPLE 9.5

The study: A researcher collects Stroop reaction times for five participants who have had reduced sleep for three nights. She wants to compare this sample to the known population mean. Her research hypothesis is that the lack of sleep will slow participants down, leading to an increased reaction time. She will use a p level of 0.05 to determine her critical value.

The cutoff(s): This is a one-tailed test, because the research hypothesis posits a change in only one direction: an increase in reaction time. There will be only a positive critical t value, because we are hypothesizing an increase. There are five participants, so the degrees of freedom is:

df = N − 1 = 5 − 1 = 4

Her stated p level is 0.05. When we look in the t table under one-tailed tests, in the column labeled 0.05 and in the row for a df of 4, we see a critical value of 2.132. This is our critical t value.
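The table lookup in Example 9.5 can be sketched in code. The dictionary below simply transcribes the Table 9-1 excerpt; in practice you would consult the full table in Appendix B, or a function such as scipy.stats.t.ppf. The function name here is invented for the illustration.

```python
# One-tailed critical t values from the Table 9-1 excerpt; keys are
# degrees of freedom, values are the cutoffs at p = .10, .05, and .01.
ONE_TAILED = {
    1: (3.078, 6.314, 31.821),
    2: (1.886, 2.920, 6.965),
    3: (1.638, 2.353, 4.541),
    4: (1.533, 2.132, 3.747),
    5: (1.476, 2.015, 3.365),
}
P_COLUMN = {0.10: 0, 0.05: 1, 0.01: 2}

def critical_t_one_tailed(n: int, p: float) -> float:
    df = n - 1  # degrees of freedom for a single-sample t test
    return ONE_TAILED[df][P_COLUMN[p]]

# Example 9.5: five participants, one-tailed test at a p level of 0.05:
print(critical_t_one_tailed(5, 0.05))  # 2.132
```

Notice in the dictionary that each critical value shrinks as df grows, which is the pattern the text describes.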

EXAMPLE 9.6

The study: A researcher knows the mean number of calories a rat will consume in half an hour if unlimited food is available. He wonders whether a new food will lead rats to consume a different number of calories, either more or fewer. He studies 38 rats and uses a conservative critical value based on a p level of 0.01.

The cutoff(s): This is a two-tailed test, because the research hypothesis allows for change in either direction. There will be both negative and positive critical t values. There are 38 rats, so the degrees of freedom is:

df = N − 1 = 38 − 1 = 37

His stated p level is 0.01. We want to look in the t table under two-tailed tests, in the column for 0.01 and in the row for a df of 37; however, there is no df of 37. In this case, we err on the side of being more conservative and choose the more extreme (i.e., larger) of the two possible critical t values, which always corresponds to the smaller df. Here, we look next to 35, where we see a value of 2.724. Because this is a two-tailed test, we will have critical values of −2.724 and 2.724. Be sure to list both values.

The Six Steps of the Single-Sample t Test

Now we have all the tools necessary to conduct a single-sample t test. So let's consider a hypothetical study and conduct all six steps of hypothesis testing.

EXAMPLE 9.7

Chapter 4 presented data that included the mean number of sessions attended by clients at a university counseling center. We noted that one study reported a mean of 4.6 sessions (Hatchett, 2003). Let's imagine that the counseling center hoped to increase participation rates by having students sign a contract to attend at least 10 sessions. Five students sign the contract, and these students attend 6, 6, 12, 7, and 8 sessions, respectively. The researchers are interested only in their university, so they treat the mean of 4.6 sessions as a population mean.
Zigy Kaluzny/Getty Images

Nonparticipation in Therapy Clients missing appointments can be a problem for their therapists. A t test can compare the consequences between those who do and those who do not commit themselves to participating in therapy for a set period.

STEP 1. Identify the populations, distribution, and assumptions.

Population 1: All clients at this counseling center who sign a contract to attend at least 10 sessions. Population 2: All clients at this counseling center who do not sign a contract to attend at least 10 sessions.

The comparison distribution will be a distribution of means. The hypothesis test will be a single-sample t test because we have only one sample and we know the population mean but not the population standard deviation.

This study meets one of the three assumptions and may meet the other two: (1) The dependent variable is scale. (2) We do not know whether the data were randomly selected, however, so we must be cautious with respect to generalizing to other clients at this university who might sign the contract. (3) We do not know whether the population is normally distributed, and there are not at least 30 participants. However, the data from our sample do not suggest a skewed distribution.

STEP 2. State the null and research hypotheses.

Null hypothesis: Clients at this university who sign a contract to attend at least 10 sessions attend the same number of sessions, on average, as clients who do not sign such a contract: H_0: μ_1 = μ_2.

Research hypothesis: Clients at this university who sign a contract to attend at least 10 sessions attend a different number of sessions, on average, from clients who do not sign such a contract: H_1: μ_1 ≠ μ_2.

STEP 3. Determine the characteristics of the comparison distribution.

μ_M = 4.6; s_M = 1.114

Calculations:

μ_M = μ = 4.6

M = ΣX/N = (6 + 6 + 12 + 7 + 8)/5 = 39/5 = 7.8

X     X − M     (X − M)²
6     −1.8      3.24
6     −1.8      3.24
12     4.2     17.64
7     −0.8      0.64
8      0.2      0.04

The numerator is the sum of squares:

Σ(X − M)² = 3.24 + 3.24 + 17.64 + 0.64 + 0.04 = 24.8

s = √(Σ(X − M)²/(N − 1)) = √(24.8/(5 − 1)) = √6.2 = 2.490

s_M = s/√N = 2.490/√5 = 1.114

STEP 4. Determine the critical values, or cutoffs.

df = N − 1 = 5 − 1 = 4

For a two-tailed test with a p level of 0.05 and a df of 4, the critical values are −2.776 and 2.776 (as seen in the curve in Figure 9-2).

FIGURE 9-2 Determining Cutoffs for a t Distribution As with the z distribution, we typically determine critical values in terms of t statistics rather than means of raw scores so that we can easily compare a test statistic to them to determine whether the test statistic is beyond the cutoffs. Here, the cutoffs are −2.776 and 2.776, and they mark off the most extreme 5%, with 2.5% in each tail.

STEP 5. Calculate the test statistic.

t = (M − μ_M)/s_M = (7.8 − 4.6)/1.114 = 2.873

STEP 6. Make a decision. Reject the null hypothesis; it appears that counseling center clients who sign a contract to attend at least 10 sessions do attend more sessions, on average, than do clients who do not sign such a contract (see Figure 9-3).

FIGURE 9-3 Making a Decision To decide whether to reject the null hypothesis, we compare our test statistic to our critical t values. In this figure, the test statistic, 2.873, is beyond the cutoff of 2.776, so we can reject the null hypothesis.

After completing our hypothesis test, we want to present the primary statistical information in a report. There is a standard American Psychological Association (APA) format for the presentation of statistics across the behavioral sciences so that the results are easily understood by the reader. You'll notice this format in almost every journal article that reports results of a social science study:

1. Write the symbol for the test statistic (e.g., t).
2. Write the degrees of freedom, in parentheses.
3. Write an equal sign and then the value of the test statistic, typically to two decimal places.
4. Write a comma and then the exact p value associated with the test statistic. (Note that we must use software to get the exact p value. For now, we can just say whether our p value is less than the p level of 0.05.)

In our example, then, we reject the null hypothesis and the statistics would read:

t(4) = 2.87, p < 0.05

The statistic typically follows a statement about the finding, after a comma or in parentheses: for example, "It appears that counseling center clients who sign a contract to attend at least 10 sessions, on average, do attend more sessions than do clients who do not sign such a contract."
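The six steps' arithmetic can be sketched in a few lines of Python. This is a minimal sketch of the worked example above, using the five session counts and the population mean of 4.6 from the text; the variable names are our own.

```python
import math

# Single-sample t test for the counseling center example.
sessions = [6, 6, 12, 7, 8]
mu = 4.6                                    # population mean (Hatchett, 2003)

N = len(sessions)
M = sum(sessions) / N                       # sample mean: 7.8
ss = sum((x - M) ** 2 for x in sessions)    # sum of squares: 24.8
s = math.sqrt(ss / (N - 1))                 # estimated population SD: 2.490
s_m = s / math.sqrt(N)                      # standard error: 1.114
t = (M - mu) / s_m                          # test statistic: about 2.87

print(f"t({N - 1}) = {t:.2f}")
```

Because the code carries full precision instead of rounding s_M to 1.114 midway, its t differs from the hand calculation (2.873) only in the third decimal place.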
The report would also include the sample mean and the standard deviation (not the standard error) to two decimal places. The descriptive statistics, typically in parentheses, would read, for our example: (M = 7.80, SD = 2.49). Notice that, by convention, we use SD instead of s to symbolize the standard deviation. As with a z test, we could present p_rep as an alternative to the p value, a change that has been encouraged by the Association for Psychological Science (APS). The t table in Appendix B includes only the p values of .10, .05, and .01, so we cannot use it to determine the actual p value for our test statistic. In the SPSS section of this chapter, we show you

how you can use SPSS to determine the specific p value for the test statistic. Then we can insert the p value into the Excel formula introduced in Chapter 8 to determine p_rep: NORMSDIST(NORMSINV(1-P)/SQRT(2)). This procedure is the same for all three kinds of t tests: single-sample, paired-samples, and independent-samples t tests.

Calculating a Confidence Interval for a Single-Sample t Test

As with a z test, the APA recommends that researchers report confidence intervals and effect sizes, in addition to the results of hypothesis tests, whenever possible. We can calculate a confidence interval with the single-sample t test data. The population mean was 4.6, and we used the sample to estimate the population standard deviation to be 2.490 and the population standard error to be 1.114. The five students in the sample attended a mean of 7.8 sessions. When we conducted hypothesis testing, we centered our curve around the mean according to the null hypothesis: the population mean of 4.6. We determined critical values based on this mean and compared our sample mean to these cutoffs. We were able to reject the null hypothesis that there was no mean difference between the two groups because the test statistic was beyond the cutoff t statistic. Now we can use the same information to calculate the 95% confidence interval around the sample mean of 7.8.

Step 1: Draw a picture of a t distribution that includes the confidence interval. We draw a normal curve (see Figure 9-4) that has the sample mean, 7.8, at its center (instead of the population mean, 4.6).

Step 2: Indicate the bounds of the confidence interval on the drawing. We draw a vertical line from the mean to the top of the curve. For a 95% confidence interval, we also draw two much smaller vertical lines indicating the middle 95% of the t distribution (2.5% in each tail for a total of 5%).
We then write the appropriate percentages under the segments of the curve. The curve is symmetric, so half of the 95% falls above and half falls below the mean. Thus, 47.5% falls on each side of the mean between the mean and the cutoff, and 2.5% falls in each tail.

MASTERING THE CONCEPT 9-4: Whenever researchers conduct a hypothesis test, the APA encourages that, if possible, they also calculate a confidence interval and an effect size.

Step 3: Look up the t statistics that fall at each line marking the middle 95%. For a two-tailed test with a p level of 0.05 and df of 4, the critical values are −2.776 and 2.776. We can now add these t statistics to our curve, as seen in Figure 9-5.

Step 4: Convert the t statistics back into raw means. As we did with the z test, we can use formulas for this conversion, but first we must identify the appropriate mean

FIGURE 9-4 A 95% Confidence Interval for a Single-Sample t Test, Part I To begin calculating a confidence interval for a single-sample t test, we place the sample mean, 7.8, at the center of a curve and indicate the percentages within and beyond the confidence interval.

FIGURE 9-5 A 95% Confidence Interval for a Single-Sample t Test, Part II The next step in calculating a confidence interval for a single-sample t test is to identify the t statistics that indicate each end of the interval. Because the curve is symmetric, the t statistics have the same magnitude; one is negative, −2.776, and one is positive, 2.776. The t statistic at the mean is always 0.

FIGURE 9-6 A 95% Confidence Interval for a Single-Sample t Test, Part III The final step in calculating a confidence interval for a single-sample t test is to convert the t statistics that indicate each end of the interval into raw means, 4.71 and 10.89.

and standard deviation. There are two important points to remember. First, we center our interval around the sample mean (not the population mean). So we use the sample mean of 7.8 in our calculations. Second, because we have a sample mean (rather than an individual score), we use a distribution of means. So we use the standard error of 1.114 as our measure of spread. Using this mean and standard error, we can calculate the raw mean at each end of the confidence interval, the lower end and the upper end, and add them to our curve as in Figure 9-6. The formulas are exactly the same as for the z test except that z is replaced by t and σ_M is replaced by s_M.

MASTERING THE FORMULA 9-5: The formula for the lower bound of a confidence interval for a single-sample t test is M_lower = −t(s_M) + M_sample. The formula for the upper bound of a confidence interval for a single-sample t test is M_upper = t(s_M) + M_sample. The only differences from the formulas for a z test are that in each formula z is replaced by t and σ_M is replaced by s_M.

M_lower = −t(s_M) + M_sample = −2.776(1.114) + 7.8 = 4.71
M_upper = t(s_M) + M_sample = 2.776(1.114) + 7.8 = 10.89

Our 95% confidence interval, reported in brackets as is typical, is [4.71, 10.89].

Step 5: Check that the confidence interval makes sense. The sample mean should fall exactly in the middle of the two ends of the interval.

4.71 − 7.8 = −3.09 and 10.89 − 7.8 = 3.09

We have a match. The confidence interval ranges from 3.09 below the sample mean to 3.09 above the sample mean. If we were to sample five students from the same population over and over, the 95% confidence interval would include the population mean 95% of the time.
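The two bound formulas can be sketched directly, plugging in the critical t (2.776), standard error (1.114), and sample mean (7.8) from the worked example:

```python
# A sketch of the confidence-interval arithmetic for the single-sample
# t test example, using the rounded values from the text.
t_crit, s_m, m_sample = 2.776, 1.114, 7.8

m_lower = -t_crit * s_m + m_sample   # about 4.71
m_upper = t_crit * s_m + m_sample    # about 10.89

print(f"95% CI: [{m_lower:.2f}, {m_upper:.2f}]")
```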
Note that the population mean, 4.6, does not fall within this interval. This means it is not plausible that this sample of students who signed contracts came from the population described by the null hypothesis: students seeking treatment at the counseling center who did not sign a contract. We can conclude that the sample comes from a different population; that is, we can conclude that these students attended more sessions, on average, than did the general population. As with the z test, the conclusions from both the single-sample t test and the confidence interval are the same, but the confidence interval gives us more information: an interval estimate, not just a point estimate.

Calculating Effect Size for a Single-Sample t Test

As with a z test, we can calculate the effect size (Cohen's d) for a single-sample t test. Let's calculate it for the counseling center study. Similar to what we did with the z test, we simply use the formula for the t statistic, substituting s for s_M (and μ for μ_M, even though these means are always the same). This means we use 2.490 instead of 1.114 in the denominator. Cohen's d is now based on the spread of the distribution of individual scores, rather than the distribution of means.

Cohen's d = (M − μ)/s = (7.8 − 4.6)/2.490 = 1.29

Our effect size, d = 1.29, tells us that our sample mean and the population mean are 1.29 standard deviations apart. According to the conventions we learned in Chapter 8 (0.2 is a small effect; 0.5 is a medium effect; 0.8 is a large effect), this is a large effect. We can add the effect size when we report the statistics as follows: t(4) = 2.87, p < 0.05, d = 1.29.
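The only change from the t statistic is the denominator, which is easy to see in code. A minimal sketch, using the rounded values from the text:

```python
# Cohen's d for the counseling center example: same numerator as the
# t statistic, but divided by the estimated standard deviation s (2.490)
# rather than the standard error s_M (1.114).
M, mu, s = 7.8, 4.6, 2.490

d = (M - mu) / s   # about 1.29, a large effect by Cohen's conventions
print(f"d = {d:.2f}")
```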

CHECK YOUR LEARNING

Reviewing the Concepts

> A single-sample t test is a hypothesis test in which we compare data from one sample to a population for which we know the mean but not the standard deviation.
> We consider degrees of freedom, or the number of scores that are free to vary, instead of N when we assess estimated test statistics against distributions.
> As sample size increases, our confidence in our estimates improves, degrees of freedom increase, and the critical value for t drops, making it easier to reach statistical significance. In fact, as sample size grows, the t distributions approach the z distribution.
> As with any hypothesis test, we identify the populations and comparison distribution and check the assumptions. We then state the null and research hypotheses. We next determine the characteristics of the comparison distribution, a distribution of means based on the null hypothesis. We must first estimate the standard deviation from our sample; then we must calculate the standard error. We then determine critical values, usually for a two-tailed test with a p level of 0.05. The test statistic is then calculated and compared to these critical values, or cutoffs, to determine whether to reject or fail to reject the null hypothesis.
> We can determine p_rep for our test statistic as an alternative to the p value.
> We can calculate a confidence interval and an effect size, Cohen's d, for a single-sample t test.

Clarifying the Concepts

9-6 Explain the term degrees of freedom.
9-7 Why does a single-sample t test have more uses than a z test?

Calculating the Statistics

9-8 Compute degrees of freedom for each of the following:
a. An experimenter times how long it takes 35 rats to run through a maze with 8 pathways.
b. Test scores for 14 students are collected and averaged over 4 semesters.

9-9 Identify the critical t value for each of the following tests:
a.
A two-tailed test with alpha of 0.05 and 11 degrees of freedom
b. A one-tailed test with alpha of 0.01 and N of 17

Applying the Concepts

9-10 Let's assume that, according to university summary statistics, the average student misses 3.7 classes during a semester. Imagine the data you have been working with (6, 3, 7, 6, 4, 5) are the number of classes missed by a group of students. Conduct all six steps of hypothesis testing, assuming a two-tailed test with a p level of 0.05. (Note: The work for step 3 has already been completed in Check Your Learning exercises 9-3 and 9-4.)

Solutions to these Check Your Learning questions can be found in Appendix D.

The Paired-Samples t Test

Researchers found that weight gain over the holidays was far less than once thought. The dreaded "freshman 15" also appears to be an exaggerated myth; it's really less than 4 pounds, on average. One study sampled college students at a northeastern university and compared their weight at the beginning of the fall semester with how much they weighed by November (Holm-Denoma, Joiner, Vohs, & Heatherton, 2008). Male students gained an average of 3.5 pounds and female students gained an average of 4.0 pounds. These types of before/after comparisons can be tested by using the paired-samples t test.

The paired-samples t test (also called the dependent-samples t test) is used to compare two means for a within-groups design, a situation in which every participant is in both samples. Although both terms, paired and dependent, are used frequently, we use the term paired in this book because it matches the language of some of the most-used statistical software packages. If an individual in the study participates in both conditions (such as a memory task after ingesting a caffeinated beverage and again after ingesting a noncaffeinated beverage), then her score in one depends on her score in the other. That's when we use the dependent-samples, or paired-samples, t test. Once you understand the single-sample t test, the paired-samples t test is simple. The major difference in the paired-samples t test is that we must create difference scores for every individual.

FIGURE 9-7 Creating a Distribution of Mean Differences This distribution is one of many that could be created by pulling 30 mean differences, each the average of three differences between pairs of weights (one pre-holiday and one post-holiday), pulled one at a time from a population of pairs of weights. The population used here was based on the null hypothesis: that there is no average difference in weight from before the holidays to after the holidays.

Distributions of Mean Differences

We already have learned about a distribution of scores and a distribution of means. Now we need to develop a distribution of mean differences so that we can establish a distribution that specifies the null hypothesis.
Let's use pre- and post-holiday weight data to demonstrate how to create a distribution of mean differences, the distribution that accompanies a within-groups design. Imagine that many college students' weights were measured before and after the winter holidays to determine whether they changed, and you planned to gather data on a sample of three people. Imagine that you have two cards for each person on which weights are listed: one before the holidays and one after the holidays. So you have many pairs of cards, one pair for each student. First, you would randomly choose three pairs of cards. For each pair, you'd subtract the first weight from the second weight to calculate a difference score, and then you would calculate the mean of the differences in weights for these three people. Then you would randomly choose another three people from the population of many college students and calculate the mean of their three difference scores. And then you'd do it again, and again, and again. That's really all there is to it, except that we do this procedure many more times (although we are simplifying a bit here; we would actually replace every pair of cards before selecting the next). So there are two samples, college students before the holidays and college students after the holidays, but we're building just one curve of mean differences.

Let's say the first student weighed 140 pounds before the holidays and 144 pounds after the holidays; the difference between weights would be 144 − 140 = 4. Let's say that the second student weighed 126 pounds before the holidays and 124 pounds after the holidays; this difference would be 124 − 126 = −2. A third student might weigh 168 both before and after the holidays, for a difference of 0. We would take the mean of these three difference scores: 0.667. We would then put all the cards back, choose three more students, and calculate the mean of their difference scores.
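The card-drawing procedure just described can be sketched as a small simulation. The population spread used here (a standard deviation of 2 pounds) and the seed are illustrative assumptions, not values from the study; only the three worked difference scores (4, −2, 0) come from the text.

```python
import random

# A rough sketch of building a distribution of mean differences under
# the null hypothesis that average holiday weight change is 0.
random.seed(1)  # fixed seed so the sketch is reproducible

# The three difference scores worked out in the text: 4, -2, and 0.
example_mean = sum([4, -2, 0]) / 3   # about 0.667

def one_mean_difference(n=3, sd=2.0):
    """Draw n difference scores from a null population and average them."""
    diffs = [random.gauss(0, sd) for _ in range(n)]
    return sum(diffs) / n

# Repeating the draw 30 times gives a pile of mean differences that
# scatter around 0, like the distribution in Figure 9-7.
mean_differences = [one_mean_difference() for _ in range(30)]
```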
Eventually, we would have many mean differences, some positive, some negative, and some right at 0, and could plot them on a curve. But this would only be the beginning of what this distribution would look like. If we were to calculate the whole distribution, then we would do this an uncountable number of times. When the authors calculated 30 mean differences for pairs of weights, we got the distribution in Figure 9-7

(we plotted means rounded to whole numbers). If no mean difference is found when comparing weights from before and after the holidays, as with the data we used to create Figure 9-7, the distribution would center around 0. According to the null hypothesis, we would expect no difference in average weight from before the holidays to after the holidays.

The Six Steps of the Paired-Samples t Test

In a paired-samples t test, each participant has two scores, one in each condition. When we conduct a paired-samples t test, we write the pairs of scores in two columns, side by side, next to the same individual. We then subtract each score in one column from its paired score in the other column to create difference scores. Ideally, a positive difference score indicates an increase, and a negative difference score indicates a decrease. Typically, we subtract the first score from the second so that our difference scores match this logic. Now we implement the steps of the single-sample t test with only minor changes, as discussed in the steps below.

Large Monitors and Productivity Microsoft researchers and cognitive psychologists (Czerwinski et al., 2003) reported a 9% increase in productivity when research volunteers used an extremely large 42-inch display versus a more typical 15-inch display. Every participant used both displays and thus was in both samples. A paired-samples t test is the appropriate hypothesis test for this within-groups design. Courtesy of Microsoft Research

Let's try an example from the social sciences. Computer and software companies often employ social scientists to research ways their products can better benefit users. For example, Microsoft researchers studied how 15 volunteers performed on a set of tasks under two conditions. The researchers compared the volunteers' performance on the tasks while using a 15-inch computer monitor and while using a 42-inch monitor (Czerwinski et al., 2003).
The 42-inch monitor, far larger than most of us have ever used, allows the user to have multiple programs in view at the same time. Here are five participants' fictional data, which reflect the actual means reported by the researchers. Note that a smaller number is good: it indicates a faster time. The first person completed the tasks on the small monitor in 122 seconds and on the large monitor in 111 seconds; the second person in 131 and 116; the third in 127 and 113; the fourth in 123 and 119; and the fifth in 132 and 121.

EXAMPLE 9.8

STEP 1. Identify the populations, distribution, and assumptions.

The paired-samples t test is like the single-sample t test in that we analyze a single sample of scores. For the single-sample t test, we use individual scores; for the paired-samples t test, we use difference scores. For the paired-samples t test, one population is reflected by each condition, but the comparison distribution is a distribution of mean difference scores (rather than a distribution of means). As with the single-sample t test, the comparison distribution is based on the null hypothesis, which here posits no difference. So the mean of the comparison distribution is 0; this indicates a mean difference score of 0. The assumptions are the same as for the single-sample t test.

Summary: Population 1: People performing tasks using a 15-inch monitor. Population 2: People performing tasks using a 42-inch monitor.

The comparison distribution will be a distribution of mean difference scores based on the null hypothesis. The hypothesis test will be a paired-samples t test because we have two samples of scores, and every individual contributes a score to each sample.

MASTERING THE CONCEPT 9-5: The steps for the paired-samples t test are very similar to those for the single-sample t test.
The main difference is that we are comparing the sample mean difference between scores to the mean difference for the population according to the null hypothesis, rather than comparing the sample mean of individual scores to the population mean.

This study meets one of the three assumptions and may meet the other two: (1) The dependent variable is time, which is scale. (2) The participants were not randomly selected, however, so we must be cautious with respect to generalizing our findings. (3) We do not know whether the population is normally distributed, and there are not at least 30 participants. However, the data from our sample do not suggest a skewed distribution.

STEP 2. State the null and research hypotheses.

This step is identical to that for the single-sample t test. Remember, hypotheses are always about populations, not about our specific samples.

Summary: Null hypothesis: People who use a 15-inch screen will complete a set of tasks in the same amount of time, on average, as people who use a 42-inch screen: H_0: μ_1 = μ_2. Research hypothesis: People who use a 15-inch screen will complete a set of tasks in a different amount of time, on average, from people who use a 42-inch screen: H_1: μ_1 ≠ μ_2.

STEP 3. Determine the characteristics of the comparison distribution.

This step is similar to that for the single-sample t test in that we determine the appropriate mean and standard error of the comparison distribution, the distribution based on the null hypothesis. In the single-sample t test, there was a comparison mean, and the null hypothesis posited that the sample mean would be the same as that of the comparison distribution. With the paired-samples t test, we have a sample of difference scores. According to the null hypothesis, there is no difference; that is, the mean difference score is 0. So the mean of the comparison distribution is always 0, as long as the null hypothesis posits no difference. The standard error is calculated exactly as it is for the single-sample t test, only we use the difference scores rather than the scores in each condition.
To get the difference scores in the current example, we want to know what happens when we go from the control condition (small screen) to the experimental condition (large screen), so we subtract the first score from the second score. This means that a negative difference indicates a decrease in time when the screen goes from small to large, and a positive difference indicates an increase in time. (The test statistic will be the same if we reverse the order in which we subtract, but the sign will change. In some cases, you can think about it as subtracting the before score from the after score.) Another helpful strategy is to cross out the original scores once we've created the difference scores so that we remember to use only the difference scores from that point on. If we don't cross out the original scores, it is very easy to use them in our calculations and end up with an incorrect standard error.

Summary: μ_M = 0; s_M = 1.923

Calculations: (Notice that we crossed out the original scores once we created our column of difference scores. We did this to remind ourselves that all remaining calculations involve the difference scores, not the original scores.)

X     Y     Difference   Difference − mean difference   Squared deviation
122   111   −11           0                              0
131   116   −15          −4                             16
127   113   −14          −3                              9
123   119    −4           7                             49
132   121   −11           0                              0

The mean of the difference scores is: M_difference = −11

The numerator is the sum of squares, SS:

0 + 16 + 9 + 49 + 0 = 74

s = √(74/(5 − 1)) = √18.5 = 4.301

s_M = s/√N = 4.301/√5 = 1.923

STEP 4. Determine the critical values, or cutoffs.

This step is the same as that for the single-sample t test. The degrees of freedom is the number of participants (not the number of scores) minus 1.

Summary: df = N − 1 = 5 − 1 = 4

Our critical values, based on a two-tailed test and a p level of 0.05, are −2.776 and 2.776, as seen in the curve in Figure 9-8.

FIGURE 9-8 Determining Cutoffs for a Paired-Samples t Test We typically determine critical values in terms of t statistics rather than means of raw scores so that we can easily compare a test statistic to them to determine whether the test statistic is beyond the cutoffs.

STEP 5. Calculate the test statistic.

This step is identical to that for the single-sample t test.

Summary: t = (−11 − 0)/1.923 = −5.72

STEP 6. Make a decision.

This step is identical to that for the single-sample t test. If we reject the null hypothesis, we need to examine the means of the two conditions (in this case, M_X = 127; M_Y = 116) so that we know the direction of the effect. Remember, even though the hypotheses are two-tailed, we report the direction of the effect.

Summary: Reject the null hypothesis. It appears that, on average, people perform faster when using a 42-inch monitor than when using a 15-inch monitor (as shown by the curve in Figure 9-9).

FIGURE 9-9 Making a Decision To decide whether to reject the null hypothesis, we compare our test statistic to our critical values. In this figure, the test statistic, −5.72, is beyond the cutoff of −2.776, so we can reject the null hypothesis.
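The paired-samples calculation above can be sketched in code: compute the difference scores first, then run what is effectively a single-sample t test on them against 0. This is a minimal sketch using the five fictional small-monitor (X) and large-monitor (Y) times from the text.

```python
import math

# Paired-samples t test for the monitor example.
X = [122, 131, 127, 123, 132]   # 15-inch monitor, seconds
Y = [111, 116, 113, 119, 121]   # 42-inch monitor, seconds

diffs = [y - x for x, y in zip(X, Y)]        # [-11, -15, -14, -4, -11]
N = len(diffs)
M_diff = sum(diffs) / N                      # mean difference: -11
ss = sum((d - M_diff) ** 2 for d in diffs)   # sum of squares: 74
s = math.sqrt(ss / (N - 1))                  # about 4.301
s_m = s / math.sqrt(N)                       # about 1.923
t = (M_diff - 0) / s_m                       # about -5.72

print(f"t({N - 1}) = {t:.2f}")
```

From this point on, only the difference scores matter; the original X and Y values play no further role in the calculation, which is the code version of crossing out the original scores.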

FIGURE 9-10 A 95% Confidence Interval for a Paired-Samples t Test, Part I We start the confidence interval for a distribution of mean differences by drawing a curve with the sample mean difference, −11, in the center.

The statistics, as reported in a journal article, follow the same APA format as for a single-sample t test. We report the degrees of freedom, the value of the test statistic, and the p value associated with the test statistic (unless we use software, we can only indicate whether the p value is less than or greater than 0.05). In the current example, the statistics would read: t(4) = −5.72, p < 0.05. (We would also include the means and the standard deviations for the two samples. We calculated the means in step 6 of hypothesis testing, but we would also have to calculate the standard deviations for the two samples to report them. In addition, we could report p_rep instead of the p value. See the SPSS section of this chapter for details.) Researchers note that the faster time with the large display might not seem much faster but that, in their research, they have had great difficulty identifying any factors that lead to faster times (Czerwinski et al., 2003). Based on their previous research, therefore, this is an impressive difference.

Calculating a Confidence Interval for a Paired-Samples t Test

As with most hypothesis testing, the APA also encourages the use of confidence intervals and effect sizes when conducting a paired-samples t test. Let's start by determining the confidence interval for the example we've been using. First, let's recap the information we need. The population mean difference according to the null hypothesis was 0, and we used the sample to estimate the population standard deviation to be 4.301 and the standard error to be 1.923. The five participants in the study sample had a mean difference of −11.
We will calculate the 95% confidence interval around the sample mean difference of −11.

Step 1: Draw a picture of a t distribution that includes the confidence interval. We draw a normal curve (see Figure 9-10) that has the sample mean difference, −11, at its center instead of the population mean difference, 0.

Step 2: Indicate the bounds of the confidence interval on the drawing. As before, 47.5% falls on each side of the mean between the mean and each cutoff, and 2.5% falls in each tail.

Step 3: Add the t statistics to the curve, as seen in Figure 9-11. For a two-tailed test with a p level of 0.05 and 4 df, the critical values are −2.776 and 2.776.

FIGURE 9-11 A 95% Confidence Interval for a Paired-Samples t Test, Part II The next step in calculating a confidence interval for mean differences is identifying the t statistics that indicate each end of the interval. Because the curve is symmetric, the t statistics have the same magnitude; one is negative, −2.776, and one is positive, 2.776.

Step 4: Convert the t statistics back into raw mean differences. As we did with the other confidence intervals, we use the sample mean difference (−11) in our calculations and the standard error (1.923) as our measure of spread. We use the same formulas as for the single-sample t test, recalling that these means and standard errors are calculated from differences between two scores for each participant in the study, rather than an individual score for each participant. We have added these raw mean differences to our curve in Figure 9-12.

M_lower = −t(s_M) + M_sample = −2.776(1.923) + (−11) = −16.34
M_upper = t(s_M) + M_sample = 2.776(1.923) + (−11) = −5.66

Our 95% confidence interval, reported in brackets as is typical, is [−16.34, −5.66].
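The same bound formulas work here, now plugging in the critical t (2.776), standard error (1.923), and mean difference (−11) from the paired example:

```python
# A sketch of the paired-samples confidence-interval arithmetic, using
# the rounded values from the text.
t_crit, s_m, m_diff = 2.776, 1.923, -11

m_lower = -t_crit * s_m + m_diff   # about -16.34
m_upper = t_crit * s_m + m_diff    # about -5.66

print(f"95% CI: [{m_lower:.2f}, {m_upper:.2f}]")
```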

FIGURE 9-12 A 95% Confidence Interval for a Paired-Samples t Test, Part III. The final step in calculating a confidence interval for mean differences is converting the t statistics that indicate each end of the interval to raw mean differences, -16.34 and -5.66.

Step 5: Check that the confidence interval makes sense. The sample mean difference should fall exactly in the middle of the two ends of the interval.

-11 - (-5.66) = -5.34 and -11 - (-16.34) = 5.34

We have a match. The confidence interval ranges from 5.34 below the sample mean difference to 5.34 above the sample mean difference. If we were to sample five people from the same population over and over, the 95% confidence interval would include the population mean 95% of the time. Note that the population mean difference according to the null hypothesis, 0, does not fall within this interval. This means it is not plausible that the difference between those using the 15-inch monitor and those using the 42-inch monitor is 0. We can conclude that, on average, people perform faster when using a 42-inch monitor than when using a 15-inch monitor. As with other hypothesis tests, the conclusions from both the paired-samples t test and the confidence interval are the same, but the confidence interval gives us more information: an interval estimate, not just a point estimate.

Calculating Effect Size for a Paired-Samples t Test

As with a z test, we can calculate the effect size (Cohen's d) for a paired-samples t test. Let's calculate it for the computer monitor study. Again, we simply use the formula for the t statistic, substituting s for s_M (and μ for μ_M, even though these means are always the same). This means we use 4.301 instead of 1.923 in the denominator. Cohen's d is now based on the spread of the distribution of individual differences between scores, rather than the distribution of mean differences.
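That substitution can be sketched in a couple of lines. The values below are the summary statistics from the text; no library is needed:

```python
# Cohen's d for the paired-samples example: divide by s, the spread of
# the individual difference scores, rather than by s_M.
mean_difference = -11
mu_null = 0      # population mean difference under the null hypothesis
s = 4.301        # estimated population standard deviation of differences

cohens_d = (mean_difference - mu_null) / s
print(round(cohens_d, 2))  # -2.56
```

Dividing by the larger s_M value of 1.923 would instead reproduce the t statistic, which is why d and t differ even though they share a numerator.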
Cohen's d = (M - μ)/s = (-11 - 0)/4.301 = -2.56

MASTERING THE FORMULA 9-7: The formula for the lower bound of a confidence interval for a paired-samples t test is M_lower = -t(s_M) + M_sample. The formula for the upper bound of a confidence interval for a paired-samples t test is M_upper = t(s_M) + M_sample. These are the same as for a single-sample t test, but remember that the means and standard errors are calculated from differences between pairs of scores, not individual scores.

Our effect size, d = -2.56, tells us that our sample mean difference and the population mean difference are 2.56 standard deviations apart. This is a large effect. Recall that the sign has no bearing on the size of an effect: -2.56 and 2.56 are equivalent effect sizes. We can add the effect size when we report the statistics as follows: t(4) = -5.72, p < 0.05, d = -2.56.

CHECK YOUR LEARNING

Reviewing the Concepts
> The paired-samples t test is used when we have data for all participants under two conditions: a within-groups design.
> In the paired-samples t test, we calculate a difference score for every individual. The t statistic is calculated on those difference scores.
> We use the same six steps of hypothesis testing that we used with the z test and with the single-sample t test.

> We can calculate p_rep, a confidence interval, and an effect size (Cohen's d) for a paired-samples t test.

Clarifying the Concepts
9-11 How do we conduct a paired-samples t test?
9-12 Explain what an individual difference score is, as it is used in a paired-samples t test.

Calculating the Statistics
9-13 Below are energy-level data (on a scale of 1 to 7, where 1 = feeling of no energy and 7 = feeling of high energy) for five students before and after lunch. Calculate the mean difference for these people so that loss of energy is a negative value. Assume you are testing the hypothesis that students go into what we call food comas after eating, versus lunch giving them added energy.

Before lunch   After lunch
6              3
5              2
4              6
5              4
7              5

Applying the Concepts
9-14 Using the energy-level data presented in Check Your Learning 9-13, let's test the hypothesis that students have different energy levels before and after lunch.
a. Perform the six steps of hypothesis testing.
b. Calculate the 95% confidence interval and describe how it results in the same conclusion as the hypothesis test.
c. Calculate and interpret Cohen's d.

Solutions to these Check Your Learning questions can be found in Appendix D.

REVIEW OF CONCEPTS

The t Distributions

The t distributions are similar to the z distribution, except that we must estimate the standard deviation from the sample. When estimating the standard deviation, we must make a mathematical correction to adjust for the increased likelihood of error. After estimating the standard deviation, the t statistic is calculated like the z statistic for distributions of means.
The t distributions can be used to compare the mean of a sample to a population mean when we don't know the population standard deviation (single-sample t test), to compare two samples with a within-groups design (paired-samples t test), and to compare two samples with a between-groups design (independent-samples t test). We learned about the first two t tests in this chapter; the third t test is described in Chapter 10.

The Single-Sample t Test

Like z tests, single-sample t tests are conducted in the rare cases in which we have one sample that we're comparing to a known population. The difference is that we must know the mean and the standard deviation of the population to conduct a z test, whereas

we only have to know the mean of the population to conduct a single-sample t test. There are many t distributions, one for every possible sample size. We look up the appropriate critical values on the t table based on degrees of freedom, a number calculated from the sample size. In addition to the hypothesis test, we can calculate p_rep, a confidence interval, and an effect size (Cohen's d) for a single-sample t test.

The Paired-Samples t Test

A paired-samples t test is used when we have two samples and the same participants are in both samples; to conduct the test, we calculate a difference score for every individual. The comparison distribution is a distribution of mean difference scores. We can calculate p_rep, a confidence interval, and an effect size (Cohen's d) for a paired-samples t test.

SPSS

The t test is used to compare only two groups. First, let's conduct a single-sample t test using the data on number of counseling sessions attended that we tested earlier in this chapter. The five scores were: 6, 6, 12, 7, and 8. Select Analyze → Compare Means → One-Sample T Test. Then highlight the dependent variable (sessions) and click the arrow in the center to choose it. Type the population mean to which we're comparing our sample, 4.6, next to Test Value and click OK. The screenshot here shows the data and output. You'll notice that the t statistic, 2.874, is almost identical to the one we calculated, 2.873. The difference is due only to rounding decisions. Notice that the confidence interval is different from the one we calculated; it is an interval around the difference between the two means, rather than around the mean of our sample. The p value is under Sig. (2-tailed). The p value of .045 is less than the chosen p level of .05, an indication that this is a statistically significant finding. We can use this p value in Excel to calculate our p_rep.
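The same test, and the p_rep that follows from it, can also be reproduced outside SPSS and Excel. This is a sketch in Python, assuming SciPy is available; the five scores and the population mean of 4.6 are taken from the text:

```python
from math import sqrt
from scipy import stats

# The five session counts from the text, compared to a population mean of 4.6
scores = [6, 6, 12, 7, 8]
t_stat, p_value = stats.ttest_1samp(scores, popmean=4.6)
print(round(t_stat, 3))   # 2.874
print(round(p_value, 3))  # 0.045

# p_rep from the two-tailed p value (the same formula the text uses in Excel)
p_rep = stats.norm.cdf(stats.norm.ppf(1 - p_value) / sqrt(2))
print(round(p_rep, 2))    # close to the .8847 Excel gives for the rounded p of .045
```

Because `ttest_1samp` carries full precision through the calculation, its results match the hand calculation except for rounding in the final digits.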
When we replace P with .045 in the Excel formula, NORMSDIST(NORMSINV(1-P)/SQRT(2)), we get a p_rep of .8847. If we were to replicate this study with the same sample size drawn from the same population, we could expect an effect in the same direction 88.47% of the time. For a paired-samples t test, let's use the data from this chapter on performance using a small monitor versus a large monitor. Enter the data in two columns, with each participant having one score in the first column for his or her performance on the small monitor and one score in the second column for his or her performance on the large monitor. Select Analyze → Compare Means → Paired-Samples T Test. Choose the dependent variable under the first condition (small) by clicking it, then clicking the center arrow. Choose the dependent variable under the second condition (large) by

224 CHAPTER 9 The Single-Sample t Test and the Paired-Samples t Test clicking it, then clicking the center arrow. Then click OK. The data and output are shown in the screenshot. Notice that the t statistic and confidence interval match ours (5.72) and [ 16.34, 5.66] except that the signs are different. This occurs because of the order in which one score was subtracted from the other score that is, whether the score on the large monitor was subtracted from the score on the small monitor, or vice versa. The outcome is the same in either case. The p value is under Sig. (2-tailed) and is.005. We can use this number in Excel to determine the value for p rep,.9657. How It Works 9.1 CONDUCTING A SINGLE-SAMPLE t TEST In How It Works 7.2, we conducted a z test for data from the Consideration of Future Consequences (CFC) scale (Petrocelli, 2003). How can we conduct all six steps for a singlesample t test for the same data using a p level of 0.05 and a two-tailed test? To start, we ll use the population mean CFC score of 3.51, but we ll pretend that we no longer know the population standard deviation. As before, we wonder whether students who joined a career discussion group might have improved CFC scores, on average, compared with the population. Forty-five students in the social sciences regularly attended these discussion groups and then took the CFC scale. The mean for this group is 3.7. The standard deviation for this sample is 0.52. Step 1: Population 1: All students in career discussion groups Population 2: All students who did not participate in career discussion groups The comparison distribution will be a distribution of means. The hypothesis test will be a single-sample t test because we have only one sample, and we know the population mean but do not know the population standard deviation. This study meets two of the three assumptions and may meet the third. The dependent variable is scale. 
In addition, there are more than 30 participants in the sample, indicating that the comparison distribution will be normal. The data were not randomly selected, however, so we must be cautious when generalizing.

Step 2: Null hypothesis: Students who participated in career discussion groups had the same CFC scores, on average, as students who did not participate. H0: μ1 = μ2.
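The t statistic for this walkthrough can be checked with a short sketch. The summary statistics (μ = 3.51, M = 3.7, s = 0.52, N = 45) come from the text; the critical value for df = 44 is computed here with SciPy rather than quoted from the chapter, so treat it as an assumption:

```python
from math import sqrt
from scipy import stats

# Summary statistics from the CFC example
mu = 3.51        # population mean under the null hypothesis
m_sample = 3.7   # sample mean for the 45 discussion-group students
s = 0.52         # estimated population standard deviation
n = 45

s_m = s / sqrt(n)                   # estimated standard error
t_stat = (m_sample - mu) / s_m
t_crit = stats.t.ppf(0.975, n - 1)  # two-tailed cutoff, df = 44

print(round(t_stat, 2))  # 2.45
print(round(t_crit, 2))  # 2.02
print(t_stat > t_crit)   # True: the sample mean exceeds the cutoff
```

Because the t statistic falls beyond the critical value, the remaining steps of the hypothesis test (described in the chapter) lead to rejecting the null hypothesis.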