Appendix B Statistical Methods


Empiricism depends on observation; precise observation depends on measurement; and measurement requires numbers. Thus, scientists routinely analyze numerical data to arrive at their conclusions. A great many empirical studies are cited in this text, and all but a few of the simplest ones required a statistical analysis. Statistics is the use of mathematics to organize, summarize, and interpret numerical data. We discussed statistics briefly earlier in the text, but in this appendix we take a closer look.

To illustrate statistics in action, imagine a group of students who want to test a hypothesis that has generated quite an argument in their psychology class. The hypothesis is that university students who watch a great deal of television aren't as bright as those who watch TV infrequently. For the fun of it, the class decides to conduct a correlational study of itself, collecting survey and psychological test data. All of the classmates agree to respond to a short survey on their TV viewing habits. Because everyone at that school has had to take the Scholastic Aptitude Test (SAT), the class decides to use scores on the SAT verbal subtest as an index of how bright students are. The SAT is one of a set of tests that high school students in the United States take before applying to college or university. Universities frequently use test scores like these, along with students' high school grades and other relevant information, when deciding whether to admit applicants. In this class, all of the students agree to allow the records office at the university to furnish their SAT scores to the professor, who replaces each student's name with a subject number (to protect the students' right to privacy). Let's see how they could use statistics to analyze the data collected in their pilot study (a small, preliminary investigation).

Graphing Data

After collecting the data, the next step is to organize them to get a quick overview of the numerical results. Let's assume that each student in the class estimates how many hours he or she spends per day watching TV, producing a set of raw scores. One of the simpler things that the students can do to organize the data is to create a frequency distribution: an orderly arrangement of scores indicating the frequency of each score or group of scores. Figure B.1(a) shows a frequency distribution for the data on TV viewing. The column on the left lists the possible scores (estimated hours of TV viewing) in order, and the column on the right lists the number of subjects or participants with each score.

Graphs can provide an even better overview of the data. One approach is to portray the data in a histogram, which is a bar graph that presents data from a frequency distribution. Such a histogram, summarizing the TV viewing data, is presented in Figure B.1(b). Another widely used method of portraying data graphically is the frequency polygon: a line figure used to present data from a frequency distribution. Figures B.1(c) and B.1(d) show how the TV viewing data can be converted from a histogram to a frequency polygon.

Figure B.1 Graphing data. (a) The raw data are tallied into a frequency distribution. (b) The same data are portrayed in a bar graph called a histogram. (c) A frequency polygon is plotted over the histogram. (d) The resultant frequency polygon is shown by itself.
In both the bar graph and the line figure, the horizontal axis lists the possible scores and the vertical axis is used to indicate the frequency of each score. This use of the axes is nearly universal for frequency polygons, although sometimes it is reversed in histograms (the vertical axis lists the possible scores, so the bars become horizontal). The graphs improve on the jumbled collection of scores that the students started with, but descriptive statistics, which are used to organize and summarize data, provide some additional advantages.

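This kind of tallying and graphing is easy to reproduce in software. The following sketch is a minimal example in Python (using the standard library plus matplotlib); the list of TV-viewing scores is a made-up placeholder, not the class's actual data, and simply illustrates how a frequency distribution and a histogram like those in Figure B.1 are built.

# Frequency distribution and histogram for hypothetical TV-viewing scores.
# The scores below are illustrative placeholders, not the class's actual data.
from collections import Counter
import matplotlib.pyplot as plt

scores = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9]  # hours of TV per day

# Tally each score, as in Figure B.1(a)
freq = Counter(scores)
for score in sorted(freq):
    print(f"{score} hours: {freq[score]} student(s)")

# Histogram, as in Figure B.1(b): possible scores on the x-axis, frequency on the y-axis
plt.bar(sorted(freq), [freq[s] for s in sorted(freq)])
plt.xlabel("Estimated hours of TV viewing per day")
plt.ylabel("Frequency")
plt.title("Histogram of TV-viewing scores")
plt.show()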
Let's see what the three measures of central tendency tell us about the data.

Measuring Central Tendency

In examining a set of data, it's routine to ask, What is a typical score in the distribution? For instance, in this case, we might compare the average amount of TV watching in the sample to national estimates, to determine whether the subjects appear to be representative of the population. The three measures of central tendency (the median, the mean, and the mode) give us indications regarding the typical score in a data set. As explained earlier in the text, the median is the score that falls in the centre of a distribution, the mean is the arithmetic average of the scores, and the mode is the score that occurs most frequently. All three measures of central tendency are calculated for the TV viewing data in Figure B.2. As you can see, in this set of data, the mean, median, and mode all turn out to be the same score. Although the example presented earlier in the text emphasized that the mean, median, and mode can yield different estimates of central tendency, the correspondence among them seen in the TV viewing data is quite common.

Lack of agreement usually occurs when a few extreme scores pull the mean away from the centre of the distribution, as shown in Figure B.3. The curves plotted in Figure B.3 are simply smoothed-out frequency polygons based on data from many subjects. They show that when a distribution is symmetric, the measures of central tendency fall together, but this is not true in skewed or unbalanced distributions. Figure B.3(b) shows a negatively skewed distribution, in which most scores pile up at the high end of the scale (negative skew refers to the direction in which the curve's tail points). A positively skewed distribution, in which scores pile up at the low end of the scale, is shown in Figure B.3(c). In both types of skewed distributions, a few extreme scores at one end pull the mean, and to a lesser degree the median, away from the mode. In these situations, the mean may be misleading and the median usually provides the best index of central tendency.

In any case, the measures of central tendency for the TV viewing data are reassuring, since they all agree and they fall reasonably close to national estimates regarding how much young adults watch TV (Nielsen Media Research). Given the small size of the student group, this agreement with national norms doesn't prove that the sample is representative of the population, but at least there's no obvious reason to believe that it is unrepresentative.

Figure B.2 Measures of central tendency (the mode is the most frequent score, the median is the middle of the score distribution, and the mean is the arithmetic average of the summed scores). Although the mean, median, and mode sometimes yield different results, they usually converge, as in the case of the TV viewing data.

Figure B.3 Measures of central tendency in skewed distributions. In a symmetrical distribution (a), the three measures of central tendency converge. However, in a negatively skewed distribution (b) or in a positively skewed distribution (c), the mean, median, and mode are pulled apart as shown here. Typically, in these situations, the median provides the best index of central tendency.
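For readers who want to check the three measures of central tendency on a data set of their own, here is a minimal Python sketch using the standard statistics module; the score list is the same illustrative placeholder used above, not the class's data.

# Mean, median, and mode for a set of scores (illustrative data).
import statistics

scores = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9]

print("Mean:  ", statistics.mean(scores))    # arithmetic average of the scores
print("Median:", statistics.median(scores))  # score in the centre of the distribution
print("Mode:  ", statistics.mode(scores))    # most frequently occurring score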

Measuring Variability

Of course, the subjects in the sample did not report identical TV viewing habits. Virtually all data sets are characterized by some variability. Variability refers to how much the scores tend to vary or depart from the mean score. For example, the distribution of golf scores for a mediocre, erratic golfer would be characterized by high variability, while the scores for an equally mediocre but consistent golfer would show less variability.

The standard deviation is an index of the amount of variability in a set of data. It reflects the dispersion of scores in a distribution. This principle is portrayed graphically in Figure B.4, where the two distributions of golf scores have the same mean but the upper one has less variability because the scores are bunched up in the centre (for the consistent golfer). The distribution in Figure B.4(b) is characterized by more variability, as the erratic golfer's scores are more spread out. This distribution will yield a higher standard deviation than the distribution in Figure B.4(a).

The formula for calculating the standard deviation is shown in Figure B.5, where d stands for each score's deviation from the mean and Σ stands for summation:

standard deviation = √(Σd² / N)

A step-by-step application of this formula to the TV viewing data, shown in Figure B.5, yields the standard deviation for those scores. The standard deviation has a variety of uses. One of these uses will surface in the next section, where we discuss the normal distribution.

Figure B.4 The standard deviation and dispersion of data. Although both of these distributions of golf scores have the same mean, their standard deviations will be different. In (a) the scores are bunched together and there is less variability than in (b), yielding a lower standard deviation for the data in distribution (a).

Figure B.5 Steps in calculating the standard deviation. (1) Add the scores (ΣX) and divide by the number of scores (N) to calculate the mean. (2) Calculate each score's deviation from the mean by subtracting the mean from each score (the results are shown in the second column of the figure). (3) Square these deviations from the mean and total the results to obtain Σd², as shown in the third column. (4) Insert the numbers for N and Σd² into the formula for the standard deviation and compute the result.

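The four steps in Figure B.5 translate directly into code. The sketch below follows the same population formula, √(Σd² / N), and again uses placeholder scores rather than the class's actual data.

# Standard deviation computed as described in Figure B.5:
# (1) find the mean, (2) take each score's deviation d from the mean,
# (3) square and sum the deviations to get sum(d^2), (4) divide by N and take the square root.
import math

scores = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9]

N = len(scores)
mean = sum(scores) / N
deviations = [x - mean for x in scores]
sum_sq_dev = sum(d ** 2 for d in deviations)
std_dev = math.sqrt(sum_sq_dev / N)

print(f"Mean = {mean:.2f}, standard deviation = {std_dev:.2f}")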
The Normal Distribution

The hypothesis in the study was that brighter students watch less TV than relatively dull students. To test this hypothesis, the students decided to correlate TV viewing with SAT scores. But to make effective use of the SAT data, they need to understand what SAT scores mean, which brings us to the normal distribution.

The normal distribution is a symmetrical, bell-shaped curve that represents the pattern in which many human characteristics are dispersed in the population. A great many physical qualities (e.g., height, nose length, and running speed) and psychological traits (intelligence, spatial reasoning ability, introversion) are distributed in a manner that closely resembles this bell-shaped curve. When a trait is normally distributed, most scores fall near the centre of the distribution (the mean), and the number of scores gradually declines as one moves away from the centre in either direction. The normal distribution is not a law of nature. It's a mathematical function, or theoretical curve, that approximates the way nature seems to operate.

The normal distribution is the bedrock of the scoring system for most psychological tests, including the SAT. As we discuss in the body of the text, psychological tests are relative measures; they assess how people score on a trait in comparison to other people. The normal distribution gives us a precise way to measure how people stack up in comparison to each other. The scores under the normal curve are dispersed in a fixed pattern, with the standard deviation serving as the unit of measurement, as shown in Figure B.6. About 68% of the scores in the distribution fall within plus or minus 1 standard deviation of the mean, while about 95% of the scores fall within plus or minus 2 standard deviations of the mean. Given this fixed pattern, if you know the mean and standard deviation of a normally distributed trait, you can tell where any score falls in the distribution for the trait.

Figure B.6 The normal distribution. Many characteristics are distributed in a pattern represented by this bell-shaped curve (each dot represents a case). The horizontal axis shows how far above or below the mean a score is (measured in plus or minus standard deviations), along with the corresponding percentile conversions. The vertical axis shows the number of cases obtaining each score. In a normal distribution, most cases fall near the centre of the distribution, so that about 68% of the cases fall within plus or minus 1 standard deviation of the mean. The number of cases gradually declines as one moves away from the mean in either direction, so that only about 13.6% of the cases fall between 1 and 2 standard deviations above the mean (and likewise below it), and even fewer (about 2.1% on each side) fall between 2 and 3 standard deviations from the mean.

Although you may not have realized it, you probably have taken many tests in which the scoring system is based on the normal distribution, such as IQ tests. On the SAT, for instance, raw scores (the number of items correct on each subtest) are converted into standard scores that indicate where a student falls in the normal distribution for the trait measured. In this conversion, the mean is set arbitrarily at 500 and the standard deviation at 100, as shown in Figure B.7. Therefore, a score of 400 on the SAT verbal subtest means that the student scored 1 standard deviation below the mean, while an SAT score of 600 indicates that the student scored 1 standard deviation above the mean. Thus, SAT scores tell us how many standard deviations above or below the mean a specific student's score was. This system also provides the metric for IQ scales and many other types of psychological tests (see the discussion of psychological testing in the body of the text).

Figure B.7 The normal distribution and SAT scores. The normal distribution is the basis for the scoring system on many standardized tests. For example, on the SAT, the mean is set at 500 and the standard deviation at 100. Hence, an SAT score tells you how many standard deviations above or below the mean a student scored. For example, a score of 700 means that the person scored 2 standard deviations above the mean.
Test scores that place examinees in the normal distribution can always be converted to percentile scores, which are a little easier to interpret. A percentile score indicates the percentage of people who score at or below a particular score. For example, if you score at the 75th percentile on an IQ test, 75% of the people who take the test score the same as or below you, while the remaining 25% score above you. There are tables available that permit us to convert any standard deviation placement in a normal distribution into a precise percentile score. Figure B.6 gives some percentile conversions for the normal curve.

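The same conversion can be done with the normal cumulative distribution function. The sketch below uses Python's statistics.NormalDist; the mean of 500 and standard deviation of 100 are the SAT-style values described above, and the example score of 600 is simply an illustration.

# Converting a test score to a standard (z) score and a percentile,
# assuming the trait is normally distributed (mean 500, SD 100, as on the SAT verbal subtest).
from statistics import NormalDist

mean, sd = 500, 100
dist = NormalDist(mu=mean, sigma=sd)

raw_score = 600                         # illustrative score, 1 SD above the mean
z = (raw_score - mean) / sd             # standard score: distance from the mean in SD units
percentile = dist.cdf(raw_score) * 100  # percentage of scores at or below this score

print(f"z = {z:+.2f}, percentile = {percentile:.1f}")  # about the 84th percentile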
Of course, not all distributions are normal. As we saw in Figure B.3, some distributions are skewed in one direction or the other. As an example, consider what would happen if a classroom exam were much too easy or much too hard. If the test were too easy, scores would be bunched up at the high end of the scale, as in Figure B.3(b). If the test were too hard, scores would be bunched up at the low end, as in Figure B.3(c).

Measuring Correlation

To determine whether TV viewing is related to SAT scores, the students have to compute a correlation coefficient: a numerical index of the degree of relationship between two variables. As discussed in the body of the text, a positive correlation means that two variables (say, X and Y) co-vary in the same direction. This means that high scores on variable X are associated with high scores on variable Y and that low scores on X are associated with low scores on Y. A negative correlation indicates that two variables co-vary in the opposite direction. This means that people who score high on variable X tend to score low on variable Y, whereas those who score low on X tend to score high on Y. In their study, the psychology students hypothesized that as TV viewing increases, SAT scores will decrease, so they should expect a negative correlation between TV viewing and SAT scores.

The magnitude of a correlation coefficient indicates the strength of the association between two variables. The coefficient, usually represented by the letter r, can vary from -1.00 to +1.00. A coefficient near 0 tells us that there is no relationship between two variables. A coefficient of +1.00 or -1.00 indicates that there is a perfect, one-to-one correspondence between two variables. A perfect correlation is found only rarely when working with real data. The closer the coefficient is to either +1.00 or -1.00, the stronger the relationship is.

The direction and strength of correlations can be illustrated graphically in scatter diagrams (see Figure B.8). A scatter diagram is a graph in which paired X and Y scores for each subject are plotted as single points.

Figure B.8 Scatter diagrams of positive and negative correlations. Scatter diagrams plot paired X and Y scores as single points. Plots slanted in opposite directions result from positive (top row) as opposed to negative (bottom row) correlations. Moving across both rows (to the right), you can see that progressively weaker correlations result in more and more scattered plots of data points.
Figure B.8 shows scatter diagrams for positive correlations in the upper half and for negative correlations in the bottom half. A perfect positive correlation and a perfect negative correlation are shown on the far left. When a correlation is perfect, the data points in the scatter diagram fall exactly in a straight line. However, positive and negative correlations yield lines slanted in opposite directions because the lines map out opposite types of associations. Moving to the right in Figure B.8, you can see what happens when the magnitude of a correlation decreases. The data points scatter farther and farther from the straight line that would represent a perfect relationship.

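A scatter diagram is straightforward to produce in code. The sketch below plots hypothetical paired X (TV hours) and Y (SAT verbal) values with matplotlib; the numbers are invented for illustration, since the students' actual paired data are not reproduced here.

# Scatter diagram of paired X and Y scores (illustrative data only).
import matplotlib.pyplot as plt

tv_hours = [0, 1, 2, 2, 3, 3, 4, 5, 5, 6]                       # X: estimated hours of TV per day
sat_scores = [620, 570, 590, 540, 560, 500, 520, 480, 530, 450]  # Y: SAT verbal scores

plt.scatter(tv_hours, sat_scores)
plt.xlabel("Estimated hours of TV viewing per day")
plt.ylabel("SAT verbal score")
plt.title("Scatter diagram (hypothetical data)")
plt.show()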
What about the data relating TV viewing to SAT scores? Figure B.9 shows a scatter diagram of these data. Having just learned about scatter diagrams, perhaps you can estimate the magnitude of the correlation between TV viewing and SAT scores. The scatter diagram of these data looks a lot like the one shown in the bottom right corner of Figure B.8, suggesting that the correlation will be negative and fairly weak.

The formula for computing the most widely used measure of correlation, the Pearson product-moment correlation, is shown in Figure B.10, along with the calculations for the data on TV viewing and SAT scores. The data yield a small negative correlation. This coefficient reveals that there is a weak inverse association between TV viewing and performance on the SAT. Among the sample of participants, as TV viewing increases, SAT scores decrease, but the trend isn't very strong. We can get a better idea of how strong this correlation is by examining its predictive power.

Figure B.9 Scatter diagram of the correlation between TV viewing and SAT scores. The hypothetical data relating TV viewing to SAT scores are plotted in this scatter diagram. Compare it to the scatter diagrams shown in Figure B.8 and see whether you can estimate the correlation between TV viewing and SAT scores in the students' data (see the text for the answer).
Figure B.10 Computing a correlation coefficient. The calculations required to compute the Pearson product-moment coefficient of correlation are shown here. The formula is

r = [N(ΣXY) - (ΣX)(ΣY)] / √{[N(ΣX²) - (ΣX)²][N(ΣY²) - (ΣY)²]}

It looks intimidating, but it's just a matter of filling in the figures taken from the sums of the columns of X, Y, X², Y², and XY scores.

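The computational formula in Figure B.10 can be coded directly. The sketch below applies it to the same hypothetical paired scores used in the scatter diagram sketch above, and also squares r to obtain the coefficient of determination discussed in the next section.

# Pearson product-moment correlation using the computational formula in Figure B.10:
# r = [N*SumXY - SumX*SumY] / sqrt([N*SumX^2 - (SumX)^2] * [N*SumY^2 - (SumY)^2])
import math

x = [0, 1, 2, 2, 3, 3, 4, 5, 5, 6]                       # TV hours (illustrative)
y = [620, 570, 590, 540, 560, 500, 520, 480, 530, 450]   # SAT verbal scores (illustrative)

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2, sum_y2 = sum(v * v for v in x), sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(f"r = {r:+.2f}")             # negative: more TV goes with lower scores in this toy sample
print(f"r squared = {r * r:.2f}")  # coefficient of determination: proportion of variance predictable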
Correlation and Prediction

As the magnitude of a correlation increases (gets closer to either -1.00 or +1.00), our ability to predict one variable based on knowledge of the other variable steadily increases. This relationship between the magnitude of a correlation and predictability can be quantified precisely. All we have to do is square the correlation coefficient (multiply it by itself), and this gives us the coefficient of determination, the percentage of variation in one variable that can be predicted based on the other variable. Thus, a correlation of .70 yields a coefficient of determination of .49 (.70 × .70 = .49), indicating that variable X can account for 49% of the variation in variable Y. Figure B.11 shows how the coefficient of determination goes up as the magnitude of a correlation increases.

Unfortunately, a correlation as weak as the one the students found doesn't give us much predictive power; it accounts for only a small percentage of the variation in SAT scores. So, if they tried to predict individuals' SAT scores based on how much TV those individuals watched, their predictions wouldn't be very accurate. Although a low correlation doesn't have much practical, predictive utility, it may still have theoretical value. Just knowing that there is a relationship between two variables can be theoretically interesting. However, we haven't yet addressed the question of whether the observed correlation is strong enough to support the hypothesis that there is a relationship between TV viewing and SAT scores. To make this judgment, we have to turn to inferential statistics and the process of hypothesis testing.

Figure B.11 Correlation and the coefficient of determination. The coefficient of determination is an index of a correlation's predictive power. As you can see, whether positive or negative, stronger correlations yield greater predictive power.

Hypothesis Testing

Inferential statistics go beyond the mere description of data. Inferential statistics are used to interpret data and draw conclusions. They permit researchers to decide whether their data support their hypotheses. Earlier in the text, we showed how inferential statistics can be used to evaluate the results of an experiment; the same process can be applied to correlational data.

In the study of TV viewing, the students hypothesized that they would find an inverse relationship between the amount of TV watched and SAT scores. Sure enough, that's what they found. However, a critical question remains: Is this observed correlation large enough to support the hypothesis, or might a correlation of this size have occurred by chance? We have to ask a similar question nearly every time we conduct a study. Why? Because we are working with only a sample. In research, we observe a limited sample (in this case, the students in one class) to draw conclusions about a much larger population (students in general). In any study, there's always a possibility that if we drew a different sample from the population, the results might be different. Perhaps our results are unique to our sample and not generalizable to the larger population. If we were able to collect data on the entire population, we would not have to wrestle with this problem, but our dependence on a sample necessitates the use of inferential statistics to precisely evaluate the likelihood that our results are due to chance factors in sampling. Thus, inferential statistics are the key to making the inferential leap from the sample to the population (see Figure B.12).

Figure B.12 The relationship between the population and the sample. In research, we are usually interested in a broad population (the complete set), but we can observe only a small sample (a subset) drawn from that population. After making observations of our sample, we draw inferences about the population, based on the sample. This inferential process works well as long as the sample is reasonably representative of the population.

Although it may seem backward, in hypothesis testing we formally test the null hypothesis. As applied to correlational data, the null hypothesis is the assumption that there is no true relationship between the variables observed. In the students' study, the null hypothesis is that there is no genuine association between TV viewing and SAT scores. They want to determine whether their results will permit them to reject the null hypothesis and thus conclude that their research hypothesis (that there is a relationship between the variables) has been supported.

In such cases, why do researchers directly test the null hypothesis instead of the research hypothesis? Because our probability calculations depend on assumptions tied to the null hypothesis. Specifically, we compute the probability of obtaining the results that we have observed if the null hypothesis is indeed true. The calculation of this probability hinges on a number of factors. A key factor is the amount of variability in the data, which is why the standard deviation is an important statistic.

Statistical Significance

When we reject the null hypothesis, we conclude that we have found statistically significant results. Statistical significance is said to exist when the probability that the observed findings are due to chance is very low, usually fewer than 5 chances in 100. This means that if the null hypothesis is correct and we conducted our study 100 times, drawing a new sample from the population each time, we would get results such as those observed only 5 times out of 100. If our calculations allow us to reject the null hypothesis, we conclude that our results support our research hypothesis. Thus, statistically significant results typically are findings that support a research hypothesis.

The requirement that there be fewer than 5 chances in 100 that research results are due to chance is the minimum requirement for statistical significance. When this requirement is met, we say the results are significant at the .05 level. If researchers calculate that there is less than 1 chance in 100 that their results are due to chance factors in sampling, the results are significant at the .01 level. If there is less than a 1 in 1,000 chance that findings are attributable to sampling error, the results are significant at the .001 level. Thus, there are several levels of significance that you may see cited in scientific articles.

Because we are dealing only in matters of probability, there is always the possibility that our decision to accept or reject the null hypothesis is wrong. The various significance levels indicate the probability of erroneously rejecting the null hypothesis (and inaccurately accepting the research hypothesis). At the .05 level of significance, there are 5 chances in 100 that we have made a mistake when we conclude that our results support our hypothesis, and at the .01 level of significance, the chance of an erroneous conclusion is 1 in 100. Although researchers hold the probability of this type of error quite low, the probability is never zero. This is one of the reasons that competently executed studies of the same question can yield contradictory findings. The differences may be due to chance variations in sampling that can't be prevented.

What do we find when we evaluate the data linking TV viewing to students' SAT scores? The calculations indicate that, given the sample size and the variability in the data, the probability of obtaining a correlation of the observed size by chance is greater than 5%. That probability is not especially high, but it's not low enough to reject the null hypothesis. Thus, the findings are not strong enough to allow us to conclude that the students have supported their hypothesis.
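If the students wanted to carry out this inferential step in software rather than by hand, one common approach is scipy.stats.pearsonr, which returns both the correlation and the probability of obtaining a correlation that large by chance if the null hypothesis is true. The sketch below reuses the illustrative data from the earlier examples, not the class's actual scores, and simply shows the mechanics of the decision rule.

# Testing whether a correlation is statistically significant.
# pearsonr returns r and the two-tailed p-value under the null hypothesis of no true relationship.
from scipy import stats

tv_hours = [0, 1, 2, 2, 3, 3, 4, 5, 5, 6]                       # illustrative data
sat_scores = [620, 570, 590, 540, 560, 500, 520, 480, 530, 450]

r, p_value = stats.pearsonr(tv_hours, sat_scores)

print(f"r = {r:+.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Significant at the .05 level: reject the null hypothesis.")
else:
    print("Not significant at the .05 level: the null hypothesis cannot be rejected.")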
Statistics and Empiricism

In summary, conclusions based on empirical research are a matter of probability, and there's always a possibility that the conclusions are wrong. However, two major strengths of the empirical approach are its precision and its intolerance of error. Scientists can give you precise estimates of the likelihood that their conclusions are wrong, and because they're intolerant of error, they hold this probability extremely low. It's their reliance on statistics that allows them to accomplish these goals.

Key Terms

Coefficient of determination
Correlation coefficient
Descriptive statistics
Frequency distribution
Frequency polygon
Histogram
Inferential statistics
Mean
Median
Mode
Negatively skewed distribution
Normal distribution
Null hypothesis
Percentile score
Positively skewed distribution
Scatter diagram
Standard deviation
Statistical significance
Statistics
Variability