Inferential Statistics for Radiation Scientists: A Brief Guide to Better Statistical Interpretations

Journal of Medical Imaging and Radiation Sciences / Journal de l'imagerie médicale et des sciences de la radiation 43 (2012)
Directed Reading Article

Inferential Statistics for Radiation Scientists: A Brief Guide to Better Statistical Interpretations

Yves Bureau, PhD a*
a University of Western Ontario, St Joseph's Health Care, Lawson Health Research Institute, London, Ontario, Canada
* Corresponding author: Yves Bureau, PhD, University of Western Ontario, St Joseph's Health Care London, Lawson Health Research Institute, 288 Grosvenor Street, London, Ontario, N6A 4V2. E-mail address: ybureau@uwo.ca

LEARNING OBJECTIVES
- Distinguish between descriptive and inferential statistics.
- Describe the Monte Carlo method. This is important because it ensures that the investigator understands what P values mean.
- Understand estimation. By understanding estimation and the error that results from it, the investigator will better appreciate statistical results.
- Understand rare events. This is related to the point above and is crucial for hypothesis testing; without this understanding, any interpretation of results is merely the memorization of rules.
- Understand power: what it is and how to interpret it. The reader will gain a greater command of this concept.
- Understand effect size. The Type 1 error is often used, erroneously, as a measure of the effectiveness of a treatment; other measures should be examined as well.
- Interpret outputs from PASW (SPSS).
- Critically evaluate data using the Type 1 error, effect size, and power together.
- Finally, through examples, analyze and interpret the results from an experiment.

ABSTRACT
Inferential statistics is used to help investigators make decisions about their data. This package will help novice researchers understand how to think about inferential statistics and offers examples of specific statistical tests. Presented here is the Monte Carlo technique, an interesting approach to teaching statistics. It is used as a practical way to show why distributions are important and how they relate to the famous .05 probability criterion for declaring results significant. Also presented is how to conduct and interpret the t test, the F test (often referred to as analysis of variance), and interaction analysis. Finally, a discussion of misinterpretations is included to help prevent erroneous statements about statistical analyses.

RÉSUMÉ
Les statistiques déductives aident les chercheurs à prendre des décisions sur leurs données. Cette trousse aidera les chercheurs débutants à comprendre les statistiques déductives et présente des exemples de tests statistiques particuliers. On y trouve la technique Monte Carlo, une approche intéressante des statistiques. Il s'agit d'une façon pratique de démontrer l'importance des distributions et comment elles se relient au fameux critère de probabilité de 0,05 pour la pertinence des résultats. On y trouve aussi l'exécution et l'interprétation du test t et du test F, souvent considéré comme une analyse des variances, et de l'interprétation des interactions. Enfin, on aborde la question des fausses interprétations, dans un objectif d'aide à la prévention des déclarations erronées sur l'analyse statistique.

Introduction

Descriptive statistics describe, which is perfectly useful and appropriate when all that is wanted is to summarize the data. However, the moment an investigator asks questions about populations, an inference is being made. Answering those questions requires something called inferential statistics.
Before inferential statistics, the best an investigator could do was to observe the data and make a blanket statement based on the sample characteristics. It ended up being a best guess. If an investigator were to determine whether or not two groups in an experimental design had different means from one another, all that could be done was to visually judge whether the two group means were different. For lack of a better term, we could call that decision-making process or test the "ocular trauma test." The issue with that type of decision making is that there is a probability of being incorrect, because it is entirely possible that the means differ due to random error. The subjects that make up the two groups might come from the same population, but because of error in choosing the subjects, the means end up being different. In this article, it will be explained in detail that samples are always different from one another due to random sampling error. As a result, better ways are needed to help

investigators make decisions about their data. Thus, inferential statistics were born. No one can be 100% certain that a sample accurately represents the population. We will learn that this is sampling error, and we will have to wrestle with it because samples rarely measure an entire population. This article will deal with random error and how to make inferences. It may seem odd that random error is so central to this document when the title of this article is Inferential Statistics for Radiation Scientists: A Brief Guide to Better Statistical Interpretations. However, to understand how inferential statistics work, it is essential to understand error. Once this is understood, it becomes possible to appreciate inferential statistics, because they always incorporate error.

Descriptive Statistics

Barker [1] introduced descriptive statistics in detail. Description of our data provides us with data summaries. However, these summaries are guesses as to what the population looks like, and therefore are inherently inaccurate when extrapolating to the population. It is known that there will be some error in estimation; this error must be taken into consideration when making decisions about hypotheses. A number of things are known about error [2].

1. It is a function of random sampling. Random sampling means that we sample a population without bias. If the sampling were perfectly performed, it would be possible to obtain the same mean and standard deviation as the population. However, that never happens. This inaccuracy is called error in random sampling, or random sampling error.
2. Sample size is crucial with respect to error when estimating population parameters (such as the mean). The greater the sample size, the less error there is in estimation. If there were access to all subjects in a population there would be no error of estimation, and therefore no need for inferences.
3. Error can be due to instrument imprecision. This point is not crucial to the arguments of this paper. However, the more error the instrument introduces when measuring, the more difficult it is to accurately describe the population.

Estimations and the Monte Carlo Method

Monte Carlo (MC) methods provide approximate solutions to a variety of mathematical problems by performing statistical sampling experiments. This process involves performing many simulations using random numbers and probability to get an approximation of the answer to a problem. The name comes from the Monte Carlo district of Monaco, which is known for its roulette tables. Because roulette tables are great random number generators, the MC method borrowed its name from this location [3, 4]. The purpose of MC methods in this article is to help us understand how sample size influences the accuracy of estimating population means. To perform an MC study, computers are used to generate a population of randomly selected numbers that represent some variable [5]. From this population, a sample of any given number of subjects can be obtained by random selection. After the mean of the sample is calculated, the sample is returned to the population, a process called sampling with replacement. This process is repeated as often as necessary, sometimes as few as 2,000 times, but there are no limits. Of interest for us will be the resulting distribution of means as opposed to individual scores. With small samples, means tend to be more variable compared with distributions from large samples. Consequently, with small samples, it is more likely that we will sample from the extremes of the population.
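To make the sampling-with-replacement procedure concrete before the numbered steps below, here is a minimal sketch in Python (an illustration only; it assumes numpy is available, whereas the article itself used MC2G and PASW, described later). It draws repeated samples from a simulated population and records the mean of each sample, for three sample sizes.

```python
# Monte Carlo sketch: distribution of sample means at three sample sizes.
import numpy as np

rng = np.random.default_rng(seed=1)
population = rng.normal(loc=0.0, scale=1.0, size=10_000)   # mean 0, SD 1

n_samplings = 10_000
for sample_size in (4, 32, 1_000):
    means = np.array([
        rng.choice(population, size=sample_size, replace=True).mean()
        for _ in range(n_samplings)
    ])
    # Larger samples give a narrower distribution of means (less sampling error).
    spread = means.max() - means.min()
    print(f"n = {sample_size:>4}: SD of sample means = {means.std():.3f}, range = {spread:.3f}")
```

Plotting `means` as a histogram for each sample size reproduces the pattern described in Figure 1: the larger the sample, the narrower the distribution of means.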
In short, here is how an MC simulation for the distribution of a statistic is conducted.

1. First, a population of a given number of scores with a fixed mean and standard deviation is generated.

2. If a distribution of means is to be constructed, a specific sample size is chosen.
3. The computer then randomly samples the constructed population and a mean is calculated.
4. The scores or subjects used for the first sampling are returned to the population.
5. The sampling is repeated as often as specified by the investigator.
6. The means are plotted to show the distribution using a histogram.
7. The investigator can then observe this distribution or conduct statistical analyses by comparing distributions (distribution comparisons are beyond the scope of this article).

Below are distributions of means generated by MC2G (Monte Carlo Analysis for 1 or 2 Groups, ver. 5.07, Brooks, 2005, Aspen, Ohio, index.htm) and graphed using PASW version 18 (SPSS Inc, IBM, Chicago, IL). PASW was formerly known as SPSS and has reverted to that name in later versions. To begin the MC analysis, MC2G generated a population of 10,000 individual scores with a mean of zero and a standard deviation of one (μ = 0, σ = 1). The number of scores was chosen arbitrarily; it could easily have been one million, but for the purposes here, 10,000 scores is a sufficiently large population. From this population, a distribution of 10,000 samples was randomly obtained and graphed. Three distributions were produced from samples of 4, 32, and 1,000 scores (Figure 1A, B, C). It can be seen from these MC simulations that the range of the distribution of means decreases as sample size increases. With a sample size of 1,000 scores, the range in means is negligible. This demonstrates that population estimates are more accurate with larger sample sizes. This has serious implications when conducting statistical analyses: sampling from two different populations using small sample sizes will result in greater error, because the distribution of mean differences will be variable.

Figure 1. Three distributions of means constructed using Monte Carlo simulations. (A, B, C) Constructed using sample sizes of 4, 32, and 1,000 subjects, respectively. The larger the sample size, the less range in the distribution of means.

To emphasize the point that sample size matters with respect to estimation, MC simulations were conducted in which two means were randomly sampled from the same population. This is called sampling under the null hypothesis, which is to say that there are no differences between the means in the population. It would be expected, then, that the means are equal for every sampling. However, because of random error the sample means will not be the same [6]. Figure 2 shows MC simulations for two means from the same population; they clearly show that as sample size increases, the distribution of mean differences has less variability. This is similar to saying that the range of mean differences is reduced with increased sample size. This phenomenon is taken seriously when discussing the statistical tests used for inference.

Figure 2. Three distributions of mean differences constructed using Monte Carlo simulations. (A, B, C) Constructed using sample sizes of 4, 32, and 1,000 subjects per group, respectively. The larger the sample sizes, the less range in the distribution of mean differences.

Inferential Evaluation of Data

Inferential statistics are the basis for decision-making after experimental or associational studies. As noted, this article limits itself to a few experimental designs. What follows are explanations of how to use these test tools and how to interpret them. The t test and the F test will be the subjects of the next few sections. To best understand how to use these tools, short explanations of key concepts will be presented first.
Independent and Dependent Variables

Again, Barker [1] should be consulted, but a short description is presented here. The distinction between independent variables (IVs) and dependent variables (DVs) is simple but crucial for experimental designs and the accompanying statistical analysis. This is important whether you are conducting the analysis yourself or consulting with a statistician [7]. IVs are variables that have categorization. Most of the time an IV will denote treatment groups or some distinguishing characteristic. However, a proper IV is a manipulated variable [4]: these are the variables for which subjects are randomly allocated to groups.

For example, if an investigator wishes to determine whether or not a drug will lower cholesterol levels, the investigator will randomly allocate subjects to groups. Obviously, the investigator is manipulating that variable. Random allocation is important here because it ensures that all confounding variables are equally distributed across the groups. This ensures that any observed differences between the groups will be due to the manipulation and not to a disproportionate representation of some confounding characteristic.

Dependent variables (DVs) are the measures influenced by the manipulation. For example, the cholesterol drug will influence the level of cholesterol in the blood. Cholesterol, then, is the DV, and its concentration depends on the group from which it is measured. Thinking in terms of the DV depending on the IV's manipulation is one way to remember the difference between the DV and the IV.

Rare Events

Rare events are events that occur less frequently than others. In statistics, a rare event is a statistic that has less than a predetermined probability of occurring. Traditionally, we view a statistic that has a probability of less than 5% (P < .05) of being observed as rare. The critical values shown for the t test (described later) are the cutoffs at specific probabilities of observing a t statistic (Table 1). Any observed value within and including the critical values is considered common and therefore not rare. Please keep this in mind as you read the following sections.

Table 1. Critical values for the t test at various degrees of freedom, for one-tailed and two-tailed alphas, from small df up to infinity. As the degrees of freedom (df) increase, the critical values decrease. Also, as the acceptable Type 1 error increases, the critical value decreases. Adapted from StatSoft.

Hypotheses

Hypotheses are the questions to be answered by experiments or studies. They are made in reference to populations; after all, the population is what we are trying to estimate. The null hypothesis is the one that states there is no effect in the population, whereas the alternate hypothesis makes a statement about an effect. Directional hypotheses assume that the investigator is interested in making a statement that the effect will go in some direction. This is the more powerful hypothesis. A non-directional hypothesis is stated when the investigator suspects that there will be an effect but has no knowledge with which to propose a direction. This type of hypothesis is less powerful but is often the best to make because of its conservative nature. However, some would debate that statement as advocating inaccuracy in place of proper testing. It is the view of this author that non-directional hypotheses reduce the chances of reporting nonreproducible results.

Ho = null hypothesis. Ha = alternate hypothesis.

Type 1 and Type 2 Errors

All hypotheses can be erroneously rejected. A Type 1 error is the probability of rejecting the null hypothesis when the null hypothesis is correct. It is possible to conclude that an effect is observed when in fact only a random phenomenon has been detected. In such a case the conclusion is wrong and an unacceptable Type 1 error has been committed. The acceptable Type 1 error is less than .05. In other words, it is acceptable to be wrong as long as the probability of being wrong is less than .05. This value of .05 is arbitrary; it could easily have been .10, but the statistical world seems to have settled on .05. In any experiment, the probability of committing a Type 1 error ranges between 0 and 1.0 for any separate analysis.
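To see what this 5% criterion means in practice, the following sketch (an illustration only; it assumes numpy and scipy are installed) repeatedly samples two groups from the same population, exactly the null-hypothesis situation simulated earlier, and counts how often an independent-samples t test is declared significant. Because the null hypothesis is true here, every rejection is a Type 1 error, and about 5% of the simulated experiments produce one.

```python
# Simulated Type 1 error rate: both groups come from the same population,
# so any "significant" result is a false rejection of the null hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
n_experiments, n_per_group = 10_000, 10

false_rejections = 0
for _ in range(n_experiments):
    group_a = rng.normal(0.0, 1.0, n_per_group)
    group_b = rng.normal(0.0, 1.0, n_per_group)   # same population as group_a
    _, p = stats.ttest_ind(group_a, group_b)
    if p < 0.05:
        false_rejections += 1                     # a Type 1 error

print(f"Observed Type 1 error rate: {false_rejections / n_experiments:.3f}")   # close to .05
```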
A Type 2 error is the probability of failing to reject the null hypothesis when in fact there is an effect. The null hypothesis should be rejected, but for whatever reason the effect was not detected. Neither error is desirable or acceptable, nor would this author suggest that one is more acceptable than the other.

Degrees of Freedom

We will be somewhat non-mathematical in the explanation of degrees of freedom (df). In most statistical textbooks for the behavioral sciences the explanations are relatively unsatisfactory, and it will remain so here; in mathematical texts the explanations are complex, with sophisticated algebraic proofs. The explanation used here is from Hays [2]. Samples are used as estimates of populations. An inherent bias is that samples underestimate the variance in the population. However, one way of reducing this bias, which is a source of error, is to calculate the variance using df rather than sample size. According to Hays [2], the sum of the deviations of scores from the mean of any population must be zero. This fact has consequences. Suppose that you are told that N = 4 in

a sample and that you are to guess the four deviations from the mean. For the first deviation you can guess any number, and suppose you say d1 = 6, for the second d2 = -9, and for the third d3 = -7. However, when you come to the fourth deviation value, you are no longer free to guess any number you please. The value of d4 must equal d4 = 0 - d1 - d2 - d3 = 10. In short, given the values of any N-1 deviations from the mean, which could be any set of N-1 numbers, the value of the last deviation is completely determined. Therefore, we say that there are N-1 df for a sample variance, which is the average squared deviation.

Power

Power is the probability of rejecting the null hypothesis at a predetermined alpha if we were to repeat the experiment with all conditions remaining constant. It could be said that this is the ability of a test to detect effects in the population. This is true for tests of means or associations. The power of an experiment depends on:

1) The effect size: Effect size is the degree to which variation in the data is not due to random error. In designs investigating mean differences, the degree to which means differ relative to error variance is what matters. This is a little different from the discussion that will come later, but suffice it to say that this definition works perfectly when describing tests of means, as shown by Cohen [8].
2) Sample size: As discussed previously, sample size is associated with error in estimation. The larger the sample size, the narrower the distribution of means.
3) Sample variance: Sample variance is quite important. The greater the sample variance, the greater the range in the distribution of means, which translates into greater error in estimating the population mean. Therefore, the likelihood of rejecting the null hypothesis decreases, which results in less power.
4) Directional or non-directional hypothesis: A hypothesis can be directional depending on the investigator's knowledge of the effect. If the direction is known, the test will be more powerful. The investigator must decide on a one-tailed (directional) or two-tailed (non-directional) test. A one-tailed test has smaller critical values (cutoff scores), resulting in a greater probability of rejecting the null hypothesis (more power). For example, if you had six subjects per group for two groups with different means (0 vs. 1), a two-tailed test using the t test would require an observed t of greater than 2.23, whereas a one-tailed test would require a smaller t (Figure 3).

Figure 3. (A) The critical value for a directional hypothesis. The one-tailed test would require that the observed t statistic be greater than the value shown. (B) The critical value for a non-directional hypothesis. The two-tailed test would require that the observed t statistic be greater than the value shown. It should be noted that the rare ts are at one tail of the distribution for the one-tailed test but equally distributed at both tails for the two-tailed test.
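The influence of sample size on power can also be estimated by simulation. The sketch below (an illustration only; it assumes numpy and scipy, and the population means of 0 and 1 echo the hypothetical example in point 4 above) treats power as the proportion of simulated experiments in which the null hypothesis is rejected.

```python
# Estimating power by simulation: groups are drawn from populations whose
# means truly differ (0 vs. 1, SD = 1); power is the proportion of simulated
# experiments in which a two-tailed t test rejects the null hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
n_experiments = 5_000

for n_per_group in (6, 20, 60):
    rejections = 0
    for _ in range(n_experiments):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(1.0, 1.0, n_per_group)
        _, p = stats.ttest_ind(control, treated)
        if p < 0.05:
            rejections += 1
    print(f"n = {n_per_group:>2} per group: estimated power = {rejections / n_experiments:.2f}")
```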

Experimental Design and Associated Statistics

One of the simplest types of experiments comprises a treatment group plus a control group. In such an experiment an investigator would be interested in determining whether or not one group mean significantly differs from the other with respect to some measure. Say that the investigator is interested in showing the effectiveness of some cancer drug at shrinking tumors in the lung. The investigator could measure tumor size before and after treatment and calculate the mean differences for both groups. The means of these differences would then be compared. However, as already discussed, group differences calculated from samples vary because of random sampling error. Therefore, determining whether or not two groups differ in the population requires that we know the range of some statistic. Let's begin by exploring the t test's t statistic, a relevant inferential statistic for the present example.

The t-test

All inferential test statistics incorporate error within their formulas. Most do so by calculating a ratio of signal to noise. The signal is the effect of any treatment, and the noise is the variability in the data that cannot be explained by the investigator's manipulations. For the example given previously, the signal would be calculated using the mean difference between groups, whereas the noise is calculated using individual differences between scores within the groups. Such a ratio for our example would be the t ratio. Figure 4 shows the equation for the t statistic. Notice that it is a ratio, as described previously, with the difference between group means divided by the standard error of mean differences (the standard deviation of differences between means). The variation of mean differences is estimated from differences between individuals, which is our measurement of random fluctuation.

Figure 4. A standard formula for calculating a two-sample independent t test. Of interest here is that the difference between means is equated to signal and the standard error of mean differences is equated to noise, which gives a useful perspective on statistical analysis:

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{\text{difference between means}}{\text{standard error of mean differences}} = \frac{\text{signal}}{\text{noise}}

where s^2 = group variance, \bar{x} = group mean, and n = group sample size.

Once the calculations are complete, the resulting ratio has to be evaluated. This will dictate whether or not the null hypothesis is rejected. To make those statements of significance, it is necessary to compare the obtained t to a frequency distribution of t statistics under the null hypothesis, called the theoretical distribution (Figure 5). The null hypothesis states that there is no difference between groups in the population. This distribution shows that most ts cluster near the mean, which is zero, whereas larger ts are fewer in number and farther from the mean. It is important to consider sample size as well. The larger the sample size, the smaller the range of ts, which in turn influences the probability of obtaining, by chance, a t as large as the one calculated from our experiment. For an experiment conducted with four subjects per group, there is a probability of less than .05 (5%), under the null hypothesis, of obtaining a t beyond 2.44 or below -2.44, whereas the corresponding cutoff would be 2.02 if the experiment were conducted with 20 subjects per group (Table 1). Therefore, an experiment with 20 subjects per group would be more powerful, which is to say we would be more likely to reject the null hypothesis. The distributions look somewhat different at different degrees of freedom. A t statistic obtained from an experiment with four subjects per group should not be compared to a distribution of ts constructed with sample sizes of 20; you would be falsely tempted to conclude that your experiment worked. This is why we compare an obtained statistic to a distribution of ts obtained using the same df as our experiment.

Figure 5. A theoretical distribution of ts. This distribution was constructed using a large sample size; consequently, it resembles the Z distribution. Represented by beta (β) are the t values considered common under the null hypothesis, bounded by and including the two critical values. All other values are considered rare and are observed at the extremes (tails) of the distribution; those values, beyond the critical values, are represented by alpha (α).
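As a concrete illustration of the signal-to-noise idea, the sketch below (made-up scores; it assumes numpy and scipy are installed, which the article itself does not use) computes the t ratio directly from the Figure 4 formula, checks it against scipy, and compares it with the two-tailed critical value.

```python
# Two-sample t by hand (Figure 4 formula) and with scipy, plus the critical value.
import numpy as np
from scipy import stats

group1 = np.array([4.1, 5.0, 3.8, 4.6, 5.2, 4.4])    # hypothetical scores
group2 = np.array([5.9, 6.3, 5.1, 6.8, 5.7, 6.0])

signal = group1.mean() - group2.mean()                # difference between means
noise = np.sqrt(group1.var(ddof=1) / len(group1) +    # standard error of the
                group2.var(ddof=1) / len(group2))     # difference between means
t_by_hand = signal / noise

t_scipy, p_two_tailed = stats.ttest_ind(group1, group2)   # equal group sizes, so this matches
df = len(group1) + len(group2) - 2
t_critical = stats.t.ppf(1 - 0.05 / 2, df)            # two-tailed cutoff at alpha = .05

print(f"t by hand = {t_by_hand:.2f}, t from scipy = {t_scipy:.2f}, p = {p_two_tailed:.4f}")
print(f"Reject the null hypothesis if |t| exceeds {t_critical:.2f} (df = {df})")
```

With six subjects per group (df = 10), the cutoff printed here is the 2.23 quoted in the Power section above.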
Example with Analysis

A radiation scientist is interested in using a rat model of cancer to determine whether various radiation dosages can reduce the burn area in patients being treated for cancer without compromising the effectiveness of treatment. The investigator used two dosages in the experiment. Through imaging, one epithelial tumor was targeted and then exposed to radiation. Each rat was then randomly allocated to a dosage group (n = 10/group): 1) 60 Gy or 2) 80 Gy. In this experiment the dependent measures are: 1) the reduction in

percentage of the tumor from the beginning to the final radiation exposure and 2) the area of burn on the skin in mm². All statistical analyses for this example were conducted using PASW version 18 (Chicago, IL). Fictitious data were generated for this example using the literature for ranges. As seen below, there are two sets of hypotheses because there are two dependent variables.

Null and Alternate Hypotheses

H0 = null hypothesis
Ha = alternate hypothesis

First set of hypotheses:
H0: the larger dosage will not be more effective in reducing tumor size.
Ha: the larger dosage will be more effective in reducing tumor size.

Second set of hypotheses:
H0: the larger dosage will not result in a larger skin burn area.
Ha: the larger dosage will result in a larger skin burn area.

Both sets of hypotheses are directional. It is reasonable to suppose that higher dosages of radiation would kill more tumor cells and result in more burn. Because PASW conducts two-tailed tests only, the observed Type 1 error must be divided by two to be interpreted as a one-tailed test.

As Table 2 shows, PASW prints out descriptive as well as inferential statistics, which is quite useful when summarizing the data and making inferences. PASW displays a number of statistics, but only the relevant outputs were selected here. For the descriptive statistics, the sample size, mean, standard deviation, and standard error of the mean are shown. There are no associated P values, nor is there any possibility of making statements of significance from that information alone. With respect to the table titled Independent Samples Test, all the information needed to make statements of significance is available: first the t value, then the df, followed by significance (Sig.; two-tailed), or the probability of making a Type 1 error. The analysis showed that there was no significant difference between dosage groups with respect to percent change in tumor size (t(18), P = NS), but there was a significantly greater area of skin burn in the 80 Gy group, t(18) = -8.50, P < .001. Those statements can be made because the Sig. value is less than .05 for skin burn area but greater than .05 for percent change in tumor size. Therefore, according to these fictitious data, both radiation dosages have equivalent treatment efficacy but different burn effects.

When reporting results, the t value, df, and Type 1 error must be reported. That being said, some journal editors instruct authors to report the P value only, as a space-saving measure or to avoid visual clutter. This author recommends reporting all that is possible. Finally, the probability of a Type 1 error should be reported as a cutoff (e.g., < .05, < .01, or < .001) rather than an exact value. If this study were repeated, the P value would not equal that of the first study; therefore, a cutoff is more reasonable.

Table 2. Group Descriptive Statistics and Inferential Statistics Output from PASW (SPSS)
A. Group Statistics: for each dependent measure (percent change in tumor size; area of skin burned by treatment) and each dosage group (60 Gy, 80 Gy), the output lists n, mean, standard deviation, and standard error of the mean.
B. Independent Samples Test (t test for equality of means): for each dependent measure, with equal variances assumed and not assumed, the output lists t, df, and significance (2-tailed).
Note. For the first analysis, dose did not influence tumor size (P > .05, or P = NS, not significant), as indicated by the significance value (P = .072). Dose did influence burn area, as shown by a significance value of .000 (P < .001).
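For readers reproducing this kind of analysis outside PASW, the sketch below (hypothetical numbers, not the article's data; it assumes numpy and scipy) shows the same logic: run the independent-samples t test and either halve the two-tailed P for a directional hypothesis, as described above, or ask for a one-sided test directly.

```python
# Directional (one-tailed) interpretation of an independent-samples t test.
import numpy as np
from scipy import stats

burn_60gy = np.array([12.1, 10.8, 13.0, 11.5, 12.4, 11.9, 12.7, 10.9, 13.2, 11.6])  # hypothetical mm^2
burn_80gy = np.array([15.9, 16.4, 14.8, 17.1, 15.3, 16.0, 16.8, 15.5, 14.9, 16.2])

t, p_two_tailed = stats.ttest_ind(burn_60gy, burn_80gy)
p_one_tailed = p_two_tailed / 2          # valid only when t falls in the predicted direction

# SciPy (1.6+) can also report the one-sided P directly:
t_dir, p_dir = stats.ttest_ind(burn_60gy, burn_80gy, alternative="less")

print(f"t(18) = {t:.2f}, two-tailed P = {p_two_tailed:.4f}, one-tailed P = {p_one_tailed:.4f}")
```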

The F Test (Analysis of Variance) and Multiple Group Analysis

Multiple group analysis is always a problem. In order to fully analyze these designs, multiple comparisons between groups must be performed. However, with each comparison a Type 1 error can be committed. If an experiment has three conditions, three pairs of comparisons must be conducted in order to fully analyze the experiment, resulting in a Type 1 error for each comparison. Thus, the cumulative Type 1 error (.05 × 3), called the familywise error, is .15. This means that the probability of observing at least one significant difference between group means purely by chance is .15, which is an unacceptable error. To get around the familywise error, the F test must be performed. The F test is an omnibus test that provides us with a statistic capable of determining whether or not there are any group differences with one test as opposed to many. Of interest is that this Type 1 error would be .05, and not .15 as indicated previously.

Figure 6 shows equations for the sample variance and for the analysis of variance (ANOVA). There are a number of ways to understand the F test, but the focus here will be predominantly conceptual. Sample variance focuses on random changes between individuals; this variation is not due to any manipulated variables. Interestingly, the same concept can be used to evaluate differences between groups. As means between groups differ, variance between groups increases, which is an excellent way to detect an effect. You will notice that the ratio for the F statistic is variation between groups divided by variation between individuals, providing us with a signal-to-noise ratio. If an F statistic is considered significant at the .05 level, the null hypothesis of equal means between groups is rejected. Thus, no matter the number of treatment groups in the experiment, we have one statistic and a probability of .05 of making a Type 1 error.

Figure 6. The formulas for the sample variance and for the F test, also known as the analysis of variance:

s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} (sample variance, or variance between individuals)

F = \frac{\sum_j n_j (\bar{x}_j - \bar{x}_{..})^2 / (J - 1)}{\sum_{ij} (x_{ij} - \bar{x}_j)^2 / (N - J)} = \frac{MS_{BG}}{MS_{Error}} = \frac{\text{signal}}{\text{noise}}

(J = number of groups, N = total number of scores, n_j = number of scores in group j, \bar{x}_{..} = grand mean). It is evident that the F-test formula is in fact variance over variance: the variance from one source (effect variance) over that of random error. Consequently, a large F that is interpreted as rare indicates that the effect is significantly larger than error, allowing the investigator to reject the null hypothesis.

However, the F test or ANOVA is an omnibus test. In statistics, omnibus is interpreted as determining whether or not an effect is present at all. This test does not provide information about specific group differences. To make that distinction, post hoc tests must be used. These tests take multiple comparisons into consideration and either adjust the observed probabilities for each analysis or use statistics that are more conservative than separate t tests. Two post hoc tests will be discussed here. The Bonferroni method is one of the oldest methods of controlling the familywise error. Because the Type 1 error is compounded with every comparison made, the per-comparison alpha level is adjusted by dividing the familywise alpha by the number of comparisons made. The investigator then compares the P value associated with each obtained t test to the alpha calculated by the Bonferroni procedure. For example, if an experiment has four groups, there are six possible comparisons; the alpha of .05 would be divided by 6, resulting in a new per-comparison alpha of approximately .0083. This ensures that the familywise error is always .05.
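A sketch of this workflow (hypothetical data; it assumes numpy and scipy) runs the omnibus F test first and only then makes the pairwise comparisons, each judged against a Bonferroni-adjusted alpha.

```python
# Omnibus one-way ANOVA followed by Bonferroni-corrected pairwise t tests.
import itertools
import numpy as np
from scipy import stats

groups = {
    "control": np.array([3.1, 2.8, 3.6, 3.0, 2.9, 3.3]),
    "dose 1":  np.array([3.9, 4.2, 3.7, 4.5, 4.0, 4.1]),
    "dose 2":  np.array([5.0, 4.8, 5.4, 5.1, 4.7, 5.2]),
}

f_stat, p_omnibus = stats.f_oneway(*groups.values())
print(f"Omnibus F = {f_stat:.2f}, P = {p_omnibus:.4f}")

if p_omnibus < 0.05:                                   # an effect is present somewhere
    pairs = list(itertools.combinations(groups, 2))
    alpha_per_comparison = 0.05 / len(pairs)           # Bonferroni adjustment (.05 / 3)
    for name_a, name_b in pairs:
        t, p = stats.ttest_ind(groups[name_a], groups[name_b])
        verdict = "significant" if p < alpha_per_comparison else "not significant"
        print(f"{name_a} vs {name_b}: t = {t:.2f}, P = {p:.4f} -> {verdict}")
```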
The Tukey test, also known as the honestly significant difference (HSD) post hoc test, is somewhat different. Its equation is similar to the t test equation but provides us with a studentized q statistic (Figure 7). The major difference is that the error term is taken from the F test conducted before the post hoc tests. This statistic is then compared with the theoretical distribution of all possible q statistics under the null hypothesis. If the statistic calculated for any comparison is larger than the critical value, the groups are considered significantly different from one another.

Figure 7. The studentized q formula for the honestly significant difference (HSD) Tukey post hoc test:

q = \frac{\bar{x}_{largest} - \bar{x}_{smallest}}{\sqrt{MS_{Error} / n}}

The form is very similar to the t test formula. Like the t test, the signal is the difference between groups, although there is an emphasis on subtracting the smallest mean from the largest, which is unnecessary for the t test. The error, or mean squares (MS) error, is taken directly from the F test conducted before the HSD Tukey and divided by the group size; the square root of the result is then taken. We then compare the q to a table of critical q values, just as we would the results from a t test (available in any statistics textbook, such as Hays 1994).

Effect Size: Eta Squared (η²)

Effect size was previously discussed in the context of the t test. However, eta squared is perhaps the best and most useful way to understand effect size. Eta squared provides us with the proportion of variance that is due to the independent variable (ie, the treatment in our experiment). The greater the variability between the groups relative to the total variance, the larger the effect is said to be. This is an incredibly useful statistic that is insensitive to sample size. Thus, regardless of the sample size, it is possible to determine the size of the influence an independent variable has on the dependent variable.
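A sketch of the calculation (hypothetical data; numpy assumed) shows eta squared as the between-groups sum of squares divided by the total sum of squares, the ratio given in Figure 8 below.

```python
# Eta squared: proportion of total variation attributable to group membership.
import numpy as np

groups = [
    np.array([3.1, 2.8, 3.6, 3.0, 2.9, 3.3]),   # hypothetical scores, three groups
    np.array([3.9, 4.2, 3.7, 4.5, 4.0, 4.1]),
    np.array([5.0, 4.8, 5.4, 5.1, 4.7, 5.2]),
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

ss_total = ((all_scores - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

eta_squared = ss_between / ss_total
print(f"eta squared = {eta_squared:.2f}")        # e.g., 0.80 means 80% of the variance is explained
```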

Eta squared is not tested; its importance is not determined using hypotheses. It is simply a matter of reporting the value.

Figure 8. The formula for eta squared, a ratio of the variation between groups to the total variation in the data, and an indication of effect size (SS = sum of squares, BG = between groups):

\eta^2 = \frac{SS_{BG}}{SS_{Total}}

Interactions

Interactions have the misfortune of being underused because of interpretation difficulties. Many investigators will attempt to interpret their data using a number of single-factor (one IV) analyses of variance instead of conducting more complex analyses such as factorial analysis of variance (more than one IV). It is true that the interaction is the most complicated effect in an analysis of variance, but it will become evident that the more complicated analysis often simplifies the interpretation of the data. Interactions are possible when an experiment has multiple IVs. These designs cross all treatment groups, so that each level of one IV is combined separately with all levels of the other IVs. The definition of an interaction is that the effect of one IV is not the same at all levels of the other IVs. For example, if an investigator is interested in exploring the effect of radiation and chemotherapy dosage on the number of years a patient lives after being diagnosed with breast cancer, it would be reasonable to assume that the radiation dosage would not have the same ability to increase survival at all levels of chemotherapy dosage. This is the inconsistency mentioned earlier. This experiment, with two levels per IV, would have all levels crossed (Table 3). The interaction is best visualized using line graphs (Figure 9): the lines do not run parallel, as is expected when an interaction is observed.

Table 3. The design for chemotherapy dose by radiation dose. Radiation (dose 1, dose 2) is crossed with chemotherapy (dose 1, dose 2), giving four cells. All levels are crossed; every dose of chemotherapy is combined with every dose of radiation.

Hypotheses

There would be a minimum of three sets of hypotheses: two main effects and one interaction hypothesis.

Main effects:
Ho: There are no differences between the means for radiation.
Ha: There are differences between the means for radiation.
Ho: There are no differences between the means for chemotherapy.
Ha: There are differences between the means for chemotherapy.

Interaction:
Ho: There is no interaction between radiation and chemotherapy on survival means.
Ha: There is an interaction between radiation and chemotherapy on survival means.

You will notice that the hypotheses are somewhat general. This is because of the omnibus nature of the F test: the F test is not designed to determine specific group differences. That is done using post hoc techniques afterwards.

Figure 9. The interaction between radiation dose and chemotherapy dose on years lived. The effect of radiation dose is not the same at each chemotherapy dose: at chemotherapy dose 1, the difference in years lived is not as great as the difference at dose 2, indicating that chemotherapy dose and radiation dose work together to increase lifespan. Consequently, it is concluded that radiation and chemotherapy dose interact, resulting in an inconsistency in the effect of one independent variable across levels of another. The graph shown is from PASW (SPSS). It is used, as opposed to graphs from other programs, because readers will most likely begin with PASW graphs to explore their data. Note that the title is "estimated marginal means"; this is to say that PASW estimates the means as they might be in the population. There is no need for alarm: the means are nearly identical to those calculated from the data, so the graph accurately represents the data.
Also, this graph does not have error bars; PASW does not make it easy to add them. For a quick perusal this is not important; however, for publication an investigator must include error bars and proper titles for the axes.

Why Not Use Multiple Single-Factor Analyses of Variance?

For any set of data, it is advantageous to explain as much of the variance as possible. Designs with several IVs explain more of the variance in a data set because there are more sources of variance. Consequently, there is less unexplained variation (also called error), resulting in a smaller error term. Because the F statistic is then more likely to be large, rejecting the null hypothesis is more probable. Table 4 shows three sources of variance, the main effect for radiation dose, the main effect for chemotherapy dose, and the interaction between the two, plus error. It should be evident now that the error term for the main effects is smaller, and the signal-to-noise ratio therefore larger, than it would be if separate single-factor analyses of variance were conducted.
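A sketch of such a factorial analysis outside PASW (hypothetical data; it assumes pandas and statsmodels are installed) partitions the variance into the same sources that appear in Table 4: two main effects, their interaction, and error.

```python
# Two-way (factorial) ANOVA: main effects, interaction, and error.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "radiation": ["dose1"] * 8 + ["dose2"] * 8,
    "chemo": (["dose1"] * 4 + ["dose2"] * 4) * 2,
    "years_lived": [4.2, 3.9, 4.5, 4.1, 5.0, 5.3, 4.8, 5.1,   # hypothetical survival times
                    4.6, 4.4, 4.9, 4.7, 7.2, 7.8, 7.0, 7.5],
})

# 'C(radiation) * C(chemo)' expands to both main effects plus their interaction.
model = smf.ols("years_lived ~ C(radiation) * C(chemo)", data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)   # sum of squares, df, F, and P per source
print(anova_table)
```

Each row of the printed table corresponds to a source of variance; dividing a source's sum of squares by the total sum of squares gives its eta squared.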

Table 4. Analysis of Variance Summary Table for the Years Surviving Cancer Study
A. Tests of between-subjects effects (dependent variable: Yrslivd). For each source (corrected model, intercept, radiation, chemotherapy, radiation × chemotherapy, error, total, corrected total), the output lists the Type III sum of squares, df, mean square, F, and significance. Adjusted R² = 0.633.
B. Tests of between-subjects effects (dependent variable: Yrslivd). For each source, the output lists partial eta squared, the noncentrality parameter, and observed power, computed using alpha = .05.
Note. There is a significant main effect for radiation dosage and for chemotherapy dosage; a significant radiation by chemotherapy interaction is also observed.

Simple Main Effects

On occasion, after observing a significant interaction, an investigator might be interested in determining whether or not there are differences between the means of one IV at each level of another IV. This is said by some to "investigate the interaction." However, it has little to do with the interaction, as it is not necessary to observe significant differences in means to observe an interaction. Nevertheless, simple main effect analysis consists of determining differences between levels of one IV at each level of another IV. You can conduct this analysis using t tests, but the results might not be accurate. For a proper analysis, the error from the original factorial analysis must be used in separate single-factor ANOVAs of one IV at each level of another IV. This is because there is no need to recalculate error: the error has already been determined and can therefore be used in subsequent analyses. That error term may be much smaller, resulting in a larger signal-to-noise ratio. Most investigators will use an F test but replace the error term with the one from the overall factorial analysis of variance.

Limitations of This Package

1. First, only a few tests were discussed: the t test and the F test for completely between-subjects designs. Repeated measures designs are beyond the scope of this package; twice the material would be needed to do them justice. I chose instead to concentrate on key concepts not previously covered in this journal.
2. Second, the reader might recognize that the material here is somewhat unorthodox. Explaining the concepts using MC methods is not common. However, teaching statistics for nearly 15 years to students unfamiliar with the subject has shown me that this is an excellent method. Students can complete an entire course and have no idea what the Type 1 error represents; this method ensures that the student understands the meaning behind a rare event and consequently the Type 1 error.
3. Third, we did not touch upon designs of association. Again, this would be somewhat extensive. They comprise the Pearson product-moment correlation, simple regression, multiple regression, and one of my favorites, factor analysis. Some of these were touched upon in Barker [1].
4. Fourth, we did not touch on any of the multivariate tests. For that type of discussion, one must have a strong grasp of associational analysis and repeated measures analysis. Again, this was beyond the scope of this package.

Strengths of This Package

1. This document clearly instructs the reader in how to evaluate tests of means. The t test and F test should now be clearly understood.
2. The reader is more likely to understand error and estimation problems. Consequently, the Type 1 error should no longer be a conceptual or interpretive challenge.
3.
Even though there was little in terms of graphing data, the reader may have a greater appreciation for data presentation.

4. Not previously covered in this journal are interaction effects. This may be the single most important tool discussed in this package. Interaction effects are often underused. By implementing factorial designs, the reader will be capable of evaluating main and interaction effects alike, which is a clear boon.

Dos and Don'ts When Interpreting Results

There are so many mistakes made in performing statistical analyses or interpreting them that it is not reasonable to go over them all. However, there are a few that pertain directly to this package and for which there is now an understanding.

1. The Type 1 error, indicated as Sig. in PASW, is not an indication of the strength of the effect. Often, an investigator will be tempted to say, based on the Type 1 value (P value), that their results are either very or not very significant. All that should be said is whether the results are significant or not. Therefore, do not report on the strength of your effect using the P value.
2. In addition to the P value, two more things should be reported if the journal editors allow: 1) effect size and 2) power. This allows an investigator to indicate the size of the effect and the probability of observing a significant result if the study were conducted again.
3. Graphs should always include error bars (standard error of the mean). This is the best way to present data.

Conclusion

This package was written in the spirit of properly instructing the reader in some fundamental concepts underlying statistical analyses. The Type 1 error, estimation error, effect size, power, the t test, F tests, post hoc analysis, simple main effects, and interaction effects were covered. I would recommend reading texts on repeated measures next; those tools are the logical continuation of this package. Following those readings, I would study associational designs and then the non-parametric tests, such as the Mann-Whitney U test and the Wilcoxon signed-rank test. I hope that this package is well received and that it has proved useful.

References

[1] Barker, R. F. (2007). Deciphering statistics in research: a beginner's guide to statistics for radiation science professionals. Can J Med Radiat Technol.
[2] Hays, W. L. (1994). Statistics (5th ed.). New York: Harcourt Brace.
[3] Metropolis, N., & Ulam, S. (1949). The Monte Carlo method. J Am Stat Assoc 44.
[4] Metropolis, N. (1987). The beginning of the Monte Carlo method. Los Alamos Science (Special Issue).
[5] Boneau, C. A. (1960). The effects of violations of assumptions underlying the t test. Psychol Bull 57.
[6] Howell, D. C. (1994). Statistical Methods for Psychology (5th ed.). Pacific Grove, CA: Duxbury.
[7] De Muth, J. E. (2008). Preparing for the first meeting with a statistician. Am J Health Syst Pharm 15.
[8] Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.


More information

Hypothesis Testing. Richard S. Balkin, Ph.D., LPC-S, NCC

Hypothesis Testing. Richard S. Balkin, Ph.D., LPC-S, NCC Hypothesis Testing Richard S. Balkin, Ph.D., LPC-S, NCC Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric statistics

More information

Designing Psychology Experiments: Data Analysis and Presentation

Designing Psychology Experiments: Data Analysis and Presentation Data Analysis and Presentation Review of Chapter 4: Designing Experiments Develop Hypothesis (or Hypotheses) from Theory Independent Variable(s) and Dependent Variable(s) Operational Definitions of each

More information

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016 The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016 This course does not cover how to perform statistical tests on SPSS or any other computer program. There are several courses

More information

04/12/2014. Research Methods in Psychology. Chapter 6: Independent Groups Designs. What is your ideas? Testing

04/12/2014. Research Methods in Psychology. Chapter 6: Independent Groups Designs. What is your ideas? Testing Research Methods in Psychology Chapter 6: Independent Groups Designs 1 Why Psychologists Conduct Experiments? What is your ideas? 2 Why Psychologists Conduct Experiments? Testing Hypotheses derived from

More information

Advanced ANOVA Procedures

Advanced ANOVA Procedures Advanced ANOVA Procedures Session Lecture Outline:. An example. An example. Two-way ANOVA. An example. Two-way Repeated Measures ANOVA. MANOVA. ANalysis of Co-Variance (): an ANOVA procedure whereby the

More information

Correlation and Regression

Correlation and Regression Dublin Institute of Technology ARROW@DIT Books/Book Chapters School of Management 2012-10 Correlation and Regression Donal O'Brien Dublin Institute of Technology, donal.obrien@dit.ie Pamela Sharkey Scott

More information

Two-Way Independent ANOVA

Two-Way Independent ANOVA Two-Way Independent ANOVA Analysis of Variance (ANOVA) a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment. There

More information

UNEQUAL CELL SIZES DO MATTER

UNEQUAL CELL SIZES DO MATTER 1 of 7 1/12/2010 11:26 AM UNEQUAL CELL SIZES DO MATTER David C. Howell Most textbooks dealing with factorial analysis of variance will tell you that unequal cell sizes alter the analysis in some way. I

More information

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Still important ideas Contrast the measurement of observable actions (and/or characteristics)

More information

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests Objectives Quantifying the quality of hypothesis tests Type I and II errors Power of a test Cautions about significance tests Designing Experiments based on power Evaluating a testing procedure The testing

More information

POST GRADUATE DIPLOMA IN BIOETHICS (PGDBE) Term-End Examination June, 2016 MHS-014 : RESEARCH METHODOLOGY

POST GRADUATE DIPLOMA IN BIOETHICS (PGDBE) Term-End Examination June, 2016 MHS-014 : RESEARCH METHODOLOGY No. of Printed Pages : 12 MHS-014 POST GRADUATE DIPLOMA IN BIOETHICS (PGDBE) Term-End Examination June, 2016 MHS-014 : RESEARCH METHODOLOGY Time : 2 hours Maximum Marks : 70 PART A Attempt all questions.

More information

Comparing 3 Means- ANOVA

Comparing 3 Means- ANOVA Comparing 3 Means- ANOVA Evaluation Methods & Statistics- Lecture 7 Dr Benjamin Cowan Research Example- Theory of Planned Behaviour Ajzen & Fishbein (1981) One of the most prominent models of behaviour

More information

Basic Statistics and Data Analysis in Work psychology: Statistical Examples

Basic Statistics and Data Analysis in Work psychology: Statistical Examples Basic Statistics and Data Analysis in Work psychology: Statistical Examples WORK PSYCHOLOGY INTRODUCTION In this chapter we examine a topic which is given too little coverage in most texts of this kind,

More information

Problem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol.

Problem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol. Ho (null hypothesis) Ha (alternative hypothesis) Problem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol. Hypothesis: Ho:

More information

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about population. Populations are group of interest

More information

Appendix B Statistical Methods

Appendix B Statistical Methods Appendix B Statistical Methods Figure B. Graphing data. (a) The raw data are tallied into a frequency distribution. (b) The same data are portrayed in a bar graph called a histogram. (c) A frequency polygon

More information

Reliability, validity, and all that jazz

Reliability, validity, and all that jazz Reliability, validity, and all that jazz Dylan Wiliam King s College London Published in Education 3-13, 29 (3) pp. 17-21 (2001) Introduction No measuring instrument is perfect. If we use a thermometer

More information

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA PART 1: Introduction to Factorial ANOVA ingle factor or One - Way Analysis of Variance can be used to test the null hypothesis that k or more treatment or group

More information

9 research designs likely for PSYC 2100

9 research designs likely for PSYC 2100 9 research designs likely for PSYC 2100 1) 1 factor, 2 levels, 1 group (one group gets both treatment levels) related samples t-test (compare means of 2 levels only) 2) 1 factor, 2 levels, 2 groups (one

More information

t-test for r Copyright 2000 Tom Malloy. All rights reserved

t-test for r Copyright 2000 Tom Malloy. All rights reserved t-test for r Copyright 2000 Tom Malloy. All rights reserved This is the text of the in-class lecture which accompanied the Authorware visual graphics on this topic. You may print this text out and use

More information

Sheila Barron Statistics Outreach Center 2/8/2011

Sheila Barron Statistics Outreach Center 2/8/2011 Sheila Barron Statistics Outreach Center 2/8/2011 What is Power? When conducting a research study using a statistical hypothesis test, power is the probability of getting statistical significance when

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

Audio: In this lecture we are going to address psychology as a science. Slide #2

Audio: In this lecture we are going to address psychology as a science. Slide #2 Psychology 312: Lecture 2 Psychology as a Science Slide #1 Psychology As A Science In this lecture we are going to address psychology as a science. Slide #2 Outline Psychology is an empirical science.

More information

Chapter 9: Comparing two means

Chapter 9: Comparing two means Chapter 9: Comparing two means Smart Alex s Solutions Task 1 Is arachnophobia (fear of spiders) specific to real spiders or will pictures of spiders evoke similar levels of anxiety? Twelve arachnophobes

More information

Kepler tried to record the paths of planets in the sky, Harvey to measure the flow of blood in the circulatory system, and chemists tried to produce

Kepler tried to record the paths of planets in the sky, Harvey to measure the flow of blood in the circulatory system, and chemists tried to produce Stats 95 Kepler tried to record the paths of planets in the sky, Harvey to measure the flow of blood in the circulatory system, and chemists tried to produce pure gold knowing it was an element, though

More information

PSY 216: Elementary Statistics Exam 4

PSY 216: Elementary Statistics Exam 4 Name: PSY 16: Elementary Statistics Exam 4 This exam consists of multiple-choice questions and essay / problem questions. For each multiple-choice question, circle the one letter that corresponds to the

More information

Chapter 12. The One- Sample

Chapter 12. The One- Sample Chapter 12 The One- Sample z-test Objective We are going to learn to make decisions about a population parameter based on sample information. Lesson 12.1. Testing a Two- Tailed Hypothesis Example 1: Let's

More information

CHAPTER III METHODOLOGY

CHAPTER III METHODOLOGY 24 CHAPTER III METHODOLOGY This chapter presents the methodology of the study. There are three main sub-titles explained; research design, data collection, and data analysis. 3.1. Research Design The study

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information

Intro to SPSS. Using SPSS through WebFAS

Intro to SPSS. Using SPSS through WebFAS Intro to SPSS Using SPSS through WebFAS http://www.yorku.ca/computing/students/labs/webfas/ Try it early (make sure it works from your computer) If you need help contact UIT Client Services Voice: 416-736-5800

More information

Overview of Lecture. Survey Methods & Design in Psychology. Correlational statistics vs tests of differences between groups

Overview of Lecture. Survey Methods & Design in Psychology. Correlational statistics vs tests of differences between groups Survey Methods & Design in Psychology Lecture 10 ANOVA (2007) Lecturer: James Neill Overview of Lecture Testing mean differences ANOVA models Interactions Follow-up tests Effect sizes Parametric Tests

More information

A Spreadsheet for Deriving a Confidence Interval, Mechanistic Inference and Clinical Inference from a P Value

A Spreadsheet for Deriving a Confidence Interval, Mechanistic Inference and Clinical Inference from a P Value SPORTSCIENCE Perspectives / Research Resources A Spreadsheet for Deriving a Confidence Interval, Mechanistic Inference and Clinical Inference from a P Value Will G Hopkins sportsci.org Sportscience 11,

More information

Bayesian and Frequentist Approaches

Bayesian and Frequentist Approaches Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law

More information

Reliability, validity, and all that jazz

Reliability, validity, and all that jazz Reliability, validity, and all that jazz Dylan Wiliam King s College London Introduction No measuring instrument is perfect. The most obvious problems relate to reliability. If we use a thermometer to

More information

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival*

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival* LAB ASSIGNMENT 4 1 INFERENCES FOR NUMERICAL DATA In this lab assignment, you will analyze the data from a study to compare survival times of patients of both genders with different primary cancers. First,

More information

Bayesian Tailored Testing and the Influence

Bayesian Tailored Testing and the Influence Bayesian Tailored Testing and the Influence of Item Bank Characteristics Carl J. Jensema Gallaudet College Owen s (1969) Bayesian tailored testing method is introduced along with a brief review of its

More information

EXPERIMENTAL DESIGN Page 1 of 11. relationships between certain events in the environment and the occurrence of particular

EXPERIMENTAL DESIGN Page 1 of 11. relationships between certain events in the environment and the occurrence of particular EXPERIMENTAL DESIGN Page 1 of 11 I. Introduction to Experimentation 1. The experiment is the primary means by which we are able to establish cause-effect relationships between certain events in the environment

More information

Lecture 4: Research Approaches

Lecture 4: Research Approaches Lecture 4: Research Approaches Lecture Objectives Theories in research Research design approaches ú Experimental vs. non-experimental ú Cross-sectional and longitudinal ú Descriptive approaches How to

More information

1 The conceptual underpinnings of statistical power

1 The conceptual underpinnings of statistical power 1 The conceptual underpinnings of statistical power The importance of statistical power As currently practiced in the social and health sciences, inferential statistics rest solidly upon two pillars: statistical

More information

ANOVA. Thomas Elliott. January 29, 2013

ANOVA. Thomas Elliott. January 29, 2013 ANOVA Thomas Elliott January 29, 2013 ANOVA stands for analysis of variance and is one of the basic statistical tests we can use to find relationships between two or more variables. ANOVA compares the

More information

Online Introduction to Statistics

Online Introduction to Statistics APPENDIX Online Introduction to Statistics CHOOSING THE CORRECT ANALYSIS To analyze statistical data correctly, you must choose the correct statistical test. The test you should use when you have interval

More information

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS Chapter Objectives: Understand Null Hypothesis Significance Testing (NHST) Understand statistical significance and

More information

Chapter 02 Developing and Evaluating Theories of Behavior

Chapter 02 Developing and Evaluating Theories of Behavior Chapter 02 Developing and Evaluating Theories of Behavior Multiple Choice Questions 1. A theory is a(n): A. plausible or scientifically acceptable, well-substantiated explanation of some aspect of the

More information

investigate. educate. inform.

investigate. educate. inform. investigate. educate. inform. Research Design What drives your research design? The battle between Qualitative and Quantitative is over Think before you leap What SHOULD drive your research design. Advanced

More information

Experimental Psychology

Experimental Psychology Title Experimental Psychology Type Individual Document Map Authors Aristea Theodoropoulos, Patricia Sikorski Subject Social Studies Course None Selected Grade(s) 11, 12 Location Roxbury High School Curriculum

More information

A SAS Macro to Investigate Statistical Power in Meta-analysis Jin Liu, Fan Pan University of South Carolina Columbia

A SAS Macro to Investigate Statistical Power in Meta-analysis Jin Liu, Fan Pan University of South Carolina Columbia Paper 109 A SAS Macro to Investigate Statistical Power in Meta-analysis Jin Liu, Fan Pan University of South Carolina Columbia ABSTRACT Meta-analysis is a quantitative review method, which synthesizes

More information

Psy201 Module 3 Study and Assignment Guide. Using Excel to Calculate Descriptive and Inferential Statistics

Psy201 Module 3 Study and Assignment Guide. Using Excel to Calculate Descriptive and Inferential Statistics Psy201 Module 3 Study and Assignment Guide Using Excel to Calculate Descriptive and Inferential Statistics What is Excel? Excel is a spreadsheet program that allows one to enter numerical values or data

More information

Statistics Guide. Prepared by: Amanda J. Rockinson- Szapkiw, Ed.D.

Statistics Guide. Prepared by: Amanda J. Rockinson- Szapkiw, Ed.D. This guide contains a summary of the statistical terms and procedures. This guide can be used as a reference for course work and the dissertation process. However, it is recommended that you refer to statistical

More information

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations. Statistics as a Tool A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations. Descriptive Statistics Numerical facts or observations that are organized describe

More information

Statistical Methods and Reasoning for the Clinical Sciences

Statistical Methods and Reasoning for the Clinical Sciences Statistical Methods and Reasoning for the Clinical Sciences Evidence-Based Practice Eiki B. Satake, PhD Contents Preface Introduction to Evidence-Based Statistics: Philosophical Foundation and Preliminaries

More information

Chapter 19. Confidence Intervals for Proportions. Copyright 2010 Pearson Education, Inc.

Chapter 19. Confidence Intervals for Proportions. Copyright 2010 Pearson Education, Inc. Chapter 19 Confidence Intervals for Proportions Copyright 2010 Pearson Education, Inc. Standard Error Both of the sampling distributions we ve looked at are Normal. For proportions For means SD pˆ pq n

More information

Repeated Measures ANOVA and Mixed Model ANOVA. Comparing more than two measurements of the same or matched participants

Repeated Measures ANOVA and Mixed Model ANOVA. Comparing more than two measurements of the same or matched participants Repeated Measures ANOVA and Mixed Model ANOVA Comparing more than two measurements of the same or matched participants Data files Fatigue.sav MentalRotation.sav AttachAndSleep.sav Attitude.sav Homework:

More information

Between Groups & Within-Groups ANOVA

Between Groups & Within-Groups ANOVA Between Groups & Within-Groups ANOVA BG & WG ANOVA Partitioning Variation making F making effect sizes Things that influence F Confounding Inflated within-condition variability Integrating stats & methods

More information

AMSc Research Methods Research approach IV: Experimental [2]

AMSc Research Methods Research approach IV: Experimental [2] AMSc Research Methods Research approach IV: Experimental [2] Marie-Luce Bourguet mlb@dcs.qmul.ac.uk Statistical Analysis 1 Statistical Analysis Descriptive Statistics : A set of statistical procedures

More information

CLINICAL RESEARCH METHODS VISP356. MODULE LEADER: PROF A TOMLINSON B.Sc./B.Sc.(HONS) OPTOMETRY

CLINICAL RESEARCH METHODS VISP356. MODULE LEADER: PROF A TOMLINSON B.Sc./B.Sc.(HONS) OPTOMETRY DIVISION OF VISION SCIENCES SESSION: 2006/2007 DIET: 1ST CLINICAL RESEARCH METHODS VISP356 LEVEL: MODULE LEADER: PROF A TOMLINSON B.Sc./B.Sc.(HONS) OPTOMETRY MAY 2007 DURATION: 2 HRS CANDIDATES SHOULD

More information

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017 Essential Statistics for Nursing Research Kristen Carlin, MPH Seattle Nursing Research Workshop January 30, 2017 Table of Contents Plots Descriptive statistics Sample size/power Correlations Hypothesis

More information

Biostatistics 3. Developed by Pfizer. March 2018

Biostatistics 3. Developed by Pfizer. March 2018 BROUGHT TO YOU BY Biostatistics 3 Developed by Pfizer March 2018 This learning module is intended for UK healthcare professionals only. Job bag: PP-GEP-GBR-0986 Date of preparation March 2018. Agenda I.

More information

STATISTICS AND RESEARCH DESIGN

STATISTICS AND RESEARCH DESIGN Statistics 1 STATISTICS AND RESEARCH DESIGN These are subjects that are frequently confused. Both subjects often evoke student anxiety and avoidance. To further complicate matters, both areas appear have

More information

Chapter 19. Confidence Intervals for Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 19. Confidence Intervals for Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 19 Confidence Intervals for Proportions Copyright 2010, 2007, 2004 Pearson Education, Inc. Standard Error Both of the sampling distributions we ve looked at are Normal. For proportions For means

More information

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Vs. 2 Background 3 There are different types of research methods to study behaviour: Descriptive: observations,

More information

Confidence Intervals On Subsets May Be Misleading

Confidence Intervals On Subsets May Be Misleading Journal of Modern Applied Statistical Methods Volume 3 Issue 2 Article 2 11-1-2004 Confidence Intervals On Subsets May Be Misleading Juliet Popper Shaffer University of California, Berkeley, shaffer@stat.berkeley.edu

More information

Lesson 9: Two Factor ANOVAS

Lesson 9: Two Factor ANOVAS Published on Agron 513 (https://courses.agron.iastate.edu/agron513) Home > Lesson 9 Lesson 9: Two Factor ANOVAS Developed by: Ron Mowers, Marin Harbur, and Ken Moore Completion Time: 1 week Introduction

More information

Applied Statistical Analysis EDUC 6050 Week 4

Applied Statistical Analysis EDUC 6050 Week 4 Applied Statistical Analysis EDUC 6050 Week 4 Finding clarity using data Today 1. Hypothesis Testing with Z Scores (continued) 2. Chapters 6 and 7 in Book 2 Review! = $ & '! = $ & ' * ) 1. Which formula

More information

Encoding of Elements and Relations of Object Arrangements by Young Children

Encoding of Elements and Relations of Object Arrangements by Young Children Encoding of Elements and Relations of Object Arrangements by Young Children Leslee J. Martin (martin.1103@osu.edu) Department of Psychology & Center for Cognitive Science Ohio State University 216 Lazenby

More information