Comparing Sample Size Requirements for Significance Tests and Confidence Intervals


Counseling Outcome Research and Evaluation 4(1) 3-12
© The Author(s) 2013
Reprints and permission: sagepub.com/journalsPermissions.nav
core.sagepub.com

Xiaofeng Steven Liu

Abstract

We compare the sample size requirements for significance tests and confidence intervals by calculating the power of each. The power of a confidence interval is defined as the probability of obtaining a short interval width, conditional on the confidence interval including the parameter of interest. We find that a smaller sample size is required to attain a desired level of statistical power than to attain comparable power of confidence interval in the two-sample independent t test. This is illustrated with an example study that examines the outcome difference between psychotherapy and control in treating depression.

Keywords

sample size, statistical power, confidence interval

Sample size requirements have long been studied in statistical power analysis (see Cohen, 1988), as adequate sample sizes are essential to achieving high statistical power in null hypothesis significance testing. Statistical power refers to the probability of rejecting a false null hypothesis in a significance test, which establishes the plausibility of the research hypothesis by falsifying the null hypothesis with evidence from empirical data. Because a research study seeks to substantiate the existence of a treatment effect in hypothesis testing, statistical power directly speaks to the likelihood of achieving the study goal. Although statistical power is critical to a research endeavor, many research studies lack sufficient statistical power and, in turn, produce inconclusive findings (Cohen, 1992; Rossi, 1990; Sedlmeier & Gigerenzer, 1989; Sink & Mvududu, 2010).
Although inadequate sample size is often the cause of low statistical power, other factors such as the significance level, the effect size, and the error variance all influence statistical power (Cohen, 1988). The significance level depends on α, the maximum allowed Type I error rate. The smaller the Type I error is, the larger the Type II error becomes. In other words, a more stringent significance level will increase the Type II error and, in turn, decrease statistical power. The effect size indicates how large an effect is due to the treatment or intervention. The error variance shows the amount of variation in the outcome

1 University of South Carolina, Columbia, SC, USA

Submitted February 18, Revised October 17, Accepted October 28,

Corresponding Author: Xiaofeng Steven Liu, University of South Carolina, 820 S Main St, Columbia, SC 29208, USA. xliu@mailbox.sc.edu

due to individual idiosyncrasy. These factors are not easily subject to the control of a researcher. The significance level is traditionally predetermined at 5%, and the effect size is a fixed quantity that a researcher cannot change. The researcher can use a more valid and reliable instrument or employ a stronger research design to reduce error variance to some extent. Thus, the researcher has the most opportunity to choose an appropriate sample size to achieve high statistical power in testing a possible treatment effect. If power analysis is not done properly, the selected sample size will not yield the desired statistical power (i.e., .80 or higher). Thus, low statistical power is sometimes synonymous with inadequate sample size. Although underpowered studies still plague research, the chronic issue of insufficient sample size has not drawn due attention in the current debate about significance tests versus confidence intervals. There has been a motion to replace significance tests with confidence intervals in reporting research findings (American Psychological Association [APA], 2009; Bland, 2009; Cumming et al., 2007; International Committee of Medical Journal Editors, 1988; Wilkinson & American Psychological Association Task Force on Statistical Inference, 1999). Serious concerns have been raised about the fact that research reports are sometimes fraught with misinterpretation of the p value. As significance tests culminate in a ubiquitous p value, many scholars have started to question the practice of running significance tests. The criticisms of null hypothesis significance testing surround the confusion between the probability of the null hypothesis being true and the probability of observing the data given the null hypothesis, the illusion that statistical significance implies practical importance, and the lack of information about the magnitude of the effect size (Cumming et al., 2007; Hunter, 1997; Thompson, 2002, 2006).
The supporters of significance tests, however, argued that the critics of null hypothesis significance testing had identified the wrong cause of the lack of scientific progress. They found that the criticism of null hypothesis significance testing mainly focused on the misuse of the procedure rather than the procedure itself. If the procedure were not correctly used, then the natural remedy would be to correct the misuse rather than to ban the procedure (Levin, 1993; Levin & Robinson, 1999). Although the p value in hypothesis testing does not indicate the actual size of an effect, it allows us to exclude chance as a plausible explanation of the observed effect. A significant p value substantiates the existence of a treatment effect, and such confirmation alone often means a significant contribution to the field (Wainer, 1999). Cohen (1994) decried the misrepresentations in null hypothesis significance testing and recommended estimating effect sizes using confidence intervals. Cohen cautioned against the search for a nonexistent magic alternative to null hypothesis significance testing. He also noted that confidence intervals were rarely to be seen in our publications (p. 1001). According to his surmise, the main reason why confidence intervals were not often reported was that they were so embarrassingly large. A wide confidence interval is not very informative about the possible size of the treatment effect, and it does not provide any better information than a hypothesis test with low statistical power. Therefore, the width of the confidence interval provides us with the analogue of power analysis in significance testing: larger sample sizes reduce the size of confidence intervals as they increase the statistical power (p. 1002). There have been few articles that compare sample size requirements for significance tests and confidence intervals.
The call for greater use of confidence intervals does not address how confidence intervals will perform in research studies that often have insufficient statistical power. In sum, little is known about how the chronic issue of inadequate sample size in underpowered studies would affect the use of confidence intervals if they were to substitute for significance tests. This article provides a demonstration that examines the sample size requirements for significance tests and confidence intervals. We will make the comparison in the context of

an independent t test. For example, an independent t test can involve a sample of clients with depression who are randomly assigned to receive psychotherapy in the treatment condition or placebo in the control condition. In this example, the clients are measured before and after treatment to assess changes in depression symptoms. We will focus on the sample size requirements for the t test because there is a well-known correspondence between the t test and its two-sided confidence interval (Agresti & Finlay, 2008). The confidence interval allows us to reach the same decision on the rejection of the null hypothesis as the significance test, and the information used to conduct the t test can be readily converted to compute a confidence interval. In this article, the required sample sizes are first calculated to achieve different levels of statistical power for various effect sizes in the significance tests. The required sample sizes for the significance tests are then used to construct confidence intervals in lieu of the significance tests. The performance of the confidence intervals is then examined through the interval width, as suggested by Cohen (1994). In the following, we start with the confidence interval width in the section Performance Measure of Confidence Intervals. The probability of achieving a certain interval width is then defined in the section Power of Confidence Interval; this probability is as essential to the confidence interval as statistical power is to the significance test. Using the two-sample t test as an example, we show the required sample sizes for achieving sufficient statistical power in the section Sample Size for Sufficient Statistical Power in Significance Tests, and then use those required sample sizes to compute power of confidence interval in the section Power of Confidence Interval in Studies With Sufficient Statistical Power.
Finally, we compare sample size requirements for significance tests and confidence intervals by examining statistical power and power of confidence interval in the section Sample Size Requirements for Significance Test and Confidence Interval.

Performance Measure of Confidence Intervals

The probability of achieving a certain short width in the confidence interval is computed to evaluate the performance of confidence intervals using the sample sizes based on statistical power. The findings will provide valuable insights into how confidence intervals will perform in place of significance tests. If the sample size based on high statistical power produces sufficiently narrow confidence intervals, then researchers should be encouraged to convert significance tests to confidence intervals without separate sample size planning for confidence intervals. Otherwise, if the sample size requirement for confidence intervals is more demanding than that for significance tests, it will present a serious challenge in advocating the use of confidence intervals without revamping the current practice of sample size determination, which is geared toward significance tests and statistical power. To compare sample size requirements for significance tests and confidence intervals, we will establish a simple standard for the precision of confidence intervals (i.e., the interval width) in relation to the standardized effect size in the significance tests. We will use standardized effect sizes in computing the required sample sizes for statistical power, and we will also express the width of the confidence interval as a fraction of the standardized effect size, which relates the sample size requirement for the significance tests to that for the confidence intervals.
Specifically, sample sizes are computed to achieve varying statistical power for Cohen's small, medium, and large effect sizes, and these sample sizes are then used to calculate the probability of attaining varying interval widths, each expressed as a fraction of Cohen's small, medium, and large effect sizes. The effect size used in computing statistical power is also called the minimum effect size because power is an increasing function of effect size. Any effect size larger than the minimum effect size used in the power calculation will yield higher statistical power than the computed one.

Thus, the half width of the confidence interval (w) can be expressed as a fraction of the minimum effect size (Δ): w = fΔ, where f is the fractional factor. We standardize the minimum effect size by dividing it by the standard deviation σ (i.e., d = Δ/σ). In using the standardized effect size, we in effect assume a unit standard deviation (σ = 1). A unit standard deviation means that the effect size is standardized, or metric free. In the following, we assume a unit standard deviation for simplicity of explanation. The half width then becomes w = fd. The smaller the fractional factor f is, the more precise the confidence interval becomes. There is no absolute precision standard for confidence intervals, but we can define a simple one according to the purpose of the interval estimate. The confidence interval shows a range of possible values that the effect size can take. Suppose that we deal with a minimum effect size Δ of some practical importance. If the confidence interval is wide enough to contain both zero and the minimum effect size Δ (i.e., f ≥ .5), it suggests that the magnitude of the effect size can be zero or it can be large enough to be practically important. Such a conclusion is hardly ideal because it yields contradictory answers about the effect size; it is tantamount to stating that the effect size is both unimportant (zero) and important (Δ). So, the ultimate precision standard for a confidence interval is that the length of the confidence interval be shorter than the minimum effect size (f < .5). It means that the confidence interval does not yield an ambiguous answer about what is important (Δ) and what is unimportant (zero). That way, the researcher never has to subjectively reconcile an inconsistency in the inference about the effect size. However, such an ultimate precision standard can rarely be achieved in view of the required sample sizes in current practice.
Such a short interval width (i.e., f < .5) translates to extremely large sample sizes because the ultimate precision standard implies statistical power near .98. In practice, such high power is not common. The relationship between interval precision and statistical power can be made succinct in a z test, where the population variance does not need to be estimated. The interval width of the confidence interval determines the statistical power of the significance test and vice versa. The half width of the confidence interval is w = z₀σ√(2/n), where z₀ is the critical value. Equating w = fΔ with w = z₀σ√(2/n) yields the group sample size n = 2(z₀/(fd))², where d = Δ/σ. The statistical power for the z test is 1 − β = 1 − P(Z′ ≤ z₀) + P(Z′ ≤ −z₀), where Z′ is the test statistic under the alternative hypothesis. The statistic Z′ follows a normal distribution with a shifted mean λ = √(n/2)·d; lambda is the noncentrality parameter. Substituting n = 2(z₀/(fd))² into the formula for λ produces λ = z₀/f. The factors f = .5 and f = 1.0 correspond to statistical power .98 and .50, respectively (see the Appendix for SAS code). So, we constrain f to lie between .5 and 1.0. When the fractional factor f is close to 1.0, the corresponding statistical power is close to .5, and the length of the confidence interval is twice the minimum effect size. This means that both zero and an effect size twice the minimum effect size can be presented as plausible values of the true effect size in the confidence interval. We therefore define f = 1.0 as the low precision standard (Liu, 2012).

Power of Confidence Interval

We shall now review the literature on sample size planning for confidence intervals and the different approaches to sample size determination in planning confidence intervals. A few seminal papers have provided different approaches to gauging the width of confidence intervals.
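Before turning to that literature, the f-to-power relationship derived above for the z test (λ = z₀/f) can be checked numerically. The following sketch is our own side calculation, not part of the article's SAS code, and it assumes scipy is available:

```python
from scipy.stats import norm

def z_test_power(f, alpha=0.05):
    """Two-sided z-test power when the half width is w = f * d.

    Uses the noncentrality lambda = z0 / f obtained by substituting
    the group size n = 2 * (z0 / (f * d))**2 into lambda = sqrt(n / 2) * d.
    """
    z0 = norm.ppf(1 - alpha / 2)   # two-sided critical value
    lam = z0 / f                   # noncentrality parameter
    # power = P(Z' > z0) + P(Z' < -z0), with Z' ~ N(lam, 1)
    return norm.cdf(lam - z0) + norm.cdf(-lam - z0)

print(z_test_power(0.5))   # roughly .975, near the .98 ultimate precision standard
print(z_test_power(1.0))   # roughly .50, the low precision standard
```

The evaluation confirms the text's correspondence: high precision (f = .5) implies power near .98, and low precision (f = 1.0) implies power near .50.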
Beal (1989) provided an elegant framework for evaluating the width of the confidence interval. The probability of achieving a short width given the validity of a confidence interval is defined as the power of confidence interval, and it is currently implemented in the SAS proc power procedure, a built-in statistical routine in SAS

software. The conditional probability is related to the unconditional probability of obtaining a narrow width in a confidence interval: P[w ≤ U] = P[w ≤ U | V]P[V] + P[w ≤ U | ¬V]P[¬V], where w is the half width of the confidence interval and U is the upper bound of the half width. The validity of a confidence interval (V) means that the confidence interval includes the parameter of interest, and its complement (¬V) means that the confidence interval does not contain the population parameter. The conditional probability P[w ≤ U | V] is the power of confidence interval, and it is the default option in the SAS proc power procedure. Although there is a discernible difference between the conditional probability P[w ≤ U | V] and the unconditional probability P[w ≤ U], the difference is usually very small. So, the difference largely lies in the philosophy of evaluating a confidence interval. Lehmann (1959) argued that it is desirable to obtain a narrow confidence interval only when it includes the parameter. Both Beal's paper and the SAS procedure reflect this point of view. The conditional probability is typically smaller than the unconditional probability; the former is therefore more conservative than the latter, which does not take into account whether the confidence interval includes the parameter or not. Other researchers seem to prefer the unconditional probability P[w ≤ U] in sample size planning (Hahn & Meeker, 1991). Around the same time, Hsu (1988) used the joint probability of obtaining a short width and including the parameter in determining sample size for confidence intervals, that is, P[w ≤ U ∩ V]. The joint probability is smaller than both the conditional and the unconditional probabilities because P[w ≤ U ∩ V] = P[w ≤ U | V]P[V], where the probability of parameter inclusion P[V] = 1 − α is the confidence level.
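The relationships among the three probabilities can be illustrated with a small Monte Carlo sketch. This is our own illustration rather than material from the article, and it assumes numpy and scipy; it simulates the two-sample t confidence interval for the article's medium-effect case (d = .5, total N = 128, i.e., 64 per group) with the half-width bound U = .7 × .5 = .35:

```python
import numpy as np
from scipy.stats import t as tdist

rng = np.random.default_rng(42)
n, delta, alpha, U = 64, 0.5, 0.05, 0.35   # per-group n, true difference, level, width bound
reps = 20_000
tcrit = tdist.ppf(1 - alpha / 2, df=2 * n - 2)

valid = short = both = 0
for _ in range(reps):
    x = rng.normal(0.0, 1.0, n)            # control group
    y = rng.normal(delta, 1.0, n)          # treatment group
    diff = y.mean() - x.mean()
    sp = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)   # pooled SD (equal groups)
    w = tcrit * sp * np.sqrt(2.0 / n)      # half width of the 95% CI
    v = (diff - w) <= delta <= (diff + w)  # does the CI cover the true difference?
    s = w <= U                             # is the CI at least as narrow as U?
    valid += v
    short += s
    both += v and s

unconditional = short / reps   # estimates P[w <= U]
conditional = both / valid     # estimates P[w <= U | V], the power of CI
joint = both / reps            # estimates P[w <= U and V]
print(unconditional, conditional, joint)
```

With these settings the conditional probability lands near the .514 that Table 2 reports for N = 128 and f = .7, the coverage rate sits near the nominal .95, and the joint probability equals the conditional probability times the coverage rate, mirroring the identity above.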
In this study, we will use the default conditional probability in the SAS proc power procedure to compare the sample size requirements for significance tests and confidence intervals. Had the unconditional probability or the joint probability been used, the conclusion would be similar.

Sample Size for Sufficient Statistical Power in Significance Tests

Sample sizes are computed to achieve different levels of statistical power for varying minimum effect sizes. The minimum effect size divided by the standard deviation is the standardized effect size (d = Δ/σ), a metric-free measure. The standardized effect size takes the values .2, .5, and .8, which correspond to the small, medium, and large effect sizes popularized by Cohen. In our example that compares psychotherapy and placebo in treating depression, these effect sizes mean that the difference in change scores of depression symptoms is 20%, 50%, and 80% of the standard deviation of the change score, respectively. The significance level is set at 5% (α = .05). The SAS proc power procedure produces the required sample sizes necessary to achieve statistical power .5, .6, .7, .8, and .9 when equal group sizes are assumed in the two-sample independent t test (see Table 1). Statistical power .5 is considered poor, and .8 is usually desired. Power .9 is perceived to be on the high end. It is easy to see that higher statistical power requires larger sample sizes and that detecting a smaller effect size increases the required sample size.

Table 1. Sample Size Requirement for Statistical Power.

d    Nominal Power    Total N
.2   .5               388
.2   .6               492
.2   .7               620
.2   .8               788
.2   .9               1,054
.5   .5               64
.5   .6               82
.5   .7               102
.5   .8               128
.5   .9               172
.8   .5               28
.8   .6               34
.8   .7               42
.8   .8               52
.8   .9               68

Power of Confidence Interval in Studies With Sufficient Statistical Power

The sample size required to attain statistical power is used to calculate the probability of obtaining a certain interval width given that the confidence interval includes the parameter (i.e., the power of confidence interval). The achieved width can vary from short (more precise) to long (less precise), and the width is expressed as a fraction of the minimum effect size. Since we use the standardized effect size d, the half width becomes w = fd. The fractional factor f varies, taking the values .5, .6, .7, .8, .9, and 1.0. In our example study that compares psychotherapy and control in treating depression, varying f means changing the desired precision of the computed confidence interval. Suppose that the effect size is a medium .5 and that f is set to .7. The desired half width of the confidence interval is then .5 × .7 = .35. It suggests that we intend to have the confidence interval for the difference in change scores accurate to 35% of the standard deviation of the outcome. Had the fractional factor f been set to .5, we would want the confidence interval accurate to 25% of the standard deviation of the outcome. In that case, we would have increased the desired precision by decreasing f and shortening the width. Table 2 lists the probability of obtaining the varying widths given the validity of the confidence interval for the small, medium, and large effect sizes (see the Appendix for the SAS code).

Sample Size Requirements for Significance Test and Confidence Interval

We can compare the sample size requirements for significance test and confidence interval by examining statistical power and power of confidence interval in the example study that uses a two-sample independent t test. Statistical power is used to measure the performance of the significance test, and power of confidence interval is used to gauge the performance of the confidence interval. The example study is assumed to have sufficient sample size and statistical power. We will check to see whether sufficient statistical power also implies comparable power of confidence interval in the same study. For that purpose, we can make three interesting observations in Table 2.
First, it is virtually impossible to achieve high precision in the confidence intervals (i.e., f = .5) using the required sample sizes based on statistical power. In Table 1, the total sample sizes (N) of 1,054, 172, and 68 produce statistical power .9 for effect sizes .2, .5, and .8, respectively. Statistical power .9 is considered very high in practice, yet none of these sample sizes produces the ultimately precise confidence interval, which would exclude zero and the minimum effect size at the same time. In other words, the confidence intervals may well be wide enough to contain both zero and the minimum effect size. Thus, zero is a plausible value of the effect size; so is the minimum effect size. When this happens, the confidence interval may not allow us to draw a conclusive inference about the treatment effect, and there inevitably exists some confusion about whether the treatment effect is indeed important or not. Second, sample sizes based on statistical power .7 almost guarantee that the confidence intervals will achieve low precision regardless of the actual effect sizes, that is, f = 1.0 and P[w ≤ d | V] ≥ .99. Even statistical power .6 ensures the achievement of low precision when the effect size is .2 or .5; when the effect size is .8, P[w ≤ d | V] = .884. If the confidence interval is twice as wide as the minimum effect size, then an observed effect size bordering on the minimum effect size will return a confidence interval that includes zero. So, the observed effect size must at least exceed the minimum effect size before the researcher can rule out zero as a plausible value of the treatment effect in the confidence interval. In short,

Table 2. Sample Size and the Power of Confidence Interval.

[Columns: for each effect size d = .2, .5, and .8, the table lists f, the half width fd, the total sample size N, and the power of confidence interval P[w ≤ fd | V].]

Note: When the effect size d is .2, the total sample sizes (N) 388, 492, 620, 788, and 1,054 produce statistical power .5, .6, .7, .8, and .9, respectively, in Table 1. These sample sizes are used to compute the power of confidence interval when the desired half width w = fd is set to .5 × .2 = .10, .6 × .2 = .12, .7 × .2 = .14, .8 × .2 = .16, .9 × .2 = .18, and 1.0 × .2 = .20, respectively. When the effect size d is .5, the total sample sizes (N) 64, 82, 102, 128, and 172 produce statistical power .5, .6, .7, .8, and .9, respectively, in Table 1. These sample sizes are used to compute the power of confidence interval when the desired half width w = fd is set to .5 × .5 = .25, .6 × .5 = .30, .7 × .5 = .35, .8 × .5 = .40, .9 × .5 = .45, and 1.0 × .5 = .50, respectively. When the effect size d is .8, the total sample sizes (N) 28, 34, 42, 52, and 68 produce statistical power .5, .6, .7, .8, and .9, respectively, in Table 1. These sample sizes are used to compute the power of confidence interval when the desired half width w = fd is set to .5 × .8 = .40, .6 × .8 = .48, .7 × .8 = .56, .8 × .8 = .64, .9 × .8 = .72, and 1.0 × .8 = .80, respectively.

moderate statistical power (.70) in a significance test does not necessarily translate into adequate precision in the confidence interval. Third, the benchmark .80 for statistical power returns a little above .50 in power of confidence interval when we seek relatively good precision in the interval estimate, that is, f = .7 and P[w ≤ .7d | V] ≈ .5. For instance, the sample size N = 128 produces statistical power .80 for effect size .5. The same sample size yields .514 in power of confidence interval, P[w ≤ .7d | V] = .514 (see Table 2).
Thus, the desired power .80 in a significance test does not guarantee that good precision will materialize

in a confidence interval with a high probability. In other words, good statistical power does not imply sufficient power of confidence interval at moderate precision. It should be noted that the power of confidence interval is relative to the desired precision of the confidence interval. The power of confidence interval will vary for the same sample size if the desired precision of the confidence interval changes. As the precision standard is lowered, f and the power of confidence interval increase. If we adjust f upward, we can always improve the power of confidence interval at the cost of sacrificing precision. It is interesting to know what kind of precision the sample sizes based on statistical power .8 can produce while also holding the power of confidence interval at .80. Table 3 lists the sample sizes for statistical power .80 and the precision level at which the power of confidence interval is around .80. We can see that the fractional factor f sits near or below .75 if we want to elevate the power of confidence interval to around .80. The fractional factor f = .75 is the midpoint between f = .5 and f = 1.0, which correspond to high and low precision, respectively. In other words, statistical power .80 returns about an .80 chance of obtaining a half width shorter than three quarters of the standardized minimum effect size, given that the confidence interval contains the true parameter (i.e., P[w ≤ .75d | V] ≈ .80).

Table 3. Precision of Confidence Intervals (f) Under High Statistical Power (.80) and High Power of Confidence Interval (.80).

[Columns: d, Total N, Statistical Power, f, fd, P[w ≤ fd | V].]

Note: The total sample sizes (N) 788, 128, and 52 produce statistical power around .80 for effect sizes .2, .5, and .8, respectively, in Table 1. A grid search can identify w = fd such that the power of confidence interval P[w ≤ fd | V] is close to .80 for the same sample sizes 788, 128, and 52.
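The grid search described in the note to Table 3 can be sketched as follows. This is our own approximation, not the article's SAS code, and it assumes scipy; for simplicity it uses the closed-form unconditional probability P[w ≤ fd], which the text notes is very close to the conditional version. Because the half width w = tcrit·sp·√(2/n) depends on the data only through the pooled standard deviation sp, the probability reduces to a chi-square calculation:

```python
import numpy as np
from scipy.stats import chi2, t as tdist

def prob_width(f, d, n_total, alpha=0.05):
    """Unconditional P[w <= f*d] for the equal-group two-sample t CI.

    The half width is w = tcrit * sp * sqrt(2/n) with n per group, and
    (2n - 2) * sp**2 follows a chi-square(2n - 2) law when sigma = 1.
    """
    n = n_total // 2
    df = 2 * n - 2
    tcrit = tdist.ppf(1 - alpha / 2, df)
    s_max = f * d * np.sqrt(n / 2) / tcrit   # largest sp giving w <= f*d
    return chi2.cdf(df * s_max**2, df)

def f_for_prob(d, n_total, target=0.80):
    """Smallest f on a fine grid with P[w <= f*d] at or above `target`."""
    for f in np.arange(0.40, 1.50, 0.001):
        if prob_width(f, d, n_total) >= target:
            return round(float(f), 3)

# Total sample sizes that give statistical power about .80 (Table 1)
for d, n_total in [(0.2, 788), (0.5, 128), (0.8, 52)]:
    print(d, n_total, f_for_prob(d, n_total))
```

For N = 128 and d = .5 this search lands around f ≈ .73, and all three cases come out near or below .75, consistent with the article's reading of Table 3.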
Discussion

Confidence intervals are not all created equal. They vary in the probability of achieving a certain short width. The precision of the confidence interval is measured by the interval width. The wider the confidence interval is, the less informative it is about the actual effect size. A wide confidence interval is not precise because it shows a long range of plausible values for the effect size. So far, researchers have not paid much attention to the performance of confidence intervals. Few researchers have used SAS proc power to determine sample sizes for confidence intervals, and few are aware of the subtle difference between the conditional and unconditional probabilities in the SAS proc power procedure. A confidence interval may offer more information than the corresponding significance test, but it does so with moderate precision (f ≈ .70) at the expense of a larger sample. This comparison study suggests that the significance test requires a smaller sample size than does the corresponding confidence interval of moderate precision. Intuitively, this finding is easy to explain. Confirming the existence of a treatment effect should be less demanding than measuring that effect with satisfactory precision. The former is the first step in the study of a phenomenon; the latter is one step further. It takes more resources to measure an effect with good precision than to exclude chance as a possible explanation of the observed effect. Without sufficient sample sizes, confidence intervals will not bring us any closer to knowledge of the true effect size. The confidence intervals thus constructed will be embarrassingly large, just as Cohen (1994) surmised. Using confidence intervals alone will not

improve the statistical results in empirical studies if significance tests are simply suppressed. Insufficient sample size, which is known to plague underpowered studies, will render the corresponding confidence intervals woefully imprecise. In counseling outcome research, people will continue to use both significance tests and confidence intervals in examining outcome differences due to particular therapies. The significance test affords a formal mechanism to rule out chance as a cause of the observed nonzero difference due to the therapy. The confidence interval can be used to measure how large such a difference due to the therapy can be. Although confidence intervals can be informative about the outcome difference due to the therapy, they do not necessarily have the desired precision. If sample size is not properly considered, the width of the confidence interval will be too large to be informative about the size of the difference due to the therapy. Counseling researchers who employ confidence intervals in research studies should consider sample size and the probability of achieving a short interval width, or power of confidence interval. As shown in this article, the sample size requirement for confidence intervals differs from that for significance tests. Sample size planning for significance tests and confidence intervals follows different paradigms: the former uses statistical power and the latter power of confidence interval. They cannot be used interchangeably, and sufficient statistical power does not imply comparable power of confidence interval. In conclusion, confidence intervals shall not substitute for careful planning, without which they are equally as susceptible to misuse as significance tests. Sample size planning is important both to significance tests and to confidence intervals. While sample size determination is well understood for significance tests, its counterpart for confidence intervals has not been as thoroughly studied or developed.
More research of this nature is needed to improve statistical practice in empirical studies, which have become the primary mode of inquiry in counseling research and evaluation.

Appendix

SAS Code for Table 2

* sample size for power .5 through .9 at d = .2, .5, .8;
proc power;
  twosamplemeans test=diff
    meandiff=.2 .5 .8
    stddev=1
    power=.5 .6 .7 .8 .9
    ntotal=.;
run;

* effect size .2;
proc power;
  twosamplemeans ci=diff
    halfwidth=.10 .12 .14 .16 .18 .20
    stddev=1
    probwidth=.
    ntotal=388 492 620 788 1054;
run;

* effect size .5;
proc power;
  twosamplemeans ci=diff
    halfwidth=.25 .30 .35 .40 .45 .50
    stddev=1
    probwidth=.
    ntotal=64 82 102 128 172;
run;

* effect size .8;
proc power;
  twosamplemeans ci=diff
    halfwidth=.40 .48 .56 .64 .72 .80
    stddev=1
    probwidth=.
    ntotal=28 34 42 52 68;
run;

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
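As a companion to the first Appendix computation, the same sample size solution can be cross-checked in open-source form. This sketch is our own, not part of the original article, and assumes scipy; it finds the smallest equal-group total N that reaches a target power in a two-sided two-sample t test via the noncentral t distribution:

```python
import math
from scipy.stats import nct, t as tdist

def total_n_for_power(d, power, alpha=0.05):
    """Smallest total N (two equal groups) with at least `power`
    for standardized effect size d in a two-sided two-sample t test."""
    n = 2  # per-group size
    while True:
        df = 2 * n - 2
        tcrit = tdist.ppf(1 - alpha / 2, df)
        lam = d * math.sqrt(n / 2)   # noncentrality parameter
        achieved = 1 - nct.cdf(tcrit, df, lam) + nct.cdf(-tcrit, df, lam)
        if achieved >= power:
            return 2 * n
        n += 1

# Reproduces entries of Table 1:
print(total_n_for_power(0.5, 0.80))   # 128
print(total_n_for_power(0.8, 0.90))   # 68
```

The returned totals match the corresponding Table 1 entries produced by SAS proc power.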

References

Agresti, A., & Finlay, B. (2008). Statistical methods for the social sciences (4th ed.). Upper Saddle River, NJ: Prentice Hall.
American Psychological Association. (2009). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.
Beal, S. L. (1989). Sample size determination for confidence intervals on the population mean and on the difference between two population means. Biometrics, 45,
Bland, J. M. (2009). The tyranny of power: Is there a better way to calculate sample size? British Medical Journal, 339,
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112,
Cohen, J. (1994). The Earth is round (p < .05). American Psychologist, 49,
Cumming, G., Fidler, F., Leonard, M., Kalinowski, P., Christiansen, A., Kleinig, A., & Wilson, S. (2007). Statistical reform in psychology: Is anything changing? Psychological Science, 18,
Hahn, G., & Meeker, W. (1991). Statistical intervals: A guide for practitioners. New York, NY: Wiley.
Hsu, J. C. (1988). Sample size computation for designing multiple comparison experiments. Computational Statistics and Data Analysis, 7,
Hunter, J. E. (1997). Needed: A ban on the significance test. Psychological Science, 8, 3-7.
International Committee of Medical Journal Editors. (1988). Uniform requirements for manuscripts submitted to biomedical journals. Annals of Internal Medicine, 108,
Lehmann, E. L. (1959). Testing statistical hypotheses. New York, NY: Wiley.
Levin, J. R. (1993). Statistical significance testing from three perspectives. Journal of Experimental Education, 61,
Levin, J. R., & Robinson, D. H. (1999). Further reflections on hypothesis testing and editorial policy for primary research journals. Educational Psychological Review, 11,
Liu, X. (2012). Implications of statistical power for confidence intervals.
British Journal of Mathematical and Statistical Psychology, 65, Rossi, J. S. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58, Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, Sink, C. A., & Mvududu, N. H. (2010). Statistical power, sampling, and effect sizes: Three keys to research relevancy. Counseling Outcome Research and Evaluation 1, Thompson, B. (2002). Statistical, practical, and clinical : How many kinds of significance do counselors need to consider? Journal of Counseling and Development, 80, Thompson, B. (2006). Foundations of behavioral statistics: An insight-based approach. New York, NY: Guilford. Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4, Wilkinson, L. American Psychological Association Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, Author Biography Xiaofeng Steven Liu is Associate Professor in the Department of Educational Studies at the University of South Carolina, Columbia. His areas of specialization are statistical power analysis, sample size determination, hierarchical linear modeling, and educational policy study.


UNIT 5 - Association Causation, Effect Modification and Validity 5 UNIT 5 - Association Causation, Effect Modification and Validity Introduction In Unit 1 we introduced the concept of causality in epidemiology and presented different ways in which causes can be understood

More information

What is Science 2009 What is science?

What is Science 2009 What is science? What is science? The question we want to address is seemingly simple, but turns out to be quite difficult to answer: what is science? It is reasonable to ask such a question since this is a book/course

More information

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories Kamla-Raj 010 Int J Edu Sci, (): 107-113 (010) Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories O.O. Adedoyin Department of Educational Foundations,

More information

Working Paper 8: Janine Morley, Interesting topics & directions for practice theories August 2014

Working Paper 8: Janine Morley, Interesting topics & directions for practice theories August 2014 Please Note: The following working paper was presented at the workshop Demanding ideas: where theories of practice might go next held 18-20 June 2014 in Windermere, UK. The purpose of the event was to

More information

Meta-Analysis of Correlation Coefficients: A Monte Carlo Comparison of Fixed- and Random-Effects Methods

Meta-Analysis of Correlation Coefficients: A Monte Carlo Comparison of Fixed- and Random-Effects Methods Psychological Methods 01, Vol. 6, No. 2, 161-1 Copyright 01 by the American Psychological Association, Inc. 82-989X/01/S.00 DOI:.37//82-989X.6.2.161 Meta-Analysis of Correlation Coefficients: A Monte Carlo

More information

Understanding Uncertainty in School League Tables*

Understanding Uncertainty in School League Tables* FISCAL STUDIES, vol. 32, no. 2, pp. 207 224 (2011) 0143-5671 Understanding Uncertainty in School League Tables* GEORGE LECKIE and HARVEY GOLDSTEIN Centre for Multilevel Modelling, University of Bristol

More information

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT

More information

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference Lecture Outline Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Statistical Inference Role of Statistical Inference Hierarchy of Experimental

More information

Chapter 8. Learning Objectives 9/10/2012. Research Principles and Evidence Based Practice

Chapter 8. Learning Objectives 9/10/2012. Research Principles and Evidence Based Practice 1 Chapter 8 Research Principles and Evidence Based Practice 2 Learning Objectives Explain the importance of EMS research. Distinguish between types of EMS research. Outline 10 steps to perform research

More information

How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis?

How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis? How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis? Richards J. Heuer, Jr. Version 1.2, October 16, 2005 This document is from a collection of works by Richards J. Heuer, Jr.

More information

The Effect Sizes r and d in Hypnosis Research

The Effect Sizes r and d in Hypnosis Research Marty Sapp The Effect Sizes r and d in Hypnosis Research Marty Sapp, Ed.D. The effect sizes r and d and their confidence intervals can improve hypnosis research. This paper describes how to devise scientific

More information

Review Statistics review 2: Samples and populations Elise Whitley* and Jonathan Ball

Review Statistics review 2: Samples and populations Elise Whitley* and Jonathan Ball Available online http://ccforum.com/content/6/2/143 Review Statistics review 2: Samples and populations Elise Whitley* and Jonathan Ball *Lecturer in Medical Statistics, University of Bristol, UK Lecturer

More information