Optimizing Research Payoff


Jeff Miller, University of Otago, and Rolf Ulrich, University of Tübingen

Perspectives on Psychological Science, 2016, Vol. 11(5). © The Author(s) 2016. Reprints and permissions: sagepub.com/journalspermissions.nav. pps.sagepub.com

Abstract

In this article, we present a model for determining how total research payoff depends on researchers' choices of sample sizes, α levels, and other parameters of the research process. The model can be used to quantify various tradeoffs inherent in the research process and thus to balance competing goals, such as (a) maximizing both the number of studies carried out and also the statistical power of each study, (b) minimizing the rates of both false positive and false negative findings, and (c) maximizing both replicability and research efficiency. Given certain necessary information about a research area, the model can be used to determine the optimal values of sample size, statistical power, rate of false positives, rate of false negatives, and replicability, such that overall research payoff is maximized. More specifically, the model shows how the optimal values of these quantities depend upon the size and frequency of true effects within the area, as well as the individual payoffs associated with particular study outcomes. The model is particularly relevant within current discussions of how to optimize the productivity of scientific research, because it shows which aspects of a research area must be considered and how these aspects combine to determine total research payoff.

Keywords: optimizing research payoff, sample size, power, false positives, replicability

Research efficiency is important in virtually every area of science, because the capacity to carry out research is limited by the available time, money, and personnel.
In many research areas, there are also limits on the available study material, such as meteor fragments, cells of rare types, or plots of land with particular soil types. In medical research, there may be relatively few patients with a disease under study. In studies on animals, ethical considerations suggest that studies should be kept as small as possible (e.g., Sagarin, Ambler, & Lee, 2014). Finally, in situations where research results would have immediate societal value, studies should be as efficient as possible so that the research goals may be achieved and benefits obtained as quickly as possible. In recent years, there has been great alarm over the possibility that current research practices are inefficient, and there have been many recommendations about how efficiency could be increased (e.g., Chalmers et al., 2014; Chalmers & Glasziou, 2009; Ioannidis et al., 2014; Macleod et al., 2014). To a large extent, the alarm has arisen because of emerging evidence that many published findings in the scientific literature may not actually reflect the true state of the world (e.g., Ioannidis, 2005b; Jager & Leek, 2014; Ledgerwood, 2014a; Nosek et al., 2015; Simmons, Nelson, & Simonsohn, 2011; Vul, Harris, Winkielman, & Pashler, 2009; Wacholder, Chanock, Garcia-Closas, El ghormli, & Rothman, 2004; for an historical review, see Finkel, Eastwick, & Reis, 2015). Some of these erroneous findings, often referred to as false positives (FPs), appear to arise because researchers try many alternative data analyses looking for statistically significant results, a practice sometimes referred to as "p hacking" (e.g., Head, Holman, Lanfear, Kahn, & Jennions, 2015; John, Loewenstein, & Prelec, 2012; Simmons et al., 2011). In addition, the tendency of journals to publish only statistically significant results is known to increase the frequency of FPs within the published literature (e.g., Bakker, Van Dijk, & Wicherts, 2012; Fanelli, 2012; Sterling, 1959; Sterling, Rosenbaum, & Weinkam, 1995).
Naturally, concerns over the unexpectedly high frequency of FPs have led to calls for changes in both the questionable practices themselves and in the incentives that promote them (e.g., Button et al., 2013; Fiedler, Kutzner, & Krueger, 2012;

Corresponding Author: Jeff Miller, Department of Psychology, University of Otago, PO Box 56, Dunedin 9054, New Zealand. miller@psy.otago.ac.nz

Ioannidis, 2005b, 2014; Koole & Lakens, 2012; Nosek, Spies, & Motyl, 2012; Vul et al., 2009). These concerns have also led to the development of methods for identifying and counteracting the effects of these practices on meta-analyses (e.g., Egger, Smith, Schneider, & Minder, 1997; Francis, 2012; Ioannidis & Trikalinos, 2007; Peters, Sutton, Jones, Abrams, & Rushton, 2006; Simonsohn, Nelson, & Simmons, 2014b; Sutton, Abrams, Jones, Sheldon, & Song, 2000; Ulrich & Miller, 2015; van Assen, van Aert, & Wicherts, 2015; Wolf, 1986) and to methods for assessing and ethically disclosing the effects of the practices on statistical results (e.g., Sagarin et al., 2014). Thus, the recently increased awareness of FPs has resulted in numerous positive steps that can and should be taken to reduce the frequency and impact of FPs in the literature. Although it is clear that these steps can reduce the frequency of FPs in the scientific literature, there is still the fundamental problem that some FPs must arise even if researchers do everything correctly (i.e., they use no questionable research practices). Some FPs are simply inevitable in any research area that relies on statistical inference, for reasons that are well known and are reviewed in the following section. Two of the factors influencing the frequency of FPs, the level of statistical significance and statistical power, are determined by researchers' own choices. Naturally, then, concern over FPs has led to recommendations that researchers make these choices in such a way as to minimize FP frequency (e.g., Asendorpf et al., 2013; Button et al., 2013; Ioannidis, 2005b; Johnson, 2013; Nosek et al., 2012). Unfortunately, the same choices that minimize FPs necessarily have other consequences, some of which are undesirable (e.g., Fiedler et al., 2012; Fiedler & Schott, in press; Finkel et al., 2015; Friston, 2012; Ioannidis, 2014; Johnson, 2013; Mudge, Baker, Edge, & Houlahan, 2012; Sbarra, 2014).
In particular, as will be elaborated later, minimizing FPs also tends to slow the discovery of real effects, known as true positives (TPs). This means that the optimal choices must strike a good balance between reducing the frequency of FPs and maintaining a good rate of TPs, a strategy referred to as the "error balance" approach by Finkel et al. (2015). This can only be done within a detailed model in which the overall research payoff is assessed by considering both the costs associated with FPs and the benefits associated with TPs. To contribute to the ongoing discussion about how research practices should be modified to address concerns about FPs, this article presents such a model of total research payoff and shows how this model can be used to investigate which researchers' choices would tend to optimize that payoff. In the next three sections, we review the relevant background on statistical hypothesis testing, the inherent rates of incorrect decisions such as FPs, and the costs associated with minimizing FPs. This review is organized primarily around the technique of null hypothesis significance testing (NHST), which is the standard approach in most fields, but the same basic issues also arise within alternative Bayesian procedures (e.g., DeGroot, 1989). Following that review, we present a model for research payoff and show how the model can be used to optimize payoff within a research area. Among other things, the model shows how the optimal choices for researchers depend on the characteristics of their research area, and this means that it is impossible to identify a universally optimal set of choices that would apply across all areas. Nonetheless, the model is useful in highlighting the factors that researchers must consider in making their choices.

NHST

For researchers using NHST, the set of all possible studies can be conceptualized as a research scenario like that shown in Figure 1.
Depending on the interests of a meta-analyst concerned with topics like research efficiency and FP rates, this set of studies might be fairly tightly constrained (e.g., highly cited clinical trials examined by Ioannidis, 2005a), or it might include a rather broad set of topics (e.g., studies in neuroscience examined by Button et al., 2013; or studies carried out with the hope of publication in one of three specific psychology journals, examined by Open Science Collaboration, 2015). Each individual study within the scenario tests a specific null hypothesis (H0) according to which a particular effect is absent. In some studies the effect is actually absent (i.e., H0 is true; left side of Fig. 1), whereas in others a true effect is actually present (i.e., H0 is false; right side of figure). At the end of each study, the statistical analysis leads either to a positive decision that a true effect is present, known as rejecting H0 (shaded rectangles), or to a negative decision that a true effect may not be present, known as failing to reject H0 (unshaded rectangles). Thus, four study outcomes can be distinguished, depending on the presence or absence of the true effect under test and the positive or negative decision based on the data. Two of the outcomes represent correct decisions (TPs and true negatives [TNs]), whereas two represent incorrect decisions (FPs and false negatives [FNs]). The possibility of incorrect decisions, both FPs and FNs, is an unavoidable feature of statistical research, because the results are influenced by random error as well as by the true state of the world. Table 1 depicts the probabilities of the different outcomes shown in Figure 1, both in general and for the numerical example illustrated in the figure. As can be seen from the formulas in the table, the probabilities of the four mutually exclusive and exhaustive outcomes within NHST are determined by three underlying parameters:

1. The conditional probability of obtaining a positive result given that no true effect is present. This parameter is variously known as the alpha (α) level, the Type I error probability, the significance level, or the p level. An important requirement of NHST is that researchers must choose this value before collecting the data, and it is conventionally chosen to be 5% by researchers (and journal editors) in many research areas. In terms of the outcomes depicted in Figure 1,

α = Pr(FP) / [Pr(FP) + Pr(TN)]. (1)

2. The conditional probability of obtaining a negative result given that a true effect is present. This parameter is generally known as the Type II error probability and denoted β, but we will denote it as β(d, n_s, α) to emphasize that its value depends on the true effect size d (e.g., as parameterized by Cohen, 1988), the sample size n_s, and the chosen α level. In terms of the outcomes shown in Figure 1,

β(d, n_s, α) = Pr(FN) / [Pr(TP) + Pr(FN)]. (2)

The α level and the sample size that determine the Type II error probability are, of course, always known to the researcher. As is considered further in the General Discussion, the true effect size is generally not known precisely but can often be estimated with meta-analysis (e.g., Richard, Bond, & Stokes-Zoota, 2003). The complement of the Type II error probability, 1 - β(d, n_s, α), is known as the power level, and it is the conditional probability of obtaining a positive result given that a true effect of size d is present.

3. The unconditional probability of a true effect. This parameter is sometimes called the base rate of true effects, π1, and it represents the proportion of all possible studies within a given research scenario for which the true effect is present. This probability is reflected in Figure 1 as

π1 = Pr(TP) + Pr(FN). (3)

Like the true effect size, the base rate of true effects is also generally unknown, but researchers have a number of ways of estimating its value, as is considered further in the General Discussion.

Fig. 1. A schematic depiction of a given research scenario and the four possible outcomes of individual studies within that scenario. The total area within the large outside rectangle represents all possible studies that might be carried out within that scenario. Studies testing for effects that are absent (i.e., H0 is true) are shown on the left, and those testing for effects that are present are shown on the right, but of course the researcher does not know whether a particular contemplated effect falls on one side or the other. The result of each study is either a positive decision (i.e., conclude that the effect is present; shaded rectangles) or a negative decision (i.e., conclude that the effect may be absent; unshaded rectangles). True positive (TP) and true negative (TN) decisions are correct; false positive (FP) and false negative (FN) decisions are errors. The figure is drawn so that the areas of the four outcome rectangles (FP, TN, TP, and FN) are proportional to their probabilities under the parameter values shown in Table 1.

Table 1. Probabilities Corresponding to the Four Possible Individual Study Outcomes (False Positive or FP, True Negative or TN, True Positive or TP, and False Negative or FN) for the Research Scenario Depicted in Figure 1

Outcome | Probability | Numerical example in Figure 1
FP | Pr(FP) = (1 - π1) α | .90 × .05 = 4.5%
TN | Pr(TN) = (1 - π1)(1 - α) | .90 × .95 = 85.5%
TP | Pr(TP) = π1 [1 - β(d, n_s, α)] | .10 × (1 - .60) = 4.0%
FN | Pr(FN) = π1 β(d, n_s, α) | .10 × .60 = 6.0%
Sum | 1 | 100%

Note: The probabilities shown in the numerical example correspond to the relative areas of the four outcome rectangles shown in Figure 1, and these represent a research scenario with α = .05, power [1 - β(d, n_s, α)] = .40, and a base rate proportion of true effects π1 = .10.

For this scenario, the rates of FPs and FNs are R_fp = Pr(FP) / [Pr(FP) + Pr(TP)] = 4.5 / (4.5 + 4.0) = 52.9% and R_fn = Pr(FN) / [Pr(FN) + Pr(TN)] = 6.0 / (6.0 + 85.5) = 6.6%.
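The outcome probabilities in Table 1 and the two error rates can be sketched in a few lines of Python. This sketch is ours, not the article's; the function names are our own, and the α = .05, power = .40, and π1 = .10 values are the illustrative ones from Table 1:

```python
# Outcome probabilities (Table 1) and error rates for one research scenario.
# Illustrative values: alpha = .05, power = .40, base rate pi1 = .10.

def outcome_probs(alpha, power, pi1):
    """Return Pr(FP), Pr(TN), Pr(TP), Pr(FN) for a single study."""
    return {
        "FP": (1 - pi1) * alpha,        # H0 true, rejected
        "TN": (1 - pi1) * (1 - alpha),  # H0 true, not rejected
        "TP": pi1 * power,              # H0 false, rejected
        "FN": pi1 * (1 - power),        # H0 false, not rejected
    }

def error_rates(p):
    """R_fp: share of positive results that are false (Eq. 4);
    R_fn: share of negative results that are false (Eq. 6)."""
    r_fp = p["FP"] / (p["FP"] + p["TP"])
    r_fn = p["FN"] / (p["FN"] + p["TN"])
    return r_fp, r_fn

p = outcome_probs(alpha=0.05, power=0.40, pi1=0.10)
r_fp, r_fn = error_rates(p)
# p: FP = 0.045, TN = 0.855, TP = 0.04, FN = 0.06
# r_fp ≈ 0.529, r_fn ≈ 0.066
```

Running this reproduces the 4.5%, 85.5%, 4.0%, and 6.0% cells of Table 1 and the 52.9% and 6.6% rates computed above.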

Rates of FPs and FNs

Discussions of FPs often focus on the proportion of all positive results within a research area that are false, known as the rate of FPs (R_fp; also sometimes called the false positive report probability or the false discovery rate; Benjamini & Hochberg, 1995; Wacholder et al., 2004).1 Formally, the rate of FPs is the conditional probability that a result is false, given that it is a positive result, defined as

R_fp = Pr(FP) / [Pr(FP) + Pr(TP)]. (4)

A common misconception about the FP rate is that it should be at most 5% if researchers use the α = .05 level for hypothesis testing (e.g., Cohen, 1994; Falk & Greenbaum, 1995; Oakes, 1986; Pashler & Harris, 2012; Pollard & Richardson, 1987). Instead, FP rates can be much larger, even approaching 100%, for purely statistical reasons. Considering that there may be additional effects of editorial policies and of questionable research practices beyond the purely statistical causes of FPs, some estimates put the FP rate at more than 90% of published findings in certain research areas (e.g., Ioannidis, 2005b).

The idea that the rate of FPs is at most the 5% α level is a misconception because these are two fundamentally different conditional probabilities. The probabilities are similar in that both increase directly with the proportion of false positive results, as can be seen by noting that the numerators of Equations 1 and 4 are identical. However, the two probabilities have different denominators. The α level indicates the number of FPs relative to the total number of experiments in which H0 is true [i.e., Pr(FP) + Pr(TN)], whereas the rate of FPs measures the number of FPs relative to the total number of experiments in which H0 is rejected [i.e., Pr(FP) + Pr(TP)]. Thus, the α level and the rate of FPs are only equal when TPs and TNs have equal probabilities [i.e., Pr(TP) = Pr(TN)], as can be seen by comparing the relevant fractions directly:

R_fp =? α, that is, Pr(FP) / [Pr(FP) + Pr(TP)] =? Pr(FP) / [Pr(FP) + Pr(TN)]. (5)

This comparison shows that the rate of FPs exceeds the α level whenever TNs are more common than TPs. For example, TNs are much more common than TPs within the research scenario summarized in Table 1 and depicted in Figure 1, and this results in 52.9% FPs even though α = .05. In general, the rate of FPs tends to be large when true effects are rare, simply because there are few opportunities for TPs in that case. As another example, when a researcher tests new cancer treatment drugs, only a small proportion of the candidate drugs may actually be beneficial (e.g., π1 ≈ .1), and this produces a relatively high FP rate (e.g., Ioannidis, 2005b; Wacholder et al., 2004). In an extreme case where all studies tested true H0s (e.g., ESP research), true positive results would be impossible [i.e., Pr(TP) = 0], and therefore all positive results would be FPs (i.e., R_fp = 1). Conversely, FPs would tend to be rare in research scenarios in which most studies tested for true effects (i.e., where π1 is large). For instance, in an ideal research scenario in which researchers based their studies on perfect theories, all predicted effects would be present and FPs would be impossible (i.e., Pr(FP) = 0, thus R_fp = 0). In a scenario for which 80% of studies tested for true effects, the rate of FPs would be 3%, assuming the same α level and Type II error probability used in Table 1. Unfortunately, the base rate of true effects π1 is difficult to determine, leading to uncertainty about what FP rate should be expected on purely statistical grounds.

As studies can erroneously result in FNs as well as FPs, several authors have emphasized that one should also consider the rate of FNs, defined as

R_fn = Pr(FN) / [Pr(FN) + Pr(TN)] (6)

(e.g., Fiedler et al., 2012; Friston, 2012; Mudge et al., 2012; Vadillo, Konstantinidis, & Shanks, 2016).
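The base-rate dependence just described can be checked numerically. The following sketch is ours, not the article's; it holds α = .05 and power = .40 fixed, as in Table 1, and varies only π1:

```python
# FP rate (Eq. 4) as a function of the base rate pi1,
# with alpha = .05 and power = .40 fixed as in Table 1.

def fp_rate(alpha, power, pi1):
    pr_fp = (1 - pi1) * alpha  # false positives
    pr_tp = pi1 * power        # true positives
    return pr_fp / (pr_fp + pr_tp)

fp_rate(0.05, 0.40, 0.10)  # ≈ .529: rare true effects, most positives are false
fp_rate(0.05, 0.40, 0.80)  # ≈ .030: common true effects, few false positives
fp_rate(0.05, 0.40, 0.00)  # 1.0: all H0s true (e.g., ESP), every positive is an FP
fp_rate(0.05, 0.40, 1.00)  # 0.0: perfect theories, FPs impossible
```

The second call reproduces the 3% figure given in the text for a scenario in which 80% of studies test for true effects.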
For the research scenario illustrated in Table 1, for example, the rate of FNs is R_fn = 6.0 / (6.0 + 85.5) = 6.6%, meaning that 6.6% of all negative results represent studies in which researchers failed to detect a true effect. FNs may also be costly, as they reflect missed opportunities. For example, it would clearly be a serious error to dismiss a drug as therapeutically ineffective when it was actually helpful. To minimize both FPs and FNs requires an understanding of how the rates of these errors depend upon the characteristics of the underlying research scenario under discussion. Figure 2 shows how the rates of FPs and FNs depend on the key parameters of α level, power 1 - β(d, n_s, α), and the base rate of true effects π1. The figure illustrates four important points. First, there are strong effects of the base rate π1 on the rates of FPs and FNs, and these effects go in opposite directions. Comparing the panels on the left shows that the rate of FPs tends to be larger when true effects are rare (i.e., π1 = .1 in Fig. 2A versus π1 = .9 in Fig. 2E). This is because there are more opportunities for FPs when most studies test true H0s. In contrast, examination of the panels on the right shows that the rate of FNs tends to be larger when true

Fig. 2. Rate of false positives {R_fp = Pr(FP)/[Pr(FP) + Pr(TP)]; left panels A, C, and E} and rate of false negatives {R_fn = Pr(FN)/[Pr(FN) + Pr(TN)]; right panels B, D, and F} as a function of the α level, study power [1 - β(d, n_s, α)], and the base rate probability that a true effect is present, π1 (.1 in the top row, .5 in the middle row, and .9 in the bottom row).

effects are more common (Fig. 2F versus 2B), and this happens because there are more opportunities for FNs when most studies test for true effects. Second, the researcher's α level has a strong effect on the FP rate, as can be seen within Figures 2A, 2C, and 2E. As expected, the rate of FPs decreases when the α level is reduced, because H0 is rejected in a smaller proportion of the studies in which it is true. Somewhat counterintuitively, the researcher's α level has very little effect on the FN rate for a given power level (Figs. 2B, 2D, and 2F). The insensitivity of the rate of FNs to α appears inconsistent with the standard wisdom that decreasing α makes it harder (i.e., less likely) to detect true effects (e.g., DeGroot, 1989; Fiedler et al., 2012; Friston, 2012; Mudge et al., 2012), but there is a simple explanation for the apparent inconsistency. The standard wisdom applies when the effects of decreasing α are considered across studies with a fixed sample size; for that situation, decreasing α also decreases power. In contrast, Figure 2 displays the effect of changing the α level across studies with a fixed power level. This figure thus depicts the situation in which sample sizes are increased to compensate for the power loss that would otherwise result from the change in α level. If sample size rather than power appeared on the x-axes of Figures 2B, 2D, and 2F, then smaller αs would indeed produce visibly larger false negative rates at each sample size. Third, as can be seen within each panel, the FP and FN rates both decrease as power increases. These

decreases are to be expected because increasing power simultaneously increases the number of TPs (which reduces the ratio R_fp in Eq. 4) and also reduces the number of FNs (which reduces the ratio R_fn in Eq. 6). Fourth, there are interactions among the α level, power, and base rate parameters in the sense that the effect of each parameter depends on the values of the other parameters. For example, both the α level and power have larger effects on the rate of FPs when the base rate of true effects is low than when it is high (Figs. 2A vs. 2E). This pattern is to be expected because of the increased opportunity for FPs when true effects are rare, as mentioned previously. Analogously, power has a smaller effect on the rate of FNs when true effects are rare than when they are common (Fig. 2B vs. 2F), because there is little opportunity for FNs when true effects are rare.

Study Costs

Based on the fact that α level and power influence the rates of FPs and FNs, as shown in Figure 2, it seems natural to suggest that researchers should seek to maximize the accuracy of their study results by reducing their α levels and by using larger samples to increase power (e.g., Colhoun, McKeigue, & Smith, 2003). For example, Schimmack (2012) argued that "the most logical approach to reduce concerns about Type I error is to use more stringent criteria for significance" (p. 553), and he suggested that a value of α = .01 be used in particularly important studies. Similarly, Johnson (2013) recommended shifting to α = .005 or α = .001, noting that under some conditions more stringent standards will thus reduce false-positive rates by a factor of 5 or more. Many others have emphasized the importance of increasing sample sizes to enhance study power and thereby reduce FPs (e.g., Asendorpf et al., 2013; Button et al., 2013; Ioannidis, 2005b; Nosek et al., 2012). In contrast, however, some have argued against large sample sizes on various grounds.
As noted earlier, ethical considerations may warrant keeping sample sizes as small as possible, for example, when animals are sacrificed to perform the study (e.g., Lakens, 2014; Sagarin et al., 2014). In addition, large sample sizes have been criticized for yielding sufficiently high power to detect very small effects that might better be ignored (Friston, 2012; Lenth, 2001). The latter criticism might not apply to theoretical development, where small effects could be quite important, however, and it can also be overcome by using the data to estimate the sizes of effects as well as their existence. More critically, the main argument against decreasing α and increasing power is that these changes lead to more costly studies. In many studies, the most straightforward way to increase power is to increase sample size, and especially large sample size increases are needed with especially small α levels. Figure 3 illustrates the problem by showing what sample size n_s is required to achieve various levels of power for a given α level and true effect size d. Assuming finite study resources (e.g., sample sizes), the strategy of decreasing α and increasing power consumes more resources per study, and this reduces the number of studies that can be conducted. As Lakens and Evers (2014) put it, "Because running studies with large sample sizes is costly, and because the resources that a researcher has available are finite, a researcher is forced to make a tradeoff between the number of studies that he or she runs and the power of these studies" (p. 288). To choose the optimal sample size for any research program, then, the benefits of increasing the power for each individual study must be balanced against the reduction in the number of studies conducted. This can only be done within the context of a quantitative model integrating the benefits and costs of the various correct conclusions and errors that might be made.
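The power-versus-sample-size tradeoff shown in Figure 3 can be approximated without special software. The sketch below is ours, not the article's: it uses a normal approximation to the two-sample t test (the article's Figure 3 uses exact noncentral-t computations, so small-sample values differ slightly), and all helper names are our own:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_quantile(p):
    """Inverse normal CDF by bisection, for p in (0.5, 1)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def approx_power(d, n_s, alpha):
    """Approximate power of a two-sided two-sample t test (per-group n_s),
    counting only significant results in the direction of the true effect."""
    z_crit = z_quantile(1 - alpha / 2)
    ncp = d * math.sqrt(n_s / 2)  # approximate noncentrality of the test statistic
    return 1 - phi(z_crit - ncp)

def required_n(d, alpha, power_goal):
    """Smallest per-group sample size reaching power_goal."""
    n = 2
    while approx_power(d, n, alpha) < power_goal:
        n += 1
    return n

required_n(0.5, 0.05, 0.8)   # 63 per group under this approximation
required_n(0.5, 0.001, 0.8)  # far larger n needed at a stricter alpha
```

The two calls at the end illustrate the cost highlighted in the text: holding d = .5 and target power at .80, tightening α from .05 to .001 roughly doubles the required per-group sample size.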
A Model for Research

We introduce the model by considering the question of what sample size a researcher should choose. 2 As will become apparent, this very practical problem includes all of the essential ingredients needed to address questions about the optimal level of α, the optimal power, the optimal false positive and false negative rates, and indeed the optimal values for many other parameters of the research scenario. As will be considered in the General Discussion, this model could in principle be applied within a very broad context (e.g., an entire research area, such as "social psychology"). To make the presentation simpler and more concrete, however, we will start by considering how it would apply within the much narrower context of a hypothetical team of researchers carrying out initial screening studies of the effectiveness of various drugs, using experimental/control group designs analyzed with two-sample t tests. 3 Because of resource limitations and other practical constraints, this research team can test only a limited total number of participants, n_max, with the available funding. 4 In general, if the team uses experimental and control groups of size n_s to screen each drug, they can screen only k = n_max / (2 n_s) drugs. Thus, there is a tradeoff between the number of participants used in testing each drug and the number of drugs tested (e.g., Lakens & Evers, 2014), leading to the question of whether it is better to test more drugs with fewer participants per drug or fewer drugs with more participants per drug. That is, how should the research team choose the sample size n_s to maximize its total research payoff? To identify the optimal sample size, it is necessary to have a measure of the total research payoff P_T associated

with the research team's scientific efforts (cf. Ioannidis, 2014, Table 2). We assume that each of the four possible outcomes in Figure 1 (i.e., TP, FP, TN, FN) is associated with its own individual outcome payoff value (i.e., P_tp, P_fp, P_tn, P_fn), where correct decisions (TP, TN) are associated with positive individual outcome payoffs and incorrect decisions (FP, FN) are associated with negative ones. As is considered further in the General Discussion, these individual outcome payoff values would depend on many factors, and they probably vary widely across research areas. Similar individual outcome payoff models have been used in the evaluation and optimization of diagnostic testing (e.g., Swets, Dawes, & Monahan, 2000) and personnel selection (e.g., Taylor & Russell, 1939), among other areas. In such areas, it is sometimes possible to estimate fairly directly the payoffs associated with the different individual outcomes. Given the individual outcome payoffs, the research team's expected total payoff using a given sample size, E[P_T(n_s)], can be represented as

E[P_T(n_s)] = [n_max / (2 n_s)] × [Pr(TP) P_tp + Pr(FP) P_fp + Pr(TN) P_tn + Pr(FN) P_fn], (7)

where Pr(TP), Pr(FP), Pr(TN), and Pr(FN) are the probabilities of the individual study outcomes, and P_tp, P_fp, P_tn, and P_fn are their payoffs.

Fig. 3. Functions showing the sample size (n_s, per group) required to obtain the indicated level of power 1 - β(d, n_s, α) when using the indicated α level, for true effect sizes d = .2 (Panel A), d = .5 (Panel B), and d = .8 (Panel C). For this example, computations were carried out for studies analyzed using a two-tailed, two-sample t test with the indicated true effect size (d). Power was computed as the probability of finding a significant result in the same direction as the true effect (i.e., positive observed d values). Note that the scale of the vertical axis differs across panels.
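Equation 7 translates directly into code. The sketch below is ours, not the article's; it combines the outcome-probability formulas from Table 1 with an arbitrary payoff dictionary, and the numerical inputs in the usage line are illustrative only:

```python
# Equation 7: expected total payoff for a given per-group sample size n_s.

def expected_total_payoff(n_s, n_max, alpha, power, pi1, payoffs):
    """Number of affordable studies, n_max / (2 n_s), times the
    expected payoff of a single study."""
    probs = {
        "TP": pi1 * power,
        "FP": (1 - pi1) * alpha,
        "TN": (1 - pi1) * (1 - alpha),
        "FN": pi1 * (1 - power),
    }
    per_study = sum(probs[k] * payoffs[k] for k in probs)  # E[payoff] of one study
    return (n_max / (2 * n_s)) * per_study

# Example: payoff of +1 per TP, -1 per FP, 0 otherwise.
simplistic = {"TP": 1.0, "FP": -1.0, "TN": 0.0, "FN": 0.0}
expected_total_payoff(n_s=10, n_max=1000, alpha=0.05, power=0.40,
                      pi1=0.5, payoffs=simplistic)  # 50 studies × .175 = 8.75
```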
The weighted sum of the individual outcome probabilities and payoffs is the expected payoff for a single study, and the total number of studies that can be conducted with sample size n_s is n_max / (2 n_s). To illustrate how this model can be used to calculate and optimize overall research payoff, it is necessary to assume particular values for the parameters within an example research scenario. The parameter values would certainly differ across scenarios, though, so it is only possible to illustrate the model for some particular and necessarily somewhat arbitrary choices. Thus, we want to emphasize that we intend to illustrate the types of conclusions that the model could support after appropriate parameter values were determined, but we do not intend to offer general conclusions that would apply across all research scenarios. We first illustrate the model with an arbitrary set of individual outcome payoffs (P_tp = 1, P_fp = -1, P_tn = 0, and P_fn = 0), which we refer to as the "simplistic" payoffs. These individual outcome payoffs make explicit the idea of assigning zero values to negative results, an idea that seems implicit in existing publication biases (i.e., the practice of publishing mainly significant results; see, e.g., Fanelli, 2012; Franco, Malhotra, & Simonovits, 2014), as unpublished negative results are likely to have little impact. These payoffs also reflect the arbitrary assumption that the

gain contributed by one TP is exactly offset by the loss associated with one FP.

Table 2 illustrates the computations needed to apply the model within the example research scenario with its particular combination of statistical test (i.e., two-sample t test), effect size d, base rate of true effects π1, and α level.

Table 2. Illustration of the Computational Steps Used to Determine the Optimal Sample Size, Power Level, Rate of FPs, and Rate of FNs for a Given Research Area

[Table rows not reproduced in this copy; the columns are n_s, 1 - β(d, n_s, α), Pr(TP), Pr(FP), Pr(TN), Pr(FN), E[P_T(n_s)], R_fp, and R_fn.]

Note: The computations reflect a two-sample t-test research scenario with α = .05, a base rate of true effects π1 = .5, an effect size of d = .5 when the effect is present, individual outcome payoffs of P_tp = 1, P_fp = -1, P_tn = 0, and P_fn = 0, and a total of n_max = 1,000 available participants across all studies. (These are the same parameters used to compute the α = .05 curves shown in Figures 4E, 5E, 6E, and 7E.) For each sample size n_s, the first step is to compute power 1 - β(d, n_s, α) from the noncentral t distribution, as is described in the Appendix "Computational Details" in the online Supplemental Material. Second, the individual outcome probabilities Pr(TP), Pr(FP), Pr(TN), and Pr(FN) are computed from α, π1, and β(d, n_s, α) using the equations shown in Table 1. Third, the expected total research payoff P_T is computed with Equation 7. Within this research scenario, the computations indicate that the maximum total expected payoff is E[P_T(n_s)] = 3.974, which is obtained with n_s = 8, so n_s = 8 is the optimal sample size. The optimal power level is the value of 1 - β(d, n_s, α) = .152 associated with n_s = 8. The rates of FPs R_fp and FNs R_fn can be computed for each sample size with Equations 4 and 6, and the rates associated with the optimal sample size are R_fp = .141 and R_fn = .465.
For each possible sample size n_s, the associated power can be computed from the assumed true effect size d and α level using standard techniques (see Computational Details in the online Supplemental Material). The individual outcome probabilities Pr(TP), Pr(FP), Pr(TN), and Pr(FN) can then be computed using the formulas shown in Table 1, and the expected total payoff E[P_T(n_s)] can be computed from Equation 7. Once the expected total payoff is computed in this manner for each possible sample size, the optimal sample size is easily identified as the one producing the largest expected payoff. Figure 4 shows expected total research payoff, E[P_T(n_s)], computed using the simplistic individual outcome payoffs. The different panels represent nine distinct research scenarios differing in the base rate probability with which true study effects are present (π_1) and in the size of the true effect (d) when it is present, but having in common the simplistic individual outcome payoffs. As suggested by Cohen (1988), we parameterized the true effect size as the ratio of the mean true effect to the standard deviation of the individual data values (d), with effect sizes of d = .2, .5, or .8 generally regarded as small, medium, and large, respectively. Note that the range of the vertical axis differs across the three rows of this figure, because the expected payoff depends strongly on the base rate of true effects. The research team's problem, of course, is to choose the sample size n_s that maximizes its expected total payoff, and the results shown in each panel of the figure directly address that problem for a research team working in the depicted scenario. For example, consider the effect of sample size shown in Figure 4I, which depicts a research scenario in which true effects are relatively common and the effects are relatively large when they are present (i.e., π_1 = .9 and d = .8).
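The search over sample sizes just described can be sketched in a few lines. To keep the example self-contained, this sketch substitutes a normal approximation for the exact noncentral-t power computation, so the optimum it finds differs somewhat from the n_s = 8 reported in Table 2; the outcome probabilities follow the Table 1 logic (e.g., Pr(TP) = π_1(1 - β), Pr(FP) = (1 - π_1)α, counting any significant result for a null effect as an FP). The function names and the grid of candidate sample sizes are ours, not the authors'.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_crit(alpha):
    """Two-sided standard normal critical value, found by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if phi(mid) < 1.0 - alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def approx_power(d, n_s, alpha):
    """Normal approximation to two-sided, two-sample t-test power."""
    delta = d * math.sqrt(n_s / 2.0)   # noncentrality for n_s per group
    zc = z_crit(alpha)
    return phi(delta - zc) + phi(-delta - zc)

def expected_total_payoff(n_s, d, alpha, pi1, payoffs, n_max):
    """Expected total payoff in the spirit of Equation 7:
    (number of studies) x (expected payoff per study)."""
    p_tp, p_fp, p_tn, p_fn = payoffs
    power = approx_power(d, n_s, alpha)
    pr_tp = pi1 * power                  # true effect, significant
    pr_fn = pi1 * (1.0 - power)          # true effect, nonsignificant
    pr_fp = (1.0 - pi1) * alpha          # null effect, significant
    pr_tn = (1.0 - pi1) * (1.0 - alpha)  # null effect, nonsignificant
    per_study = pr_tp * p_tp + pr_fp * p_fp + pr_tn * p_tn + pr_fn * p_fn
    n_studies = n_max / (2.0 * n_s)      # each study uses 2*n_s participants
    return n_studies * per_study

# Scenario of Table 2: alpha = .05, pi1 = .5, d = .5, simplistic payoffs,
# n_max = 1,000 participants in total.
simplistic = (1.0, -1.0, 0.0, 0.0)
results = {n: expected_total_payoff(n, 0.5, 0.05, 0.5, simplistic, 1000)
           for n in range(2, 101)}
best_n = max(results, key=results.get)
print(best_n, round(results[best_n], 2))
```

Because the payoff curve is quite flat near its peak, a range of moderate sample sizes performs almost equally well under this approximation.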
Depending on the α level, payoff can peak at a sample size of anywhere from approximately 5 to 50. According to this model, smaller sample sizes are inefficient because they do not provide enough information about a given drug, whereas larger sample sizes are inefficient because they reduce the number of drugs that can be tested. For the scenario illustrated in Figure 4I, it is also striking that the expected total payoff is larger for α = .05 than for α = .01 and α = .001. Thus, the best overall strategy within this scenario is to use α = .05 with a sample size of five. Evidently, the more stringent α values waste resources because they require larger samples to obtain reasonable power (Fig. 3). Note that the expected payoffs for the different α levels converge at a sample size of approximately 60; for this sample size and true effect size, power is nearly 1.0 for all three of these αs.

[Fig. 4. Expected total payoff, E[P_T(n_s)], as a function of α level and sample size (n_s) for nine research scenarios (A-I) differing in the base rate probability that a true effect is present (π_1) and in the size of the effect when it is present (d). Payoffs were computed from Equation 7 using individual outcome payoffs of P_tp = 1, P_fp = -1, P_tn = 0, and P_fn = 0. Computations were carried out for studies analyzed with two-sample t tests.]

The existence of a clear peak in the function relating expected research payoff to sample size that can be seen in Figure 4I is generally consistent across the distinct research scenarios depicted in the different panels, with rather small samples often tending to be optimal under the simplistic individual outcome payoffs used for this figure. Comparisons across the different panels also reveal strong effects of the base rate and the true effect size. Expected total payoffs tend to be larger when true effects are more common (i.e., larger π_1), because more TPs can be found when more true effects are present. Furthermore, payoffs tend to be larger with larger true effects, because power tends to be greater with larger effects. What is perhaps surprising, and certainly disappointing, is that the expected total payoff can actually be negative when the base rate is low and the effect size is small (Fig. 4A). The negative total payoff means that the indicated sample sizes and α levels would produce more FPs than TPs, so researchers would actually be better off conducting no studies at all than conducting studies with these values. In this scenario, the most stringent α level works best because it produces

fewer FPs, but even it produces only a small positive expected total payoff (.48) with the optimal sample size. In the other panels, though, the least stringent α = .05 value is superior. This α level, which was advocated by Fisher (1926), has traditionally been standard in many research areas.

What are the optimal levels of power, rate of FPs, and rate of FNs?

The preceding analysis of sample size can be extended to determine the optimal level of study power, the optimal rates of FPs and FNs, and indeed the optimal value for any other function of the probabilities depicted in Figure 1. Such extensions are useful because they allow researchers to consider the more general questions of what level of power, what rate of FPs, and what rate of FNs would be optimal within a given research area. The extensions are possible because the choice of sample size uniquely determines all of the other values, as is illustrated in Table 2. Therefore, once the optimal sample size is chosen within a given research scenario, the corresponding optimal values of these other variables are also determined. In Table 2, for example, the optimal sample size is n_s = 8. This sample size's associated power level, .152, is therefore the optimal power level for researchers working under this scenario. Of course, these researchers could increase power by increasing the sample size, but the payoff computations indicate that this would be a bad idea under this scenario with its assumed simplistic individual outcome payoffs. More total payoff would be lost by performing fewer studies than would be gained by having higher power within each study. Figure 5 shows how the optimal power level within each scenario can be identified by plotting the expected total payoff, E[P_T(n_s)], as a function of study power rather than sample size. In terms of the columns of Table 2, Figure 5 simply plots the expected total payoff on the vertical axis against power on the horizontal axis.
Because power is monotonic with sample size, these plots essentially involve stretching and relabeling the x-axes relative to Figure 4. In addition, vertical comparisons between the curves for different α levels within a panel have a different meaning in this figure than in Figure 4. In the earlier figure, such vertical comparisons were between studies with the same sample size but different power levels, whereas in this figure the comparisons are between studies with the same power levels but different sample sizes. Perhaps the most surprising result in Figure 5 is that expected payoffs within a scenario can be optimal even with rather low values of power. The optimal level of power tends to be higher when the base rate of true effects is lower, but even with the relatively small base rate of π_1 = .1, the optimal value of power can be below .9, and power levels down to approximately .5 may perform almost as well due to the flatness of these curves. In some of the cases with larger base rates, the optimal power levels are actually below .5, meaning that expected total payoff is maximized despite the fact that true effects are missed in more than half of the studies testing for them. Although it seems intuitively surprising that such a low level of power could be optimal, the computations reveal that in this scenario the additional power gained by increasing sample size is inadequate compensation for the reduced number of studies that can be conducted. This is, of course, partly because of the nonlinear relationship between sample size and power shown in Figure 3. The model can also be used to address the ongoing discussion of FP and FN rates considered in the introduction (e.g., Ioannidis, 2005b; Jager & Leek, 2014; Simmons et al., 2011). Just as each sample size has an associated power level, it also has associated rates of FPs and FNs that can be computed using Equations 4 and 6 (see, e.g., the R_fp and R_fn columns of Table 2).
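Equations 4 and 6 themselves are not reproduced in this excerpt, but the R_fp and R_fn values reported in Table 2 can be recovered by treating R_fp as the share of positive results that are false and R_fn as the share of negative results that are false, with a null effect counted as a positive only when it reaches significance in a specific direction (probability α/2). That directional convention is our reconstruction rather than the authors' stated definition; taking the power value 1 - β = .152 from Table 2 as given, it reproduces the reported rates:

```python
alpha, pi1, power = 0.05, 0.5, 0.152   # Table 2 scenario at n_s = 8

# Outcome probabilities. A null effect counts as a "positive" only when it is
# significant in a given direction (probability alpha/2) -- a reconstruction
# chosen because it matches the rates reported in Table 2.
pr_tp = pi1 * power
pr_fn = pi1 * (1.0 - power)
pr_fp = (1.0 - pi1) * alpha / 2.0
pr_tn = (1.0 - pi1) * (1.0 - alpha / 2.0)

r_fp = pr_fp / (pr_fp + pr_tp)   # share of positive results that are false
r_fn = pr_fn / (pr_fn + pr_tn)   # share of negative results that are false

print(round(r_fp, 3), round(r_fn, 3))  # 0.141 and 0.465, as in Table 2
```

Under the same bookkeeping, the expected total payoff at n_s = 8 comes out near the Table 2 value as well (62.5 studies times a per-study expectation of about .064 gives roughly 3.97).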
Thus, the expected total payoff within each scenario can be plotted against these rates, and it is possible to see which rate is associated with the greatest payoff within each scenario. Figures 6 and 7 show expected research payoffs, again computed with the simplistic individual outcome payoffs, plotted as functions of the rates of FPs and FNs, respectively. The most dramatic implication of these figures is that the optimal rates of FPs and FNs can be quite different depending on the base rate of the true effects. For example, Figures 6A, 6B, and 6C show that researchers conducting studies in an area with a low base rate of true effects (i.e., π_1 = .1) can maximize their expected total payoff by choosing sample sizes that will yield as many as approximately 15% to 40% FPs, depending on their α levels. In contrast, the optimal rate of FPs is far lower when effects are more common (i.e., π_1 = .5 or .9; Figs. 6D-6I), and it can even drop below 1% in some cases. Similarly, Figure 7 shows that the optimal rates of FNs also vary tremendously across research scenarios. FN rates below 5% are optimal when true effects are rare (i.e., π_1 = .1), but the optimal FN rate can exceed 80% when true effects are common (i.e., π_1 = .9). The bottom line, then, is that although the optimal FP and FN rates are determined by a complex interplay of the characteristics of the research scenario (i.e., base rate, individual outcome payoffs, etc.), these optimal rates can still be identified using the model.

Different individual outcome payoff values

As was already mentioned, the expected total payoffs depicted in Figures 4 through 7 were computed using the

[Fig. 5. Expected research payoff as a function of α level and study power for the same nine scenarios and payoff computations depicted in Figure 4.]

simplistic individual outcome payoffs P_tp = 1, P_fp = -1, P_tn = 0, and P_fn = 0 as an example. The earlier conclusions about optimal sample sizes, α levels, power, et cetera are thus specific to those simplistic payoffs. To gain some insight into how those conclusions might differ with other individual payoffs, we briefly consider in this section what would be optimal with a very different set of payoffs. As is considered further in the General Discussion, identifying appropriate individual outcome payoffs is an extremely difficult problem, especially since the values seem likely to vary widely across research areas. Here, we recalculated total payoffs using a set of individual outcome payoffs that were sensitive to negative findings as well as positive ones: P_tp = 4, P_fp = -8, P_tn = 1, and P_fn = -2. These negative-sensitive payoffs were selected based on the rationale that, in practice, negative findings should probably also be given some weight, because any given research program is also likely to derive both benefits from true negative findings and costs from FNs (e.g., Mudge et al., 2012). Indeed, Fiedler et al. (2012) even argued that, compared to FPs, false negatives often constitute "a more serious problem" (p. 661). Specifically, we chose these example negative-sensitive values to reflect

scenarios in which (a) positive results are approximately four times as important as negative ones, and (b) the cost of an incorrect decision is approximately twice as large as the benefit of a correct decision.

[Fig. 6. Expected research payoff as a function of α level and the rate of false positives (R_fp) for the same nine scenarios and payoff computations depicted in Figure 4. Note that the scales of the horizontal axes have been adjusted to reflect the different rates of false positives in the different scenarios.]

Figures 8 and 9 show results obtained with these negative-sensitive individual outcome payoffs, and there are clearly major differences between these results and those obtained with the simplistic payoffs (Figs. 4 and 5). One rather trivial difference is that when the base rate of true effects is low (i.e., π_1 = .1), total payoffs are much higher with the negative-sensitive individual outcome payoffs than with the simplistic ones. This is simply because the negative-sensitive payoffs reward correct negative decisions whereas the simplistic payoffs do not.
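The impact of switching payoff vectors can be seen directly by plugging both sets into the same per-study expectation. The sketch below uses the Table 2 scenario (power = .152 at n_s = 8) and a directional false-positive convention that is our own reconstruction, chosen because it reproduces the Table 2 figures; it shows that the sample size that is optimal under the simplistic payoffs yields a negative expected total under the negative-sensitive payoffs, illustrating how strongly the individual payoffs shift the optimum.

```python
alpha, pi1, power = 0.05, 0.5, 0.152   # Table 2 scenario
n_max, n_s = 1000, 8
n_studies = n_max / (2 * n_s)          # 62.5 studies of 2*n_s participants each

# Outcome probabilities (directional false positives, probability alpha/2,
# a reconstruction chosen to match the rates reported in Table 2):
pr = {"tp": pi1 * power,
      "fp": (1 - pi1) * alpha / 2,
      "tn": (1 - pi1) * (1 - alpha / 2),
      "fn": pi1 * (1 - power)}

def total_payoff(payoffs):
    """Expected total payoff: number of studies times per-study expectation."""
    return n_studies * sum(pr[k] * payoffs[k] for k in pr)

simplistic = {"tp": 1, "fp": -1, "tn": 0, "fn": 0}
negative_sensitive = {"tp": 4, "fp": -8, "tn": 1, "fn": -2}

print(round(total_payoff(simplistic), 2))          # 3.97, near Table 2's 3.974
print(round(total_payoff(negative_sensitive), 2))  # negative: low power is punished
```

Under the negative-sensitive payoffs the heavy penalties on FNs make these low-power studies a net loss, so a larger sample size (higher power, fewer studies) becomes preferable, consistent with the comparisons discussed below.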

[Fig. 7. Expected research payoff as a function of α level and the rate of false negatives (R_fn) for the same nine scenarios and payoff computations depicted in Figure 4. Note that the scales of the horizontal axes have been adjusted to reflect the different rates of false negatives in the different scenarios.]

The more important consequence of the change in payoffs is the change in the optimal values of sample size and power. For example, when true effects are rare (i.e., π_1 = .1), relatively high power is optimal for the simplistic individual outcome payoffs (Figs. 5A-5C), whereas comparatively low power is optimal for the negative-sensitive individual outcome payoffs (Figs. 9A-9C). The reverse happens when true effects are common (i.e., π_1 = .9); in these situations, relatively low power is optimal for the simplistic individual outcome payoffs (Figs. 5G-5I), whereas high power is optimal for the negative-sensitive payoffs (Figs. 9G-9I). Naturally, similar qualitative differences due to the individual outcome payoffs can be seen when comparing the optimal FP

[Fig. 8. Expected research payoff as a function of sample size (n_s) and statistical significance level (α) for the same nine scenarios and payoff computations depicted in Figure 4. Total payoffs were computed from Equation 7 using the negative-sensitive individual outcome payoffs of P_tp = 4, P_fp = -8, P_tn = 1, and P_fn = -2.]

rates or the optimal FN rates (not shown). Clearly, the dramatic effects of the individual payoffs on the optimal values of sample size, power, etc., indicate that the exact payoffs must be considered explicitly in order to justify any recommendations concerning these values.

Extensions

The proposed model is only a first approximation to the measurement of research payoff, but its basic principles are general enough to be extended well beyond the limited set of research scenarios examined so far. For example, to this point, we have considered only studies analyzed with the two-sample t test for a difference between two means, but the model can also be easily extended to other statistical tests (see Computational Details in the online Supplemental Material), and the results are quite similar. Thus, we suspect that conclusions about optimal research strategies may depend little on the particular statistical test under consideration. Another way to generalize the model presented here is to consider more complex research scenarios. For

[Fig. 9. Expected research payoff as a function of α level and study power for the same nine scenarios depicted in Figure 4. Total payoffs were computed from Equation 7 using the negative-sensitive individual outcome payoffs of P_tp = 4, P_fp = -8, P_tn = 1, and P_fn = -2.]

example, we have considered only scenarios in which there is either no effect or an effect of a fixed size, but it is easy to handle scenarios in which effects of various different sizes might be present. As an example, Varying Effect Sizes in the online Supplemental Material shows how the model can be extended to a scenario in which effects of different sizes could be present and in which the payoffs for TPs and FNs depend on the effect sizes. The extended model can be used to compute the expected total payoff as a function of α level and sample size for such scenarios, and it can thus again help to identify the optimal values. So far, our illustrative calculations have only addressed situations in which the research scenario is the same for all studies. Real research is more complicated than this, especially because the outcome of one study often informs the selection of the next study, but the model can also be extended to such situations. For one thing, the individual outcome payoff values associated with a particular individual study should perhaps not only reflect the possible impact of that study per se but also reflect the contribution of that study to its wider literature. 5 In a contemplated drug study, for example, the negative weight assigned to an FP might reflect not only the costs


More information

Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews

Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews J Nurs Sci Vol.28 No.4 Oct - Dec 2010 Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews Jeanne Grace Corresponding author: J Grace E-mail: Jeanne_Grace@urmc.rochester.edu

More information

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES Sawtooth Software RESEARCH PAPER SERIES The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? Dick Wittink, Yale University Joel Huber, Duke University Peter Zandan,

More information

Discussion Meeting for MCP-Mod Qualification Opinion Request. Novartis 10 July 2013 EMA, London, UK

Discussion Meeting for MCP-Mod Qualification Opinion Request. Novartis 10 July 2013 EMA, London, UK Discussion Meeting for MCP-Mod Qualification Opinion Request Novartis 10 July 2013 EMA, London, UK Attendees Face to face: Dr. Frank Bretz Global Statistical Methodology Head, Novartis Dr. Björn Bornkamp

More information

Goodness of Pattern and Pattern Uncertainty 1

Goodness of Pattern and Pattern Uncertainty 1 J'OURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR 2, 446-452 (1963) Goodness of Pattern and Pattern Uncertainty 1 A visual configuration, or pattern, has qualities over and above those which can be specified

More information

To conclude, a theory of error must be a theory of the interaction between human performance variability and the situational constraints.

To conclude, a theory of error must be a theory of the interaction between human performance variability and the situational constraints. The organisers have provided us with a both stimulating and irritating list of questions relating to the topic of the conference: Human Error. My first intention was to try to answer the questions one

More information

Appendix B Statistical Methods

Appendix B Statistical Methods Appendix B Statistical Methods Figure B. Graphing data. (a) The raw data are tallied into a frequency distribution. (b) The same data are portrayed in a bar graph called a histogram. (c) A frequency polygon

More information

Research Practices that can Prevent an Inflation of False-positive Rates. Kou Murayama University of California, Los Angeles

Research Practices that can Prevent an Inflation of False-positive Rates. Kou Murayama University of California, Los Angeles Running head: PREVENTION OF FALSE-POSITIVES Research Practices that can Prevent an Inflation of False-positive Rates Kou Murayama University of California, Los Angeles Reinhard Pekrun University of Munich

More information

An Experiment to Evaluate Bayesian Learning of Nash Equilibrium Play

An Experiment to Evaluate Bayesian Learning of Nash Equilibrium Play . An Experiment to Evaluate Bayesian Learning of Nash Equilibrium Play James C. Cox 1, Jason Shachat 2, and Mark Walker 1 1. Department of Economics, University of Arizona 2. Department of Economics, University

More information

Where does "analysis" enter the experimental process?

Where does analysis enter the experimental process? Lecture Topic : ntroduction to the Principles of Experimental Design Experiment: An exercise designed to determine the effects of one or more variables (treatments) on one or more characteristics (response

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information

Lecture 2: Learning and Equilibrium Extensive-Form Games

Lecture 2: Learning and Equilibrium Extensive-Form Games Lecture 2: Learning and Equilibrium Extensive-Form Games III. Nash Equilibrium in Extensive Form Games IV. Self-Confirming Equilibrium and Passive Learning V. Learning Off-path Play D. Fudenberg Marshall

More information

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015 Analysing and Understanding Learning Assessment for Evidence-based Policy Making Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015 Australian Council for Educational Research Structure

More information

Supporting Information

Supporting Information 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 Supporting Information Variances and biases of absolute distributions were larger in the 2-line

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Chapter 5: Field experimental designs in agriculture

Chapter 5: Field experimental designs in agriculture Chapter 5: Field experimental designs in agriculture Jose Crossa Biometrics and Statistics Unit Crop Research Informatics Lab (CRIL) CIMMYT. Int. Apdo. Postal 6-641, 06600 Mexico, DF, Mexico Introduction

More information

European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)

European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) Page 1 of 14 European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) COMMENTS ON DRAFT FDA Guidance for Industry - Non-Inferiority Clinical Trials Rapporteur: Bernhard Huitfeldt (bernhard.huitfeldt@astrazeneca.com)

More information

CHAPTER ONE CORRELATION

CHAPTER ONE CORRELATION CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to

More information

INADEQUACIES OF SIGNIFICANCE TESTS IN

INADEQUACIES OF SIGNIFICANCE TESTS IN INADEQUACIES OF SIGNIFICANCE TESTS IN EDUCATIONAL RESEARCH M. S. Lalithamma Masoomeh Khosravi Tests of statistical significance are a common tool of quantitative research. The goal of these tests is to

More information

Formulating and Evaluating Interaction Effects

Formulating and Evaluating Interaction Effects Formulating and Evaluating Interaction Effects Floryt van Wesel, Irene Klugkist, Herbert Hoijtink Authors note Floryt van Wesel, Irene Klugkist and Herbert Hoijtink, Department of Methodology and Statistics,

More information

RISK AS AN EXPLANATORY FACTOR FOR RESEARCHERS INFERENTIAL INTERPRETATIONS

RISK AS AN EXPLANATORY FACTOR FOR RESEARCHERS INFERENTIAL INTERPRETATIONS 13th International Congress on Mathematical Education Hamburg, 24-31 July 2016 RISK AS AN EXPLANATORY FACTOR FOR RESEARCHERS INFERENTIAL INTERPRETATIONS Rink Hoekstra University of Groningen Logical reasoning

More information

Computerized Mastery Testing

Computerized Mastery Testing Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating

More information

Introduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015

Introduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015 Introduction to diagnostic accuracy meta-analysis Yemisi Takwoingi October 2015 Learning objectives To appreciate the concept underlying DTA meta-analytic approaches To know the Moses-Littenberg SROC method

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

2012 Course: The Statistician Brain: the Bayesian Revolution in Cognitive Sciences

2012 Course: The Statistician Brain: the Bayesian Revolution in Cognitive Sciences 2012 Course: The Statistician Brain: the Bayesian Revolution in Cognitive Sciences Stanislas Dehaene Chair of Experimental Cognitive Psychology Lecture n 5 Bayesian Decision-Making Lecture material translated

More information

Chapter 7: Descriptive Statistics

Chapter 7: Descriptive Statistics Chapter Overview Chapter 7 provides an introduction to basic strategies for describing groups statistically. Statistical concepts around normal distributions are discussed. The statistical procedures of

More information

Agenetic disorder serious, perhaps fatal without

Agenetic disorder serious, perhaps fatal without ACADEMIA AND CLINIC The First Positive: Computing Positive Predictive Value at the Extremes James E. Smith, PhD; Robert L. Winkler, PhD; and Dennis G. Fryback, PhD Computing the positive predictive value

More information

Theory and Methods Question Bank

Theory and Methods Question Bank Theory and Methods Question Bank Theory and Methods is examined in both the AS and the A Level. The AS questions focus mostly on research methods and at A Level include sociological debates, perspectives

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Exploration and Exploitation in Reinforcement Learning

Exploration and Exploitation in Reinforcement Learning Exploration and Exploitation in Reinforcement Learning Melanie Coggan Research supervised by Prof. Doina Precup CRA-W DMP Project at McGill University (2004) 1/18 Introduction A common problem in reinforcement

More information

o^ &&cvi AL Perceptual and Motor Skills, 1965, 20, Southern Universities Press 1965

o^ &&cvi AL Perceptual and Motor Skills, 1965, 20, Southern Universities Press 1965 Ml 3 Hi o^ &&cvi AL 44755 Perceptual and Motor Skills, 1965, 20, 311-316. Southern Universities Press 1965 m CONFIDENCE RATINGS AND LEVEL OF PERFORMANCE ON A JUDGMENTAL TASK 1 RAYMOND S. NICKERSON AND

More information

Hierarchy of Statistical Goals

Hierarchy of Statistical Goals Hierarchy of Statistical Goals Ideal goal of scientific study: Deterministic results Determine the exact value of a ment or population parameter Prediction: What will the value of a future observation

More information

Statistical Methods and Reasoning for the Clinical Sciences

Statistical Methods and Reasoning for the Clinical Sciences Statistical Methods and Reasoning for the Clinical Sciences Evidence-Based Practice Eiki B. Satake, PhD Contents Preface Introduction to Evidence-Based Statistics: Philosophical Foundation and Preliminaries

More information

Power Posing: P-Curving the Evidence

Power Posing: P-Curving the Evidence 658563PSSXXX10.1177/0956797616658563Simmons, SimonsohnP-Curving the Power-Posing Literature research-article2017 Commentary Power Posing: P-Curving the Evidence Joseph P. Simmons and Uri Simonsohn University

More information

Reliability, validity, and all that jazz

Reliability, validity, and all that jazz Reliability, validity, and all that jazz Dylan Wiliam King s College London Introduction No measuring instrument is perfect. The most obvious problems relate to reliability. If we use a thermometer to

More information

Answers to end of chapter questions

Answers to end of chapter questions Answers to end of chapter questions Chapter 1 What are the three most important characteristics of QCA as a method of data analysis? QCA is (1) systematic, (2) flexible, and (3) it reduces data. What are

More information

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Number XX An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Prepared for: Agency for Healthcare Research and Quality U.S. Department of Health and Human Services 54 Gaither

More information

Why most of psychology is statistically unfalsifiable. Richard D. Morey Cardiff University. Daniël Lakens Eindhoven University of Technology DRAFT

Why most of psychology is statistically unfalsifiable. Richard D. Morey Cardiff University. Daniël Lakens Eindhoven University of Technology DRAFT Why most of psychology is statistically unfalsifiable Richard D. Morey Cardiff University Daniël Lakens Eindhoven University of Technology Abstract Low power in experimental psychology is an oft-discussed

More information

Bayesian Analysis by Simulation

Bayesian Analysis by Simulation 408 Resampling: The New Statistics CHAPTER 25 Bayesian Analysis by Simulation Simple Decision Problems Fundamental Problems In Statistical Practice Problems Based On Normal And Other Distributions Conclusion

More information

PLANNING THE RESEARCH PROJECT

PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm page 1 Part I PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm

More information

The recommended method for diagnosing sleep

The recommended method for diagnosing sleep reviews Measuring Agreement Between Diagnostic Devices* W. Ward Flemons, MD; and Michael R. Littner, MD, FCCP There is growing interest in using portable monitoring for investigating patients with suspected

More information

Issues Surrounding the Normalization and Standardisation of Skin Conductance Responses (SCRs).

Issues Surrounding the Normalization and Standardisation of Skin Conductance Responses (SCRs). Issues Surrounding the Normalization and Standardisation of Skin Conductance Responses (SCRs). Jason J. Braithwaite {Behavioural Brain Sciences Centre, School of Psychology, University of Birmingham, UK}

More information

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned?

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? BARRY MARKOVSKY University of South Carolina KIMMO ERIKSSON Mälardalen University We appreciate the opportunity to comment

More information

Behavioral Data Mining. Lecture 4 Measurement

Behavioral Data Mining. Lecture 4 Measurement Behavioral Data Mining Lecture 4 Measurement Outline Hypothesis testing Parametric statistical tests Non-parametric tests Precision-Recall plots ROC plots Hardware update Icluster machines are ready for

More information

USE AND MISUSE OF MIXED MODEL ANALYSIS VARIANCE IN ECOLOGICAL STUDIES1

USE AND MISUSE OF MIXED MODEL ANALYSIS VARIANCE IN ECOLOGICAL STUDIES1 Ecology, 75(3), 1994, pp. 717-722 c) 1994 by the Ecological Society of America USE AND MISUSE OF MIXED MODEL ANALYSIS VARIANCE IN ECOLOGICAL STUDIES1 OF CYNTHIA C. BENNINGTON Department of Biology, West

More information

Choose an approach for your research problem

Choose an approach for your research problem Choose an approach for your research problem This course is about doing empirical research with experiments, so your general approach to research has already been chosen by your professor. It s important

More information

Persistence in the WFC3 IR Detector: Intrinsic Variability

Persistence in the WFC3 IR Detector: Intrinsic Variability Persistence in the WFC3 IR Detector: Intrinsic Variability Knox S. Long, & Sylvia M. Baggett March 29, 2018 ABSTRACT When the WFC3 IR detector is exposed to a bright source or sources, the sources can

More information

The Pretest! Pretest! Pretest! Assignment (Example 2)

The Pretest! Pretest! Pretest! Assignment (Example 2) The Pretest! Pretest! Pretest! Assignment (Example 2) May 19, 2003 1 Statement of Purpose and Description of Pretest Procedure When one designs a Math 10 exam one hopes to measure whether a student s ability

More information

10 Intraclass Correlations under the Mixed Factorial Design

10 Intraclass Correlations under the Mixed Factorial Design CHAPTER 1 Intraclass Correlations under the Mixed Factorial Design OBJECTIVE This chapter aims at presenting methods for analyzing intraclass correlation coefficients for reliability studies based on a

More information

baseline comparisons in RCTs

baseline comparisons in RCTs Stefan L. K. Gruijters Maastricht University Introduction Checks on baseline differences in randomized controlled trials (RCTs) are often done using nullhypothesis significance tests (NHSTs). In a quick

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

A Brief (very brief) Overview of Biostatistics. Jody Kreiman, PhD Bureau of Glottal Affairs

A Brief (very brief) Overview of Biostatistics. Jody Kreiman, PhD Bureau of Glottal Affairs A Brief (very brief) Overview of Biostatistics Jody Kreiman, PhD Bureau of Glottal Affairs What We ll Cover Fundamentals of measurement Parametric versus nonparametric tests Descriptive versus inferential

More information

Using the Patient Reported Outcome Measures Tool (PROMT)

Using the Patient Reported Outcome Measures Tool (PROMT) Using the Patient Reported Outcome Measures Tool (PROMT) Using the Patient Reported Outcome Measures Tool DH INFORMATION READER BOX Policy HR / Workforce Management Planning / Clinical Document Purpose

More information

Inference About Magnitudes of Effects

Inference About Magnitudes of Effects invited commentary International Journal of Sports Physiology and Performance, 2008, 3, 547-557 2008 Human Kinetics, Inc. Inference About Magnitudes of Effects Richard J. Barker and Matthew R. Schofield

More information

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS Chapter Objectives: Understand Null Hypothesis Significance Testing (NHST) Understand statistical significance and

More information

Retrospective power analysis using external information 1. Andrew Gelman and John Carlin May 2011

Retrospective power analysis using external information 1. Andrew Gelman and John Carlin May 2011 Retrospective power analysis using external information 1 Andrew Gelman and John Carlin 2 11 May 2011 Power is important in choosing between alternative methods of analyzing data and in deciding on an

More information

Simple Sensitivity Analyses for Matched Samples Thomas E. Love, Ph.D. ASA Course Atlanta Georgia https://goo.

Simple Sensitivity Analyses for Matched Samples Thomas E. Love, Ph.D. ASA Course Atlanta Georgia https://goo. Goal of a Formal Sensitivity Analysis To replace a general qualitative statement that applies in all observational studies the association we observe between treatment and outcome does not imply causation

More information