A Bayesian approach to sample size determination for studies designed to evaluate continuous medical tests


Baylor Health Care System
From the SelectedWorks of Dunlei Cheng

A Bayesian approach to sample size determination for studies designed to evaluate continuous medical tests

Dunlei Cheng, Baylor Health Care System
Adam J. Branscum, University of Kentucky
James Stamey, Baylor University

Available at:

A Bayesian approach to sample size determination for studies designed to evaluate continuous medical tests

DUNLEI CHENG 1, ADAM J. BRANSCUM 2, and JAMES D. STAMEY 3

1 Institute for Health Care Research and Improvement, Baylor Health Care System, Dallas, TX 75206, USA
2 Departments of Biostatistics, Statistics, and Epidemiology, University of Kentucky, Lexington, KY 40536, USA
3 Department of Statistical Science, Baylor University, Waco, TX 76798, USA

Correspondence to: Dunlei Cheng, Institute for Health Care Research and Improvement, Baylor Health Care System, 8080 N. Central Expressway, Suite 500, Dallas, TX 75206, dunleic@baylorhealth.edu

Abstract

We develop a Bayesian approach to sample size and power calculations for cross-sectional studies that are designed to evaluate and compare continuous medical tests. For studies that involve one test or two conditionally independent or dependent tests, we present methods that are applicable when the true disease status of sampled individuals will be available and when it will not. Within a hypothesis testing framework, we consider the goal of demonstrating that a medical test has area under the receiver operating characteristic (ROC) curve that exceeds a minimum acceptable level or another relevant threshold, and the goals of establishing the superiority or equivalence of one test relative to another. A Bayesian average power criterion is used to determine a sample size that will yield high posterior probability, on average, of a future study correctly deciding in favor of these goals. The impacts on Bayesian average power of prior distributions, the proportion of diseased subjects in the study, and correlation among tests are investigated through simulation. The computational algorithm we develop involves simulating multiple data sets that are fit with Bayesian models using Gibbs sampling, and is executed by using WinBUGS in tandem with R.

Key Words: Diagnostic test, ROC curve, power calculations, simulation

1. Introduction

Medical tests are used to accurately classify individuals into one of several groups. In the two-group classification problem that we consider here, one or two tests are used to distinguish between two groups of individuals, which for ease of discussion we will refer to as a diseased (D) group and a non-diseased (ND) group. One phase in the development of a new medical test involves characterizing the test's ability to accurately discern D from ND individuals in the target population. The accuracy of a continuous test can be quantified by first defining a cutoff threshold, c, for a positive test, and then estimating the sensitivity, η(c), and specificity, θ(c), of the test at that cutoff. The parameter η(c) denotes the probability of a diseased individual having a positive test result at cutoff c, and θ(c) is the probability of a non-diseased individual having a

negative result. Without loss of generality we adopt the usual convention that test scores (y) are expected to be larger for the D group, so that η(c) = Pr(y > c | D) and θ(c) = Pr(y < c | ND).

Instead of focusing inference on a single cutoff value, an alternative approach to evaluating the accuracy of continuous tests, one that avoids the loss of information that comes from dichotomization, involves estimating the receiver operating characteristic (ROC) curve. The ROC curve is the plot of a test's true positive fraction (sensitivity) versus its false positive fraction (1 − specificity) across all possible cutoff thresholds. Thus, the ROC curve is obtained by plotting the pairs (1 − θ(c), η(c)) for all values of c. The area under the ROC curve (AUC) is a summary index that measures the overall accuracy of a test, reflecting with equal weight the test's ability to distinguish between subjects with and without a medical condition. The value of AUC typically ranges from 0.5 (for a useless diagnostic procedure that classifies disease status in a purely random fashion) to 1 (for tests that have perfect classification accuracy).
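To make the ROC construction described above concrete, the following short R sketch sweeps a grid of cutoffs c and plots the pairs (1 − θ(c), η(c)); the two normal score distributions used here are arbitrary illustrative choices rather than values taken from this paper.

# Trace an ROC curve by sweeping the cutoff c and plotting (1 - theta(c), eta(c)).
# The two normal score distributions below are purely illustrative.
cutoffs <- seq(-4, 7, length.out = 200)
eta   <- 1 - pnorm(cutoffs, mean = 2, sd = 1.4)  # sensitivity: Pr(y > c | diseased)
theta <- pnorm(cutoffs, mean = 0, sd = 1)        # specificity: Pr(y < c | non-diseased)
plot(1 - theta, eta, type = "l",
     xlab = "1 - specificity (false positive fraction)",
     ylab = "sensitivity (true positive fraction)")
abline(0, 1, lty = 2)  # diagonal reference line corresponding to AUC = 0.5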

In this paper, we treat AUC as the focal parameter for evaluating and comparing continuous medical tests when true disease status is known and when it is not, and we develop a simulation-based procedure for sample size estimation and power calculations in these contexts. We emphasize at the outset that although our focus is on the use of medical tests to classify health status, and our notation and terminology are consistent with biomedical applications, the methods presented in this paper apply more broadly. For instance, the methods we develop here can aid in sample size selection for investigating any general continuous classification procedure.

The remainder of the paper is organized as follows. Common goals of test accuracy studies and some background on ROC analysis are outlined in Section 2. Section 3 details the Bayesian models that we use in our sample size determination procedure. In Section 4 we discuss the Bayesian average power criterion used in our computational algorithm. Results from simulations are presented in Section 5, and concluding remarks are given in Section 6.

2. Goals and Background

In designing a study that will measure and/or compare test performance, an appropriate sample size that will ensure adequate statistical power without overextending limited resources is needed. We consider study designs that involve either a single medical test, or two conditionally independent or correlated tests. The possible goals of test accuracy studies are numerous. We focus on three common goals and note that many other cases can be handled with slight modifications of the ideas and methods presented here. We assume that a new and/or a standard test are under investigation, but in general the study could involve any tests. The goals include establishing that a continuous test has at least a certain desired level of accuracy, establishing that one test has superior accuracy over another test, and establishing that two tests are (practically) equivalent in terms of accuracy. Specifically, we consider power calculations to (i) verify that the AUC of a newly developed continuous medical test exceeds some threshold, (ii) verify that the AUC of a newly developed medical test is greater than that of a standard test, and (iii) verify that the AUC of a new diagnostic procedure is equivalent to that of a standard classifier. These three objectives were also the focus of

Branscum, Johnson, and Gardner (2007), who determined required sample sizes for estimating the sensitivity and specificity of binary tests. We do not discuss ordinal tests here; we refer the reader to Wang and Gatsonis (2008) for a Bayesian treatment of multi-test, multi-reader ordinal ROC analysis, including methods for sample size determination in that context.

We may represent the above three goals as hypothesis tests regarding AUC. Let the subscripts N and S denote the new and standard tests, and let AUC₀, λ, and ε represent pre-determined positive constants. In case (i), we formulate the hypothesis

H: AUC_N > AUC₀.

The hypotheses of test superiority (case ii) and equivalence (case iii) are written as

H: AUC_N − AUC_S > λ and H: |AUC_N − AUC_S| < ε,

respectively.

With respect to sample size determination, we assume that each proposed hypothesis is true and that a future study is being designed to test the hypothesis. A sample size is selected that ensures, on average, that the posterior probability of the hypothesis H is high when in fact H is true. In addition to being able to test in a single framework many different types of hypotheses that reflect many different study goals, we also build into our all-purpose sample size determination procedure the ability to accommodate a key complicating issue in medical test evaluation, namely handling data from sampled individuals whose true disease status is unknown. Information on true disease status will be missing when a

perfect reference test (also called a gold-standard test) does not exist or cannot be applied without unacceptable consequences. The methods developed here for sample size calculations in the gold-standard (GS) setting are a special case of those for the non-gold-standard (NGS) setting. Much of the literature on ROC analysis assumes that the true disease status is known for each subject; however, research on the development of NGS ROC analysis has recently increased (some examples include Branscum et al., 2008; Albert, 2007; Choi et al., 2006; Erkanli et al., 2006; Zhou, Castelluccio, and Zhou, 2005). Often, imperfect tests are used when GS tests are either too expensive or invasive, or do not exist with (near) perfect accuracy.

Obuchowski (1998) reviewed methods for sample size calculations for ROC curves and functionals of them, including procedures for a single diagnostic test, for comparing two diagnostic tests, and for multi-reader ROC analysis. Obuchowski and McClish (1997) studied sample size requirements when the ROC curve is only considered over a specific range of false positive values (partial AUC) rather than the full area under the ROC curve. Sample size estimation for clustered ROC curves was addressed in Obuchowski (1997), whose method incorporated a cluster design effect. All three of these papers assumed that a GS test was available to ascertain the true medical condition of each subject.

The present study addresses the issue of sample size and power estimation for ROC analysis via the Bayesian paradigm. A primary advantage of the Bayesian approach is the allowance for uncertainty in parameter values in the planning stages of the

experiment, as opposed to the use of plug-in values. The particular criterion we use is referred to by Wang and Gelfand (2002) as Bayesian power.

3. Bayesian Models

3.1. One Gold-Standard Test

Let TS_i^D (i = 1, …, n₁) and TS_j^ND (j = 1, …, n₂) denote scores of a new test obtained from a random sample of n (n = n₁ + n₂) individuals who have a disease or are disease-free, respectively. We suppose that a GS test has been used to identify each individual's true disease status before the application of the new test. We further assume that TS_i^D and TS_j^ND are both normally distributed or could be modeled with normal families after an appropriate transformation is applied, namely

TS_i^D ~ N(μ_D, σ²_D), i = 1, …, n₁,
TS_j^ND ~ N(μ_ND, σ²_ND), j = 1, …, n₂, (1)

where μ_D and μ_ND denote the means, and σ²_D and σ²_ND denote the variances, of the distributions of the measurements from the diseased and non-diseased populations, respectively. The AUC for a diagnostic test under this two-group normal model is given by

AUC = Φ( (μ_D − μ_ND) / √(σ²_D + σ²_ND) ), (2)

where Φ(·) is the c.d.f. of the standard normal distribution.
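As a quick numerical illustration of equation (2), the R sketch below evaluates the binormal AUC at a single set of planning values and then propagates uncertainty by drawing the parameters from uniform distributions similar to the sampling priors used later in Section 5.1; the specific numbers are illustrative only.

# Binormal AUC of equation (2)
binormal_auc <- function(mu_d, mu_nd, var_d, var_nd) {
  pnorm((mu_d - mu_nd) / sqrt(var_d + var_nd))
}

# A single plug-in planning value: mu_D = 3, mu_ND = 0, var_D = 2, var_ND = 1
binormal_auc(3, 0, 2, 1)  # approximately 0.958

# Propagating sampling-prior uncertainty instead of using one plug-in value
set.seed(1)
auc_draws <- binormal_auc(runif(5000, 2.5, 3.5), runif(5000, -0.5, 0.5),
                          runif(5000, 1.8, 2.2), runif(5000, 0.8, 1.2))
quantile(auc_draws, c(0.025, 0.5, 0.975))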

The completion of the Bayesian model requires a prior distribution for (μ_D, μ_ND, σ²_D, σ²_ND). We assume prior independence of the component parameters. We employ a Bayesian sample size determination method as described in Wang and Gelfand (2002). Their simulation-based approach requires the user to construct two sets of distributions. One set, called sampling or design priors, is used to simulate parameter values that are then used to generate multiple data sets, and the other set, called fitting or analysis priors, is used as priors for data analysis on the simulated data sets in the conventional way. By using sampling priors, the method accounts for uncertainty about the true values of parameters in the model, in contrast to using fixed planning estimates as is commonly done in frequentist sample size determination. Sampling priors contain substantive information, which can be extracted from historical data, based on expert experience and knowledge, or constructed using a combination of the two. One approach that has been previously used in practice places uniform distributions with relatively narrow intervals around the elicited planning estimate (Wang and Gelfand, 2002; Cheng, Stamey, and Branscum, 2009). For instance, suppose a value for μ_D of 2.5 is elicited. In the present Bayesian framework, uncertainty could be accounted for by generating μ_D from, say, a uniform(1.5, 3.5) distribution, or a normal distribution with mean 2.5 and an appropriate standard deviation. The fitting priors for μ_D and μ_ND in the data analysis portion of the sample size procedure are generally relatively flat normal distributions centered at zero. Additional details about these two sets of distributions are provided in Section 5.

3.2. One Non-Gold-Standard Test

When it is difficult or impossible to have a definitive diagnosis for tested individuals, the model in Section 3.1 needs to be modified to account for unknown disease status. We introduce a latent disease indicator variable Z_k, k = 1, …, n, where Z_k = 1 if the kth sampled individual is diseased and Z_k = 0 otherwise, and where n is the total number of subjects enrolled into the study. If individuals are sampled randomly from a large population that has disease prevalence π, then

Z_k ~ Bernoulli(π), k = 1, …, n. (3)

The data are modeled according to the mixture

TS_k ~ π f(· | μ_D, σ²_D) + (1 − π) g(· | μ_ND, σ²_ND), (4)

where f(· | μ_D, σ²_D) and g(· | μ_ND, σ²_ND) are the p.d.f.s of the N(μ_D, σ²_D) and N(μ_ND, σ²_ND) distributions, respectively. The procedure used to construct priors for μ_D, μ_ND, σ²_D, and σ²_ND is analogous to that previously described for the GS case. In addition, it is necessary to assign an informative fitting prior to π, and we furthermore incorporate the constraint μ_D > μ_ND to ensure identifiability (Choi et al., 2006). A beta prior distribution is often reasonable for a prevalence parameter (e.g., Johnson, Gastwirth, and Pearson, 2001; Joseph, Gyorkos, and Coupal, 1995, and many others). For the sampling prior, we use a uniform distribution over an elicited range, for instance π ~ uniform(0.3, 0.5).
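A minimal R sketch of simulating one data set under the latent-class mixture in (3) and (4) is given below; the prior ranges echo those mentioned above and in Section 5.1, while the function and object names are ours, not the authors'.

# Simulate one data set for the one-test, no-gold-standard setting of (3)-(4).
# Prior ranges follow Sections 3.2 and 5.1; all names are illustrative.
simulate_ngs_data <- function(n) {
  prev   <- runif(1, 0.3, 0.5)    # sampling prior for the prevalence pi
  mu_d   <- runif(1, 2.5, 3.5)
  mu_nd  <- runif(1, -0.5, 0.5)
  var_d  <- runif(1, 1.8, 2.2)
  var_nd <- runif(1, 0.8, 1.2)
  z  <- rbinom(n, 1, prev)        # latent disease indicators Z_k
  ts <- rnorm(n,
              mean = ifelse(z == 1, mu_d, mu_nd),
              sd   = sqrt(ifelse(z == 1, var_d, var_nd)))
  list(ts = ts, z = z)            # only ts would be visible to the analyst
}

set.seed(2)
str(simulate_ngs_data(100))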

3.3. Two Gold-Standard Tests

In the two-test scenario, both the standard and new tests will be applied to all sampled subjects. In the GS case, the medical condition is known for each subject, with n₁ subjects having the disease and n₂ subjects being disease-free. We assume an independent, two-group bivariate normal model for the future (possibly transformed) data, as in Choi et al. (2006). Here, TS_i^D and TS_j^ND denote vectors that contain scores from the standard and new tests for the ith diseased and jth non-diseased subjects, respectively, which are modeled as

TS_i^D = (TS_{S,i}^D, TS_{N,i}^D)′ ~ N₂(μ^D, Σ_D), i = 1, …, n₁,
TS_j^ND = (TS_{S,j}^ND, TS_{N,j}^ND)′ ~ N₂(μ^ND, Σ_ND), j = 1, …, n₂.

Using obvious notation for the component parameters, the AUCs of the standard and new diagnostic tests are given by

AUC_S = Φ( (μ_{S,D} − μ_{S,ND}) / √(σ²_{S,D} + σ²_{S,ND}) ) and AUC_N = Φ( (μ_{N,D} − μ_{N,ND}) / √(σ²_{N,D} + σ²_{N,ND}) ). (5)

The decision regarding superiority of the new test, or equivalence of test accuracy, is determined by the magnitude of AUC_N − AUC_S. With two GS tests, sampling and fitting priors are assigned to the following parameters: the four means μ_{S,D}, μ_{N,D}, μ_{S,ND}, μ_{N,ND}; the four variances σ²_{S,D}, σ²_{N,D}, σ²_{S,ND}, σ²_{N,ND}; and the two correlations ρ_D and ρ_ND between the standard and new tests in the diseased and non-diseased groups. Since in most situations with dependent tests the new and standard tests are positively correlated, the lower bounds of ρ_D and ρ_ND are assumed to be no smaller than 0. Uniform and beta distributions are used as sampling and fitting priors for ρ_D and ρ_ND in our study.

3.4. Two Non-Gold-Standard Tests

In the two-test, NGS scenario, we again use the latent Bernoulli variables Z_k, k = 1, …, n, with parameter π, just as was done in the one NGS test case. The kth measurement is assumed to follow a mixture of two bivariate normal distributions,

TS_k ~ π N₂(μ^D, Σ_D) + (1 − π) N₂(μ^ND, Σ_ND). (6)

Prior construction for π is done in the same manner as discussed in Section 3.2. In order to avoid problems with identifiability in this mixture model, we add the constraints μ_{S,D} > μ_{S,ND} and μ_{N,D} > μ_{N,ND}, and place an informative prior on π. All the other prior densities are as described in the two GS test case.

4. Bayesian Power Criterion and Simulation Algorithm

We apply the Bayesian power criterion proposed by Wang and Gelfand (2002) within a hypothesis testing framework. For case (i), this criterion selects a combination of n₁ and n₂ (GS test), or n (NGS test), so that the posterior probability, averaged over potential future data sets, that the AUC of the new diagnostic test exceeds some benchmark, AUC₀, is sufficiently high when in fact the AUC is expected to be greater than AUC₀. Specifically, the average power criterion in this setting is

E{Pr(AUC_N > AUC₀ | TS_m^D, TS_m^ND)} ≥ 1 − β,

where TS_m^D and TS_m^ND denote test scores from future study data associated with diseased and non-diseased subjects. Typical values for β are 0.05, 0.10, and 0.20, and the value of AUC₀ is problem-specific, but for accurate tests the values 0.85, 0.9, or 0.95 can be used.

Similar expressions in the two-test situation can be formulated for superiority and equivalence studies:

E{Pr(AUC_N − AUC_S > λ | TS_{N,m}^D, TS_{N,m}^ND, TS_{S,m}^D, TS_{S,m}^ND)} ≥ 1 − β,
E{Pr(|AUC_N − AUC_S| < ε | TS_{N,m}^D, TS_{N,m}^ND, TS_{S,m}^D, TS_{S,m}^ND)} ≥ 1 − β,

where the potential future data sets, (TS_{N,m}^D, TS_{N,m}^ND, TS_{S,m}^D, TS_{S,m}^ND), represent a composition of m new and standard test scores on subjects with or without disease. Choices of λ include, for instance, 0.1, 0.15, and 0.2, and ε is chosen to be a small constant, such as 0.05.

Wang and Gelfand (2002) argue that sampling priors should be informative, whereas fitting priors for data analysis can be less informative or even diffuse. In our study, the sampling priors for all parameters are uniform distributions centered on the most likely value and with a small range. However, identifiability issues prevent the use of diffuse fitting priors for all parameters with NGS data, so π, ρ_D, and ρ_ND are each assigned a beta fitting prior that has the same mean as the corresponding uniform sampling prior, but with slightly larger variance. The fitting and sampling priors for the mean parameters follow the order constraints outlined in Sections 3.1 and 3.3. Choi et al. (2006) handled this by constraining the normal sampling models over a parameter space in which the mean test score for the diseased population is greater than the mean for the non-diseased population. We found that computation using this truncated normal model breaks down for certain data sets in the simulation framework required for our sample size procedure. Therefore, as a practical alternative, we instead mitigated the lack of identifiability by modeling the means for the diseased and non-diseased populations with prior distributions that have non-overlapping 99% intervals. The variance parameters are given somewhat diffuse inverse gamma fitting priors.

The following algorithm can be used to compute Bayesian power for the one GS test setting of case (i). Similar algorithms can be used for cases (ii) and (iii), and for studies that will involve NGS data and/or other goals.

1. Specify AUC₀, β, and G pairs of sample size combinations (n₁, n₂).

2. For l = 1, …, B Monte Carlo iterations, at each sample size combination:

(i) Generate values of μ_D, μ_ND, σ²_D, and σ²_ND from their sampling priors.

(ii) Simulate data TS_{l,i}^D, i = 1, …, n₁, and TS_{l,j}^ND, j = 1, …, n₂, according to formula (1).

(iii) To each simulated data set generated in step (ii), fit the two-group normal model and approximate the posterior distribution of AUC_l as defined in equation (2). A Gibbs sampler can be employed at this step.

(iv) For the lth simulated data set, calculate the posterior probability that AUC_l exceeds AUC₀. This posterior probability is computed as the proportion of Monte Carlo (MC) iterates sampled from the posterior of AUC_l that are greater than AUC₀. An approximation to Pr(AUC_l > AUC₀ | TS_l^D, TS_l^ND) is obtained as

p_l = (1/M) Σ_{t=1}^{M} I(AUC_l^t > AUC₀),

where I(·) denotes the indicator function, M denotes the number of MC iterates, and AUC_l^t denotes the tth MC iterate generated from the posterior of AUC_l.

(v) Calculate the Bayesian average power for each sample size combination, which is obtained as (1/B) Σ_{l=1}^{B} p_l.

3. Fit a curve or surface through the G Bayesian power values and find an adequate sample size combination for the desired power.
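The following is a minimal, self-contained R sketch of this algorithm for the one GS test setting. The authors' implementation calls WinBUGS from R; here the Gibbs sampler for the two-group normal model is coded directly, the fitting and sampling priors mirror Section 5.1, and all function names and default values are our own illustrative choices. Repeating the final call over a grid of (n₁, n₂) combinations and smoothing the resulting power values corresponds to step 3.

# Bayesian average power for one gold-standard test (case i): a bare-bones sketch.
binormal_auc <- function(mu_d, mu_nd, var_d, var_nd)
  pnorm((mu_d - mu_nd) / sqrt(var_d + var_nd))

# Gibbs sampler for one group: y ~ N(mu, sig2), mu ~ N(m0, s0^2), sig2 ~ IG(a, b)
gibbs_group <- function(y, m0, s0, a, b, iters, burn) {
  n <- length(y); mu <- mean(y); sig2 <- var(y)
  keep_mu <- keep_sig2 <- numeric(iters)
  for (t in 1:(burn + iters)) {
    prec <- n / sig2 + 1 / s0^2
    mu   <- rnorm(1, (sum(y) / sig2 + m0 / s0^2) / prec, sqrt(1 / prec))
    sig2 <- 1 / rgamma(1, shape = a + n / 2, rate = b + 0.5 * sum((y - mu)^2))
    if (t > burn) { keep_mu[t - burn] <- mu; keep_sig2[t - burn] <- sig2 }
  }
  list(mu = keep_mu, sig2 = keep_sig2)
}

# Steps 1-2 of the algorithm for a single sample size combination (n1, n2)
bayes_avg_power <- function(n1, n2, auc0 = 0.9, B = 100, iters = 2000, burn = 500) {
  p <- numeric(B)
  for (l in 1:B) {
    # (i) draw parameter values from the sampling priors
    mu_d  <- runif(1, 2.5, 3.5);  mu_nd  <- runif(1, -0.5, 0.5)
    var_d <- runif(1, 1.8, 2.2);  var_nd <- runif(1, 0.8, 1.2)
    # (ii) simulate a future data set from model (1)
    y_d  <- rnorm(n1, mu_d,  sqrt(var_d))
    y_nd <- rnorm(n2, mu_nd, sqrt(var_nd))
    # (iii) fit the two-group normal model under the fitting priors
    fit_d  <- gibbs_group(y_d,  m0 = 3, s0 = 0.58, a = 0.5, b = 0.5, iters, burn)
    fit_nd <- gibbs_group(y_nd, m0 = 0, s0 = 0.58, a = 0.5, b = 0.5, iters, burn)
    auc_post <- binormal_auc(fit_d$mu, fit_nd$mu, fit_d$sig2, fit_nd$sig2)
    # (iv) posterior probability that AUC exceeds the benchmark AUC_0
    p[l] <- mean(auc_post > auc0)
  }
  mean(p)  # (v) Bayesian average power at this sample size combination
}

set.seed(3)
bayes_avg_power(30, 30, B = 50)  # modest B only to keep the example quick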

In the case of one NGS test, steps (i)-(iii) need to be altered. In step (i), the latent variables Z_k need to be generated after obtaining a value of π from its sampling prior; then, based on expression (4), data are simulated in step (ii). The approximation of the posterior distribution of AUC_l requires the fitting prior of π in step (iii).

For studies that involve two GS tests, values of μ_{S,D}, μ_{N,D}, μ_{S,ND}, μ_{N,ND}, σ²_{S,D}, σ²_{N,D}, σ²_{S,ND}, σ²_{N,ND}, ρ_D, and ρ_ND need to be generated in step (i) via sampling priors. In the next step, we simulate scores from the standard test, namely TS_{S,i}^D and TS_{S,j}^ND, using formula (1). Then, values of the new test are simulated conditional on the data from the standard test, i.e., we generate TS_{N,i}^D | TS_{S,i}^D and TS_{N,j}^ND | TS_{S,j}^ND. In step (iii), the joint posterior distribution of both AUCs is obtained from a two-group bivariate normal analysis. In the next step, if the goal of the study is to demonstrate the superiority of the new diagnostic test relative to the standard test, calculate the posterior probability Pr(AUC_{N,l} − AUC_{S,l} > λ | TS_{N,l}^D, TS_{N,l}^ND, TS_{S,l}^D, TS_{S,l}^ND) using

p_l = (1/M) Σ_{t=1}^{M} I(AUC_{N,l}^t − AUC_{S,l}^t > λ).

If the study objective is to demonstrate equivalence, the value

p_l = (1/M) Σ_{t=1}^{M} I(|AUC_{N,l}^t − AUC_{S,l}^t| < ε)

is an MC approximation to Pr(|AUC_{N,l} − AUC_{S,l}| < ε | TS_{N,l}^D, TS_{N,l}^ND, TS_{S,l}^D, TS_{S,l}^ND). In both cases, the average power is obtained as the mean of the p_l's. When the disease condition is not identified in the case of two tests, data are simulated and analyzed after incorporating the Bernoulli random variables Z_k.
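The two-step simulation of paired scores just described (draw the standard-test score first, then the new-test score given the standard one) can be sketched in R as follows; the parameter values loosely echo the diseased-group entries of Table 1, and the function name is ours.

# Simulate paired (standard, new) scores for one group: draw TS_S, then TS_N | TS_S
# from the bivariate normal model. Parameter values are illustrative only.
r_paired_scores <- function(n, mu_s, mu_n, var_s, var_n, rho) {
  ts_s <- rnorm(n, mu_s, sqrt(var_s))
  cond_mean <- mu_n + rho * sqrt(var_n / var_s) * (ts_s - mu_s)
  cond_var  <- var_n * (1 - rho^2)
  ts_n <- rnorm(n, cond_mean, sqrt(cond_var))
  cbind(ts_s = ts_s, ts_n = ts_n)
}

set.seed(4)
head(r_paired_scores(5, mu_s = 1, mu_n = 3, var_s = 2, var_n = 2, rho = 0.5))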

All computations in these algorithms can be carried out using R, and posterior distributions can be approximated using the WinBUGS software. Computer code for some examples is available online.

5. Illustrations

We consider sample size calculations in a variety of scenarios. We first investigate the impact of AUC₀ and λ on the required sample size. In the one-test and two-test settings, we compare Bayesian average power when disease status is available to the case where this information is not ascertained. The simulations also assess the influence of sampling priors on average power. The issue of the fraction of non-diseased versus diseased subjects is also studied. Moreover, we discuss the influence of ρ_D and ρ_ND on sample size when two tests are applied to each subject. These simulations are all performed for case (i) or (ii). In addition, we calculate the required sample size under the proposed Bayesian method and under a standard frequentist method in the one GS test setting.

All simulations in this section use 1000 data sets generated as described in Section 4, with 5000 posterior iterates after a 1000-iteration burn-in used for posterior approximations with each simulated data set. Computational times for the one GS, one NGS, and two GS cases are relatively fast, with more time required for the two NGS case. For example, using a Dell PC with a 2.66 GHz processor and 3.5 GB of RAM, the run time of a simulation with 1000 data sets is less than 30 minutes for one GS test, about 70 minutes for one NGS test, 2.5 hours for two GS tests, and 6.5 hours for two NGS tests with a total sample size of 100.

5.1 Impact of AUC₀ and λ on Sample Size

We first illustrate how the average Bayesian power varies over a range of sample size combinations for different values of AUC₀ and λ. Three benchmark values were selected, namely AUC₀ ∈ {0.85, 0.9, 0.95} and λ ∈ {0.1, 0.15, 0.2}. For both one and two GS tests, the sample size combinations used are (n₁, n₂) = (10, 10), (20, 20), …, (100, 100). When medical conditions are unknown, π is given a uniform(0.4, 0.6) sampling prior and a beta(5, 5) fitting prior, with both priors reflecting a 50-50 chance of each individual having the disease. The total sample sizes used in the NGS scenario are n = 20, 40, …, 200, matching the totals used in the GS cases.

In the simulations involving a single test, we used a uniform(2.5, 3.5) distribution as the sampling prior for μ_D and a uniform(-0.5, 0.5) as the sampling prior for μ_ND, so that the average test score in the disease group is larger than that in the disease-free group. Uniform(1.8, 2.2) and uniform(0.8, 1.2) distributions were used as the sampling priors of σ²_D and σ²_ND, respectively, which conforms to the convention that test values tend to fluctuate more widely in the disease group. For the fitting priors of μ_D and μ_ND, a normal(3, 0.58) and a normal(0, 0.58) distribution were used, respectively, for the diseased and non-diseased populations. The lower endpoint of the central 99% interval of the normal(3, 0.58) is 1.506, and the upper endpoint of the central 99% interval of the normal(0, 0.58) is 1.494; therefore, the fitting priors for μ_D and μ_ND have minimal overlap with each other. The fitting priors for σ²_D and σ²_ND were relatively non-informative IG(0.5, 0.5) distributions, reflecting the prior belief that σ²_ND is smaller, as is commonly seen in practice.

Figure 1 contains three cubic polynomial spline fits to the Bayesian power estimates at (n₁, n₂) = (10, 10), (20, 20), …, (100, 100), corresponding to the different values of AUC₀. The size of AUC₀ appreciably alters the magnitude of the average Bayesian power. When AUC₀ = 0.85, the smallest Bayesian power is 0.97, with only 10 subjects in each group, and the power is almost 1 (0.996) with a total sample size of 200. However, when AUC₀ is raised to 0.95, the largest Bayesian power never exceeds 0.68 even after recruiting as many as 100 subjects in each group. The average Bayesian power for AUC₀ = 0.9 ranged from 0.768 to 0.943 across the selected range of sample sizes.

Figure 1 about here

The corresponding Bayesian power curves for the one NGS test are also plotted in Figure 1. Clearly, a study involving individuals with unknown disease status will be less powerful than a study in which medical conditions are known. For example, using a third-order polynomial regression fit, at least 130 subjects are required to achieve a Bayesian power of 0.8 with AUC₀ = 0.9 when one NGS test is used. However, to obtain the same power with the same AUC₀, the sample size is reduced to 30 if one GS test is applied.

When the number of GS tests increases to two, six more parameters are needed to complete the model. Table 1 categorizes the sampling and fitting priors used for the parameters in the two-test GS simulations. The sampling prior means of the standard-test means μ_{S,ND} and μ_{S,D} are 0 and 1, and those of the new-test means μ_{N,ND} and μ_{N,D} are 0 and 3. The fitting priors for μ_{S,ND} and μ_{S,D}, and for μ_{N,ND} and μ_{N,D}, are formulated so that there is no overlap among their 99% intervals. Both the

sampling and fitting priors of ρ_D and ρ_ND are informative, and each has a prior mean of 0.5.

Table 1 about here

Figure 2 demonstrates the impact of the effect size (λ) for superiority on the average Bayesian power. With λ = 0.15, a decrease in power of approximately 5 to 10 per cent is seen at each sample size compared with λ = 0.1. The larger effect size of λ = 0.2 further reduces the average power by about 30 to 43 per cent at every sample size in comparison with λ = 0.1.

Figure 2 about here

Figure 2 also shows the corresponding three splines for the two NGS test case. A comparison of the GS and NGS curves demonstrates that unknown disease status requires a much larger sample size in this example. When λ = 0.1, a power of 0.95 requires only 30 subjects in the two GS test scenario. However, unidentified disease status adds more than 100 additional subjects to obtain the same power.

5.2 Impact of Sampling Priors

We next investigate changes in sample size that result from different sampling priors for the mean or variance parameters. We consider GS scenarios with AUC₀ = 0.9 for the

one-test case and λ = 0.15 for two tests; results for the latter are placed on a supplementary website in order to save space. In the one-test case, three different sets of sampling priors for μ_D and σ²_D are compared. One set follows those discussed in Section 5.1, with μ_D ~ uniform(2.5, 3.5) and σ²_D ~ uniform(1.8, 2.2). The second set uses the same uniform distribution for σ²_D but increases the prior mean of μ_D from 3 to 3.5. The third set gives σ²_D a more precise sampling prior, a uniform(1.4, 1.8), and uses the same prior on μ_D as in the first set of sampling priors.

The results presented in Table 2 show that either raising the prior mean of μ_D or reducing the prior mean of σ²_D boosts the average power, which was expected since both will increase the separation between the distributions of test scores for the diseased and non-diseased populations. With the first set of sampling priors, to achieve power of at least 0.9 a study needs to enroll more than 80 individuals. However, when the sampling prior of σ²_D changes to the uniform(1.4, 1.8), only 36 total subjects are required to obtain a 90% level of average power. With a uniform(3, 4) sampling prior on μ_D, the sample size decreases from 80 to 30 for a power of 0.9. Comparing the power values at each sample size shows that, in this example, a 16.67% increase in the prior mean of μ_D raises the average power by a greater amount than a 20% increase in the prior precision of σ²_D.

Table 2 about here

The results summarizing the impact of mean and variance sampling priors for the two GS test scenario can be found online.

5.3 Impact of Prevalence and the Ratio of Diseased-to-Non-Diseased Subjects

The simulations in the previous two sections were based on a population prevalence of 50%, so the ratio of diseased to non-diseased sampled individuals was 1. A balanced allocation of subjects is not to be expected when the prevalence differs from 0.5. We therefore study the influence of an imbalanced design on sample size determination for ROC analysis. We fix the total sample size at 100 for the one GS and NGS cases. The sample size combinations for the GS scenario are (n₁, n₂) = (10, 90), (20, 80), …, (80, 20), (90, 10). For unknown disease status, we investigate changes in average power when the prior mean of π ranges from 0.2 to 0.8 in increments of 0.1. In both cases, AUC₀ was set equal to 0.9.

The average Bayesian power does not reach its maximum when there are equal numbers of subjects in the diseased and disease-free groups, as demonstrated in Table 3. In fact, with a ratio of n₁ to n₂ of 2.33 (70 vs. 30), we obtained the largest average power of 0.917 in this example. The average power decreased by only 0.001 when the sample size ratio was 1.5. However, further increases of the ratio to 4 and 9 reduce the Bayesian power by about 1 and 4 per cent, respectively, compared with the balanced design. Table 3 also shows that the power continues to decrease as the ratio between n₁ and n₂ approaches 0. In this simulation, a larger value of average power is achieved if more diseased subjects, rather than more disease-free subjects, are recruited.

Table 3 about here

The website mentioned in Section 5.2 also presents results regarding prevalence prior changes for a single NGS test.

5.4 Impact of Correlation

Here we evaluate how correlation among tests affects the sample size requirement in ROC studies. We compare four scenarios involving two GS tests with λ equal to 0.15. The first scenario uses the sampling and fitting priors of ρ_D and ρ_ND listed in Table 1. In the second scenario, we increase the prior mean of ρ_D from 0.5 to 0.8 for both data simulation and data analysis. In the third scenario, we change the sampling and fitting prior means of ρ_ND to 0.8. The last scenario considered assumes conditional independence of the two tests by setting ρ_D and ρ_ND equal to 0. Two tests that have a different biological basis are often viewed as conditionally independent (see, for example, Branscum, Gardner, and Johnson, 2005).

Based on Table 4, a total of at least 60 subjects is needed to achieve a power of 0.9 to detect a 0.15 difference in AUC when the prior means of both ρ_D and ρ_ND are 0.5. Our simulations show that ρ_D has a greater impact on Bayesian power than ρ_ND. When the sampling prior mean of ρ_D increases from 0.5 to 0.8, the average Bayesian power increases by about 0.015 across all sample sizes. However, the same amount of change in ρ_ND leads to an increase in Bayesian power of only 0.006, less than half the size of the power increase seen with the same amount of change in

ρ_D. When ρ_D and ρ_ND are both equal to 0, more subjects are required in order to maintain the same level of power.

Table 4 about here

5.5 Impact of Accounting for Uncertainty

We investigate the impact of the degree of uncertainty in sampling and fitting priors on sample size requirements for one GS test with AUC₀ = 0.9. The following scenarios are considered: (1) the same sampling and informative fitting priors as used in Section 5.1; (2) the same sampling priors as used in Section 5.1, but diffuse proper fitting priors that approximate Jeffreys' prior for this model, namely normal distributions with mean 0 and large variance for the means and IG(0.1, 0.1) for the variances; (3 and 4) sampling priors are not used; instead, data are simulated using fixed parameter values (μ_D = 3, μ_ND = 0, σ²_D = 2, and σ²_ND = 1 for case 3, and μ_D = 3.5, μ_ND = -0.5, σ²_D = 2, and σ²_ND = 1 for case 4).

Table 5 reports average power for these four cases when 10 to 100 diseased and non-diseased individuals are sampled. Comparing informative to diffuse fitting priors (case 1 versus case 2), the difference in average power does not exceed 0.02 with a total of 140 or more subjects, and the difference remains relatively low (0.058) with only 10 subjects per group. The similarities in power for cases 1 and 2 support the use of diffuse fitting priors as the default in one GS test studies. In case 3, fixed inputs instead of sampling priors were used to simulate data sets; specifically, the means of the sampling priors from case 1 were used as fixed input values. The average power is consistently higher (by between 0.7% and 6.1%) for case 3 compared to case 1. In case 4, we shifted the value of the

mean for the diseased population up by 0.5 and shifted the mean for the non-diseased population down by 0.5, relative to the fixed values used in case 3. This increased the average power substantially, which highlights the sensitivity of the method to fixed input values and the need to select them carefully when they are used.

Table 5 about here

5.6 Traditional Methods for Sample Size Calculations

The review article by Obuchowski (1998) and the paper by Obuchowski and McClish (1997) contain sample size formulas for measures of medical test performance. We consider one GS test that has a true AUC of 0.9, with the goal of testing null hypothesis values of 0.8 or 0.85. The effect sizes are therefore 0.1 and 0.05, respectively. The parameter values from cases 3 and 4 in Section 5.5 are used to determine fixed inputs (using formula T1 in Obuchowski, 1998) for frequentist sample size computation under each effect size, yielding four total scenarios. Taking averages over the four scenarios, we calculated that 23 subjects in each group are needed to reach 80% power, 30 subjects per group to reach 90% power, and 38 subjects per group for 95% power. The proposed Bayesian approach, using the sampling and fitting priors outlined in case 1 of Section 5.5, gives sample sizes of 14 subjects in each group for 80% power, 40 subjects per group for 90% power, and 110 subjects per group for 95% power. In this setting, accounting for uncertainty in parameter values through sampling priors leads to larger required sample sizes at 90% power, and to more than double the number of individuals needed at 95% power. Similar qualitative findings were reported by Branscum, Johnson, and Gardner (2007) for binary tests.
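For readers who want a reproducible point of comparison, the R sketch below carries out a frequentist sample size calculation of the same general type, but it uses the Hanley and McNeil (1982) variance approximation rather than the exact inputs of formula T1 in Obuchowski (1998), so its output is not expected to match the numbers quoted above.

# One-test frequentist sample size for H0: AUC = auc0 vs AUC = auc1, one-sided
# level-alpha test, equal group sizes, Hanley-McNeil variance approximation.
hm_var <- function(a) {        # large-n factor: Var(AUC-hat) is roughly hm_var(a)/n
  q1 <- a / (2 - a)
  q2 <- 2 * a^2 / (1 + a)
  q1 + q2 - 2 * a^2
}
n_per_group <- function(auc0, auc1, alpha = 0.05, power = 0.9) {
  za <- qnorm(1 - alpha); zb <- qnorm(power)
  ceiling((za * sqrt(hm_var(auc0)) + zb * sqrt(hm_var(auc1)))^2 / (auc1 - auc0)^2)
}
n_per_group(0.85, 0.9)   # effect size 0.05
n_per_group(0.80, 0.9)   # effect size 0.10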

6. Conclusions

We present a Bayesian approach to sample size and power calculations for ROC studies designed to measure and compare the performance of medical tests. The criterion adopted for this problem is Bayesian average power, which can be applied to several common study designs involving a single test or two tests, both with and without gold-standard information. Through simulation studies we illustrated the impact on the required sample size of the effect size, the ratio of the number of diseased to non-diseased subjects enrolled in a study, disease prevalence, and correlation among tests. The simulation study emphasizes the importance of incorporating prior information, especially at the data simulation step and with NGS tests.

Further research in this area may extend the Bayesian framework to sample size estimation for designs with clustered data. Power and sample size calculations for studies involving three or more tests or repeated testing are also worth further investigation. Methods designed specifically for ordinal tests without a gold standard are also needed.

Acknowledgments

We thank two anonymous referees for their helpful suggestions, which resulted in an improved manuscript.

References

Albert, P.S. (2007). Random effects modeling approaches for estimating ROC curves from repeated ordinal tests without a gold standard. Biometrics 63.

Branscum, A.J., Gardner, I.A., and Johnson, W.O. (2005). Estimation of diagnostic test sensitivity and specificity through Bayesian modeling. Preventive Veterinary Medicine 68.

Branscum, A.J., Johnson, W.O., and Gardner, I.A. (2007). Sample size calculations for studies designed to evaluate diagnostic test accuracy. Journal of Agricultural, Biological, and Environmental Statistics 16.

Branscum, A.J., Johnson, W.O., Hanson, T.E., and Gardner, I.A. (2008). Bayesian semiparametric ROC curve estimation and disease diagnosis. Statistics in Medicine 27.

Cheng, D., Stamey, J.D., and Branscum, A.J. (2009). Bayesian approach to average power calculation for binary regression with misclassified outcomes. Statistics in Medicine, DOI: 10.1002/sim.355.

Choi, Y.-K., Johnson, W.O., Collins, M.T., and Gardner, I.A. (2006). Bayesian inferences for receiver operating characteristic curves in the absence of a gold standard. Journal of Agricultural, Biological, and Environmental Statistics 11, 210-229.

Erkanli, A., Sung, M., Costello, E.J., and Angold, A. (2006). Bayesian semi-parametric ROC analysis. Statistics in Medicine 25.

Johnson, W.O., Gastwirth, J.L., and Pearson, L.M. (2001). Screening without a gold standard: the Hui-Walter paradigm revisited. American Journal of Epidemiology 153.

Joseph, L., Gyorkos, T.W., and Coupal, L. (1995). Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. American Journal of Epidemiology 141.

Liu, J.-P., Ma, M.-C., Wu, C.-Y., and Tai, J.-Y. (2006). Tests of equivalence and non-inferiority for diagnostic accuracy based on the paired areas under ROC curves. Statistics in Medicine 25.

Obuchowski, N.A. (1997). Nonparametric analysis of clustered ROC curve data. Biometrics 53.

Obuchowski, N.A. (1998). Sample size calculations in studies of test accuracy. Statistical Methods in Medical Research 7.

Obuchowski, N.A. (2006). An ROC-type measure of diagnostic accuracy when the gold standard is continuous-scale. Statistics in Medicine 25.

Obuchowski, N.A. and McClish, D.K. (1997). Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices. Statistics in Medicine 16.

Wang, F. and Gelfand, A.E. (2002). A simulation-based approach to Bayesian sample size determination for performance under a given model and for separating models. Statistical Science 17.

Wang, F. and Gatsonis, C.A. (2008). Hierarchical models for ROC curve summary measures: Design and analysis of multi-reader, multi-modality studies of medical tests. Statistics in Medicine 27.

Zhou, X.-H., Castelluccio, P., and Zhou, C. (2005). Nonparametric estimation of ROC curves in the absence of a gold standard. Biometrics 61.

Table 1: Sampling and fitting priors for the two GS test scenario

Parameter     Sampling Prior        Fitting Prior
μ_{S,ND}      uniform(-0.5, 0.5)    normal(0, 0.19)
μ_{S,D}       uniform(0.75, 1.25)   normal(1, 0.19)
μ_{N,ND}      uniform(-0.5, 0.5)    normal(0, 0.58)
μ_{N,D}       uniform(2.5, 3.5)     normal(3, 0.58)
σ²_{S,ND}     uniform(0.8, 1.2)     IG(0.5, 0.5)
σ²_{S,D}      uniform(1.8, 2.2)     IG(0.5, 0.5)
σ²_{N,ND}     uniform(0.8, 1.2)     IG(0.5, 0.5)
σ²_{N,D}      uniform(1.8, 2.2)     IG(0.5, 0.5)
ρ_D           uniform(0.4, 0.6)     beta(5, 5)
ρ_ND          uniform(0.4, 0.6)     beta(5, 5)

Table 2: Bayesian power with one GS test when AUC₀ = 0.9, under three different combinations of sampling priors for μ_D and σ²_D: (1) μ_D ~ uniform(2.5, 3.5) and σ²_D ~ uniform(1.8, 2.2), (2) μ_D ~ uniform(3, 4) and σ²_D ~ uniform(1.8, 2.2), and (3) μ_D ~ uniform(2.5, 3.5) and σ²_D ~ uniform(1.4, 1.8).

Sample Size    Power 1    Power 2    Power 3
(10, 10)
(20, 20)
(30, 30)
(40, 40)
(50, 50)
(60, 60)
(70, 70)
(80, 80)
(90, 90)
(100, 100)

Table 3: Bayesian power with one GS test when AUC₀ = 0.9

(n₁, n₂)    Bayesian Power
(10, 90)    0.83
(20, 80)    0.878
(30, 70)    0.896
(40, 60)    0.9
(50, 50)    0.91
(60, 40)    0.916
(70, 30)    0.917
(80, 20)    0.91
(90, 10)    0.876

Table 4: Bayesian power with two GS tests when λ = 0.15, under four different correlation structures: (1) sampling priors ρ_D, ρ_ND ~ uniform(0.4, 0.6) and fitting priors ρ_D, ρ_ND ~ beta(5, 5); (2) ρ_ND same as in (1), with sampling prior ρ_D ~ uniform(0.7, 0.9) and fitting prior ρ_D ~ beta(4, 1); (3) sampling prior ρ_ND ~ uniform(0.7, 0.9) and fitting prior ρ_ND ~ beta(4, 1), with ρ_D same as in (1); and (4) ρ_D = ρ_ND = 0 in both data simulation and analysis.

Sample Size    Power 1    Power 2    Power 3    Power 4
(10, 10)
(20, 20)
(30, 30)
(40, 40)
(50, 50)
(60, 60)
(70, 70)
(80, 80)
(90, 90)
(100, 100)

Table 5: Bayesian power under informative (1) and diffuse (2) fitting priors, and when sampling priors are replaced with fixed point estimates (cases 3 and 4).

Sample Size    Ave. power (case 1)    Ave. power (case 2)    Ave. power (case 3)    Ave. power (case 4)
(10, 10)
(20, 20)
(30, 30)
(40, 40)
(50, 50)
(60, 60)
(70, 70)
(80, 80)
(90, 90)
(100, 100)

Figure 1: Bayesian power curves for the one-test scenario. Bold lines represent one GS test and thin lines one NGS test. The three line types correspond to AUC₀ = 0.85, 0.9, and 0.95.

Figure 2: Bayesian power curves for the two-test scenario. Bold lines represent two GS tests and thin lines two NGS tests. The three line types correspond to λ = 0.1, 0.15, and 0.2.


More information

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality

More information

Bayesian Dose Escalation Study Design with Consideration of Late Onset Toxicity. Li Liu, Glen Laird, Lei Gao Biostatistics Sanofi

Bayesian Dose Escalation Study Design with Consideration of Late Onset Toxicity. Li Liu, Glen Laird, Lei Gao Biostatistics Sanofi Bayesian Dose Escalation Study Design with Consideration of Late Onset Toxicity Li Liu, Glen Laird, Lei Gao Biostatistics Sanofi 1 Outline Introduction Methods EWOC EWOC-PH Modifications to account for

More information

Bayes Theorem Application: Estimating Outcomes in Terms of Probability

Bayes Theorem Application: Estimating Outcomes in Terms of Probability Bayes Theorem Application: Estimating Outcomes in Terms of Probability The better the estimates, the better the outcomes. It s true in engineering and in just about everything else. Decisions and judgments

More information

BIOSTATISTICAL METHODS

BIOSTATISTICAL METHODS BIOSTATISTICAL METHODS FOR TRANSLATIONAL & CLINICAL RESEARCH Designs on Micro Scale: DESIGNING CLINICAL RESEARCH THE ANATOMY & PHYSIOLOGY OF CLINICAL RESEARCH We form or evaluate a research or research

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Unequal Numbers of Judges per Subject

Unequal Numbers of Judges per Subject The Reliability of Dichotomous Judgments: Unequal Numbers of Judges per Subject Joseph L. Fleiss Columbia University and New York State Psychiatric Institute Jack Cuzick Columbia University Consider a

More information

Bayesian Latent Subgroup Design for Basket Trials

Bayesian Latent Subgroup Design for Basket Trials Bayesian Latent Subgroup Design for Basket Trials Yiyi Chu Department of Biostatistics The University of Texas School of Public Health July 30, 2017 Outline Introduction Bayesian latent subgroup (BLAST)

More information

Decision Making in Confirmatory Multipopulation Tailoring Trials

Decision Making in Confirmatory Multipopulation Tailoring Trials Biopharmaceutical Applied Statistics Symposium (BASS) XX 6-Nov-2013, Orlando, FL Decision Making in Confirmatory Multipopulation Tailoring Trials Brian A. Millen, Ph.D. Acknowledgments Alex Dmitrienko

More information

Small-area estimation of mental illness prevalence for schools

Small-area estimation of mental illness prevalence for schools Small-area estimation of mental illness prevalence for schools Fan Li 1 Alan Zaslavsky 2 1 Department of Statistical Science Duke University 2 Department of Health Care Policy Harvard Medical School March

More information

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis Advanced Studies in Medical Sciences, Vol. 1, 2013, no. 3, 143-156 HIKARI Ltd, www.m-hikari.com Detection of Unknown Confounders by Bayesian Confirmatory Factor Analysis Emil Kupek Department of Public

More information

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY Lingqi Tang 1, Thomas R. Belin 2, and Juwon Song 2 1 Center for Health Services Research,

More information

Sheila Barron Statistics Outreach Center 2/8/2011

Sheila Barron Statistics Outreach Center 2/8/2011 Sheila Barron Statistics Outreach Center 2/8/2011 What is Power? When conducting a research study using a statistical hypothesis test, power is the probability of getting statistical significance when

More information

Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data

Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data Michael Anderson, PhD Hélène Carabin, DVM, PhD Department of Biostatistics and Epidemiology The University

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

Introduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015

Introduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015 Introduction to diagnostic accuracy meta-analysis Yemisi Takwoingi October 2015 Learning objectives To appreciate the concept underlying DTA meta-analytic approaches To know the Moses-Littenberg SROC method

More information

For general queries, contact

For general queries, contact Much of the work in Bayesian econometrics has focused on showing the value of Bayesian methods for parametric models (see, for example, Geweke (2005), Koop (2003), Li and Tobias (2011), and Rossi, Allenby,

More information

Case Studies in Bayesian Augmented Control Design. Nathan Enas Ji Lin Eli Lilly and Company

Case Studies in Bayesian Augmented Control Design. Nathan Enas Ji Lin Eli Lilly and Company Case Studies in Bayesian Augmented Control Design Nathan Enas Ji Lin Eli Lilly and Company Outline Drivers for innovation in Phase II designs Case Study #1 Pancreatic cancer Study design Analysis Learning

More information

Numerical Integration of Bivariate Gaussian Distribution

Numerical Integration of Bivariate Gaussian Distribution Numerical Integration of Bivariate Gaussian Distribution S. H. Derakhshan and C. V. Deutsch The bivariate normal distribution arises in many geostatistical applications as most geostatistical techniques

More information

Small Group Presentations

Small Group Presentations Admin Assignment 1 due next Tuesday at 3pm in the Psychology course centre. Matrix Quiz during the first hour of next lecture. Assignment 2 due 13 May at 10am. I will upload and distribute these at the

More information

Journal of Political Economy, Vol. 93, No. 2 (Apr., 1985)

Journal of Political Economy, Vol. 93, No. 2 (Apr., 1985) Confirmations and Contradictions Journal of Political Economy, Vol. 93, No. 2 (Apr., 1985) Estimates of the Deterrent Effect of Capital Punishment: The Importance of the Researcher's Prior Beliefs Walter

More information

PSYCHOLOGY 300B (A01) One-sample t test. n = d = ρ 1 ρ 0 δ = d (n 1) d

PSYCHOLOGY 300B (A01) One-sample t test. n = d = ρ 1 ρ 0 δ = d (n 1) d PSYCHOLOGY 300B (A01) Assignment 3 January 4, 019 σ M = σ N z = M µ σ M d = M 1 M s p d = µ 1 µ 0 σ M = µ +σ M (z) Independent-samples t test One-sample t test n = δ δ = d n d d = µ 1 µ σ δ = d n n = δ

More information

Comprehensive evaluation of CRM, mtpi, and 3+3 relative to a benchmark

Comprehensive evaluation of CRM, mtpi, and 3+3 relative to a benchmark Comprehensive evaluation of CRM, mtpi, and 3+3 relative to a benchmark Bethany Jablonski Horton, Ph.D. University of Virginia April 16, 2015 Bethany Jablonski Horton, Ph.D. (UVA) Early Phase Dose Finding

More information

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests Objectives Quantifying the quality of hypothesis tests Type I and II errors Power of a test Cautions about significance tests Designing Experiments based on power Evaluating a testing procedure The testing

More information

Outlier Analysis. Lijun Zhang

Outlier Analysis. Lijun Zhang Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc. Sawtooth Software RESEARCH PAPER SERIES MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB Bryan Orme, Sawtooth Software, Inc. Copyright 009, Sawtooth Software, Inc. 530 W. Fir St. Sequim,

More information

Bayesian Estimation of a Meta-analysis model using Gibbs sampler

Bayesian Estimation of a Meta-analysis model using Gibbs sampler University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2012 Bayesian Estimation of

More information

A Proposal for the Validation of Control Banding Using Bayesian Decision Analysis Techniques

A Proposal for the Validation of Control Banding Using Bayesian Decision Analysis Techniques A Proposal for the Validation of Control Banding Using Bayesian Decision Analysis Techniques Paul Hewett Exposure Assessment Solutions, Inc. Morgantown, WV John Mulhausen 3M Company St. Paul, MN Perry

More information

Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges

Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges Research articles Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges N G Becker (Niels.Becker@anu.edu.au) 1, D Wang 1, M Clements 1 1. National

More information

Bayesian Mediation Analysis

Bayesian Mediation Analysis Psychological Methods 2009, Vol. 14, No. 4, 301 322 2009 American Psychological Association 1082-989X/09/$12.00 DOI: 10.1037/a0016972 Bayesian Mediation Analysis Ying Yuan The University of Texas M. D.

More information

This is a repository copy of Practical guide to sample size calculations: superiority trials.

This is a repository copy of Practical guide to sample size calculations: superiority trials. This is a repository copy of Practical guide to sample size calculations: superiority trials. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/97114/ Version: Accepted Version

More information

Outline of Part III. SISCR 2016, Module 7, Part III. SISCR Module 7 Part III: Comparing Two Risk Models

Outline of Part III. SISCR 2016, Module 7, Part III. SISCR Module 7 Part III: Comparing Two Risk Models SISCR Module 7 Part III: Comparing Two Risk Models Kathleen Kerr, Ph.D. Associate Professor Department of Biostatistics University of Washington Outline of Part III 1. How to compare two risk models 2.

More information

Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study

Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study Marianne (Marnie) Bertolet Department of Statistics Carnegie Mellon University Abstract Linear mixed-effects (LME)

More information

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis Thesis Proposal Indrayana Rustandi April 3, 2007 Outline Motivation and Thesis Preliminary results: Hierarchical

More information

School of Population and Public Health SPPH 503 Epidemiologic methods II January to April 2019

School of Population and Public Health SPPH 503 Epidemiologic methods II January to April 2019 School of Population and Public Health SPPH 503 Epidemiologic methods II January to April 2019 Time: Tuesday, 1330 1630 Location: School of Population and Public Health, UBC Course description Students

More information

STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION XIN SUN. PhD, Kansas State University, 2012

STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION XIN SUN. PhD, Kansas State University, 2012 STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION by XIN SUN PhD, Kansas State University, 2012 A THESIS Submitted in partial fulfillment of the requirements

More information

Section on Survey Research Methods JSM 2009

Section on Survey Research Methods JSM 2009 Missing Data and Complex Samples: The Impact of Listwise Deletion vs. Subpopulation Analysis on Statistical Bias and Hypothesis Test Results when Data are MCAR and MAR Bethany A. Bell, Jeffrey D. Kromrey

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Module Overview. What is a Marker? Part 1 Overview

Module Overview. What is a Marker? Part 1 Overview SISCR Module 7 Part I: Introduction Basic Concepts for Binary Classification Tools and Continuous Biomarkers Kathleen Kerr, Ph.D. Associate Professor Department of Biostatistics University of Washington

More information

SISCR Module 7 Part I: Introduction Basic Concepts for Binary Biomarkers (Classifiers) and Continuous Biomarkers

SISCR Module 7 Part I: Introduction Basic Concepts for Binary Biomarkers (Classifiers) and Continuous Biomarkers SISCR Module 7 Part I: Introduction Basic Concepts for Binary Biomarkers (Classifiers) and Continuous Biomarkers Kathleen Kerr, Ph.D. Associate Professor Department of Biostatistics University of Washington

More information

A Critique of Two Methods for Assessing the Nutrient Adequacy of Diets

A Critique of Two Methods for Assessing the Nutrient Adequacy of Diets CARD Working Papers CARD Reports and Working Papers 6-1991 A Critique of Two Methods for Assessing the Nutrient Adequacy of Diets Helen H. Jensen Iowa State University, hhjensen@iastate.edu Sarah M. Nusser

More information

Regression Discontinuity Designs: An Approach to Causal Inference Using Observational Data

Regression Discontinuity Designs: An Approach to Causal Inference Using Observational Data Regression Discontinuity Designs: An Approach to Causal Inference Using Observational Data Aidan O Keeffe Department of Statistical Science University College London 18th September 2014 Aidan O Keeffe

More information

Measurement Error in Nonlinear Models

Measurement Error in Nonlinear Models Measurement Error in Nonlinear Models R.J. CARROLL Professor of Statistics Texas A&M University, USA D. RUPPERT Professor of Operations Research and Industrial Engineering Cornell University, USA and L.A.

More information

Model calibration and Bayesian methods for probabilistic projections

Model calibration and Bayesian methods for probabilistic projections ETH Zurich Reto Knutti Model calibration and Bayesian methods for probabilistic projections Reto Knutti, IAC ETH Toy model Model: obs = linear trend + noise(variance, spectrum) 1) Short term predictability,

More information

Assessing the diagnostic accuracy of a sequence of tests

Assessing the diagnostic accuracy of a sequence of tests Biostatistics (2003), 4, 3,pp. 341 351 Printed in Great Britain Assessing the diagnostic accuracy of a sequence of tests MARY LOU THOMPSON Department of Biostatistics, Box 357232, University of Washington,

More information

Binary Diagnostic Tests Two Independent Samples

Binary Diagnostic Tests Two Independent Samples Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary

More information