The Factorized Self-Controlled Case Series Method: An Approach for Estimating the Effects of Many Drugs on Many Outcomes

Size: px
Start display at page:

Download "The Factorized Self-Controlled Case Series Method: An Approach for Estimating the Effects of Many Drugs on Many Outcomes"

Transcription

1 The Factorized Sel-Controlled Case Series Method: An Approach or Estimating the Eects o Many Drugs on Many Outcomes Ramin Moghaddass, Cynthia Rudin MIT CSAIL and Sloan School o Management, Massachusetts Institute o Technology, Cambridge, USA and David Madigan Department o Statistics, Columbia University, New York, USA March 25, 25 Abstract We provide a hierarchical Bayesian model or estimating the eects o transient drug exposures on a collection o health outcomes, where the eects o all drugs and all outcomes are estimated simultaneously. The method possesses properties that allow it to handle important challenges o dealing with large-scale longitudinal observational databases. In particular, this model is a generalization o the selcontrolled case series (SCCS) method, meaning that certain patient speciic baseline rates never need to be estimated. Further, this model is ormulated with layers o latent actors, which substantially reduces the number o parameters and helps with interpretability by illuminating latent classes o drugs and outcomes. We demonstrate the approach by estimating the eects o various time-sensitive insulin treatments or diabetes. Keywords: Bayesian Analysis, Drug Saety, Sel-controlled Case Series, Matrix Factorization, Causal Inerence

2 Introduction The medical community, the pharmaceutical industry, and health authorities are obligated to conirm that marketed medical products and prescription drugs have acceptable beneitrisk proiles; in act, these entities have come under increasing scientiic, regulatory and public scrutiny to accurately estimate the eects o drugs. The increasing availability o large-scale longitudinal observational healthcare databases (LODs) opens up exciting new opportunities to add to the evidence base concerning these issues, though the complexity and scale o some o the available databases presents interesting statistical and computational challenges. In what ollows we ocus on using longitudinal observational databases to make inerence about the eects o many drugs with respect to many outcomes simultaneously. Many research studies have attempted to characterize the relationship between timevarying drug exposures and adverse events (AEs) related to health outcomes (e.g., in Madigan 29, Greene et al. 2, Chui et al. 24, Benchimol et al. 23, Simpson et al. 23) and the use o LODs to study individual drug-adverse eect combinations has become routine. The medical literature provides many examples and many dierent epidemiological and statistical approaches, oten tailored to the speciic drug and speciic adverse eect. There is a major law in these approaches o estimating the eect o one drug on one outcome, which is that it is very clear that many drugs are closely related to each other (there are dozens o antibiotics or instance), and many heath outcomes are closely related to each other (e.g., strokes, heart attacks, and other vascular diseases); we should leverage this inormation to better understand causal eects. In this work, we borrow strength across both drugs and outcomes in order to obtain better estimates or each individual drug and outcome. Since we are interested in the causal eect o the drug, and not in the patient-speciic baseline rate o the outcome, we use the ideas o the sel-controlled case series (SCCS) method o Farrington (995), which is a conditional Poisson regression approach wherein each patient serves as his or her own control. The SCCS method has been widely applied, especially in vaccine studies (see the tutorial o Whitaker et al. 26). SCCS controls or all ixed patient-level covariates but remains susceptible to time-varying conounding. The standard SCCS method ocuses on one drug and one outcome. Simp- 2

3 son et al. (23) proposed the multiple sel-controlled case series (MSCCS) method that simultaneously provides eect estimates or multiple drugs and a single outcome. In act, the MSCCS provides a sel-controlled approach that can control or many time-varying covariates, drugs being a special case. Bayesian implementations o both SCCS and MSCCS provide signiicant advantages, especially in high-dimensional settings with thousands or even tens o thousands o drugs and outcomes and even larger numbers o interactions. Neither SCCS nor MSCCS account or the act that many drugs/treatments naturally orm classes and thereore regression coeicients or drugs rom within a single class might reasonably be modeled as arising exchangeably rom a common prior distribution. Adverse events and health conditions can also be organized hierarchically, again aording an opportunity to borrow strength across related outcomes. For both drugs and outcomes, the hierarchy could extend to multiple levels. In what ollows we ormalize these ideas within the ramework o latent actor Bayesian hierarchical models. Factor models, which have been traditionally used in behavioral sciences, provide a lexible ramework or modeling multivariate data via unobserved latent actors (e.g., Ghosh & Dunson 29). In this paper, we do not impose speciic latent structure a priori. However, our approach can also be used or cases where classes o drugs and conditions are known a priori. We will show that the latent actor approach not only brings more interpretability to our model, but also can signiicantly contribute to reducing the computational complexity. To our knowledge, only ew works have considered matrix actorization-based data analysis techniques or drug saety and surveillance (or example, Zitnik & Zupan 25, or drug-induced liver injury prediction and Cobanoglu et al. 23, or predicting drug-target interactions in neurobiological disorders, which are both very dierent than our study). We introduce three models or predicting the eects o multiple drugs on multiple outcomes that use hierarchical Bayesian analysis. The irst model (Model ) does not use latent actors, and borrows strength across all drugs and outcomes. The second model (Model ) uses one set o latent drugs and one set o latent outcomes, through a single matrix actorization. The third model (Model 2) uses two sets o latent actors, by actoring the matrix o coeicients into three matrices; one or converting drugs to latent drugs, another or converting outcomes to latent outcomes, and the third or modeling the eects 3

4 o latent drugs on latent outcomes. By allowing or latent actors, the second and third models provide an increased level o interpretability, use ewer variables, and are thus more computationally eicient to estimate. The rest o this paper is organized as ollows: In Sections 2, 3, 4, and 5 we describe the model and the Bayesian inerence procedure. In Section 6 the structure o the MCMC or parameter estimation is illustrated. We then use a series o simulations in Section 7 to show that we can recover the true generating parameters rom data. Finally, we demonstrate the approach in Section 8 or estimating the eects o various insulin treatments or diabetes. Our proposed methodology has broader applicability beyond estimating the eects o drugs considered in this paper. 2 Factorized Sel-Controlled Case Series - Notation and Framework The actorized sel-controlled case series (FSCCS) method generalizes the sel-controlled case series (SCCS) to handle multiple treatments and multiple outcomes. The notation used throughout the paper is as ollows: N: number o patients (i indexes individuals rom to N). x idj : binary indicator relecting whether patient i is exposed to drug j on interval d. x id = [x id, x id2,..., x idj ] : the vector o exposed drugs or patient i on interval d. J: number o drugs (treatments). O: number o health outcomes (adverse events). D o i : the set o observation intervals where patient i has outcome o. τ o i : the number o observation intervals where patient i has outcome o (the size o D o i ). yid o : binary indicator relecting whether patient i has outcome o on interval d. y o i = [y o i, y o i2,..., y o iτ o i ] : the vector o observed outcomes o or patient i. ϕ o i : baseline incidence o outcome o or patient i. 4

5 Φ = ϕ... ϕ O : : : ϕ N... ϕo N : baseline incidence matrix. β o j : regression coeicients associated with outcome o and drug j. β o = [β o, β o 2,..., β o J ] : regression coeicients associated with outcome o. β... β O B = : : : βj... βj O : drug-outcome coeicient matrix. λ id = exp(ϕo i + x id βo ): the Poisson event rate o outcome o, or patient i, on interval d. Outcomes occur according to a nonhomogeneous Poisson process, where drug exposure can modulate the rate over time. Patient i has an individual baseline rate o exp(ϕ o i ) or outcome o that remains constant over time. Drug j has a multiplicative eect o exp(β o j ) on the individual baseline rate exp(ϕ o i ) during its exposure period. The Poisson event rate or outcome o and patient i on interval d according to the SCCS is: The key beneit o the SCCS is that the ϕ o i λ o id = exp(ϕ o i + x idβ o ). terms do not need to be modeled, since we are interested in the ratio o Poisson intensities with and without the drug. For instance, considering only one drug j, comparing the intensity ratio or day d to a dierent day d 2 with no exposure to the drug, we have: λ o id = exp(ϕo i + βj o ) λ o id 2 exp(ϕ o i + βo j ) = exp(βo j ). As the Poisson rate is assumed to be constant in each interval, the number o outcomes o observed or patient i on interval d is distributed as a Poisson random variable (r.v.) denoted by Y o id as Pr(Yid o = yid x o id ) = e λo id λ o yid o id yid o!. Based on the above, the contribution to the likelihood or patient i and outcome o or the observed sequence o events y o i = [y o i, y o i2,..., y o iτ o i ], conditioned on the observed exposures 5

6 x i = [x i,..., x iτ o i ] is L i = Pr(yi o x i ) = Pr(yid x o id ) = exp e ϕo i +x id βo d Di o d Di o d Di o = exp e ϕo i e x id βo e ϕo i yo id y o d Di o d Di o d Di o id! ( ) y o = exp ϕ o i n o i e ϕo i e x id βo id e x id βo yid o!, d D o i d D o i ( e x id βo ) y o id ( e ϕo i +x id βo ) y o id y o id! (2.) where n o i = d yo id. It is assumed here that uture outcomes are independent o past outcomes and also outcomes are independent o each other. One could orm the ull likelihood to estimate the unknown parameters (Φ, B). In order to avoid estimating the nuisance parameter set Φ, we can condition on its suicient statistic, which removes the dependence on Φ. The cumulative intensity is a sum (rather than an integral) since we assume a constant intensity over each interval. Conditioning on n o i yields the ollowing likelihood or person i: L o i = Pr(y o i x i, n o i ) = Notice that because n o i d D o i exp Pr(y o id x id) Pr(n o i x i) d D o i d = ex id βo e x id βo Pr(y o id x id) d Di o exp λ o n o i d D i o id λ o d D i o id n o i! y o id. (2.2) is suicient, the individual likelihood in the above expression no longer contains Φ. Assuming that patients are independent and conditions are independent, the ull conditional likelihood or event o is simply the product o the individual likelihoods (i.e. L o = N L o i ). Intuitively it ollows that i i has no outcomes o type o, it cannot i= provide any inormation about the relative rate o outcome o. We now present three hierarchical models or multiple drug, multiple outcome selcontrolled case series and discuss how to estimate the drug-outcome coeicient matrix B. 6

7 Two o the models have latent actors that allow B to be expressed in a simpler and more interpretable way. In our experiments, the empirical perormance o these methods is about the same. 3 Model - No latent actors Instead o estimating each coeicient independently, we borrow strength over both drugs and outcomes, which adds substantial regularization. One should think o this as being relevant or a set o related outcomes and drugs, e.g., heart-disease related outcomes and the set o drugs one might prescribe or heart-related conditions. We shrink the estimates or drug j over all outcomes to µ j by placing an independent normal prior on each βj o as βj o N (µ j, σj 2 ), (j, o), where µ j N (, γ), j. We also assume uniorm priors or hyperparameters σ j and γ as σ j U(, a), j and γ U(, a), where parameter a is a user-deined constant. A natural extension o this model (not explored here) would be to have drugs belong to certain classes o drugs, so that priors can be deined based on each class o drugs; similarly with outcomes. The posterior probability o our model is now deined as Pr(B, µ, σ, γ y, a) = Pr(y B) Pr(B µ, σ) Pr(µ γ) Pr(γ a) Pr(σ a) (3.) o j ( ( exp x id β o) d exp ( x id β o) i d D i N (βj o µ j, σj 2 ) o j ) y (o) id N (µ j, γ) j Pr(σ j a) Pr(γ a). The negative log-posterior (which can be used or inding the MAP solution i desired) is: L = log (Pr(B, µ, σ, γ y, a)). The graphical representation o this model is shown in Figure. 7

8 a γ.. µ j σ j β o j o = : O j = : J B Figure : Graphical representation o Model 4 Model - One level o latent actors This model is motivated by two considerations. First, modeling the ull posterior distribution o Model can be computationally expensive, particularly or large N, J, and O, where J and O determine the number o variables to be estimated within the B matrix. Second, Model overlooks the act that drugs and outcomes might come rom a smaller number o latent classes; or instance, there are commonly several drugs that are extremely similar to each other or treating a set o highly related illnesses. We consider F latent actors or drugs and outcomes. We model the J O matrix B as B = L (D) L (O), where L (D) = L (D),... L (D),F : : : L (D) J,... L (D) J,F, L(O) = L (O),... L (O),O : : : L (O) F,... L (O) F,O. 8

9 This way, we do not assume we know in advance which drugs have similar eects on which outcomes, instead we estimate this rom data. The number o latent actors F can be determined by cross-validation. The total number o latent actors is J F + F O, which can be substantially less than J O. For drug latent actors, we place independent normal priors on the entries o L (D) as L (D) j N (µ (D), σ (D)2 ), (j, ), where µ (D) N (, γ (D) ),. Similarly, we deine normal priors on the entries o L (O) as L (O) o N (µ(o), σ (O)2 ), (, o), where µ (O) We assume uniorm priors or hyperparameters σ j N (, γ (O) ),. and γ as σ (D) U(, a),, σ (O) U(, a),, γ (D) U(, b), γ (O) U(, b), where (a, b) are known parameters. The posterior probability is now deined as: Pr(L (D), L (O), µ (D), µ (O), σ (D), σ (O), γ (D), γ (O) y) = Pr(y L (D), L (O) ) (4.) Pr(L (D) µ (D), σ (D) ) Pr(L (O) µ (O), σ (O) ) Pr(µ (D) γ (D) ) Pr(µ O γ (O) ) Pr(σ (D) a) Pr(σ (O) a) Pr(γ (D) b) Pr(γ (O) b) ( ( exp x id β o) ) (o) y id o d exp ( x id β o) J i F j= = F = Pr(σ (D) d D o i N (L (D) j µ(d) N (µ (D), γ (D) ), σ (D)2 ) F = (O) F N (L (O) = o= N (µ (O), γ (O) ) o µ(o) b) Pr(σ (O) ) b) Pr(γ (D) a) Pr(γ (D) a)., σ (O)2 ) The graphical representation o this hierarchical Bayesian model is given in Figure 2. 5 Model 2 - Two levels o latent actors Here we represent B as: B = L (D) L (F ) L (O), 9

10 b. a a b γ (D) γ (O) σ (D) µ (D) µ (O) σ (O) L (D) j L (O) o j o = : F L (D) L (O) B Figure 2: The Graphical Framework or Hierarchical Bayesian Model with one level o latent actors where B = L (D),... L (D),F : : : L (D) J,... L (D) J,F L (F ),... L (F ),F 2 : : : L (F ) F,... L (F ) F,F 2 L (O),... L (O),O : : : L (O) F 2,... L (O) F 2,O. The number o latent actors is thus J F + F F 2 + F 2 O, which can be less than the number o variables o Model in many cases. Its major beneit is interpretability, since now the number o latent drug actors and the number o latent outcome actors can be estimated dierently. L (D) represents the relationship between latent drugs and latent

11 drug-related actors, L (F ) represents the relationship between latent drug-related actors and health outcome-related actors, and L (O) represents the relationship between drugrelated actors and health outcome-related actors. L (O) is really the core set o variables since they relate the latent treatments to the latent outcomes. Note that latent actors could be some known categories/classes o drugs and health conditions. The idea here is to analyze all related drugs and related health events simultaneously. The priors are L (D) jf N (µ (D) F µ (D) F N (, γ (D) ), µ (F ) N (, γ (F ) ), µ (O) F 2 σ (O) F 2, σ (D)2 ), L (O) F 2 o N (µ(o) F F 2 N (, γ (O) ), σ (D) F, σ (O)2 F 2 ), L (F ) F F 2 N (µ (F ), σ (F )2 ), U(, a), σ (F ) U(, a), U(, a), γ (D) U(, b), γ (F ) U(, b), and γ (O) U(, b) or all (F, F 2, j, o). The posterior probability is: Pr(L (D), L (F ), L (O), µ (D), µ (F ), µ (O), σ (D), σ (F ), σ (O), γ (D), γ (F ), γ (O) y) (5.) = Pr(y L (D), L (F ), L (O) ) Pr(L (D) µ (D), σ (D) ) Pr(L (F ) µ (F ), σ (D) ) Pr(L (O) µ (O), σ (O) ) Pr(µ (D) γ (D) ) Pr(µ (F ) γ (F ) ) Pr(µ (O) γ (O) ) Pr(σ (D) a) Pr(σ (F ) a) Pr(σ (O) a) Pr(γ (D) b) Pr(γ (F ) b) Pr(γ (O) b). Table compares the number o parameters in each o the three models. Models and 2 have much ewer parameters when F, F, and F 2 are lower than J and O. Table : The number o parameters and hyperparameters in each model Model Name # o Parameters # o Hyperparameters Total Model J O 2 J + J O + 2 J + Model J F + F O 4 F + 2 J F + F O + 4 F + 2 Model 2 J F + F F 2 + F 2 O 2 F + 2 F J F + F F 2 + F 2 O + 2 F + 2 F Parameter Estimation Using Markov Chain Monte Carlo We use Markov Chain Monte Carlo (MCMC) to approximate the entries o B, speciically random walk Metropolis (RWM). The algorithm employs a Gaussian proposal distribution

12 J t (x, x ) which proposes a new parameter set x given the current parameter set x. We denote Θ as the set o all parameters in the model excluding B. Step. set t =. Generate an initial state {B, Θ } with positive probability Pr(B, Θ y). Repeat the ollowing until stationary distribution and the desired number o samples are reached considering optional burn-in and/or thinning. Step 2. Sample {B, Θ } rom the symmetric proposal distribution J t ({B, Θ },. ). Step 3. Calculate the acceptance probability ( α = min, Pr(B, Θ y) Pr(B t, Θ t y) Step 4. Draw a random number u rom Uni(, ). I u α, accept the proposal state {B, Θ } and set B t = B, Θ t = Θ, else set B t = B t, Θ t = Θ t. Set t : t +. Our implementation uses a component-wise sampling approach, that is, when a sample or one dimension is proposed, all other dimensions are held ixed. Ater the current sample get accepted or rejected, we then move on to the next dimension and repeat the process while holding all other dimensions constant. sampling approaches may be necessary. ). For truly large-scale applications, blocked 7 Simulation Study We will show that or data generated rom our model, the true data-generating parameters B can be recovered. We simulated sample trajectories o drug exposure and health outcomes or 6 patients over 6 days. We set the number o drugs to J = 4, and the number o health conditions to O = 4. Each patient randomly took between J drugs over the 6 days. The average exposure period was assumed to be 2% o the study interval or each patient (that is, on average, each patient was exposed to one or more drugs or at least 2 days). The exposure intervals are randomly selected, so these intervals could be multiple non-consecutive days, multiple consecutive days, or a combination o both. Drugs can have positive or negative contributions to the likelihood and intensity rate o each outcome. For each model (Model, Model, Model 2), we generated the elements o B according to the model s hierarchy. Figure 3 and Figures 2-3 (which are given in the Appendix) show the posterior density or each parameter o B, or Models, and 2, as estimated by MCMC 2

13 sampling. These igures show that the posterior samples were concentrated around the true values and the posterior mean o each variable was generally close to its true value. We β, β, β, β, Posterior density β 2, β 2, β 2, β 2, β 3, β 3, β 3, β 3, β 4, β 4, β 4, β 4,4 Figure 3: Normalized histograms o posterior samples or each element o B in Model. The vertical line indicates the true value. summarize Figures 3, 2, and 3 in Figure 4, which provides a scatter plot o the posterior means and true values or each o the three models. It can be observed that each o the posterior means are very close to their true values. 8 Application to Blood Glucose Analysis or Diabetes We consider an application to Insulin-Dependent Diabetes Mellitus, where our goal is to predict blood glucose level outcomes under dierent circumstances o a patient s daily lie, including their recent eating history, exercise, and insulin injections. Our data are longitudinal measurements taken multiple times per day rom 7 patients (this is the AIM-94 data set provided by Michael Kahn, MD, PhD, Washington University, St. Louis, MO, 3

14 2 a) Model b) Model c) Model Posterior mean Posterior mean Posterior mean True values True values True values Figure 4: Scatter plot o posterior means vs. true parameter values o elements o B or Models, and 2. Bache & Lichman 23). We aim mainly to illustrate (i) how the models we introduce can be used with complex longitudinal data to predict outcomes, (ii) the prediction power and (iii) interpretability o the proposed models. It is well known that current therapies or regulating glucose level in diabetics are very challenging and oten rustrating, as they require patients to continuously regulate diet, exercise and various medications any deviations can be dangerous (Benchimol et al. 23). Our dataset and code will be made publicly available or the purpose o reproducibility. Blood glucose measurements, symptoms and insulin treatments were recorded with timestamps or each patient, over the course o several weeks to months. The two main classes o health outcomes considered here are hyperglycemia (high blood glucose) and hypoglycemia (low blood glucose). All other health outcomes we deine later are related to these two classes. Figure 5 provides a schematic o the type o data we are considering or one patient over a course o day. We will describe the setup in more detail: Drugs/Treatments: Diet, exercise, and injected insulin were treated as three dierent classes o treatments. It is obvious that interactions among these treatments are important. Insulin doses are given one or more times a day, typically beore meals and sometimes also at bedtime. Three types o insulin were considered: () regular, (2) Neutral Protamine Hagedorn (NPH), and (3) Ultralente. Each insulin type has its own characteristic time o onset (O), time o peak action (P), and eective duration (D). The exposure time intervals 4

15 NPH injection More-than-usual exercise activity Less-than-usual meal ingestion NPH injection Events Measurements Pre-breakast BG measurement (35 mg/dl) Pre-lunch BG measurement (6 mg/dl) Pre-lunch BG measurement (47 mg/dl) Pre-snack BG measurement (8 mg/dl) 6: 8: : 2: 4: 6: 8: 2: 22: Time Figure 5: Sample longitudinal traces or a patient with multiple drug/treatments exposures and blood Glucose measurements over a day. Downwards arrows indicate only measurements. Upwards arrows indicate treatments. or peak, and duration used in this paper, which were provided with the dataset, are shown in Table 2 in the Exposure Time column in the bottom several rows o the table labeled Insulin. Based on the actual time o injection, we determined the intervals at which the patient is at peak and/or within the duration o an insulin injection. Based on this, six types o treatments were considered, () regular insulin on peak, (2) regular insulin on duration, (3) NPH insulin on peak, (4) NPH insulin on duration, (5) Ultralente insulin on peak, and (6) Ultralente insulin on duration. At each interval o time, the patient can be either insulin ree or subject to one o the above six exposures. The second class o treatment is or exercise, which may have complex eects on glucose level. For example, glucose levels can all during exercise but also quite a ew hours aterwards. Three types o exercise are reported, () normal exercise, (2) lower than normal exercise, and (3) higher than normal exercise. Each type o exercise was considered separately as a single treatment. The third class o treatment is or diet, which also can have complex eects on the glucose level. For example, a larger meal may lead to a longer and possibly higher elevation 5

16 Table 2: The list o drugs/treatments and health outcomes and their known correlations. P means strong positive correlation, and N means strong negative correlation, D is lower decile and D means highest decile. Treatment Exposure Time Health Outcome Low Glucose Level High Glucose Level. Too low 2. Low 3. D 4. Hypo Symptom 5. Too High 6. High 7. D Exercise Diet Level Time. Normal -4 h 2. Too High -4 h P P P P N N N 3. Low -4 h 4. Normal -4 h 5. Too High -4 h N N N N N N N 6. Low -4 h 7. Ater Meal -4 h N N N N P P P 8. Beore Meal -4 h P P P P N N N Insulin Regular NPH Ultralente 9. Peak -3 h P P P P N N N. Duration -5 h P P P P N N N. Peak 4-6 h P P P P N N N 2. Duration -2 h P P P P N N N 3. Peak 4-24 h P P P P N N N 4. Duration -27 h P P P P N N N o blood glucose. Missing a meal may put the patient at risk or low glucose levels in the hours that ollow. Three types o diet are reported: () normal diet, (2) higher than normal diet, and (3) lower than normal diet. Each o these types o diet were taken as a single treatment. Since measurements were collected beore a meal, ater a meal, and at other times, we considered two other eatures in the model, () beore meal measurement o blood glucose and (2) ater meal measurement, and we treated them as binary eatures. These extra eatures allow us to distinguish whether the measurement was made beore or ater the meal (there is a big dierence between glucose measurements taken beore a meal and ater a meal). Based on all o the treatments described, the total number o variables associated with treatments in the model is 4 (J = 4). The variables are all listed on Table 2 in the Treatment column on the let. Health Outcomes. The outcomes are divided into two major categories, i) high glucose 6

17 level and ii) low glucose level. Given that normal pre-meal blood glucose ranges rom approximately 8-2 mg, and post-meal blood glucose ranges rom 8-4 mg/dl (Bache & Lichman 23), we considered seven health outcomes or glucose level: extremely low (below 4 mg/dl), low (between 4-8 mg/dl), high (over 4 mg/dl), extremely high (over 8 mg/dl), lower decile (lower % o glucose level or each patient), upper decile (upper % o glucose level or each patient), and hypoglycemic (low glucose) symptoms. Thus, the total number o outcomes considered or our analysis is O = 7. We can perorm an evaluation only on intervals where we have glucose measurements, thus we only use those intervals. True Relationships Between Drugs and Outcomes. We wanted to determine whether our model reproduces known relationships between treatments and glucose levels, even though it was not explicitly designed to do so. The inormation about true relationships within Table 2 come rom material accompanying the dataset and We denoted known positive eects in Table 2 by P, strong negative eects by N, and relationships that were unknown were denoted by dashes. For example, we expect NPH injection on peak to decrease the likelihood o having Too High glucose level, so the correlation between NPH on peak and Too High glucose level is negative (N ). Mixing. We perormed cross-validation, dividing our data into ive olds, each with the same number o patients, training our models on our olds and testing on the ith. We removed the irst 5 iterations (as burn-in) o Metropolis-Hastings sampling, and obtained 6 additional samples to estimate the posterior. Figure 6 shows samples rom the posterior o one o the variables, β Too high GL NPHonpeak, or ive separate model instantiations (Model, Model with F =2, Model with F =3, Model 2 with F = 2, F 2 = 2, and Model 2 with F = 3, F 2 = 3. Recall that in Models and 2, we sample elements o the matrices o latent actors and then calculate B. From this igure, we observe reasonable mixing or all models, and we observe that models with latent actors (Models and 2) have better mixing and convergence, possibly due to the smaller number o variables. Computation. The number o parameters diers substantially between models, which aects CPU time o the MCMC sampler. In Figure 7, the number o parameters or each model and the associated CPU time or running MCMC are shown. This igure shows a 7

18 Model Model F=2 Model F=3 Model 2 F =2, F 2 =2 Model 2 F =3, F 2 = Iteration No Too high GL Figure 6: Samples rom the posterior o βnph on peak or ive model instantiations (let), and their histograms (right). The irst 5 samples were discarded as burn-in. We observe better mixing and convergence in the models with latent actors (Model and Model 2). clear correlation between the number o variables and CPU time, that is CPU time increases with the number o parameters. In particular, Model takes a long time to run, because it has substantially more variables than the other models. Interestingly, it seems that using latent actors has a purpose beyond interpretability and regularization, in particular, it helps substantially with tractability. Interpretation o coeicients in B and comparison with ground truth. We compare the estimated coeicients in B to the ground truth signs o coeicients given in Table 2. It is not necessarily the case that the signs o the estimated coeicients need to agree with the ground truth signs in order or the model to perorm well, but it is a reasonable aspect o the model to consider. To perorm this comparison, we ranked the estimated coeicients in B and used these rankings and the true signs o coeicients to generate an ROC curve. That is, the ROC curve was generated by placing thresholds at each point in the ranking, and calculating the true positive rate and alse positive rate with respect to the true coeicients. We computed these ROC curves or all ive model 8

19 6 5.5 Model CPU time (hours) Model 2 F =2, F 2 =2 Model F=3 Model 2 F =3, F 2 = Model F= Number o parameters Figure 7: Comparison between the CPU time and number o variables in the ive model instantiations. Model has larger number o variables and signiicantly higher CPU time. instantiations to obtain Figure 8. These ROC curves indicate that all models perormed well, in the sense o estimating reasonable signs or the coeicients, and also indicates that models with more latent actors perormed better than Model. The models with two latent treatment and outcome actors perormed slightly better than the models with three latent treatment and outcome latent actors, though there was no signiicant dierence in perormance between Model and Model 2 or the same number o latent treatment and outcome actors. Drug Surveillance. We evaluated prediction perormance o our model as ollows: or each patient we calculated the Poisson rate o each condition at each hour considering all drug exposures. We then checked whether or not the patient had that condition at that time. The Poisson rate acts as the score o each patient with regards to each condition. In Figure 9, we present the actual glucose level or a patient at 2 measurement points (upper igure) as well as the estimated intensity rate o the Too-High glucose level or the same patient (lower igure). Each point in this igure represents the estimated x ˆβ id o, where ˆβ o are the estimated regression coeicients associated with too-high glucose level and x id is the known vector o drug exposures at interval d. The igure shows that the estimated intensity rate is reasonably close to the actual level o glucose, particularly when the glucose level is actually too high. In Figure, we repeated the analysis or Too-Low glucose level. It can 9

20 .9 Model Model, F=2 Model, F=3 Model 2, F =2, F 2 =2 Model 2, F =3, F 2 =3 AUC=8% AUC=85% AUC=82% AUC=85% AUC=83% Sensitivity Sensitivity Sensitivity Sensitivity Sensitivity Speciicity.5 Speciicity.5 Speciicity.5 Speciicity.5 Speciicity Figure 8: Receiver Operating Characteristic (curve) or evaluating the signs o coeicients against true signs or ive model instantiations. See the text or details o how these curves were generated. be observed rom this igure that the times where this coeicient is particularly large are the same times where the glucose level drops substantially. This kind o dramatic agreement was not observed or all patients nor all conditions, so below we describe a more general evaluation procedure. In Figure, we show the box plots or the estimated Poisson rates o the two conditions o Too Low glucose level and Too High glucose level on all seventy patients in the test sets. For comparison, we also plotted the box plots o the estimated Poisson rates or normal conditions, where patients did not have too-high or too-low glucose levels. For clarity and airness, we normalized the estimated Poisson rates or each patient. It can be observed rom this igure that the estimated Poisson rate o these conditions are elevated when patients actually suer rom Too-Low (or Too-High, respectively) glucose. Thus, our model could be a useul approach or monitoring the likelihood o a condition, given the timing o the drugs recently taken by the patient. The results were consistent across all ive model instantiations. In Table 3, we report an AUC (area under the ROC curve) value that measures the probability that a method ranks a positive condition timepoint higher than a timepoint with no condition, or the same patient. In particular, we are testing whether the Poisson rate o a patient with a health condition is higher than the estimated rate o the same 2

21 Glucose Level Estiamted Intensity or Too High Glucose level Patient # Measurement # Measurement # Figure 9: Monitoring Too-High glucose levels over 2 hours. The upper igure shows the true glucose levels obtained by measurement, and the lower igure shows the estimated x id ˆβ o or the Too-High glucose outcome over 2 hours. patient when no condition is present. For each model instantiation, we trained the model on our training olds and then calculated the AUC on the ith old, and we repeated this or each condition. We reported the average and standard deviation o AUC over all patients in the test sets. Again D is the lower decile (lower % o glucose level or each patient), and D is the upper decile (upper % o glucose level or each patient). Intuition o B as compared with other methods We compared the perormance o the proposed models with that o the univariate sel-controlled case series (SCCS), and multi-variate sel-controlled case series (MSCCS) (Simpson et al. 23) with and without an l 2 regularization term on the coeicients. For each model instantiation and each method, the estimated entries o B were compared to their known eects (positive or negative) rom Table 2. For each method, we provide the area under the curve (AUC) or this comparison in Table 4. We also reported the mean, median, and standard deviation o all estimated coeicients in the group o Table 2 s positive group and Table 2 s negative group. Better models should have higher AUC, and the estimated coeicients o B should agree in sign with those in Table 2. From Table 4, we observe that as expected, the MSCCS perormed better than the SCCS, the regularized MSCCS worked slightly better than the normal MSCCS, and all proposed 2

22 Glucose Level Estimated Intensity Too low Glucose Level Patient # Measurement # Measurement # Figure : Monitoring too-low glucose levels over 2 hours. The upper igure shows the true glucose levels obtained by measurement, and the lower igure shows the estimated x id ˆβ o or the too-low glucose outcome over 2 hours. Bayesian model instantiations (Model -2) perormed better than all o the traditional models, yielding higher AUC s and better agreement in the mean signs o coeicients; urther the standard deviations or the coeicient values were substantially lower. These perormance beneits come in addition to the other beneits o our approach discussed earlier, including computational tractability and interpretability o the latent actors. 9 Concluding Remarks The novel elements o this work are as ollows. () We estimate the eects o many drugs on many health outcomes simultaneously. Borrowing strength across similar drugs and outcomes allows us to create better estimates across both drugs and outcomes. (2) We use latent actors to encode latent classes o drugs and outcomes, to help with interpretability, and to provide a major computational beneit. A computational beneit is also provided by the SCCS s ramework, since we do not need to estimate the baseline rates o outcomes or each patient. This approach is scalable to large longitudinal observational databases, is generally applicable (also to problems beyond healthcare), and provides a level o interpretability to physicians and patients that was not previously possible. 22

23 Model Model, F=2 Model, F=3 Model 2, F = F 2 = 2 Model 2, F = F 2 = 3 Normalized Rate Normal Too Low Normal Too Low Normal Too Low Normal Too Low Normal Too Low Model Model, F=2 Model, F=3 Model 2, F = F 2 = 2 Model 2, F = F 2 = 3 Normalized Rate Normal Too High Normal Too High Normal Too High Normal Too High Normal Too High Figure : Comparison between the box-plots or Too-High and Too-Low glucose level and normal condition or all ive model instantiations, where the Poisson rates were normalized or each person. Acknowledgement We grateully acknowledge unding rom the Natural Science and Engineering Research Council o Canada (NSERC) and the MIT Big Data Initiative. Reerences Bache, K. & Lichman, M. (23), UCI machine learning repository, University o Caliornia, Irvine, School o Inormation and Computer Sciences, Benchimol, E., Hawken, S., Kwong, J. & Wilson, K. (23), Saety and utilization o inluenza immunization in children with inlammatory bowel disease, Pediatrics 3(6), Chui, C., Man, K., Cheng, C.-L., Chan, E., Lau, W., Cheng, V., Wong, D., Yang Kao, Y.-H. & Wong, I. (24), An investigation o the potential association between retinal 23

24 Table 3: Average and standard deviation (sd) o AUC over 5 olds. The entities that were ranked or each score are pairs o timepoints or a patient, where or one o the timepoints the patient had a condition and or the other timepoint the patient did not have a condition. The AUC was calculated as the raction o pairs o times (the irst time with the condition and the second time without the condition) where the estimated rate was higher when the patient had the condition vs. when the patient did not have the condition. Too low Low Too high High D D Hypo Symptom Model Model, F = 2 Model, F = 3 Model 2, F = F 2 = 2 Model 2, F = 3, F 2 = 3 Mean 7.3% 62.92% 62.98% 6.26% 58.2% 56.39% 59.7% sd 9.56% 2.79% 3.38% 2.7% 3.93% 3.5% 4.95% Mean 7.86% 62.7% 63.6% 6.58% 57.97% 56.5% 56.7% sd 8.28% 2.47% 3.4% 2.% 3.36% 3.4% 3.9% Mean 7.66% 62.68% 62.92% 6.5% 57.97% 56.35% 56.33% sd 7.89% 2.52% 3.4% 2.2% 3.37% 2.9% 2.87% Mean 7.9% 62.7% 62.97% 6.44% 58.% 55.9% 56.88% sd 9.5% 2.43% 3.8% 2.3% 3.34% 3.6% 3.2% Mean 7.7% 62.73% 62.99% 6.47% 58.5% 56.5% 56.68% sd 9.52% 2.44% 3.6% 2.7% 3.3% 3.5% 2.79% detachment and oral luoroquinolones: A sel-controlled case series study, Journal o Antimicrobial Chemotherapy 69(9), Cobanoglu, M., Liu, C., Hu, F., Oltvai, Z. & Bahar, I. (23), Predicting drug-target interactions using probabilistic matrix actorization, Journal o Chemical Inormation and Modeling 53(2), Farrington, P. (995), Relative incidence estimation rom case series or vaccine saety evaluation, Biometrics 5, Ghosh, J. & Dunson, D. (29), Deault prior distributions and eicient posterior computation in Bayesian actor analysis, Journal o Computational and Graphical Statistics 8(2), Greene, S., Kulldor, M., Yin, R., Yih, W., Lieu, T., Weintraub, E. & Lee, G. (2), Near 24

25 Table 4: Comparison with existing models Measure Model Mode F = 2 Model F = 3 Method Model 2 Model 2 F = 2, F 2 = 2 F = 3, F 2 = 3 SCCS MSCCS Regularized MSCCS AUC B Mean Median sd B + Mean Median sd real-time vaccine saety surveillance with partially accrued data, Pharmacoepidemiology and Drug Saety 2(6), Madigan, D. (29), The sel-controlled case series: Recent developments, 94(6), Simpson, S., Madigan, D., Zorych, I., Schuemie, M., Ryan, P. & Suchard, M. (23), Multiple sel-controlled case series or large-scale longitudinal observational databases, Biometrics 69(4), Whitaker, H. J., Farrington, C. P., Spiessens, B. & Musonda, P. (26), Tutorial in biostatistics: the sel-controlled case series method, Statistics in Medicine 25(), Zitnik, M. & Zupan, B. (25), Matrix actorization-based data usion or drug-induced liver injury prediction, Systems Biomedicine pp

26 Appendix: Figures 2 and β, β, β, β, Posterior density β 2, β 2, β 2, β 2, β 3,.4..2 β 3, β 3, β 3, β 4, β 4, β 4, β 4,4 Figure 2: Normalized histograms o posterior samples or each element o B in Model. The vertical line indicates the true value. 26

27 β, β, β, β,4.4 Posterior density β 2, β 2, β 2, β 2, β 3, β 3, β 3, β 3, β 4, β 4, β 4, β 4,4 Figure 3: Normalized histograms o posterior samples or each element o B in Model 2. The vertical line indicates the true value. 27

The Factorized Self-Controlled Case Series Method: An Approach for Estimating the Effects of Many Drugs on Many Outcomes

The Factorized Self-Controlled Case Series Method: An Approach for Estimating the Effects of Many Drugs on Many Outcomes Journal o Machine Learning Research 7 (26) -24 Submitted 8/5; Revised 2/5; Published 6/6 The Factorized Sel-Controlled Case Series Method: An Approach or Estimating the Eects o Many Drugs on Many Outcomes

More information

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison Group-Level Diagnosis 1 N.B. Please do not cite or distribute. Multilevel IRT for group-level diagnosis Chanho Park Daniel M. Bolt University of Wisconsin-Madison Paper presented at the annual meeting

More information

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Dipak K. Dey, Ulysses Diva and Sudipto Banerjee Department of Statistics University of Connecticut, Storrs. March 16,

More information

A Brief Introduction to Bayesian Statistics

A Brief Introduction to Bayesian Statistics A Brief Introduction to Statistics David Kaplan Department of Educational Psychology Methods for Social Policy Research and, Washington, DC 2017 1 / 37 The Reverend Thomas Bayes, 1701 1761 2 / 37 Pierre-Simon

More information

Outlier Analysis. Lijun Zhang

Outlier Analysis. Lijun Zhang Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based

More information

Mechanism of inhibition defines CETP activity: a mathematical model for CETP in vitro

Mechanism of inhibition defines CETP activity: a mathematical model for CETP in vitro Mechanism o inhibition deines activity: a mathematical model or in vitro Laura K. Potter, 1,2,* Dennis L. Sprecher, Max C. Walker, and Frank L. Tobin Scientiic Computing and Mathematical Modeling,* GlaxoSmithKline,

More information

Ordinal Data Modeling

Ordinal Data Modeling Valen E. Johnson James H. Albert Ordinal Data Modeling With 73 illustrations I ". Springer Contents Preface v 1 Review of Classical and Bayesian Inference 1 1.1 Learning about a binomial proportion 1 1.1.1

More information

Individual Differences in Attention During Category Learning

Individual Differences in Attention During Category Learning Individual Differences in Attention During Category Learning Michael D. Lee (mdlee@uci.edu) Department of Cognitive Sciences, 35 Social Sciences Plaza A University of California, Irvine, CA 92697-5 USA

More information

Introduction to Bayesian Analysis 1

Introduction to Bayesian Analysis 1 Biostats VHM 801/802 Courses Fall 2005, Atlantic Veterinary College, PEI Henrik Stryhn Introduction to Bayesian Analysis 1 Little known outside the statistical science, there exist two different approaches

More information

Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data

Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data Michael Anderson, PhD Hélène Carabin, DVM, PhD Department of Biostatistics and Epidemiology The University

More information

Bayesian and Frequentist Approaches

Bayesian and Frequentist Approaches Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law

More information

Kelvin Chan Feb 10, 2015

Kelvin Chan Feb 10, 2015 Underestimation of Variance of Predicted Mean Health Utilities Derived from Multi- Attribute Utility Instruments: The Use of Multiple Imputation as a Potential Solution. Kelvin Chan Feb 10, 2015 Outline

More information

SUPPLEMENTARY MATERIAL. Impact of Vaccination on 14 High-Risk HPV type infections: A Mathematical Modelling Approach

SUPPLEMENTARY MATERIAL. Impact of Vaccination on 14 High-Risk HPV type infections: A Mathematical Modelling Approach SUPPLEMENTARY MATERIAL Impact of Vaccination on 14 High-Risk HPV type infections: A Mathematical Modelling Approach Simopekka Vänskä, Kari Auranen, Tuija Leino, Heini Salo, Pekka Nieminen, Terhi Kilpi,

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Journal of Social and Development Sciences Vol. 4, No. 4, pp. 93-97, Apr 203 (ISSN 222-52) Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Henry De-Graft Acquah University

More information

Mediation Analysis With Principal Stratification

Mediation Analysis With Principal Stratification University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 3-30-009 Mediation Analysis With Principal Stratification Robert Gallop Dylan S. Small University of Pennsylvania

More information

Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges

Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges Research articles Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges N G Becker (Niels.Becker@anu.edu.au) 1, D Wang 1, M Clements 1 1. National

More information

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology Sylvia Richardson 1 sylvia.richardson@imperial.co.uk Joint work with: Alexina Mason 1, Lawrence

More information

Risk-prediction modelling in cancer with multiple genomic data sets: a Bayesian variable selection approach

Risk-prediction modelling in cancer with multiple genomic data sets: a Bayesian variable selection approach Risk-prediction modelling in cancer with multiple genomic data sets: a Bayesian variable selection approach Manuela Zucknick Division of Biostatistics, German Cancer Research Center Biometry Workshop,

More information

Bayesian Bi-Cluster Change-Point Model for Exploring Functional Brain Dynamics

Bayesian Bi-Cluster Change-Point Model for Exploring Functional Brain Dynamics Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'18 85 Bayesian Bi-Cluster Change-Point Model for Exploring Functional Brain Dynamics Bing Liu 1*, Xuan Guo 2, and Jing Zhang 1** 1 Department

More information

Methods Research Report. An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy

Methods Research Report. An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy Methods Research Report An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy Methods Research Report An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy

More information

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Timothy N. Rubin (trubin@uci.edu) Michael D. Lee (mdlee@uci.edu) Charles F. Chubb (cchubb@uci.edu) Department of Cognitive

More information

Practical Bayesian Design and Analysis for Drug and Device Clinical Trials

Practical Bayesian Design and Analysis for Drug and Device Clinical Trials Practical Bayesian Design and Analysis for Drug and Device Clinical Trials p. 1/2 Practical Bayesian Design and Analysis for Drug and Device Clinical Trials Brian P. Hobbs Plan B Advisor: Bradley P. Carlin

More information

Tutorial #7A: Latent Class Growth Model (# seizures)

Tutorial #7A: Latent Class Growth Model (# seizures) Tutorial #7A: Latent Class Growth Model (# seizures) 2.50 Class 3: Unstable (N = 6) Cluster modal 1 2 3 Mean proportional change from baseline 2.00 1.50 1.00 Class 1: No change (N = 36) 0.50 Class 2: Improved

More information

Bayesian Joint Modelling of Benefit and Risk in Drug Development

Bayesian Joint Modelling of Benefit and Risk in Drug Development Bayesian Joint Modelling of Benefit and Risk in Drug Development EFSPI/PSDM Safety Statistics Meeting Leiden 2017 Disclosure is an employee and shareholder of GSK Data presented is based on human research

More information

Exploring the Influence of Particle Filter Parameters on Order Effects in Causal Learning

Exploring the Influence of Particle Filter Parameters on Order Effects in Causal Learning Exploring the Influence of Particle Filter Parameters on Order Effects in Causal Learning Joshua T. Abbott (joshua.abbott@berkeley.edu) Thomas L. Griffiths (tom griffiths@berkeley.edu) Department of Psychology,

More information

Primary Health-Care Physicians Attitude toward Depression in Primary Health-Care Centers Saudi Arabia - Al-Ahsa, 2018

Primary Health-Care Physicians Attitude toward Depression in Primary Health-Care Centers Saudi Arabia - Al-Ahsa, 2018 Original Article Print ISSN: 2321-6379 Online ISSN: 2321-595X DOI: 10.17354/ijss/2018/296 Primary Health-Care Physicians Attitude toward Depression in Primary Health-Care Centers Saudi Arabia - Al-Ahsa,

More information

OBSERVATIONAL MEDICAL OUTCOMES PARTNERSHIP

OBSERVATIONAL MEDICAL OUTCOMES PARTNERSHIP OBSERVATIONAL Patient-centered observational analytics: New directions toward studying the effects of medical products Patrick Ryan on behalf of OMOP Research Team May 22, 2012 Observational Medical Outcomes

More information

bivariate analysis: The statistical analysis of the relationship between two variables.

bivariate analysis: The statistical analysis of the relationship between two variables. bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for

More information

arxiv: v2 [stat.ap] 7 Dec 2016

arxiv: v2 [stat.ap] 7 Dec 2016 A Bayesian Approach to Predicting Disengaged Youth arxiv:62.52v2 [stat.ap] 7 Dec 26 David Kohn New South Wales 26 david.kohn@sydney.edu.au Nick Glozier Brain Mind Centre New South Wales 26 Sally Cripps

More information

Machine Learning Statistical Learning. Prof. Matteo Matteucci

Machine Learning Statistical Learning. Prof. Matteo Matteucci Machine Learning Statistical Learning Pro. Matteo Matteucci Statistical Learning Outline o What Is Statistical Learning? Why estimate? How do we estimate? The trade-o between prediction accuracy & model

More information

FREQUENCY COMPRESSION AND FREQUENCY SHIFTING FOR THE HEARING IMPAIRED

FREQUENCY COMPRESSION AND FREQUENCY SHIFTING FOR THE HEARING IMPAIRED FREQUENCY COMPRESSION AND FREQUENCY SHIFTING FOR THE HEARING IMPAIRED Francisco J. Fraga 1, Alan M. Marotta 2 1 National Institute o Telecommunications, Santa Rita do Sapucaí - MG, Brazil 2 National Institute

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Report Reference Guide. THERAPY MANAGEMENT SOFTWARE FOR DIABETES CareLink Report Reference Guide 1

Report Reference Guide. THERAPY MANAGEMENT SOFTWARE FOR DIABETES CareLink Report Reference Guide 1 Report Reference Guide THERAPY MANAGEMENT SOFTWARE FOR DIABETES CareLink Report Reference Guide 1 How to use this guide Each type of CareLink report and its components are described in the following sections.

More information

How do we combine two treatment arm trials with multiple arms trials in IPD metaanalysis? An Illustration with College Drinking Interventions

How do we combine two treatment arm trials with multiple arms trials in IPD metaanalysis? An Illustration with College Drinking Interventions 1/29 How do we combine two treatment arm trials with multiple arms trials in IPD metaanalysis? An Illustration with College Drinking Interventions David Huh, PhD 1, Eun-Young Mun, PhD 2, & David C. Atkins,

More information

Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference

Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference Michael D. Karcher Department of Statistics University of Washington, Seattle April 2015 joint work (and slide construction)

More information

A Bayesian Measurement Model of Political Support for Endorsement Experiments, with Application to the Militant Groups in Pakistan

A Bayesian Measurement Model of Political Support for Endorsement Experiments, with Application to the Militant Groups in Pakistan A Bayesian Measurement Model of Political Support for Endorsement Experiments, with Application to the Militant Groups in Pakistan Kosuke Imai Princeton University Joint work with Will Bullock and Jacob

More information

Bayesian Estimation of a Meta-analysis model using Gibbs sampler

Bayesian Estimation of a Meta-analysis model using Gibbs sampler University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2012 Bayesian Estimation of

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Joint Spatio-Temporal Modeling of Low Incidence Cancers Sharing Common Risk Factors

Joint Spatio-Temporal Modeling of Low Incidence Cancers Sharing Common Risk Factors Journal of Data Science 6(2008), 105-123 Joint Spatio-Temporal Modeling of Low Incidence Cancers Sharing Common Risk Factors Jacob J. Oleson 1,BrianJ.Smith 1 and Hoon Kim 2 1 The University of Iowa and

More information

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY Lingqi Tang 1, Thomas R. Belin 2, and Juwon Song 2 1 Center for Health Services Research,

More information

Bayesian Benefit-Risk Assessment. Maria Costa GSK R&D

Bayesian Benefit-Risk Assessment. Maria Costa GSK R&D Assessment GSK R&D Disclosure is an employee and shareholder of GSK Data presented is based on human research studies funded and sponsored by GSK 2 Outline 1. Motivation 2. GSK s Approach to Benefit-Risk

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Number XX An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Prepared for: Agency for Healthcare Research and Quality U.S. Department of Health and Human Services 54 Gaither

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

Bayesians methods in system identification: equivalences, differences, and misunderstandings

Bayesians methods in system identification: equivalences, differences, and misunderstandings Bayesians methods in system identification: equivalences, differences, and misunderstandings Johan Schoukens and Carl Edward Rasmussen ERNSI 217 Workshop on System Identification Lyon, September 24-27,

More information

Outline. Hierarchical Hidden Markov Models for HIV-Transmission Behavior Outcomes. Motivation. Why Hidden Markov Model? Why Hidden Markov Model?

Outline. Hierarchical Hidden Markov Models for HIV-Transmission Behavior Outcomes. Motivation. Why Hidden Markov Model? Why Hidden Markov Model? Hierarchical Hidden Markov Models for HIV-Transmission Behavior Outcomes Li-Jung Liang Department of Medicine Statistics Core Email: liangl@ucla.edu Joint work with Rob Weiss & Scott Comulada Outline Motivation

More information

HIDDEN MARKOV MODELS FOR ALCOHOLISM TREATMENT TRIAL DATA

HIDDEN MARKOV MODELS FOR ALCOHOLISM TREATMENT TRIAL DATA HIDDEN MARKOV MODELS FOR ALCOHOLISM TREATMENT TRIAL DATA By Kenneth Shirley, Dylan Small, Kevin Lynch, Stephen Maisto, and David Oslin Columbia University, University of Pennsylvania and Syracuse University

More information

Bayesian methods in health economics

Bayesian methods in health economics Bayesian methods in health economics Gianluca Baio University College London Department of Statistical Science g.baio@ucl.ac.uk Seminar Series of the Master in Advanced Artificial Intelligence Madrid,

More information

Beyond bumps: Spiking networks that store sets of functions

Beyond bumps: Spiking networks that store sets of functions Neurocomputing 38}40 (2001) 581}586 Beyond bumps: Spiking networks that store sets of functions Chris Eliasmith*, Charles H. Anderson Department of Philosophy, University of Waterloo, Waterloo, Ont, N2L

More information

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis Thesis Proposal Indrayana Rustandi April 3, 2007 Outline Motivation and Thesis Preliminary results: Hierarchical

More information

A Predictive Chronological Model of Multiple Clinical Observations T R A V I S G O O D W I N A N D S A N D A M. H A R A B A G I U

A Predictive Chronological Model of Multiple Clinical Observations T R A V I S G O O D W I N A N D S A N D A M. H A R A B A G I U A Predictive Chronological Model of Multiple Clinical Observations T R A V I S G O O D W I N A N D S A N D A M. H A R A B A G I U T H E U N I V E R S I T Y O F T E X A S A T D A L L A S H U M A N L A N

More information

Classification of Insulin Dependent Diabetes Mellitus Blood Glucose Level Using Support Vector Machine

Classification of Insulin Dependent Diabetes Mellitus Blood Glucose Level Using Support Vector Machine IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, PP 36-42 www.iosrjournals.org Classification of Insulin Dependent Diabetes Mellitus Blood Glucose Level Using Support

More information

Hierarchical Bayesian Methods for Estimation of Parameters in a Longitudinal HIV Dynamic System

Hierarchical Bayesian Methods for Estimation of Parameters in a Longitudinal HIV Dynamic System Hierarchical Bayesian Methods for Estimation of Parameters in a Longitudinal HIV Dynamic System Yangxin Huang, Dacheng Liu and Hulin Wu Department of Biostatistics & Computational Biology University of

More information

Combining Risks from Several Tumors Using Markov Chain Monte Carlo

Combining Risks from Several Tumors Using Markov Chain Monte Carlo University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln U.S. Environmental Protection Agency Papers U.S. Environmental Protection Agency 2009 Combining Risks from Several Tumors

More information

PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science. Homework 5

PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science. Homework 5 PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science Homework 5 Due: 21 Dec 2016 (late homeworks penalized 10% per day) See the course web site for submission details.

More information

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp The Stata Journal (22) 2, Number 3, pp. 28 289 Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve

More information

A Bayesian Nonparametric Model Fit statistic of Item Response Models

A Bayesian Nonparametric Model Fit statistic of Item Response Models A Bayesian Nonparametric Model Fit statistic of Item Response Models Purpose As more and more states move to use the computer adaptive test for their assessments, item response theory (IRT) has been widely

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

A Bayesian Perspective on Unmeasured Confounding in Large Administrative Databases

A Bayesian Perspective on Unmeasured Confounding in Large Administrative Databases A Bayesian Perspective on Unmeasured Confounding in Large Administrative Databases Lawrence McCandless lmccandl@sfu.ca Faculty of Health Sciences, Simon Fraser University, Vancouver Canada Summer 2014

More information

Bayesian Hierarchical Models for Fitting Dose-Response Relationships

Bayesian Hierarchical Models for Fitting Dose-Response Relationships Bayesian Hierarchical Models for Fitting Dose-Response Relationships Ketra A. Schmitt Battelle Memorial Institute Mitchell J. Small and Kan Shao Carnegie Mellon University Dose Response Estimates using

More information

A Comparison of Methods for Determining HIV Viral Set Point

A Comparison of Methods for Determining HIV Viral Set Point STATISTICS IN MEDICINE Statist. Med. 2006; 00:1 6 [Version: 2002/09/18 v1.11] A Comparison of Methods for Determining HIV Viral Set Point Y. Mei 1, L. Wang 2, S. E. Holte 2 1 School of Industrial and Systems

More information

Modeling Function of Nectar Foraging of Honeybees using Operant Conditioning

Modeling Function of Nectar Foraging of Honeybees using Operant Conditioning International Journal o Computer Applications (975 8887) Volume 4 No.7, October 24 Modeling Function o Nectar Foraging o Honeybees using Operant Conditioning Subha Fernando Faculty o Inormation Technology

More information

Bayesian Nonparametric Methods for Precision Medicine

Bayesian Nonparametric Methods for Precision Medicine Bayesian Nonparametric Methods for Precision Medicine Brian Reich, NC State Collaborators: Qian Guan (NCSU), Eric Laber (NCSU) and Dipankar Bandyopadhyay (VCU) University of Illinois at Urbana-Champaign

More information

Prediction of Successful Memory Encoding from fmri Data

Prediction of Successful Memory Encoding from fmri Data Prediction of Successful Memory Encoding from fmri Data S.K. Balci 1, M.R. Sabuncu 1, J. Yoo 2, S.S. Ghosh 3, S. Whitfield-Gabrieli 2, J.D.E. Gabrieli 2 and P. Golland 1 1 CSAIL, MIT, Cambridge, MA, USA

More information

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

A Comparison of Collaborative Filtering Methods for Medication Reconciliation A Comparison of Collaborative Filtering Methods for Medication Reconciliation Huanian Zheng, Rema Padman, Daniel B. Neill The H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, 15213,

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training. Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze

More information

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012 Practical Bayesian Optimization of Machine Learning Algorithms Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012 ... (Gaussian Processes) are inadequate for doing speech and vision. I still think they're

More information

Stroke-free duration and stroke risk in patients with atrial fibrillation: simulation using a Bayesian inference

Stroke-free duration and stroke risk in patients with atrial fibrillation: simulation using a Bayesian inference Asian Biomedicine Vol. 3 No. 4 August 2009; 445-450 Brief Communication (Original) Stroke-free duration and stroke risk in patients with atrial fibrillation: simulation using a Bayesian inference Tomoki

More information

Bayesian Latent Subgroup Design for Basket Trials

Bayesian Latent Subgroup Design for Basket Trials Bayesian Latent Subgroup Design for Basket Trials Yiyi Chu Department of Biostatistics The University of Texas School of Public Health July 30, 2017 Outline Introduction Bayesian latent subgroup (BLAST)

More information

TRIPODS Workshop: Models & Machine Learning for Causal I. & Decision Making

TRIPODS Workshop: Models & Machine Learning for Causal I. & Decision Making TRIPODS Workshop: Models & Machine Learning for Causal Inference & Decision Making in Medical Decision Making : and Predictive Accuracy text Stavroula Chrysanthopoulou, PhD Department of Biostatistics

More information

Bayesian growth mixture models to distinguish hemoglobin value trajectories in blood donors

Bayesian growth mixture models to distinguish hemoglobin value trajectories in blood donors Bayesian growth mixture models to distinguish hemoglobin value trajectories in blood donors Kazem Nasserinejad 1 Joost van Rosmalen 1 Mireille Baart 2 Katja van den Hurk 2 Dimitris Rizopoulos 1 Emmanuel

More information

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) AUTHORS: Tejas Prahlad INTRODUCTION Acute Respiratory Distress Syndrome (ARDS) is a condition

More information

Motivation Empirical models Data and methodology Results Discussion. University of York. University of York

Motivation Empirical models Data and methodology Results Discussion. University of York. University of York Healthcare Cost Regressions: Going Beyond the Mean to Estimate the Full Distribution A. M. Jones 1 J. Lomas 2 N. Rice 1,2 1 Department of Economics and Related Studies University of York 2 Centre for Health

More information

Study And Development Of Digital Image Processing Tool For Application Of Diabetic Retinopathy

Study And Development Of Digital Image Processing Tool For Application Of Diabetic Retinopathy Study And Development O Digital Image Processing Tool For Application O Diabetic Retinopathy Name: Ms. Jyoti Devidas Patil mail ID: jyot.physics@gmail.com Outline 1. Aims & Objective 2. Introduction 3.

More information

A MODEL FOR MIXED CONTINUOUS AND DISCRETE RESPONSES WITH POSSIBILITY OF MISSING RESPONSES

A MODEL FOR MIXED CONTINUOUS AND DISCRETE RESPONSES WITH POSSIBILITY OF MISSING RESPONSES Journal o Sciences Islamic epublic o Iran : 5-6 National Center For Scientiic esearch ISSN 6- A MODE FO MIED CONTINUOUS AND DISCETE ESPONSES WITH POSSIBIIT OF MISSING ESPONSES M. Ganjali Department o Statistics

More information

9 research designs likely for PSYC 2100

9 research designs likely for PSYC 2100 9 research designs likely for PSYC 2100 1) 1 factor, 2 levels, 1 group (one group gets both treatment levels) related samples t-test (compare means of 2 levels only) 2) 1 factor, 2 levels, 2 groups (one

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

LINEAR REGRESSION FOR BIVARIATE CENSORED DATA VIA MULTIPLE IMPUTATION

LINEAR REGRESSION FOR BIVARIATE CENSORED DATA VIA MULTIPLE IMPUTATION STATISTICS IN MEDICINE Statist. Med. 18, 3111} 3121 (1999) LINEAR REGRESSION FOR BIVARIATE CENSORED DATA VIA MULTIPLE IMPUTATION WEI PAN * AND CHARLES KOOPERBERG Division of Biostatistics, School of Public

More information

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition List of Figures List of Tables Preface to the Second Edition Preface to the First Edition xv xxv xxix xxxi 1 What Is R? 1 1.1 Introduction to R................................ 1 1.2 Downloading and Installing

More information

Structured models for dengue epidemiology

Structured models for dengue epidemiology Structured models for dengue epidemiology submitted by Hannah Woodall for the degree of Doctor of Philosophy of the University of Bath Department of Mathematical Sciences September 24 COPYRIGHT Attention

More information

Tuning Epidemiological Study Design Methods for Exploratory Data Analysis in Real World Data

Tuning Epidemiological Study Design Methods for Exploratory Data Analysis in Real World Data Tuning Epidemiological Study Design Methods for Exploratory Data Analysis in Real World Data Andrew Bate Senior Director, Epidemiology Group Lead, Analytics 15th Annual Meeting of the International Society

More information

Prediction and Inference under Competing Risks in High Dimension - An EHR Demonstration Project for Prostate Cancer

Prediction and Inference under Competing Risks in High Dimension - An EHR Demonstration Project for Prostate Cancer Prediction and Inference under Competing Risks in High Dimension - An EHR Demonstration Project for Prostate Cancer Ronghui (Lily) Xu Division of Biostatistics and Bioinformatics Department of Family Medicine

More information

Report Reference Guide

Report Reference Guide Report Reference Guide How to use this guide Each type of CareLink report and its components are described in the following sections. Report data used to generate the sample reports was from sample patient

More information

Multiple trait model combining random regressions for daily feed intake with single measured performance traits of growing pigs

Multiple trait model combining random regressions for daily feed intake with single measured performance traits of growing pigs Genet. Sel. Evol. 34 (2002) 61 81 61 INRA, EDP Sciences, 2002 DOI: 10.1051/gse:2001004 Original article Multiple trait model combining random regressions for daily feed intake with single measured performance

More information

Method Comparison for Interrater Reliability of an Image Processing Technique in Epilepsy Subjects

Method Comparison for Interrater Reliability of an Image Processing Technique in Epilepsy Subjects 22nd International Congress on Modelling and Simulation, Hobart, Tasmania, Australia, 3 to 8 December 2017 mssanz.org.au/modsim2017 Method Comparison for Interrater Reliability of an Image Processing Technique

More information

2014 Modern Modeling Methods (M 3 ) Conference, UCONN

2014 Modern Modeling Methods (M 3 ) Conference, UCONN 2014 Modern Modeling (M 3 ) Conference, UCONN Comparative study of two calibration methods for micro-simulation models Department of Biostatistics Center for Statistical Sciences Brown University School

More information

Application of Bayesian Extrapolation in Pediatric Drug Development Program

Application of Bayesian Extrapolation in Pediatric Drug Development Program Application of Bayesian Extrapolation in Pediatric Drug Development Program May Mo, Amy Xia Amgen Regulatory-Industry Statistics Workshop Sep 13, 2018 Washington, DC Disclaimer: The views expressed herein

More information

Using dynamic prediction to inform the optimal intervention time for an abdominal aortic aneurysm screening programme

Using dynamic prediction to inform the optimal intervention time for an abdominal aortic aneurysm screening programme Using dynamic prediction to inform the optimal intervention time for an abdominal aortic aneurysm screening programme Michael Sweeting Cardiovascular Epidemiology Unit, University of Cambridge Friday 15th

More information

Advanced Bayesian Models for the Social Sciences

Advanced Bayesian Models for the Social Sciences Advanced Bayesian Models for the Social Sciences Jeff Harden Department of Political Science, University of Colorado Boulder jeffrey.harden@colorado.edu Daniel Stegmueller Department of Government, University

More information

Meta-analysis of two studies in the presence of heterogeneity with applications in rare diseases

Meta-analysis of two studies in the presence of heterogeneity with applications in rare diseases Meta-analysis of two studies in the presence of heterogeneity with applications in rare diseases Christian Röver 1, Tim Friede 1, Simon Wandel 2 and Beat Neuenschwander 2 1 Department of Medical Statistics,

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

Treatment effect estimates adjusted for small-study effects via a limit meta-analysis

Treatment effect estimates adjusted for small-study effects via a limit meta-analysis Treatment effect estimates adjusted for small-study effects via a limit meta-analysis Gerta Rücker 1, James Carpenter 12, Guido Schwarzer 1 1 Institute of Medical Biometry and Medical Informatics, University

More information

An application of a pattern-mixture model with multiple imputation for the analysis of longitudinal trials with protocol deviations

An application of a pattern-mixture model with multiple imputation for the analysis of longitudinal trials with protocol deviations Iddrisu and Gumedze BMC Medical Research Methodology (2019) 19:10 https://doi.org/10.1186/s12874-018-0639-y RESEARCH ARTICLE Open Access An application of a pattern-mixture model with multiple imputation

More information

A Bayesian Account of Reconstructive Memory

A Bayesian Account of Reconstructive Memory Hemmer, P. & Steyvers, M. (8). A Bayesian Account of Reconstructive Memory. In V. Sloutsky, B. Love, and K. McRae (Eds.) Proceedings of the 3th Annual Conference of the Cognitive Science Society. Mahwah,

More information

Introduction. Patrick Breheny. January 10. The meaning of probability The Bayesian approach Preview of MCMC methods

Introduction. Patrick Breheny. January 10. The meaning of probability The Bayesian approach Preview of MCMC methods Introduction Patrick Breheny January 10 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/25 Introductory example: Jane s twins Suppose you have a friend named Jane who is pregnant with twins

More information

Missing data. Patrick Breheny. April 23. Introduction Missing response data Missing covariate data

Missing data. Patrick Breheny. April 23. Introduction Missing response data Missing covariate data Missing data Patrick Breheny April 3 Patrick Breheny BST 71: Bayesian Modeling in Biostatistics 1/39 Our final topic for the semester is missing data Missing data is very common in practice, and can occur

More information

Author summary. Introduction

Author summary. Introduction A Probabilistic Palimpsest Model of Visual Short-term Memory Loic Matthey 1,, Paul M Bays 2,3, Peter Dayan 1 1 Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom

More information

Application of Multinomial-Dirichlet Conjugate in MCMC Estimation : A Breast Cancer Study

Application of Multinomial-Dirichlet Conjugate in MCMC Estimation : A Breast Cancer Study Int. Journal of Math. Analysis, Vol. 4, 2010, no. 41, 2043-2049 Application of Multinomial-Dirichlet Conjugate in MCMC Estimation : A Breast Cancer Study Geetha Antony Pullen Mary Matha Arts & Science

More information