The Factorized Self-Controlled Case Series Method: An Approach for Estimating the Effects of Many Drugs on Many Outcomes

Size: px

Start display at page:

Download "The Factorized Self-Controlled Case Series Method: An Approach for Estimating the Effects of Many Drugs on Many Outcomes"

Richard Clarke
5 years ago
Views:

1 The Factorized Sel-Controlled Case Series Method: An Approach or Estimating the Eects o Many Drugs on Many Outcomes Ramin Moghaddass, Cynthia Rudin MIT CSAIL and Sloan School o Management, Massachusetts Institute o Technology, Cambridge, USA and David Madigan Department o Statistics, Columbia University, New York, USA March 25, 25 Abstract We provide a hierarchical Bayesian model or estimating the eects o transient drug exposures on a collection o health outcomes, where the eects o all drugs and all outcomes are estimated simultaneously. The method possesses properties that allow it to handle important challenges o dealing with large-scale longitudinal observational databases. In particular, this model is a generalization o the selcontrolled case series (SCCS) method, meaning that certain patient speciic baseline rates never need to be estimated. Further, this model is ormulated with layers o latent actors, which substantially reduces the number o parameters and helps with interpretability by illuminating latent classes o drugs and outcomes. We demonstrate the approach by estimating the eects o various time-sensitive insulin treatments or diabetes. Keywords: Bayesian Analysis, Drug Saety, Sel-controlled Case Series, Matrix Factorization, Causal Inerence

2 Introduction The medical community, the pharmaceutical industry, and health authorities are obligated to conirm that marketed medical products and prescription drugs have acceptable beneitrisk proiles; in act, these entities have come under increasing scientiic, regulatory and public scrutiny to accurately estimate the eects o drugs. The increasing availability o large-scale longitudinal observational healthcare databases (LODs) opens up exciting new opportunities to add to the evidence base concerning these issues, though the complexity and scale o some o the available databases presents interesting statistical and computational challenges. In what ollows we ocus on using longitudinal observational databases to make inerence about the eects o many drugs with respect to many outcomes simultaneously. Many research studies have attempted to characterize the relationship between timevarying drug exposures and adverse events (AEs) related to health outcomes (e.g., in Madigan 29, Greene et al. 2, Chui et al. 24, Benchimol et al. 23, Simpson et al. 23) and the use o LODs to study individual drug-adverse eect combinations has become routine. The medical literature provides many examples and many dierent epidemiological and statistical approaches, oten tailored to the speciic drug and speciic adverse eect. There is a major law in these approaches o estimating the eect o one drug on one outcome, which is that it is very clear that many drugs are closely related to each other (there are dozens o antibiotics or instance), and many heath outcomes are closely related to each other (e.g., strokes, heart attacks, and other vascular diseases); we should leverage this inormation to better understand causal eects. In this work, we borrow strength across both drugs and outcomes in order to obtain better estimates or each individual drug and outcome. Since we are interested in the causal eect o the drug, and not in the patient-speciic baseline rate o the outcome, we use the ideas o the sel-controlled case series (SCCS) method o Farrington (995), which is a conditional Poisson regression approach wherein each patient serves as his or her own control. The SCCS method has been widely applied, especially in vaccine studies (see the tutorial o Whitaker et al. 26). SCCS controls or all ixed patient-level covariates but remains susceptible to time-varying conounding. The standard SCCS method ocuses on one drug and one outcome. Simp- 2

3 son et al. (23) proposed the multiple sel-controlled case series (MSCCS) method that simultaneously provides eect estimates or multiple drugs and a single outcome. In act, the MSCCS provides a sel-controlled approach that can control or many time-varying covariates, drugs being a special case. Bayesian implementations o both SCCS and MSCCS provide signiicant advantages, especially in high-dimensional settings with thousands or even tens o thousands o drugs and outcomes and even larger numbers o interactions. Neither SCCS nor MSCCS account or the act that many drugs/treatments naturally orm classes and thereore regression coeicients or drugs rom within a single class might reasonably be modeled as arising exchangeably rom a common prior distribution. Adverse events and health conditions can also be organized hierarchically, again aording an opportunity to borrow strength across related outcomes. For both drugs and outcomes, the hierarchy could extend to multiple levels. In what ollows we ormalize these ideas within the ramework o latent actor Bayesian hierarchical models. Factor models, which have been traditionally used in behavioral sciences, provide a lexible ramework or modeling multivariate data via unobserved latent actors (e.g., Ghosh & Dunson 29). In this paper, we do not impose speciic latent structure a priori. However, our approach can also be used or cases where classes o drugs and conditions are known a priori. We will show that the latent actor approach not only brings more interpretability to our model, but also can signiicantly contribute to reducing the computational complexity. To our knowledge, only ew works have considered matrix actorization-based data analysis techniques or drug saety and surveillance (or example, Zitnik & Zupan 25, or drug-induced liver injury prediction and Cobanoglu et al. 23, or predicting drug-target interactions in neurobiological disorders, which are both very dierent than our study). We introduce three models or predicting the eects o multiple drugs on multiple outcomes that use hierarchical Bayesian analysis. The irst model (Model ) does not use latent actors, and borrows strength across all drugs and outcomes. The second model (Model ) uses one set o latent drugs and one set o latent outcomes, through a single matrix actorization. The third model (Model 2) uses two sets o latent actors, by actoring the matrix o coeicients into three matrices; one or converting drugs to latent drugs, another or converting outcomes to latent outcomes, and the third or modeling the eects 3

4 o latent drugs on latent outcomes. By allowing or latent actors, the second and third models provide an increased level o interpretability, use ewer variables, and are thus more computationally eicient to estimate. The rest o this paper is organized as ollows: In Sections 2, 3, 4, and 5 we describe the model and the Bayesian inerence procedure. In Section 6 the structure o the MCMC or parameter estimation is illustrated. We then use a series o simulations in Section 7 to show that we can recover the true generating parameters rom data. Finally, we demonstrate the approach in Section 8 or estimating the eects o various insulin treatments or diabetes. Our proposed methodology has broader applicability beyond estimating the eects o drugs considered in this paper. 2 Factorized Sel-Controlled Case Series - Notation and Framework The actorized sel-controlled case series (FSCCS) method generalizes the sel-controlled case series (SCCS) to handle multiple treatments and multiple outcomes. The notation used throughout the paper is as ollows: N: number o patients (i indexes individuals rom to N). x idj : binary indicator relecting whether patient i is exposed to drug j on interval d. x id = [x id, x id2,..., x idj ] : the vector o exposed drugs or patient i on interval d. J: number o drugs (treatments). O: number o health outcomes (adverse events). D o i : the set o observation intervals where patient i has outcome o. τ o i : the number o observation intervals where patient i has outcome o (the size o D o i ). yid o : binary indicator relecting whether patient i has outcome o on interval d. y o i = [y o i, y o i2,..., y o iτ o i ] : the vector o observed outcomes o or patient i. ϕ o i : baseline incidence o outcome o or patient i. 4

5 Φ = ϕ... ϕ O : : : ϕ N... ϕo N : baseline incidence matrix. β o j : regression coeicients associated with outcome o and drug j. β o = [β o, β o 2,..., β o J ] : regression coeicients associated with outcome o. β... β O B = : : : βj... βj O : drug-outcome coeicient matrix. λ id = exp(ϕo i + x id βo ): the Poisson event rate o outcome o, or patient i, on interval d. Outcomes occur according to a nonhomogeneous Poisson process, where drug exposure can modulate the rate over time. Patient i has an individual baseline rate o exp(ϕ o i ) or outcome o that remains constant over time. Drug j has a multiplicative eect o exp(β o j ) on the individual baseline rate exp(ϕ o i ) during its exposure period. The Poisson event rate or outcome o and patient i on interval d according to the SCCS is: The key beneit o the SCCS is that the ϕ o i λ o id = exp(ϕ o i + x idβ o ). terms do not need to be modeled, since we are interested in the ratio o Poisson intensities with and without the drug. For instance, considering only one drug j, comparing the intensity ratio or day d to a dierent day d 2 with no exposure to the drug, we have: λ o id = exp(ϕo i + βj o ) λ o id 2 exp(ϕ o i + βo j ) = exp(βo j ). As the Poisson rate is assumed to be constant in each interval, the number o outcomes o observed or patient i on interval d is distributed as a Poisson random variable (r.v.) denoted by Y o id as Pr(Yid o = yid x o id ) = e λo id λ o yid o id yid o!. Based on the above, the contribution to the likelihood or patient i and outcome o or the observed sequence o events y o i = [y o i, y o i2,..., y o iτ o i ], conditioned on the observed exposures 5

6 x i = [x i,..., x iτ o i ] is L i = Pr(yi o x i ) = Pr(yid x o id ) = exp e ϕo i +x id βo d Di o d Di o d Di o = exp e ϕo i e x id βo e ϕo i yo id y o d Di o d Di o d Di o id! ( ) y o = exp ϕ o i n o i e ϕo i e x id βo id e x id βo yid o!, d D o i d D o i ( e x id βo ) y o id ( e ϕo i +x id βo ) y o id y o id! (2.) where n o i = d yo id. It is assumed here that uture outcomes are independent o past outcomes and also outcomes are independent o each other. One could orm the ull likelihood to estimate the unknown parameters (Φ, B). In order to avoid estimating the nuisance parameter set Φ, we can condition on its suicient statistic, which removes the dependence on Φ. The cumulative intensity is a sum (rather than an integral) since we assume a constant intensity over each interval. Conditioning on n o i yields the ollowing likelihood or person i: L o i = Pr(y o i x i, n o i ) = Notice that because n o i d D o i exp Pr(y o id x id) Pr(n o i x i) d D o i d = ex id βo e x id βo Pr(y o id x id) d Di o exp λ o n o i d D i o id λ o d D i o id n o i! y o id. (2.2) is suicient, the individual likelihood in the above expression no longer contains Φ. Assuming that patients are independent and conditions are independent, the ull conditional likelihood or event o is simply the product o the individual likelihoods (i.e. L o = N L o i ). Intuitively it ollows that i i has no outcomes o type o, it cannot i= provide any inormation about the relative rate o outcome o. We now present three hierarchical models or multiple drug, multiple outcome selcontrolled case series and discuss how to estimate the drug-outcome coeicient matrix B. 6

7 Two o the models have latent actors that allow B to be expressed in a simpler and more interpretable way. In our experiments, the empirical perormance o these methods is about the same. 3 Model - No latent actors Instead o estimating each coeicient independently, we borrow strength over both drugs and outcomes, which adds substantial regularization. One should think o this as being relevant or a set o related outcomes and drugs, e.g., heart-disease related outcomes and the set o drugs one might prescribe or heart-related conditions. We shrink the estimates or drug j over all outcomes to µ j by placing an independent normal prior on each βj o as βj o N (µ j, σj 2 ), (j, o), where µ j N (, γ), j. We also assume uniorm priors or hyperparameters σ j and γ as σ j U(, a), j and γ U(, a), where parameter a is a user-deined constant. A natural extension o this model (not explored here) would be to have drugs belong to certain classes o drugs, so that priors can be deined based on each class o drugs; similarly with outcomes. The posterior probability o our model is now deined as Pr(B, µ, σ, γ y, a) = Pr(y B) Pr(B µ, σ) Pr(µ γ) Pr(γ a) Pr(σ a) (3.) o j ( ( exp x id β o) d exp ( x id β o) i d D i N (βj o µ j, σj 2 ) o j ) y (o) id N (µ j, γ) j Pr(σ j a) Pr(γ a). The negative log-posterior (which can be used or inding the MAP solution i desired) is: L = log (Pr(B, µ, σ, γ y, a)). The graphical representation o this model is shown in Figure. 7

8 a γ.. µ j σ j β o j o = : O j = : J B Figure : Graphical representation o Model 4 Model - One level o latent actors This model is motivated by two considerations. First, modeling the ull posterior distribution o Model can be computationally expensive, particularly or large N, J, and O, where J and O determine the number o variables to be estimated within the B matrix. Second, Model overlooks the act that drugs and outcomes might come rom a smaller number o latent classes; or instance, there are commonly several drugs that are extremely similar to each other or treating a set o highly related illnesses. We consider F latent actors or drugs and outcomes. We model the J O matrix B as B = L (D) L (O), where L (D) = L (D),... L (D),F : : : L (D) J,... L (D) J,F, L(O) = L (O),... L (O),O : : : L (O) F,... L (O) F,O. 8

9 This way, we do not assume we know in advance which drugs have similar eects on which outcomes, instead we estimate this rom data. The number o latent actors F can be determined by cross-validation. The total number o latent actors is J F + F O, which can be substantially less than J O. For drug latent actors, we place independent normal priors on the entries o L (D) as L (D) j N (µ (D), σ (D)2 ), (j, ), where µ (D) N (, γ (D) ),. Similarly, we deine normal priors on the entries o L (O) as L (O) o N (µ(o), σ (O)2 ), (, o), where µ (O) We assume uniorm priors or hyperparameters σ j N (, γ (O) ),. and γ as σ (D) U(, a),, σ (O) U(, a),, γ (D) U(, b), γ (O) U(, b), where (a, b) are known parameters. The posterior probability is now deined as: Pr(L (D), L (O), µ (D), µ (O), σ (D), σ (O), γ (D), γ (O) y) = Pr(y L (D), L (O) ) (4.) Pr(L (D) µ (D), σ (D) ) Pr(L (O) µ (O), σ (O) ) Pr(µ (D) γ (D) ) Pr(µ O γ (O) ) Pr(σ (D) a) Pr(σ (O) a) Pr(γ (D) b) Pr(γ (O) b) ( ( exp x id β o) ) (o) y id o d exp ( x id β o) J i F j= = F = Pr(σ (D) d D o i N (L (D) j µ(d) N (µ (D), γ (D) ), σ (D)2 ) F = (O) F N (L (O) = o= N (µ (O), γ (O) ) o µ(o) b) Pr(σ (O) ) b) Pr(γ (D) a) Pr(γ (D) a)., σ (O)2 ) The graphical representation o this hierarchical Bayesian model is given in Figure 2. 5 Model 2 - Two levels o latent actors Here we represent B as: B = L (D) L (F ) L (O), 9

10 b. a a b γ (D) γ (O) σ (D) µ (D) µ (O) σ (O) L (D) j L (O) o j o = : F L (D) L (O) B Figure 2: The Graphical Framework or Hierarchical Bayesian Model with one level o latent actors where B = L (D),... L (D),F : : : L (D) J,... L (D) J,F L (F ),... L (F ),F 2 : : : L (F ) F,... L (F ) F,F 2 L (O),... L (O),O : : : L (O) F 2,... L (O) F 2,O. The number o latent actors is thus J F + F F 2 + F 2 O, which can be less than the number o variables o Model in many cases. Its major beneit is interpretability, since now the number o latent drug actors and the number o latent outcome actors can be estimated dierently. L (D) represents the relationship between latent drugs and latent

11 drug-related actors, L (F ) represents the relationship between latent drug-related actors and health outcome-related actors, and L (O) represents the relationship between drugrelated actors and health outcome-related actors. L (O) is really the core set o variables since they relate the latent treatments to the latent outcomes. Note that latent actors could be some known categories/classes o drugs and health conditions. The idea here is to analyze all related drugs and related health events simultaneously. The priors are L (D) jf N (µ (D) F µ (D) F N (, γ (D) ), µ (F ) N (, γ (F ) ), µ (O) F 2 σ (O) F 2, σ (D)2 ), L (O) F 2 o N (µ(o) F F 2 N (, γ (O) ), σ (D) F, σ (O)2 F 2 ), L (F ) F F 2 N (µ (F ), σ (F )2 ), U(, a), σ (F ) U(, a), U(, a), γ (D) U(, b), γ (F ) U(, b), and γ (O) U(, b) or all (F, F 2, j, o). The posterior probability is: Pr(L (D), L (F ), L (O), µ (D), µ (F ), µ (O), σ (D), σ (F ), σ (O), γ (D), γ (F ), γ (O) y) (5.) = Pr(y L (D), L (F ), L (O) ) Pr(L (D) µ (D), σ (D) ) Pr(L (F ) µ (F ), σ (D) ) Pr(L (O) µ (O), σ (O) ) Pr(µ (D) γ (D) ) Pr(µ (F ) γ (F ) ) Pr(µ (O) γ (O) ) Pr(σ (D) a) Pr(σ (F ) a) Pr(σ (O) a) Pr(γ (D) b) Pr(γ (F ) b) Pr(γ (O) b). Table compares the number o parameters in each o the three models. Models and 2 have much ewer parameters when F, F, and F 2 are lower than J and O. Table : The number o parameters and hyperparameters in each model Model Name # o Parameters # o Hyperparameters Total Model J O 2 J + J O + 2 J + Model J F + F O 4 F + 2 J F + F O + 4 F + 2 Model 2 J F + F F 2 + F 2 O 2 F + 2 F J F + F F 2 + F 2 O + 2 F + 2 F Parameter Estimation Using Markov Chain Monte Carlo We use Markov Chain Monte Carlo (MCMC) to approximate the entries o B, speciically random walk Metropolis (RWM). The algorithm employs a Gaussian proposal distribution

12 J t (x, x ) which proposes a new parameter set x given the current parameter set x. We denote Θ as the set o all parameters in the model excluding B. Step. set t =. Generate an initial state {B, Θ } with positive probability Pr(B, Θ y). Repeat the ollowing until stationary distribution and the desired number o samples are reached considering optional burn-in and/or thinning. Step 2. Sample {B, Θ } rom the symmetric proposal distribution J t ({B, Θ },. ). Step 3. Calculate the acceptance probability ( α = min, Pr(B, Θ y) Pr(B t, Θ t y) Step 4. Draw a random number u rom Uni(, ). I u α, accept the proposal state {B, Θ } and set B t = B, Θ t = Θ, else set B t = B t, Θ t = Θ t. Set t : t +. Our implementation uses a component-wise sampling approach, that is, when a sample or one dimension is proposed, all other dimensions are held ixed. Ater the current sample get accepted or rejected, we then move on to the next dimension and repeat the process while holding all other dimensions constant. sampling approaches may be necessary. ). For truly large-scale applications, blocked 7 Simulation Study We will show that or data generated rom our model, the true data-generating parameters B can be recovered. We simulated sample trajectories o drug exposure and health outcomes or 6 patients over 6 days. We set the number o drugs to J = 4, and the number o health conditions to O = 4. Each patient randomly took between J drugs over the 6 days. The average exposure period was assumed to be 2% o the study interval or each patient (that is, on average, each patient was exposed to one or more drugs or at least 2 days). The exposure intervals are randomly selected, so these intervals could be multiple non-consecutive days, multiple consecutive days, or a combination o both. Drugs can have positive or negative contributions to the likelihood and intensity rate o each outcome. For each model (Model, Model, Model 2), we generated the elements o B according to the model s hierarchy. Figure 3 and Figures 2-3 (which are given in the Appendix) show the posterior density or each parameter o B, or Models, and 2, as estimated by MCMC 2

13 sampling. These igures show that the posterior samples were concentrated around the true values and the posterior mean o each variable was generally close to its true value. We β, β, β, β, Posterior density β 2, β 2, β 2, β 2, β 3, β 3, β 3, β 3, β 4, β 4, β 4, β 4,4 Figure 3: Normalized histograms o posterior samples or each element o B in Model. The vertical line indicates the true value. summarize Figures 3, 2, and 3 in Figure 4, which provides a scatter plot o the posterior means and true values or each o the three models. It can be observed that each o the posterior means are very close to their true values. 8 Application to Blood Glucose Analysis or Diabetes We consider an application to Insulin-Dependent Diabetes Mellitus, where our goal is to predict blood glucose level outcomes under dierent circumstances o a patient s daily lie, including their recent eating history, exercise, and insulin injections. Our data are longitudinal measurements taken multiple times per day rom 7 patients (this is the AIM-94 data set provided by Michael Kahn, MD, PhD, Washington University, St. Louis, MO, 3

14 2 a) Model b) Model c) Model Posterior mean Posterior mean Posterior mean True values True values True values Figure 4: Scatter plot o posterior means vs. true parameter values o elements o B or Models, and 2. Bache & Lichman 23). We aim mainly to illustrate (i) how the models we introduce can be used with complex longitudinal data to predict outcomes, (ii) the prediction power and (iii) interpretability o the proposed models. It is well known that current therapies or regulating glucose level in diabetics are very challenging and oten rustrating, as they require patients to continuously regulate diet, exercise and various medications any deviations can be dangerous (Benchimol et al. 23). Our dataset and code will be made publicly available or the purpose o reproducibility. Blood glucose measurements, symptoms and insulin treatments were recorded with timestamps or each patient, over the course o several weeks to months. The two main classes o health outcomes considered here are hyperglycemia (high blood glucose) and hypoglycemia (low blood glucose). All other health outcomes we deine later are related to these two classes. Figure 5 provides a schematic o the type o data we are considering or one patient over a course o day. We will describe the setup in more detail: Drugs/Treatments: Diet, exercise, and injected insulin were treated as three dierent classes o treatments. It is obvious that interactions among these treatments are important. Insulin doses are given one or more times a day, typically beore meals and sometimes also at bedtime. Three types o insulin were considered: () regular, (2) Neutral Protamine Hagedorn (NPH), and (3) Ultralente. Each insulin type has its own characteristic time o onset (O), time o peak action (P), and eective duration (D). The exposure time intervals 4

15 NPH injection More-than-usual exercise activity Less-than-usual meal ingestion NPH injection Events Measurements Pre-breakast BG measurement (35 mg/dl) Pre-lunch BG measurement (6 mg/dl) Pre-lunch BG measurement (47 mg/dl) Pre-snack BG measurement (8 mg/dl) 6: 8: : 2: 4: 6: 8: 2: 22: Time Figure 5: Sample longitudinal traces or a patient with multiple drug/treatments exposures and blood Glucose measurements over a day. Downwards arrows indicate only measurements. Upwards arrows indicate treatments. or peak, and duration used in this paper, which were provided with the dataset, are shown in Table 2 in the Exposure Time column in the bottom several rows o the table labeled Insulin. Based on the actual time o injection, we determined the intervals at which the patient is at peak and/or within the duration o an insulin injection. Based on this, six types o treatments were considered, () regular insulin on peak, (2) regular insulin on duration, (3) NPH insulin on peak, (4) NPH insulin on duration, (5) Ultralente insulin on peak, and (6) Ultralente insulin on duration. At each interval o time, the patient can be either insulin ree or subject to one o the above six exposures. The second class o treatment is or exercise, which may have complex eects on glucose level. For example, glucose levels can all during exercise but also quite a ew hours aterwards. Three types o exercise are reported, () normal exercise, (2) lower than normal exercise, and (3) higher than normal exercise. Each type o exercise was considered separately as a single treatment. The third class o treatment is or diet, which also can have complex eects on the glucose level. For example, a larger meal may lead to a longer and possibly higher elevation 5

16 Table 2: The list o drugs/treatments and health outcomes and their known correlations. P means strong positive correlation, and N means strong negative correlation, D is lower decile and D means highest decile. Treatment Exposure Time Health Outcome Low Glucose Level High Glucose Level. Too low 2. Low 3. D 4. Hypo Symptom 5. Too High 6. High 7. D Exercise Diet Level Time. Normal -4 h 2. Too High -4 h P P P P N N N 3. Low -4 h 4. Normal -4 h 5. Too High -4 h N N N N N N N 6. Low -4 h 7. Ater Meal -4 h N N N N P P P 8. Beore Meal -4 h P P P P N N N Insulin Regular NPH Ultralente 9. Peak -3 h P P P P N N N. Duration -5 h P P P P N N N. Peak 4-6 h P P P P N N N 2. Duration -2 h P P P P N N N 3. Peak 4-24 h P P P P N N N 4. Duration -27 h P P P P N N N o blood glucose. Missing a meal may put the patient at risk or low glucose levels in the hours that ollow. Three types o diet are reported: () normal diet, (2) higher than normal diet, and (3) lower than normal diet. Each o these types o diet were taken as a single treatment. Since measurements were collected beore a meal, ater a meal, and at other times, we considered two other eatures in the model, () beore meal measurement o blood glucose and (2) ater meal measurement, and we treated them as binary eatures. These extra eatures allow us to distinguish whether the measurement was made beore or ater the meal (there is a big dierence between glucose measurements taken beore a meal and ater a meal). Based on all o the treatments described, the total number o variables associated with treatments in the model is 4 (J = 4). The variables are all listed on Table 2 in the Treatment column on the let. Health Outcomes. The outcomes are divided into two major categories, i) high glucose 6

17 level and ii) low glucose level. Given that normal pre-meal blood glucose ranges rom approximately 8-2 mg, and post-meal blood glucose ranges rom 8-4 mg/dl (Bache & Lichman 23), we considered seven health outcomes or glucose level: extremely low (below 4 mg/dl), low (between 4-8 mg/dl), high (over 4 mg/dl), extremely high (over 8 mg/dl), lower decile (lower % o glucose level or each patient), upper decile (upper % o glucose level or each patient), and hypoglycemic (low glucose) symptoms. Thus, the total number o outcomes considered or our analysis is O = 7. We can perorm an evaluation only on intervals where we have glucose measurements, thus we only use those intervals. True Relationships Between Drugs and Outcomes. We wanted to determine whether our model reproduces known relationships between treatments and glucose levels, even though it was not explicitly designed to do so. The inormation about true relationships within Table 2 come rom material accompanying the dataset and We denoted known positive eects in Table 2 by P, strong negative eects by N, and relationships that were unknown were denoted by dashes. For example, we expect NPH injection on peak to decrease the likelihood o having Too High glucose level, so the correlation between NPH on peak and Too High glucose level is negative (N ). Mixing. We perormed cross-validation, dividing our data into ive olds, each with the same number o patients, training our models on our olds and testing on the ith. We removed the irst 5 iterations (as burn-in) o Metropolis-Hastings sampling, and obtained 6 additional samples to estimate the posterior. Figure 6 shows samples rom the posterior o one o the variables, β Too high GL NPHonpeak, or ive separate model instantiations (Model, Model with F =2, Model with F =3, Model 2 with F = 2, F 2 = 2, and Model 2 with F = 3, F 2 = 3. Recall that in Models and 2, we sample elements o the matrices o latent actors and then calculate B. From this igure, we observe reasonable mixing or all models, and we observe that models with latent actors (Models and 2) have better mixing and convergence, possibly due to the smaller number o variables. Computation. The number o parameters diers substantially between models, which aects CPU time o the MCMC sampler. In Figure 7, the number o parameters or each model and the associated CPU time or running MCMC are shown. This igure shows a 7

18 Model Model F=2 Model F=3 Model 2 F =2, F 2 =2 Model 2 F =3, F 2 = Iteration No Too high GL Figure 6: Samples rom the posterior o βnph on peak or ive model instantiations (let), and their histograms (right). The irst 5 samples were discarded as burn-in. We observe better mixing and convergence in the models with latent actors (Model and Model 2). clear correlation between the number o variables and CPU time, that is CPU time increases with the number o parameters. In particular, Model takes a long time to run, because it has substantially more variables than the other models. Interestingly, it seems that using latent actors has a purpose beyond interpretability and regularization, in particular, it helps substantially with tractability. Interpretation o coeicients in B and comparison with ground truth. We compare the estimated coeicients in B to the ground truth signs o coeicients given in Table 2. It is not necessarily the case that the signs o the estimated coeicients need to agree with the ground truth signs in order or the model to perorm well, but it is a reasonable aspect o the model to consider. To perorm this comparison, we ranked the estimated coeicients in B and used these rankings and the true signs o coeicients to generate an ROC curve. That is, the ROC curve was generated by placing thresholds at each point in the ranking, and calculating the true positive rate and alse positive rate with respect to the true coeicients. We computed these ROC curves or all ive model 8

19 6 5.5 Model CPU time (hours) Model 2 F =2, F 2 =2 Model F=3 Model 2 F =3, F 2 = Model F= Number o parameters Figure 7: Comparison between the CPU time and number o variables in the ive model instantiations. Model has larger number o variables and signiicantly higher CPU time. instantiations to obtain Figure 8. These ROC curves indicate that all models perormed well, in the sense o estimating reasonable signs or the coeicients, and also indicates that models with more latent actors perormed better than Model. The models with two latent treatment and outcome actors perormed slightly better than the models with three latent treatment and outcome latent actors, though there was no signiicant dierence in perormance between Model and Model 2 or the same number o latent treatment and outcome actors. Drug Surveillance. We evaluated prediction perormance o our model as ollows: or each patient we calculated the Poisson rate o each condition at each hour considering all drug exposures. We then checked whether or not the patient had that condition at that time. The Poisson rate acts as the score o each patient with regards to each condition. In Figure 9, we present the actual glucose level or a patient at 2 measurement points (upper igure) as well as the estimated intensity rate o the Too-High glucose level or the same patient (lower igure). Each point in this igure represents the estimated x ˆβ id o, where ˆβ o are the estimated regression coeicients associated with too-high glucose level and x id is the known vector o drug exposures at interval d. The igure shows that the estimated intensity rate is reasonably close to the actual level o glucose, particularly when the glucose level is actually too high. In Figure, we repeated the analysis or Too-Low glucose level. It can 9

20 .9 Model Model, F=2 Model, F=3 Model 2, F =2, F 2 =2 Model 2, F =3, F 2 =3 AUC=8% AUC=85% AUC=82% AUC=85% AUC=83% Sensitivity Sensitivity Sensitivity Sensitivity Sensitivity Speciicity.5 Speciicity.5 Speciicity.5 Speciicity.5 Speciicity Figure 8: Receiver Operating Characteristic (curve) or evaluating the signs o coeicients against true signs or ive model instantiations. See the text or details o how these curves were generated. be observed rom this igure that the times where this coeicient is particularly large are the same times where the glucose level drops substantially. This kind o dramatic agreement was not observed or all patients nor all conditions, so below we describe a more general evaluation procedure. In Figure, we show the box plots or the estimated Poisson rates o the two conditions o Too Low glucose level and Too High glucose level on all seventy patients in the test sets. For comparison, we also plotted the box plots o the estimated Poisson rates or normal conditions, where patients did not have too-high or too-low glucose levels. For clarity and airness, we normalized the estimated Poisson rates or each patient. It can be observed rom this igure that the estimated Poisson rate o these conditions are elevated when patients actually suer rom Too-Low (or Too-High, respectively) glucose. Thus, our model could be a useul approach or monitoring the likelihood o a condition, given the timing o the drugs recently taken by the patient. The results were consistent across all ive model instantiations. In Table 3, we report an AUC (area under the ROC curve) value that measures the probability that a method ranks a positive condition timepoint higher than a timepoint with no condition, or the same patient. In particular, we are testing whether the Poisson rate o a patient with a health condition is higher than the estimated rate o the same 2

21 Glucose Level Estiamted Intensity or Too High Glucose level Patient # Measurement # Measurement # Figure 9: Monitoring Too-High glucose levels over 2 hours. The upper igure shows the true glucose levels obtained by measurement, and the lower igure shows the estimated x id ˆβ o or the Too-High glucose outcome over 2 hours. patient when no condition is present. For each model instantiation, we trained the model on our training olds and then calculated the AUC on the ith old, and we repeated this or each condition. We reported the average and standard deviation o AUC over all patients in the test sets. Again D is the lower decile (lower % o glucose level or each patient), and D is the upper decile (upper % o glucose level or each patient). Intuition o B as compared with other methods We compared the perormance o the proposed models with that o the univariate sel-controlled case series (SCCS), and multi-variate sel-controlled case series (MSCCS) (Simpson et al. 23) with and without an l 2 regularization term on the coeicients. For each model instantiation and each method, the estimated entries o B were compared to their known eects (positive or negative) rom Table 2. For each method, we provide the area under the curve (AUC) or this comparison in Table 4. We also reported the mean, median, and standard deviation o all estimated coeicients in the group o Table 2 s positive group and Table 2 s negative group. Better models should have higher AUC, and the estimated coeicients o B should agree in sign with those in Table 2. From Table 4, we observe that as expected, the MSCCS perormed better than the SCCS, the regularized MSCCS worked slightly better than the normal MSCCS, and all proposed 2

22 Glucose Level Estimated Intensity Too low Glucose Level Patient # Measurement # Measurement # Figure : Monitoring too-low glucose levels over 2 hours. The upper igure shows the true glucose levels obtained by measurement, and the lower igure shows the estimated x id ˆβ o or the too-low glucose outcome over 2 hours. Bayesian model instantiations (Model -2) perormed better than all o the traditional models, yielding higher AUC s and better agreement in the mean signs o coeicients; urther the standard deviations or the coeicient values were substantially lower. These perormance beneits come in addition to the other beneits o our approach discussed earlier, including computational tractability and interpretability o the latent actors. 9 Concluding Remarks The novel elements o this work are as ollows. () We estimate the eects o many drugs on many health outcomes simultaneously. Borrowing strength across similar drugs and outcomes allows us to create better estimates across both drugs and outcomes. (2) We use latent actors to encode latent classes o drugs and outcomes, to help with interpretability, and to provide a major computational beneit. A computational beneit is also provided by the SCCS s ramework, since we do not need to estimate the baseline rates o outcomes or each patient. This approach is scalable to large longitudinal observational databases, is generally applicable (also to problems beyond healthcare), and provides a level o interpretability to physicians and patients that was not previously possible. 22

23 Model Model, F=2 Model, F=3 Model 2, F = F 2 = 2 Model 2, F = F 2 = 3 Normalized Rate Normal Too Low Normal Too Low Normal Too Low Normal Too Low Normal Too Low Model Model, F=2 Model, F=3 Model 2, F = F 2 = 2 Model 2, F = F 2 = 3 Normalized Rate Normal Too High Normal Too High Normal Too High Normal Too High Normal Too High Figure : Comparison between the box-plots or Too-High and Too-Low glucose level and normal condition or all ive model instantiations, where the Poisson rates were normalized or each person. Acknowledgement We grateully acknowledge unding rom the Natural Science and Engineering Research Council o Canada (NSERC) and the MIT Big Data Initiative. Reerences Bache, K. & Lichman, M. (23), UCI machine learning repository, University o Caliornia, Irvine, School o Inormation and Computer Sciences, Benchimol, E., Hawken, S., Kwong, J. & Wilson, K. (23), Saety and utilization o inluenza immunization in children with inlammatory bowel disease, Pediatrics 3(6), Chui, C., Man, K., Cheng, C.-L., Chan, E., Lau, W., Cheng, V., Wong, D., Yang Kao, Y.-H. & Wong, I. (24), An investigation o the potential association between retinal 23

24 Table 3: Average and standard deviation (sd) o AUC over 5 olds. The entities that were ranked or each score are pairs o timepoints or a patient, where or one o the timepoints the patient had a condition and or the other timepoint the patient did not have a condition. The AUC was calculated as the raction o pairs o times (the irst time with the condition and the second time without the condition) where the estimated rate was higher when the patient had the condition vs. when the patient did not have the condition. Too low Low Too high High D D Hypo Symptom Model Model, F = 2 Model, F = 3 Model 2, F = F 2 = 2 Model 2, F = 3, F 2 = 3 Mean 7.3% 62.92% 62.98% 6.26% 58.2% 56.39% 59.7% sd 9.56% 2.79% 3.38% 2.7% 3.93% 3.5% 4.95% Mean 7.86% 62.7% 63.6% 6.58% 57.97% 56.5% 56.7% sd 8.28% 2.47% 3.4% 2.% 3.36% 3.4% 3.9% Mean 7.66% 62.68% 62.92% 6.5% 57.97% 56.35% 56.33% sd 7.89% 2.52% 3.4% 2.2% 3.37% 2.9% 2.87% Mean 7.9% 62.7% 62.97% 6.44% 58.% 55.9% 56.88% sd 9.5% 2.43% 3.8% 2.3% 3.34% 3.6% 3.2% Mean 7.7% 62.73% 62.99% 6.47% 58.5% 56.5% 56.68% sd 9.52% 2.44% 3.6% 2.7% 3.3% 3.5% 2.79% detachment and oral luoroquinolones: A sel-controlled case series study, Journal o Antimicrobial Chemotherapy 69(9), Cobanoglu, M., Liu, C., Hu, F., Oltvai, Z. & Bahar, I. (23), Predicting drug-target interactions using probabilistic matrix actorization, Journal o Chemical Inormation and Modeling 53(2), Farrington, P. (995), Relative incidence estimation rom case series or vaccine saety evaluation, Biometrics 5, Ghosh, J. & Dunson, D. (29), Deault prior distributions and eicient posterior computation in Bayesian actor analysis, Journal o Computational and Graphical Statistics 8(2), Greene, S., Kulldor, M., Yin, R., Yih, W., Lieu, T., Weintraub, E. & Lee, G. (2), Near 24

25 Table 4: Comparison with existing models Measure Model Mode F = 2 Model F = 3 Method Model 2 Model 2 F = 2, F 2 = 2 F = 3, F 2 = 3 SCCS MSCCS Regularized MSCCS AUC B Mean Median sd B + Mean Median sd real-time vaccine saety surveillance with partially accrued data, Pharmacoepidemiology and Drug Saety 2(6), Madigan, D. (29), The sel-controlled case series: Recent developments, 94(6), Simpson, S., Madigan, D., Zorych, I., Schuemie, M., Ryan, P. & Suchard, M. (23), Multiple sel-controlled case series or large-scale longitudinal observational databases, Biometrics 69(4), Whitaker, H. J., Farrington, C. P., Spiessens, B. & Musonda, P. (26), Tutorial in biostatistics: the sel-controlled case series method, Statistics in Medicine 25(), Zitnik, M. & Zupan, B. (25), Matrix actorization-based data usion or drug-induced liver injury prediction, Systems Biomedicine pp

26 Appendix: Figures 2 and β, β, β, β, Posterior density β 2, β 2, β 2, β 2, β 3,.4..2 β 3, β 3, β 3, β 4, β 4, β 4, β 4,4 Figure 2: Normalized histograms o posterior samples or each element o B in Model. The vertical line indicates the true value. 26

27 β, β, β, β,4.4 Posterior density β 2, β 2, β 2, β 2, β 3, β 3, β 3, β 3, β 4, β 4, β 4, β 4,4 Figure 3: Normalized histograms o posterior samples or each element o B in Model 2. The vertical line indicates the true value. 27

The Factorized Self-Controlled Case Series Method: An Approach for Estimating the Effects of Many Drugs on Many Outcomes

The Factorized Self-Controlled Case Series Method: An Approach for Estimating the Effects of Many Drugs on Many Outcomes Journal o Machine Learning Research 7 (26) -24 Submitted 8/5; Revised 2/5; Published 6/6 The Factorized Sel-Controlled Case Series Method: An Approach or Estimating the Eects o Many Drugs on Many Outcomes