Some Recent OMOP Research Results. William DuMouchel, on behalf of the OMOP research team. MidWest Biopharmaceutical Statistics Workshop, Muncie, IN, 21 May 2013
Agenda
  Overview of OMOP publications of the last year, with a focus on three papers
  Explanation and example of the OMOP proposal: An Empirical Approach to Measuring and Calibrating for Error in Observational Analyses
OMOP Publications (2012-2013)
  Madigan D, Ryan PB, Schuemie M, Stang PE, Overhage JM, Hartzema AG, et al. Evaluating the Impact of Database Heterogeneity on Observational Study Results. American Journal of Epidemiology. May 5, 2013.
  Fox BI, Hollingsworth JC, Gray MD, Hollingsworth ML, Gao J, Hansen RA. Developing an expert panel process to refine health outcome definitions in observational data. Journal of Biomedical Informatics. 2013: In press.
  Ryan PB, Suchard MA, Schuemie M, Madigan D (2013). Learning from Epidemiology: Interpreting Observational Database Studies for the Effects of Medical Products. Statistics in Biopharmaceutical Research. DOI: 10.1080/19466315.2013.791638.
  Madigan D, Ryan PB, Schuemie M. Does design matter? Systematic evaluation of the impact of analytical choices on effect estimates in observational studies. Therapeutic Advances in Drug Safety. February 25, 2013.
  Hansen RA, Gray M, Fox BI, Hollingsworth J, Gao J, Hollingsworth M, Carpenter DM. Expert panel assessment of acute liver injury identification in observational data. Research in Social and Administrative Pharmacy 2012 (in press).
  Statistical Methods in Medical Research. February 2013; 22(1). Special Issue: Effectiveness Research. Guest editors: Xiaochun Li, Lingling Li and Patrick Ryan.
  Suchard MA, Simpson SE, Zorych I, Ryan P, Madigan D. Massive Parallelization of Serial Inference Algorithms for a Complex Generalized Linear Model. ACM Trans Model Comput Simul. 2013;23(1):1-17.
  Ryan PB (2012). Using exploratory visualization in the analysis of medical product safety in observational healthcare data. In A. Krause & M. O'Connell (Eds.), A picture is worth a thousand tables: Graphics in life sciences (pp. 391-413). New York, New York: Springer-Verlag.
  Ryan PB, Madigan D, Stang PE, Overhage JM, Racoosin JA, Hartzema AG (2012). Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership. Statist. Med. doi: 10.1002/sim.5620.
  Reich C, Ryan PB, Stang PE, Rocca M. Evaluation of alternative standardized terminologies for medical conditions within a network of observational healthcare databases. Journal of Biomedical Informatics 2012; 45: 689-696.
  Page D, Santos Costa V, Natarajan S, Barnard A, Peissig P, Caldwell M. Identifying adverse drug events by relational learning. In AAAI-12, pages 1599-1605, Toronto, 2012.
  Harpaz R, DuMouchel W, Shah NH, Madigan D, Ryan P, Friedman C. Novel Data-Mining Methodologies for Adverse Drug Event Discovery and Analysis. Clinical Pharmacology & Therapeutics (2 May 2012).
Publications of Interest (2012-2013)
  Zhou X, Murugesan S, Bhullar H, Liu Q, Cai B, Wentworth C, Bate A (2013). An Evaluation of the THIN Database in the OMOP Common Data Model for Active Drug Safety Surveillance. Drug Safety: 1-16. DOI: 10.1007/s40264-012-0009-3.
  DeFalco F, Ryan P, Soledad Cepeda M (2012). Applying standardized drug terminologies to observational healthcare databases: a case study on opioid exposure. Health Services and Outcomes Research Methodology: 1-10. DOI: 10.1007/s10742-012-0102-1.
  Harpaz R, Vilar S, DuMouchel W, Salmasian H, Haerian K, et al. (2013). Combining signals from spontaneous reports and electronic health records for detection of adverse drug reactions. J Am Med Inform Assoc 2013;20:413-419.
  Kahn MG, Batson D, Schilling LM (2012). Data model considerations for clinical effectiveness researchers. Med Care 50 Suppl: S60-67.
  Kahn MG, Raebel MA, Glanz JM, Riedlinger K, Steiner JF (2012). A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research. Med Care 50 Suppl: S21-29.
  Schuemie MJ, Coloma PM, Straatman H, Herings RM, Trifiro G, et al. (2012). Using Electronic Health Care Records for Drug Safety Signal Detection: A Comparative Evaluation of Statistical Methods. Med Care.
  Platt R, Carnahan R (2012). The U.S. Food and Drug Administration's Mini-Sentinel Program. Pharmacoepidem. Drug Safe., 21: 1-303. doi: 10.1002/pds.3230.
  Robb MA, Racoosin JA, Sherman RE, Gross TP, Ball R, Reichman ME, Midthun K, Woodcock J (2012). The US Food and Drug Administration's Sentinel Initiative: Expanding the horizons of medical product safety. Pharmacoepidem. Drug Safe., 21: 9-11. doi: 10.1002/pds.2311.
  Curtis LH, Weiner MG, Boudreau DM, Cooper WO, Daniel GW, Nair VP, Raebel MA, Beaulieu NU, Rosofsky R, Woodworth TS, Brown JS (2012). Design considerations, architecture, and use of the Mini-Sentinel distributed data system. Pharmacoepidem. Drug Safe., 21: 23-31. doi: 10.1002/pds.2336.
Review Paper: Harpaz R, DuMouchel W, Shah NH, Madigan D, Ryan P, Friedman C. Novel Data-Mining Methodologies for Adverse Drug Event Discovery and Analysis. Clinical Pharmacology & Therapeutics (2 May 2012). Growth of Pharmacovigilance Data Mining Literature
Analyses of Spontaneous Reports. Enhancements: Bayesian shrinkage, covariate adjustments, drug interactions, concomitant drugs
Claims Records and EHRs. Designs: cohort, case-control, self-controlled; many variations on each basic design
Non-Standard Data Sources
  Quantitative Structure Activity Relationship (QSAR) models
    Matthews et al. from the FDA CDER
    Commercial QSAR software applied to drug candidates from AERS DPAs
    Separate QSAR models for several cardiac, liver and urinary ADRs
    Reported 78% specificity and 56% sensitivity
  PubChem assays: National Center for Biotechnology Information
    487,000 drug activity screens vs Canadian ADR database
    Pouliot, Chiang and Butte (Clin. Pharm. Ther. 90)
    Use logistic regression
    Reported 75% specificity when compared to literature or drug labels
  Biomedical literature: NLP analyses focusing on drug-AE pairs
    Shetty and Dalal, J. Am. Med. Inf. Ass. 18, report 70% sensitivity, 40% PPV
  User-generated content in health forums
    Leaman et al., Proc 2010 Wkshp Biomed Nat Lang Process
    Web crawler, NLP extracts ADR clinical concepts: 78% precision, 70% recall
Harpaz R, Vilar S, DuMouchel W, Salmasian H, Haerian K, et al. (2013) Combining signals from spontaneous reports and electronic health records for detection of adverse drug reactions. J Am Med Inform Assoc
  Objective: Data-mining algorithms that can produce accurate signals of potentially novel adverse drug reactions (ADRs) are a central component of pharmacovigilance. We propose a signal-detection strategy that combines the adverse event reporting system (AERS) of the Food and Drug Administration and electronic health records (EHRs) by requiring signaling in both sources. We claim that this approach leads to improved accuracy of signal detection when the goal is to produce a highly selective ranked set of candidate ADRs.
  Materials and methods: Our investigation was based on over 4 million AERS reports and information extracted from 1.2 million EHR narratives. Well-established methodologies were used to generate signals from each source. The study focused on ADRs related to three high-profile serious adverse reactions. A reference standard of over 600 established and plausible ADRs was created and used to evaluate the proposed approach against a comparator.
  Results: The combined signaling system achieved a statistically significant large improvement over AERS (baseline) in the precision of top ranked signals. The average improvement ranged from 31% to almost threefold for different evaluation categories. Using this system, we identified a new association between the agent, rasburicase, and the adverse event, acute pancreatitis, which was supported by clinical review.
  Conclusions: The results provide promising initial evidence that combining AERS with EHRs via the framework of replicated signaling can improve the accuracy of signal detection for certain operating scenarios. The use of additional EHR data is required to further evaluate the capacity and limits of this system and to extend the generalizability of these results.
Comparing the Top K Signals from 2 Methods
  AERS alone, using MGPS
  Minimum of AERS and EHR signal scores
  Using both gives much better positive predictive values
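The comparison on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the scores, the five drug-event pairs, and the reference labels are invented, and `combined_score` and `precision_at_k` are hypothetical helper names.

```python
def combined_score(aers_score, ehr_score):
    """Replicated signaling: a pair ranks high only if both sources signal."""
    return min(aers_score, ehr_score)


def precision_at_k(scores, is_true_adr, k):
    """Fraction of the k highest-scoring pairs that are reference-standard ADRs."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(is_true_adr[pair] for pair in ranked) / k


# Hypothetical signal scores for five drug-event pairs (not real data).
aers = {"A": 4.0, "B": 3.5, "C": 3.0, "D": 2.0, "E": 1.0}
ehr = {"A": 0.5, "B": 3.0, "C": 2.5, "D": 0.4, "E": 2.8}
truth = {"A": False, "B": True, "C": True, "D": False, "E": False}

# Combined score per pair is the minimum of the two source scores.
both = {pair: combined_score(aers[pair], ehr[pair]) for pair in aers}

ppv_aers = precision_at_k(aers, truth, k=2)  # top 2 by AERS alone
ppv_both = precision_at_k(both, truth, k=2)  # top 2 by the minimum score
print(ppv_aers, ppv_both)
```

In this toy data, pair A has a strong AERS score but almost no EHR support, so taking the minimum pushes it out of the top 2.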
Empirical Bayes to Combine Signals
  Outcome                       AERS   Healthcare   Combined   Relative Improvement
  Acute Renal Failure           0.86   0.81         0.94       56%
  Upper GI Bleed                0.89   0.73         0.94       49%
  Acute Liver Injury            0.70   0.76         0.85       37%
  Acute Myocardial Infarction   0.64   0.70         0.76       20%
  Average                       0.77   0.75         0.87       40%
Madigan, Ryan, Schuemie, Stang, Overhage, Hartzema et al. Evaluating the Impact of Database Heterogeneity on Observational Study Results. American Journal of Epidemiology. May 5, 2013
  Studies of the same issue in different databases can and do generate different results, sometimes with strikingly different clinical implications.
  Relative risk estimates for 53 drug-outcome pairs, 2 study designs (cohort studies and self-controlled case series), 10 observational databases
  Wildly discordant estimates across the 10 databases, ranging from a statistically significant decreased risk to a statistically significant increased risk:
    11 of 53 (21%) drug-outcome pairs using a cohort design
    19 of 53 (36%) drug-outcome pairs using a self-controlled case series design
  Consistent estimates (both direction and significance) across the 10 databases:
    9 of 53 (17%) drug-outcome pairs for cohort studies
    5 of 53 (9%) drug-outcome pairs for self-controlled case series
  Observational studies can be sensitive to the choice of database. More attention needs to be paid to this issue!
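One way to make the "discordant" and "consistent" categories concrete is the sketch below. The classification rules are an assumed reading of the slide (the paper's exact definitions may differ), the per-database estimates are invented, and `label` and `compare_databases` are hypothetical names.

```python
def label(rr, lower, upper):
    """Return (direction, significant) for one database's estimate.

    An RR of exactly 1 is treated as 'decreased' here, an arbitrary choice.
    """
    direction = "increased" if rr > 1.0 else "decreased"
    significant = lower > 1.0 or upper < 1.0
    return direction, significant


def compare_databases(estimates):
    """Classify one drug-outcome pair from its per-database (rr, lower, upper)."""
    labels = {label(*est) for est in estimates.values()}
    if ("decreased", True) in labels and ("increased", True) in labels:
        return "discordant"  # significant decrease in one DB, increase in another
    if len(labels) == 1:
        return "consistent"  # same direction and same significance everywhere
    return "mixed"


# Hypothetical estimates for two drug-outcome pairs in three databases.
pair_1 = {"db1": (0.70, 0.55, 0.89), "db2": (1.40, 1.10, 1.78), "db3": (1.05, 0.80, 1.38)}
pair_2 = {"db1": (1.50, 1.20, 1.88), "db2": (1.30, 1.05, 1.61), "db3": (1.80, 1.30, 2.49)}

print(compare_databases(pair_1))
print(compare_databases(pair_2))

# Shares reported on the slide, out of 53 drug-outcome pairs.
shares = {count: f"{count / 53:.0%}" for count in (11, 19, 9, 5)}
print(shares)
```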
An Empirical Approach to Measuring and Calibrating for Error in Observational Analyses
  A proposal based on the 2012 OMOP experiments
  Initial review established a gold standard of 400 drug-event pairs, each with an established positive or negative standing as to causality
  Restricted to four health outcomes of interest: acute kidney injury, acute liver injury, acute myocardial infarction, GI bleed
  Extensive methodological experimentation
  Study bias and variability of estimation across drug-event pairs, estimation methods, real databases and simulations
OMOP 2012 Experiments http://omop.fnih.org/
Consider a typical observational database study: Exploring clopidogrel and upper gastrointestinal bleeding
  Error = distance from the point estimate to the true effect
    How far away from truth is RR=2.07?
  Bias = expected value of the error distribution
    When applying this type of analysis to this type of data for this type of outcome, how far on average is the estimate from the true value?
  Coverage = probability that the true effect is contained within the confidence interval
    When applying this type of analysis to this type of data for this type of outcome, do the 95% confidence intervals (1.66 to 2.58 in this case) actually contain the true relative risk 95% of the time?
  p<.001
Learning from what's already known
  Their recommendation: Use 3-4 negative controls, in addition to the target outcome, as a means of assessing the plausibility of an observational analysis result
  Our recommendation: Use a large sample of negative (and positive) controls to empirically measure analysis operating characteristics and use them to calibrate your study finding
OMOP approach to methodological research
  Develop a standardized implementation of the analysis strategy
    Study design: Case-control
    Nesting within indication (unstable angina)
    Case definition: First episode of upper GI hemorrhage
    10 controls per case, matched on age, gender, and index date
    Exposure definition: Length of exposure + 30d
    Exclusion criteria: <180d of observation before case
    Standard approach yields results similar to the initial study:
      Opatrny 2008 in CRPD: 2.07 (1.66, 2.58)
      OMOP 2012 in CCAE: 1.86 (1.79, 1.93)
  Systematically apply the analysis across a network of databases, consistently for a large sample of positive and negative controls
    GI Bleeding: 24 positive controls, 67 negative controls
    Criteria for negative controls:
      Event not listed anywhere in any section of active FDA structured product label
      Drug not listed as causative agent in Tisdale et al, 2010: Drug-Induced Diseases
      Literature review identified no evidence of potential positive association
  Record all effect estimates (RR, CI) from all analysis-database-drug-outcome combinations and summarize analysis*database performance
  If we assume drugs identified as negative controls truly have no effect on the outcome, then we can assume true RR = 1 as a basis for measuring error
Case-control estimates for GI bleed negative controls
  CC: 2000314, CCAE, GI Bleed
  If the 95% confidence intervals were properly calibrated, then 95% * 65, or about 62, of the estimates should cover RR = 1
  We observed that only 29 of the 65 negative-control estimates covered RR = 1
  Estimated coverage probability = 29 / 65 = 45%
  Positive tendency: 74% of estimates have RR > 1
  Error distribution demonstrates positive bias (expected value > 1) and substantial variability
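The coverage arithmetic on this slide can be checked directly. The sketch below is illustrative only: `coverage` is a hypothetical helper and the four example intervals are invented, while the counts 65 and 29 come from the slide.

```python
def coverage(intervals, true_rr=1.0):
    """Share of (lower, upper) confidence intervals that contain true_rr."""
    covered = sum(1 for lower, upper in intervals if lower <= true_rr <= upper)
    return covered / len(intervals)


# Counts from the slide: 65 negative-control estimates, 29 of which cover RR = 1.
n_controls = 65
expected_covering = 0.95 * n_controls  # 61.75, which the slide rounds to 62
observed_coverage = 29 / n_controls    # roughly 0.446, which the slide rounds to 45%

# Hypothetical intervals showing how the helper is applied to raw estimates.
toy_intervals = [(0.8, 1.3), (1.1, 1.9), (0.6, 0.95), (0.9, 2.0)]
toy_coverage = coverage(toy_intervals)

print(expected_covering, round(observed_coverage, 3), toy_coverage)
```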
Measures of accuracy used in OMOP's evaluations
  Bias: expected difference between true RR and estimated RR
  Mean squared error: sum of variance and squared bias of the estimated RR
  Coverage probability: % of drugs where the true RR is contained within the estimated 95% confidence interval
  Real data: negative controls, assume true RR = 1
    Can't use positive controls in real data if you don't know the true RR
  Simulated data: positive controls, inject true RR = 1, 1.25, 1.5, 2, 4, 10
  Discrimination (AUC): probability that the estimate can distinguish between no effect and positive effect
    AUC can use any rank-order statistic (RR, p-value)
    AUC only assumes true RR should be bigger for positive controls than for negative controls
    Can be/has been studied in both real and simulated data
  Sensitivity/specificity: expected operating characteristics of a procedure at a defined decision threshold
    Decision threshold can be any dichotomous criterion (ex: RR>2, p<0.05, LBRR>1.5)
    Sensitivity: % of positive controls that meet the decision threshold
    Specificity: % of negative controls that do not meet the decision threshold
    Can set desired sensitivity or specificity to determine the decision threshold
    Can be/has been studied in both real and simulated data
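A minimal sketch of these measures follows. It assumes that errors are measured on the log RR scale and that error means estimate minus truth; neither convention is stated on the slide. The function names and the control estimates are hypothetical.

```python
import math


def bias_and_mse(estimated_rrs, true_rr=1.0):
    """Mean error and mean squared error of log(RR) estimates against a known truth."""
    errors = [math.log(rr) - math.log(true_rr) for rr in estimated_rrs]
    bias = sum(errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return bias, mse


def auc(positive_scores, negative_scores):
    """Probability that a positive control outranks a negative control (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def sensitivity_specificity(positive_scores, negative_scores, threshold):
    """Operating characteristics of the rule 'score > threshold'."""
    sens = sum(s > threshold for s in positive_scores) / len(positive_scores)
    spec = sum(s <= threshold for s in negative_scores) / len(negative_scores)
    return sens, spec


# Hypothetical RR estimates for negative controls (true RR = 1) and positive controls.
negatives = [1.0, 2.0, 0.5, 4.0]
positives = [2.0, 3.0, 1.5]

bias, mse = bias_and_mse(negatives)
area = auc(positives, negatives)
sens, spec = sensitivity_specificity(positives, negatives, threshold=2.0)
print(round(bias, 3), round(mse, 3), area, round(sens, 3), spec)
```

The identity on the slide, MSE = variance + squared bias, holds here when the variance is the population variance of the errors.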
Comparing accuracy of cohort and self-controlled designs
  Data: MarketScan Medicare Supplemental Beneficiaries (MDCR); HOI: GI Bleeding, broad definition
  Panels: Discrimination, Error, Coverage
  CM: 21000214 New user cohort, propensity score stratification, with active comparator (drugs known to be negative controls for outcome)
    Bias: -0.21 MSE: 0.31 Mean SE: 0.10
  SCCS: 1955010 Multivariate self-controlled case series, including all events, and defining time-at-risk as all-time post-exposure
  OS: 403002 Self-controlled cohort design, including all exposures and outcomes, defining time-at-risk and control time as length of exposure + 30d
  Observation: Analyses have different error distributions, but all methods have low coverage probability
  Potential solution: empirical calibration to adjust estimate/standard error for observed bias and residual error
  Bias: -0.40 MSE: 0.31 Mean SE: 0.03
  Bias: -0.32 MSE: 0.22 Mean SE: 0.05
Case-control estimates for GI bleed negative controls
  CC: 2000314, CCAE, GI Bleed
  Using theoretical null: 55% have p < .05
  Using empirical null: 6% have p < .05
  Intuition for empirical calibration: you can use the empirical null to adjust the original estimate by shifting for bias and stretching for variance in the error distribution at each true effect size
  Ex: Clopidogrel-bleeding: Pre-calibration: (1.79-1.93); Post-calibration: (0.79-4.57)
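The "shift and stretch" intuition can be sketched as follows. This is a deliberately simplified stand-in, not OMOP's procedure: it fits a single normal empirical null to log RR estimates by the method of moments and applies the same shift at every effect size. The negative-control values are invented, so the output will not reproduce the slide's post-calibration interval; only the clopidogrel estimate 1.86 (1.79, 1.93) is taken from the slides. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, pvariance


def fit_empirical_null(log_rrs, ses):
    """Estimate systematic error (mu, sigma) from negative-control estimates.

    Assumption: sigma^2 is the spread of the estimates beyond sampling error,
    i.e. total variance minus the average squared standard error.
    """
    mu = mean(log_rrs)
    excess = pvariance(log_rrs) - mean([se ** 2 for se in ses])
    return mu, math.sqrt(max(excess, 0.0))


def p_value(log_rr, se, mu=0.0, sigma=0.0):
    """Two-sided p-value; mu = sigma = 0 gives the usual theoretical null."""
    z = (log_rr - mu) / math.sqrt(sigma ** 2 + se ** 2)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def calibrated_interval(log_rr, se, mu, sigma, z=1.96):
    """Shift the estimate by the bias and widen it by the residual error."""
    center = log_rr - mu
    half_width = z * math.sqrt(sigma ** 2 + se ** 2)
    return math.exp(center - half_width), math.exp(center + half_width)


# Hypothetical negative-control estimates (log RR and standard error).
neg_log_rrs = [0.1, 0.5, 0.9, -0.3, 0.7, 0.2]
neg_ses = [0.1] * 6
mu, sigma = fit_empirical_null(neg_log_rrs, neg_ses)

# Estimate of interest from the slides: RR = 1.86 (1.79, 1.93).
log_rr = math.log(1.86)
se = (math.log(1.93) - math.log(1.79)) / (2 * 1.96)

p_theoretical = p_value(log_rr, se)
p_calibrated = p_value(log_rr, se, mu, sigma)
low, high = calibrated_interval(log_rr, se, mu, sigma)
print(p_theoretical, round(p_calibrated, 3), round(low, 2), round(high, 2))
```

With these made-up controls the narrow, highly significant interval becomes a wide one that includes RR = 1, which is the qualitative behaviour the slide describes.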
Applying case-control design and calibrating estimates of positive controls in simulated data, RR=1.00 6 original estimates that did not contain true RR=1 After calibration, only 1 estimate does not contain true RR = 1 Original coverage probability = 75% Calibrated coverage probability = 96%
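The two coverage percentages on this slide are mutually consistent, as a quick check shows. The total of 24 estimates is inferred here from the slide's figures (6 misses at 75% coverage) rather than stated on it.

```python
# Figures from the slide: 6 original intervals miss RR = 1 at 75% coverage.
missed_before = 6
coverage_before = 0.75

# Inferred (not stated on the slide): the number of estimates this implies.
n_estimates = missed_before / (1 - coverage_before)

# After calibration, only 1 interval misses RR = 1.
missed_after = 1
coverage_after = (n_estimates - missed_after) / n_estimates

print(n_estimates, f"{coverage_after:.0%}")
```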
Applying case-control design and calibrating estimates of positive controls in simulated data, RR=1.25 Original coverage probability = 54% Calibrated coverage probability = 96%
Applying case-control design and calibrating estimates of positive controls in simulated data, RR=1.50 Original coverage probability = 46% Calibrated coverage probability = 92%
Applying case-control design and calibrating estimates of positive controls in simulated data, RR=2.00 Original coverage probability = 42% Calibrated coverage probability = 92%
Comparing accuracy of cohort and self-controlled designs, after empirical calibration
  Data: MDCR; HOI: GI Bleeding, broad definition
  Panels: Discrimination, Error, Coverage
  CM: 21000214 New user cohort, propensity score stratification, with active comparator (drugs known to be negative controls for outcome)
    Bias: -0.02 MSE: 0.38 Mean SE: 0.36
  SCCS: 1955010 Multivariate self-controlled case series, including all events, and defining time-at-risk as all-time post-exposure
    Bias: 0.04 MSE: 0.33 Mean SE: 0.67
  OS: 403002 Self-controlled cohort design, including all exposures and outcomes, defining time-at-risk and control time as length of exposure + 30d
    Bias: 0.00 MSE: 0.11 Mean SE: 0.25
  Observation: Calibration does not influence discrimination, but tends to improve bias, MSE, and coverage
Concluding Thoughts
  Systematic exploration of negative and positive controls can be used to augment observational studies by measuring analysis operating characteristics
  Errors in observational studies were observed to be differential by analysis design, data source, and outcome
    Magnitude and direction of bias varied, but all analyses had error distributions far from nominal
  The traditional interpretation of the 95% confidence interval, that the CI covers the true effect size 95% of the time, may be misleading in the context of observational database studies
    Coverage probability was much lower than 95% across all methods and all outcomes
    Sampling variability is a small portion of the true uncertainty in any study
  Empirical calibration is one approach to attempt to account for residual error that should be expected within any observational analysis
Observational Medical Outcomes Partnership Fourth Annual Symposium
  November 5-6, 2013, Hyatt Regency Bethesda, Bethesda, Maryland
  OMOP holds an annual symposium to publicly share insights from the partnership's ongoing research with all stakeholders. Day 1 is a hands-on tutorial designed for data users interested in analysis and engagement in the OMOP community. Day 2 is a research meeting to hear and discuss new OMOP research findings. Join us for two days and be a part of an exciting and growing community!
  DATE & TIME: November 5-6, 2013, 9:00 a.m. - 5:00 p.m. Eastern Time. Opening Poster Session: evening of November 5th
  LOCATION: Hyatt Regency Bethesda, One Bethesda Metro Center, Bethesda, MD 20814, USA
  POSTER APPLICATIONS and ONLINE REGISTRATION COMING SOON!
  For questions contact Emily Welebob, welebob@omop.org