Comparative Effectiveness Research Collaborative Initiative (CER CI)

PART 1: INTERPRETING OUTCOMES RESEARCH STUDIES FOR HEALTH CARE DECISION MAKERS
ASSESSING PROSPECTIVE DATABASE STUDIES: A PROPOSED MEASUREMENT TOOL FOR HEALTH CARE DECISION MAKERS

Interpreting Prospective Database Studies for Health Care Decision Makers Task Force
Forum: Monday, June 4, ISPOR 17th International Meeting, Washington, DC

AMCP/ISPOR/NPC Interpreting Prospective Studies for Health Care Decision Makers Task Force

Chair:
- Marc Berger, MD, Executive VP and Senior Scientist, OptumInsight Life Sciences, New York, NY, USA

Task Force Members (AMCP, NPC, ISPOR):
- Dan Allen, PharmD, Clinical Pharmacist Consultant, RegenceRx, Portland, OR, USA
- Karen Worley, PhD, Clinical Research Consultant, Competitive Health Analytics, Inc., Humana, Cincinnati, OH, USA
- Scott Devine, MPH, PhD, Outcomes Research Scientist, US Outcomes Research, Merck & Co., Inc., St. Louis, MO, USA
- John Graham, PharmD, Group Director, Health Services, US Medical, Bristol-Myers Squibb, Princeton, NJ, USA
- C. Daniel Mullins, PhD, Professor, University of Maryland School of Pharmacy, Pharmaceutical Health Services Research, 220 Arch Street, 12th Floor, Baltimore, MD, USA
- Don Husereau, BScPharm, MSc, Adjunct Professor, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada, and Senior Scientist, University for Health Sciences, Medical Informatics and Technology, Tirol, Austria
Assessing Prospective Observational Studies: A Proposed Measurement Tool for Health Care Decision Makers

Presenters:
- Marc Berger, MD (Chair, CER CI Interpreting Prospective Observational Studies for Health Care Decisions Task Force), Executive Vice President & Senior Scientist, OptumInsight, New York, NY, USA
- C. Daniel Mullins, PhD, Professor, University of Maryland School of Pharmacy, Pharmaceutical Health Services Research, Baltimore, MD, USA
- John Graham, PharmD, Group Director, Health Services, US Medical, Bristol-Myers Squibb, Princeton, NJ, USA
- Scott Devine, MPH, PhD, Outcomes Research Scientist, US Outcomes Research, Merck & Co., Inc., St. Louis, MO, USA

AMCP/ISPOR/NPC CER Collaborative Initiative
Objective: Enhancing the usefulness of CER to improve patient health outcomes
Critical Appraisal and Coverage Determinations: CER CI Part 1

Develop a set of assessment tools to:
- Assess the credibility and relevance of non-experimental studies (not RCTs)
- Be easy, fast, and accurate to use, requiring little skill
- Help the end user:
  - assess the quality, credibility, and relevance of non-experimental studies
  - include more CER studies in their body of evidence
- Standardize the decision-making process when appraising the quality of evidence

For more information: http://www.ispor.org/taskforces/interpretingorsforhcdecisionmakerstfx.asp
Two Main Questions

Credible? (credibility/accuracy/internal validity)
- Credibility is the extent to which the study accurately answers the question it is designed or intended to answer, and is determined by the design and conduct of the study.
- Addresses issues of internal validity: measurement and confounding.

Relevant? (applicability/external validity to the user)
- Relevance addresses the extent to which the results of the study, if accurate, apply to the setting of interest to the decision maker.
- Addresses issues of external validity (population, comparators, endpoints, timeframe) and policy-relevant differences.

The Target
- Easy, fast, accurate, minimal skill required
- Questions/items: ~25-30 items per tool
- Consistency across the different tools
- Specific questions pertinent to the type of study
Tool Can Be Used at Different Levels

Level 1: Checklist
- 25-30 questions (Yes/No)
- Nested sub-questions (Yes/No)

Level 2: Scorecard
- Domains (strength/weakness/neutral/fatal flaw)
- Questions grouped by concept
- Overall scores: Credibility; Relevance

Level 3: Annotated Scorecard
- Items are accompanied by a brief comment explaining why they were assigned a particular score
Status
- Work in progress
- Retrospective and Prospective Observational Study tools
- Task forces to align on common dimensions and the majority of questions

Working Set of Common Dimensions

CREDIBILITY (N=8)
- A priori specification
  - Good scientific rationale
- Study Population
- Study Execution
  - Loss to follow-up
- Measurement
  - Exposure
  - Outcome
  - Covariate
- Data Quality
- Design and Analysis
  - Study Design sub-questions
  - Statistics and Analysis: Sensitivity, Precision/Uncertainty, Confounding, Statistical Interpretation
- Consistency
- Conflict of Interest

RELEVANCE (N=2)
- Design
  - Length of follow-up
- Population
- Comparison Groups
- Endpoints
- Magnitude of differences demonstrated
Tool Can Be Used at Different Levels

Level 1: Checklist
- 31 questions (Yes/No)

Level 2: Scorecard
- 12 domains (strength/weakness/neutral/fatal flaw)
- 31 questions (Yes/No)
- Overall scores: 7 domains for Credibility; 5 domains for Relevance

Level 3: Annotated Scorecard
- Items are accompanied by a brief comment explaining why they were assigned a particular score

Credibility Dimension: A Priori Specification of Study Questions and Analysis Plan
Q1. Did the authors document their study hypotheses/questions and their analysis plan in a formal study protocol (e.g., as stated in the manuscript, or evidenced by IRB review or registration on clinicaltrials.gov)?
Q2. Did their analysis plan include a sample size calculation and justification?
Does this dimension represent a strength, neither a strength nor a weakness, a weakness, or a fatal flaw?
Credibility Dimension: Design and Analysis (Precision and Uncertainty)
Q6. Did the authors describe the uncertainty of their findings through the use of appropriate statistics, such as confidence intervals?
Q7. Were sensitivity analyses described?
Q8. Were confounder-adjusted estimates and their precision reported?
Does this dimension represent a strength, neither a strength nor a weakness, a weakness, or a fatal flaw?

Credibility Dimension: Design and Analysis
Q14. Did the study use multiple statistical approaches?
Q15. Did these different approaches employ a reasonable range of statistical assumptions?
Q16. Did using different approaches lead to differences in the direction of study findings?
Q17. If the direction was similar, did using different approaches lead to differences in the size of the study effect?
Q18. If the direction was different, or if the size of the effect was reduced, did the authors provide a plausible explanation?
Q19. If the authors drew conclusions from one statistical approach versus another, did they justify their choice?
Q20. Did the authors describe confounders that were measured in the study?
Q21. Did the authors describe confounders that were not measured in the study?
Q22. Did the authors employ statistical approaches that address confounding, such as propensity score matching, propensity score regression, instrumental variable approaches, external validation, and multivariable regression?
Credibility Dimension: Design and Analysis (continued)
Q23. Did the authors employ study designs that minimize or account for confounding, such as inception cohorts, new-user designs, multiple comparator groups, matching designs, and assessment of outcomes thought not to be impacted by the therapies being compared?
Q24. Did the authors test the effect of measured confounders on the observed difference in effect?
Q25. Did the authors discuss or estimate the effect of unmeasured confounders on the observed difference in effect?
Does this dimension represent a strength, neither a strength nor a weakness, a weakness, or a fatal flaw?

Relevance Dimension: Population
Q26. Is the population studied comparable to that of your organization (or at least to the subset of individuals to which you will be applying these results)? Some characteristics to consider include demographic variables (gender, age, ethnicity, geographic region), clinical characteristics (disease severity, comorbidities), line of business (Medicare, commercial), size of health plan, etc.
Q27. Are there any unique characteristics of the study population, or your own population, that would limit the relevance of these findings? (e.g., the study shows a drug effect for newly diagnosed diabetics, but individuals in your organization tend to be older and further advanced in their disease status)
Does this dimension represent a strength, neither a strength nor a weakness, a weakness, or a fatal flaw?
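As an illustration of one of the confounding-adjustment approaches named in Q22 above (propensity score matching), the following is a minimal, self-contained sketch using only the Python standard library. The data, function names, and toy effect size are all hypothetical and chosen for illustration; real analyses would use established statistical packages and diagnostics.

```python
import math

def fit_propensity(X, treated, epochs=2000, lr=0.1):
    """Fit a logistic model P(treated | covariates) by batch gradient descent."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, t in zip(X, treated):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            grad[0] += p - t
            for j, xj in enumerate(xi):
                grad[j + 1] += (p - t) * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    def score(xi):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        return 1.0 / (1.0 + math.exp(-z))
    return [score(xi) for xi in X]

def matched_att(scores, treated, outcome):
    """1:1 nearest-neighbor matching on the propensity score; returns the
    average treatment effect on the treated (ATT) across matched pairs."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    diffs = [outcome[i] - outcome[min(controls, key=lambda c: abs(scores[c] - scores[i]))]
             for i, t in enumerate(treated) if t == 1]
    return sum(diffs) / len(diffs)

# Illustrative toy data: one binary covariate that influences both
# treatment assignment and outcome (i.e., a confounder).
X = [[0], [1], [0], [1], [0], [1]]
treated = [0, 0, 1, 1, 0, 1]
outcome = [2 * t + 3 * x[0] for t, x in zip(treated, X)]  # true effect = 2
scores = fit_propensity(X, treated)
print(round(matched_att(scores, treated, outcome), 3))
```

Because each treated unit is compared only to a control with a similar probability of treatment, the covariate's influence on the outcome is balanced out and the matched estimate recovers the treatment effect rather than the confounded raw difference.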
Relevance Dimension: Comparison Groups
Q28. Are the comparison groups used appropriate for the current marketplace and your policy decision?
Does this dimension represent a strength, neither a strength nor a weakness, a weakness, or a fatal flaw?

Relevance Dimension: Endpoints
Q29. Are you willing to make decisions based upon the endpoints measured? True endpoints of interest (e.g., morbidity, mortality, quality of life) are generally given greater weight than surrogate endpoints (e.g., blood pressure for stroke risk, cholesterol reduction for cardiovascular mortality).
Does this dimension represent a strength, neither a strength nor a weakness, a weakness, or a fatal flaw?
Relevance Dimension: Magnitude of Differences Demonstrated
Q31. Are the results (difference demonstrated) considered clinically meaningful and relevant to the policy question at hand?
Does this dimension represent a strength, neither a strength nor a weakness, a weakness, or a fatal flaw?

Questions? Comments?