Strategies for handling missing data in randomised trials

NIHR statistical meeting, London, 13th February 2012
Ian White, MRC Biostatistics Unit, Cambridge, UK

Plan
1. Why do missing data matter?
2. Popular analysis methods and their assumptions
3. Which methods are best in a RCT?
4. Intention-to-treat analysis strategy for randomised trials with missing outcomes
5. Sensitivity analysis

Work with: James Carpenter & Stuart Pocock (LSHTM), Nick Horton (USA), Simon Thompson (BSU)

Why do missing data matter?
1. Loss of power (cf. power with no missing data): we can't regain lost power
2. Any analysis must make an untestable assumption about the missing data: a wrong assumption gives biased estimates
3. Some popular analyses with missing data get biased standard errors, resulting in wrong p-values and confidence intervals
4. Some popular analyses with missing data are inefficient: confidence intervals are wider than they need be

What to do: loss of power
Can't solve by analysis (but can exacerbate it!)
Approach by design:
- Minimise amount of missing data
  - good communications with participants
  - aim to follow everyone up
  - make repeated attempts using different methods
- Reduce the impact of missing data
  - collect reasons for missing data
  - collect information predictive of missing values

What to do: analysis
A suitable method of analysis would:
- Make the correct assumption about the missing data
- Give an unbiased estimate (under that assumption)
- Give an unbiased standard error (so that P-values and confidence intervals are correct)
- Be efficient (make best use of the available data)
BUT we can never be sure what is the correct assumption, so sensitivity analyses are essential

US report: The Prevention and Treatment of Missing Data in Clinical Trials
Commissioned by the Food & Drug Administration; written by a panel of top statisticians (National Research Council, 2010)
1. Introduction and Background
2. Trial Designs to Reduce the Frequency of Missing Data: focus on estimands (pre-trial)
3. Trial Strategies to Reduce the Frequency of Missing Data
4. Drawing Inferences from Incomplete Data: covers it all
5. Principles and Methods of Sensitivity Analyses: lots of suggestions
6. Conclusions and Recommendations

Section 2: Popular analysis methods and their assumptions
Note: missing data are most commonly in the outcome, but may also occur in baseline covariates

How to approach the analysis
Start by knowing:
- extent of missing data
- pattern of missing data (e.g. how many people with time 1 missing have time 2 observed?)
- predictors of missing data and of outcome
Principled approach to missing data:
- identify a plausible assumption (needs discussion between statisticians and clinicians)
- choose an analysis method that's valid under that assumption
Some analysis methods are simple to describe but have complex and/or implausible assumptions

The analysis toolkit
Simple methods:
- Last observation carried forward (LOCF)
- Complete-case analysis
- Mean imputation
- Missing indicator method
- Regression imputation
More complex methods:
- Multiple imputation
- Likelihood-based methods
- Inverse probability weighting (IPW)

Properties of analysis methods

Method                                    | For missing covariate                | For missing outcome
LOCF                                      | Not applicable                       | OK under LOCF assumption
Complete cases                            | Inefficient                          | Single Y: OK under MAR; repeated Y: inefficient
Mean imputation                           | OK in RCT                            | SE biased
Missing indicator                         | Fails to control confounding in epi  | Not applicable
Regression imputation                     | OK under MAR (no Y in imp. model)    | SE biased
Multiple imputation / Maximum likelihood  | OK under MAR                         | OK under MAR
IPW                                       | Inefficient or complex               | OK under MAR; simple patterns only

"OK" means valid & efficient

Missing at random (MAR)
The probability that data are missing
- may depend on the values of the observed data
- does not depend on the values of the missing data (conditional on the values of the observed data)
Example: blood pressure (BP) data are MAR if
- older individuals are more likely to have their BP recorded (and age is observed and included in the analysis)
- but at any age, individuals with low and high BP are equally likely to have their BP recorded
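The BP example can be made concrete with a small simulation. This is only an illustrative sketch: the population, the age effect on BP and the recording probabilities are all invented for the illustration, not taken from any trial. It shows that under MAR the naive mean of recorded BP is biased, while an estimate that conditions on the observed covariate (age group) is not.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: BP is higher in older people, and the chance
# that BP is recorded depends on age group only, never on BP itself (MAR).
people = []
for _ in range(20000):
    old = random.random() < 0.5               # observed covariate: age group
    bp = random.gauss(140 if old else 120, 10)
    p_recorded = 0.9 if old else 0.4          # depends on age, not on BP
    recorded = random.random() < p_recorded
    people.append((old, bp, recorded))

# Mean over everyone (unknowable in practice) and over recorded values only.
true_mean = statistics.mean(bp for _, bp, _ in people)
naive_mean = statistics.mean(bp for _, bp, rec in people if rec)

# Age-adjusted estimate: mean recorded BP within each age group,
# weighted by that group's share of the whole sample.
adjusted_mean = 0.0
for group in (True, False):
    in_group = [p for p in people if p[0] == group]
    recorded_bp = [bp for _, bp, rec in in_group if rec]
    adjusted_mean += len(in_group) / len(people) * statistics.mean(recorded_bp)

print(round(true_mean, 1), round(naive_mean, 1), round(adjusted_mean, 1))
```

The naive mean over-represents older people, so it is too high; the age-adjusted estimate recovers the full-sample mean because, within an age group, recorded and unrecorded BP values have the same distribution.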

A comment on MAR
- A lot of statistical literature seems to regard MAR as the correct starting point for analyses with missing data
- I think the correct assumption depends on the clinical context
- A general argument in favour of MAR is that it tends to become more plausible as more variables are included in the model

A comment on LOCF
- Assumes last observation is representative of the missing value, i.e. mean change after drop-out is zero
- Can't verify this assumption from the data: it is not implied by "mean change in observed data is zero"
- Analysts rarely give a good justification, and instead justify LOCF (wrongly) on the grounds that
  - it is conservative: not true in general
  - it respects ITT by analysing all individuals
Recall principled approach to missing data:
- identify a plausible assumption
- choose analysis that's valid under that assumption
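To see what LOCF actually assumes, here is a minimal sketch with invented trajectories (the numbers and the helper name locf are hypothetical). Filling the missed visit with the previous value builds in "no change after drop-out", whatever the completers go on to do.

```python
def locf(series):
    """Fill None entries with the most recent observed value, if there is one."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical outcomes at times 0, 1, 2; None marks a missed visit.
completer = [6.0, 8.0, 10.0]
dropout = [6.0, 8.0, None]

filled = locf(dropout)
# Change between time 1 and time 2 implied by LOCF for the drop-out,
# next to the change actually seen in the completer.
locf_change = filled[2] - filled[1]
completer_change = completer[2] - completer[1]
print(filled, locf_change, completer_change)
```

Here the completer improves by 2 units between the last two visits while the LOCF-filled drop-out is forced to improve by 0; whether that is conservative depends on the direction of the trend and on which arm loses more participants.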

Section 3: Which methods are best in a RCT?
In this section I'm going to assume we are working on a trial where we have decided that MAR is a reasonably plausible assumption, or at least a good starting point

Missing outcomes in a RCT under MAR: 1. Single outcome
- Under MAR, cases with missing Y contribute no information, so complete-cases analysis is correct!
- Regress outcome (Y) on randomised group (Z), adjusting for baseline covariates (X)
  - analysis of covariance, ANCOVA
  - this is the likelihood-based method
- Which X?
  - to make MAR valid, adjust for X that predict both outcome and missingness
  - to gain power, adjust for X that predict outcome
- Can improve on complete-cases analysis with composite outcomes or auxiliary information (see later)
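A minimal sketch of complete-case ANCOVA, using simulated data rather than any real trial. The true effect of 1.5, the coefficients and the missingness model are all assumptions made up for the illustration, and ancova_effect is a hypothetical helper name. Missingness depends only on observed X and Z, so MAR holds given the covariates in the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(0.0, 1.0, n)              # baseline covariate X
z = rng.integers(0, 2, n)                # randomised group Z (0 or 1)
y = 2.0 + 1.5 * z + 0.8 * x + rng.normal(0.0, 1.0, n)   # outcome Y

# MAR: the probability that Y is observed depends only on X and Z.
p_observed = 1.0 / (1.0 + np.exp(-(1.0 + 1.0 * x - 0.5 * z)))
observed = rng.random(n) < p_observed

def ancova_effect(y, z, x):
    """Least-squares coefficient of Z in a regression of Y on Z and X."""
    design = np.column_stack([np.ones(len(y)), z, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Complete-case analysis: simply drop participants with missing Y.
effect_cc = ancova_effect(y[observed], z[observed], x[observed])
print(round(float(effect_cc), 2))
```

Because X is in the regression and missingness depends on nothing beyond X and Z, the complete-case estimate should land close to the simulated effect; dropping X from the model would break that argument.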

Missing outcomes in a RCT under MAR: 2. Repeated outcome
Repeated quantitative outcome:
- Use a mixed model (likelihood-based)
  - Include all observed outcome data
  - Exclude any individuals with no post-baseline observations
  - Include X's as before
- Software: Stata xtmixed, SAS proc mixed, R lme()
- There are some pitfalls
  - Don't allow a treatment effect at baseline
  - Allow a different treatment effect at each follow-up time
  - If possible, use unstructured variance-covariance matrix
Repeated binary outcome:
- May be worth using multiple imputation

What about multiple imputation?
Idea of multiple imputation (tutorial: White et al, 2011):
- Impute missing data m times from observed data
- Analyse the m completed data sets
- Combine estimates by Rubin's rules
If imputation model = analysis model, MI is the same as fitting a [mixed] model to the observed data
- but MI has additional random error
- so why do MI?
MI may be of value in a RCT
- if auxiliary information (e.g. compliance or other trial outcomes) can be included in the imputation model
- as a way to do sensitivity analyses
- with composite outcomes
- with repeated binary outcome
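The combining step can be sketched in a few lines. This assumes the standard form of Rubin's rules for a scalar estimate (pooled estimate = mean of the m estimates; total variance = within-imputation variance + (1 + 1/m) x between-imputation variance); the function name and the example numbers are hypothetical.

```python
import math
import statistics

def rubins_rules(estimates, variances):
    """Pool m point estimates and their squared standard errors."""
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)         # average sampling variance
    between = statistics.variance(estimates)    # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Invented results from m = 3 imputed data sets.
estimate, se = rubins_rules([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, round(se, 3))
```

The (1 + 1/m) x between term is the "additional random error" that MI carries relative to fitting the model directly to the observed data; it shrinks as m grows.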

Missing baselines
- Missing baselines in RCTs are a completely different problem from missing outcomes
- Not a source of bias: baseline adjustment is used to gain precision
- Complete cases analysis is a very bad idea
- Almost anything else is OK (White & Thompson, 2005)
  - in particular, mean imputation or missing indicator method are OK provided randomisation is respected
- The above is only true when estimating the effect of a randomised intervention on outcome
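A minimal sketch of mean imputation with a missing indicator for one baseline covariate. The data and the function name are hypothetical, and I am assuming that "respecting randomisation" means the imputed value uses neither the randomised group nor the outcome, so the mean is pooled over both arms.

```python
import statistics

def mean_impute_with_indicator(baseline):
    """Return (imputed values, 0/1 missing indicators) for one covariate.

    The imputed value is the mean of all observed baselines, pooled over
    arms, so it carries no information about randomised group or outcome.
    """
    observed = [v for v in baseline if v is not None]
    overall_mean = statistics.mean(observed)
    imputed = [overall_mean if v is None else v for v in baseline]
    missing = [1 if v is None else 0 for v in baseline]
    return imputed, missing

# Invented baseline scores; None marks a missing baseline.
baseline = [38.0, None, 45.0, 41.0, None]
imputed, missing = mean_impute_with_indicator(baseline)
print(imputed, missing)
```

Either the imputed covariate alone, or the imputed covariate together with the indicator, can then be used for baseline adjustment, which keeps every randomised participant with an observed outcome in the analysis.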

Section 4: Intention-to-treat analysis strategy for randomised trials with missing outcomes

Intention-to-treat (ITT) principle
Include everyone randomised in the group to which they were assigned (whether or not they completed the intervention)
What does ITT mean with missing outcome data?
- "The statistical analysis of a clinical trial generally requires the imputation of values to those data that have not been recorded" (CPMP, 2001)
- "Although those participants [who drop out] cannot be included in the analysis, it is customary still to refer to analysis of all available participants as an intention-to-treat analysis" (Altman et al, 2001)
- "Full set analysis generally requires the imputation of values or modelling for the unrecorded data" (Eur. Medicines Agency, 2010)
- "We replaced mention of 'intention to treat' analysis, a widely misused term, by a more explicit request for information about retaining participants in their original assigned groups" (CONSORT, 2010)

Difficulties with ITT
Including all randomised individuals in the analysis isn't enough to make an analysis valid
The desire to include all randomised individuals in the analysis
- reduces emphasis on the appropriate assumptions
- leads to uncritical use of simple imputation methods, esp. Last Observation Carried Forward (LOCF)
- leads to unnecessary use of complex methods, esp. multiple imputation
- biases against MAR-based analyses

Strategy for intention to treat analysis with incomplete observations (White et al, BMJ, 2011)
1. Attempt to follow up all randomised participants, even if they withdraw from allocated treatment
2. Perform a main analysis of all observed data that is valid under a plausible assumption about the missing data
3. Perform sensitivity analyses to explore the effect of departures from the assumption made in the main analysis
4. Account for all randomised participants, at least in the sensitivity analyses

Example: QUATRO trial
European multicentre RCT to evaluate the effectiveness of adherence therapy in improving quality of life for people with schizophrenia (Gray et al, 2006)
Primary outcome: quality of life measured by the SF-36 MCS scale at baseline and 52-week follow up

                            | Intervention | Control
Total n                     | 204          | 205
Missing outcome             | 14%          | 6%
Mean of observed outcomes   | 40.2         | 41.3
SD of observed outcomes     | 12.0         | 11.5

QUATRO trial: ITT analysis strategy
1. We did attempt to follow up all randomised individuals
2. Main assumption: no difference between missing and observed values, once adjusted for baseline variables (MAR)
   Main analysis: analysis of covariance on complete cases; intervention effect = -0.33 (s.e. 1.11)
3. Sensitivity analysis: consider possible differences between missing and observed values, allowed to be different in each arm (coming next)
4. All randomised individuals were included in the sensitivity analyses

Section 5: Sensitivity analysis

How to do sensitivity analyses?
Not "LOCF for main analysis, CC for sensitivity analysis"
[Figure: three panels (LOCF, CC, MAR-based) plotting mean outcome against time (0, 1, 2) for participants who complete and participants who drop out at time 1]

How to do sensitivity analyses?
Not "LOCF for main analysis, CC for sensitivity analysis"
Instead, specify the numerical value of sensitivity parameter(s) governing the degree of departure from the main assumption (Kenward et al, 2001), e.g. the degree of departure from MAR: "principled sensitivity analysis"
My approach:
- let δ = (mean of missing data) - (mean of observed data), so δ = 0 is MAR
- get plausible range of δ from subject matter
- vary δ in both arms
- vary δ in one arm (δ = 0 in other arm)
Methods: White et al (2007) or rctmiss software
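A back-of-envelope sketch of how δ moves the estimate. This is not the method of White et al (2007) or the rctmiss software; it is a rough approximation that assumes each arm's mean shifts by (proportion missing) x δ, so the intervention effect shifts by the difference of those two terms. It ignores covariate adjustment and does not recompute standard errors. The inputs are the QUATRO summary figures quoted in the talk (MAR-based effect -0.33; 14% and 6% missing); the function name is hypothetical.

```python
def shifted_effect(mar_effect, p_miss_int, p_miss_ctl, delta_int, delta_ctl):
    """Approximate intervention effect when the missing values in each arm
    differ from the observed ones by delta on average.

    Each arm mean moves by (proportion missing) * delta, so the difference
    between arms moves by p_miss_int * delta_int - p_miss_ctl * delta_ctl.
    """
    return mar_effect + p_miss_int * delta_int - p_miss_ctl * delta_ctl

MAR_EFFECT = -0.33               # main analysis (complete-case ANCOVA)
P_INT, P_CTL = 0.14, 0.06        # proportion with missing outcome, by arm

delta = -10.0                    # missing values 10 points worse than observed
intervention_only = shifted_effect(MAR_EFFECT, P_INT, P_CTL, delta, 0.0)
both_arms = shifted_effect(MAR_EFFECT, P_INT, P_CTL, delta, delta)
control_only = shifted_effect(MAR_EFFECT, P_INT, P_CTL, 0.0, delta)
print(round(intervention_only, 2), round(both_arms, 2), round(control_only, 2))
```

With δ = 0 in both arms the function returns the MAR estimate unchanged; the asymmetry between "intervention only" and "control only" comes from the different amounts of missing data in the two arms.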

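Example: QUATRO data
[Figure: estimated intervention effect (vertical axis, -4 to 2) plotted against δ = missing - observed (horizontal axis, -10 to 0), with separate lines for departures from MAR in the intervention arm only, both arms, and the control arm only; δ = 0 corresponds to MAR. Outcome SD is 12.]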

Conclusions & discussion
- Missing baselines: use simple methods that respect randomisation
- Missing outcomes: focus on assumptions, not methods
  - ANCOVA and mixed models are often the best strategy for missing outcomes in RCTs
  - use MI with auxiliary data (e.g. compliance) or possibly as a way to do sensitivity analyses
- An intention-to-treat analysis strategy should include all individuals in sensitivity analyses but not necessarily in main analyses
- Sensitivity analyses can be done in various ways
  - install my software rctmiss in Stata using net from http://www.mrc-bsu.cam.ac.uk/iw_stata/missing

References
Altman D et al (2001). The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Annals of Internal Medicine 134: 663-694.
Committee for Proprietary Medicinal Products (2001). Points to consider on missing data. http://www.emea.europa.eu/pdfs/human/ewp/177699en.pdf
European Medicines Agency (2010). Guideline on missing data in confirmatory clinical trials. http://www.ema.europa.eu/ema/pages/includes/document/open_document.jsp?webcontentid=wc500096793
Gray R et al (2006). Adherence therapy for people with schizophrenia: European multicentre randomised controlled trial. British Journal of Psychiatry 189: 508-514.
Kenward MG, Goetghebeur EJT, Molenberghs G (2001). Sensitivity analysis for incomplete categorical tables. Statistical Modelling 1: 31-48.
National Research Council (2010). The Prevention and Treatment of Missing Data in Clinical Trials. http://www.nap.edu/catalog.php?record_id=12955
Schulz KF, Altman DG, Moher D (2010). CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ 340: c332.
White IR, Horton N, Carpenter J, Pocock SJ (2011). An intention-to-treat analysis strategy for randomised trials with missing outcome data. BMJ 342: d40.
White IR, Thompson SG (2005). Adjusting for partially missing baseline measurements in randomised trials. Statistics in Medicine 24: 993-1007.
White IR, Wood A, Royston P (2011). Multiple imputation using chained equations: issues and guidance for practice. Statistics in Medicine 30: 377-399.