Further data analysis topics
Jonathan Cook
Centre for Statistics in Medicine, NDORMS, University of Oxford
EQUATOR OUCAGS training course, 24th October 2015
Outline
  Ideal study
  Further topics
    Multiplicity
    Subgroups
    Missing data
  Summary
Ideal study
  An ideal clinical study is one where
    Every participant was eligible for the study
    All receive the intervention exactly as desired
    All outcomes are obtained for all participants
    Participants directly map into a definable population and clinical decision
  Analysis of such a study is (reasonably) straightforward, reliable, interpretable and applicable
  In reality?
Man et al., BMJ 2004
Who do we analyse?
  Statistical analysis is premised upon having a representative sample (or that we can get back to such a thing in our analysis)
  Patients, though, may be far from ideal
    Got another treatment before, during or afterwards?
    Might be quite abnormal?
    What about important factors (e.g. age)?
    May have incomplete data
  Who should be included in the analysis?
  What do we do when the outcome is missing?
Multiplicity
  The more you look, the more you will find
Dangers of multiplicity
  Each statistical test typically has a 5% probability of being significant when in reality there is no real difference
    A false positive finding
  With multiple tests the probability of at least one false positive finding rises
  With many tests something is likely to be significant
    May be misinterpreted
    Danger of selective reporting (i.e. publish only the significant results)
Multiple tests
  [Figure: probability of at least one significant result (y-axis, 0 to 1) against number of tests (x-axis, 1 to 91)]
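The shape of this curve can be reproduced with a short calculation. The sketch below assumes that every test is independent, that each is run at a 5% significance level, and that no real differences exist. The function name is illustrative only, and correlated tests (the usual situation in a trial) would give a somewhat lower probability.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives among n independent tests at level alpha."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    # Probability that every test is (correctly) non-significant is (1 - alpha) ** n_tests.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 5, 10, 14, 20, 50, 91):
        print(f"{n:>3} tests: {prob_at_least_one_false_positive(n):.3f}")
```

Under these assumptions the probability of at least one false positive passes 50% once 14 tests have been carried out.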
Sources of multiplicity in RCTs
  DESIGN
    Multiple treatment groups
    Multiple outcome measures
    Multiple follow-up time points
  CONDUCT
    Multiple looks at accumulating data
  PRE-SPECIFY ANALYSIS
    Grouping of continuous or categorical data
    Adjusted or unadjusted
    Subgroups
  Do these all generate the same concerns?
Multiple treatments, multiple time-points
  3 groups = 7 comparisons:
    Global: A1 vs A2 vs B
    Pairwise: A1 vs A2; A1 vs B; A2 vs B; A1+A2 vs B; A1+B vs A2; A2+B vs A1
  3 time-points: 1 month; 3 months; 6 months
  21 possible comparisons
  The trial reported a global analysis of variance at each time-point and a post-hoc multiple comparison test between groups.
  Could take account of all time-points using a more complex model (e.g. multilevel model)
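The count of 21 can be checked by listing the comparisons. This is a minimal sketch; the function names are illustrative, and it enumerates only the comparison types named on this slide (one global test, each pair, and each pooled pair against the remaining group).

```python
from itertools import combinations

GROUPS = ["A1", "A2", "B"]
TIME_POINTS = ["1 month", "3 months", "6 months"]


def list_comparisons(groups):
    """Global test, every pairwise test, and every pooled pair versus the remaining group."""
    comparisons = [" vs ".join(groups)]  # global comparison of all groups
    comparisons += [f"{a} vs {b}" for a, b in combinations(groups, 2)]  # pairwise
    for pooled in combinations(groups, 2):  # two groups pooled against the third
        rest = [g for g in groups if g not in pooled]
        comparisons.append(f"{'+'.join(pooled)} vs {rest[0]}")
    return comparisons


def all_comparisons(groups, time_points):
    """Every comparison repeated at every follow-up time-point."""
    return [(c, t) for t in time_points for c in list_comparisons(groups)]


if __name__ == "__main__":
    print(len(list_comparisons(GROUPS)), "comparisons per time-point")
    print(len(all_comparisons(GROUPS, TIME_POINTS)), "comparisons in total")
```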
Adjusting for multiple testing
  Formal adjustment to control the overall significance level (α) at a desired level (e.g. 0.05) is possible
  Under the Bonferroni procedure, divide α by the number of tests
    Overly conservative (as outcomes/time points are usually correlated)
    Considers all analyses of equivalent importance
  More complex approaches are available but still somewhat simplistic
  A better approach is to think about a hierarchy of testing and take a p-value with a good pinch of salt
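As an illustration, the Bonferroni rule can be written in a few lines. The sketch also includes the Holm step-down procedure as one example of a "more complex" approach. Holm is not named on the slide and is used here only as an assumption about what such an approach might look like. The p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Declare a test significant only if its p-value is at or below alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare ordered p-values with alpha/m, alpha/(m-1), and so on."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p upwards
    significant = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            significant[i] = True
        else:
            break  # once one ordered test fails, all larger p-values fail too
    return significant


if __name__ == "__main__":
    hypothetical_p = [0.010, 0.015, 0.040, 0.300]  # made-up values for illustration
    print("Bonferroni:", bonferroni(hypothetical_p))
    print("Holm:      ", holm(hypothetical_p))
```

With these made-up values, Bonferroni keeps only the smallest p-value as significant while Holm keeps the two smallest, which shows why Bonferroni is described as conservative. Neither method reflects the relative importance of the analyses.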
Dealing with multiplicity
  Limit the number of analyses
    Consider analyses which allow testing of multiple groups
  Prioritise key analyses over others
    Primary versus secondary outcomes
    Hypothesis testing versus hypothesis generating
  Distinguish between planned and posthoc (after the event) analyses
  Interpret similar analyses together, not in isolation
    e.g. if only one of 11 analyses on a single outcome is significant
Why examine subgroups?
  To confirm an observed treatment effect is consistent across all major subgroups
  We suspect in advance that certain features may alter the magnitude of the effect, e.g. age, severity of disease, histological type of tumour
  To identify those for which the treatment does not work
  To identify groups who benefit from the treatment even when the overall result is not significant
  To generate hypotheses for future studies
Subgroup analyses
  What is the question?
    Main analysis (e.g. an RCT looks for a difference in treatments) gives an overall finding
    Subgroup analysis asks if there is evidence that the result (e.g. the treatment effect in an RCT) varies across subgroups
  Examining each subgroup separately is misleading
    Separate tests do not address the right question
    Multiple tests result in a raised false positive rate
    Commonly done!
  Should compare subgroups directly
    Interaction test
Example: HIV Vaccine Trial

Group               Placebo           Vaccine            Relative risk reduction (95% CI)
All volunteers      98/1679 (5.8%)    191/3330 (5.7%)    3.8% (-22.9 to 24.7)
White & Hispanic    81/1508 (5.4%)    179/3003 (6.0%)    -9.7% (-42.8 to 15.7)
Black/Asian/Other   17/171 (9.9%)     12/327 (3.7%)      66.8% (30.2 to 84.2)
  Black             9/111 (8.1%)      4/203 (2.0%)       78.3% (29.0 to 93.3)
  Asian             2/20 (10.0%)      2/53 (3.8%)        68.0% (-129.4 to 95.5)
  Other             6/40 (15.0%)      6/71 (8.5%)        46.2% (-67.8 to 82.8)
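A direct comparison of two subgroups can be sketched with the counts above. This is a crude large-sample z-test on the difference in log relative risk, offered as one possible form of interaction test and not as the method used in the trial. The function names are illustrative, and relative risks computed from the raw counts need not match the reported relative risk reductions, whose estimation method the slide does not state.

```python
import math


def log_rr_and_se(events_trt, n_trt, events_ctl, n_ctl):
    """Log relative risk (treated versus control) and its large-sample standard error."""
    log_rr = math.log((events_trt / n_trt) / (events_ctl / n_ctl))
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return log_rr, se


def interaction_test(subgroup_a, subgroup_b):
    """z statistic and two-sided p-value for a difference in log relative risk."""
    log_rr_a, se_a = log_rr_and_se(*subgroup_a)
    log_rr_b, se_b = log_rr_and_se(*subgroup_b)
    z = (log_rr_a - log_rr_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    return z, p


# Each tuple: (vaccine events, vaccine n, placebo events, placebo n), taken from the table.
WHITE_HISPANIC = (179, 3003, 81, 1508)
BLACK_ASIAN_OTHER = (12, 327, 17, 171)

if __name__ == "__main__":
    z, p = interaction_test(WHITE_HISPANIC, BLACK_ASIAN_OTHER)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

On these counts z comes out at roughly 2.8. Even so, a small interaction p-value carries the same multiplicity caveats as any other test: if this grouping was one of several splits examined after seeing the data, the nominal p-value overstates the evidence.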
HIV Vaccine Trial
"This is the first time we have specific numbers to suggest that a vaccine has prevented HIV infection in humans," said Phillip Berman, inventor of the vaccine and senior vice president of Research and Development at VaxGen (Brisbane, CA), the company that is developing the vaccine. "We're not sure yet why certain groups have a better immune response, but these preliminary results indicate that a surface protein vaccine that stimulates neutralising antibodies correlates with prevention of infection."
[Figures: JAMA headline; Lancet headline]
Missing data & why it occurs
  Patients lost to follow up are very unlikely to be a random subset of all those randomised, as they may fail to return because
    they feel much better or worse
    they failed to comply and feel guilty
    etc.
  Missing data may introduce bias (and undermine the benefit of randomisation if we have done so)
  Also leads to a loss of statistical precision
Missing data & its impact
  Impact depends on the amount missing
    Can be large in some contexts, e.g. smoking cessation
  Credibility will be weakened if many participants are lost to follow up
    Hence the need to know how complete follow up was
  Credibility will particularly suffer if loss to follow up is greater in one group
Missing data in trials Wood et al. Clin Trials 2004
Dealing with missing data
  No fully satisfactory solution
  Assumptions are needed beyond those needed to analyse a full data set
    All approaches make important assumptions
    Those assumptions are largely uncheckable
    Can investigate sensitivity to those assumptions
  Main options
    Ignore & conduct complete case analysis
    Impute
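One crude way to look at sensitivity for a binary outcome is to set the complete case estimate beside two extreme imputations, where every missing outcome is counted in the way most or least favourable to the treatment. The sketch below uses hypothetical counts and illustrative function names, and it is not a recommended analysis in its own right.

```python
def complete_case_rate(successes, failures, missing):
    """Success rate among participants with an observed outcome (missing ignored)."""
    return successes / (successes + failures)


def imputed_rate(successes, failures, missing, missing_as_success):
    """Success rate with every missing outcome set to success or to failure."""
    imputed_successes = successes + (missing if missing_as_success else 0)
    return imputed_successes / (successes + failures + missing)


def risk_difference_range(treatment, control):
    """Return (worst case, complete case, best case) for treatment minus control."""
    complete = complete_case_rate(*treatment) - complete_case_rate(*control)
    best = imputed_rate(*treatment, True) - imputed_rate(*control, False)
    worst = imputed_rate(*treatment, False) - imputed_rate(*control, True)
    return worst, complete, best


# Hypothetical arms: (observed successes, observed failures, missing outcomes)
TREATMENT_ARM = (40, 40, 20)
CONTROL_ARM = (30, 55, 15)

if __name__ == "__main__":
    worst, complete, best = risk_difference_range(TREATMENT_ARM, CONTROL_ARM)
    print(f"worst case {worst:+.3f}, complete case {complete:+.3f}, best case {best:+.3f}")
```

With these made-up counts the extremes run from a small harm to a large benefit, so the bounds are too wide to settle anything. That is the typical result once a moderate share of outcomes is missing.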
Imputing
  Simple imputation
    All missing values set to the same outcome (e.g. best or worst)
    Leads to optimistic or pessimistic results for binary outcomes
    Difficult for continuous data (can use mean or median)
    Leads to overly-precise results
  Common simple imputation approaches
    Best case - worst case
      Generally not helpful
    Last value carried forward
      Popular but problematic
  More complex regression methods
    Assume a relationship between missing and observed data
    Valid analysis if underlying assumptions are correct
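The "overly-precise" point can be seen with mean imputation of a continuous outcome. In the sketch below (hypothetical data, illustrative function names) the filled-in values add nothing to the spread but do increase the sample size, so the standard error shrinks even though no new information has been collected.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical outcome scores; None marks a missing outcome.
OUTCOMES = [4.1, 5.3, None, 6.0, None, 4.8, 5.5, None]

if __name__ == "__main__":
    observed = [v for v in OUTCOMES if v is not None]
    completed = mean_impute(OUTCOMES)
    print(f"complete cases: n={len(observed)}, SE={standard_error(observed):.3f}")
    print(f"mean imputed:   n={len(completed)}, SE={standard_error(completed):.3f}")
```

The estimated mean is unchanged, but the apparent precision improves purely as an artefact of treating imputed values as real observations.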
LOCF (1)
  We have a trial with longitudinal follow-up
    Observations at 2 or more different times
    With no dropouts analysis is straightforward
  Under last observation carried forward (LOCF)
    Where patients have partial data (e.g. they dropped out) we fill in all their missing observations with their last observation
    We analyse this completed data set as if it were the real data set
  Simple and popular, but...
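The filling-in step is mechanically simple, which is part of its appeal. A minimal sketch with a hypothetical participant record and an illustrative function name, where None marks a missed visit:

```python
from typing import List, Optional


def locf(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing value (None) with the most recent observed value.

    Values missing before the first observation stay missing, since there is
    nothing to carry forward.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value  # remember the latest real observation
        filled.append(last)
    return filled


if __name__ == "__main__":
    # Hypothetical scores at four visits; the participant dropped out after visit 2.
    visits = [12.0, 10.5, None, None]
    print(locf(visits))
```

Nothing in the completed record distinguishes the carried-forward values from real measurements, and the analysis then treats them as if they had been observed.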
LOCF (2)
  We make the strong assumption that unseen observations equal the last observation seen
    How plausible?
  We also ignore the uncertainty associated with that assumption
    Imputed data should show more uncertainty than real data, not less!
  Method has bad properties
    Gives biased treatment estimates
    Direction and size of bias depend on the (unknown) true effect
    Tests are biased (over-optimistic) and confidence intervals have the wrong coverage
LOCF (3) Pittler et al. Br J Dermatol 2003
The best solutions to missing data
  Don't have any!
  Design the trial to maximise completeness of data collection
    e.g. systems for chasing people
  Anticipate the possibility of missing data when preparing the protocol and analysis plan
    Pre-specify statistical methods
    Assess sensitivity of the result to assumptions
General strategy: analysis & reporting
  Analysis
    Decisions about which analyses to do and who to include should be made (as far as possible) before viewing data
    Document reasons for missing data and quantify it
    Advisable to do the analysis on everyone relevant, even if there are good reasons to look at a specific subpopulation
    Less analysis is more (consider the threat of multiple comparisons)
  Reporting
    Always clarify who was included in each analysis
    Depict key inclusion decisions in a flow diagram
    Report posthoc as posthoc
    Interpret similar tests together
Summary
  What gets into the analysis affects the validity & credibility of the findings
  Studies should be designed to minimise missing data
  Statistical analyses need careful planning
  Be choosy about analyses (less is more)
  Report what you did clearly, fully and accurately
    As intended
    Not in relation to chance findings
References
  Man WD-C, et al. Community pulmonary rehabilitation after hospitalisation for acute exacerbations of chronic obstructive pulmonary disease: randomised controlled study. BMJ 2004. doi:10.1136/bmj.38258.662720.3a.
  Molnar F, et al. Does analysis using "last observation carried forward" introduce bias in dementia research? CMAJ 2008;179(8):751-3.
  Pittler MH, et al. Randomized, double-blind, placebo-controlled trial of autologous blood therapy for atopic dermatitis. Br J Dermatol. 2003 Feb;148(2):307-13.
  Bender R, Lange S. Adjusting for multiple testing--when and how? J Clin Epidemiol. 2001 Apr;54(4):343-9.
  Dmitrienko A, et al. General Guidance on Exploratory and Confirmatory Subgroup Analysis in Late-Stage Clinical Trials. J Biopharm Stat. 2015 [Epub ahead of print].