QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDICTORS ON THE PERFORMANCE OF A PROGNOSTIC MODEL

1 Gary Collins, Emmanuel Ogundimu, Jonathan Cook, Yannick Le Manach, Doug Altman
Centre for Statistics in Medicine, University of Oxford
20 July 2016
gary.collins@csm.ox.ac.uk

2 Outline
- Existing guidance
- What's done in practice?
- Brief overview of the study sample & simulation set-up
- Findings & Discussion

3 Basis of this presentation

4 Not a new idea

5 It's all in the title
1. Problems in dichotomizing continuous variables (Altman 1994)
2. Dangers of using "optimal" cutpoints in the evaluation of prognostic factors (Altman et al 1994)
3. How bad is categorization? (Weinberg 1995)
4. Seven reasons why you should NOT categorize continuous data (Dinero 1996)
5. Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data (Streiner 2002)
6. Negative consequences of dichotomizing continuous predictor variables (Irwin & McClelland 2003)
7. Why carve up your continuous data? (Owen 2005)
8. Chopped liver? OK. Chopped data? Not OK (Butts & Ng 2005)
9. Categorizing continuous variables resulted in different predictors in a prognostic model for nonspecific neck pain (Schellingerhout et al 2006)

6 It's all in the title
10. Dichotomizing continuous predictors in multiple regression: a bad idea (Royston et al 2006)
11. The cost of dichotomising continuous variables (Altman & Royston 2006)
12. Leave 'em alone: why continuous variables should be analyzed as such (van Walraven & Hart 2008)
13. Dichotomization of continuous data: a pitfall in prognostic factor studies (Metze 2008)
14. Analysis by categorizing or dichotomizing continuous variables is inadvisable: an example from the natural history of unruptured aneurysms (Naggara et al 2011)
15. Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents (Bennette & Vickers 2012)
16. Dichotomizing continuous variables in statistical analysis: a practice to avoid (Dawson & Weiss 2012)
17. The danger of dichotomizing continuous variables: A visualization (Kuss 2013)
18. The anathema of arbitrary categorization of continuous predictors (Vintzileos et al 2014)
19. Ophthalmic statistics note: the perils of dichotomising continuous variables (Cumberland et al 2014)

7 Biologically implausible
[Figure: a prognostic factor (PF) split at a cut-point into "PF not present (low risk)" and "PF present (high risk)"; points labelled A, B, C]
Convoluted Reasoning and Anti-intellectual Pomposity: C.R.A.P. (Norman & Streiner; Biostatistics: the Bare Essentials, 2008)
Slide adapted from Michael Babyak ("Modeling with Observational Data")

8 Still, what happens in practice?
- Breast cancer models (Altman 2009): categorised some/all, 34/53 (64%)
- Diabetes models (Collins et al 2011): categorised some/all, 21/43 (49%)
- General medical journals (Bouwmeester et al 2012): categorised, 30/64 (47%); dichotomised, 21/64 (33%)
- Cancer models (Mallett et al 2010): all categorised/dichotomised, 24/47 (51%)

9 Aim of the study
- Investigate the impact of different approaches for handling continuous predictors on
  - the apparent performance (same data)
  - the validation performance (different data; geographical validation)
- Investigate the influence of sample size on the approach for handling continuous predictors

10 Sample characteristics (THIN)
- CVD: 80,800 events (development); 4688 events (validation)
- Hip fracture: 7721 events (development); 565 events (validation)

11 Models
- Cox models to predict
  - 10-year risk of CVD (men & women)
  - 10-year risk of hip fracture (women only)
- CVD model contained 7 predictors: age, sex, family history, cholesterol, SBP, BMI, hypertension
- Hip fracture model contained 5 predictors: age, BMI, Townsend score, asthma, antidepressants

12 Resampling strategy
MODEL DEVELOPMENT
- The number of events in each sample was fixed at 25, 50, 100, or 2000
- Samples were drawn from those with and without the event (separately)
- 200 samples randomly drawn (with replacement)
MODEL VALIDATION
- All available data were used
  - CVD: n=110,934 (4688 CVD events)
  - Hip fracture: n=61,563 (565 hip fractures)
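The development-sample step can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name, the synthetic cohort, and in particular the `n_nonevents` argument are assumptions, since the slide does not say how many non-events were drawn per sample.

```python
import random

def draw_development_sample(event, n_events, n_nonevents, rng):
    """Return row indices for one development sample.

    event       : list of 0/1 event indicators for the source cohort
    n_events    : number of events wanted (e.g. 25, 50, 100 or 2000)
    n_nonevents : number of non-events wanted (hypothetical parameter;
                  the slide does not state how this was chosen)
    rng         : a random.Random instance
    """
    event_idx = [i for i, e in enumerate(event) if e == 1]
    nonevent_idx = [i for i, e in enumerate(event) if e == 0]
    # Sample each stratum separately, with replacement.
    chosen_events = rng.choices(event_idx, k=n_events)
    chosen_nonevents = rng.choices(nonevent_idx, k=n_nonevents)
    return chosen_events + chosen_nonevents

rng = random.Random(2016)
# Synthetic cohort with roughly 5% events (illustrative only).
event = [1 if rng.random() < 0.05 else 0 for _ in range(10_000)]
# 200 development samples, each with exactly 25 events.
samples = [draw_development_sample(event, 25, 475, rng) for _ in range(200)]
```

Sampling the two strata separately is what guarantees that every one of the 200 samples contains exactly the target number of events.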

13 Approaches considered
- Dichotomised at the
  - median predictor value
  - "optimal" cut-point based on the logrank test
- Categorised into
  - 3 groups (using tertile predictor values)
  - 4 groups (using quartile predictor values)
  - 5 groups (using quintile predictor values)
  - 5-year age categories
  - 10-year age categories
- Linear relationship
- Nonlinear relationship
  - fractional polynomials (FP2; 4 degrees of freedom per predictor)
  - restricted cubic splines (3 knots)
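The quantile-based and fixed-width categorisations listed above could be produced along these lines. This is a sketch using only the Python standard library; the helper names and the example ages are assumptions, and the "optimal" logrank cut-point, fractional polynomials, and splines are not covered.

```python
import statistics
from bisect import bisect_right

def quantile_cutpoints(values, n_groups):
    """Cut-points that split values into n_groups roughly equal-sized groups.

    n_groups=2 gives the median, 3 tertiles, 4 quartiles, 5 quintiles.
    """
    return statistics.quantiles(values, n=n_groups)

def categorise(values, cutpoints):
    """Map each value to a group index in 0..len(cutpoints).

    A value equal to a cut-point falls in the upper group.
    """
    return [bisect_right(cutpoints, v) for v in values]

def age_band(age, width):
    """Lower bound of the fixed-width age category (width 5 or 10 years)."""
    return int(age // width) * width

ages = list(range(30, 85))  # illustrative ages 30..84
dichotomised = categorise(ages, quantile_cutpoints(ages, 2))
tertiles = categorise(ages, quantile_cutpoints(ages, 3))
quintiles = categorise(ages, quantile_cutpoints(ages, 5))
five_year = [age_band(a, 5) for a in ages]
ten_year = [age_band(a, 10) for a in ages]
```

Every approach here replaces the original value with a group label, so two people on either side of a cut-point are treated as different while two people at opposite ends of the same group are treated as identical.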

14 Performance measures calculated
- Calibration
  - Calibration plot
  - Harrell's val.surv function; hazard regression with linear splines
- Discrimination
  - Harrell's c-index
- Clinical utility
  - Decision curve analysis (Vickers & Elkin 2006)
  - Net benefit; weighted difference between true positives and false positives
- D-statistic, Brier score, and R-squared also examined
  - Not reported here, but in the supplementary material of Collins et al Stat Med 2016
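For readers unfamiliar with the discrimination measure, Harrell's c-index can be sketched as below. This is a simple O(n^2) illustration, not the implementation used in the study; the function name and toy data are assumptions, and tied event times are ignored for brevity.

```python
def harrell_c_index(time, event, risk):
    """Harrell's c-index for right-censored data (simplified sketch).

    A pair (i, j) is usable when subject i has the shorter follow-up time
    and experienced the event. The pair is concordant when subject i also
    has the higher predicted risk; ties in predicted risk count as 0.5.
    Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable

# Toy data: follow-up times, event indicators (1 = event), predicted risks.
time = [2, 4, 6, 8]
event = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.5, 0.1]
c = harrell_c_index(time, event, risk)  # 4 of 5 usable pairs concordant
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of risks against observed event times.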

15 Net benefit (recap)
- p_t is the probability threshold to denote high risk
  - Used to weight the FP and FN results
- TP and FP calculated using Kaplan-Meier estimates of the percentage surviving at 10 years among those with predicted risks greater than p_t
- Bottom line: model with highest NB wins
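The net benefit calculation from decision curve analysis (Vickers & Elkin 2006) is NB = TP/n - (FP/n) x p_t/(1 - p_t). The sketch below illustrates it for fully observed binary outcomes; this is a simplifying assumption, since the study estimated TP and FP from Kaplan-Meier estimates to handle censoring, and the function name and toy data are hypothetical.

```python
def net_benefit(outcome, predicted_risk, threshold):
    """Net benefit at probability threshold p_t for binary outcomes.

    NB = TP/n - (FP/n) * p_t / (1 - p_t)

    Simplification: outcomes are assumed fully observed (no censoring),
    whereas the study used Kaplan-Meier estimates at 10 years.
    """
    n = len(outcome)
    # Those with predicted risk greater than the threshold are "high risk".
    tp = sum(1 for y, p in zip(outcome, predicted_risk) if p > threshold and y == 1)
    fp = sum(1 for y, p in zip(outcome, predicted_risk) if p > threshold and y == 0)
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight

# Toy data: 10 people, 3 events.
outcome = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
predicted_risk = [0.30, 0.25, 0.15, 0.05, 0.02, 0.40, 0.08, 0.01, 0.50, 0.03]

nb_model = net_benefit(outcome, predicted_risk, 0.20)
# "Treat all" strategy: everyone is classified as high risk.
nb_treat_all = net_benefit(outcome, [1.0] * len(outcome), 0.20)
```

Comparing `nb_model` with `nb_treat_all` (and with zero, the net benefit of treating no one) at a range of thresholds is what a decision curve displays.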

16 Age & CVD

17 Total serum cholesterol & CVD

18 Age, cholesterol, BMI, SBP & CVD

19 Age, BMI & Hip fracture

20 RESULTS: CVD, 25 events

21 RESULTS: CVD, 50 events

22 RESULTS: CVD, 100 events

23 RESULTS: CVD, 2000 events

24 RESULTS: Hip fracture, 25 events

25 RESULTS: Hip fracture, 50 events

26 RESULTS: Hip fracture, 100 events

27 RESULTS: Hip fracture, 2000 events

28 RESULTS: Discrimination, CVD
- At small sample sizes (25 events)
  - Large difference between apparent and validation performance for optimal dichotomisation: 0.84 (apparent); 0.72 (validation)
  - Smaller differences observed for FP/RCS/linear: 0.84 (apparent); 0.78 (validation)
- Observed difference between dichotomisation (at the median) and linear/FP/RCS
  - Apparent performance: difference of 0.05
  - Validation performance: difference of 0.05
  - Observed over all 4 sample sizes examined
- Negligible differences between linear/FP/RCS

29 RESULTS: Discrimination, Hip fracture
- At small sample sizes (25 events)
  - Large difference between apparent and validation performance for optimal dichotomisation: 0.86 (apparent); 0.76 (validation)
  - FP/RCS/linear: 0.90 (apparent); 0.87 (validation)
- Observed difference between dichotomisation (at the median) and linear/FP/RCS
  - Apparent performance: difference of 0.1
  - Validation performance: difference of 0.1
  - Observed over all 4 sample sizes examined
- Negligible differences between linear/FP/RCS

30 RESULTS: Discrimination, Hip fracture

31 RESULTS: Decision Curve Analysis (CVD only) [higher NB = better model]
[Figure: decision curves; labelled curves: FP/RCS, dichotomisation]

32 RESULTS: Net cases found per

33 Conclusions
- Systematic reviews show that dichotomising/categorising continuous predictors is routinely done when developing a prediction model
- Dichotomising, either at the median or at an "optimal" predictor value, leads to models with substantially poorer performance
  - Poor discrimination; poor calibration; poor clinical utility
- Large discrepancies between apparent performance and validation performance observed for optimal split dichotomising
- The impact of how continuous predictors are handled is more pronounced at smaller sample sizes
