Up-to-date survival estimates from prognostic models using temporal recalibration

Sarah Booth (1), Mark J. Rutherford (1), Paul C. Lambert (1,2)
(1) Biostatistics Research Group, Department of Health Sciences, University of Leicester, Leicester, UK
(2) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Nordic and Baltic Stata Users Group Meeting, 12th September 2018
Sarah Booth: sb824@le.ac.uk

Overview
- Prognostic models for cancer
- Flexible parametric survival models (stpm2)
- Period analysis (stset)
- Method of temporal recalibration
- Comparison of cohort, recalibrated and period analysis models
- Importance of updating prognostic models

PREDICT: Prognostic Model for Breast Cancer
dos Reis, F. J. C., Wishart, G. C., Dicks, E. M. et al. (2017), An updated PREDICT breast cancer prognostication and treatment benefit prediction model with independent validation, Breast Cancer Research 19(1).
PREDICT Version 2.1 tool available from: http://www.predict.nhs.uk/predict_v2.1/

Flexible Parametric Survival Models
- Unlike the Cox model, parametric models specify the baseline hazard.
- The Weibull model assumes that the log cumulative hazard is linear in ln(t):
  $\ln[H(t \mid x_i)] = \ln(\lambda) + \gamma \ln(t) + x_i \beta$
- Flexible parametric survival models use restricted cubic splines, which allow more complex shapes to be captured.
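To see where the linearity comes from, a short derivation may help. It assumes the proportional hazards parameterisation of the Weibull model, with scale λ and shape γ as on the slide; the survival function S is introduced here only for the derivation.

```latex
\begin{align*}
S(t \mid x_i) &= \exp\{-\lambda t^{\gamma} \exp(x_i \beta)\} \\
H(t \mid x_i) &= -\ln S(t \mid x_i) = \lambda t^{\gamma} \exp(x_i \beta) \\
\ln[H(t \mid x_i)] &= \ln(\lambda) + \gamma \ln(t) + x_i \beta
\end{align*}
```

The last line is a straight line in ln(t) with slope γ. The spline terms of a flexible parametric model relax exactly this restriction.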

Restricted Cubic Splines
[Figure: illustration of restricted cubic splines; Y (0 to 200) plotted against X (0 to 20).]

Flexible Parametric Survival Models
$\ln[H(t \mid x_i)] = \gamma_0 + \gamma_1 z_{1i} + \gamma_2 z_{2i} + \gamma_3 z_{3i} + \dots + x_i \beta$
- $z_{1i}, z_{2i}, \dots$ = derived variables for the restricted cubic splines of ln(t)
- $x_i \beta$ = linear predictor = prognostic index
- Fitted with the stpm2 command in Stata
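For reference, the derived spline variables are usually written as follows. This is the standard restricted cubic spline basis on x = ln(t), given as background rather than taken from the slides. The knot symbols k_1 < ... < k_K are notation assumed here, the subject index i is dropped for readability, and the weights λ_j are unrelated to the Weibull scale λ.

```latex
\begin{align*}
z_1 &= x, \qquad x = \ln(t) \\
z_j &= (x - k_j)_+^3 - \lambda_j (x - k_1)_+^3 - (1 - \lambda_j)(x - k_K)_+^3,
      \qquad j = 2, \dots, K-1 \\
\lambda_j &= \frac{k_K - k_j}{k_K - k_1}, \qquad (u)_+ = \max(u, 0)
\end{align*}
```

With K knots this gives K - 1 spline variables, so df(5) corresponds to two boundary knots and four internal knots. The basis is linear in x before the first knot and after the last, which is the restriction that gives the splines their name.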

Cohort vs Period Analysis
Cohort analysis
- All 4 participants in the example figure would be included in cohort analysis.
- Referred to as complete analysis by Brenner et al. (2009).
Advantages of period analysis
- Creates more up-to-date survival estimates, because people diagnosed many years ago only contribute to long-term survival estimates.
Disadvantages of period analysis
- Reduces sample size.

Temporal Recalibration Method
- Fit a cohort model.
- Use a period analysis sample to recalibrate the model.
- The covariate effects are constrained to be the same.
- The baseline hazard function is allowed to vary, which can capture any improvements in survival.
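In the notation of the flexible parametric model, the two steps can be sketched as below. The hats, the star superscript and the function s(.) for the spline part of the model are notation introduced here for illustration, not taken from the slides.

```latex
\begin{align*}
\text{Cohort model:} \quad
  & \ln[H(t \mid x_i)] = s\{\ln(t); \hat{\gamma}\} + x_i \hat{\beta} \\
\text{Recalibrated model:} \quad
  & \ln[H^{*}(t \mid x_i)] = s\{\ln(t); \hat{\gamma}^{*}\} + x_i \hat{\beta}
\end{align*}
```

Here the covariate effects $\hat{\beta}$ are carried over unchanged from the cohort fit, while the baseline parameters $\hat{\gamma}^{*}$ are re-estimated using only the follow-up time that falls inside the period window.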

Data
- Colon cancer data from the Surveillance, Epidemiology, and End Results (SEER) Program database
- National Cancer Institute: data collected from the United States
- Variables used in this analysis: age at diagnosis, sex, ethnicity
- Survival times are measured in months, but period analysis requires dates, so approximate dates are constructed (diagnosis on the first day of the month, 30.5 days per month of survival):

    . gen dx = mdy(mmdx,1,yydx)
    . format dx %td
    . gen exit = dx+survmm*30.5
    . format exit %td

- mmdx: month of diagnosis
- yydx: year of diagnosis
- survmm: survival time in months
- dx: date of diagnosis
- exit: date of death or censoring

Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) Research Data (1973-2015), National Cancer Institute, DCCPS, Surveillance Research Program, released April 2018, based on the November 2017 submission.

Data Used for Each Model
- Cause-specific survival: deaths due to colon cancer
- Proportional hazards models: used for simplicity, but the method is also possible with time-dependent effects
- Cohort: 63,223 participants, 22,119 deaths
- Period analysis: 39,743 participants, 4,889 deaths
- Observed: 6,300 participants, 2,474 deaths

stset: Cohort

    . stset exit, origin(dx) fail(cancer==1) scale(365.24) ///
    >     exit(time min(dx+10*365.25,mdy(12,31,2005)))

                    id:  id
         failure event:  cancer == 1
    obs. time interval:  (exit[_n-1], exit]
     exit on or before:  time min(dx+10*365.25,mdy(12,31,2005))
        t for analysis:  (time-origin)/365.24
                origin:  time dx

        124,579  total observations
         61,356  observations begin on or after exit
         63,223  observations remaining, representing
         63,223  subjects
         22,119  failures in single-failure-per-subject data
     184,050.03  total analysis time at risk and under observation
                                              at risk from t =         0
                                   earliest observed entry t =         0
                                        last observed exit t =  9.998905

- exit: date of death or censoring
- origin: when people become at risk; dx is the date of diagnosis
- scale(365.24): convert to survival time in years
- fail: event indicator; cancer==1 is death due to colon cancer
- exit(): follow-up until the end of 2005 or for a maximum of 10 years

Model: Cohort

    . stpm2 agercs* female black, scale(hazard) df(5) noorthog eform

    Log likelihood = -73439.283              Number of obs = 63,223

                 |   exp(b)    Std. Err.       z    P>|z|   [95% Conf. Interval]
    xb           |
         agercs1 |  1.012557   .0025474     4.96   0.000   1.007577   1.017563
         agercs2 |   1.00005   7.46e-06     6.68   0.000   1.000035   1.000064
         agercs3 |  .9999177   8.99e-06    -9.15   0.000   .9999001   .9999353
          female |  .9098671   .0125303    -6.86   0.000   .8856366   .9347606
           black |  1.403117   .0286116    16.61   0.000   1.348145    1.46033
           _rcs1 |  12.69938   .6035658    53.48   0.000   11.56984   13.93919
           _rcs2 |  1.150777   .0046616    34.67   0.000   1.141677    1.15995
           _rcs3 |  .8279092   .0097947   -15.96   0.000   .8089329   .8473307
           _rcs4 |  1.009746   .0174485     0.56   0.575   .9761203    1.04453
           _rcs5 |  1.113578   .0115556    10.37   0.000   1.091159   1.136459
           _cons |  308.5041     53.719    32.92   0.000   219.3025   433.9887

    . estimates store cohort
    . range timevar10 0 10 1000
    . predict cohort2006 if yydx==2006, timevar(timevar10) meansurv

- agercs* female black: covariates in the model
- scale(hazard): scale used, e.g. hazards, odds
- df(5): degrees of freedom for modelling the baseline
- noorthog: splines are not orthogonalised (simplifies recalibration)
- eform: display the hazard ratios instead of log hazard ratios
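The meansurv option averages the model-based survival curves over the selected individuals. As a sketch, with N the number of patients diagnosed in 2006 (the if condition in the predict command) and $\hat{S}$ the fitted survival function, both symbols introduced here for illustration:

```latex
\[
\bar{S}(t) = \frac{1}{N} \sum_{i=1}^{N} \hat{S}(t \mid x_i),
\qquad
\hat{S}(t \mid x_i) = \exp\{-\hat{H}(t \mid x_i)\}
\]
```

The average is evaluated at the 1,000 equally spaced times between 0 and 10 years that range stores in timevar10.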

stset: Temporal Recalibration & Period Analysis

    . stset exit, origin(dx) fail(cancer==1) scale(365.24) ///
    >     entry(time mdy(1,1,2004)) exit(time min(dx+10*365.25,mdy(12,31,2005)))

                    id:  id
         failure event:  cancer == 1
    obs. time interval:  (exit[_n-1], exit]
     enter on or after:  time mdy(1,1,2004)
     exit on or before:  time min(dx+10*365.25,mdy(12,31,2005))
        t for analysis:  (time-origin)/365.24
                origin:  time dx

        124,579  total observations
         23,480  observations end on or before enter()
         61,356  observations begin on or after exit
         39,743  observations remaining, representing
         39,743  subjects
          4,889  failures in single-failure-per-subject data
     59,904.493  total analysis time at risk and under observation
                                              at risk from t =         0
                                   earliest observed entry t =         0
                                        last observed exit t =  9.998905
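The entry() option turns this into a delayed-entry (left-truncated) analysis. A sketch of what each person contributes, assuming the usual left-truncated likelihood and using $t_{0i}$, $t_i$ and $d_i$ as illustrative symbols for the entry time, exit time and event indicator:

```latex
\begin{align*}
t_{0i} &= \max\left(0, \frac{\texttt{mdy(1,1,2004)} - \texttt{dx}_i}{365.24}\right) \\
L_i &= \frac{h(t_i \mid x_i)^{d_i} \, S(t_i \mid x_i)}{S(t_{0i} \mid x_i)}
\end{align*}
```

Someone diagnosed at the start of 1998, for example, enters at about 6 years after diagnosis and so informs only the later part of the survival curve.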

Constraints: Temporal Recalibration

    . estimates restore cohort
    (results cohort are active now)
    . local agercs1 = _b[agercs1]
    . local agercs2 = _b[agercs2]
    . local agercs3 = _b[agercs3]
    . local female = _b[female]
    . local black = _b[black]
    . constraint 1 _b[agercs1] = `agercs1'
    . constraint 2 _b[agercs2] = `agercs2'
    . constraint 3 _b[agercs3] = `agercs3'
    . constraint 4 _b[female] = `female'
    . constraint 5 _b[black] = `black'
    . local knots = e(bhknots)
    . local bknots = e(boundary_knots)
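With more covariates, typing one local and one constraint per coefficient becomes tedious. The same step could be written as a loop, as in the sketch below. It has not been run against the SEER data. It assumes the cohort model is stored as cohort, as on the slides, and the macro names i and conlist are hypothetical.

```stata
* Sketch: define one constraint per covariate in a loop.
* Assumes the cohort model was stored with "estimates store cohort".
estimates restore cohort

* i numbers the constraints; conlist (hypothetical name) collects them
local i = 1
local conlist
foreach var of varlist agercs* female black {
    * fix each coefficient at its cohort estimate
    constraint `i' _b[`var'] = `=_b[`var']'
    local conlist `conlist' `i'
    local ++i
}

* keep the knot locations so the recalibrated model uses the same spline basis
local knots  = e(bhknots)
local bknots = e(boundary_knots)

display "constraints defined: `conlist'"
```

The collected list could then be passed to stpm2 as constraints(`conlist') in place of constraints(1 2 3 4 5).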

Model: Temporal Recalibration

    . stpm2 agercs* female black, scale(hazard) noorthog constraints(1 2 3 4 5) ///
    >     bknots(`bknots') knots(`knots') eform
    note: delayed entry models are being fitted

    Log likelihood = -16015.094              Number of obs = 39,743

                 |   exp(b)    Std. Err.       z    P>|z|   [95% Conf. Interval]
    xb           |
         agercs1 |  1.012557  (constrained)
         agercs2 |   1.00005  (constrained)
         agercs3 |  .9999177  (constrained)
          female |  .9098671  (constrained)
           black |  1.403117  (constrained)
           _rcs1 |  23.11036   2.852501    25.44   0.000   18.14443   29.43541
           _rcs2 |  1.201228   .0117535    18.74   0.000   1.178411   1.224486
           _rcs3 |  .7933542   .0212882    -8.63   0.000   .7527083   .8361949
           _rcs4 |  .9970216   .0372468    -0.08   0.936   .9266278   1.072763
           _rcs5 |  1.144055     .02421     6.36   0.000   1.097575   1.192504
           _cons |  2544.921   1128.816    17.68   0.000   1066.887   6070.581

    . predict recalibration2006 if yydx==2006, timevar(timevar10) meansurv

Model: Period Analysis

    . stpm2 agercs* female black, scale(hazard) df(5) eform
    note: delayed entry models are being fitted

    Log likelihood = -16080.35               Number of obs = 39,743

                 |   exp(b)    Std. Err.       z    P>|z|   [95% Conf. Interval]
    xb           |
         agercs1 |  1.004674   .0051795     0.90   0.366   .9945736   1.014877
         agercs2 |  1.000028   .0000157     1.80   0.072   .9999974   1.000059
         agercs3 |  .9999383    .000019    -3.24   0.001    .999901   .9999756
          female |  .9084784   .0266046    -3.28   0.001   .8578025    .962148
           black |  1.441617   .0606779     8.69   0.000   1.327464   1.565587
           _rcs1 |  2.014562   .0187427    75.28   0.000    1.97816   2.051634
           _rcs2 |  1.124344   .0079382    16.60   0.000   1.108892    1.14001
           _rcs3 |  .9535394   .0044961   -10.09   0.000   .9447678   .9623925
           _rcs4 |  1.069052    .003847    18.56   0.000   1.061538   1.076618
           _rcs5 |  1.008206   .0025619     3.22   0.001   1.003198    1.01324
           _cons |  .3234849   .0094079   -38.81   0.000   .3055615   .3424596

    . predict period2006 if yydx==2006, timevar(timevar10) meansurv

Note: noorthog is not specified here, so the _rcs and _cons estimates are based on orthogonalised splines and are not directly comparable with those of the cohort and recalibrated models.

10 Year Marginal Survival

[Figure: proportion alive (y-axis, 0.5 to 1.0) against years since diagnosis (x-axis, 0 to 10), with curves for Observed, Cohort, Recalibrated and Period Analysis.]

Calibration of Models

. predict prognosticindex, xbnobaseline
. xtile calibrationgroup = prognosticindex, n(10)

[Figure: Calibration Plot. Observed survival probability against expected survival probability (both axes 0.3 to 0.7) for Cohort, Recalibration and Period Analysis, with a reference line.]
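One way the points on such a calibration plot could be obtained is sketched below. It assumes the data are stset for the patients being validated and that the stpm2 model of interest is the active estimation result. The variables t10 and s10 are hypothetical names, and evaluating at 10 years is an assumption based on the 10-year follow-up used throughout these slides.

```stata
* Sketch only: assumes the validation data are stset and the stpm2 model
* being assessed is the current estimation result.
predict prognosticindex, xbnobaseline
xtile calibrationgroup = prognosticindex, n(10)

* Expected: mean predicted survival at 10 years within each group.
* t10 and s10 are hypothetical variable names.
generate t10 = 10
predict s10, survival timevar(t10)
tabstat s10, by(calibrationgroup) statistics(mean)

* Observed: Kaplan-Meier estimates at 5 and 10 years within each group.
* Two times are listed because a single number in at() is read as a
* count of time points rather than as a time.
sts list, by(calibrationgroup) at(5 10)
```

Plotting the observed 10-year values against the expected ones, with the line of equality as reference, would give one point per group.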

Risk Groups

[Figure: two panels, Group 5 and Group 2, each showing proportion alive (y-axis, 0.4 to 1.0) against years since diagnosis (x-axis, 0 to 10), with curves for Cohort, Recalibrated, Period and Observed.]

Use of New Data: keeping the original model

- Keep the original model from 1986-1995
- Recalibrate using a period window for the 2 most recent years
- Continue until recalibrating with a period window of 2004-2005

Use of New Data: refitting the model

- Refit the model each year using the most recent 10 years of data
- Recalibrate using a period window for the 2 most recent years
- Continue until recalibrating with a period window of 2004-2005
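The first strategy (keeping the original model and recalibrating it each year) could be sketched as a loop over period windows. This is an illustration under several assumptions: dx, exit and dead are hypothetical variable names; constraints 1 to 5 are assumed to already hold the covariate effects of the original 1986-1995 model; the locals knots and bknots are assumed to hold its knot locations; and the first window (1996-1997) is a guess, since the slides only state that the last window is 2004-2005.

```stata
* Sketch only. Assumes constraints 1-5 and the locals knots/bknots were
* defined from the original 1986-1995 cohort model. The starting year
* and the variable names dx, exit, dead are hypothetical.
forvalues y = 1997/2005 {
    local start = `y' - 1

    * Restrict follow-up to the two most recent calendar years
    stset exit, origin(time dx) failure(dead==1)                  ///
        enter(time mdy(1,1,`start')) exit(time mdy(12,31,`y'))    ///
        scale(365.24)

    * Re-estimate the baseline only; covariate effects stay constrained
    stpm2 agercs* female black, scale(hazard) noorthog            ///
        constraints(1 2 3 4 5) bknots(`bknots') knots(`knots') eform

    estimates store recal`y'
}
```

For the second strategy, the cohort model itself would also be refitted inside the loop on the most recent 10 years of data, and the constraints redefined from that refit before recalibrating.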

10 Year Marginal Survival Predictions

[Figure: proportion alive (y-axis, 0.4 to 1.0) against years since diagnosis (x-axis, 0 to 10), with curves for Observed: Diagnosed in 2006; Original Cohort: 1986-1995; Cohort All Data: 1986-2005; New Cohort: 1996-2005; New Cohort Recalibrated; Original Cohort Recalibrated.]

Summary

- Cohort models often underestimate survival
- Period analysis uses a subset of data to create more up-to-date survival predictions
- Very similar predictions are produced using temporal recalibration, but all the data are used
- These types of models are simple to fit in Stata, using stset to define the sample and stpm2 with constraints to fit the models
- It is important to update models regularly when new data become available
- These methods can also be used for non-proportional hazards models
