PhD course Survival analysis
Day 3: multivariable Cox regression
Thomas Alexander Gerds

Outline: Multivariable Cox regression
- Presentation of results
- The statistical methods section
- Modelling
- The linear predictor
- Testing for significant effects
- Stratification
- Interaction

Multivariable Cox regression

Why adjust for other variables?
- RCT: avoid attenuation of treatment effects
- Observational studies: avoid confounding
- increase precision of estimates
- improve model fit (e.g. proportional hazards)

Why not adjust for other variables?
- missing values
- measurement error
- if the model changes the interpretation of the parameter of interest when the variable is included
- high correlation with one of the other variables

Prognostic Factors Assessed for 15,096 Patients with Colon Cancer in Stages I and II
(Mroczkowski et al. 2012) World J Surg DOI 10.1007/s00268-012-1531-2
A simple score predicts future cardiovascular events in an inception cohort of dialysis patients
Schwaiger et al. Kidney International (2006) 70, 543-548

Regulatory BCL2 promoter polymorphism (-938C>A) is associated with adverse outcome in patients with prostate carcinoma
Molecular markers predictive of prostate cancer prognosis are urgently needed... We hypothesized that a regulatory BCL2 -938C>A promoter polymorphism, which significantly affects promoter activity and Bcl-2 expression in different malignancies, may influence survival.... Survival analysis showed that the -938AA genotype was an independent, unfavorable prognostic factor for relapse-free survival in a primary cohort of 142 patients and in an independent replication cohort of 148 patients, with hazard ratios (HR) of 4.4 (95% CI, 1.3-15.1; p = 0.018) and 4.6 (95% CI, 1.5-14.2; p = 0.009). Furthermore, the -938AA genotype was independently associated with worse overall survival in the replication series, with a HR of 10.9 (95% CI, 1.2-99.3; p = 0.034)...
Bachmann et al. Int. J. Cancer: 129, 2390-2399 (2011)

Bachmann et al., Table 2
Duration of adjuvant chemotherapy for breast cancer: a joint analysis of two randomised trials investigating three versus six courses of CMF
Colleoni et al. Br J Cancer. 2002 June 5; 86(11): 1705-1714.

Perioperative epidural analgesia for major abdominal surgery for cancer and recurrence-free survival: randomised trial
Myles et al. 2011 BMJ 2011;342:d1491

Terminology: the linear predictor

A multivariable Cox regression model specifies the hazard function for given values of the explanatory variables. The model is a proportional hazards model if all the effects β are independent of time:

\[
\text{hazard} = \alpha_0(t)\,\exp\bigl(\underbrace{\beta_1\,\text{age} + \beta_2\, I(\text{smoking=yes}) + \cdots}_{\text{linear predictor}}\bigr)
\]

The exponentials of the coefficients are called hazard ratios. For example, exp(β_1) is the hazard ratio for a one unit change of age and exp(β_2) is the hazard ratio for smokers compared to non-smokers.

Interpretation of adjusted hazard ratios

The hazard ratio of smoking for a person aged 50 (writing β_age for β_1 and β_smoking for β_2):

\[
\mathrm{HR}_{\text{smoking}}
= \frac{\alpha_0(t)\,\exp(\beta_{\text{age}}\cdot 50 + \beta_{\text{smoking}}\cdot 1)}
       {\alpha_0(t)\,\exp(\beta_{\text{age}}\cdot 50 + \beta_{\text{smoking}}\cdot 0)}
= \frac{\exp(\beta_{\text{age}}\cdot 50)\,\exp(\beta_{\text{smoking}}\cdot 1)}
       {\exp(\beta_{\text{age}}\cdot 50)\,\exp(\beta_{\text{smoking}}\cdot 0)}
= \exp(\beta_{\text{smoking}}).
\]

Interpretation: The effect of smoking on the hazard when comparing two patients of the same age, one of whom smokes and the other does not. The model assumes that the effect of smoking does not change with age.
Interpretation of adjusted hazard ratios

The hazard ratio of age (60 versus 50 years) in the same model:

\[
\mathrm{HR}_{\text{age}}
= \frac{\alpha_0(t)\,\exp(\beta_{\text{age}}\cdot 60 + \beta_{\text{smoking}}\cdot 0)}
       {\alpha_0(t)\,\exp(\beta_{\text{age}}\cdot 50 + \beta_{\text{smoking}}\cdot 0)}
= \exp(\beta_{\text{age}}\cdot 60 - \beta_{\text{age}}\cdot 50 + \beta_{\text{smoking}}\cdot 0 - \beta_{\text{smoking}}\cdot 0)
= \exp(\beta_{\text{age}}\cdot 10).
\]

Interpretation: The effect of a 10 year change of age on the hazard when comparing two patients with the same smoking behavior. The model assumes that the effect of a 10 year change of age is the same for smokers and non-smokers (and does not change with age).

Example: Ovarian cancer

Survival in a randomised trial comparing two treatments for ovarian cancer (from an Eastern Cooperative Oncology Group study)

    futime:    survival or censoring time
    fustat:    censoring status
    age:       in years
    resid.ds:  residual disease present (1=no, 2=yes)
    rx:        treatment group
    ecog.ps:   ECOG performance status

http://ecog.dfci.harvard.edu/~ecogdba/general/perf_stat.html

What happens when we omit (important) variables?

Model with rx, age, ecog.ps and resid.ds:

    Variable       Hazard ratio   95% CI          p-value
    rx = 2         1 (reference)
    rx = 1         2.50           [0.69; 8.98]    0.1616
    age            1.13           [1.03; 1.24]    0.007772
    ecog.ps = 1    1 (reference)
    ecog.ps = 2    1.40           [0.40; 4.94]    0.6016
    resid.ds = 1   1 (reference)
    resid.ds = 2   2.28           [0.49; 10.74]   0.2954

Model with rx and age:

    Variable       Hazard ratio   95% CI          p-value
    rx = 2         1 (reference)
    rx = 1         2.23           [0.65; 7.71]    0.2034
    age            1.16           [1.06; 1.27]    0.00141

Model with rx only:

    Variable       Hazard ratio   95% CI          p-value
    rx = 2         1 (reference)
    rx = 1         1.82           [0.57; 5.74]    0.3096

Eastern Cooperative Oncology Group study (ovarian cancer)

GBSG-II study: Data from 686 women

Predictors
- hormonal therapy: levels no and yes
- age of the patients (years)
- menopausal status: levels pre and post
- tumor size (in mm)
- tumor grade: levels I < II < III
- number of positive nodes
- progrec: progesterone receptor (fmol)
- estrec: estrogen receptor (fmol)

Outcome
- the recurrence free survival time (days)
- the censoring indicator (0 censored, 1 event)
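For the ovarian cancer example, the comparison of the three nested models can be sketched in R as follows. This is a minimal sketch, not the code used for the slides: it assumes the `ovarian` data set that ships with the `survival` package, assumes that `rx`, `ecog.ps` and `resid.ds` are coded as factors with `rx = 2` as reference, and `hr_table` is a hypothetical helper introduced only for illustration.

```r
library(survival)  # the ovarian data set ships with the survival package

d <- ovarian
# Assumption: categorical covariates are coded as factors, with rx = 2 as the
# reference level so that the hazard ratio compares rx = 1 with rx = 2.
d$rx       <- relevel(factor(d$rx), ref = "2")
d$ecog.ps  <- factor(d$ecog.ps)
d$resid.ds <- factor(d$resid.ds)

fit_full   <- coxph(Surv(futime, fustat) ~ rx + age + ecog.ps + resid.ds, data = d)
fit_rx_age <- coxph(Surv(futime, fustat) ~ rx + age, data = d)
fit_rx     <- coxph(Surv(futime, fustat) ~ rx, data = d)

# Hypothetical helper: hazard ratio and 95% Wald limits for one coefficient
hr_table <- function(fit, term) {
  est <- coef(fit)[[term]]
  ci  <- confint(fit)[term, ]
  round(c(HR = exp(est), lower = exp(ci[[1]]), upper = exp(ci[[2]])), 2)
}

# Treatment effect under the three adjustment sets
rbind(full    = hr_table(fit_full,   "rx1"),
      rx_age  = hr_table(fit_rx_age, "rx1"),
      rx_only = hr_table(fit_rx,     "rx1"))

# Hazard ratio for a 10 year difference in age, other covariates held fixed
exp(10 * coef(fit_full)[["age"]])
```

If the coding assumptions hold, the treatment hazard ratios should be close to those in the tables above, and the last line applies the exp(β_age · 10) interpretation to the fitted age coefficient.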
What happens when we add (important) variables?

Model 1:

    Variable     Hazard ratio   95% CI         p-value
    horth=no     1.44           [1.13; 1.84]   0.003602

Model 2:

    Variable     Hazard ratio   95% CI         p-value
    horth=no     1.45           [1.14; 1.86]   0.00284
    tsize        1.02           [1.01; 1.02]   < 0.0001

Model 3:

    Variable     Hazard ratio   95% CI         p-value
    horth=no     1.41           [1.09; 1.81]   0.007875
    age          1.00           [0.99; 1.01]   0.9959
    tsize        1.01           [1.00; 1.01]   0.07548
    grade=iii    1 (reference)
    grade=i      0.36           [0.22; 0.61]   0.0001125
    grade=ii     0.79           [0.60; 1.02]   0.07118
    pnodes       1.05           [1.04; 1.07]   < 0.0001

German breast cancer study group data (GBSG2)

What happens when we log-transform a variable?

Model with tsize:

    Variable     Hazard ratio   95% CI         p-value
    horth=no     1.41           [1.09; 1.81]   0.007875
    age          1.00           [0.99; 1.01]   0.9959
    tsize        1.01           [1.00; 1.01]   0.07548
    grade=iii    1 (reference)
    grade=i      0.36           [0.22; 0.61]   0.0001125
    grade=ii     0.79           [0.60; 1.02]   0.07118
    pnodes       1.05           [1.04; 1.07]   < 0.0001

Model with log(tsize):

    Variable     Hazard ratio   95% CI         p-value
    horth=no     1.40           [1.09; 1.81]   0.008282
    age          1.00           [0.99; 1.01]   0.9769
    log(tsize)   1.35           [1.04; 1.74]   0.0237
    grade=iii    1 (reference)
    grade=i      0.37           [0.22; 0.61]   0.0001296
    grade=ii     0.79           [0.61; 1.03]   0.08439
    pnodes       1.05           [1.04; 1.07]   < 0.0001

German breast cancer study group data (GBSG2)

Testing for significant effects

The aim is to test if the adjusted effect of one variable is zero. Null hypothesis: β_age = 0.

The Wald test:

\[
Z_{\text{Wald}} = \left(\frac{\hat\beta}{\text{asymptotic standard error}(\hat\beta)}\right)^{2} \sim \chi^2_r
\]

The likelihood ratio test:

\[
Z_{\text{LR}} = -2 \log\left[\frac{L_p(\text{reduced})}{L_p(\text{full})}\right] \sim \chi^2_r
\]

Here L_p is the partial likelihood and r is the number of parameters set to zero under the null hypothesis (r = 1 for a single coefficient). R prints the unsquared Wald ratio z, SAS prints its square (Wald chi-square).

Output of R

    summary(coxph(Surv(time, cens) ~ age + menostat + log(tsize) + grade + pnodes, data = GBSG2))

    Call:
    coxph(formula = Surv(time, cens) ~ age + menostat + log(tsize) +
        grade + pnodes, data = GBSG2)

      n= 686, number of events= 299

                      coef exp(coef)  se(coef)      z         Pr(>|z|)
    age          -0.014739  0.985369  0.009009 -1.636           0.1018
    menostatpost  0.290007  1.336437  0.181232  1.600           0.1096
    log(tsize)    0.298601  1.347971  0.131428  2.272           0.0231 *
    gradei       -1.022727  0.359613  0.262396 -3.898 0.00009713032506 ***
    gradeii      -0.255626  0.774431  0.133730 -1.912           0.0559 .
    pnodes        0.050769  1.052080  0.007409  6.853 0.00000000000725 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                 exp(coef) exp(-coef) lower .95 upper .95
    age             0.9854     1.0148    0.9681    1.0029
    menostatpost    1.3364     0.7483    0.9369    1.9064
    log(tsize)      1.3480     0.7419    1.0419    1.7440
    gradei          0.3596     2.7808    0.2150    0.6014
    gradeii         0.7744     1.2913    0.5959    1.0065
    pnodes          1.0521     0.9505    1.0369    1.0675

    Concordance= 0.668  (se = 0.018 )
    Rsquare= 0.107   (max possible= 0.995 )
    Likelihood ratio test= 77.58  on 6 df,   p=0.00000000000001132
    Wald test            = 92.62  on 6 df,   p=0
    Score (logrank) test = 99.13  on 6 df,   p=0
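As a check on how the Wald columns of this output are related, the z statistic, p-value and confidence limits for log(tsize) can be recomputed from the printed coefficient and standard error alone. This is a small sketch using only numbers copied from the output above; the variable names are illustrative.

```r
# Values copied from the R output above for log(tsize)
beta <- 0.298601   # coef
se   <- 0.131428   # se(coef)

z  <- beta / se                                   # Wald ratio, column "z"
p  <- 2 * pnorm(-abs(z))                          # two-sided p-value, column "Pr(>|z|)"
ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)    # 95% limits for the hazard ratio

round(c(z = z, p = p, HR = exp(beta), lower = ci[1], upper = ci[2]), 4)
```

The squared ratio z^2 is the Wald chi-square on 1 degree of freedom, which is the form in which SAS reports the same test.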
Likelihood ratio test for effect of tumor grade (R)

Null hypothesis:

\[
\beta_{\text{tgrade I vs II}} = \beta_{\text{tgrade II vs III}} = \beta_{\text{tgrade I vs III}} = 0
\]

    reduced = coxph(Surv(time, cens) ~ horth + age + log(tsize) + pnodes, data = GBSG2)
    full    = coxph(Surv(time, cens) ~ horth + age + log(tsize) + grade + pnodes, data = GBSG2)
    anova(reduced, full, test = "Chisq")

    Analysis of Deviance Table
     Cox model: response is  Surv(time, cens)
     Model 1: ~ horth + age + log(tsize) + pnodes
     Model 2: ~ horth + age + log(tsize) + grade + pnodes
       loglik Chisq Df P(>|Chi|)
    1 -1755.9
    2 -1747.0 17.76  2 0.0001391 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Output of SAS

    Type 3 Tests

                          Wald
    Effect       DF   Chi-Square   Pr > ChiSq
    age           1       2.6714       0.1022
    menostat      1       2.5537       0.1100
    log_tsize     1       5.1623       0.0231
    tgrade        2      15.4523       0.0004
    pnodes        1      46.9294       <.0001

    Analysis of Maximum Likelihood Estimates

                         Parameter  Standard                           Hazard   95% Hazard Ratio
    Parameter       DF    Estimate     Error  Chi-Square  Pr > ChiSq    Ratio   Confidence Limits
    age              1    -0.01472   0.00901      2.6714      0.1022    0.985     0.968    1.003
    menostat Post    1     0.28961   0.18123      2.5537      0.1100    1.336     0.937    1.906
    log_tsize        1     0.29862   0.13143      5.1623      0.0231    1.348     1.042    1.744
    tgrade   I       1    -1.02244   0.26240     15.1831      <.0001    0.360     0.215    0.602
    tgrade   II      1    -0.25549   0.13373      3.6499      0.0561    0.775     0.596    1.007
    pnodes           1     0.05076   0.00741     46.9294      <.0001    1.052     1.037    1.067

Likelihood ratio test in SAS

    proc phreg data=gbsg2;
      class horth menostat tgrade;
      model time*cens(0)=age menostat log_tsize tgrade pnodes / RISKLIMITS;
      ods output FitStatistics=logl1;
    run;

    proc phreg data=gbsg2;
      class horth menostat tgrade;
      model time*cens(0)=age menostat log_tsize pnodes;
      ods output FitStatistics=logl2;
    run;

    data lrt;
      merge logl1(rename=(withcovariates=ll1))
            logl2(rename=(withcovariates=ll2));
      where Criterion='-2 LOG L';
      lrt=ll2-ll1;
      df=2;
      p=1-probchi(lrt,2);
      keep lrt df p;
    run;

    proc print data=lrt;
    run;

Likelihood ratio test for effect of tumor grade (Stata)

    stset time, failure(cens==1)
    xi: stcox age i.menostat logtsize i.tgrade pnodes
    estimates store model1
    xi: stcox age i.menostat logtsize pnodes
    estimates store model2
    lrtest model1 model2
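The likelihood ratio statistic in the R analysis of deviance table above can be traced back to the two log partial likelihoods, and the same arithmetic underlies the SAS data step (difference of the two -2 LOG L values, referred to a chi-square distribution with 2 degrees of freedom). The following sketch uses only numbers printed in the R output; because the printed log likelihoods are rounded to one decimal, the recomputed statistic agrees with the printed 17.76 only approximately.

```r
# Log partial likelihoods as printed (rounded) in the analysis of deviance table
loglik_reduced <- -1755.9   # model without grade
loglik_full    <- -1747.0   # model with grade

# Likelihood ratio statistic: -2 log(L_reduced / L_full)
lr_rounded <- 2 * (loglik_full - loglik_reduced)

# p-value from the statistic as printed by anova(), 2 degrees of freedom
lr_printed <- 17.76
p_value    <- pchisq(lr_printed, df = 2, lower.tail = FALSE)

c(lr_rounded = lr_rounded, lr_printed = lr_printed, p_value = p_value)
```

With 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so `p_value` equals `exp(-lr_printed / 2)`.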
Stata output

    xi: stcox age i.menostat logtsize i.tgrade pnodes
    estimates store model1

    Cox regression -- Breslow method for ties

    No. of subjects =      686                Number of obs =     686
    No. of failures =      299
    Time at risk    =   771400
                                              LR chi2(6)    =   77.54
    Log likelihood  = -1749.4035              Prob > chi2   =  0.0000

    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .9853844   .0088766    -1.63   0.102     .9681394    1.002937
    _Imenostat_2 |   .7485525   .1356617    -1.60   0.110     .5247558    1.067794
        logtsize |   1.348004    .177172     2.27   0.023     1.041874    1.744083
      _Itgrade_2 |   2.153197   .5318953     3.10   0.002      1.32683    3.494239
      _Itgrade_3 |   2.779972   .7294545     3.90   0.000     1.662219    4.649354
          pnodes |   1.052071   .0077956     6.85   0.000     1.036902    1.067461
    ------------------------------------------------------------------------------

    . lrtest model1 model2

    Likelihood-ratio test                         LR chi2(2)  =     18.37
    (Assumption: model2 nested in model1)         Prob > chi2 =    0.0001

Good questions

What is an independent predictor?
- A factor which is significant in both univariate and multiple Cox regression?
- A factor which is significant in a multiple Cox regression?

How to build the linear predictor?
- remove non-significant variables? (0.05)?
- automated stepwise selection?
- log transform? regression splines?

Should we adjust multivariable Cox analyses for multiple testing?
- Bonferroni-Holm? False-discovery rate?
- Including the p-values from model checks?

Unfortunately (good for me), there are no general answers to these questions.
Whatever you do: write all important steps of modeling in your article!

An incomplete methods section

Randomized clinical trial: high-dose vs. standard-dose proton pump inhibitors for the prevention of recurrent haemorrhage after combined endoscopic haemostasis of bleeding peptic ulcers
Chen et al. 201, to appear in Aliment Pharmacol Ther

A good methods section

Boku et al. 2009 The Lancet Oncology
Multivariable Models in Lancet and New England Journal of Medicine

Purpose: To review the principles of multivariable analysis and to examine the application of multivariable statistical methods in general medical literature.

Data Sources: A computer-assisted search of articles in The Lancet and The New England Journal of Medicine identified 451 publications containing multivariable methods from 1985 through 1989. A random sample of 60 articles that used the two most common methods, logistic regression or proportional hazards analysis, was selected for more intensive review.

Data Extraction: During review of the 60 randomly selected articles, the focus was on generally accepted methodologic guidelines that can prevent problems affecting the accuracy and interpretation of multivariable analytic results.

Results: From 1985 to 1989, the relative frequency of multivariable statistical methods increased annually from about 10% to 18% among all articles in the two journals. In 44 (73%) of 60 articles using logistic or proportional hazards regression, risk estimates were quantified for individual variables ("risk factors"). Violations and omissions of methodologic guidelines in these 44 articles included
- overfitting of data;
- no test of conformity of variables to a linear gradient;
- no mention of pertinent checks for proportional hazards;
- no report of testing for interactions between independent variables;
- unspecified coding or selection of independent variables.
These problems would make the reported results potentially inaccurate, misleading, or difficult to interpret.

Conclusions: The findings suggest a need for improvement in the reporting and perhaps conducting of multivariable analyses in medical research.

Concato et al. 2004 Annals of Internal Medicine

German breast cancer study

[Figure: Kaplan-Meier curves of survival probability against time (0 to 2500 days) for menostat=post and menostat=pre, with numbers at risk below the plot.]

Terminology: stratified baseline hazard

If one is in doubt about the proportional hazards assumption for a single factor in a multivariable analysis, then the stratified Cox model can be a useful alternative:

\[
\text{hazard} =
\begin{cases}
\alpha_{\text{pre meno}}(t)\,\exp(\beta_1\,\text{age} + \beta_2\, I(\text{horth=yes}) + \cdots) \\
\alpha_{\text{post meno}}(t)\,\exp(\beta_1\,\text{age} + \beta_2\, I(\text{horth=yes}) + \cdots)
\end{cases}
\]

Comparison with sub-group analysis:

\[
\text{hazard} =
\begin{cases}
\alpha_{\text{pre meno}}(t)\,\exp(\beta^{\text{pre}}_1\,\text{age} + \beta^{\text{pre}}_2\, I(\text{horth=yes}) + \cdots) \\
\alpha_{\text{post meno}}(t)\,\exp(\beta^{\text{post}}_1\,\text{age} + \beta^{\text{post}}_2\, I(\text{horth=yes}) + \cdots)
\end{cases}
\]

In the stratified model the regression coefficients are shared between the strata and only the baseline hazard differs; in the sub-group analysis both the baseline hazard and the coefficients differ.
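The difference between the stratified model and the sub-group analysis can be sketched in R. This is an illustrative sketch rather than the code behind the slides: it assumes that the `TH.data` package is installed and uses its copy of the GBSG2 data, where the variables are named `horTh` and `tgrade` (the slides use `horth` and `grade`) and the levels of `menostat` are `Pre` and `Post`.

```r
library(survival)
data("GBSG2", package = "TH.data")  # assumption: TH.data is installed

d <- GBSG2
# tgrade is stored as an ordered factor; use a plain factor so that the
# coefficients are comparisons with the first level
d$tgrade <- factor(d$tgrade, ordered = FALSE)

# Stratified Cox model: common coefficients, one baseline hazard per stratum
fit_strat <- coxph(Surv(time, cens) ~ horTh + age + log(tsize) + tgrade + pnodes +
                     strata(menostat), data = d)

# Sub-group analysis: separate coefficients and separate baseline hazards
fit_pre  <- coxph(Surv(time, cens) ~ horTh + age + log(tsize) + tgrade + pnodes,
                  data = subset(d, menostat == "Pre"))
fit_post <- coxph(Surv(time, cens) ~ horTh + age + log(tsize) + tgrade + pnodes,
                  data = subset(d, menostat == "Post"))

# One set of coefficients versus two
round(cbind(stratified = coef(fit_strat),
            pre        = coef(fit_pre),
            post       = coef(fit_post)), 3)
```

The stratified fit has no coefficient for `menostat`: the stratification variable is absorbed into the baseline hazards, so its effect is not estimated as a hazard ratio.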
Stratified analysis in R

Model with a menostat row (menostat adjusted for as a covariate):

    Variable        Hazard ratio   95% CI         p-value
    horth=no        1.43           [1.11; 1.84]   0.005746
    age             0.99           [0.97; 1.01]   0.1703
    log(tsize)      1.36           [1.05; 1.76]   0.02061
    grade=iii       1 (reference)
    grade=i         0.37           [0.22; 0.62]   0.0001568
    grade=ii        0.80           [0.61; 1.04]   0.08961
    pnodes          1.05           [1.04; 1.07]   < 0.0001
    menostat=pre    1 (reference)
    menostat=post   1.39           [0.97; 1.97]   0.07124

Model without a menostat row (presumably the model stratified by menostat):

    Variable        Hazard ratio   95% CI         p-value
    horth=no        1.42           [1.10; 1.83]   0.00649
    age             0.99           [0.97; 1.01]   0.2058
    log(tsize)      1.36           [1.05; 1.76]   0.01942
    grade=iii       1 (reference)
    grade=i         0.38           [0.22; 0.63]   0.0002003
    grade=ii        0.79           [0.61; 1.03]   0.08664
    pnodes          1.05           [1.04; 1.07]   < 0.0001

Effect modification

If there are good reasons to assume that the magnitude of the change in survival due to treatment or another exposure depends on other patient characteristics, then one needs to look into statistical interactions between the explanatory variables.

Such results can be the basis for individualized medicine.

However, a multivariable model which allows all possible interactions will typically be very complex...

Boku et al. (2009) The Lancet Oncology

Example: epo study

Anaemia is a deficiency of red blood cells and/or hemoglobin and an additional risk factor for cancer patients.

Randomized placebo controlled trial: does treatment with epoetin beta (epo, 300 U/kg) enhance the hemoglobin concentration level and improve survival chances?

Henke et al. 2006 identified the c20 expression (erythropoietin receptor status) as a new biomarker for the prognosis of locoregional progression-free survival.

Henke et al. Do erythropoietin receptors on cancer cells explain unexpected clinical findings? J Clin Oncol, 24(29):4708-4713, 2006.
Treatment

The study includes 149 head and neck cancer patients with a tumor located in the oropharynx (36%), the oral cavity (27%), the larynx (14%) or in the hypopharynx (23%).

Main treatment was radiotherapy following

    Resection:   Complete   Incomplete   No
                 35         14           25
                 36         14           25

Note: with non-missing blood values.

Survival part of the epo study

The locoregional progression free survival time in the epo study is the time between treatment and whichever comes first: death of the patient or locoregional progression of the tumour.

Successful epo treatment should improve the survival chances...

[Figure: Kaplan-Meier estimate of locoregional progression free survival for all 149 patients, with point-wise 95% confidence intervals and number of patients at risk.]

Effect of epo treatment

[Figure: Kaplan-Meier curves of locoregional progression free survival for the two treatment groups (75 and 74 patients), with numbers at risk.]
Results from Henke et al. (2006)

[Figure: excerpt from Henke et al. (2006).]

The epo receptor status! New biomarker?

[Figure: Kaplan-Meier curves of locoregional progression free survival by treatment group in two panels, "Receptor: positive" and "Receptor: negative", with numbers at risk.]

Effect of epo treatment

[Figure: Kaplan-Meier curves of locoregional progression free survival by treatment group in three panels, "Resection: none", "Resection: complete" and "Resection: incomplete", with numbers at risk.]

Statistical methods in Henke et al. (2006)

[Figure: excerpt from the statistical methods section of Henke et al. (2006).]
Interaction analysis presented in Henke et al. (2006)

[Figure: excerpt from Henke et al. (2006).]

Interaction between treatment and epo receptor status (R)

    coxph(Surv(ldfs.time, ldfs.status) ~ eporec * Treat + stratum, data= )

    Call:
    coxph(formula = Surv(ldfs.time, ldfs.status) ~ eporec * Treat + stratum, data = )

                                  coef exp(coef) se(coef)       z           p
    eporecpos                  -0.3820     0.682    0.286 -1.3348 0.180000000
    Treat                      -0.0324     0.968    0.347 -0.0933 0.930000000
    stratumincompleteresection  0.2987     1.348    0.274  1.0920 0.270000000
    stratumnoresection          1.2557     3.510    0.229  5.4777 0.000000043
    eporecpos:treat             0.7281     2.071    0.417  1.7445 0.081000000

    Likelihood ratio test=37.6  on 5 df, p=0.000000444
    n= 149, number of events= 109

Terminology: interaction

A statistical interaction is an effect modification: the hazard ratio of one variable depends on the value of another variable.

\[
\text{hazard} = \alpha_0(t)\,\exp\bigl(\beta_1 I(\text{eporec=pos}) + \beta_2 I(\text{Treat=}) + \beta_3 I(\text{Incomp. Resection}) + \beta_4 I(\text{No Resection}) + \beta_5 I(\text{eporec=pos and Treat=})\bigr)
\]

E.g. for a No Resection patient the hazard function is:

    \alpha_0(t)\exp(\beta_4)                                 Treatment at reference level and eporec=negative
    \alpha_0(t)\exp(\beta_1 + \beta_4)                       Treatment at reference level and eporec=positive
    \alpha_0(t)\exp(\beta_2 + \beta_4)                       Treatment at non-reference level and eporec=negative
    \alpha_0(t)\exp(\beta_1 + \beta_2 + \beta_4 + \beta_5)   Treatment at non-reference level and eporec=positive

Interaction between treatment and epo receptor status (R)

The same model can be written with a single four-level factor that combines treatment and receptor status:

    $eporec.Treat = interaction( $Treat, $eporec)
    $eporec.Treat = relevel( $eporec.Treat, ".pos")
    coxph(Surv(ldfs.time, ldfs.status) ~ eporec.Treat + stratum, data= )

    Call:
    coxph(formula = Surv(ldfs.time, ldfs.status) ~ eporec.treat + stratum, data = )

                                 coef exp(coef) se(coef)     z           p
    eporec.treat.neg           -0.314     0.731    0.276 -1.13 0.260000000
    eporec.treat.neg           -0.346     0.707    0.302 -1.15 0.250000000
    eporec.treat.pos           -0.696     0.499    0.241 -2.89 0.003900000
    stratumincompleteresection  0.299     1.348    0.274  1.09 0.270000000
    stratumnoresection          1.256     3.510    0.229  5.48 0.000000043

    Likelihood ratio test=37.6  on 5 df, p=0.000000444
    n= 149, number of events= 109
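The two R outputs above are two parametrisations of the same model (same likelihood ratio test, same stratum coefficients). The link can be checked by arithmetic: the sketch below takes the three coefficients of the `eporec * Treat` model as printed, builds the linear predictor for the four combinations of treatment and receptor status, and re-expresses them relative to the combination "non-reference treatment level, receptor positive". The labels `ref`/`trt` and `neg`/`pos` are illustrative names, not taken from the source.

```r
# Coefficients copied from the output of the eporec * Treat model
b <- c(eporecpos = -0.3820, Treat = -0.0324, interaction = 0.7281)

# Linear predictor (without the stratum terms) for the four combinations
lp <- c(ref_neg = 0,
        ref_pos = b[["eporecpos"]],
        trt_neg = b[["Treat"]],
        trt_pos = b[["eporecpos"]] + b[["Treat"]] + b[["interaction"]])

# Coefficients relative to the combination trt_pos, as in the second output
recoded <- lp - lp[["trt_pos"]]
round(recoded, 3)

# Hazard ratio of treatment within each receptor group
hr_treat_neg <- exp(b[["Treat"]])                        # receptor negative
hr_treat_pos <- exp(b[["Treat"]] + b[["interaction"]])   # receptor positive
round(c(negative = hr_treat_neg, positive = hr_treat_pos), 2)
```

The recoded values agree, up to rounding, with the three `eporec.treat` coefficients of the second output, and the treatment hazard ratio among receptor-positive patients is the reciprocal of the `exp(coef)` value 0.499 printed there.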
Discussion of the study

- Not every head and neck cancer patient will benefit from EPO treatment (which is also costly)
- Independent data would be needed to confirm the Henke et al. findings
- New studies to identify a subgroup of patients that could benefit (negative receptor status?, tumor resection?) are not ethical.
- Meta analysis concluded: "There is even a strong suggestion that erythropoietin negatively influences outcome."

Lambin et al. (2009). The Cochrane collaboration.

Take home messages

- Multivariable models are often superior to unadjusted models
- There are not many generally applicable rules for model building. Model building must be adapted to the subject matter question
- Typical source of error: the same data are used to generate hypotheses and to test them
- If many data dependent steps of modeling were performed, then the finally selected model needs to be interpreted with care
- If in doubt: Consult a statistician :)