Supplementary Information for "Avoidable deaths and random variation in patients' survival" by K Seppä, T Hakulinen and E Läärä
Supplementary Appendix

Relative survival and excess hazard of death

The relative survival ratio $S_R(t) = S(t)/S^*(t)$ was used to measure net survival in the five cancer control regions, i.e., the hypothetical survival in the absence of other causes of death. It is the ratio of the observed survival proportion $S(t)$ of the patients to the expected survival proportion $S^*(t)$, both at time $t$ from diagnosis. The expected survival is derived from a comparable reference population. The observed and expected hazard functions $\lambda(t)$ and $\lambda^*(t)$ are the rates of death per unit of time at time $t$ in the patients and in the reference population, respectively. The excess hazard function of death due to colon cancer, $\gamma(t) = \lambda(t) - \lambda^*(t)$, is the excess rate of death of the patients compared with the death rate in the reference population. The functions $\lambda^*(t)$ and $\gamma(t)$ were derived from the known relations between the survival and hazard functions:

$$S^*(t) = \exp\left\{-\int_0^t \lambda^*(u)\,du\right\} \quad \text{and} \quad S_R(t) = \exp\left\{-\int_0^t \gamma(u)\,du\right\}.$$

The expected hazard of death for patient $i$ in follow-up interval $j$ was determined by the following variables:

- region $r_i$ ($r = 1, \ldots, 5$);
- sex $s_i$ ($s = 0$ for males and $s = 1$ for females);
- calendar year $v_{ij}$ ($v = 2000, \ldots, 2009$);
- age $a_{ij}$ in years ($a = 0, \ldots, 99$).

That is, $\lambda^*_{ij} = \lambda^*_{r_i s_i v_{ij} a_{ij}}$. The expected hazard $\lambda^*_{rsva}$ was estimated by dividing the pertinent number of deaths by the mid-year population count in each stratum of region, sex, calendar year and age. Region-specific population counts and deaths were obtained from Statistics Finland for each calendar year from 2000 to 2009 (Statistics Finland 2011). The numbers of deaths by cancer control region were available only in 5-year age groups. To estimate the region-specific hazards in 1-year age groups, the regional hazard of death was therefore assumed to be proportional to the national hazard of death within each 5-year age group.
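The expected-hazard step can be sketched in Python. This is a minimal sketch under a stated assumption. The appendix says only that the regional hazard is proportional to the national hazard within each 5-year age group. Here the proportionality factor is estimated as observed regional deaths divided by the deaths expected under national 1-year rates, which is indirect standardisation. That estimator, the function names `crude_rates` and `regional_hazards`, and the counts in the usage lines are all hypothetical.

```python
def crude_rates(deaths, population):
    """Hazard per person-year in each 1-year age group: deaths / mid-year population."""
    return {a: deaths[a] / population[a] for a in deaths}


def regional_hazards(national_rate, regional_pop, regional_deaths_5y):
    """Region-specific 1-year-age hazards for one sex and calendar year.

    national_rate      -- {age: national hazard in 1-year age groups}
    regional_pop       -- {age: regional mid-year population in 1-year age groups}
    regional_deaths_5y -- {first age of a 5-year group: regional deaths in that group}

    ASSUMPTION (not stated in the appendix): the proportionality factor of each
    5-year group is observed / expected regional deaths (indirect standardisation).
    """
    hazards = {}
    for start, observed in regional_deaths_5y.items():
        ages = range(start, start + 5)
        # Deaths the region would have had under the national 1-year-age rates.
        expected = sum(national_rate[a] * regional_pop[a] for a in ages)
        factor = observed / expected
        for a in ages:
            # Regional hazard proportional to the national one within the group.
            hazards[a] = factor * national_rate[a]
    return hazards


# Hypothetical illustrative counts (not data from the study).
national_deaths = {60: 100, 61: 110, 62: 120, 63: 130, 64: 140}
national_pop = {a: 10_000 for a in range(60, 65)}
regional_pop = {a: 1_000 for a in range(60, 65)}
rates = crude_rates(national_deaths, national_pop)
hazards = regional_hazards(rates, regional_pop, {60: 66})
print(round(hazards[62], 4))  # 0.0132
```

By construction, the rescaled 1-year hazards reproduce the observed number of regional deaths (here 66) in each 5-year group. They also keep the national age gradient within the group.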
The expected survival of a group of patients was estimated using the Ederer II method (Ederer and Heise 1959; Hakulinen et al, 2011). Traditional direct standardisation by age and sex was used to compare the following measures across the regions: the 5-year relative survival ratios, the excess hazards of death and the expected hazards of death. Standardisation was needed because the age and sex structures of the patients differ somewhat between the regions (Pokhrel and Hakulinen, 2008). The age standardisation was based on five age groups at diagnosis: 0-44, 45-54, 55-64, 65-74 and 75-89 years. The age and sex structure of all patients diagnosed in 2000-2007 was used as the standard.

The excess hazard of death $\gamma_{ij}$ for patient $i$ in follow-up interval $j$ was modelled as a multiplicative function of covariates $z_{il}$ ($l = 1, \ldots, b$), i.e.

$$\gamma_{ij} = \exp\{z_{i1}\beta_1 + \cdots + z_{ib}\beta_b\},$$

where the regression coefficient $\beta_l$ is interpreted as the additive effect of covariate $z_{il}$ on the logarithm of the excess hazard. The model included the following categorical covariates:

- sex;
- cancer control region (5 levels);
- age group (the same five categories as in the age standardisation);
- follow-up time (0-3 months, 4-12 months, and four annual intervals from 1 to 5 years).

Interaction terms between age and follow-up time were included so that the excess hazards could be non-proportional across the age groups. Interaction terms between age and sex were also included in the model.

This relative survival regression model was fitted in the framework of generalized linear models, using exact survival times and individual subject-band observations (Dickman et al, 2004). This implies a Poisson distribution for the death indicator $d_{ij}$ of patient $i$ in interval $j$, with link function $\ln(\mu_{ij} - \lambda^*_{ij} y_{ij})$ and offset $\ln(y_{ij})$. Here $y_{ij}$ is the time at risk of patient $i$ in interval $j$, and $\mu_{ij} = (\lambda^*_{ij} + \gamma_{ij}) y_{ij}$ is the expected value of $d_{ij}$. With this link, $\ln(\mu_{ij} - \lambda^*_{ij} y_{ij}) = \ln \gamma_{ij} + \ln y_{ij}$, i.e. the linear predictor plus the offset.

Numbers of deaths from cancer and from other causes

The numbers of deaths from the target cancer and from all other causes were estimated from the crude probabilities of death due to the cancer and due to other causes, respectively. These crude probabilities were obtained using the theory of competing risks (Chiang 1968, p. 245).
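The study fitted the full model with R's `glm` and a user-defined link (Dickman et al, 2004). To show what the link $\ln(\mu_{ij} - \lambda^*_{ij} y_{ij})$ with offset $\ln(y_{ij})$ does, here is a Python sketch of Fisher scoring for an intercept-only version of the model, $\ln\gamma = \beta$. The general IRLS fit with all covariates and interactions is not reproduced. The function names are hypothetical, and the sketch has no safeguards, for example when the data imply a negative excess hazard.

```python
import math


def score_and_information(beta, d, y, lam_star):
    """Score and expected (Fisher) information for beta when
    d_i ~ Poisson(mu_i) and mu_i = (lam_star_i + exp(beta)) * y_i."""
    gamma = math.exp(beta)
    score = info = 0.0
    for d_i, y_i, l_i in zip(d, y, lam_star):
        mu = (l_i + gamma) * y_i       # expected value of the death indicator
        dmu = gamma * y_i              # d mu / d beta
        score += (d_i - mu) * dmu / mu
        info += dmu * dmu / mu
    return score, info


def fit_excess_hazard(d, y, lam_star, tol=1e-10, max_iter=100):
    """Fisher scoring for an intercept-only excess hazard model.

    Starts from the log of the crude total death rate (needs at least one
    death) and returns (beta_hat, approximate Var(beta_hat)).
    """
    beta = math.log(sum(d) / sum(y))
    for _ in range(max_iter):
        score, info = score_and_information(beta, d, y, lam_star)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, d, y, lam_star)
    return beta, 1.0 / info
```

With all expected hazards set to zero, the estimate reduces to the crude rate $\sum d / \sum y$. A positive $\lambda^*$ lowers the estimated excess hazard accordingly.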
The crude conditional probability that a patient alive at time $x_j$ will die from cancer in interval $j$, i.e. before time $x_{j+1}$, is

$$q^c_j = \int_{x_j}^{x_{j+1}} \exp\left\{-\int_{x_j}^{t} \lambda(u)\,du\right\}\gamma(t)\,dt,$$

where the observed total hazard of death $\lambda(t)$ is expressed as the sum of the expected and the excess hazard: $\lambda(t) = \lambda^*(t) + \gamma(t)$. If the hazards are assumed to be constant within the intervals, the integral equals $\gamma_j\{1 - e^{-(\lambda^*_j + \gamma_j)\Delta_j}\}/(\lambda^*_j + \gamma_j)$. The probability of dying from cancer in interval $j$ can therefore be written as

$$q^c_j = \frac{\gamma_j}{\lambda^*_j + \gamma_j}\, q_j, \qquad (1)$$

where $\lambda^*_j$ and $\gamma_j$ are the interval-specific expected and excess hazards, respectively. The quantity $q_j = 1 - p_j = 1 - \exp\{-(\lambda^*_j + \gamma_j)\Delta_j\}$ is the conditional probability of death in interval $j$ of length $\Delta_j$, given survival until the beginning of the interval. The cumulative crude probability of dying from cancer during the first $k$ intervals is

$$Q^c_k = q^c_1 + p_1 q^c_2 + p_1 p_2 q^c_3 + \cdots + p_1 p_2 \cdots p_{k-1} q^c_k.$$

The number of deaths from cancer accumulated during the first $k$ intervals, $D^c_k$, can be estimated as the sum of the cumulative crude probabilities of the $n$ patients:

$$D^c_k = \sum_{i=1}^{n} Q^c_{ik}.$$

The cumulative crude probability $Q^c_{ik}$ of patient $i$ is obtained by replacing $\gamma_j$ and $\lambda^*_j$ in formula (1) with the individual estimates $\gamma_{ij}$ and $\lambda^*_{ij} = \lambda^*_{r_i s_i v_{ij} a_{ij}}$. Fitted (predicted) values of the excess hazard from the relative survival regression model were used for patient $i$ in each of the first $k$ follow-up intervals. This was done even if the patient's follow-up time was censored or the patient died during those intervals. For $v_{ij} > 2009$, the expected hazard of year 2009 was used.

The number of deaths from other causes accumulated during the first $k$ intervals, $D^o_k$, is obtained in the same way. It uses the probability of dying from causes of death other than the cancer, $q^o_j = \lambda^*_j q_j/(\lambda^*_j + \gamma_j)$. The total number of deaths is the sum of the numbers of deaths from cancer and from other causes, i.e. $D^T_k = D^c_k + D^o_k$.
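The piecewise-constant computation of formula (1) and the cumulative sums can be sketched as below. This is a minimal illustration, assuming strictly positive hazards and interval lengths in the same time unit as the hazards. The function names and the example hazards are hypothetical. In the study, the hazards come from the fitted model and the population tables.

```python
import math


def crude_probabilities(lam_star, gamma, widths):
    """Cumulative crude probabilities of death for one patient over k intervals.

    lam_star, gamma -- expected and excess hazards, constant within each interval
    widths          -- interval lengths Delta_j (same time unit as the hazards)
    Returns (Q_c, Q_o, P_k): cancer, other causes, survival to end of interval k.
    """
    surv = 1.0                     # P_{j-1}: probability of being alive at start of interval j
    q_c_cum = q_o_cum = 0.0
    for ls, g, w in zip(lam_star, gamma, widths):
        total = ls + g             # observed hazard lambda*_j + gamma_j (assumed > 0)
        p = math.exp(-total * w)   # conditional survival p_j
        q = 1.0 - p                # conditional probability of death q_j
        q_c_cum += surv * (g / total) * q    # P_{j-1} * q^c_j, formula (1)
        q_o_cum += surv * (ls / total) * q   # P_{j-1} * q^o_j
        surv *= p
    return q_c_cum, q_o_cum, surv


def expected_numbers_of_deaths(patients):
    """D^c_k, D^o_k and D^T_k as sums of the patients' cumulative crude probabilities."""
    d_c = d_o = 0.0
    for lam_star, gamma, widths in patients:
        q_c, q_o, _ = crude_probabilities(lam_star, gamma, widths)
        d_c += q_c
        d_o += q_o
    return d_c, d_o, d_c + d_o


# Follow-up intervals of the model in years: 0-3 months, 4-12 months, four annual intervals.
WIDTHS = [0.25, 0.75, 1.0, 1.0, 1.0, 1.0]
```

Because $q^c_j + q^o_j = q_j$, the two cumulative probabilities of a patient always add up to $1 - P_{ik}$. This is the probability of dying from any cause within the $k$ intervals.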
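Numbers and proportions of avoidable deaths

Let $D^\kappa_{k,S}$ be the hypothetical number of deaths from cause $\kappa$ accumulated during the first $k$ intervals, where $\kappa = c$ for cancer, $o$ for other causes and $T$ for any cause. It was calculated under three scenarios $S$ (A, B and AB):

- In scenario A, the excess hazard $\gamma_{ij}$ was replaced with the corresponding excess hazard in region 1 (specific to sex, age group and follow-up interval).
- In scenario B, the expected hazard $\lambda^*_{ij}$ was replaced with the corresponding expected hazard in region 2 (matched by sex, calendar year and year of age).
- In scenario AB, $\gamma_{ij}$ and $\lambda^*_{ij}$ were replaced with those in regions 1 and 2, respectively.

The number and proportion of avoidable deaths from cause $\kappa$ accumulated during the first $k$ intervals in scenario $S$ are

$$\mathrm{Diff}^\kappa_{k,S} = D^\kappa_k - D^\kappa_{k,S} \quad \text{and} \quad \mathrm{Prop}^\kappa_{k,S} = \frac{D^\kappa_k - D^\kappa_{k,S}}{D^\kappa_k},$$

respectively, where $D^\kappa_k$ is the true (non-hypothetical) number of deaths from cause $\kappa$ accumulated during the first $k$ intervals.

Variances of the estimators

Let $\alpha_{rsva}$ be the natural logarithm of the expected hazard of death defined by region $r$, sex $s$, calendar year $v$ and year of age $a$, i.e. $\lambda^*_{rsva} = \exp\{\alpha_{rsva}\}$. The following variances were approximated by the delta method (Casella and Berger 2001): that of the number of deaths from cause $\kappa$, and those of the number and proportion of avoidable deaths from cause $\kappa$ in scenario $S$.

$$\mathrm{Var}(D^\kappa_k) \approx \sum_{l,m} \frac{\partial D^\kappa_k}{\partial \beta_l}\frac{\partial D^\kappa_k}{\partial \beta_m}\,\mathrm{Cov}(\hat\beta_l, \hat\beta_m) + \sum_{r,s,v,a}\left(\frac{\partial D^\kappa_k}{\partial \alpha_{rsva}}\right)^2 \mathrm{Var}(\hat\alpha_{rsva}),$$

$$\mathrm{Var}(\mathrm{Diff}^\kappa_{k,S}) \approx \sum_{l,m}\left(\frac{\partial D^\kappa_k}{\partial \beta_l} - \frac{\partial D^\kappa_{k,S}}{\partial \beta_l}\right)\left(\frac{\partial D^\kappa_k}{\partial \beta_m} - \frac{\partial D^\kappa_{k,S}}{\partial \beta_m}\right)\mathrm{Cov}(\hat\beta_l, \hat\beta_m) + \sum_{r,s,v,a}\left(\frac{\partial D^\kappa_k}{\partial \alpha_{rsva}} - \frac{\partial D^\kappa_{k,S}}{\partial \alpha_{rsva}}\right)^2 \mathrm{Var}(\hat\alpha_{rsva})$$

and

$$\mathrm{Var}(\mathrm{Prop}^\kappa_{k,S}) \approx \frac{1}{(D^\kappa_k)^4}\left\{\sum_{l,m}\left(D^\kappa_{k,S}\frac{\partial D^\kappa_k}{\partial \beta_l} - D^\kappa_k\frac{\partial D^\kappa_{k,S}}{\partial \beta_l}\right)\left(D^\kappa_{k,S}\frac{\partial D^\kappa_k}{\partial \beta_m} - D^\kappa_k\frac{\partial D^\kappa_{k,S}}{\partial \beta_m}\right)\mathrm{Cov}(\hat\beta_l, \hat\beta_m) + \sum_{r,s,v,a}\left(D^\kappa_{k,S}\frac{\partial D^\kappa_k}{\partial \alpha_{rsva}} - D^\kappa_k\frac{\partial D^\kappa_{k,S}}{\partial \alpha_{rsva}}\right)^2 \mathrm{Var}(\hat\alpha_{rsva})\right\}.$$

The estimated covariances of the estimates of the $\beta$ parameters were provided by the iterative weighted least squares algorithm used to fit the generalized linear model of relative survival. The variance of the estimated logarithm of the expected hazard of death, $\mathrm{Var}(\hat\alpha_{rsva})$, was estimated by the inverse of the number of deaths in the national population stratified by region, sex, calendar year and age group.

The covariates are 0/1 indicators of categorical variables, so $\partial\gamma_{ij}/\partial\beta_l = I_{\beta_l}(\gamma_{ij})\,\gamma_{ij}$. Using this, the partial derivatives of the numbers of deaths from cancer, from other causes and from any cause with respect to the parameter $\beta_l$ are

$$\frac{\partial D^c_k}{\partial \beta_l} = \sum_{i=1}^{n}\sum_{m=1}^{k} P_{i,m-1}\, q^c_{im}\left\{ I_{\beta_l}(\gamma_{im})\, q_{im}^{-1}\left(\Delta_m \gamma_{im} p_{im} + q^o_{im}\right) - \sum_{j=1}^{m-1} I_{\beta_l}(\gamma_{ij})\,\Delta_j\,\gamma_{ij}\right\},$$

$$\frac{\partial D^o_k}{\partial \beta_l} = \sum_{i=1}^{n}\sum_{m=1}^{k} P_{i,m-1}\, q^o_{im}\left\{ I_{\beta_l}(\gamma_{im})\, q_{im}^{-1}\left(\Delta_m \gamma_{im} p_{im} - q^c_{im}\right) - \sum_{j=1}^{m-1} I_{\beta_l}(\gamma_{ij})\,\Delta_j\,\gamma_{ij}\right\}$$

and

$$\frac{\partial D^T_k}{\partial \beta_l} = \frac{\partial D^c_k}{\partial \beta_l} + \frac{\partial D^o_k}{\partial \beta_l} = \sum_{i=1}^{n} P_{ik}\sum_{m=1}^{k} I_{\beta_l}(\gamma_{im})\,\Delta_m\,\gamma_{im},$$

respectively. The last expression follows because $D^T_k = \sum_{i=1}^{n}(1 - P_{ik})$. The symbols are defined as follows:

- $P_{im} = \prod_{j=0}^{m} p_{ij}$ is the cumulative probability for patient $i$ to survive at least until the end of interval $m$, with $p_{i0} = 1$.
- $q_{im} = 1 - p_{im}$ is the probability of death for patient $i$ in interval $m$.
- $q^c_{im}$ and $q^o_{im}$ are the crude probabilities of dying from cancer and from other causes, respectively, for patient $i$ in interval $m$.
- $\Delta_m$ is the length of interval $m$.
- $I_{\beta_l}(\gamma_{ij})$ is an indicator equal to 1 if regression coefficient $\beta_l$ is included in the predicted excess hazard $\gamma_{ij}$ of patient $i$ in follow-up interval $j$, and 0 otherwise.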
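The delta-method variances of the avoidable deaths can be approximated numerically. The Python sketch below rests on three simplifications that differ from the appendix:

- It uses a hypothetical two-parameter excess hazard model, $\ln\gamma_i = \beta_0 + \beta_1 x_i$, with $x_i = 1$ for patients of a comparison region. In scenario A every patient gets the reference-region excess hazard ($x_i = 0$).
- It treats the expected hazards as known, so only the $\beta$ part of each variance is computed and the $\alpha_{rsva}$ terms are omitted.
- It replaces the analytic partial derivatives with central finite differences, a swapped-in technique rather than the authors' method.

All names are hypothetical.

```python
import math

# Interval lengths in years: 0-3 months, 4-12 months, then four annual intervals.
WIDTHS = [0.25, 0.75, 1.0, 1.0, 1.0, 1.0]


def cancer_deaths(beta, patients, scenario_a=False):
    """D^c_k under the hypothetical model ln(gamma_i) = beta[0] + beta[1] * x_i.

    patients   -- list of (x_i, [lambda*_ij for each interval])
    scenario_a -- if True, every patient gets the reference-region (x = 0) excess hazard
    """
    total = 0.0
    for x, lam_star in patients:
        g = math.exp(beta[0] + beta[1] * (0 if scenario_a else x))
        surv = 1.0
        for ls, w in zip(lam_star, WIDTHS):
            p = math.exp(-(ls + g) * w)
            total += surv * g / (ls + g) * (1.0 - p)   # P_{i,j-1} * q^c_ij
            surv *= p
    return total


def gradient(f, beta, h=1e-6):
    """Central finite-difference gradient (stands in for the analytic derivatives)."""
    grad = []
    for l in range(len(beta)):
        up, down = list(beta), list(beta)
        up[l] += h
        down[l] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def quadratic_form(g, cov):
    """sum over l, m of g_l * g_m * Cov(beta_l, beta_m)."""
    n = len(g)
    return sum(g[l] * g[m] * cov[l][m] for l in range(n) for m in range(n))


def avoidable_cancer_deaths(beta, cov, patients):
    """Diff, Prop and their delta-method variances (beta part only) for scenario A."""
    d = cancer_deaths(beta, patients)
    d_s = cancer_deaths(beta, patients, scenario_a=True)
    g = gradient(lambda b: cancer_deaths(b, patients), beta)
    g_s = gradient(lambda b: cancer_deaths(b, patients, scenario_a=True), beta)
    var_diff = quadratic_form([a - b for a, b in zip(g, g_s)], cov)
    var_prop = quadratic_form([d_s * a - d * b for a, b in zip(g, g_s)], cov) / d**4
    return d - d_s, (d - d_s) / d, var_diff, var_prop
```

In the real analysis, the covariance matrix would come from the fitted model, and the $\alpha_{rsva}$ terms would be added with $\mathrm{Var}(\hat\alpha_{rsva})$ estimated as the inverse of the number of deaths.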
The partial derivatives $\partial D^c_k/\partial\alpha_{rsva}$, $\partial D^o_k/\partial\alpha_{rsva}$ and $\partial D^T_k/\partial\alpha_{rsva}$ have the same form as $\partial D^o_k/\partial\beta_l$, $\partial D^c_k/\partial\beta_l$ and $\partial D^T_k/\partial\beta_l$, respectively, with these substitutions:

- $q^c_{im}$ and $q^o_{im}$ are interchanged;
- $\gamma_{ij}$ is replaced with $\lambda^*_{ij}$;
- $I_{\beta_l}$ is replaced with $I_{\alpha_{rsva}}$.

Here $I_{\alpha_{rsva}}(\lambda^*_{ij}) = 1$ if parameter $\alpha_{rsva}$ is included in the expected hazard $\lambda^*_{ij}$ of patient $i$ in follow-up interval $j$, and 0 otherwise.

The random variation in the expected hazard rates was taken into account when calculating the variances, in addition to the estimated covariance matrix of the estimated regression coefficients of the excess hazard. When the excess hazard itself was estimated, however, the expected hazard rates were treated as essentially free from random error, because they were estimated from large regional populations. Otherwise the relative survival regression model could not have been fitted within the framework of generalized linear models.

Statistical software

The relative survival regression model can easily be fitted with any software that estimates generalised linear models with user-defined link functions (Dickman et al, 2004). We used the R environment for statistical computing and graphics for all analyses (R Development Core Team 2012). First, the glm function was used to fit the relative survival model. Then, the following quantities were estimated using the explicit formulae presented above: the numbers of deaths, the numbers and proportions of avoidable deaths, and their variances. The scripts are available from the first author on request.

References

Casella G, Berger RL (2001) Statistical Inference, 2nd edn. Duxbury Press: Pacific Grove, CA
Chiang CL (1968) Introduction to Stochastic Processes in Biostatistics. Wiley: New York
Dickman PW, Sloggett A, Hills M, Hakulinen T (2004) Regression models for relative survival. Statistics in Medicine 23: 51-64
Ederer F, Heise H (1959) Instructions to IBM 650 programmers in processing survival computations. Methodological note no. 10. End Results Evaluation Section, National Cancer Institute: Bethesda, MD
Hakulinen T, Seppä K, Lambert P (2011) Choosing the relative survival method for cancer survival estimation. European Journal of Cancer 47: 2202-2210
Pokhrel A, Hakulinen T (2008) How to interpret the relative survival ratios of cancer patients. European Journal of Cancer 44: 2661-2667
R Development Core Team (2012) R: A Language and Environment for Statistical Computing, version 2.14.2. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.r-project.org, accessed 19 March 2012
Statistics Finland (2011) StatFin online service. URL http://pxweb2.stat.fi/database/statfin/databasetree_en.asp, accessed 19 March 2012
Supplementary Table 1: Age distributions of colon cancer patients diagnosed in Finland in 2000-2007 by cancer control region. Number of patients (%) by age group at diagnosis (years).

Region   0-44      45-54     55-64      65-74      75-89       All: 0-89
1        177 (5)   309 (9)   677 (20)   945 (28)   1260 (37)   3368 (100)
2        49 (3)    114 (7)   290 (17)   518 (30)   732 (43)    1703 (100)
3        119 (4)   222 (8)   454 (17)   772 (29)   1108 (41)   2675 (100)
4        85 (5)    167 (9)   349 (19)   519 (28)   718 (39)    1838 (100)
5        69 (6)    99 (8)    219 (18)   351 (30)   450 (38)    1188 (100)
Total    499 (5)   911 (8)   1989 (18)  3105 (29)  4268 (40)   10772 (100)

Supplementary Table 2: Age-specific and age-standardised 5-year relative survival ratios (%) for colon cancer patients diagnosed in Finland in 2000-2007 by cancer control region and age group at diagnosis (years). All the estimates were standardised by sex.

Region   0-44   45-54   55-64   65-74   75-89   All: 0-89 (95% CI)
1        78     68      62      63      57      61 (59, 64)
2        79     62      63      63      55      60 (57, 64)
3        80     58      60      60      61      61 (58, 64)
4        84     60      60      58      54      58 (55, 62)
5        75     61      62      61      56      60 (55, 64)
Total    79     63      61      61      57      60 (59, 62)
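Direct standardisation by age can be illustrated with the two tables. The sketch below weights the age-specific ratios of the Total row of Supplementary Table 2 by the age shares of all patients in Supplementary Table 1. This is an approximation under stated assumptions. The appendix standardises by age and sex jointly, while this sketch uses age only. The tabulated age-specific ratios are also already sex-standardised and rounded. The result therefore need not reproduce the tabulated value exactly; here it gives 60.4, against the published 60. The names `STANDARD`, `direct_standardise` and `total_row` are hypothetical.

```python
# Standard population: age distribution of all patients (Supplementary Table 1, Total row).
STANDARD = {"0-44": 499, "45-54": 911, "55-64": 1989, "65-74": 3105, "75-89": 4268}


def direct_standardise(age_specific, standard=STANDARD):
    """Weighted average of age-specific values, weighted by the standard's age shares."""
    n = sum(standard.values())
    return sum(age_specific[g] * count / n for g, count in standard.items())


# Age-specific 5-year relative survival ratios (%), all regions (Table 2, Total row).
total_row = {"0-44": 79, "45-54": 63, "55-64": 61, "65-74": 61, "75-89": 57}
print(round(direct_standardise(total_row), 1))  # 60.4
```

Using the same standard for every region removes differences in age structure between the regions. This is why the standardised ratios in the last column can be compared directly.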