Cancer Incidence Predictions (Finnish Experience)
Tadeusz Dyba
Joint Research Center
EPAAC Workshop, January 22-23 2014, Ispra
Rationale for making cancer incidence predictions
- Administrative: to plan the allocation of resources (predictions should then be as accurate as possible)
- Scientific: to evaluate the success of disease control (predictions that do not come true are then useful)
Example of administrative prediction: Updating cancer registry data (Annual Report of Finnish Cancer Registry 2006/2007)
Example of scientific prediction:
Precision of prediction
"Predictions are often given as point forecasts with no guidance as to their likely accuracy."
"Given their importance, it is perhaps surprising and rather regrettable, that many [ ] do not regularly produce prediction intervals, and that most predictions are still given as a single value."
Chris Chatfield, Time-Series Forecasting, Chapman & Hall, 2001.
Why use prediction intervals in incidence predictions?
- Monitoring a range of possible future outcomes
- Evaluation of cancer prevention actions
- Some predictions are more accurate than others
- Elimination of clearly inaccurate predictions
Calculating prediction intervals
Using the formula for the variance of a conditional distribution,
\[ \operatorname{var}(c_{it}) = \operatorname{var}\big(E(c_{it} \mid \theta_{it})\big) + E\big(\operatorname{var}(c_{it} \mid \theta_{it})\big), \]
it can be shown for any model that
\[ \operatorname{var}(c_T) = \operatorname{var}(\theta_T) + E(\theta_T), \]
where $\theta_{it}$ is the estimator of the predicted number of cases $c_{it}$.
uncertainty about the predicted number of cases = uncertainty about the parameters of the model + uncertainty about the future distribution
- Confidence interval: reflects only the uncertainty about the parameters of the model
- Prediction interval: reflects both sources of uncertainty
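The variance formula on this slide can be turned into an approximate interval for a future number of cases. The sketch below is illustrative only: it assumes a Poisson distribution for the future count and a normal approximation for the interval, and the function names and the input numbers are hypothetical.

```python
import math

def confidence_interval(theta_hat, var_theta_hat, z=1.96):
    """Interval for the expected count only (parameter uncertainty)."""
    half_width = z * math.sqrt(var_theta_hat)
    return theta_hat - half_width, theta_hat + half_width

def prediction_interval(theta_hat, var_theta_hat, z=1.96):
    """Approximate interval for a future Poisson count.

    theta_hat     -- predicted expected number of cases
    var_theta_hat -- estimated variance of theta_hat
    z             -- standard normal quantile (1.96 for a nominal 95% interval)
    """
    # var(c) = var(theta_hat) + E(theta): parameter uncertainty plus Poisson variation
    var_pred = var_theta_hat + theta_hat
    half_width = z * math.sqrt(var_pred)
    return theta_hat - half_width, theta_hat + half_width

# Hypothetical numbers: 400 predicted cases, estimator variance 225
ci = confidence_interval(400.0, 225.0)
pi = prediction_interval(400.0, 225.0)
print("confidence interval:", ci)
print("prediction interval:", pi)
```

With these inputs the prediction interval is wider than the confidence interval, because the Poisson term E(θ) is added to the variance of the estimator.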
Why use simple models for cancer incidence prediction?
- Rule of parsimony (a "status quo" assumption lies behind any prediction)
- Complicated models are not likely to hold in the future
- Lack of information, or of reliable information, about the causes of cancers
- Shorter prediction intervals; more precise predictions (if the model holds)
- Clear interpretation
AGE PERIOD MODELS
$I_{it} = c_{it} / n_{it}$, where $c_{it}$ = number of cases, $n_{it}$ = number of person-years, $i$ = age, $t$ = period
Decreasing trends of cancer incidence:
\[ \ln E(I_{it}) = \alpha_i + \beta t \quad\text{or}\quad \ln E(I_{it}) = \alpha_i + \beta_i t \]
Increasing trends of cancer incidence:
\[ E(I_{it}) = \alpha_i + \beta_i t \quad\text{or}\quad E(I_{it}) = \alpha_i (1 + \beta t) \]
No non-identifiability property
Assumptions: $c_{it}$ Poisson; age-adjusted $I_{it}$ Normal
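As an illustration of how one of these models might be fitted, the sketch below estimates E(I_it) = α_i(1 + βt) by Poisson maximum likelihood. It is not the Stata implementation referred to in this talk: the function names, the golden-section search over β, the search range and the noise-free data are all assumptions made for the example.

```python
import numpy as np

def profile_loglik(beta, cases, pyears, t):
    """Poisson log-likelihood (up to a constant) with alpha_i profiled out."""
    trend = 1.0 + beta * t                                    # shape (T,)
    # For fixed beta, the MLE of alpha_i is cases / trend-weighted person-years
    alpha = cases.sum(axis=1) / (pyears * trend).sum(axis=1)
    mu = pyears * alpha[:, None] * trend                      # expected cases, shape (A, T)
    return float((cases * np.log(mu) - mu).sum()), alpha

def fit_age_period(cases, pyears, t, lo=0.0, hi=1.0, tol=1e-8):
    """Maximise the profile likelihood over beta by golden-section search."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1, x2 = b - g * (b - a), a + g * (b - a)
        f1 = profile_loglik(x1, cases, pyears, t)[0]
        f2 = profile_loglik(x2, cases, pyears, t)[0]
        if f1 < f2:
            a = x1      # maximum lies to the right of x1
        else:
            b = x2      # maximum lies to the left of x2
    beta = (a + b) / 2.0
    return profile_loglik(beta, cases, pyears, t)[1], beta

# Hypothetical data: 3 age groups, 6 periods, 100000 person-years per cell
t = np.arange(6, dtype=float)
pyears = np.full((3, 6), 1e5)
true_alpha, true_beta = np.array([1e-4, 5e-4, 2e-3]), 0.05
cases = pyears * true_alpha[:, None] * (1.0 + true_beta * t)  # noise-free expected counts

alpha_hat, beta_hat = fit_age_period(cases, pyears, t)
rate_next = alpha_hat * (1.0 + beta_hat * 6.0)                # predicted rates for period t = 6
print(beta_hat, rate_next)
```

Profiling out the age parameters leaves a one-dimensional search, which keeps the example short; a general-purpose GLM routine would be the more usual choice in practice.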
Empirical coverage error of ex post predictions
Finnish data (1953-2003)
Site                Females   Males
Lip                 95        95
Oesophagus          95        95
Stomach             84        100
Colon               97        73
Rectum              89        97
Liver               78        65
Gallbladder         76        65
Pancreas            89        97
Lung                84        31
Corpus uteri        86        ---
Ovary               86        ---
Kidney              84        78
Bladder             76        70
Skin melanoma       89        86
Skin non-melanoma   68        81
Nervous system      76        78
Thyroid             68        86
Non-Hodgkin         85        86
Leukaemia           81        92
Cancer sites without screening activities; horizon of prediction = 5 years; 32 ex post predictions per site
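A minimal sketch of how empirical coverage of ex post predictions could be computed: each past prediction interval is compared with the value that was later observed. The function name and the four data points are hypothetical and are not taken from the Finnish data.

```python
def empirical_coverage(observed, lower, upper):
    """Percentage of ex post predictions whose interval contains the observed value."""
    if not observed or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= obs <= up for obs, lo, up in zip(observed, lower, upper))
    return 100.0 * hits / len(observed)

# Hypothetical ex post check: four past predictions and the counts later observed
observed = [120, 135, 150, 190]
lower = [100, 110, 125, 140]
upper = [140, 150, 165, 180]
coverage = empirical_coverage(observed, lower, upper)
print(coverage)
```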
Site specific predictions
- Lung cancer: Hakulinen T and Pukkala E, Int J Epidemiol, 1981. A simulation model to predict lung cancer incidence in Finland, based on historical smoking habits and possible future scenarios of starting and quitting smoking.
- Breast cancer: Seppanen et al., Cancer Causes Control, 2006. Predicting breast cancer incidence under historical and possible future scenarios of screening practices in Finland.
Examples of predictive models for Finland based on age-period-region specific data
- cancer control is by region in Finland
- stratification by region increases the homogeneity of the data, eliminating extra-Poisson variation for the more common cancers
No.  Model                                   D.F.  Pearson's $X^2$  Deviance  Dev. + 2*NP
1    $\alpha_i$                              702   991.2            1044.6    1070.6
2    $\alpha_i (1 + \beta t)$                701   983.2            1037.8    1065.8
3    $\alpha_i (1 + \gamma_r + \beta t)$     697   701.6            724.6     760.6
4    $\alpha_i (1 + \gamma_r + \beta_r t)$   693   695.9            718.9     762.9
5    $\alpha_i + \beta_i t$                  689   966.6            1024.5    1076.5
6    $\alpha_{ir}$                           650   644.7            677.9     807.9
7    $\alpha_{ir} (1 + \beta t)$             649   637.0            671.1     803.1
8    $\alpha_{ir} (1 + \beta_r t)$           645   634.2            667.0     807.0
9    $\alpha_{ir} + \beta_i t$               638   619.8            657.4     811.4
10   $\alpha_{ir} + \beta_{ir} t$            588   577.6            609.9     863.9
Models for cancer incidence specific to age ($\alpha_i$), period ($\beta$) and region ($\gamma_r$), applied to cancer sites with an increasing (or stable) incidence pattern. The example of fit is for lung cancer in females in Finland.
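The last column of the table can be reproduced and used to rank the models, as sketched below. Two things are assumptions rather than statements from the slide: that NP stands for the number of fitted parameters, and that the data have 715 cells, a figure inferred from D.F. + NP being constant across the rows.

```python
# (model number, degrees of freedom, deviance) copied from the table on this slide
FITS = [
    (1, 702, 1044.6), (2, 701, 1037.8), (3, 697, 724.6), (4, 693, 718.9),
    (5, 689, 1024.5), (6, 650, 677.9), (7, 649, 671.1), (8, 645, 667.0),
    (9, 638, 657.4), (10, 588, 609.9),
]
N_CELLS = 715  # assumption: inferred from the table, not stated on the slide

def penalised_deviance(df, deviance, n_cells=N_CELLS):
    """Deviance plus twice the number of fitted parameters (NP = cells - D.F.)."""
    n_params = n_cells - df
    return round(deviance + 2 * n_params, 1)

scores = {model: penalised_deviance(df, dev) for model, df, dev in FITS}
best_model = min(scores, key=scores.get)
print(best_model, scores[best_model])
```

On this criterion the smallest value belongs to model 3, with model 4 close behind.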
Predictions for males in Finland as age-adjusted incidence rates based on age-period-region models
Other approaches to prediction
- Age-period-cohort models
  Moller B, et al. Stat in Med, 2003 (empirical evaluation of age-period-cohort models for prediction, applying different methods to data from the Nordic countries; no evaluation of the precision of the predictions by means of prediction intervals)
  Rutherford M et al. Int J Biost 2012, PhD Thesis 2011 (in the framework of flexible parametric modelling, forces the period and cohort cubic spline functions to be linear beyond the boundary knot in order to predict future incidence)
- Bayesian age-period-cohort models
  Bashir G and Esteve J, J of Epidemiol and Biostat, 2001
  Bray I, Appl Stat, 2002
  Baker A and Bray I, Am. J Epidemiol, 2005
  Cleries R et al., Stat Med, 2012
  (choice of smoothing priors is crucial; wide credible intervals)
- Generalized additive models
  Clements et al., Biostatistics, 2005 (prediction intervals more precise than those from the Bayesian approach)
Software for incidence prediction
- Published papers are sometimes accompanied by computer code to perform the prediction, or the code is available upon request; many Bayesian predictions use WinBUGS software
- Nordpred package developed in Norway, written in R: http://www.kreftregisteret.no/en/research/projects/nordpred/nordpred-software/ (Moller B, et al. Stat in Med, 2003; Engholm et al., Association of Nordic Cancer Registries. Danish Cancer Society, 2009)
- The four age-period models presented here can be applied using Stata macros: http://www.cancer.fi/syoparekisteri/en/general/links/ (Hakulinen T and Dyba T, Stat Med, 1994; Dyba T and Hakulinen T, Stat Med, 1997; 2000)
- Prediction based on APC models using restricted cubic splines uses Stata macros (Rutherford M et al., Int J Biost 2012; Stata Jour 2012; Rutherford M, PhD Thesis 2011)
- Online analysis, allowing predictions to be performed for certain data sets
  Nordic countries: http://www.kreftregisteret.no/en/research/projects/nordpred/nordpred-software/
  Other countries: http://www-dep.iarc.fr/whodb/predictions_sel.htm
Closing remarks
- Predictive methods should clearly state the assumptions used during the prediction process
- No single method fits all cancer sites
- Cancer incidence predictions often lack a necessary measure of precision
- Mathematically advanced predictive methods/models are hard to interpret
- Software used by the different prediction methods needs to be collected in one place (a website?)
- Without external information on cancer etiology, latency time and screening activities, performing cancer incidence predictions will remain a challenge
THANK YOU
Long-term Bayesian predictions for Finland: numbers of new cases in females, 1990-1994
Site         Observed   Projected   90% CI
Oesophagus   442        440         130; 1552
Lung         2208       2152        706; 6539
Melanoma     1240       1675        431; 6216
Breast       13930      15032       4952; 46354
Brain        1948       2117        643; 8285
Bashir G and Esteve J, J of Epidemiol and Biostat, 2001
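One way to read this table is to check whether each interval covers the observed count and how wide it is relative to its lower limit. The sketch below does that with the numbers from the table; the function name and the choice of the upper/lower ratio as a width measure are assumptions made for illustration.

```python
# (site, observed, projected, lower, upper) copied from the table on this slide
ROWS = [
    ("Oesophagus", 442, 440, 130, 1552),
    ("Lung", 2208, 2152, 706, 6539),
    ("Melanoma", 1240, 1675, 431, 6216),
    ("Breast", 13930, 15032, 4952, 46354),
    ("Brain", 1948, 2117, 643, 8285),
]

def summarise(rows):
    """Per site: does the interval cover the observed count, and the upper/lower ratio."""
    return {
        site: {"covered": lower <= observed <= upper, "ratio": upper / lower}
        for site, observed, _projected, lower, upper in rows
    }

summary = summarise(ROWS)
for site, info in summary.items():
    print(f"{site:<12} covered={info['covered']}  upper/lower={info['ratio']:.1f}")
```

Every interval covers the observed count, but each upper limit is more than nine times its lower limit.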
Posterior estimates of the precision parameters of the predictive model
Timescale   Posterior median   90% credible interval
Age         18.6               6.9; 52.5
Period      674.2              54.9; 2993.4
Cohort      513.2              61.6; 2724.3
Bray I, Appl Stat, 2002
Moller B, et al. Stat in Med, 2003