Predicting perioperative mortality after oesophagectomy: a systematic review of performance and methods of multivariate models

British Journal of Anaesthesia 114 (1): 32–43 (2015)
Advance Access publication 17 September 2014. doi:10.1093/bja/aeu294

I. Warnell 1*, M. Chincholkar 2 and M. Eccles 3
1 Department of Anaesthesia, Royal Victoria Infirmary, Newcastle upon Tyne NHS Foundation Trust, Queen Victoria Road, Newcastle upon Tyne NE1 4LP, UK
2 Department of Anaesthesia, Salford Royal NHS Foundation Trust, Stott Lane, Salford M6 8HD, UK
3 Institute of Health and Society, Newcastle University, The Baddiley-Clark Building, Richardson Road, Newcastle upon Tyne NE2 4AX, UK
* Corresponding author. E-mail: ian.warnell@nuth.nhs.uk

Editor's key points
The authors systematically reviewed the prediction of mortality risk after oesophagectomy for cancer. They found generally unsatisfactory performance in commonly used models and recommend further work on developing and validating new prediction models using large data sets.

Summary. Predicting the risk of perioperative mortality after oesophagectomy for cancer may help patients make treatment choices and allow balanced comparison of providers. The aim of this systematic review of multivariate prediction models is to report their performance in new patients and to compare study methods against current recommendations. We used PRISMA guidelines and searched Medline, Embase, and standard texts from 1990 to 2012. Inclusion criteria were English-language articles reporting the development and validation of prediction models of perioperative mortality after open oesophagectomy. Two reviewers screened articles and extracted data on methods, results, and potential biases. We identified 11 development, 10 external validation, and two clinical impact studies.
Overestimation of predicted mortality was common (5–200% error), discrimination was poor to moderate (areas under the receiver operating characteristic curve ranged from 0.58 to 0.78), and reporting of potential bias was poor. There were potentially important case-mix differences between modelling and validation samples, and sample sizes were considerably smaller than currently recommended. Steyerberg and colleagues' model used the most transportable predictors and was validated in the largest sample. Most models have not been adequately validated, and their reported performance has been unsatisfactory. There is a need to clarify the definition, effect size, and selection of currently available candidate predictors for inclusion in prediction models, and to identify new predictors strongly associated with outcome. Adoption of prediction models into practice requires further development and validation in well-designed, large-sample prospective studies.

Keywords: oesophagectomy; postoperative complications, mortality; risk assessment

The UK government has put the provision of information to facilitate patient choice of treatment and provider at the centre of its vision for the NHS [1, 2]. For oesophagectomy, perioperative morbidity and mortality rates are likely to feature in this information: reported in-hospital mortality is around 5% [3, 4], major complication rates reach up to 60%, and quality of life may be reduced in the postoperative period [5]. Unadjusted mortality rates for individual surgeons who carry out oesophagectomy are also now publicly available [6]. Risk prediction models may allow a risk-stratified, more appropriate comparison of service providers and may also assist individual choice of treatment. However, these stratifiers can only be considered for general use if they have been shown to be reliable, to contribute clinical benefit to patient care, and to be transportable to new settings [7, 8]. Currently available prediction models of perioperative mortality for oesophagectomy are not widely used, because it is not clear that they fulfil these criteria. Clinicians assess a range of potential comorbidities when providing prognostic information, so successful prediction models should probably also reflect the multifactorial nature of outcome prediction [9]. In this review we therefore focus on the multivariate models that have been used for this purpose.

In a descriptive review of some models, Shende and colleagues [10] reported poor validation and performance, and Dutta and colleagues [12] reported overestimation of mortality in a quantitative data synthesis of POSSUM (Physiological and Operative Severity Score for the enumeration of Mortality and Morbidity) [11] models in a mixed gastric and oesophageal cancer cohort. To our knowledge, there are no current systematic reviews of the methodology and performance of available prediction models of perioperative mortality after oesophagectomy. The methods for studying and reporting multivariate prediction models have been well described [7, 13–15], as have the causes of poor performance [15]. In this systematic review, we aim to report the performance of currently available clinical multivariate prediction models and to report recognized sources of methodological bias that could contribute to impaired performance.

© The Author 2014. Published by Oxford University Press on behalf of the British Journal of Anaesthesia. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Methods
This systematic review was carried out in accordance with the guidelines published in the PRISMA statement [16].

Inclusion criteria for primary studies
Studies of development, validation in new patient groups, or clinical impact of multivariate prediction models of perioperative mortality were included. The study population comprised adult patients who underwent elective open surgical resection of oesophageal cancer. Studies of laparoscopic, thoracoscopic, minimally invasive, and endoscopic techniques were excluded. Perioperative mortality was defined as all-cause mortality associated with the hospital admission for oesophagectomy ('in-hospital mortality') or 30-day all-cause mortality.

Selection filters
Reported perioperative mortality from oesophagectomy has decreased from 72% in 1941 [17] to 2.9% currently [18]. This trend has been observed across European, American, and Far Eastern centres [19–24]. Because this review was intended to reflect contemporary practice, we only included studies published after 1990. Improved outcome has also been associated with higher-volume centres [19–22, 25–28]; we therefore included only studies from high-volume single centres or results from large databases. High volume was defined as 10 or more cases annually, approximating Killeen and colleagues' [25] definition of eight or nine cases required annually to reduce mortality by one case per year. Annual volume was estimated by dividing the reported total operating load by the duration of the study period. Studies were confined to English-language reports.
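The volume filter described above comes down to one division and a threshold. The sketch below is illustrative only. The function names are hypothetical, and counting the study period inclusively in calendar years (so 1982–1992 counts as 11 years) is our assumption, because the review does not say how it counted duration.

```python
def study_years_inclusive(first_year: int, last_year: int) -> int:
    """Duration of a study period counting both end years (an assumption)."""
    if last_year < first_year:
        raise ValueError("last_year must not precede first_year")
    return last_year - first_year + 1


def estimated_annual_volume(total_cases: int, study_years: float) -> float:
    """Reported total operating load divided by the study duration in years."""
    if study_years <= 0:
        raise ValueError("study duration must be positive")
    return total_cases / study_years


def is_high_volume(total_cases: int, study_years: float, threshold: float = 10.0) -> bool:
    """Apply the review's cut-off of 10 or more oesophagectomies per year."""
    return estimated_annual_volume(total_cases, study_years) >= threshold


# Law and colleagues: 523 patients, 1982-1992 (Table 2)
years = study_years_inclusive(1982, 1992)
print(years, round(estimated_annual_volume(523, years), 1), is_high_volume(523, years))
```

Whether the period is counted inclusively or as a simple difference changes the estimate only slightly for long studies. For short study periods close to the 10-case threshold, however, the choice could decide inclusion.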
Search strategy
Medline and Embase were searched from 1990 to 2012, and reference lists from primary research studies, review articles [10, 29], and standard texts [30] were hand searched. The search strategy used the AND logical operator to combine the population definition (e.g. oesophagectomy), the study type (e.g. cohort study), and outcome (e.g. mortality) OR prognostic testing (e.g. prediction). The full search strategy is available in the Supplementary material.

Study selection and data extraction
Two reviewers (I.W. and M.C.) screened the titles and abstracts of potentially relevant studies and examined full-text versions of selected articles against the inclusion criteria. The selection process is summarized in Figure 1. One reviewer (I.W.) extracted data items into an Excel spreadsheet and the second (M.C.) validated them. Both reviewers extracted and compared potential-for-bias items independently. Disagreements were resolved by consensus. The following study characteristics were extracted: study period, geographical location, data source (e.g. population database, clinical centre), modelling and validation methods, sample size, case-mix descriptors (e.g. surgical procedure, tumour histology), perioperative mortality definition, and individual predictor descriptions. We also extracted performance items that measured how accurately outcome was predicted (calibration) and how well models discriminated between survivors and non-survivors.

[Fig 1 Flow chart of selection process for included studies. Medline and Embase searches returned 17 939 studies, leaving 13 744 after deduplication. Database searches and reference lists from other sources yielded 526 abstracts. Of these, 137 full-text articles were retrieved for full examination. 117 did not fulfil the inclusion criteria, and 20 studies did: 11 clinical prediction models, 10 validation studies, and 2 clinical impact studies. Studies of multivariate prediction models were included, and studies of the effects of individual candidate predictors were excluded. Note: 23 separate studies were reported in 20 articles.]

Potential sources of bias in primary studies
Items used to assess potential for bias in clinical prediction models were adapted from Hayden and colleagues' [14] study of the reporting of potential risk of bias in systematic reviews of prognostic studies. Items relating to confounding variables were not included, because the selection of candidate predictors for inclusion in models was not the subject of this review (Table 1).

Table 1 Extracted items for main areas of potential bias. Methodology adapted from Hayden and colleagues [14]. Scoring criteria: M, fully met; P, partially met; N, not met; U, unclear or unknown; NA, not applicable.

Main category: The sample adequately represents the population of interest
Reported exclusions from surgery in eligible patients (e.g. unfit for surgery): excluded cases described and quantified, M; reported but not quantified or reasons not given, P; not reported, N; unclear, U
Sample data include all patients undergoing oesophagectomy during the reporting period: all oesophagectomies included, M; reasons for exclusion from sample reported, P; otherwise, U
Sample characteristics are described adequately to apply them to the population of interest (e.g. age, gender, tumour histology and stage, surgical procedure, surgical operative volume, geographical location, period of study, overall study mortality rate): all characteristics described, M; partially described, P; not described, N; unclear, U

Main category: The data represent the sample
Follow-up rate is reported: number of survivors and deaths separately reported, M; follow-up rate deducible from article, P; unreported or unknown, U
Prospective (e.g. database) or retrospective (e.g. clinical record review) data collection: prospective, P; retrospective, R; unclear or unknown, U
Evidence of data validation: data audit or double entry described, M; partial validation (e.g. data cross-checked with more than one database), P; not stated or not done, N; unclear, U
Missing values reported: missing values reported, M; deducible from article or partially stated, P; no report or unclear, U
Description of missing value handling: no missing values or acceptable missing value procedure reported, M; some information given, P; no report or unclear, U

Main category: Transportable predictors to new patient groups (clearly defined, easily and reliably measured predictors)
Adequate description of predictor: predictor defined, M; some predictors described and/or transportable, P; predictors not defined, N; unclear, U
Continuous variables handled appropriately: continuous variables used, M; predefined cut points with rational basis, P; data-driven cut points, N; unclear, U

Main category: Outcome adequately measured
Outcome defined: period of follow-up to perioperative mortality clearly defined, M; deducible from text, P; not stated, N; unclear, U

Main category: Appropriate data analysis
Description of appropriate statistical model: selection of statistical model and variables is appropriate, M; inappropriate model, N; unclear, U. For validation models: discrimination and calibration reported, M; some elements of the above, P; unclear or unavailable, U
Sufficient information given to assess adequacy of analysis: adequate model description and presentation of results, M; model described but incomplete details, P; inadequate information or unclear, U
Adequate sample size: at least 10 outcome events for each predictor in regression models, M; sample too small, N; unclear, U

Quantitative data synthesis
We considered synthesizing summary statistics of discrimination and calibration that could be applied generally to other populations. However, the variety of study designs, case mixes, and reported summary statistics would have left very few comparable data points, and a summary statistic based on these could have been inappropriate and misleading.

Results
Included studies
Twenty studies met the inclusion criteria (Table 2). Eleven studies developed clinical prediction models [4, 31–40], 10 evaluated prediction models on new data sets (external validation) [31, 41–49], and two reported clinical impact studies [34, 40].

Table 2 Characteristics of included studies. SEER, Surveillance, Epidemiology and End Results Medicare database; ACS-NSQIP, American College of Surgeons National Surgical Quality Improvement Program

Author [ref], study period | Study design | Total sample size | Geographical location | Source of data
Steyerberg [31], 1991–2002 | Prediction model and validation | n=3592 | USA/The Netherlands | Population databases and clinical centre. Modelling sample: USA SEER (1991–6) database. Validation sample: USA SEER (97–99), Eindhoven registry (1993–2001), Rotterdam hospital (1980–2002)
Ra [32], 1997–2003 | Prediction model and internal validation | n=1162 | USA | Population database, SEER Medicare database
Tekkis [33], 1994–2000 | Prediction model and internal validation | n=1042 (538 oesophagectomies) | UK | Regional and national clinical databases (36 centres) of gastrectomy and oesophagectomy
Bartels [34], 1982–1996 | Prediction model, internal validation, and clinical application | n=805 | Germany | Single centre
McCulloch P [4], 1999–2002 | Prediction model and internal validation | n=995 | UK | Subset of ASCOT national database (multiple centres reporting gastric and oesophageal surgery)
Bailey [35], 1991–2000 | Prediction model and internal validation | n=1777 | USA | Population database; data submitted from 109 Veterans Affairs medical centres, USA
Law [36], 1982–1992 | Prediction model | n=523 | Hong Kong | Single centre
Liu [37], 1994–7 | Prediction model | n=32 | Australia | Single centre
Sanz [38], 1987–1999 | Prediction model | n=114 | Spain | Single centre
Dhungel [39], 2005–8 | Prediction model | n=1032 | USA | ACS-NSQIP database
Zhang [40], 1986–9 | Prediction model, validation, and clinical application | n=162 | Japan | Single centre
Schroder [41], 1997–2002 | Predictor effect study, external validation | n=126 | Germany | Single centre
Lai [42], 2001–5 | External validation | n=545 | Hong Kong | Administrative database (data submitted from 14 centres)
Nagabhushan [43], 1990–2002 | External validation | n=313 | UK | Single centre
Lagarde [44], 1993–2005 | External validation | n=663 | The Netherlands | Single centre
Zafirellis [45], 1990–9 | External validation | n=204 | UK | Single centre
Zingg [46], 1990–2007 | External validation | n=346 | Australia, The Netherlands, Switzerland | Two centre
Bosch [47], 1991–2007 | External validation | n=280 | The Netherlands | Single centre
Ball [48], 2-yr period | External validation | n=53 | UK | Two centre
Dutta [49], 2005–9 | External validation | n=121 | UK | Single centre

Excluded studies
The search strategy retrieved many studies of the effects of individual candidate predictors (e.g. age), but these were excluded from this review. Studies of thoracoscopic procedures were excluded [50], as were studies of mixed surgical caseloads if summary statistics and results for oesophagectomies were unavailable [51–53]. Studies with predictors measured after operation [54] or with unclearly reported perioperative mortality [55–59] were also excluded.

Clinical prediction models
Four models were developed on data from the USA [31, 32, 35, 39], two from the UK [4, 33], and one each from Germany [34], Spain [38], Hong Kong [36], Australia [37], and Japan [40] (Table 3). Six models were developed on data from medium to large databases [4, 31–33, 35, 39]. Bailey and colleagues [35], Ra and colleagues [32], and Steyerberg and colleagues [31] used data from US population databases. Bailey and colleagues [35] used 1777 records of the Veterans Affairs National Surgical Improvement Program. Ra and colleagues [32] and Steyerberg and colleagues [31] used 1172 and 1327 records, respectively, from the Surveillance, Epidemiology and End Results (SEER) Medicare database.
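The two performance properties extracted in this review, calibration and discrimination, can be illustrated with a short sketch. The function names and toy data are hypothetical. For discrimination, the sketch uses the pairwise C-index, which for a binary outcome equals the area under the ROC curve. For calibration-in-the-large, it uses the observed to expected (O:E) ratio. The review does not state exactly how its percentage prediction errors were calculated, so the relative error (predicted − observed) / observed used here is an assumption and may not reproduce the figures quoted in the text. The calibration example uses the original POSSUM result of Lai and colleagues (Table 4): 30 deaths among 545 patients, against 15% predicted.

```python
from itertools import product
from typing import Sequence


def c_index(predicted_risk: Sequence[float], died: Sequence[bool]) -> float:
    """Concordance for a binary outcome (equal to the area under the ROC curve).

    Each (non-survivor, survivor) pair scores 1 if the non-survivor had the
    higher predicted risk, 0.5 for a tie, and 0 otherwise.
    """
    if len(predicted_risk) != len(died):
        raise ValueError("inputs must have the same length")
    deaths = [p for p, d in zip(predicted_risk, died) if d]
    survivors = [p for p, d in zip(predicted_risk, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for p_dead, p_alive in product(deaths, survivors):
        if p_dead > p_alive:
            score += 1.0
        elif p_dead == p_alive:
            score += 0.5
    return score / (len(deaths) * len(survivors))


def oe_ratio(observed_rate: float, predicted_rate: float) -> float:
    """Observed to expected mortality ratio; values below 1 mean overprediction."""
    if predicted_rate <= 0:
        raise ValueError("predicted rate must be positive")
    return observed_rate / predicted_rate


def relative_error_pct(observed_rate: float, predicted_rate: float) -> float:
    """Assumed definition: positive = overestimate, negative = underestimate."""
    if observed_rate <= 0:
        raise ValueError("observed rate must be positive")
    return 100.0 * (predicted_rate - observed_rate) / observed_rate


# Discrimination on hypothetical toy data: five patients, two deaths
risk = [0.02, 0.12, 0.10, 0.30, 0.08]
died = [False, False, True, True, False]
print(round(c_index(risk, died), 2))

# Calibration-in-the-large: Lai and colleagues, original POSSUM
observed = 30 / 545
possum = 0.15
print(round(oe_ratio(observed, possum), 2), round(relative_error_pct(observed, possum), 1))
```

A C-index of 0.5 means discrimination no better than chance, and 1.0 means perfect separation of survivors and non-survivors. Under the assumed error definition, an O:E ratio r corresponds to a relative error of 100 × (1/r − 1)%. O:E ratios well below 1 therefore translate into large overestimates. The pairwise loop is written for clarity rather than speed.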

Table 3 Methods, performance, and predictors used in the development of clinical prediction models of perioperative mortality after oesophagectomy. H–L, Hosmer–Lemeshow; O:E, observed to expected ratio; CCI, Charlson comorbidity index [60]; BUN, blood urea nitrogen. Validation results are given to two decimal places.

Steyerberg and colleagues [31]
Modelling method: logistic regression, bootstrap internal validation; generation of a simple risk score
Sample size (n), number of deaths: n=1327, 147 deaths
Validation results: discrimination: area under ROC 0.66 in the modelling cohort; 0.65 on internal validation
Predictors included in model: age categories (50–65, 66–80, >80), comorbidities (cardiorespiratory, diabetes, hepatic, renal), neoadjuvant therapy, hospital surgical volume

Ra and colleagues [32]
Modelling method: multivariate logistic regression; generation of a six-point risk score
Sample size (n), number of deaths: n=1172, 160 deaths
Validation results: predicted and observed mortality reported for the modelling sample; predicted high-risk group 29.8% vs observed 22% (sparse data)
Predictors included in model: age over 80, modified Charlson score [60], hospital surgical volume

Tekkis and colleagues [33]
Modelling method: multiple logistic regression on 70% of the sample; individual centres accounted for in a multilevel model (m); validation on 30% and comparison with P-POSSUM
Sample size (n), number of deaths: n=1042, 125 deaths; combined sample of oesophagectomy (538) and gastrectomy
Validation results: discrimination: C-index P-POSSUM 0.74, O-POSSUM 0.75, multilevel O-POSSUM (m) 0.80. Calibration: H–L P value P-POSSUM P<0.01, O-POSSUM P=0.23, O-POSSUM (m) P=0.25. O:E ratio P-POSSUM 1.21, O-POSSUM 1.03, O-POSSUM (m) 1.04
Predictors included in model: physiological POSSUM, age, urgency, surgical procedure, POSSUM tumour stage

Bartels and colleagues [34]
Modelling method: modelling (1982–1991), validation (1992–3), clinical application (1994–6); predictors stratified and modelled (discriminant analysis) against outcome ('normal', 'prolonged', 'severe', 'fatal')
Sample size (n), number of deaths: modelling n=432, 43 deaths; validation n=121, 9 deaths; application n=252, 4 deaths
Validation results: model: low-, moderate-, and high-risk groups for 30-day mortality (3.6%, 8.7%, and 28%). Validation: predicted risk groups of 2%, 5%, and 25%. Clinical practice: reduction in mortality from 9.4% to 1.6% after application
Predictors included in model: Karnofsky index [61], spirometry, arterial PO2, aminopyrine breath test, cirrhosis, cardiac function (cardiologist impression)

McCulloch and colleagues [4]
Modelling method: logistic regression; validation on a mixed oesophagogastric sample from the final year of the study
Sample size (n), number of deaths: modelling n=773, about 97 deaths; validation n=222, about 16 deaths
Validation results: discrimination: C-index 0.79 in the modelling sample and 0.68 in validation. Calibration: O:E ratio 1.04 (H–L P=0.5) in modelling; 0.82 (H–L P=0.49) in validation
Predictors included in model: physiological POSSUM, surgeon's assessment of fitness for surgery ('fit', 'significant comorbidity', 'comorbidity serious risk to postoperative survival'), tumour stage, operation

Bailey and colleagues [35]
Modelling method: multivariate logistic regression
Sample size (n), number of deaths: n=1777, 174 deaths
Validation results: discrimination: C-index 0.69 in the modelling sample. Calibration: H–L P=0.93 in the modelling sample
Predictors included in model: age, diabetes, functional status, neoadjuvant, BUN, alcohol intake, ascites, alkaline phosphatase

Law and colleagues [36]
Modelling method: discriminant analysis to select risk predictors; three-level risk score (7%, 30%, and 38% mortality)
Sample size (n), number of deaths: n=523, 81 deaths
Validation results: sensitivity 0.72, specificity 0.74, overall accuracy 0.74 in the modelling sample
Predictors included in model: age, mid-arm circumference, operative blood loss, spirometry, abnormal chest X-ray, curative vs palliative procedure

Liu and colleagues [37]
Modelling method: multiple regression; stratified into three levels of risk (mortality 50%, 27%, and 8%); sample of 32 selected from a total of 70
Sample size (n), number of deaths: n=32, 8 deaths
Validation results: no validation
Predictors included in model: hypertension, smoking, spirometry

Sanz and colleagues [38]
Modelling method: discriminant analysis to generate a three-level mortality risk: low (6.8%), intermediate (12.5%), high (50%)
Sample size (n), number of deaths: n=114, 14 deaths
Validation results: no validation
Predictors included in model: previous cancer, cirrhosis, abnormal spirometry, cholesterol, albumin

Dhungel and colleagues [39]
Modelling method: multivariate logistic regression
Sample size (n), number of deaths: n=1032, 30 deaths
Validation results: not reported
Predictors included in model: diabetes, dyspnoea, age

Zhang and colleagues [40]
Modelling method: logistic regression to develop a risk score (1986–1990); validation sample from the same centre (1990–1)
Sample size (n), number of deaths: modelling n=100, 13 deaths; validation n=62, 2 deaths
Validation results: modelling: sensitivity 0.75, specificity 0.99; validation: sensitivity 0.33, specificity 0.98
Predictors included in model: oral glucose tolerance test, tumour stage, age, abnormal ECG, creatinine clearance, surgical procedure

Tekkis and colleagues [33] developed the O-POSSUM from 1042 records (538 oesophagectomies), and McCulloch and colleagues [4] used 995 records from the UK ASCOT database and the Risk Scoring Collaborative. In single-centre studies, Bartels and colleagues [34] used 432 records, Law and colleagues [36] 523, Sanz and colleagues [38] 114, and Liu and colleagues [37] 32. The outcome event was in-hospital mortality in four studies [4, 33, 37, 38], 30-day mortality in two [31, 35], and both of these in two [32, 36]. Bartels and colleagues' model [34] used 30- and 90-day mortality, and Zhang and colleagues' model [40] used 45-day mortality. Dhungel and colleagues [39] did not define mortality by time.

Candidate predictors were represented in widely varying ways. For example, age was coded as a continuous variable [33], as ordered age-group categories [31], and as an octogenarian subgroup [32]. Nutritional state was represented by weight loss [34], serum albumin [34, 38], and skin-fold thickness [36]. Some comorbidity was also included within composite general health or comorbidity scores such as the Karnofsky [34, 61] or Charlson [32, 60] scores. Some scores were entirely subjective classifications, for example physician assessment of cardiac risk [34], surgeon classification of fitness for surgery [4], and some items from the POSSUM scoring systems (e.g. the radiological and respiratory comorbidity scores).
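Table 1 scores sample size against a rule of at least 10 outcome events for each predictor. Table 3 reports the 174 deaths in Bailey and colleagues' model, and the following paragraph notes that they screened 122 candidate predictors. The ratio is easy to check from reported counts. The sketch below is a minimal illustration with hypothetical function names. It counts all candidate predictors screened, not only those retained in the final model, because data-driven selection draws on the full candidate set.

```python
def events_per_predictor(events: int, candidate_predictors: int) -> float:
    """Outcome events (here, deaths) per candidate predictor screened."""
    if candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return events / candidate_predictors


def meets_epv_rule(events: int, candidate_predictors: int, minimum: float = 10.0) -> bool:
    """Table 1 criterion: at least `minimum` outcome events per predictor."""
    return events_per_predictor(events, candidate_predictors) >= minimum


def max_candidates(events: int, minimum: int = 10) -> int:
    """Largest number of candidate predictors the rule allows for a given event count."""
    return events // minimum


# Bailey and colleagues: 174 deaths, 122 candidate predictors screened
print(round(events_per_predictor(174, 122), 2))
print(meets_epv_rule(174, 122))
print(max_candidates(174))
```

By this rule, 174 deaths would support screening about 17 candidate predictors, far fewer than the 122 actually considered.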
All studies developed their models using logistic regression except three, which used discriminant analysis [34, 36, 38]. All studies except that of Steyerberg and colleagues [31] used some data-driven statistical methods to select predictors for the final model. These included univariate selection of statistically significant predictors [32, 33, 36, 39, 40], stepwise elimination in logistic regression [4, 34–36, 38], and data-driven cut-offs to define predictors. Steyerberg and colleagues [31], Ra and colleagues [32], and Bartels and colleagues [34] used logistic regression equations to create simplified scoring systems. The larger database studies reported about 10 or more deaths for each predictor screened [4, 31–33]. The exception was Bailey and colleagues [35], who used stepwise regression to select from 122 candidate predictors in a model with about 170 fatalities. Smaller single-centre studies had much lower event-to-predictor ratios [34, 37, 38, 40] and were therefore more susceptible to overfitting.

Investigators used a variety of methods to test the robustness of models on the development data sets. Steyerberg and colleagues [31] used bootstrap methods. Ra and colleagues [32], Bailey and colleagues [35], and Law and colleagues [36] examined model fit on the development data ('apparent validation') [62]. Tekkis and colleagues [33] and McCulloch and colleagues [4] used split samples for development and validation. Bartels and colleagues [34] and Zhang and colleagues [40] prospectively validated their models on patients from subsequent periods. Liu and colleagues [37] and Sanz and colleagues [38] did not formally examine model performance. In development studies, discrimination was moderate, with areas under the receiver operating characteristic (ROC) curve ranging from 0.65 [31] to 0.797 [33]. Steyerberg and colleagues [31] reported good calibration, but predictions were generally reported to overestimate mortality [4, 32, 33].

Studies of model performance in new populations (external validation)
Ten authors validated prediction models in new patient samples (Table 4). POSSUM-based models were the most extensively studied, but the models of Ra and colleagues [32], Bartels and colleagues [34], and Steyerberg and colleagues [31] have also been validated. A variety of performance measures were reported. These included the overall observed to expected mortality ratio [42, 44, 48], standardized mortality ratios [43, 45, 47, 49], tabulated or graphical calibration of predicted risk levels [41, 45, 47], and goodness-of-fit statistics [42–45, 47]. Discrimination was most frequently assessed using ROC curves [42–45, 47–49]. Zingg and colleagues applied logistic regression of the mortality predicted by four original models against observed mortality for various outcomes and reported summary statistics from these regressions. They did not report standard measures of calibration or discrimination. Validation samples were small, containing between five [49] and 32 [43] fatalities. Only Steyerberg and colleagues [31] validated a model on a large data set (291 deaths).

Seven studies evaluated POSSUM models [42–45, 47–49]. Overestimation was common in all POSSUM models, but the P-POSSUM generally performed best, with prediction errors ranging from a 5% underestimate to a 40% overestimate [33, 42, 43, 47, 49]. O-POSSUM overestimation ranged up to 200% [42–45, 47–49]. Discrimination was moderate, ranging from 0.6 [44] to 0.776 [42]. Schroder and colleagues [41] evaluated Bartels and colleagues' model [34] on 126 patients. Discrimination and calibration were not formally studied, but the high-risk group predicted by Schroder and colleagues had an observed mortality of 16.7%, lower than

BJA Warnell et al. Table 4 External validation studies: methods and results (rounded two decimal places). SMR, standardized mortality ratio; H L Hosmer Lemeshow goodness of fit; O:E, observed to expected ratio; CCI, Charlson comorbidity index; ACCI, age adjusted Charlson comorbidity index 63 Study Study design Sample size (n) and number of deaths Calibration Discrimination Lai and Comparison of O-, P-, and original POSSUM n¼545, 30 deaths colleagues 42 (5.5%) Overall predicted mortality and x 2 lack of fit (P-value): POSSUM 15% (,0.01) O-POSSUM 10.9% (,0.01) P-POSSUM 4.7% (0.81) Note: All overpredicted over whole risk range but P-POSSUM most accurate Nagabhushan and colleagues 43 Comparison of O- & P-POSSUM n¼313, 32 deaths SMR (P-value for H L goodness of fit): P-POSSUM 0.89; (P¼0.02) O-POSSUM 0.65; (P¼0.01) Note: Calibration over 6 predicted levels:,5 deaths in three highest risk groups. All failed to predict accurately Lagarde and External validation of O-POSSUM n¼663, 24 deaths O:E ratio 0.29 colleagues 44 H L goodness of fit, P,0.01 Note: Highest two risk strata had,5 deaths Zafirellis and External validation of POSSUM n¼204, 26 deaths SMR 0.66 colleagues 45 H L goodness of fit, P,0.01 Note: Highest three risk groups had,5 deaths Zingg and colleagues 46 Bosch and colleagues 47 Comparison of Ra and colleagues, Steyerberg and colleagues, Bartels and colleagues models and ASA on samples from Australia and Switzerland Comparison of P-, O-, and original POSSUM, CCI, ACCI, ASA n¼346, 14 deaths Australia, 8 deaths Switzerland Non-standard calibration or discrimination. 
Concluded none practically useful n¼280, 15 deaths Overall SMR: P-POSSUM 1.05 O-POSSUM 0.67 H L (P-value): P-POSSUM P¼0.04 O-POSSUM P¼0.53 CCI (P¼0.66) ACCI (P¼0.27) ASA (P¼0.21) O-POSSUM overpredicted compared with P-POSSUM Ball and External validation of P-POSSUM n¼53, 6 deaths Expected 2 deaths, observed 6 colleagues 48 deaths Dutta and Comparison of P-, O-, and original POSSUM n¼121, 5 deaths colleagues 49 (4.1%) Schroder and colleagues 41 Steyerberg and colleagues 31 External validation of Bartels and colleagues model External validation in SEER database (USA), 1997 9 Eindhoven Cancer Registry, 1993 2001 Rotterdam University Hospital, 1980 2002 n¼126, 7 deaths SEER, n¼714, 74 deaths Eindhoven, n¼349, 25 deaths Rotterdam, n¼1202, 45 deaths Grand total, n¼3592, 291 deaths Predicted overall mortality rate and SMR: POSSUM 16.5%, SMR 0.25 P-POSSUM 5.8%, SMR 0.71 O-POSSUM 9.9%, SMR 0.42 Used Bartels model to predict low, moderate, and high risk. O:E mortality (%): Low risk 2.9:3.6 (%) Moderate 3.0:8.7 (%) High 16.7:28 (%) Fewer than 5 deaths in each risk group Comparison of O:E risk. Reported good calibration for pooled sample but problematic for Netherlands samples curve: POSSUM 0.78 P-POSSUM 0.78 O-POSSUM 0.68 curve: P-POSSUM 0.68 O-POSSUM 0.61 curve 0.6 curve 0.62 Not available curve: P-POSSUM 0.77 O-POSSUM 0.76 CCI score 0.57 ACCI score 0.68 ASA score 0.64 Not available curve: POSSUM 0.76 P-POSSUM 0.81 O-POSSUM 0.72 Not available curve: 0.56 0.7 38

Predicting mortality after oesophagectomy BJA Sufficient data to assess analysis Reported appropriate model Continuous data handled appropriately Prognostic predictors defined Reported acceptable follow up rate Missing values handled appropriately Met Missing values stated or deducible Partially met Data validation Not met Data collection prospective retrospective Unclear, unknown Sample characteristics described Consecutive cases Surgical exclusions described 0% 20% 40% 60% 80% 100% % studies meeting criteria items Fig 2 Percentage of primary studies meeting reporting criteria for risk of bias. the 25% in Bartels and colleagues study, suggesting overestimation by the original model. Steyerberg and colleagues 31 evaluated the original Rotterdam model in a later SEER cohort, and in cohorts from a Netherlands population database and Rotterdam clinical centre. Discrimination was reported as poor (receiver operator AUC 0.56 0.7), but calibration was described as excellent for SEER patients and pooled data, but problematic for the Netherlands cohorts. Zingg and colleagues 46 evaluated Steyerberg and colleagues, 31 Bartels and colleagues, 34 and Ra and colleagues 32 models on cohorts from Switzerland and Australia. Standard discrimination and calibration methods were not reported, but the authors concluded that no models were applicable in practice. Clinical impact studies Two studies 34 40 reported using their models in clinical practice to reduce perioperative mortality, but these were not within prospective impact studies. Case mix differences between modelling and validation samples Case mix details are reported in Supplementary material. Differences between POSSUM modelling and validation samples included mortality definition, for example, the use of 30 day mortality, 43 45 49 percentages of elective cases, 43 44 and proportions of gastrectomy and oesophagectomy. 
43 44 Most centres reported similar mixes of squamous and adenocarcinoma, but Lai and colleagues 42 sample from Hong Kong was exclusively squamous. Sample mortality rates also varied, for example, the O-POSSUM study reported 8.6% in hospital mortality, but the external validation studies reported mortality between 3.6% 44 and 12.7%. 45 Reported operative volumes ranged from nine 43 to 56 36 annually. The large population databases frequently did not report details of operative volume, overall mortality rates, and histological or operative details. Risk of bias in primary studies Sixteen studies did not report exclusions from surgery for fitness or other reasons, 433 35 37 39 49 431 33 35 37 and in 13, 39 41 43 48 49 it was not clear whether samples included consecutive operated cases (Fig. 2). Data were retrospectively 31 42 45 46 48 49 extracted from medical records in six studies, and in nine, 432 34 37 39 40 43 44 it was unclear whether data collection was prospective or retrospective. Data validation (e.g. data audit) was not performed or was poorly reported in 17 studies. 4 31 32 34 36 38 40 41 43 49 Reporting of the quantity and handling of missing data was poor or unclear in 14 studies. 32 41 46 49 The larger database studies were generally better at reporting data validation methods and management of missing data. Explicit reporting of survivor and non-survivor 43132343638 40 42 46 48 49 counts was not clear in 12 studies. Discussion We conducted this systematic review to identify a prediction model, which could be used as an aid to decision-making for patients or to assist in comparative audit. We found 11 models, of which the POSSUM -based models and those developed by Steyerberg and colleagues, 31 Ra and colleagues, 32 and Bartels and colleagues 34 have been validated in new patients. Reported discrimination was weak for all models and predicted mortality frequently exceeded observed mortality. 
In comparisons of POSSUM models, all tended to overestimate mortality, but the P-POSSUM was the most accurate. Poor reporting of case selection and missing data management was common, sample sizes were frequently smaller than currently recommended (particularly in validation studies), and there was evidence of potentially important case mix differences between validation samples and the original development samples. Steyerberg and colleagues' model 31 was subjected to the most rigorous validation and appeared to use predictors more likely to be reliably transportable to other settings.
Unreliable prediction in new patients may occur when a model is too closely aligned to random variation in the development data ('overfitting'). This can occur if candidate predictors are selected on the basis of statistically significant associations with the outcome ('data-driven' methods), rather than on clinical and evidential knowledge. 64 65 Most models, except those of Steyerberg and colleagues 31 and the P-POSSUM, 66 used data-driven methods to some degree. Overfitting is also common in small samples, especially when there are fewer than 10 outcome events per screened predictor. 67 This was the case for the single-centre models and for Bailey and colleagues' model. 35 The P-POSSUM 66 did not use data-driven selection and was developed on a fairly large sample, which may partly explain why it outperformed the original POSSUM and O-POSSUM in new samples. Sample size also matters in validation studies: some investigators 68 69 recommend at least 100 outcome events to allow valid model comparisons and to reduce random imbalance of predictors. Except for Steyerberg and colleagues' study, 31 validation samples were small, and in calibration analyses the high-risk categories frequently contained very few events, limiting precision and reliability. Because high-risk categories may be of particular interest in decision-making, much larger samples are likely to be needed to improve model utility.
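The two sample-size issues above can be made concrete with a little arithmetic. The sketch below, which assumes only the Python standard library, computes events per screened predictor (EPV) for Bailey and colleagues' development data as described in this review (about 170 deaths, 122 candidate predictors), and an observed-to-expected (O:E) mortality ratio with an approximate 95% confidence interval for a risk group with few deaths. The interval uses Byar's approximation to the exact Poisson limits for the observed count and treats the expected count as fixed; this is an illustrative choice, not the method used by the included studies, and the risk-group counts in the example are hypothetical.

```python
import math
from statistics import NormalDist

def events_per_variable(events: int, candidate_predictors: int) -> float:
    """Outcome events per candidate predictor screened (EPV)."""
    return events / candidate_predictors

def oe_ratio_ci(observed: int, expected: float, level: float = 0.95):
    """O:E ratio with an approximate CI (Byar's approximation to exact Poisson limits).

    Treats the expected number of deaths as fixed, which ignores
    uncertainty in the model's own predictions.
    """
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    if observed == 0:
        lower = 0.0
    else:
        o = observed
        # Clamp at zero: for very small counts and high levels the cube can go negative.
        lower = max(0.0, o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3)
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return observed / expected, lower / expected, upper / expected

if __name__ == "__main__":
    # Bailey and colleagues: about 170 deaths, 122 candidate predictors (as reported in this review).
    print(f"EPV: {events_per_variable(170, 122):.1f}")  # well below the usual threshold of 10
    # HYPOTHETICAL high-risk group: 3 observed deaths where the model expected 6.
    ratio, lo, hi = oe_ratio_ci(3, 6.0)
    print(f"O:E {ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In the hypothetical group the point estimate suggests twofold overprediction, yet the interval still includes 1, which illustrates why calibration in high-risk categories with fewer than five deaths is so imprecise.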
Case mix differences can also affect prediction in new populations 8 by altering the distribution of predictors. These differences may be explicit; for instance, POSSUM development and validation samples differed in outcome definitions and in some potentially important variables. 42 45 49 Differences in mortality rates were also apparent and could reflect differences in important predictors, whether measured or not. Implicit, less obvious case mix differences may also have arisen from biases in case selection and in the handling of missing data, both of which were often poorly reported. Subjectively interpreted predictors, such as the physician assessment of cardiac risk in Bartels and colleagues' model, 34 may not be reliably reproducible and may also contribute to case mix differences. Other contemporary scores are available for cardiac comorbidity 70 or heart failure, 71 which may be more reliable than subjective assessment and could be considered.
The poor discrimination between survivors and non-survivors reflects the weak association between currently available predictors and perioperative mortality. Age was the most consistently used and reliable predictor, but the most discriminating predictors (e.g. the presence of ascites, with an odds ratio of 15.7 35) are unlikely to be relevant to current practice, because their presence would exclude such patients from surgery. Similarly, the Glasgow coma score and the extreme categories of the haematological and biochemical tests in the POSSUM models are probably not relevant to contemporary elective surgical populations, because these models were developed on a mixed elective and emergency surgical population. Clinical prediction models are unlikely to perform well if important predictors are omitted. 7 The considerable variation in the choice and definition of individual predictors across the included studies highlights the current uncertainty about which candidate predictors are important.
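Discrimination in this review is summarized by the area under the ROC curve, which equals the probability that a randomly chosen non-survivor was assigned a higher predicted risk than a randomly chosen survivor, with ties counting one half. A minimal sketch of that pairwise (Mann–Whitney) calculation follows; the predicted risks and outcomes in the check below are invented for illustration.

```python
from itertools import product

def c_statistic(risks, died):
    """Area under the ROC curve via pairwise comparison of non-survivors and survivors.

    risks: predicted probabilities of death
    died: matching booleans (True = patient died)
    """
    cases = [r for r, d in zip(risks, died) if d]
    controls = [r for r, d in zip(risks, died) if not d]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for case_risk, control_risk in product(cases, controls):
        if case_risk > control_risk:
            score += 1.0  # concordant pair
        elif case_risk == control_risk:
            score += 0.5  # tied pair
    return score / (len(cases) * len(controls))

if __name__ == "__main__":
    # HYPOTHETICAL predictions for eight patients, two of whom died.
    risks = [0.02, 0.04, 0.05, 0.08, 0.10, 0.12, 0.06, 0.15]
    died = [False, False, False, False, False, True, True, False]
    print(f"AUC = {c_statistic(risks, died):.2f}")
```

For these invented data, 8 of the 12 death/survivor pairs are concordant, so the function returns 8/12, about 0.67. An AUC of 0.5 is no better than chance and 1.0 is perfect separation; values of roughly 0.56 to 0.81, as reported in Table 4, mean that many non-survivors receive predicted risks similar to those of survivors.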
There is also a need to identify stronger candidate predictors, and one potential candidate is cardiopulmonary exercise (CPX) testing. Unlike multivariate models, CPX testing is used reasonably widely in the UK 72 as part of risk stratification for surgery. Given the likely multifactorial nature of perioperative and medium-term outcome, it would seem reasonable to examine the potential role of CPX testing within robustly validated multivariate prognostic models. Similarly, the development of rapid genotyping may present opportunities to identify individuals susceptible to perioperative complications such as acute lung injury, 73 and could be a candidate for prognostic study.
Much work has been done to develop clinical prediction models for oesophagogastric surgery, but the heterogeneity of methods, presentation of results, and candidate predictor definitions makes direct model comparison difficult. Further advances require better predictor characterization and large, high-quality validation studies. This was done prospectively for the cardiothoracic EuroSCORE, for which data from 19 030 procedures were collected from 128 European centres over a 3-month period in 1995. 74 For oesophagectomy, a project on a similar scale would take considerably longer, because clinical units perform fewer cases in 3 months than the 120 cases submitted by each cardiothoracic centre. For instance, the National Oesophago-Gastric Cancer Audit 2010 3 took nearly 2 yr to collect data from 16 264 patients. With limited time and resources, we should consider alternative ways to make use of available information. Pooling results from currently available data, or individual patient data systematic reviews, may be an option; this approach has been used to estimate predictor effects in cancer survival studies. 75 78 Large databases, such as the ICNARC database 79 or the National Oesophago-Gastric Cancer Audits, 3 could be suitable sources from which to estimate some predictor effects.
However, ultimately, large-scale prospective data collection will be necessary to validate models and assess their potential clinical impact.
Finally, this review highlights some of the potential biases that may be encountered in studies of outcomes in high-risk surgery. This applies to most studies of clinical prediction models, because they are observational and are likely to use secondary data sources such as clinical notes and clinical or administrative databases. Outcome prediction studies in any high-risk surgical speciality need to consider how best to manage and report these potential risks of bias. Good practice guidelines are available for the design of clinical prediction studies 7 9 13 80 81 and, more generally, for the design 82 and reporting 83 of studies of clinical outcomes that use secondary data sources.
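The Discussion suggests pooling published results to estimate predictor effects more precisely. One standard way to do this for a single predictor is fixed-effect inverse-variance pooling of log odds ratios, sketched below under stated assumptions: the study odds ratios and confidence intervals in the example are hypothetical, the standard error of each log odds ratio is back-calculated from a reported confidence interval assumed to be symmetric on the log scale, and a fixed-effect model assumes one common true effect. Given the heterogeneity in predictor definitions described in this review, a random-effects model or individual patient data would often be more appropriate.

```python
import math

def pool_log_odds_ratios(studies, z=1.96):
    """Fixed-effect inverse-variance pooling of odds ratios.

    studies: iterable of (odds_ratio, ci_lower, ci_upper) tuples.
    z: normal quantile matching the level of the reported intervals (1.96 for 95%).
    Returns the pooled odds ratio and its confidence interval at the same level.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_lower, ci_upper in studies:
        log_or = math.log(odds_ratio)
        # Back-calculate the standard error from the CI width on the log scale.
        se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * log_or
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no studies to pool")
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - z * pooled_se),
            math.exp(pooled + z * pooled_se))

if __name__ == "__main__":
    # HYPOTHETICAL odds ratios for one predictor reported by three studies.
    studies = [(1.5, 1.1, 2.0), (1.3, 0.9, 1.9), (1.8, 1.2, 2.7)]
    or_pooled, lo, hi = pool_log_odds_ratios(studies)
    print(f"pooled OR {or_pooled:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Pooling narrows the interval around a shared effect, but only if the studies measured the same predictor in the same way, which is precisely what the variation in predictor definitions makes doubtful.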

Strengths and weaknesses
Systematic review methods for prediction models are less well developed than those for interventions; however, we used the PRISMA guidelines and adapted current recommendations for prognostic models. 14 80 84 85 Inevitably, parts of the review process were iterative. For example, we modified the search strategy when studies known to the reviewers were not retrieved, and we redefined risk of bias items that were found to be difficult to apply. Iteration may introduce bias, but it has been recognized as an acceptable part of systematic review methodology. 16 86 We intended to apply our conclusions to high surgical volume centres such as our own. Therefore, we only included studies from either large population databases, which are likely to be widely applicable because of their larger sample sizes and more general predictors, or high-volume clinical centres. However, restriction to high-volume studies may have introduced bias that could adversely affect application to less specialized settings. Publication bias may have affected this study, just as it has been reported in other prognostic and outcome studies of oesophageal cancer surgery, 87 and limitation to English language articles may have biased selection towards studies with statistically significant results. 88 Searches did not include the grey literature, and we did not contact authors. Nor did we use formal quantitative methods (e.g. funnel plots) to assess potential publication bias, because of the heterogeneous nature of the reported summary statistics. 88
Conclusion
None of the models identified in this review can currently be applied in clinical practice with any confidence, because performance was generally unreliable, discrimination was poor, and validation studies were too small. Potential study biases were poorly managed or reported in some studies.
Steyerberg and colleagues' model 31 has more potential for future validation and application than the POSSUM models, because its constituent predictors are more transportable and more relevant to current elective surgical groups. Further model development requires consensus on predictor definitions and effects, as well as validation and model comparison in large samples using currently accepted methods and reporting. These are unlikely to be obtainable from single clinical centres. Existing UK databases and published studies may be useful sources for data synthesis, but prospective high-quality validation in large samples requires coordinated multicentre or large database studies.
Supplementary material
Supplementary material is available at British Journal of Anaesthesia online.
Authors' contributions
I.W.: review design, data extraction and interpretation, writing up, and revising the article. M.C.: data extraction and interpretation, and revising the article. M.E.: review design and revising the article.
Acknowledgements
We would like to thank Erika Gynett (Walton Library) for help structuring the search strategy and Dr Nick Steen (Institute of Health and Society) for statistical advice.
Declaration of interest
None declared.
References
1 Secretary of State for Health. Equity and Excellence: Liberating the NHS. HM Government, Department of Health, 2010
2 Secretary of State for Health. Liberating the NHS: No Decision About Me, Without Me. Further Consultation on Proposals to Secure Shared Decision-Making. HM Government, Department of Health, 2012
3 Cromwell D, Palser T, van der Meulen J, et al. National Oesophago-Gastric Cancer Audit 2010. The NHS Information Centre, 2010
4 McCulloch P, Ward J, Tekkis PP, ASCOT Group of Surgeons, British Oesophago-Gastric Cancer Group. Mortality and morbidity in gastro-oesophageal cancer surgery: initial results of ASCOT multicentre prospective cohort study. Br Med J 2003; 327: 1192–7
5 Blazeby JM, Farndon JR, Donovan J, Alderson D. A prospective longitudinal study examining the quality of life of patients with esophageal carcinoma. Cancer 2000; 88: 1781–7
6 Association of Upper Gastrointestinal Surgeons of Great Britain and Ireland. Outcomes data. Available from http://www.augis.org/surgical-outcomes/outcomes-data.htm (accessed 14 October 2013)
7 Altman DG, Vergouwe Y, Royston P, Moons KGM. Prognosis and prognostic research: validating a prognostic model. Br Med J 2009; 338: 1432
8 Moons KGM, Altman DG, Vergouwe Y, Royston P. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. Br Med J 2009; 338: b606
9 Moons KGM, Royston P, Vergouwe Y, Grobbee DE, Altman DG. Prognosis and prognostic research: what, why, and how? Br Med J 2009; 338: b375
10 Shende MR, Waxman J, Luketich JD. Predictive ability of preoperative indices for esophagectomy. Thorac Surg Clin 2007; 17: 337–41
11 Copeland P, Jones D, Walters M. POSSUM: a scoring system for surgical audit. Br J Surg 1991; 78: 355–60
12 Dutta S, Horgan P, McMillan D. POSSUM and its related models as predictors of postoperative mortality and morbidity in patients undergoing surgery for gastro-oesophageal cancer: a systematic review. World J Surg 2010; 34: 2076–82
13 Royston P, Moons KGM, Altman DG, Vergouwe Y. Prognosis and prognostic research: developing a prognostic model. Br Med J 2009; 338: b604
14 Hayden JA, Cote P, Bombardier C. Evaluation of the quality of prognosis studies in systematic reviews. Ann Intern Med 2006; 144: 427–37
15 Steyerberg EW. Patterns of external validity. In: Gail M, Tsiatis A, Krickeberg K, Samet J, eds. Clinical Prediction Models. A Practical Approach to Development, Validation, and Updating. New York: Springer, 2009; 335
16 Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol 2009; 62: 1006–12
17 Ochsner JL, DeBakey M. Surgical aspects of carcinoma of the esophagus; review of the literature and report of 4 cases. J Thorac Surg 1941; 10: 401–45

18 Clinical Effectiveness Unit, The Royal College of Surgeons of England. National Oesophago-Gastric Cancer Audit, 2013
19 Al-Sarira AA, David G, Willmott S, Slavin JP, Deakin M, Corless DJ. Oesophagectomy practice and outcomes in England. Br J Surg 2007; 94: 585–91
20 Dimick JB, Wainess RM, Upchurch GR Jr, Iannettoni MD, Orringer MB. National trends in outcomes for esophageal resection. Ann Thorac Surg 2005; 79: 212–6
21 Hofstetter W, Swisher SG, Correa AM, et al. Treatment outcomes of resected esophageal cancer. Ann Surg 2002; 236: 376–84
22 Rouvelas I, Zeng W, Lindblad M, Viklund P, Ye W, Lagergren J. Survival after surgery for oesophageal cancer: a population-based study. Lancet Oncol 2005; 6: 864–70
23 Sauvanet A, Mariette C, Thomas P, et al. Mortality and morbidity after resection for adenocarcinoma of the gastroesophageal junction: predictive factors. J Am Coll Surg 2005; 201: 253–62
24 Jamieson GG, Mathew G, Ludemann R, Wayman J, Myers JC, Devitt PG. Postoperative mortality following oesophagectomy and problems in reporting its rate. Br J Surg 2004; 91: 943–7
25 Killeen SD, O'Sullivan MJ, Coffey JC, Kirwan WO, Redmond HP. Provider volume and outcomes for oncological procedures. Br J Surg 2005; 92: 389–402
26 Bachmann MO, Alderson D, Edwards D, et al. Cohort study in South and West England of the influence of specialization on the management and outcome of patients with oesophageal and gastric cancers. Br J Surg 2002; 89: 914–22
27 Allareddy V, Allareddy V, Konety BR. Specificity of procedure volume and in-hospital mortality association. Ann Surg 2007; 246: 135–9
28 Birkmeyer JD, Siewers AE, Finlayson EVA, et al. Hospital volume and surgical mortality in the United States. N Engl J Med 2002; 346: 1128–37
29 Pennefather SH. Anaesthesia for oesophagectomy. Curr Opin Anaesthesiol 2007; 20: 15–20
30 Shaw IH. Anaesthetic aspects and case selection for oesophageal and gastric surgery. In: Griffin SM, Raimes S, eds. Oesophagogastric Surgery: A Companion to Specialist Surgical Practice, 4th Edn. Saunders Elsevier, 2008
31 Steyerberg EW, Neville BA, Koppert LB, et al. Surgical mortality in patients with esophageal cancer: development and validation of a simple risk score. J Clin Oncol 2006; 24: 4277–84
32 Ra J, Paulson EC, Kucharczuk J, et al. Postoperative mortality after esophagectomy for cancer: development of a risk prediction model. Ann Surg Oncol 2008; 15: 1577–84
33 Tekkis PP, McCulloch P, Poloniecki JD, Prytherch DR, Kessaris N, Steger AC. Risk-adjusted prediction of operative mortality in oesophagogastric surgery with O-POSSUM. Br J Surg 2004; 91: 288–95
34 Bartels H, Stein HJ, Siewert JR. Preoperative risk analysis and postoperative mortality of oesophagectomy for resectable oesophageal cancer. Br J Surg 1998; 85: 840–4
35 Bailey SH, Bull DA, Harpole DH, et al. Outcomes after esophagectomy: a ten-year prospective cohort. Ann Thorac Surg 2003; 75: 217–22
36 Law SYK, Fok M, Wong J. Risk analysis in resection of squamous cell carcinoma of the esophagus. World J Surg 1994; 18: 339–46
37 Liu JF, Watson DI, Devitt PG, Mathew G, Myburgh J, Jamieson GG. Risk factor analysis of post-operative mortality in oesophagectomy. Dis Esophagus 2000; 13: 130–5
38 Sanz L, Ovejero VJ, Gonzalez JJ, et al. Mortality risk scales in esophagectomy for cancer: their usefulness in preoperative patient selection. Hepatogastroenterology 2006; 53: 869–73
39 Dhungel B, Diggs B, Hunter J, Sheppard B, Vetto J, Dolan J. Patient and peri-operative predictors of morbidity and mortality after esophagectomy: American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP), 2005–2008. J Gastrointest Surg 2010; 14: 1492–501
40 Zhang GH, Fujita H, Yamana H, Kakegawa T. A prediction of hospital mortality after surgical treatment for esophageal cancer. Surg Today 1994; 24: 122–7
41 Schroder W, Bollschweiler E, Kossow C, Holscher AH. Preoperative risk analysis – a reliable predictor of postoperative outcome after transthoracic esophagectomy? Langenbeck's Arch Surg 2006; 391: 455–60
42 Lai F, Kwan TK, Yuen WC, Wai A, Shung YCSE. Evaluation of various POSSUM models for predicting mortality in patients undergoing elective oesophagectomy for carcinoma. Br J Surg 2007; 94: 1172–8
43 Nagabhushan JS, Srinath S, Weir F, Angerson WJ, Sugden BA, Morran CG. Comparison of P-POSSUM and O-POSSUM in predicting mortality after oesophagogastric resections. Postgrad Med J 2007; 83: 355–8
44 Lagarde SM, Maris AKD, de Castro S, Busch ORC, Obertop H, van Lanschot JJB. Evaluation of O-POSSUM in predicting in-hospital mortality after resection for oesophageal cancer. Br J Surg 2007; 94: 1521–6
45 Zafirellis KD, Fountoulakis A, Dolan K, Dexter SPL, Martin IG, Sue-Ling HM. Evaluation of POSSUM in patients with oesophageal cancer undergoing resection. Br J Surg 2002; 89: 1150–5
46 Zingg U, Langton C, Addison B, et al. Risk prediction scores for postoperative mortality after esophagectomy. J Gastrointest Surg 2009; 13: 611–8
47 Bosch DJ, Pultrum BB, de Bock GH, Oosterhuis JK, Rodgers MGG, Plukker JTM. Comparison of different risk-adjustment models in assessing short-term surgical outcome after transthoracic esophagectomy in patients with esophageal cancer. Am J Surg 2011; 202: 303–9
48 Ball C, Butterworth J, Seidel J. Predictive value of P-POSSUM scoring for Ivor-Lewis oesophagectomy. Abstracts of ESICM LIVES 2011, Berlin, 1–5 October 2011. Intensive Care Med 2011; 37: S61
49 Dutta S, Al-Mrabt N, Fullarton G, Horgan P, McMillan D. A comparison of POSSUM and GPS models in the prediction of post-operative outcome in patients undergoing oesophago-gastric cancer resection. Ann Surg Oncol 2011; 18: 2808–17
50 Yamashita S, Takeno S, Moroga T, et al. E-PASS (the Estimation of Physiologic Ability and Surgical Stress) scoring system helps the prediction of postoperative morbidity and mortality in esophageal cancer operation. Dis Esophagus 2010; 23(Suppl. S1): 54A
51 Chamogeorgakis T, Toumpoulis I, Tomos P, et al. External validation of the modified Thoracoscore in a new thoracic surgery program: prediction of in-hospital mortality. Interact Cardiovasc Thorac Surg 2009; 9: 463–6
52 Luna A, Rebasa P, Navarro S, et al. An evaluation of morbidity and mortality in oncologic gastric surgery with the application of POSSUM, P-POSSUM, and O-POSSUM. World J Surg 2009; 33: 1889–94
53 Guest RV, Chandrabalan VV, Murray GD, Auld CD. Application of variable life adjusted display (VLAD) to risk-adjusted mortality of esophagogastric cancer surgery. World J Surg 2012; 36: 104–8
54 Noble F, Curtis N, Harris S, et al. Risk assessment using a novel score to predict anastomotic leak and major complications after oesophageal resection. J Gastrointest Surg 2012; 16: 1083–95
55 Sunpaweravong S, Ruangsin S, Laohawiriyakamol S. Prediction of post-operative complications and survival for esophageal carcinoma. 12th World Congress of the International Society for Diseases of the Esophagus, 2010. Dis Esophagus 2010; 23(Suppl. S1): 14A
56 Vashist Y, Loos J, Dedow J, et al. Glasgow prognostic score is a predictor of perioperative and long-term outcome in patients with only