Modeling major lung resection outcomes using classification trees and multiple imputation techniques

European Journal of Cardio-thoracic Surgery 34 (2008) 1085–1089
www.elsevier.com/locate/ejcts

Mark K. Ferguson a,*, Juned Siddique b, Theodore Karrison b

a Department of Surgery, The University of Chicago, 5841 South Maryland Avenue, MC5035, Chicago, IL, USA
b Department of Health Studies, The University of Chicago, Chicago, IL, USA

Received 5 June 2008; received in revised form 14 July 2008; accepted 21 July 2008; Available online 29 August 2008

Abstract

Objective: Modeling of operative risks associated with major lung resection is potentially inaccurate and inefficient because of incomplete observations for predictor variables (covariates). Missing values do not usually occur randomly, potentially introducing an important source of bias in modeling. Deletion of cases with missing data also results in loss of precision. The current study analyzes incomplete variables as potential predictors of outcomes after major lung resection using imputation techniques.

Methods: We analyzed major lung resection patients treated from 1980 to 2006 for predictors of pulmonary, cardiovascular, and overall complications, as well as mortality. Predictive variables were initially determined using classification and regression tree (CART) methods. Imputation models were developed and variables with missing values were multiply imputed. We fit a logistic regression model for each outcome using CART variables and any covariates that were of interest clinically.

Results: Of 1046 resected patients, serum albumin and diffusing capacity (DLCO%) had a large number of missing values (32% and 13% missing, respectively).
Models included 10 covariates for pulmonary complications (p < 0.05 for DLCO% and forced expiratory volume in the first second [FEV1%]), 12 covariates for cardiovascular complications (p < 0.05 for FEV1%, extent of resection, year of operation, and age), 15 covariates for overall complications (p < 0.05 for DLCO%, performance status, serum albumin, and FEV1/FVC ratio), and 12 covariates for death (p < 0.05 for DLCO%, extent of resection, and operation year).

Conclusions: We identified serum albumin as a previously under-reported and strong predictor of overall complications. Serum albumin was marginally significantly related to pulmonary and cardiovascular outcomes after major lung surgery. Use of imputation techniques for modeling surgical risks has potential value in identifying important predictive variables that may ordinarily be eliminated from analysis or not identified as predictors because of incomplete observations in clinical databases.

© 2008 European Association for Cardio-Thoracic Surgery. Published by Elsevier B.V. All rights reserved.

Keywords: Neoplasm; Lung; Surgical risk; Serum albumin; Imputation; Diffusing capacity; Classification and regression tree

1. Introduction

Major lung surgery is associated with rates of morbidity and mortality that are not trivial. Risks for pulmonary complications or cardiovascular complications exceed 15%, and mortality rates range from 2% for lobectomy to as high as 8% for pneumonectomy. Patients undergoing major lung resection often have multiple comorbid factors such as hypertension, coronary artery disease, diabetes, and chronic obstructive pulmonary disease. Data on these factors are easily recorded and have been used in risk assessment estimates for decades.
Formal modeling of risks for morbidity and mortality is important in appropriate patient selection for surgery, counseling patients as part of the surgical consent process, stratifying outcomes for research purposes, and assessment of resource utilization. Previous studies from our institution have demonstrated the importance of pulmonary function, age, and performance status in predicting the risk of major lung resection [1–6]. Recent analyses of our database as part of a longitudinal study of outcomes demonstrated a possible important effect of serum albumin on estimations of outcomes such as mortality and cardiopulmonary complications [7]. However, there were too many missing data points for that parameter to include it in the development of a generalized risk model using routine methodology.

Standard statistical techniques and software routines automatically eliminate patients with missing values from risk analyses. This is problematic because missing values do not usually occur randomly, potentially introducing an important source of bias in any such analyses. Deleting observations can also reduce the precision of parameter estimates. New techniques for imputing missing values are currently being advocated for more accurate modeling of outcomes using large datasets [8–10]. In fact, some clinical medicine journals strongly encourage the use of imputation techniques in data analyses so that potentially relevant relationships are not overlooked and to avoid potential bias in results reporting.

The purpose of the current study is to analyze serum albumin and other incomplete variables as potential predictors of outcomes after major lung resection using imputation techniques.

* Corresponding author. Tel.: +1 773 702 3551; fax: +1 773 702 2642. E-mail address: mferguso@surgery.bsd.uchicago.edu (M.K. Ferguson).
1010-7940/$ – see front matter. doi:10.1016/j.ejcts.2008.07.037

2. Methods

We analyzed our database of patients who underwent major lung resection from 1980 through 2006. The data include demographic information, preoperative laboratory values, type of operation, cancer staging information where appropriate, and operative outcomes including specific complications. This study was approved by our institutional review board and specific patient consent was waived.

Predictive variables for specific outcomes including mortality, overall complications, pulmonary morbidity (initial ventilatory support >24 h, reintubation for respiratory insufficiency, pneumonia, lobar collapse) and cardiovascular morbidity (myocardial infarction, arrhythmia, pulmonary embolism, cardiovascular instability requiring intravenous vasoactive agents) were determined using classification and regression tree (CART) methods [11]. Classification trees are a computationally intensive nonparametric approach to classification often used in medical decision-making. Tree construction consists of searching through the predictive variables and choosing the one variable that best subsets the data into two groups in which the distribution of the variable to be classified is more pure (i.e., homogeneous with respect to the outcome, resulting in lower misclassification rates) than before the split. If a predictive variable is not already binary, the algorithm identifies the cut-off that provides the best split. Once the data have been subset into two nodes, the algorithm then considers each node separately.
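
The split search described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes Gini impurity as the purity measure (the paper does not state which criterion was used), it omits the loss function used to favor sensitivity or specificity, and the function names `gini` and `best_split` and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Search every variable and every cut-off for the binary split that
    minimises the size-weighted Gini impurity of the two child nodes.
    Returns (variable_index, cutoff, weighted_impurity), or None when no
    variable takes more than one distinct value."""
    n = len(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate cut-offs: midpoints between adjacent observed values.
        for low, high in zip(values, values[1:]):
            cut = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= cut]
            right = [y for row, y in zip(rows, labels) if row[j] > cut]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, cut, score)
    return best


# Hypothetical toy data: columns are (DLCO%, age); label 1 = complication.
rows = [(45, 70), (55, 62), (60, 75), (85, 58), (90, 66), (100, 71)]
labels = [1, 1, 1, 0, 0, 0]
split = best_split(rows, labels)
print(split)  # (0, 72.5, 0.0)
```

In this toy example the first variable separates the two outcome classes completely at a cut-off of 72.5, so both child nodes are pure. A full tree would apply the same search recursively within each child node.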
If a node is sufficiently pure, no more classification is necessary. Otherwise, the algorithm, using the cases in that node, searches through the remaining variables to identify the variable that best splits the node into two additional nodes of increased purity. This construction process continues until all nodes are sufficiently pure and no more splitting is necessary. A final pruning step may also be performed to avoid overfitting. By incorporating a loss function into the tree algorithm, one can grow trees that maximize sensitivity or specificity. In order to identify all variables associated with our outcomes, we grew two trees for each outcome: one that maximized sensitivity and one that maximized specificity.

We then developed an imputation model that incorporated the following: our outcome variables, the variables that were identified in the classification trees, and any additional variables that appeared in our regression models or were correlated with those variables with missing values. All variables with missing values were multiply imputed under a multivariate normal model [12]. Once the data were imputed, we fit a logistic regression model for each outcome, again using those variables identified by the classification trees as well as any covariates that were of interest clinically. All models included the variables DLCO% (diffusing capacity of the lung for carbon monoxide, expressed as a percent of predicted), gender, performance status, induction therapy, cancer stage, FEV1% (forced expiratory volume in the first second, expressed as a percent of predicted), serum albumin, and fraction of lung volume, whether or not they were identified in the classification trees. While we considered all two-way interactions among the covariates, for the sake of parsimony, the regression models only included covariate main effects [13]. Models were fit on each of the five imputed datasets and results were combined using the methods described in Rubin [14].
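
The combining step can be sketched as follows. Rubin's rules average the per-dataset estimates and add the between-imputation variance, inflated by (1 + 1/m), to the average within-imputation variance. The function name `pool_rubin` and the five log-odds estimates below are hypothetical and are not taken from the study; the sketch also omits the degrees-of-freedom calculation needed for confidence intervals and p values.

```python
import math


def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed datasets with Rubin's rules.

    estimates: per-dataset point estimates (e.g. log odds ratios)
    variances: per-dataset squared standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                     # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between          # total variance
    return q_bar, math.sqrt(total)


# Hypothetical log odds ratios for one covariate from five imputed datasets,
# each with a standard error of 0.2 (variance 0.04).
log_odds = [-0.40, -0.45, -0.50, -0.42, -0.48]
variances = [0.04] * 5

pooled, se = pool_rubin(log_odds, variances)
odds_ratio = math.exp(pooled)  # back-transform to the odds-ratio scale
```

Because the estimates differ between imputed datasets, the pooled standard error here is larger than the 0.2 reported within any single dataset, which is how the uncertainty about the missing values enters the final model.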
Data are expressed as mean ± standard deviation (SD).

3. Results

A total of 1046 patients underwent major lung resection during the study period. Their demographic information, laboratory values, diagnoses, preoperative therapy, operations, and outcomes are listed in Table 1. Among the variables that had the largest number of missing values were serum albumin and diffusing capacity (32% and 13% missing, respectively). There were substantial differences between groups with and without these measurements, indicating the importance of complete datasets in modeling outcomes (Table 2).

After CART identification of predictive variables and multiple imputation for missing values, the resultant models included 10 covariates for pulmonary complications (p < 0.05 for DLCO% and FEV1%; Table 3); 12 covariates for cardiovascular complications (p < 0.05 for FEV1%, extent of resection, year of operation, and age; Table 4); 15 covariates for overall complications (p < 0.05 for DLCO%, performance status, serum albumin, and FEV1/FVC ratio; Table 5); and 12 covariates for death (p < 0.05 for DLCO%, extent of resection, and operation year; Table 6). Serum albumin was marginally significantly related to pulmonary and cardiovascular outcomes.

Table 1
Demographic, laboratory, and surgical data for patients who underwent major lung resection

Category                                Patients assessed  Value or number affected  Range
Age (years; mean ± SD)                  1045               61.5 ± 11.4               17–96
Male gender                             1046               573 (54.8%)
Diabetes mellitus                       1037               146 (14.1%)
Hypertension                            1039               407 (39.1%)
Serum creatinine (mg/dl)                777                1.1 ± 1.0                 0.3–11.4
Serum hemoglobin (g/dl)                 817                13.1 ± 1.7                7.2–18.4
Serum albumin (g/dl)                    711                4.0 ± 0.5                 1.8–5.0
Prior myocardial infarction             1025               102 (10.0%)
[row label missing in source]           1018               855 (84.0%)
Current cigarette smoker                1039               452 (43.5%)
Body mass index (BMI)                   985                26.4 ± 5.6                13.5–51.7
FVC%                                    977                85.9 ± 18.5               28–145
FEV1%                                   998                82.1 ± 22.2               22–158
FEV1/FVC ratio                          1008               0.70 ± 0.18               0.33–0.98
DLCO%                                   907                84.7 ± 21.9               25–171
Lung cancer diagnosis                   1041               862 (82.8%)
Induction chemotherapy or radiotherapy  906                94 (10.4%)
Year of operation                       1045                                         1980–2006
Extent of resection                     1046
  Lobectomy                                                780 (74.6%)
  Bilobectomy                                              89 (8.5%)
  Pneumonectomy                                            177 (16.9%)
Fraction of lung volume                 998                0.75 ± 0.11               0.41–0.98
Cancer stage                            858
  0 or I                                                   472 (55.0%)
  II                                                       166 (19.4%)
  III                                                      207 (24.1%)
  IV                                                       13 (1.5%)
Pulmonary complications                 1035               151 (14.6%)
Cardiovascular complications            1033               156 (15.1%)
Any complication                        1026               325 (31.7%)
Operative mortality                     1045               65 (6.2%)

FVC%: forced vital capacity expressed as a percent of predicted; FEV1%: forced expiratory volume in the first second expressed as a percent of predicted; DLCO%: diffusing capacity of the lung for carbon monoxide expressed as a percent of predicted.

Table 2
Comparison of variables between groups with and without accompanying key covariate measurements

Variable                       No DLCO group  DLCO group    p        No albumin group  Albumin group  p
Age (years)                    58.6 ± 13.2    61.9 ± 11.1   0.006    61.8 ± 11.8       61.3 ± 11.2    0.58
Male                           61.1%          53.8%         0.11     49.6%             57.2%          0.02
Diabetes mellitus              11.0%          14.5%         0.27     12.2%             15.0%          0.23
Hypertension                   33.8%          40.0%         0.17     38.4%             39.5%          0.73
Prior myocardial infarction    11.2%          9.8%          0.61     8.3%              10.7%          0.23
Induction therapy              11.3%          10.2%         0.73     6.8%              12.1%          0.015
[row label missing in source]  83.2%          84.1%         0.79     13.3%             17.3%          0.10
Serum creatinine (mg/dl)       1.09 ± 1.1     1.11 ± 1.7    0.91     0.95 ± 0.7        1.17 ± 1.9     0.02
Serum hemoglobin (g/dl)        12.8 ± 1.98    13.1 ± 1.6    0.32     13.1 ± 1.6        13.0 ± 1.7     0.52
FVC%                           78.0 ± 20.8    86.5 ± 18.2   0.001    86.1 ± 18.8       85.8 ± 18.4    0.85
FEV1%                          71.9 ± 22.8    83.1 ± 21.9   <0.001   82.8 ± 23.9       81.8 ± 21.4    0.55
FEV1/FVC ratio                 0.73 ± 0.12    0.70 ± 0.12   0.023    0.70 ± 0.12       0.70 ± 0.11    0.69
DLCO%                                                                83.3 ± 21.9       85.4 ± 21.9    0.17
Fraction of lung volume        0.75 ± 0.13    0.75 ± 0.11   0.54     0.76 ± 0.10       0.75 ± 0.12    0.015
Current smoker                 54.1%          41.9%         0.008    35.0%             47.5%          <0.001
Body mass index (BMI)          25.2 ± 5.6     26.6 ± 5.5    0.013    26.7 ± 5.4        26.3 ± 5.7     0.37
Cancer stage I or II           66.1%          75.6%         0.034    77.9%             72.7%          0.10
Serum albumin (g/dl)           3.9 ± 0.5      4.1 ± 1.4     0.09
Year of operation              1989.1 ± 8.7   1995.5 ± 7.8  <0.001   1997.5 ± 7.5      1993.3 ± 8.2   <0.001

FVC%: forced vital capacity expressed as a percent of predicted; FEV1%: forced expiratory volume in the first second expressed as a percent of predicted; DLCO%: diffusing capacity of the lung for carbon monoxide expressed as a percent of predicted.

Table 3
Covariates for pulmonary complications

Covariate                                  Odds ratio  Confidence interval  p
DLCO% (10% change)                         0.80        0.73–0.89            <0.0001
FEV1% (10% change)                         0.91        0.83–0.99            0.035
Female                                     0.74        0.51–1.08            0.12
Performance status [level not recovered]   1.46        0.92–2.32            0.11
Induction therapy                          0.65        0.29–1.46            0.29
Early cancer stage (I or II)               0.65        0.40–1.06            0.086
Late cancer stage (III or IV)              0.70        0.38–1.32            0.27
Serum albumin                              0.62        0.37–1.03            0.062
Fraction of lung volume                    0.39        0.08–1.97            0.26
Year of operation (5 years change)         0.92        0.82–1.03            0.14

FEV1%: forced expiratory volume in the first second expressed as a percent of predicted; DLCO%: diffusing capacity of the lung for carbon monoxide expressed as a percent of predicted.

Table 4
Covariates for cardiovascular complications

Covariate                                  Odds ratio  Confidence interval  p
DLCO% (10% change)                         0.94        0.85–1.04            0.21
FEV1% (10% change)                         0.87        0.79–0.96            0.0056
Female                                     0.83        0.57–1.22            0.34
Performance status [level not recovered]   1.23        0.78–1.92            0.38
Induction therapy                          1.76        0.92–3.37            0.088
Early cancer stage (I or II)               0.75        0.42–1.32            0.31
Late cancer stage (III or IV)              0.98        0.51–1.90            0.96
Serum albumin                              0.66        0.43–1.01            0.056
Fraction of lung volume                    0.18        0.04–0.82            0.027
Body mass index (BMI)                      0.99        0.96–1.03            0.67
Year of operation (5 years change)         0.81        0.72–0.91            0.0004
Age (10 years change)                      1.72        1.40–2.10            <0.0001

FEV1%: forced expiratory volume in the first second expressed as a percent of predicted; DLCO%: diffusing capacity of the lung for carbon monoxide expressed as a percent of predicted.

Table 5
Covariates for overall complications

Covariate                                  Odds ratio  Confidence interval  p
DLCO% (10% change)                         0.87        0.80–0.94            0.0006
FEV1% (10% change)                         1.03        0.89–1.19            0.68
Female                                     0.80        0.60–1.06            0.12
Performance status [level not recovered]   1.47        1.00–2.16            0.048
Induction therapy                          0.91        0.51–1.62            0.75
Early cancer stage (I or II)               0.89        0.40–1.99            0.78
Late cancer stage (III or IV)              1.07        0.46–2.47            0.87
Serum albumin                              0.65        0.42–0.99            0.047
Fraction of lung volume                    0.53        0.15–1.88            0.32
FVC% (10% change)                          0.90        0.77–1.06            0.22
Body mass index (BMI)                      0.99        0.96–1.02            0.33
FEV1/FVC ratio >0.7                        0.53        0.35–0.79            0.0022
Non-small cell lung cancer
Small cell lung cancer                     1.42        0.52–3.87            0.49
Carcinoid tumor                            0.62        0.25–1.55            0.31
Other lung pathology                       0.99        0.48–2.00            0.97

FVC%: forced vital capacity expressed as a percent of predicted; FEV1%: forced expiratory volume in the first second expressed as a percent of predicted; DLCO%: diffusing capacity of the lung for carbon monoxide expressed as a percent of predicted.

Table 6
Covariates for operative mortality

Covariate                                  Odds ratio  Confidence interval  p
DLCO% (10% change)                         0.78        0.68–0.90            0.0006
FEV1% (10% change)                         1.03        0.84–1.28            0.76
Female                                     0.82        0.46–1.46            0.50
Performance status [level not recovered]   1.74        0.94–3.22            0.079
Induction therapy                          1.28        0.43–3.83            0.65
Early cancer stage (I or II)               0.80        0.35–1.80            0.59
Late cancer stage (III or IV)              0.69        0.27–1.80            0.45
Serum albumin                              0.50        0.20–1.24            0.12
Fraction of lung volume                    0.03        0.00–0.24            0.0011
FVC% (10% change)                          0.86        0.68–1.10            0.24
Year of operation (5 years change)         0.82        0.69–0.97            0.021
Age (10 years change)                      1.30        0.99–1.69            0.058

FVC%: forced vital capacity expressed as a percent of predicted; FEV1%: forced expiratory volume in the first second expressed as a percent of predicted; DLCO%: diffusing capacity of the lung for carbon monoxide expressed as a percent of predicted.

4. Discussion

The prediction of the risk of complications after major lung resection is important for a number of reasons. It assists in selecting patients for surgery, obtaining informed consent, determining the utility of preoperative risk-reducing interventions, selecting alternative therapies if the risk is deemed to be excessive, and in assigning resources for postoperative care.

Formal preoperative assessment of risk by clinicians using rating algorithms is uncommon in major lung surgery. This is likely because the predictive ability of such algorithms is generally only moderately good, a variety of systems have been published that are not consistent in their recommendations, and many surgeons believe that the risk of a resection for cancer is often a better option than the high risk a patient faces if curative cancer therapy in the form of surgery is not offered.

Inconsistency among risk models exists in part because of differing patient populations, clinical management, and methods of outcome assessment. In addition, potentially useful indicators may not be measured routinely or their values may not be collected for database purposes. An example is diffusing capacity, which has been shown to be a strong, independent determinant of pulmonary complications and operative mortality after major lung resection. In the European Thoracic Surgery Database fewer than 25% of patients had values for diffusing capacity recorded [15]. In our own database we also found in statistical analyses that serum albumin was a potentially important determinant of risk, but because many patients did not have serum albumin values measured or recorded, this variable was often not considered in risk models.

In this study we explored newer techniques for improving risk assessment, including CART methodology and multiple imputation techniques for assigning values to variables that were frequently missing. The need for this type of analysis is clearly indicated in Table 2, in which numerous important statistical and clinical differences are evident between groups with and without measurements for two of the most common missing variables, diffusing capacity and serum albumin. In addition, because most statistical software drops observations with missing values, any statistical model including both DLCO and serum albumin would result in 432 (41%) of our observations being dropped from the analysis.

CART methodology previously has been used sparingly in modeling lung resection outcomes [16] and is not generally familiar to the thoracic surgical community. Its primary advantage is the development of more rigorous models through nonparametric recursive partitioning of data, including interactions, permitting identification of potentially influential variables that might not be obvious in simple univariate or even multivariable analyses.

Multiple imputation techniques, which replace missing values with two or more plausible values, strengthen modeling efforts by enabling inclusion of more patients whose data ordinarily would be eliminated from model generation because of missing values. It should be noted, however, that the validity of post-imputation estimates depends on the assumption that the missing data are missing at random (MAR), i.e., the probability that an observation is missing can depend on the values of observed items but not on the value of the missing item itself. Thus, for example, if the probability that a value is missing is greater in males than females and gender is recorded, then the data satisfy the MAR assumption. On the other hand, if a value is more likely to be missing when the true (but unobserved) value of the variable itself is large, then the data would not satisfy the MAR assumption.

Our findings are similar to some previously published findings, in which diffusing capacity was identified as the single strongest predictor of complications (pulmonary, overall, mortality) following major lung resection [17–19]. Additional previously reported determinants of complications that approached or achieved statistical significance in this analysis included FEV1% (pulmonary, cardiovascular), performance status (overall, mortality), age (cardiovascular, mortality), the extent of surgery (cardiovascular, mortality), chronic obstructive lung disease (COPD, defined as FEV1/FVC ratio <0.7; overall complications), and year of operation (cardiovascular, mortality). In addition, as suspected in our preliminary review of these data, serum albumin approached or achieved statistical significance as a predictor of outcomes including pulmonary, cardiovascular, and overall complications.

That serum albumin is an important predictor of complications after major lung surgery is not surprising. Weight loss and low body cell mass are correlated with low serum albumin in patients with lung cancer, and are associated with systemic inflammation that may contribute to an increased risk of postoperative complications [20]. Other authors have reported an association between low serum albumin and postoperative pulmonary complications, ascribing the utility of serum albumin as a determinant of outcomes to its value as a surrogate for overall nutritional and/or immune status [21,22].
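
The MAR example given earlier in this discussion (missingness that depends on recorded gender but not on the unobserved value itself) can be made concrete with a small deterministic sketch. All numbers are hypothetical, and the fill-in step uses a single conditional mean per group purely for illustration; it is not the multivariate normal multiple imputation used in the study.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical records: (sex, serum albumin in g/dl); None marks a missing value.
# For simplicity every male truly has 3.6 and every female 4.2, so the true
# overall mean is 3.9. Values are missing far more often in males, and sex is
# always recorded, which is the MAR situation described in the text.
records = [
    ("M", 3.6), ("M", None), ("M", None), ("M", None),
    ("F", 4.2), ("F", 4.2), ("F", 4.2), ("F", 4.2),
]

# Complete-case analysis: drop every record with a missing value.
observed = [value for _, value in records if value is not None]
complete_case_mean = mean(observed)

# Condition on the observed variable: fill each gap with the mean of the
# observed values in the same sex group, then average over all records.
group_means = {
    sex: mean([v for s, v in records if s == sex and v is not None])
    for sex in {s for s, _ in records}
}
filled = [v if v is not None else group_means[s] for s, v in records]
adjusted_mean = mean(filled)
```

Dropping incomplete records over-represents females and gives a mean of about 4.08, whereas using the recorded sex to fill the gaps recovers the true mean of 3.9. If missingness instead depended on the unobserved albumin value itself, no adjustment based on observed variables alone would be guaranteed to remove the bias.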
Because multiple imputation incorporates additional variability into parameter estimates to account for the uncertainty due to missing values being unknown, standard errors of parameter estimates tend to be larger than in analyses with no missing values. As a result, the importance of serum albumin in our models is likely to be understated.

In summary, in addition to confirming the role of diffusing capacity in predicting postoperative complications after major lung resection, we report that serum albumin is also a strong predictor of such complications. Its routine measurement may improve estimates of risk in patients who are candidates for lung resection. Use of CART and imputation techniques for modeling surgical risk has potential value in identifying important predictive variables that may ordinarily be eliminated from analysis or not identified as predictors because of incomplete observations in clinical databases.

References

[1] Ferguson MK, Little L, Rizzo L, Popovich KJ, Glonek GF, Leff A, Manjoney D, Little AG. Diffusing capacity predicts morbidity and mortality after pulmonary resection. J Thorac Cardiovasc Surg 1988;96:894–900.
[2] Ferguson MK, Reeder LB, Mick R. Optimizing selection of patients for major lung resection. J Thorac Cardiovasc Surg 1995;109:275–83.
[3] Wang J, Olak J, Ferguson MK. Diffusing capacity predicts operative mortality but not long-term survival after resection for lung cancer. J Thorac Cardiovasc Surg 1999;117:581–7.
[4] Wang J, Olak J, Ultmann RE, Ferguson MK. Assessment of pulmonary complications after lung resection. Ann Thorac Surg 1999;67:1444–7.
[5] Ferguson MK, Durkin AE. A comparison of three scoring systems for predicting complications after major lung resection. Eur J Cardiothorac Surg 2003;23:35–42.
[6] Ferguson MK, Vigneswaran WT. Diffusing capacity predicts postoperative morbidity after major lung resection in patients without obstructive pulmonary disease. Ann Thorac Surg 2008;85:1158–65.
[7] Ferguson MK, Vigneswaran WT. Changes in patient presentation and outcomes for major lung resection over three decades. Eur J Cardiothorac Surg 2008;33:497–501.
[8] Donders AR, van der Heijden GJ, Stijnen T, Moons KG. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol 2006;59:1087–91.
[9] Manca A, Palmer S. Handling missing data in patient-level cost-effectiveness analysis alongside randomised clinical trials. Appl Health Econ Health Policy 2005;4:65–75.
[10] Ambler G, Omar RZ, Royston P. A comparison of imputation techniques for handling missing predictor values in a risk model with a binary outcome. Stat Methods Med Res 2007;16:277–98.
[11] Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. New York: Chapman & Hall/CRC; 1984.
[12] Schafer JL. Analysis of incomplete multivariate data. New York: Chapman & Hall/CRC; 1997.
[13] Harrell FE. Regression modeling strategies. New York: Springer; 2001.
[14] Rubin DB. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.
[15] Berrisford R, Brunelli A, Rocco G, Treasure T, Utley M. Audit and Guidelines Committee of the European Society of Thoracic Surgeons; European Association of Cardiothoracic Surgeons. The European Thoracic Surgery Database project: modeling the risk of in-hospital death following lung resection. Eur J Cardiothorac Surg 2005;28:306–11.
[16] Varela G, Brunelli A, Rocco G, Novoa N, Refai M, Jiménez MF, Salati M, Gatani T. Measured FEV1 in the first postoperative day, and not ppoFEV1, is the best predictor of cardio-respiratory morbidity after lung resection. Eur J Cardiothorac Surg 2007;31:518–21.
[17] Markos J, Mullan BP, Hillman DR, Musk AW, Antico VF, Lovegrove FT, Carter MJ, Finucane KE. Preoperative assessment as a predictor of mortality and morbidity after lung resection. Am Rev Respir Dis 1989;139:902–10.
[18] Pierce RJ, Copland JM, Sharpe K, Barter CE. Preoperative risk evaluation for lung cancer resection: predicted postoperative product as a predictor of surgical mortality. Am J Respir Crit Care Med 1994;150:947–55.
[19] Brunelli A, Refai MA, Salati M, Sabbatini A, Morgan-Hughes NJ, Rocco G. Carbon monoxide lung diffusion capacity improves risk stratification in patients without airflow limitation: evidence for systematic measurement before lung resection. Eur J Cardiothorac Surg 2006;29:567–70.
[20] Simons JP, Schols AM, Buurman WA, Wouters EF. Weight loss and low body cell mass in males with lung cancer: relationship with systemic inflammation, acute-phase response, resting energy expenditure, and catabolic and anabolic hormones. Clin Sci (Lond) 1999;97:215–23.
[21] Busch E, Verazin G, Antkowiak JG, Driscoll D, Takita H. Pulmonary complications in patients undergoing thoracotomy for lung carcinoma. Chest 1994;105:760–6.
[22] Amar D, Zhang H, Park B, Heerdt PM, Fleisher M, Thaler HT. Inflammation and outcome after general thoracic surgery. Eur J Cardiothorac Surg 2007;32:431–4.