Propensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER)
|
|
- Amy Hensley
- 5 years ago
- Views:
Transcription
1 Propensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER) Yan Wu Advisor: Robert Pruzek Epidemiology and Biostatistics School of Public Health SUNY-Albany December 2009 Abstract: This study used a wide variety of modern propensity score analysis (PSA) methods and graphics to assess survival times following radiation or surgery of lung cancer patients. The data consisted of a sample (n=9474) from the Surveillance, Epidemiology and End Results (SEER) Registry of the National Cancer Institute. Although several different methods were used to define propensity scores, including classification trees and logistic regression, using both stratification and matching, results were generally consistent across methods, showing the following: 1. Lung cancer patients treated with surgery alone had longer survival times than those treated with radiation alone (averaging about ½ year longer). 2. For patients who received no radiation, surgery yielded large mean gains in survival times (almost 1 year longer); among patients who had radiation, the addition of surgery had smaller average effects (roughly ½ year difference). 3. Radiation tended to improve survival of lung cancer patients who did not have surgery (averaging 2-3 months); but radiation tended to have little affect following surgery (adding just over 1 month on average). 1. Introduction 1.1 Data source The Surveillance, Epidemiology, and End Results (SEER) Registry from the National Cancer Institute was used as the data source. This is an authoritative source of information on cancer incidence and survival in the United States (1). The SEER Program is the only comprehensive source of population-based information on cancer in the United States; it includes information for several covariates as well as patient survival data under different treatment conditions. SEER collects information on stage of cancer at the time of diagnosis, incidence, prevalence and survival for patients from specific geographic areas that represent 26 percent of the U.S. population; the reports are compiled to include covariate observations plus cancer mortality for the entire country. The data are available and intended for research for anyone interested in US cancer statistics or cancer surveillance methods. Updated annually and provided as a public service in print and electronic formats, SEER data are used by thousands of researchers, clinicians, public health officials, legislators, policymakers, and community groups. 1
2 1.2 Background of the research Lung cancer is the second most diagnosed cancer in men and women in the U.S. and the number one cause of death from cancer each year in both men and women. Surgical resection (cutting away) of the tumor generally is indicated for cancer that has not spread beyond the lung. This is the principal form of treatment for patients with stage 1 or stage 2 lung cancer. Radiation therapy, or radiotherapy, delivers high-energy x-rays that can destroy rapidly dividing cancer cells. It can shrink the tumor(s) before surgery and eliminate most cancer cells that remain in the treated area after surgery. The SEER data that is analyzed in this report consists of 9474 cases pertaining to the effects of radiation or surgery on survival of patients with lung cancer. 1.3 Literature review of the research Cox regression analysis has predicted long time survival after lung cancer surgery with early preoperative stage, ages below 70 years and normal pulmonary function (2). The Kaplan-Meier method was used to calculate 5-year postoperative survival in all categories of patient s characteristics; it identified significant increases in survival for early stage patients (3). Cox model and Kaplan-Meier curves can more accurately identify patients at risk for lung cancer death after surgery using histologic type and precise size specifications than using conventional tumor-node-metastasis staging, with SEER data (4). Research on the effects of surgery on survival focuses on building up survival models to identify significant variables in survival prediction. No research article on the effects of radiation treatment only on survival of lung cancer patients has been found. No studies seem to have comprehensively compared groups who did vs. did not have surgery or radiation with respect to survival for lung cancer. No comparison seems to have been made by controlling all available covariates, that is, adjusting for the effects of selection bias! The preceding means that an important question still remains to be answered: Do lung cancer patients survive longer with following surgery or radiation, or both of these treatments? Since the observationally defined groups cannot be fairly compared without covariate adjustments, a decision was made to use a wide variety of modern PSA-specific methods and graphics to compare groups after adjusting for selection bias. 1.4 Propensity Score Analysis (PSA) The Propensity Score is the conditional probability that a unit i will receive a treatment (Z i =1), given a set X of observed covariates: e(x i )=pr(z i =1 X i =x i ) (where it is assumed that, given the X s, the Z i [0 or 1] are independent) (5). Given an appropriate (model) or method for computing propensity scores, e(xi), treated and control subjects in the same stratum, or matched set (having nearly the same PS), should have approximately the same distribution for each covariate, or combination of covariates. Given an appropriate choice of covariates, treatment and control groups should differ from one another only by chance if their propensity scores are highly similar. This effect is similar to what happens when randomization is used to assign treatments; it is notable that it can also occur in well-designed observational studies. There are two stages of PSA: Stage 1, 2
3 Estimation of Propensity Scores (PS); Stage 2a, PS matching and 2b, treatment comparisons. 2. Methods 2.1 Data creation and cleaning The SEER archive starts since Quite a few of the data elements are outdated and some of them have different coding, most of which standardized in Thus, we extracted patients who were diagnosed with lung cancer in 2004 and used the data elements with coding consistent from 2004 to There are no missing data for outcome variable or treatment variables, which are surgery and radiation. About 40% records with any missing data elements were deleted from the final dataset which has 9474 cases. The survival time is measured by month and the cutoff is in Thus, the data is censored and about 30% people were still alive with survival time equaling to 35 months. The description of the data elements is in the table 1. The following Table provides a brief description of all SEER covariates as well as the response variable, survival, in the form of a data dictionary: Variable survival surgery radiation laterality size marital race sex age grade histg diag_confirm node exten stage site mets malignant number_pri vitalr lymph Description # of months 0 35 (All cases diagnosed in 2004 and cutoff is in If survive at the end of 3 years, survival=35) Surgery is performed or not Radiation is performed or not Which side the tumor originated The diameter of the primary tumor recorded in millimeters marital status patient race patient gender patient age Grading and differentiation codes Grouped histological type The method used to confirm the presence of cancer (histology or cytology) Exact number of regional lymph nodes containing metastases Contiguous growth of the primary tumor Describes the extent of the cancer the site in which the primary tumor originated the distant site(s) of metastatic involvement First malignant primary indicator the actual number of primaries Whether patient dies from the cancer the regional lymph nodes involved with cancer Table 1. Data dictionary of the studying dataset. 3
4 2.2 Comparison design To answer the question: Do lung cancer patients survive longer following surgery or radiation, or both, five binary group comparisons were made, using a 2 x 2 table to show how a total of five comparisons were made in three groups. See Figure 1. The first comparison, for Group I compared the effects of the surgery alone vs. radiation alone with respect to survival times (Figure 1a). Two comparisons were defined for Group II (by rows): for these, the effects of surgery on survival were assessed with and without radiation (Figure 1b). For Group III (by columns): the effects of radiation were compared with or without surgery (Figure 1c) PSA stages Stage 1: Estimation of PS. Both classification tree and logistic regression (LR) methods were used to estimate propensity scores. The classification tree algorithm is not model-based; rather it is algorithmic. Binary recursive partitioning is used to form splits; the bottom of the tree identifies leaves. Covariates and cut points are chosen to ensure the best splits; interactions of covariates are therefore automatically detected. Strata were formed naturally using leaves of a tree; strata sizes were unconstrained and the number of terminal leaves/strata chosen was based on prior experience with logistic regression, especially some results of Rosenbaum and Rubin (6). Propensity Scores were derived as (# in treatment group) / (# of units), within each stratum; the number of strata equaled the number of leaves (bottom of tree). All individuals were assigned the same estimated PS within each stratum. For the LR, Z i = binary variable (1 = treatment, 0 = control) and X i = vector of independent variables (explanatory covariates). PS= Prob(Z i =1 X). Fitted values are used to estimate PS s, e(x i ), for each observation, based on the available covariates. When propensity scores are estimated from the X i ; the estimation method should generally consider product functions, or interactions. The goal of LR is to use all covariates that appear to relate to treatment and outcome to improve prediction. Overfitting in phase I of PSA is not problematic since there are no concerns about external validity of PS estimation model. To the extent that there is selection bias (for which adjustments are needed) this can be seen in discrimination reflected in an ROC curve associated with the model. (But many of the best PSA studies show little discrimination.) 4
5 2.3.2 Stage 2: PS matching and treatments comparison. Nearest neighbor 1:1 matching (NNM) was used, without replacement: this method selects the m comparison (control) units whose propensity cores are closest to those of the corresponding treated unit. Propensity scores were ranked from high to low. The highest ranked unit was matched first and the matched comparison unit was removed from further matching; then the next highest ranked unit, etc., until all matches were made. Given the matched pairs, a paired sample comparison was used to analyze and display the data pairs and difference scores; confidence intervals for the population mean difference were used for each such comparison. Stratification was also used; this entailed use of loess regression and Five (nearly equally sized) strata from stratification-based comparison of outcomes across derived strata were generally sufficient to remove 90% of removable bias. Means of treatment and control group were compared within each stratum and the DAE-Direct Adjustment Estimator (weighted mean difference between control and treatment response across strata) was calculated. The LOESS-based comparison of treatment to control groups for response was based on LR results. Local nonlinear regression curves were obtained for treatment and control groups. The span argument in the R function loess was used to control smoothness. The vertical distance between curves across PS range provides an index of the treatment effect weighted according to the density of the PS distribution ignoring groups. 3. Results 3.1 Preliminary comparison of survival time distribution. The survival time histograms from total, no surgery and no radiation, surgery alone, radiation alone, and surgery and radiation groups were compared side by side in figure 2. These histograms suggested several questions about survival for certain 5
6 groups that would be worthy of deeper analysis, but they required detailed knowledge of archive and so were not considered here. 3.2 Classification tree stratification 6
7 In the classification tree graphics, each leaf (at the bottom) shows numerical values of the derived propensity scores and number of observations with such PS. The circ.psa function in PSAgraphics R package was used for these comparisons. In the circ.psa plots, a circle is plotted for each stratum, centered on the means for the treatments groups (for the X and Y axes) respectively. The size of the circle corresponds to the size of the stratum. A diagonal line from lower left to upper right shows the identity where X=Y, so that circles on, say, the lower side of this line show that the corresponding X mean is larger than the Y mean for that stratum, and vice-versa. Parallel projections are made from the centers of the stratumcircles to difference scores that are plotted on a line segment on the lower-left corner of the graphic; the average difference, which corresponds to the direct adjustment estimator (DAE) for the overall treatment effect, is plotted as the blue heavy dashed line parallel to the identity diagonal. Rug plots are shown on the upper and right margins of the graphic, for the X and Y marginal distributions. A 95% confidence interval (green solid bar) 7
8 for the overall effect is plotted to the left of the distribution of the stratadefined difference scores, centered on the DAE. In the comparison of surgery alone vs. radiation alone (Figure 3), the grand mean difference is 6.56 and the 95% C.I. ranges from 5.38 to In Figure 4, we assess the effects of surgery on survival time with or without radiation. For the patients without radiation, surgery led to average months longer survivals compared to no surgery (95% CI =10.28, 12.76). For patients who had had radiation, surgery led to average 6.68 months longer survival compared to no surgery (95% CI = 4.04,9.32). In Figure 5, we assess the effects of radiation on survival time with or without surgery. For the patients without surgery, radiation led on average to 2.89 months longer survival than no surgery (95% CI =2.06, 3.71). For patients with surgery, radiation led to a mean of 0.78 months longer survival than no radiation (95% CI = -2.3, 0.74). 3.3 LR to estimate the PS LR was used to estimate the PS of treatment for each comparison. The covariates in red with very small p values contribute most 8
9 to the PS, compared to covariates in black. In Figure 6, race, sex, age, site, grade, diag_confirm, exten, lymph, mets, stage and number_pri had stronger roles in defining the estimated PS of surgery alone vs. radiation alone. In Figure 7, in no radiation group, marital, age, site, laterality, grade, diag_confirm, exten, lymph, mets, stage and number_pri had strong roles in determining the estimated PS fpr comparing surgery with no surgery. In the radiation group, race, age, site, 9
10 laterality, grade, diag_confirm, exten, lymph, mets, stage, number_pri and malignant had strong roles in determining the estimated PS for comparing surgery to no surgery. In Figure 8, the no surgery group, marital, age, site, grade, diag_confirm, exten, lymph, mets, stage and malignant had strong roles in determining the estimated PS for radiation vs. no radiation. In the surgery group, marital, age, site, grade, diag_confirm, exten, lymph, mets and stage had strong roles in determining the estimated PS for comparing surgery with no surgery :1 nearest neighbor matching (NNM) Nearest neighbor 1:1 matching (NNM) was used without replacement; we compared the distribution of estimated probabilities for each treatment before and after matching using histograms. The gray histograms correspond to before matching; these 10
11 generally showed strong PS differences between treatments for all comparisons. After NNM, the green histograms show that the estimated PS s were nearly identical for the paired treatments in each five comparisons (Figures 9, 10 and 11). 3.5 Paired dependent sample comparison After 1:1 NNM, we used the function granova.ds from the granova R package to compare the matched (paired) dependent data. Paired X & Y values (survival times) were plotted as scatterplot where the identity reference line (for Y=X) is drawn. All data points are plotted relative to the identity line, and summary results are shown graphically. Parallel projections of data points to (a lower-left) line segment show how each point relates to its X-Y = D difference score; blue 'crosses' are used to display the distribution of difference scores and the mean difference is displayed as a red heavy dashed line, parallel to the identity reference line. Means for X and Y are also plotted as thin dashed vertical and horizontal lines to form rug plots for the distributions of X (at the top of graphic) and Y (on the right side). The 95% confidence interval for the population mean difference is also shown graphically (the green solid bar at the lower left side). In the comparison of surgery alone vs. radiation alone (Figure 12), the grand mean difference between the paired groups is 5.12 and the C.I. is from 4.44 to In Figure 13, we access the effects of surgery on survival time with or without radiation. For the patients without radiation, surgery led to 9.12 months longer average survivals compared to no surgery (95% CI = 7.87,10.38). For patients with radiation, 11
12 surgery led to a mean of 5.51 months longer survival compared to no surgery (95% CI = 4.19, 6.84). In Figure 14, we assess the effects of radiation on survival time with or without surgery. For the patients without surgery, radiation led to an average of 2.59 months longer survival than no surgery (95% CI =2.05, 3.12). For the patients with surgery, radiation led to an average of 0.29 months shorter survival than no radiation (95% CI = , 1.02). 3.6 Stratification-based comparison After estimation of PS by LR, stratification-based comparisons were made using quintiles of the PS distribution, to follow the logic of Rosenbaum and Rubin (1983), i.e, 5 strata eliminate 90 % of selection bias. We stratified the observations into 5 equal sized strata according to their PSs. The cbal.psa R function (from the not-yet posted PSAgraphics package) was used to check the balance of covariates in treatment and control groups with and without PS adjustment. In the cbal.psa plot, the standardized effect size (stes) is defined as difference of means divided by pooled standard deviation. Both unadjusted ESs (stesuna) and the adjusted counterparts (stes-adj) are computed and plotted. For stes-adj, d is 12
13 computed using denominator based on pooling variances in all subgroups for each covariate. stes-adj corresponds to the stes adjusted using PS values, while stes-una does not entail PS adjustment. The horizontal scale corresponds to differences of stes between treatment and control groups. The blue letters show results for individual propensity score stratum, while the larger filled (blue) squares show the average standardized effect sizes across strata. The filled (red) circles show standardized effect size differences between the treatment and control groups before covariate adjustment; that is, these are stes-una differences. Covariates are listed on the vertical scale, and they are ordered on the basis of the sizes of unadjusted stes differences. From the balance plot, we see that the PS adjustment helped to balance the covariates between the treatment and control groups (Figure 15 17). 13
14 Then we used the circ.psa function from PSAgraphics R for further comparisons. The circles are equal sized as we used quintiles so that all 5 circles have the same area. We also computed the grand mean of difference from the blue heavy dash line along with the C.I. from the green solid bar. In the comparison of surgery alone vs. radiation alone (Figure 15), the grand mean difference between the paired groups is 5.7 months and the 95% C.I. is from 3.36 to 6.9. In Figure 16, we assess the effects of surgery on survival time with or without radiation. For the patients without radiation, surgery led to an average of 10.7 months longer survival than no surgery (95% CI = 7.95, 13.53). For the patients with radiation, 14
15 surgery led to an average of 5.99 months longer survival than no surgery (95% CI = 4.18, 7.79). In Figure 17, we assess the effects of radiation on survival time with or without surgery. For the patients without surgery, radiation led to an average of 2.68 months longer survival than no surgery (95% CI = 2.19, 3.17). For the patients with surgery, radiation led to an average of 1.21 months longer survival than no radiation (95% CI = , 2.44). To see more details about the distribution of each observation, we used the LOESS graph (loess.psa, from PSAgraphics) to assess treatment effects across the full range of each PS distribution. Data points are plotted using propensity scores against the survival response, separately for treatment and control groups, and distinguished by both symbol and color for both groups (the green is for control and the blue is for treatment). Non-linear, loess-based regression curves for both groups are also plotted to show the estimates of effects across the range of PS scores; the vertical differences between these regression curves are weighted by the density of the PS distribution (where the red line is for the treatment group and the black line is for the control group). 15
16 In the comparison of surgery alone (black line) vs. radiation alone (red line in Figure 15), the grand mean difference between the two groups is 6.63 months and the 95% C.I. is (4.82, 8.45). In Figure 16, we assess the effects of surgery on survival time with versus without radiation. For patients without radiation, surgery led to a mean of 10.5 months longer survival than no surgery (95% CI = 9.05, 11.85); for patients who had had radiation, surgery led to an average of 5.8 months longer survival than no surgery (95% CI = 4.01, 7.58). In Figure 17, we assess the effects of radiation on survival time with or without surgery. For the patients without surgery, radiation led to an average of 2.6 months longer survival than no surgery (95% CI =2.09, 3.1). For patients with surgery, radiation led to a mean of 1.42 months longer survival than no radiation (95% CI = -0.16, 2.67). 4. Summary and discussion All four kinds of PSA-related methods and graphics consistently showed similar inferential results for all five comparisons, after adjusting for selection bias. These results are summarized in Figure 18 using 95% confidence intervals; they generally suggest that surgery offers longer survival than radiation, and surgery leads to longer average survival times than no surgery, with or without radiation, while radiation does so only without surgery but not with surgery. This is a relatively thorough comparison of lung cancer patients survival times following radiation or surgery treatments after adjusting for all available covariates in SEER. It has been based on modern methods of covariate adjustment using propensity scores so that the interpretations require fewer qualifications than are needed when covariates adjustments are not used; and it has demonstrated use of different PSA methods and several graphics (using R). This study has helped quantify mean differences (expressed as months of survival) between various treatments, after adjusting for covariate differences. Future steps: PSA can only adjust for observable covariates, which leads to the question of how large the effects of unknown or unobserved confounders would have to be to explain the effects. This question might be answered to some extent using Sensitivity Analysis, as suggested by Rosenbaum. There were missing values for some covariates; these records were removed from study data set. Multiple imputation may provide a useful strategy for dealing with this issue in future analyses. 16
17 References: Predictors of long time survival after lung cancer surgery: A retrospective cohort study. Kjetil Roth et al. BMC Pulmonary Medicine 2008, 8:22 3. Surgery for non-small cell lung cancer: postoperative survival based on the revised tumor-node-metastasis classification and its time trend. Fumihiro Tanaka et al. European Journal of Cardio-thoracic Surgery 18 (2000) Survival after Surgery in Stage IA and IB Non Small Cell Lung Cancer David Ost et al. Am J Respir Crit Care Med Vol 177. pp , Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Ralph Doagostino. Statist. Med. 17, (1998) 6. The central role of the propensity score in observational studies for causal effects. PAUL R. ROSENBAUM and DONALD B. RUBIN. Biometrika (1): Enhancing Dependent Sample Analyses with Graphics. Robert M. Pruzek, James E. Helmreich. Journal of Statistics Education Volume 17, Number 1, 2009 ( 8. PSAgraphics: An R package to Support Propensity Score Analysis. Helmreich, J & Pruzek, R.M. Jour. of Stat. Software, 29, ( 9. R Granova package. Robert M. Pruzek, James E. Helmreich. 10. R PSAgraphics package. Robert M. Pruzek, James E. Helmreich. 11. R cbal-es function. Kuannan Xiong, Robert M. Pruzek. (unpublished) 17
Propensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER)
Propensity core Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer egistry (EE) Yan Wu Advisor: obert Pruzek Epidemiology and Biostatistics,
More informationPropensity Score Analysis: Its rationale & potential for applied social/behavioral research. Bob Pruzek University at Albany
Propensity Score Analysis: Its rationale & potential for applied social/behavioral research Bob Pruzek University at Albany Aims: First, to introduce key ideas that underpin propensity score (PS) methodology
More informationBIOSTATISTICAL METHODS
BIOSTATISTICAL METHODS FOR TRANSLATIONAL & CLINICAL RESEARCH PROPENSITY SCORE Confounding Definition: A situation in which the effect or association between an exposure (a predictor or risk factor) and
More informationPubH 7405: REGRESSION ANALYSIS. Propensity Score
PubH 7405: REGRESSION ANALYSIS Propensity Score INTRODUCTION: There is a growing interest in using observational (or nonrandomized) studies to estimate the effects of treatments on outcomes. In observational
More informationUnit 1 Exploring and Understanding Data
Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile
More informationSurvival Prediction Models for Estimating the Benefit of Post-Operative Radiation Therapy for Gallbladder Cancer and Lung Cancer
Survival Prediction Models for Estimating the Benefit of Post-Operative Radiation Therapy for Gallbladder Cancer and Lung Cancer Jayashree Kalpathy-Cramer PhD 1, William Hersh, MD 1, Jong Song Kim, PhD
More informationPropensity Score Methods for Causal Inference with the PSMATCH Procedure
Paper SAS332-2017 Propensity Score Methods for Causal Inference with the PSMATCH Procedure Yang Yuan, Yiu-Fai Yung, and Maura Stokes, SAS Institute Inc. Abstract In a randomized study, subjects are randomly
More informationPropensity score method: a non-parametric technique to reduce model dependence
Big-data Clinical Trial Column Page of 8 Propensity score method: a non-parametric technique to reduce model dependence Zhongheng Zhang Department of Emergency Medicine, Sir Run-Run Shaw Hospital, Zhejiang
More informationFrom Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1
From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Contents Dedication... iii Acknowledgments... xi About This Book... xiii About the Author... xvii Chapter 1: Introduction...
More informationIntroduction to Discrimination in Microarray Data Analysis
Introduction to Discrimination in Microarray Data Analysis Jane Fridlyand CBMB University of California, San Francisco Genentech Hall Auditorium, Mission Bay, UCSF October 23, 2004 1 Case Study: Van t
More informationUnderstandable Statistics
Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement
More informationBiostatistics II
Biostatistics II 514-5509 Course Description: Modern multivariable statistical analysis based on the concept of generalized linear models. Includes linear, logistic, and Poisson regression, survival analysis,
More informationCHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships 3.1 Scatterplots and Correlation The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Reading Quiz 3.1 True/False 1.
More informationHow to interpret scientific & statistical graphs
How to interpret scientific & statistical graphs Theresa A Scott, MS Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott 1 A brief introduction Graphics:
More informationEthnic Disparities in the Treatment of Stage I Non-small Cell Lung Cancer. Juan P. Wisnivesky, MD, MPH, Thomas McGinn, MD, MPH, Claudia Henschke, PhD,
Ethnic Disparities in the Treatment of Stage I Non-small Cell Lung Cancer Juan P. Wisnivesky, MD, MPH, Thomas McGinn, MD, MPH, Claudia Henschke, PhD, MD, Paul Hebert, PhD, Michael C. Iannuzzi, MD, and
More informationLAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival*
LAB ASSIGNMENT 4 1 INFERENCES FOR NUMERICAL DATA In this lab assignment, you will analyze the data from a study to compare survival times of patients of both genders with different primary cancers. First,
More information2017 American Medical Association. All rights reserved.
Supplementary Online Content Borocas DA, Alvarez J, Resnick MJ, et al. Association between radiation therapy, surgery, or observation for localized prostate cancer and patient-reported outcomes after 3
More informationLecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics
Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 3: Overview of Descriptive Statistics October 3, 2005 Lecture Outline Purpose
More informationChapter 1: Exploring Data
Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!
More informationbivariate analysis: The statistical analysis of the relationship between two variables.
bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for
More informationApplication of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties
Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties Bob Obenchain, Risk Benefit Statistics, August 2015 Our motivation for using a Cut-Point
More informationSupplementary Appendix
Supplementary Appendix This appendix has been provided by the authors to give readers additional information about their work. Supplement to: Weintraub WS, Grau-Sepulveda MV, Weiss JM, et al. Comparative
More informationTRIPLL Webinar: Propensity score methods in chronic pain research
TRIPLL Webinar: Propensity score methods in chronic pain research Felix Thoemmes, PhD Support provided by IES grant Matching Strategies for Observational Studies with Multilevel Data in Educational Research
More informationSTATISTICS & PROBABILITY
STATISTICS & PROBABILITY LAWRENCE HIGH SCHOOL STATISTICS & PROBABILITY CURRICULUM MAP 2015-2016 Quarter 1 Unit 1 Collecting Data and Drawing Conclusions Unit 2 Summarizing Data Quarter 2 Unit 3 Randomness
More informationPropensity Score Methods for Longitudinal Data Analyses: General background, ra>onale and illustra>ons*
Propensity Score Methods for Longitudinal Data Analyses: General background, ra>onale and illustra>ons* Bob Pruzek, University at Albany SUNY Summary Propensity Score Analysis (PSA) was introduced by Rosenbaum
More informationIntroduction ORIGINAL RESEARCH
Cancer Medicine ORIGINAL RESEARCH Open Access The effect of radiation therapy in the treatment of adult soft tissue sarcomas of the extremities: a long- term community- based cancer center experience Jeffrey
More informationInfluence of Lymphadenectomy on Survival for Early-Stage Endometrial Cancer
Influence of Lymphadenectomy on Survival for Early-Stage Endometrial Cancer Jason D. Wright, MD, Yongemei Huang, MD/PhD, William M. Burke, MD, et al. Journal Club March 16, 2016 Blaine Campbell-PGY2 Objective
More informationChapter 13 Estimating the Modified Odds Ratio
Chapter 13 Estimating the Modified Odds Ratio Modified odds ratio vis-à-vis modified mean difference To a large extent, this chapter replicates the content of Chapter 10 (Estimating the modified mean difference),
More informationNature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.
Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze
More informationPredicting Breast Cancer Survival Using Treatment and Patient Factors
Predicting Breast Cancer Survival Using Treatment and Patient Factors William Chen wchen808@stanford.edu Henry Wang hwang9@stanford.edu 1. Introduction Breast cancer is the leading type of cancer in women
More informationCombining machine learning and matching techniques to improve causal inference in program evaluation
bs_bs_banner Journal of Evaluation in Clinical Practice ISSN1365-2753 Combining machine learning and matching techniques to improve causal inference in program evaluation Ariel Linden DrPH 1,2 and Paul
More informationAP STATISTICS 2010 SCORING GUIDELINES
AP STATISTICS 2010 SCORING GUIDELINES Question 1 Intent of Question The primary goals of this question were to assess students ability to (1) apply terminology related to designing experiments; (2) construct
More informationINTRODUCTION TO MACHINE LEARNING. Decision tree learning
INTRODUCTION TO MACHINE LEARNING Decision tree learning Task of classification Automatically assign class to observations with features Observation: vector of features, with a class Automatically assign
More informationSUPPLEMENTAL MATERIAL
1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across
More informationHow should the propensity score be estimated when some confounders are partially observed?
How should the propensity score be estimated when some confounders are partially observed? Clémence Leyrat 1, James Carpenter 1,2, Elizabeth Williamson 1,3, Helen Blake 1 1 Department of Medical statistics,
More informationEPI 200C Final, June 4 th, 2009 This exam includes 24 questions.
Greenland/Arah, Epi 200C Sp 2000 1 of 6 EPI 200C Final, June 4 th, 2009 This exam includes 24 questions. INSTRUCTIONS: Write all answers on the answer sheets supplied; PRINT YOUR NAME and STUDENT ID NUMBER
More informationWe have no disclosures
Pulmonary Artery Pressure Changes Differentially Effect Survival in Lung Transplant Patients with COPD and Pulmonary Hypertension: An Analysis of the UNOS Registry Kathryn L. O Keefe MD, Ahmet Kilic MD,
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 10: Introduction to inference (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 17 What is inference? 2 / 17 Where did our data come from? Recall our sample is: Y, the vector
More informationCopyright 2007 IEEE. Reprinted from 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, April 2007.
Copyright 27 IEEE. Reprinted from 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, April 27. This material is posted here with permission of the IEEE. Such permission of the
More informationCHAPTER ONE CORRELATION
CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to
More informationApplication of Artificial Neural Network-Based Survival Analysis on Two Breast Cancer Datasets
Application of Artificial Neural Network-Based Survival Analysis on Two Breast Cancer Datasets Chih-Lin Chi a, W. Nick Street b, William H. Wolberg c a Health Informatics Program, University of Iowa b
More informationExamining Relationships Least-squares regression. Sections 2.3
Examining Relationships Least-squares regression Sections 2.3 The regression line A regression line describes a one-way linear relationship between variables. An explanatory variable, x, explains variability
More information4/10/2018. SEER EOD and Summary Stage. Overview KCR 2018 SPRING TRAINING. What is SEER EOD? Ambiguous Terminology General Guidelines
SEER EOD and Summary Stage KCR 2018 SPRING TRAINING Overview What is SEER EOD Ambiguous Terminology General Guidelines EOD Primary Tumor EOD Regional Nodes EOD Mets SEER Summary Stage 2018 Site Specific
More informationTITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)
TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) AUTHORS: Tejas Prahlad INTRODUCTION Acute Respiratory Distress Syndrome (ARDS) is a condition
More informationSPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.
SPRING GROVE AREA SCHOOL DISTRICT PLANNED COURSE OVERVIEW Course Title: Basic Introductory Statistics Grade Level(s): 11-12 Units of Credit: 1 Classification: Elective Length of Course: 30 cycles Periods
More informationRegression Discontinuity Analysis
Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income
More informationPropensity score methods to adjust for confounding in assessing treatment effects: bias and precision
ISPUB.COM The Internet Journal of Epidemiology Volume 7 Number 2 Propensity score methods to adjust for confounding in assessing treatment effects: bias and precision Z Wang Abstract There is an increasing
More informationBiostatistics for Med Students. Lecture 1
Biostatistics for Med Students Lecture 1 John J. Chen, Ph.D. Professor & Director of Biostatistics Core UH JABSOM JABSOM MD7 February 14, 2018 Lecture note: http://biostat.jabsom.hawaii.edu/education/training.html
More informationCHL 5225 H Advanced Statistical Methods for Clinical Trials. CHL 5225 H The Language of Clinical Trials
CHL 5225 H Advanced Statistical Methods for Clinical Trials Two sources for course material 1. Electronic blackboard required readings 2. www.andywillan.com/chl5225h code of conduct course outline schedule
More informationEFFECT OF RADIATION THERAPY ON SURVIVAL IN PATIENTS WITH RESECTED MERKEL CELL CARCINOMA: A POPULATION-BASED ANALYSIS JULIAN A. KIM, M.D.
EFFECT OF RADIATION THERAPY ON SURVIVAL IN PATIENTS WITH RESECTED MERKEL CELL CARCINOMA: A POPULATION-BASED ANALYSIS by JULIAN A. KIM, M.D. Submitted in partial fulfillment of the requirements For the
More informationMulti-Stage Stratified Sampling for the Design of Large Scale Biometric Systems
Multi-Stage Stratified Sampling for the Design of Large Scale Biometric Systems Jad Ramadan, Mark Culp, Ken Ryan, Bojan Cukic West Virginia University 1 Problem How to create a set of biometric samples
More informationIAPT: Regression. Regression analyses
Regression analyses IAPT: Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a student project
More informationEarly Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0.
The temporal structure of motor variability is dynamically regulated and predicts individual differences in motor learning ability Howard Wu *, Yohsuke Miyamoto *, Luis Nicolas Gonzales-Castro, Bence P.
More information6. Unusual and Influential Data
Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the
More informationWhat you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu
What you should know before you collect data BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Types and levels of study Descriptive statistics Inferential statistics How to choose a statistical test
More informationUnit 7 Comparisons and Relationships
Unit 7 Comparisons and Relationships Objectives: To understand the distinction between making a comparison and describing a relationship To select appropriate graphical displays for making comparisons
More informationMethods for Addressing Selection Bias in Observational Studies
Methods for Addressing Selection Bias in Observational Studies Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA What is Selection Bias? In the regression
More informationDr. Allen Back. Oct. 7, 2016
Dr. Allen Back Oct. 7, 2016 al Was it Fair? The first draft lottery during the Vietnam War: 366 balls labeled by dates. Mixed up and pulled out in a random order. al Was it Fair? Scatterplot al Was it
More informationLab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups.
Lab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups. Activity 1 Examining Data From Class Background Download
More informationIntroduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015
Introduction to diagnostic accuracy meta-analysis Yemisi Takwoingi October 2015 Learning objectives To appreciate the concept underlying DTA meta-analytic approaches To know the Moses-Littenberg SROC method
More informationDr. Allen Back. Sep. 30, 2016
Dr. Allen Back Sep. 30, 2016 Extrapolation is Dangerous Extrapolation is Dangerous And watch out for confounding variables. e.g.: A strong association between numbers of firemen and amount of damge at
More informationOHDSI Tutorial: Design and implementation of a comparative cohort study in observational healthcare data
OHDSI Tutorial: Design and implementation of a comparative cohort study in observational healthcare data Faculty: Martijn Schuemie (Janssen Research and Development) Marc Suchard (UCLA) Patrick Ryan (Janssen
More informationSurgical Management of Metastatic Colon Cancer: analysis of the Surveillance, Epidemiology and End Results (SEER) database
Surgical Management of Metastatic Colon Cancer: analysis of the Surveillance, Epidemiology and End Results (SEER) database Hadi Khan, MD 1, Adam J. Olszewski, MD 2 and Ponnandai S. Somasundar, MD 1 1 Department
More informationTrends in Cancer Survival in NSW 1980 to 1996
Trends in Cancer Survival in NSW 19 to 1996 Xue Q Yu Dianne O Connell Bruce Armstrong Robert Gibberd Cancer Epidemiology Research Unit Cancer Research and Registers Division The Cancer Council NSW August
More informationWan-Ting, Lin 2017/11/21. JAMA Intern Med. doi: /jamainternmed Published online August 21, 2017.
Development and Validation of a Tool to Identify Patients With Type 2 Diabetes at High Risk of Hypoglycemia-Related Emergency Department or Hospital Use Andrew J. Karter, PhD; E. Margaret Warton, MPH;
More informationWeight Adjustment Methods using Multilevel Propensity Models and Random Forests
Weight Adjustment Methods using Multilevel Propensity Models and Random Forests Ronaldo Iachan 1, Maria Prosviryakova 1, Kurt Peters 2, Lauren Restivo 1 1 ICF International, 530 Gaither Road Suite 500,
More informationStatistics: A Brief Overview Part I. Katherine Shaver, M.S. Biostatistician Carilion Clinic
Statistics: A Brief Overview Part I Katherine Shaver, M.S. Biostatistician Carilion Clinic Statistics: A Brief Overview Course Objectives Upon completion of the course, you will be able to: Distinguish
More informationSurgical resection improves survival in pancreatic cancer patients without vascular invasion- a population based study
Original article Annals of Gastroenterology (2013) 26, 346-352 Surgical resection improves survival in pancreatic cancer patients without vascular invasion- a population based study Subhankar Chakraborty
More informationMunich Cancer Registry
Munich Cancer Registry Incidence and Mortality Selection Matrix Homepage Deutsch ICD-1 C54: Corpus cancer Survival Year of diagnosis 1988-1997 1998-16 Patients 2,146 9,811 Diseases 2,146 9,811 Cases evaluated
More informationBy: Mei-Jie Zhang, Ph.D.
Propensity Scores By: Mei-Jie Zhang, Ph.D. Medical College of Wisconsin, Division of Biostatistics Friday, March 29, 2013 12:00-1:00 pm The Medical College of Wisconsin is accredited by the Accreditation
More informationChapter 4. Navigating. Analysis. Data. through. Exploring Bivariate Data. Navigations Series. Grades 6 8. Important Mathematical Ideas.
Navigations Series Navigating through Analysis Data Grades 6 8 Chapter 4 Exploring Bivariate Data Important Mathematical Ideas Copyright 2009 by the National Council of Teachers of Mathematics, Inc. www.nctm.org.
More informationData Mining Techniques to Predict Survival of Metastatic Breast Cancer Patients
Data Mining Techniques to Predict Survival of Metastatic Breast Cancer Patients Abstract Prognosis for stage IV (metastatic) breast cancer is difficult for clinicians to predict. This study examines the
More informationOptimal full matching for survival outcomes: a method that merits more widespread use
Research Article Received 3 November 2014, Accepted 6 July 2015 Published online 6 August 2015 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.6602 Optimal full matching for survival
More informationDEVELOPMENT OF MULTIPLE PRIMARY CANCERS IN LUNG CANCER PATIENTS: APPALACHIAN VS. NON-APPALACHIAN POPULATIONS OF KENTUCKY
University of Kentucky UKnowledge Theses and Dissertations--Public Health (M.P.H. & Dr.P.H.) College of Public Health 2016 DEVELOPMENT OF MULTIPLE PRIMARY CANCERS IN LUNG CANCER PATIENTS: APPALACHIAN VS.
More informationLONG-TERM SURGICAL OUTCOMES OF 1018 PATIENTS WITH EARLY STAGE NSCLC IN ACOSOG Z0030 (ALLIANCE) TRIAL
LONG-TERM SURGICAL OUTCOMES OF 1018 PATIENTS WITH EARLY STAGE NSCLC IN ACOSOG Z0030 (ALLIANCE) TRIAL Stacey Su, MD; Walter J. Scott, MD; Mark S. Allen, MD; Gail E. Darling, MD; Paul A. Decker, MS; Robert
More informationFinland and Sweden and UK GP-HOSP datasets
Web appendix: Supplementary material Table 1 Specific diagnosis codes used to identify bladder cancer cases in each dataset Finland and Sweden and UK GP-HOSP datasets Netherlands hospital and cancer registry
More informationMunich Cancer Registry
Munich Cancer Registry Incidence and Mortality Selection Matrix Homepage Deutsch ICD- C56: Ovarian cancer Survival Year of diagnosis 1988-1997 1998-16 Patients 1,547 7,273 Diseases 1,547 7,275 Cases evaluated
More informationEvidence Based Medicine
Course Goals Goals 1. Understand basic concepts of evidence based medicine (EBM) and how EBM facilitates optimal patient care. 2. Develop a basic understanding of how clinical research studies are designed
More informationSupplementary appendix
Supplementary appendix This appendix formed part of the original submission and has been peer reviewed. We post it as supplied by the authors. Supplement to: Callegaro D, Miceli R, Bonvalot S, et al. Development
More informationLecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method
Biost 590: Statistical Consulting Statistical Classification of Scientific Studies; Approach to Consulting Lecture Outline Statistical Classification of Scientific Studies Statistical Tasks Approach to
More informationMAKING THE NSQIP PARTICIPANT USE DATA FILE (PUF) WORK FOR YOU
MAKING THE NSQIP PARTICIPANT USE DATA FILE (PUF) WORK FOR YOU Hani Tamim, PhD Clinical Research Institute Department of Internal Medicine American University of Beirut Medical Center Beirut - Lebanon Participant
More informationClassification. Methods Course: Gene Expression Data Analysis -Day Five. Rainer Spang
Classification Methods Course: Gene Expression Data Analysis -Day Five Rainer Spang Ms. Smith DNA Chip of Ms. Smith Expression profile of Ms. Smith Ms. Smith 30.000 properties of Ms. Smith The expression
More informationComparison of discrimination methods for the classification of tumors using gene expression data
Comparison of discrimination methods for the classification of tumors using gene expression data Sandrine Dudoit, Jane Fridlyand 2 and Terry Speed 2,. Mathematical Sciences Research Institute, Berkeley
More informationMeasuring the User Experience
Measuring the User Experience Collecting, Analyzing, and Presenting Usability Metrics Chapter 2 Background Tom Tullis and Bill Albert Morgan Kaufmann, 2008 ISBN 978-0123735584 Introduction Purpose Provide
More informationMatt Laidler, MPH, MA Acute and Communicable Disease Program Oregon Health Authority. SOSUG, April 17, 2014
Matt Laidler, MPH, MA Acute and Communicable Disease Program Oregon Health Authority SOSUG, April 17, 2014 The conditional probability of being assigned to a particular treatment given a vector of observed
More information1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp
The Stata Journal (22) 2, Number 3, pp. 28 289 Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve
More informationLecture Outline Biost 517 Applied Biostatistics I
Lecture Outline Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 2: Statistical Classification of Scientific Questions Types of
More informationReminders/Comments. Thanks for the quick feedback I ll try to put HW up on Saturday and I ll you
Reminders/Comments Thanks for the quick feedback I ll try to put HW up on Saturday and I ll email you Final project will be assigned in the last week of class You ll have that week to do it Participation
More informationSurvival Comparison of Adenosquamous, Squamous Cell, and Adenocarcinoma of the Lung After Lobectomy
Survival Comparison of Adenosquamous, Squamous Cell, and Adenocarcinoma of the Lung After Lobectomy David T. Cooke, MD, Danh V. Nguyen, PhD, Ying Yang, MS, Steven L. Chen, MD, MBA, Cindy Yu, MD, and Royce
More informationAnalysis of gene expression in blood before diagnosis of ovarian cancer
Analysis of gene expression in blood before diagnosis of ovarian cancer Different statistical methods Note no. Authors SAMBA/10/16 Marit Holden and Lars Holden Date March 2016 Norsk Regnesentral Norsk
More informationMeta Analysis. David R Urbach MD MSc Outcomes Research Course December 4, 2014
Meta Analysis David R Urbach MD MSc Outcomes Research Course December 4, 2014 Overview Definitions Identifying studies Appraising studies Quantitative synthesis Presentation of results Examining heterogeneity
More informationCreating prognostic systems for cancer patients: A demonstration using breast cancer
Received: 16 April 2018 Revised: 31 May 2018 DOI: 10.1002/cam4.1629 Accepted: 1 June 2018 ORIGINAL RESEARCH Creating prognostic systems for cancer patients: A demonstration using breast cancer Mathew T.
More informationGENERAL ICD- C61: Prostate cancer Page 2 of 18 ICD- C61: Malignant neoplasm of prostate Period of diagnosis % Relative survival N=46,
Munich Cancer Registry Incidence and Mortality Selection Matrix Homepage Deutsch ICD- C61: Prostate cancer Survival Year of diagnosis 1988-1997 1998-215 Patients 7,45 49,513 Diseases 7,45 49,514 Cases
More informationIndex. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI /
Index A Adjusted Heterogeneity without Overdispersion, 63 Agenda-driven bias, 40 Agenda-Driven Meta-Analyses, 306 307 Alternative Methods for diagnostic meta-analyses, 133 Antihypertensive effect of potassium,
More informationResearch Article Clinical Features and Outcomes Differ between Skeletal and Extraskeletal Osteosarcoma
Sarcoma, Article ID 902620, 8 pages http://dx.doi.org/10.1155/2014/902620 Research Article Clinical Features and Outcomes Differ between and Osteosarcoma Sheila Thampi, 1 Katherine K. Matthay, 1 W. John
More informationMatched Cohort designs.
Matched Cohort designs. Stefan Franzén PhD Lund 2016 10 13 Registercentrum Västra Götaland Purpose: improved health care 25+ 30+ 70+ Registries Employed Papers Statistics IT Project management Registry
More informationUK Liver Transplant Audit
November 2012 UK Liver Transplant Audit In patients who received a Liver Transplant between 1 st March 1994 and 31 st March 2012 ANNUAL REPORT Advisory Group for National Specialised Services Prepared
More informationUsing machine learning to assess covariate balance in matching studies
bs_bs_banner Journal of Evaluation in Clinical Practice ISSN1365-2753 Using machine learning to assess covariate balance in matching studies Ariel Linden, DrPH 1,2 and Paul R. Yarnold, PhD 3 1 President,
More informationDiurnal Pattern of Reaction Time: Statistical analysis
Diurnal Pattern of Reaction Time: Statistical analysis Prepared by: Alison L. Gibbs, PhD, PStat Prepared for: Dr. Principal Investigator of Reaction Time Project January 11, 2015 Summary: This report gives
More information