An Examination of Factors Affecting Incidence and Survival in Respiratory Cancers. Katie Frank Roberto Perez Mentor: Dr. Kate Cowles.

Similar documents
Asthma and Particulate Air Pollution: A Spatial Analysis

An Overview of Survival Statistics in SEER*Stat

Survival Prediction Models for Estimating the Benefit of Post-Operative Radiation Therapy for Gallbladder Cancer and Lung Cancer

Trends in Lung Cancer Morbidity and Mortality

County-Level Small Area Estimation using the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS)

Testing Statistical Models to Improve Screening of Lung Cancer

Population Health Observatory

NEZ PERCE COUNTY CANCER PROFILE

KOOTENAI COUNTY CANCER PROFILE

BOUNDARY COUNTY CANCER PROFILE

Supplementary Materials

ADAMS COUNTY CANCER PROFILE

BONNER COUNTY CANCER PROFILE

BINGHAM COUNTY CANCER PROFILE

CANCER FACTS & FIGURES For African Americans

NEZ PERCE COUNTY CANCER PROFILE

KOOTENAI COUNTY CANCER PROFILE

TWIN FALLS COUNTY CANCER PROFILE

JEROME COUNTY CANCER PROFILE

BUTTE COUNTY CANCER PROFILE

LINCOLN COUNTY CANCER PROFILE

CANYON COUNTY CANCER PROFILE

Chapter II: Overview

Cancer A Superficial Introduction

Multnomah County: Leading Causes of Death

Classification of Patients Treated for Infertility Using the IVF Method

Co-Variation in Sexual and Non-Sexual Risk Behaviors Over Time Among U.S. High School Students:

Journal of Physics: Conference Series. Related content PAPER OPEN ACCESS

Ethnic Disparities in the Treatment of Stage I Non-small Cell Lung Cancer. Juan P. Wisnivesky, MD, MPH, Thomas McGinn, MD, MPH, Claudia Henschke, PhD,

Computer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California

Supplementary Material for

What is the Impact of Cancer on African Americans in Indiana? Average number of cases per year. Rate per 100,000. Rate per 100,000 people*

Current HPV-Related Cancer Rates New Jersey State Cancer Registry

Stage-Specific Predictive Models for Cancer Survivability

A Closer Look at Leading Causes of Death in Guilford County

Risk Factors in African-American Women. Michele L. Cote, PhD Associate Professor Wayne State t University

Cancer in Ontario. 1 in 2. Ontarians will develop cancer in their lifetime. 1 in 4. Ontarians will die from cancer

Information Services Division NHS National Services Scotland

2018 Community Health Assessment

Predicting Breast Cancer Survival Using Treatment and Patient Factors

/RFDO )LQGLQJV. Cancers All Types. Cancer is the second leading cause of death in Contra Costa.

A post-psa Update on Trends in Prostate Cancer Incidence. Ann Hamilton and Myles Cockburn Keck School of Medicine, USC, Los Angeles

Cancer Statistics, 2008

Walworth County Health Data Report. A summary of secondary data sources

Application of Artificial Neural Network-Based Survival Analysis on Two Breast Cancer Datasets

Cancer of the Breast (Female) - Cancer Stat Facts

Predictive Models for Making Patient Screening Decisions

Key Words. Cancer statistics Incidence Lifetime risk Multiple primaries Survival SEER

CANCER IN NSW ABORIGINAL PEOPLES. Incidence, mortality and survival September 2012

Updates on the Conflict of Postoperative Radiotherapy Impact on Survival of Young Women with Cancer Breast: A Retrospective Cohort Study

Annual Report to the Nation on the Status of Cancer, , Featuring Survival Questions and Answers

Cancer in New Mexico 2017

Confluence: Conformity Influence in Large Social Networks

Percent Surviving 5 Years 89.4%

Survival in sinonasal and middle ear malignancies: a population-based study using the SEER database

*

Cancer in Utah: An Overview of Cancer Incidence and Mortality from

Application of EM Algorithm to Mixture Cure Model for Grouped Relative Survival Data

Logistic regression. Department of Statistics, University of South Carolina. Stat 205: Elementary Statistics for the Biological and Life Sciences

Data Mining Techniques to Predict Survival of Metastatic Breast Cancer Patients

Weight Adjustment Methods using Multilevel Propensity Models and Random Forests

Information Services Division NHS National Services Scotland

Mortality Slide Series. National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention Division of HIV/AIDS Prevention

SMOKING AND CANCER RISK

Evaluating Classifiers for Disease Gene Discovery

BREAST CANCER IN TARRANT COUNTY: Screening, Incidence, Mortality, and Stage at Diagnosis

School of Population and Public Health SPPH 503 Epidemiologic methods II January to April 2019

Incidence of Cancers Associated with Modifiable Risk Factors and Late Stage Diagnoses for Cancers Amenable to Screening Idaho

Early Age and Late Stage Diagnosis of Colorectal Cancer Among American Indian Residents of Montana,

8/24/2011. Study Goal. Study Design. Patient Attributes Influencing Pain and Pain Management in Postoperative Total Knee Arthroplasty Patients

Errata Corrected 17 January, 2017

CASINO REVENUE AND AMERICAN INDIAN HEALTH

Epidemiology in Texas 2006 Annual Report. Cancer

A New Measure to Assess the Completeness of Case Ascertainment

Supplementary Appendix

Meier Hsu, Ann Zauber, Mithat Gönen, Monica Bertagnolli. Memorial-Sloan Kettering Cancer Center. May 18, 2011

Cancer Facts & Figures for African Americans

Patient Safety in the Age of Big Data

Cancer in Maine: Using Data to Direct Actions 2018 Challenge Cancer Conference May 1, 2018

Sensitivity, specicity, ROC

Epithelial-myoepithelial carcinoma: a population-based survival analysis

Creating prognostic systems for cancer patients: A demonstration using breast cancer

National Cancer Institute

Burden of Cancer in California

Estimated Minnesota Cancer Prevalence, January 1, MCSS Epidemiology Report 04:2. April 2004

Incidence of Cancers Associated with Modifiable Risk Factors and Late Stage Diagnoses for Cancers Amenable to Screening Idaho

Small-area estimation of mental illness prevalence for schools

Cancer in New Mexico 2014

Machine Learning for Survival Analysis: A Case Study on Recurrence of Prostate Cancer

Overview of Gynecologic Cancers in New Jersey

Changing Patient Base. A Knowledge to Practice Program

Mitochondrial DNA variation associated with gait speed decline among older HIVinfected non-hispanic white males

What Is Small Cell Lung Cancer?

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Key Words. SEER Cancer Survival Incidence Mortality Prevalence

Cigarette Smoking Among Adolescents and Adults in 24 U.S. States and the District of Columbia in 1997 What Explains the Relationship?

Up-to-date survival estimates from prognostic models using temporal recalibration

Biostatistics II

Tuberculosis Epidemiology

Incident cancers among participants of the Alaska EARTH study, and associations with known cancer risk factors.

Transcription:

An Examination of Factors Affecting Incidence and Survival in Respiratory Cancers Katie Frank Roberto Perez Mentor: Dr. Kate Cowles ISIB 2015 University of Iowa College of Public Health July 16th, 2015 Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 1/28

Introduction Research Goals Predict which demographic variables distinguish between types of respiratory cancers. Determine which variables affect survival of patients with various types of respiratory cancers. Identify what characteristics of county populations affect respiratory cancer incidence rates. Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 2/28

What Are Respiratory Cancers? 1 Cancers that affect any part of the respiratory system. 2 In the United States and worldwide, respiratory cancers are a leading cause of cancer-related deaths for both men and women. 3 According to the American Cancer Society, lung cancer in particular accounts for approximately 27% of all cancer deaths. More deaths than prostate, breast, and colorectal cancers combined. 4 It is commonly known that smoking affects the risk of respiratory cancers, such as lung cancer. Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 3/28

Data Sets 1 The Surveillance, Epidemiology, and End Results (SEER) data set is a compilation of cancer data on all incident cancer cases from 1973 to 2012 in certain regions of the United States. Includes data on approximately 625,000 respiratory cancer cases. Contains 131 variables. Iowa is a SEER State. 2 The National Cancer Institute s Small Area Estimates for Cancer Risk Factors and Screening Behavior data set contains estimates on smoking prevalence by county for the years 1997 to 1999 and 2000 to 2003. Includes data on current smokers and ever smokers for both sexes. Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 4/28

Respiratory System Nose, Nasal Cavity, and Middle Ear Larynx Lung and Bronchus Trachea, Mediastinum, and Other Respiratory Organs Pleura Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 5/28

SEER Incidence of Respiratory Cancers SEER Incidence of Respiratory Cancers 572735 Reported Cases 0e+00 2e+05 4e+05 42941 6716 316 2289 Larynx Lung Nose Pleura Trachea Respiratory Cancers Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 6/28

SEER Incidence of Respiratory Cancers by Sex Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 7/28

SEER Incidence of Respiratory Cancers by Race Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 8/28

Machine Learning Machine learning is a subfield of computer science that explores the construction and study of algorithms that can learn from and make predictions on data. We used classification models to predict cancer type given a variety of demographic variables. Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 9/28

Machine Learning We used the Naive Bayes, Neural Networks, Multinomial Logistic Regression and Support Vector Machines machine learning algorithms to predict cancer type given certain demographic variables. The variables used were age, race, hispanic origin, and region. To test the reliability of our models we divided our data in a training subset ( 80% of the data) and a testing subset ( 20% of the data). Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 10/28

Machine Learning (With Lung Cancer) Naive Bayes Multinomial Logistic Regression Nose Larynx Lung Pleura Trachea Nose 1 0 0 0 1 Larynx 11 1 7 0 11 Lung 1311 7659 115475 71 372 Pleura 0 0 0 0 0 Trachea12 1 12 0 55 Percent Correct: 92.4256% Neural Networks Nose Larynx Lung Pleura Trachea Nose 7 0 0 0 8 Larynx 0 0 0 0 0 Lung 1319 7661 115490 71 388 Pleura 0 0 0 0 0 Trachea9 0 4 0 43 Percent Correct: 92.432% Support Vector Machines Nose Larynx Lung Pleura Trachea Lung 1335 7661 115494 71 439 Percent Correct: 92.3952% Nose Larynx Lung Pleura Trachea Nose 0 0 0 0 0 Larynx 0 0 0 0 0 Lung 1335 7661 115494 71 439 Pleura 0 0 0 0 0 Trachea0 0 0 0 0 Percent Correct: 92.3952% Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 11/28

Machine Learning (Without Lung Cancer) Naive Bayes Multinomial Logistic Regression Nose Larynx Pleura Trachea Nose 8 9 0 9 Larynx 1349 8364 79 337 Pleura 0 0 0 0 Trachea 80 38 6 140 Percent Correct: 81.6969% Neural Networks Nose Larynx Pleura Trachea Nose 14 2 0 15 Larynx 1360 8387 83 346 Pleura 0 0 0 0 Trachea 63 22 2 125 Percent Correct: 81.83127% Support Vector Machines Nose Larynx Pleura Trachea Nose 41 32 4 31 Larynx 1322 8357 79 317 Trachea 74 22 2 138 Percent Correct: 81.92725% Nose Larynx Pleura Trachea Nose 11 1 0 13 Larynx 1367 8389 83 352 Pleura 0 0 0 0 Trachea 59 21 2 121 Percent Correct: 81.78328% Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 12/28

Ecological Analysis with Poisson Regression Poisson regression is a form of regression analysis that can be used to model count data. Let x R n is a vector of independent variables then: log(e(y x)) = α + β x In our case Y represents the incidence of a certain cancer. Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 13/28

Ecological Tables Table: Lung and Bronchus Cancer (Current Smoking 1997-1999) Estimate Std. Error z value Pr(> z ) (Intercept) -4.4024 0.0978-45.00 0.0000 40< 5.0801 0.0858 59.19 0.0000 Black 0.0429 0.0270 1.59 0.1121 Other -0.7491 0.0476-15.73 0.0000 Female -0.0130 0.0657-0.20 0.8426 Current1997_1999 0.0317 0.0020 15.77 0.0000 Female:Current1997_1999-0.0112 0.0029-3.80 0.0001 Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 14/28

Ecological Tables Table: Lung and Bronchus Cancer (Lifetime Smoking 1997-1999) Estimate Std. Error t value Pr(> t ) (Intercept) -5.1354 0.1193-43.04 0.0000 40< 5.0676 0.0858 59.04 0.0000 Black 0.0854 0.0267 3.20 0.0014 Other -0.7075 0.0477-14.83 0.0000 Female 0.1650 0.1080 1.53 0.1267 Lifetime1997_1999 0.0286 0.0016 17.91 0.0000 Female:Lifetime1997_1999-0.0048 0.0023-2.10 0.0361 Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 15/28

Ecological Tables Table: Larynx Cancer (Current Smoking 1997-1999) Estimate Std. Error t value Pr(> t ) (Intercept) -4.0562 0.2969-13.66 0.0000 40< 4.4987 0.2522 17.84 0.0000 Black 0.4022 0.0947 4.25 0.0000 Other -0.8298 0.2034-4.08 0.0000 Female -0.7595 0.2927-2.60 0.0095 Current1997_1999 0.0394 0.0067 5.85 0.0000 Female:Current1997_1999-0.0207 0.0136-1.52 0.1277 Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 16/28

Ecological Tables Table: Larynx Cancer (Lifetime Smoking 1997-1999) Estimate Std. Error t value Pr(> t ) (Intercept) -5.0940 0.3773-13.50 0.0000 40< 4.4846 0.2522 17.78 0.0000 Black 0.4640 0.0927 5.01 0.0000 Other -0.7682 0.2038-3.77 0.0002 Female -0.2454 0.4698-0.52 0.6014 Lifetime1997_1999 0.0378 0.0054 7.03 0.0000 Female:Lifetime1997_1999-0.0157 0.0104-1.51 0.1316 Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 17/28

Ecological Analysis Summary Blacks are at a higher risk than whites for most respiratory cancers while Others (American Indians/AK Natives and Asian/Pacific Islanders) are at lower risk when other variables are fixed. (Both Cases) Females are at a lower risk than males for most respiratory cancers when other variables are fixed. Smokers, both Current and Lifetime, are at a higher risk for respiratory cancers. Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 18/28

Survival Analysis 1 Performed survival analysis on SEER respiratory cancer data by using the Kaplan-Meier method. 2 Time-to-event: The time from the beginning of the observation period to death or withdrawal from the study. 3 Variables Age at Diagnosis Sex Race Tumor Grade Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 19/28

Kaplan-Meier Survival Analysis Used to estimate a population survival curve from a sample. Allows computation of survival over time with censored data. Let S(t) be the survival probability at any given time for a member from a given population. n i d i Ŝ(t) = Π ti <t n i Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 20/28

Age at Diagnosis Survival Analysis Lung and Bronchus Cancer Survival Distributions by Age Probability of Survival 0.0 0.2 0.4 0.6 0.8 1.0 Age 00 Ages 1 4 Ages 5 9 Ages 10 14 Ages 15 19 Ages 20 24 Ages 25 29 Ages 30 34 Ages 35 39 Ages 40 44 Ages 45 49 Ages 50 54 Ages 55 59 Ages 60 64 Ages 65 69 Ages 70 74 Ages 75 79 Ages 80 84 Ages 85+ 0 100 200 300 400 Survival Time (Months) Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 21/28

Sex Survival Analysis Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 22/28

Race Survival Analysis Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 23/28

Tumor Grade Survival Analysis Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 24/28

Survival Analysis Summary Elderly people have the worst survival prognosis. Survival rates are lower for males than females for most respiratory cancers. For all respiratory cancers, American Indians/AK Natives and Asian/Pacific Islanders have the best survival prognosis, while Blacks have the worst survival prognosis. A higher tumor grade corresponds to a lower probability of survival. Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 25/28

Conclusions Current and Lifetime smoking are highly significant predictors for respiratory cancers. Age and Tumor Grade are good predictors for survival. Some respiratory cancer patients are able to live 30+ years with their cancer. Smoking data is a bit dated, would ve been interesting to look at data from this decade. Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 26/28

References Respiratory System Organs [Photograph]. Retrieved from http://www.wickersham.us/anne/respiration.htm Small Area Estimates for Cancer Risk Factors Screening Behaviors. National Cancer Institute, DCCPS, Statistical Methodology Applications Branch, released May 2010 (sae.cancer.gov). Underlying data provided by Behavioral Risk Factor Surveillance System (http://www.cdc.gov/brfss/) and National Health Interview Survey (http://www.cdc.gov/nchs/nhis.htm). Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) Research Data (1973-2012), National Cancer Institute, DCCPS, Surveillance Research Program, Surveillance Systems Branch, released April 2015, based on the November 2014 submission. The Respiratory System. (2012, July 7). Retrieved July 15, 2015, from http://www.nhlbi.nih.gov/health/health-topics/topics/hlw/system What are the key statistics about lung cancer? (2015, March 4). Retrieved July 15, 2015, from http://www.cancer.org/cancer/lungcancer-nonsmallcell/detailedguide/non-small-cell-lung-cancer-key-statistics ISIB Program sponsored by the National Heart Lung and Blood Institute (NHLBI) T15 HL097622 Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 27/28

Citation for R and R Packages R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.r-project.org/. David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2014). e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.6-4. http://cran.r-project.org/package=e1071 Ian Marschner (2014). glm2: Fitting Generalized Linear Models. R package version 1.1.2. http://cran.r-project.org/package=glm2 Ingo Feinerer and Kurt Hornik (2015). tm: Text Mining Package. R package version 0.6-2. http://cran.r-project.org/package=tm Therneau T (2014). A Package for Survival Analysis in S. R package version 2.37-7. http://cran.r-project.org/package=survival Venables, W. N. Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0 Katie Frank Roberto Perez Incidence and Survival in Respiratory Cancers 28/28