Issues for analyzing competing-risks data with missing or misclassification in causes. Dr. Ronny Westerman

Similar documents
Estimating and Modelling the Proportion Cured of Disease in Population Based Cancer Studies

Up-to-date survival estimates from prognostic models using temporal recalibration

Quantifying cancer patient survival; extensions and applications of cure models and life expectancy estimation

Estimating and comparing cancer progression risks under varying surveillance protocols: moving beyond the Tower of Babel

Application of EM Algorithm to Mixture Cure Model for Grouped Relative Survival Data

Assessment of lead-time bias in estimates of relative survival for breast cancer

Biological Cure. Margaret R. Stedman, Ph.D. MPH Angela B. Mariotto, Ph.D.

Overview. All-cause mortality for males with colon cancer and Finnish population. Relative survival

Statistical Models for Censored Point Processes with Cure Rates

Case Studies in Bayesian Augmented Control Design. Nathan Enas Ji Lin Eli Lilly and Company

ESTIMATING OVERDIAGNOSIS FROM TRIALS AND POPULATIONS OVERCOMING CHALLENGES, AVOIDING MISTAKES

Original Article INTRODUCTION. Alireza Abadi, Farzaneh Amanpour 1, Chris Bajdik 2, Parvin Yavari 3

Sarwar Islam Mozumder 1, Mark J Rutherford 1 & Paul C Lambert 1, Stata London Users Group Meeting

Breast cancer survival, competing risks and mixture cure model: a Bayesian analysis

Statistical Models for Bias and Overdiagnosis in Prostate Cancer Screening

Estimating and modelling relative survival

Part [1.0] Introduction to Development and Evaluation of Dynamic Predictions

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers

LOGO. Statistical Modeling of Breast and Lung Cancers. Cancer Research Team. Department of Mathematics and Statistics University of South Florida

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016

Multi-state survival analysis in Stata

Ethnic Disparities in the Treatment of Stage I Non-small Cell Lung Cancer. Juan P. Wisnivesky, MD, MPH, Thomas McGinn, MD, MPH, Claudia Henschke, PhD,

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Jonathan D. Sugimoto, PhD Lecture Website:

Statistical Methods for the Evaluation of Treatment Effectiveness in the Presence of Competing Risks

THE ISSUE OF STAGE AT DIAGNOSIS

How to Choose the Wrong Model. Scott L. Zeger Department of Biostatistics Johns Hopkins Bloomberg School

Statistical Tolerance Regions: Theory, Applications and Computation

Non-homogenous Poisson Process for Evaluating Stage I & II Ductal Breast Cancer Treatment

Assessing methods for dealing with treatment switching in clinical trials: A follow-up simulation study

The improved cure fraction for esophageal cancer in Linzhou city

Statistical methods for the meta-analysis of full ROC curves

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

Methods for adjusting survival estimates in the presence of treatment crossover a simulation study

Estimating heritability for cause specific mortality based on twin studies

ethnicity recording in primary care

Long-term survival rate of stage I-III small cell lung cancer patients in the SEER database - application of the lognormal model

Bayesians methods in system identification: equivalences, differences, and misunderstandings

Douglas A. Thoroughman, 1,2 Deborah Frederickson, 3 H. Dan Cameron, 4 Laura K. Shelby, 2,5 and James E. Cheek 2

Application of Multivariate and Bivariate Normal Distributions to Estimate Duration of Diabetes

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

Supplement for: CD4 cell dynamics in untreated HIV-1 infection: overall rates, and effects of age, viral load, gender and calendar time.

Computer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California

What to do with missing data in clinical registry analysis?

MEASURING THE UNDIAGNOSED FRACTION:

Estimands, Missing Data and Sensitivity Analysis: some overview remarks. Roderick Little

Performance Assessment for Radiologists Interpreting Screening Mammography

Survival Prediction Models for Estimating the Benefit of Post-Operative Radiation Therapy for Gallbladder Cancer and Lung Cancer

M. J. Rutherford 1,*, T. M-L. Andersson 2, H. Møller 3, P.C. Lambert 1,2.

Benchmark Dose Modeling Cancer Models. Allen Davis, MSPH Jeff Gift, Ph.D. Jay Zhao, Ph.D. National Center for Environmental Assessment, U.S.

Analysis of Vaccine Effects on Post-Infection Endpoints Biostat 578A Lecture 3

Accommodating informative dropout and death: a joint modelling approach for longitudinal and semicompeting risks data

Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference

~.~i. PERFORMING ORGANIZAT in REPORT NUMBER MADISON WI AESSRTR* O

Spatiotemporal models for disease incidence data: a case study

Bayesian Estimations from the Two-Parameter Bathtub- Shaped Lifetime Distribution Based on Record Values

Sociology Exam 3 Answer Key [Draft] May 9, 201 3

Parvovirus B19 in five European countries: disentangling the sociological and microbiological mechanisms underlying infectious disease transmission

Using dynamic prediction to inform the optimal intervention time for an abdominal aortic aneurysm screening programme

Analysis of Subpopulation Emergence in Bacterial Cultures, a case Study for Model Based Clustering or Finite Mixture Models

Comparison of methods and programs for calculating health life expectancies.

Missing Data and Imputation

A Comparative Study of Survival approaches for Breast Cancer Patients

Downloaded from irje.tums.ac.ir at 19:49 IRST on Monday January 14th 2019

Statistical methods for the meta-analysis of full ROC curves

Appendix NEW METHOD FOR ESTIMATING HIV INCIDENCE AND TIME FROM INFECTION TO DIAGNOSIS USING HIV SURVEILLANCE DATA: RESULTS FOR FRANCE

Introduction to Survival Analysis Procedures (Chapter)

Estimation of Area under the ROC Curve Using Exponential and Weibull Distributions

Reducing Decision Errors in the Paired Comparison of the Diagnostic Accuracy of Continuous Screening Tests

Abstract: Heart failure research suggests that multiple biomarkers could be combined

Propensity scores and instrumental variables to control for confounding. ISPE mid-year meeting München, 2013 Rolf H.H. Groenwold, MD, PhD

16:35 17:20 Alexander Luedtke (Fred Hutchinson Cancer Research Center)

Individual Participant Data (IPD) Meta-analysis of prediction modelling studies

Bayesian Adjustments for Misclassified Data. Lawrence Joseph

Motivation Empirical models Data and methodology Results Discussion. University of York. University of York

Bayesian Adjustments for Misclassified Data. Lawrence Joseph

A bivariate parametric long-range dependent time series model with general phase

HIV risk associated with injection drug use in Houston, Texas 2009: A Latent Class Analysis

BREAST CANCER IN TARRANT COUNTY: Screening, Incidence, Mortality, and Stage at Diagnosis

Bayesian Hierarchical Models for Fitting Dose-Response Relationships

Comparison Cure Rate Models by DIC Criteria in Breast Cancer Data

OPERATIONAL RISK WITH EXCEL AND VBA

A Bayesian Perspective on Unmeasured Confounding in Large Administrative Databases

National Cancer Institute

ASSESSMENT OF LEAD-TIME BIAS IN ESTIMATES OF RELATIVE SURVIVAL FOR BREAST CANCER

Analysis of left-censored multiplex immunoassay data: A unified approach

Nation nal Cancer Institute. Prevalence Projections: The US Experience

Statistical approaches to antibody data analysis for populations on the path of malaria elimination

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

Friday, September 9, :00-11:00 am Warwick Evans Conference Room, Building D Refreshments will be provided at 9:45am

Downloaded from:

A Predictive Chronological Model of Multiple Clinical Observations T R A V I S G O O D W I N A N D S A N D A M. H A R A B A G I U

An overview and some recent advances in statistical methods for population-based cancer survival analysis

Vessel wall differences between middle cerebral artery and basilar artery. plaques on magnetic resonance imaging

Prediction and Inference under Competing Risks in High Dimension - An EHR Demonstration Project for Prostate Cancer

Tuberculosis Epidemiology

How to Choose the Wrong Model. Scott L. Zeger Department of Biostatistics Johns Hopkins Bloomberg School

Robust Outbreak Surveillance

CLINICAL ASSISTED REPRODUCTION

Measuring cancer survival in populations: relative survival vs cancer-specific survival

Transcription:

Issues for analyzing competing-risks data with missing or misclassification in causes Dr. Ronny Westerman Institute of Medical Sociology and Social Medicine Medical School & University Hospital July 27, 2012

Outline Introduction/Background Data Methods Discussion Perspectives

Introduction Limited Failure Models (Immortal) Competing Risks Missing and Misclassification of causes (Masked causes)

Limited Failure Model (Cure Survival Models) Examples: Infant Mortality Curability of cancer and decreasing mortality risk since diagnosis of cancer None defective units are not expected to fail from risk

The Competing Risk Problem: Each subject being exposed to many competing risks, but only one will be caused the failure Subject ist still right-censored if it do not fail within the follow-up duration

Competing Risks Non-parametic, semi-parametric and full-parametric models Cause-spezific hazard function Problem: Assumption of independence through cause often violated? Failure Time for all risks are operatively the same, in that case, all risks being removed except the risk under consideration

Missclassification and Missing of causes Cause of event for some of units or individuals not exactly identified or recored Partial masking: Cause is narrowed down but not exactely identified Reason for missclassification: documentation containing the information needed for attributing the cause of failure may be not collected, or the cause of diseases for some patients may be difficult to determine

Difficulties for determination: (aetiological problems) Example: Cardioembolic stroke (Leary and Caplan, 2008) Cardioembolic stroke occurs when the heart pumps unwanted materials into the brain circulation, resulting in the occlusion of a brain blood vessel and damage to the brain tissue. CS diagnosed in 3-8% stroke patients, but in various current stroke registries, approximately 10-20% patients with CS have not maximal symptoms at the onset of their stroke Exclusion

Missclassification: Example: Breast Cancer TNM Staging vs. I-IV Staging Stage migration: improved detection of illness leads to movement of people from the set of healthy people to the set of unhealthy people Will Rogers phenomen: When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states

Methods for treating masked cause data 1) Mutiple Imputations Should be used, when Baseline are not proportional Works good in case of Missing at Random (for cause) Problem: High-Mortality-Risks, Multiple-Specific and High- Potential-Risks often not Missing at Random False classification or misinterpretation of cause-specific mortality

Methods for treating masked cause data 2) Second Stage Analysis Models with non-proportional cause-specific hazard 3) EM for grouped Survival data Bayesian Methods Assumptions for masked causes: Right censoring, if causes not exactly identified Masking probabilty is constant over time

SEER Cancer Statistic Data Base National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch (released April 2012) Incidence by Race, Gender and Age (different periods of time) Cause-Specific Mortality including all specific cancer SEER public use dataset on survival of breast cancer patients from 1992-2009 (n=69,990 in Situ)

Leading Cause of Death in the U.S. 1975 vs. 2009 Source: US Moratlity Files, National Center of Health Statistics, Centers of Disease Control and Prevention

US Death Rates, 1975-2009 Heart Disease compared to Neoplasms, by age at death Source: US Moratlity Files, National Center of Health Statistics, Centers of Disease Control and Prevention Rates are per 100,000 and age-adjusted to the 2000 US Std Population (19 age groups - Census P25-1103).

Source: US Moratlity Files, National Center of Health Statistics, Centers of Disease Control and Prevention

5-year Conditional Relative Survival for Cancer of female Breast SEER, 2012

And now, what s the problem? Preliminary Analysis with SEER- DATA (Sen et al. 2010) Over-sampling the masked cases 46 % of the women died during follow-up Specific mortality related to breast cancer, other cancer or non-cancer related causes for 56 % the exact cause of death was known for 35 % partial information available 30 % with missing cause of death: false classification (breast cancer to other or multiples cancer) 65 % missing causes were complety masked

How do deal with masked causes? Motivation to use Two-Component-Model for masked causes Risks are latent: no specific information about the cause of the component failure Only some individuals may susceptible to the event of interest (curability or the recessive risk for the disease)

Introduction Background Data and Methods Results Discussion Referenc

Useful Stata commands for cure models: lncure, spsurv, and cureregr (Lambert, 2007) the advances of cureregr: fits both mixture and nonmixture cure models parametric distributions: exponential, Weibull, lognormal, and gamma parametric distributions available Optional: strsmix allowing more flexible parametric distributions

Data Analysis with SEER Breast Cancer Data Survival of breast cancer patients from 1992-2009 (n=69,990 in Situ) cause of death: breast cancer and other causes other causes as competing risks We used a non-mixture cure fraction model with Weibull and Exponential specification

Results from Data Analysis (Estimates for the Long-Term Survival Function) Table 1: MLEs and the standard errors for SEER Breast Cancer Data Distribution λ φ p Weibull 0.0047 (0.00021) 0.6732 (0.1428) 0.28057 (0.1016) Exponential 0.0041 (0.0009) 0.3032 (0.0962) Λ-scale parameter, φ- shape parameter, p- long-term parameter Table 2: Likelihood, AIC and BIC values Model l(.) AIC BIC Weibull -46.12845 98.27691 103.7626 Exponential -46.20798 96.43586 100.1032

Results no evidence that Weibull provides a better fitting than the Exponential for Seer Breast Cancer Data at 5% significance corrobate the empirical Kaplan-Meier Survival

Thrills and Tears with Cure Survival Models Thrills: less assumptions and minor computation problems Tears: to overcome the naïve assumption for infinite failure time of the nonsusceptiple units

Limitations for parametric hazard functions The complexity of the baseline hazard function (Crowther and Lambert, 2011) beyond standard and sometimes biologically and implausible shapes a turning point in the hazard function is observed 2-component mixture distribution e.g. Weibull-Weibull-distribution S 0 ( t) = p exp( λ1t γ 1 ) + (1 p)exp( λ 2t γ 2 ) other distribution families also available

Options in STATA STPM2: Stata module to estimate flexible parametric survival models (Royston-Parmar models) (updated by Lampert, 2012) STPM2 also used with single- or multiple- record (more generalized) STMIX: 2-component parametric mixture survival models (Crowther and Lambert, 2011) distribution choices includes Weibull-Weibull or Weibull-exponential STMIX can be used with single- or multiple-record

References Craiu and Lee (2005): Model Selection for the Competing-Risks Model with and without masking. Technometrics, Vol.25, No.4, 457-467 Leary MC, Caplan LR (2008): Cardioembolic stroke: An updated on etiology, diagnosis and management. Annuals Indian Academic Neurology, 11, 52-63 Lu and Liang (2008): Analysis of competing risks data with missing cause of failure under additive hazard model. Statistica Sinica 18, 219-234. Crowther and Lampert (2011): Simulating complex survival data. Stata Nordic and Baltic Users Group Meeting National Cancer Institute DCCPS Surveillance Research Programme, Surveillance, Epidemiology and End Results (SEER) Programme (www.cancer.seer.gov) Research Data (1973-2009) (Released April 2012) Louzda et al. (2012): The Long-Term Bivariate Survival FGM Copula Model: An Application to a Brazilian HIV Data. Journal of Data Science 10, 515-535 Roman et al. (2012): A New Long-Term Survival Distribution for Cancer Data. Journal of Data Science 10, 242-258 Sen et al. (2010): A Bayesian approach to competing risks analysis with masked cause of death. Statistics in Medicine, 29, 1681-1695

Thank you for your attention

Mon, 7/30/2012, 8:30 AM - 10:20 AM Biometrics Section CC-Room 24C Advances in Modeling Competing Risks Contributed Papers