Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers

Similar documents
Joint Spatio-Temporal Modeling of Low Incidence Cancers Sharing Common Risk Factors

Application of EM Algorithm to Mixture Cure Model for Grouped Relative Survival Data

A Bayesian Measurement Model of Political Support for Endorsement Experiments, with Application to the Militant Groups in Pakistan

Prediction and Inference under Competing Risks in High Dimension - An EHR Demonstration Project for Prostate Cancer

14. Linear Mixed-Effects Models for Data from Split-Plot Experiments

BAYESIAN ANALYSIS OF CANCER MORTALITY RATES FROM DIFFERENT TYPES AND THEIR RELATIVE OCCURENCES

Modelling prognostic capabilities of tumor size: application to colorectal cancer

Data Analysis Using Regression and Multilevel/Hierarchical Models

arxiv: v3 [stat.ap] 31 Jul 2017

Bayesian hierarchical joint modeling of repeatedly measured continuous and ordinal markers of disease severity: Application to Ugandan diabetes data

Outlier Analysis. Lijun Zhang

SUPPLEMENTARY MATERIAL. Impact of Vaccination on 14 High-Risk HPV type infections: A Mathematical Modelling Approach

SCHOOL OF MATHEMATICS AND STATISTICS

Bayesian Modeling of Multivariate Spatial Binary Data with Application to Dental Caries

Bayesian Joint Modelling of Longitudinal and Survival Data of HIV/AIDS Patients: A Case Study at Bale Robe General Hospital, Ethiopia

Combining Risks from Several Tumors Using Markov Chain Monte Carlo

Small-area estimation of mental illness prevalence for schools

Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions

Cancer survivorship and labor market attachments: Evidence from MEPS data

Bayesian Estimation of a Meta-analysis model using Gibbs sampler

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Bayesian Nonparametric Methods for Precision Medicine

CDRI Cancer Disparities Geocoding Project. November 29, 2006 Chris Johnson, CDRI

A bivariate parametric long-range dependent time series model with general phase

A Bayesian approach to sample size determination for studies designed to evaluate continuous medical tests

with Stata Bayesian Analysis John Thompson University of Leicester A Stata Press Publication StataCorp LP College Station, Texas

2014 Modern Modeling Methods (M 3 ) Conference, UCONN

Bayesian Latent Subgroup Design for Basket Trials

Hierarchical Bayesian Methods for Estimation of Parameters in a Longitudinal HIV Dynamic System

Design for Targeted Therapies: Statistical Considerations

Practical Bayesian Design and Analysis for Drug and Device Clinical Trials

Knowledge Discovery and Data Mining I

Bayesian Modelling of the Consistency of Symptoms Reported During Hypoglycaemia for Individual Patients

Jonathan D. Sugimoto, PhD Lecture Website:

Bayesian Bi-Cluster Change-Point Model for Exploring Functional Brain Dynamics

Faculty of Sciences School for Information Technology Master of Statistics

Accommodating informative dropout and death: a joint modelling approach for longitudinal and semicompeting risks data

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives

Spatio-temporal modelling and probabilistic forecasting of infectious disease counts

TRIPODS Workshop: Models & Machine Learning for Causal I. & Decision Making

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.

Using mixture priors for robust inference: application in Bayesian dose escalation trials

HIDDEN MARKOV MODELS FOR ALCOHOLISM TREATMENT TRIAL DATA

arxiv: v2 [stat.ap] 11 Apr 2018

Cancer in Rural Illinois, Incidence, Mortality, Staging, and Access to Care. April 2014

Comparison of discrimination methods for the classification of tumors using gene expression data

Spatiotemporal models for disease incidence data: a case study

Analysis of Hearing Loss Data using Correlated Data Analysis Techniques

How to analyze correlated and longitudinal data?

A Brief Introduction to Bayesian Statistics

Selection of Linking Items

Missing data. Patrick Breheny. April 23. Introduction Missing response data Missing covariate data

Multivariate meta-analysis using individual participant data

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

Gianluca Baio. University College London Department of Statistical Science.

Multivariate Multilevel Models

BayesOpt: Extensions and applications

Where a licence is displayed above, please note the terms and conditions of the licence govern your use of this document.

Bayesian versus maximum likelihood estimation of treatment effects in bivariate probit instrumental variable models

Bivariate Models for the Analysis of Internal Nitrogen Use Efficiency: Mixture Models as an Exploratory Tool

Age-Adjusted US Cancer Death Rate Predictions

An application of a pattern-mixture model with multiple imputation for the analysis of longitudinal trials with protocol deviations

Bayesian Dose Escalation Study Design with Consideration of Late Onset Toxicity. Li Liu, Glen Laird, Lei Gao Biostatistics Sanofi

Meta-analysis using individual participant data: one-stage and two-stage approaches, and why they may differ

Bayesian Inference. Thomas Nichols. With thanks Lee Harrison

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison

Modern Regression Methods

Heritability. The Extended Liability-Threshold Model. Polygenic model for continuous trait. Polygenic model for continuous trait.

Historical controls in clinical trials: the meta-analytic predictive approach applied to over-dispersed count data

Estimation of Area under the ROC Curve Using Exponential and Weibull Distributions

Bayesian Semiparametric Estimation of Cancer-specific Age-at-onset Penetrance with Application to. Li-Fraumeni Syndrome

Emerging Themes in Epidemiology. Open Access RESEARCH ARTICLE. Dan Li 1, Ruth Keogh 2, John P. Clancy 3 and Rhonda D.

Advanced IPD meta-analysis methods for observational studies

How to interpret scientific & statistical graphs

Some alternatives for Inhomogeneous Poisson Point Processes for presence only data

A Comparison of Methods for Determining HIV Viral Set Point

Information Systems Mini-Monograph

Bayesian approaches to handling missing data: Practical Exercises

The North Carolina Health Data Explorer

Reveal Relationships in Categorical Data

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

A Diabetes minimal model for Oral Glucose Tolerance Tests

GENERALIZED ESTIMATING EQUATIONS FOR LONGITUDINAL DATA. Anti-Epileptic Drug Trial Timeline. Exploratory Data Analysis. Exploratory Data Analysis

Package xmeta. August 16, 2017

Appendix 1: Systematic Review Protocol [posted as supplied by author]

Memorial Sloan-Kettering Cancer Center

Live WebEx meeting agenda

Analysis of Vaccine Effects on Post-Infection Endpoints Biostat 578A Lecture 3

An application of topological relations of fuzzy regions with holes

Statistical Tolerance Regions: Theory, Applications and Computation

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Econometric Game 2012: infants birthweight?

Statistical Models for Enhancing Cross-Population Comparability

Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study

County-Level Small Area Estimation using the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS)

Supplementary Material Estimating the Distribution of Sensorimotor Synchronization Data: A Bayesian Hierarchical Modeling Approach Rasmus Bååth

Metropolitan and Micropolitan Statistical Area Cancer Incidence: Late Stage Diagnoses for Cancers Amenable to Screening, Idaho

Incidence of Cancers Associated with Modifiable Risk Factors and Late Stage Diagnoses for Cancers Amenable to Screening Idaho

Transcription:

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Dipak K. Dey, Ulysses Diva and Sudipto Banerjee Department of Statistics University of Connecticut, Storrs. March 16, 2007 1 Modeling Multivariate Survival Data

Introduction Motivation and Research Objectives Spatial data is used in a variety of disciplines current focus is on spatially referenced survival data. 2 Modeling Multivariate Survival Data

Introduction Motivation and Research Objectives Spatial data is used in a variety of disciplines current focus is on spatially referenced survival data. Accounting for spatial correlation provides additional insights Failure to account for such association could potentially lead to spurious and sometimes misleading results 2 Modeling Multivariate Survival Data

Introduction Motivation and Research Objectives Spatial data is used in a variety of disciplines current focus is on spatially referenced survival data. Accounting for spatial correlation provides additional insights Failure to account for such association could potentially lead to spurious and sometimes misleading results Spatial variation in survival patterns often reveal underlying lurking factors, which help in better understanding of the disease patterns. 2 Modeling Multivariate Survival Data

Introduction Motivation and Research Objectives Spatial data is used in a variety of disciplines current focus is on spatially referenced survival data. Accounting for spatial correlation provides additional insights Failure to account for such association could potentially lead to spurious and sometimes misleading results Spatial variation in survival patterns often reveal underlying lurking factors, which help in better understanding of the disease patterns. Spatial-survival models attempt to map spatial frailties 2 Modeling Multivariate Survival Data

Introduction Motivation and Research Objectives Spatial data is used in a variety of disciplines current focus is on spatially referenced survival data. Accounting for spatial correlation provides additional insights Failure to account for such association could potentially lead to spurious and sometimes misleading results Spatial variation in survival patterns often reveal underlying lurking factors, which help in better understanding of the disease patterns. Spatial-survival models attempt to map spatial frailties Data available on patients with several cancers, where each cancer can potentially have its own spatial distribution. 2 Modeling Multivariate Survival Data

Introduction Motivation and Research Objectives Spatial data is used in a variety of disciplines current focus is on spatially referenced survival data. Accounting for spatial correlation provides additional insights Failure to account for such association could potentially lead to spurious and sometimes misleading results Spatial variation in survival patterns often reveal underlying lurking factors, which help in better understanding of the disease patterns. Spatial-survival models attempt to map spatial frailties Data available on patients with several cancers, where each cancer can potentially have its own spatial distribution. How should we model these multiple spatial frailties? 2 Modeling Multivariate Survival Data

The SEER database Introduction SEER (Surveillance Epidemiology and End Results) database of the National Cancer Institute records primary cancer types First primary is followed up with subsequent cancers. GIS-link: Each patient s county of residence is available 3 Modeling Multivariate Survival Data

The SEER database Introduction SEER (Surveillance Epidemiology and End Results) database of the National Cancer Institute records primary cancer types First primary is followed up with subsequent cancers. GIS-link: Each patient s county of residence is available Survival models for multiple cancers are sought for practical reasons: To assess survival from multiple primary cancers simultaneously To adjust survival rates from a specific primary cancer in the presence of other primary cancers 3 Modeling Multivariate Survival Data

The SEER database Multiple response data Date of Date of Survival Patient ID diagnosis Cancer type end-point Status Time (months) 1234 March, 1991 Colorectal May, 1994 Dead 38 1234 August, 1992 Stomach May, 1994 Dead 21 1234 February, 1994 Pancreas May, 1994 Dead 3 The SEER (Surveillance Epidemiology and End Results) database lists cases of multiple cancers by cancer types diagnosed. Together with medicare data one can extract an individual patient s complete medical history. Simple example: The patient in the above table Patient specific information available in the registry includes gender, race, marital status, and their county of residence (provided the state belongs to the SEER registry). 4 Modeling Multivariate Survival Data

The SEER database Multiple response data Assume there are I counties. In the i th county we observe n i patients. Total no. of patients: N = I i=1 n i Ordered pair (i, j): j th individual from the i th county. Each patient has been diagnosed with at least one of K pre-fixed cancer types. t ijk is the survival time for the (i, j) th individual from his diagnosis with the k th type of cancer. 5 Modeling Multivariate Survival Data

The SEER database Multiple response data Assume there are I counties. In the i th county we observe n i patients. Total no. of patients: N = I i=1 n i Ordered pair (i, j): j th individual from the i th county. Each patient has been diagnosed with at least one of K pre-fixed cancer types. t ijk is the survival time for the (i, j) th individual from his diagnosis with the k th type of cancer. Some special care is needed: Not all patients develop all the K types of cancer, List possible cancers {1, 2,..., K} and form patient-specific index set C (i,j) {1, 2,..., K}. Example: if patient 1234 in Section is the 10th individual from the 8th county and if colorectal, stomach and pancreas are cancer types 1, 3 and 5, in {1, 2,..., K} then C (8,10) = {1, 3, 5}. 5 Modeling Multivariate Survival Data

Survival modelling framework Multiple responses Survival times will be correlated for similar county-patient-cancer combinations. We capture these correlations by introducing appropriate frailties. 6 Modeling Multivariate Survival Data

Survival modelling framework Multiple responses Survival times will be correlated for similar county-patient-cancer combinations. We capture these correlations by introducing appropriate frailties. Multiple response frailty modelling h (t ijk ) = h 0 (t ijk ) exp ( x T ijk β + u (i,j) + v k + φ ik ) 6 Modeling Multivariate Survival Data

Survival modelling framework Multiple responses Survival times will be correlated for similar county-patient-cancer combinations. We capture these correlations by introducing appropriate frailties. Multiple response frailty modelling h (t ijk ) = h 0 (t ijk ) exp ( x T ijk β + u (i,j) + v k + φ ik ) u (i,j) : frailty for the (i, j) th patient (these may be looked upon as patient effects nested within counties). v k : frailty for the k th cancer type. φ ik be the frailty for the k th cancer type nested within the i th county. x ijk denotes the patient-cancer specific covariates. 6 Modeling Multivariate Survival Data

Bayesian hierarchical modelling The data likelihood Hierarchical models combine information from the data likelihood and priors on parameters. The data likelihood I n i i=1 j=1 k C (i,j) [S (t ijk x ijk, β, η)] [ h 0 (t ijk ; η) exp ( x T ijk β + u (i,j) + v k + φ ik )] δij S (t ijk x ijk, β, η) is the survival function evaluated at t ijk conditional on the parameters and observed covariates. 7 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Modelling the frailties Patient effects nested within counties: Priors on u ij indep u (i,j) N ( 0, σi 2 ) Cancer effects: Priors on v k s v = (v 1,..., v K ) T N (0, Λ), where Λ is a K K (unknown) covariance matrix, which is assigned an Inverse-Wishart prior. 8 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Spatial Frailties Spatial Effects: Form Φ = ( φ T 1,...φ T ) T I, where φ i = (φ i1,..., φ ik ) T is the collection of the spatial frailties for the K cancers within the i th county. Priors on φ ik s Φ = ( φ T 1,...φ T ) T I N (0, ΣW (α) Λ). Here Σ W (α) = (Diag (m i ) αw ) 1. m i is the number of neighbors of county i W is the adjacency matrix of the graph representing our region. Only α is random in Σ W (α). Note: α (0, 1) ensures a proper M CAR distribution. 9 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Remarks on likelihood modelling h 0 (t) denotes baseline hazard function modelled semiparametrically as a mixture of Beta functions (see, e.g., Gelfand and Mallick, 1995 and Ibrahim et al., 2001) as cancer-specific (in which case we write h 0k (t)) or county-specific (in which case we write h 0i (t)). In principle we can also include subject-cancer interaction terms of the form w (ij)k, which may reveal cancer-specific variation in patient frailties. However, in practice reliably estimating these higher order interaction terms becomes difficult, especially with data such as ours, where many subjects die before contracting fewer than three cancers. 10 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Spatial modelling extensions We are using the same Λ to model the MCAR and the cancer main effects: we are modelling the dispersion matrix of the space-cancer interactions, Φ, as a Kronecker product of spatial dispersion and cancer dispersion. 11 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Spatial modelling extensions We are using the same Λ to model the MCAR and the cancer main effects: we are modelling the dispersion matrix of the space-cancer interactions, Φ, as a Kronecker product of spatial dispersion and cancer dispersion. We may generalize the MCAR (α, Λ) to MCAR ((α 1,..., α K ), Λ), enriching our model by assigning a different spatial smoothness parameter to each cancer type. (Carlin and Banerjee, 2003) 11 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Spatial modelling extensions We are using the same Λ to model the MCAR and the cancer main effects: we are modelling the dispersion matrix of the space-cancer interactions, Φ, as a Kronecker product of spatial dispersion and cancer dispersion. We may generalize the MCAR (α, Λ) to MCAR ((α 1,..., α K ), Λ), enriching our model by assigning a different spatial smoothness parameter to each cancer type. (Carlin and Banerjee, 2003) Yet another modification considers v i N (0, Λ i ) (not identically distributed), whereupon each county has its own cancer-covariance pattern. 11 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Spatial modelling extensions We are using the same Λ to model the MCAR and the cancer main effects: we are modelling the dispersion matrix of the space-cancer interactions, Φ, as a Kronecker product of spatial dispersion and cancer dispersion. We may generalize the MCAR (α, Λ) to MCAR ((α 1,..., α K ), Λ), enriching our model by assigning a different spatial smoothness parameter to each cancer type. (Carlin and Banerjee, 2003) Yet another modification considers v i N (0, Λ i ) (not identically distributed), whereupon each county has its own cancer-covariance pattern. In addition, one may compare MCAR (α, Λ) to MCAR (α, (Λ 1,..., Λ I )), retaining a common smoothness parameter for the different cancers (even put α = 1) but allow different covariance functions for different counties. 11 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Spatial modelling extensions However, the above generalizations of the MCAR (α, Λ) will often require substantial information on multiple cancer patients. Unfortunately, such information is rare in practice. For instance, in the SEER database less than one percent of the patients have moderately large monitoring times with more than two different types of cancers. 12 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Spatial modelling extensions However, the above generalizations of the MCAR (α, Λ) will often require substantial information on multiple cancer patients. Unfortunately, such information is rare in practice. For instance, in the SEER database less than one percent of the patients have moderately large monitoring times with more than two different types of cancers. Hence general models are identifiable only via priors. 12 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Spatial modelling extensions However, the above generalizations of the MCAR (α, Λ) will often require substantial information on multiple cancer patients. Unfortunately, such information is rare in practice. For instance, in the SEER database less than one percent of the patients have moderately large monitoring times with more than two different types of cancers. Hence general models are identifiable only via priors. Therefore, we restrict our subsequent attention to the M CAR(α, Λ) specification and compare a spatially independent model with basically the same structure by replacing the Σ W (α) with a diagonal matrix which does not account for the spatial structure of the region. 12 Modeling Multivariate Survival Data

Bayesian hierarchical modelling Spatial modelling extensions However, the above generalizations of the MCAR (α, Λ) will often require substantial information on multiple cancer patients. Unfortunately, such information is rare in practice. For instance, in the SEER database less than one percent of the patients have moderately large monitoring times with more than two different types of cancers. Hence general models are identifiable only via priors. Therefore, we restrict our subsequent attention to the M CAR(α, Λ) specification and compare a spatially independent model with basically the same structure by replacing the Σ W (α) with a diagonal matrix which does not account for the spatial structure of the region. Now the independent frailties for the K cancers within the i th county, denoted by Φ, has a N ( 0, τ 2 I I I Λ ) distribution where τ 2 is a scale parameter. 12 Modeling Multivariate Survival Data

Numerical computations Models Model Description Model Model Description Model I Full MCAR (α, Λ) Model II MCAR (α, Λ) Without Patient Frailties {u (i,j) } Model III Full Spatial Independence Model Model IV Spatial Independence Model Without Patient Frailties {u (i,j) } 13 Modeling Multivariate Survival Data

Numerical computations MCAR model: MCAR (α, Λ) We design an appropriate Gibbs sampler for the above hierarchical model. β have full conditional distributions that are log-concave (with either flat or vague Gaussian priors) and can be updated using the Adaptive Rejection Sampling (ARS) algorithm (Gilks and Wild, 1992) or a Metropolis step. The same is true for the patient frailties { u (i,j) }, the cancer frailties {v k } and the spatial frailties {φ ik }. The mixture weights, η, in the baseline hazard modelling needs to be updated using a Metropolis step. The full conditionals for each of the σi 2 inverted-gamma distributions are conjugate Λ has a conjugate inverted-wishart distribution, which follows from the following Lemma: 14 Modeling Multivariate Survival Data

Numerical computations The Lemma Lemma: Lemma: Let A and B be I I and K K matrices and let x be any vector of length IK. Then, there exists a K K matrix C and an I I matrix D such that x T (A B) x = tr (CB) = tr (DA). 15 Modeling Multivariate Survival Data

Illustrations SEER data description Source: 1973-2001 SEER Public Use Incidence Database on patients from the state of Iowa diagnosed with colorectal cancer. In the dataset, 28% of the patients survival times were censored, 44% of patients were females, 69% of patients had at most two primary cancers, 47% had colon cancer before rectal cancer, majority of cases (78% for colon and 75% for rectal cancer) were in the local or regional stage, and most of the cases were diagnosed when patients were 65 years or older (76% for colon and 74% for rectal cancer). 16 Modeling Multivariate Survival Data

Illustrations SEER data description Colorectal cancer involves two gastrointestinal related organs. Good motivation for using this cancer type to illustrate the methodology. Diagnosis: two cancer types (colon and rectum) with survival times recorded in months from diagnosis up to study termination (censored) or death (failure). 1083 patients; represent 96 of the 99 counties in Iowa. Age at diagnosis for each type of cancer categorized: < 55, 55 65, 65 75, and > 75. Stage of each type of cancer: in situ or unstaged, local, regional, or distant. Others: total number of primary cancers, dichotomized as 0 (nprimes 2) and 1 (nprimes > 2), and gender. An indicator variable, colon first, is used to capture the effect of the order of occurrence. 17 Modeling Multivariate Survival Data

Illustrations Parameter Estimates: Regressors Parameter Model I Model II Model III Model IV Patient-level Gender=Female -0.36 (-0.36, -0.36) -0.36 (-0.36, -0.36) -0.36 (-0.36, -0.36) -0.36 (-0.36, -0.36) Number of primaries -0.30 (-0.30, -0.30) -0.30 (-0.30, -0.30) -0.30 (-0.30, -0.30) -0.30 (-0.30, -0.30) Colon first 0.04 (0.04, 0.04) 0.04 (0.04, 0.04) 0.04 (0.04, 0.04) 0.04 (0.04, 0.04) Colon cancer Frailty, ν 1-2.40 (-2.90, -1.98) -2.46 (-2.92, -2.01) -1.70 (-1.86, -1.56) -1.80 (-2.05, -1.57) Stage In situ or Unstaged Local -0.25 (-0.33, -0.18) -0.25 (-0.33, -0.18) -0.25 (-0.33, -0.18) -0.25 (-0.32, -0.18) Regional 0.03 (-0.05, 0.10) 0.03 (-0.04, 0.10) 0.03 (-0.05, 0.10) 0.03 (-0.04, 0.11) Distant 1.12 (1.02, 1.21) 1.12 (1.02, 1.21) 1.12 (1.02, 1.21) 1.12 (1.02, 1.20) Age < 55 55 < 65 0.47 (0.36, 0.58) 0.47 (0.36, 0.58) 0.47 (0.36, 0.57) 0.47 (0.35, 0.57) 65 < 75 0.74 (0.64, 0.84) 0.74 (0.64, 0.84) 0.74 (0.64, 0.84) 0.74 (0.64, 0.84) 75 1.16 (1.07, 1.27) 1.16 (1.06, 1.26) 1.16 (1.06, 1.26) 1.16 (1.06, 1.26) Rectal cancer Frailty, ν 2-2.17 (-2.63, -1.72) -2.25 (-2.73, -1.80) -1.88 (-2.07, -1.69) -2.07 (-2.37, -1.75) Stage In situ or Unstaged Local -0.40 (-0.81, 0.02) -0.40 (-0.81, 0.04) -0.40 (-0.80, 0.02) -0.40 (-0.82, 0.02) Regional 0.07 (-0.36, 0.49) 0.07 (-0.35, 0.47) 0.07 (-0.35, 0.49) 0.07 (-0.35, 0.48) Distant 1.44 (1.16, 1.72) 1.44 (1.16, 1.72) 1.44 (1.16, 1.71) 1.44 (1.17, 1.73) Age at diagnosis < 55 55 < 65 0.36 (0.05, 0.66) 0.35 (0.05, 0.65) 0.36 (0.05, 0.66) 0.35 (0.04, 0.64) 65 < 75 0.76 (0.36, 1.17) 0.76 (0.35, 1.16) 0.76 (0.36, 1.18) 0.76 (0.36, 1.17) 75 1.28 (0.86, 1.70) 1.28 (0.87, 1.71) 1.28 (0.87, 1.72) 1.28 (0.86, 1.70) Reference categories. 18 Modeling Multivariate Survival Data

Illustrations Parameter Estimates: Other parameters Parameter Model I Model II Model III Model IV α 0.95 (0.81, 1.00) 0.95 (0.81, 1.00) η 1 0.90 (0.88, 0.92) 0.90 (0.88, 0.92) 0.90 (0.88, 0.92) 0.90 (0.88, 0.92) η 2 0.03 (0.01, 0.04) 0.03 (0.01, 0.04) 0.02 (0.01, 0.04) 0.02 (0.01, 0.04) η 3 0.02 (0.01, 0.04) 0.03 (0.01, 0.04) 0.03 (0.01, 0.04) 0.03 (0.01, 0.04) η 4 0.03 (0.01, 0.04) 0.03 (0.01, 0.04) 0.02 (0.01, 0.04) 0.02 (0.01, 0.04) η 5 0.03 (0.01, 0.04) 0.03 (0.01, 0.04) 0.03 (0.01, 0.04) 0.02 (0.01, 0.04) Λ 11 1.29 (0.59, 2.11) 1.30 (0.59, 2.12) 0.41 (0.20, 0.65) 0.41 (0.19, 0.65) Λ 22 0.54 (0.20, 0.91) 0.57 (0.23, 0.96) 0.27 (0.13, 0.42) 0.29 (0.13, 0.46) Λ 12 1.78 (0.90, 2.74) 1.84 (0.95, 2.83) 0.87 (0.48, 1.30) 0.98 (0.54, 1.47) ρ 0.36 (0.20, 0.53) 0.37 (0.21, 0.54) 0.45 (0.31, 0.60) 0.46 (0.31, 0.62) Precision is 1 10 3. 19 Modeling Multivariate Survival Data

Model Comparisons CPO and ALMPL score The conditional predictive ordinate (CPO) and the average log-marginal pseudo-likelihood (ALMPL) were used. (Gelfand and Dey, 1994, and Chen et al., 2000) CPO statistic CP O ij = f ( D ij x ij, D ( ij)) ( = ) 1 1 π (Θ D) Θ f (D ij x ij, Θ) D ij = (i, j) th observation, D is the complete data and Θ is the collection of parameters. ALMPL ALMP L = 1 N log CP O ij. ij Larger values of ALMPL indicate better model fit. 20 Modeling Multivariate Survival Data

Model Comparisons DIC score The modified Deviance Information Criterion (DIC) proposed by Celeux et al. (2006) was also used for model comparison. The DIC for a particular model with data y, random effects Z and parameters Ω and likelihood f ( y Ω ) is given by DIC score DIC = 4E Ω,Z [ log f ( y, Z Ω ) y ] +2E Z [ log p ( y, Z EΩ [ Ω y, Z ])] This essentially integrates out the random effects. Lower values of DIC indicate better models. Measure of complexity [ ( p D = DIC + 2E Ω,Z log f y, Z Ω ) ] y Smaller values pointing to models with lesser complexity. 21 Modeling Multivariate Survival Data

Model Comparisons Model Comparison Table Model ALMPL DIC p D Model I -5.92 90.40 Model II -6.03-247.81 2.71 Model III -6.96 5761.58 276.75 Model IV -8.48 3588.42 149.13 DIC difference from Model I (smaller indicates better model). 22 Modeling Multivariate Survival Data

Figures Index plots of CPO s Index plots of the log CP Oij A log CP OB ij for each model pair. Positive values indicate support for model A over model B while negative values indicate the opposite. From left to right, top to bottom: I-II, I-III, I-IV, II-III, II-IV, III-IV. Y-axes range is from -1.0 to 7.0 (comparisons involving Model IV were truncated at 7.0 to facilitate presentation: 23 Modeling Multivariate Survival Data

Figures Cancer frailties Distribution of cancer-specific frailties (y-axes) under the different models. Positive frailties indicate higher relative risks while negative values indicate the opposite: 24 Modeling Multivariate Survival Data

Figures Patient-specific frailties Distribution of average patient-specific frailties (y-axes) from different counties (x-axes) in Iowa under Model I (top) and Model III (bottom). Positive frailties indicate higher relative risks while negative values indicate the opposite: 25 Modeling Multivariate Survival Data

Figures Spatial frailties Distribution of spatial frailties (y-axes) for colon and rectal cancers from different counties (x-axes) in Iowa under best model. Positive frailties indicate higher relative risks while negative values indicate the opposite: 26 Modeling Multivariate Survival Data

Figures Spatial frailties: choropleth map Spatial frailties for colon and rectal cancers in Iowa under the different models. Color change from blue to red indicates an increase from negative to positive spatial frailties. Cutoff values are one standard deviation units from the mean of each cancer type in each model. Locations of urban areas are also indicated to potentially account for clustering: 27 Modeling Multivariate Survival Data

Concluding Remarks Remarks We presented a methodology for modelling spatially correlated multivariate cancer survival data, illustrated using colon and rectal cancer data for the state of Iowa obtained from the SEER database. 28 Modeling Multivariate Survival Data

Concluding Remarks Remarks We presented a methodology for modelling spatially correlated multivariate cancer survival data, illustrated using colon and rectal cancer data for the state of Iowa obtained from the SEER database. The results suggest that there is positive correlation between the hazard rates of colon and rectal cancers. This information may not be available or interpretable when modelling the cancers independently. 28 Modeling Multivariate Survival Data

Concluding Remarks Remarks We presented a methodology for modelling spatially correlated multivariate cancer survival data, illustrated using colon and rectal cancer data for the state of Iowa obtained from the SEER database. The results suggest that there is positive correlation between the hazard rates of colon and rectal cancers. This information may not be available or interpretable when modelling the cancers independently. Other mechanisms for handling spatial covariation, such as the shared component approach presented by Held and Best (2001) applied to disease mapping of oral and esophageal cancers, are also possible. The basic modelling framework may be extended to different hazard structures if the need arises. 28 Modeling Multivariate Survival Data

Concluding Remarks Remarks We presented a methodology for modelling spatially correlated multivariate cancer survival data, illustrated using colon and rectal cancer data for the state of Iowa obtained from the SEER database. The results suggest that there is positive correlation between the hazard rates of colon and rectal cancers. This information may not be available or interpretable when modelling the cancers independently. Other mechanisms for handling spatial covariation, such as the shared component approach presented by Held and Best (2001) applied to disease mapping of oral and esophageal cancers, are also possible. The basic modelling framework may be extended to different hazard structures if the need arises. 28 Modeling Multivariate Survival Data

Concluding Remarks Thank you! 29 Modeling Multivariate Survival Data