Spatiotemporal models for disease incidence data: a case study

1 Spatiotemporal models for disease incidence data: a case study. Erik A. Sauleau (1, 2), Monica Musio (3), Nicole Augustin (4). (1) Medicine Faculty, University of Strasbourg, France; (2) Haut-Rhin Cancer Registry; (3) University of Cagliari, Italy; (4) Department of Mathematics, University of Bath, UK. Modelling complex environmental spatial and temporal data, June 2009, Bath

2 Outline: Cancer registries; Dataset and analyses aims; Known effects; Our data; Works in progress and problems

3 Cancer registries: background. Registries exhaustively collect individual data on cancer cases, either routinely or through ad hoc epidemiological studies. Routine collection is active, drawing on sources such as medical records (hospital wards, pathology, GPs, ...); collected items include, for example, date of birth, date of diagnosis, sex and address. Epidemiological studies go back to the medical records (or to the patients). Haut-Rhin cancer registry: covers a "département" of 750,000 inhabitants. Website

4 Cancer registries: ARER68

5 Outline: Cancer registries; Dataset and analyses aims; Known effects; Our data; Works in progress

6 Data and analyses aims: incidence or survival. Main aims: 1. Survival: time to event(s), such as death, complication, stage (clinical, biological, ...), metastases, second cancer, recurrence. 2. Incidence: new cases. 3. Mortality, which combines incidence and survival. 4. Estimation of prevalence. Two types of data: 1. Individual data for survival: crude survival, relative survival, Cox model $\log h(t, x) = \log h_0(t) + \beta x$. 2. Aggregated data for incidence: why?

7 Short digression: standardized incidence ratio. Measures of epidemiological risk: the excess of risk compared with the local "at-risk" population, expressed as a difference of risks or as a relative risk. Here the relative risk is the Standardized Incidence Ratio (SIR), the ratio of observed to expected cases in each geographical unit: $\mathrm{SIR}_i = O_i / E_i$. Expected cases result from exposing the population at risk to a given risk: $E_i = \hat p_i N_i$. What about these $\hat p_i$? 1. Global risk in the study region: $\hat p_i = \hat p = O_\cdot / N_\cdot$ for all $i$. 2. Risk adjusted on certain categorical variable(s).
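A minimal sketch of the two choices of $\hat p_i$ above, written in Python rather than the R used in the analyses. The toy counts and populations are hypothetical, and the confidence interval uses Byar's approximation, one common choice that the slides do not specify.

```python
import math

# Hypothetical toy data: observed cases and population by unit and sex.
observed = {"A": {"F": 2, "M": 10}, "B": {"F": 1, "M": 5}}
population = {"A": {"F": 5000, "M": 4800}, "B": {"F": 3000, "M": 3100}}

def stratum_rates(obs, pop):
    """p_hat_k = O_.k / N_.k: risk in stratum k over the whole study region."""
    strata = next(iter(pop.values())).keys()
    return {k: sum(o[k] for o in obs.values()) / sum(n[k] for n in pop.values())
            for k in strata}

def expected_counts(pop, rates):
    """E_i = sum_k p_hat_k * N_ik (indirect standardization)."""
    return {i: sum(rates[k] * n for k, n in strata.items())
            for i, strata in pop.items()}

def sir_with_ci(o, e, z=1.96):
    """SIR_i = O_i / E_i with Byar's approximate confidence interval."""
    lower = 0.0 if o == 0 else o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3 / e
    o1 = o + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / e
    return o / e, lower, upper

# 1. Global risk: collapse sex, so p_hat = O. / N.
obs_all = {i: {"all": sum(s.values())} for i, s in observed.items()}
pop_all = {i: {"all": sum(s.values())} for i, s in population.items()}
E_global = expected_counts(pop_all, stratum_rates(obs_all, pop_all))

# 2. Risk adjusted on gender: one p_hat per sex
E_gender = expected_counts(population, stratum_rates(observed, population))

for unit in observed:
    o_i = sum(observed[unit].values())
    sir, low, high = sir_with_ci(o_i, E_gender[unit])
    print(f"{unit}: O={o_i}, E={E_gender[unit]:.2f}, SIR={sir:.2f} ({low:.2f}-{high:.2f})")
```

Because the rates are estimated on the study population itself, the expected counts add up to the observed total under either choice, so the SIRs are relative to the region as a whole.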

8 Outline: Cancer registries; Dataset and analyses aims; Known effects; Our data; Works in progress and problems

9 Known effects: age. Figure: example of lung cancer. Applies to all cancer localisations except some rare sites (testis) and paediatric cancers. Under-reporting in the older age categories? Models: P-spline smoother; indicator variables for categories.
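The P-spline smoother penalizes differences between adjacent B-spline coefficients. The Python sketch below (hypothetical coefficients, not the authors' code) builds the second-order difference penalty $\beta^\top D^\top D \beta$ for 9 coefficients and shows that a linear coefficient sequence is not penalized while a wiggly one is.

```python
def difference_matrix(k, order=2):
    """Return the (k - order) x k matrix of order-th differences."""
    d = [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]
    for _ in range(order):
        # Each pass replaces the rows by differences of consecutive rows.
        d = [[d[r + 1][c] - d[r][c] for c in range(k)] for r in range(len(d) - 1)]
    return d

def penalty(coefs, order=2):
    """Roughness penalty beta' D'D beta = sum of squared order-th differences."""
    d = difference_matrix(len(coefs), order)
    return sum(sum(row[c] * coefs[c] for c in range(len(coefs))) ** 2 for row in d)

linear = list(range(9))          # hypothetical, e.g. one coefficient per age knot
wiggly = [0, 1] * 4 + [0]
print(penalty(linear), penalty(wiggly))
```

In a penalized fit the log-likelihood is reduced by $\lambda/2$ times this penalty, so a large smoothing parameter $\lambda$ pushes the age curve towards a straight line rather than towards a constant.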

10 Known effects: period. Depends on cancer localisation. Different causes: spontaneous evolution, environmental factors, screening, risky behaviours. Figure: SIR for breast cancer over time

11 Known effects: period models. Different approaches: P-spline smoother; trend (linear or quadratic); indicator variables for categories. Several years are often aggregated, for example into 3-year classes, for variance stabilisation, comparisons between registries and alignment with age categories.
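A sketch of the 3-year aggregation over 1988-2005 (Python; the yearly counts are hypothetical). Summing three Poisson years roughly divides the relative standard error $1/\sqrt{O}$ by $\sqrt{3}$, which is the variance stabilisation mentioned above.

```python
import math

def year_class(year, start=1988, width=3):
    """Label of the width-year class containing `year`, classes starting at `start`."""
    first = start + (year - start) // width * width
    return f"{first}-{first + width - 1}"

def aggregate_years(counts_by_year, start=1988, width=3):
    """Sum yearly counts into consecutive width-year classes."""
    classes = {}
    for year, n in sorted(counts_by_year.items()):
        label = year_class(year, start, width)
        classes[label] = classes.get(label, 0) + n
    return classes

def poisson_relative_se(count):
    """Approximate relative standard error of a Poisson count: 1 / sqrt(O)."""
    return 1 / math.sqrt(count)

# Hypothetical yearly counts for one stratum, 1988-2005 (18 years -> 6 classes)
yearly = {year: 4 for year in range(1988, 2006)}
by_class = aggregate_years(yearly)
print(by_class)
print(poisson_relative_se(4), poisson_relative_se(12))
```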

12 Known effects: cohort. cohort = period - age, hence identifiability problems in age-period-cohort models. Figure: basal cell carcinoma, APC plot for males
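The identifiability problem can be checked numerically. Because cohort = period - age, a linear drift $\delta$ can be moved out of the period effect and into the age and cohort effects without changing the linear predictor. The effect functions below are hypothetical.

```python
def apc_predictor(age, period, alpha, pi, gamma):
    """Linear predictor alpha(a) + pi(p) + gamma(c) with cohort c = p - a."""
    cohort = period - age
    return alpha(age) + pi(period) + gamma(cohort)

# Hypothetical age, period and cohort effects
def alpha(a):
    return 0.05 * a - 0.0002 * a ** 2

def pi(p):
    return 0.01 * (p - 1988)

def gamma(c):
    return -0.00001 * (c - 1920) ** 2

DELTA = 0.03  # arbitrary linear drift shifted between the three effects

def alpha2(a):
    return alpha(a) + DELTA * a

def pi2(p):
    return pi(p) - DELTA * p

def gamma2(c):
    return gamma(c) + DELTA * c

ages = range(45, 85, 5)        # hypothetical age-group midpoints
periods = range(1988, 2006)
max_gap = max(abs(apc_predictor(a, p, alpha, pi, gamma)
                  - apc_predictor(a, p, alpha2, pi2, gamma2))
              for a in ages for p in periods)
print(max_gap)  # zero up to rounding: the two parameterisations fit identically
```

The extra terms cancel exactly, since $\delta a - \delta p + \delta(p - a) = 0$, so the data cannot tell the two sets of effects apart without a constraint.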

13 Known effects: gender. Figure: interaction period-gender on lung cancer (WHO standardized incidence). Highly dependent on cancer localisation: for example, no sex effect in colorectal cancer, except for sex-specific localisations (testis, prostate, ...). Model: fixed effect

14 Known effects: spatial. Highly dependent on cancer localisation. Interpretation: 1. survival: quality of care; 2. incidence: proxy for unobserved environmental exposure. Which variable? 1. Survival: exact location or geographical unit of residence. 2. Incidence: exact locations are difficult to use (problem with computing the expected counts), so the geographical unit of residence and its centroid are used.

15 Known effects: spatial models. 1. Coordinates of the geographical unit centroid: Gaussian random field; geospline (bidimensional smoother); trend (linear or quadratic). 2. Geographical unit: Bayesian convolution prior, combining a conditional autoregressive (CAR) prior for spatial autocorrelation and an exchangeable normal prior for heterogeneity.
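A sketch of the CAR part of the convolution prior, assuming the common intrinsic CAR form with binary adjacency, whose precision matrix is $Q = \tau(D - W)$, with $D$ holding the number of neighbours. The four-unit adjacency is hypothetical; the analysis itself has 377 geographical units.

```python
def icar_precision(neighbours, tau=1.0):
    """Q = tau * (D - W) for the intrinsic CAR prior with binary adjacency W."""
    units = sorted(neighbours)
    index = {u: k for k, u in enumerate(units)}
    q = [[0.0] * len(units) for _ in units]
    for u, nbrs in neighbours.items():
        i = index[u]
        q[i][i] = tau * len(nbrs)          # number of neighbours on the diagonal
        for v in nbrs:
            q[i][index[v]] = -tau          # -tau for each adjacent pair
    return units, q

def icar_conditional(u_values, neighbours, unit, tau=1.0):
    """Conditional mean and variance of u_i given its neighbours under the ICAR prior."""
    nbrs = neighbours[unit]
    mean = sum(u_values[v] for v in nbrs) / len(nbrs)
    return mean, 1.0 / (tau * len(nbrs))

# Hypothetical map: four units in a row, A - B - C - D
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
units, Q = icar_precision(neighbours, tau=2.0)
print(icar_conditional({"A": 1.0, "C": 3.0}, neighbours, "B", tau=2.0))
```

Each row of $Q$ sums to zero, so the CAR prior is improper and only identifies contrasts between units; the exchangeable normal term of the convolution prior is added on top as independent noise per unit.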

16 Known effects (?): interactions, reflecting the complexity of cancer aetiology. Age-period interaction and/or cohort effect: P-spline or indicator variable for the cohort effect; smoothed age-period surface (tensor product). Gender-period and gender-age: varying coefficient model (VCM) $f_1(t) + s\,f_2(t)$, where $f_1(t)$ is the baseline time effect (for $s = 0$) and $f_2(t)$ is the additional time effect for $s = 1$; this gives a temporal slope and intercept by gender. Space-period, age-space-period and gender-space-period: VCM or multidimensional smoother.
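A minimal sketch of the VCM design for $f_1(t) + s\,f_2(t)$: the $f_2$ basis columns are multiplied by $s$, so they vanish for women ($s = 0$). The linear basis and the coefficients are hypothetical; with a linear basis the model reduces to a separate temporal intercept and slope per gender, while a spline basis would give smooth curves.

```python
def vcm_design_row(t, s, basis):
    """Design row for f1(t) + s * f2(t): basis for f1, then the same basis times s."""
    b = basis(t)
    return b + [s * v for v in b]

def linear_basis(t, t0=1988):
    """Hypothetical basis: intercept and years since 1988."""
    return [1.0, float(t - t0)]

# Hypothetical coefficients: first two for f1, last two for f2
BETA = [-0.5, 0.02, 1.5, -0.03]

def eta(t, s):
    """Linear predictor for year t and sex s (0 = female, 1 = male)."""
    return sum(x * c for x, c in zip(vcm_design_row(t, s, linear_basis), BETA))

print(eta(1988, 0), eta(1988, 1))  # intercepts by gender
print(eta(1989, 0) - eta(1988, 0), eta(1989, 1) - eta(1988, 1))  # slopes by gender
```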

17 Outline: Cancer registries; Dataset and analyses aims; Known effects; Our data; Works in progress and problems

18 Our data: dataset. ENT data: ear-nose-throat cancer. Risk factors: alcohol and tobacco consumption, with a latency between exposure and cancer. Covariates: gender: 0 for female and 1 for male; age in 9 groups: [0-45 years), 5-year intervals, and [80 or more); time: date of diagnosis categorized by year, from 1988 to 2005; geographical unit of residence, with centroid coordinates. Population counts: 1990 census for 1988 to …; … census for 1998 to 2002; linear interpolation at 1993 and 1996 for … and …; … census for 2003 to 2005. Risk adjusted on gender.
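A sketch of linear interpolation of a population count between two censuses. The counts are hypothetical, and using 1999 as the second census year is an assumption made only for illustration, since the exact years are not given above.

```python
def interpolate_population(year, census_a, census_b):
    """Linear interpolation of a population count between two (year, count) censuses."""
    (year_a, n_a), (year_b, n_b) = census_a, census_b
    weight = (year - year_a) / (year_b - year_a)
    return n_a + weight * (n_b - n_a)

# Hypothetical counts for one sex x age x unit cell; 1999 is an assumed census year.
census_1990 = (1990, 1200)
census_1999 = (1999, 1290)
for year in (1993, 1996):
    print(year, round(interpolate_population(year, census_1990, census_1999)))
```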

19 Our data: objectives. Compare models for detecting effects of time, space, sex and/or their interactions: space-time trend and interaction; accounting for covariates with possibly non-linear effects, such as age. Models are compared using the AIC criterion. Analyses carried out with the R packages mgcv and geoR.
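AIC is $-2\log L + 2\,\mathrm{edf}$; for penalized smoothers the effective degrees of freedom replace the raw parameter count. The Python sketch below scores two hypothetical sets of Poisson fitted values on the same toy data; it only illustrates the criterion and does not reproduce mgcv's fitting.

```python
import math

def poisson_loglik(observed, fitted):
    """Poisson log-likelihood sum(o log m - m - log o!)."""
    return sum(o * math.log(m) - m - math.lgamma(o + 1) for o, m in zip(observed, fitted))

def aic(loglik, edf):
    """AIC = -2 log L + 2 edf."""
    return -2.0 * loglik + 2.0 * edf

observed = [0, 2, 1, 4]
# Hypothetical candidate fits: (fitted means, effective degrees of freedom)
candidates = {
    "intercept only": ([1.75] * 4, 1.0),
    "smooth (hypothetical)": ([0.5, 1.5, 1.5, 3.5], 2.5),
}
scores = {name: aic(poisson_loglik(observed, fit), edf)
          for name, (fit, edf) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The smaller AIC wins; here the extra 1.5 edf of the smoother are more than paid for by its better log-likelihood.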

20 Our data: exploratory analysis. Total number of cases: 3,850, 87% male. Figure: raw SIRs (with 95% CI) by year and gender

21 Our data: exploratory analysis Figure: Raw SIRs (with 95% CI) by age and gender

22 Our data: exploratory analysis

23 Our data: model for data. Notation: indices $s$ for sex (0, 1), $a$ for age category (1-9), $t$ for year (1988-2005) and $i$ for geographical unit (GU, 1-377). Number of cases: $O_{sati}$. Population at risk: $N_{sati}$. Estimated risk adjusted on gender: $\hat p_s$. The model is $O_{sati} \sim \mathcal{P}(E_{sati}\, e^{\mu_{sati}})$, where $\log \mathbb{E}(O_{sati}) = \log E_{sati} + \mu_{sati}$. 1. $E_{sati}$ are the expected cases, calculated as $\hat p_s N_{sati}$. 2. $\log E_{sati}$ acts as an offset in the Poisson regression. 3. $\mu_{sati}$ is the log relative risk to be modelled (the Poisson mean is $E_{sati}\, e^{\mu_{sati}}$).
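The offset formulation can be sketched as a Poisson regression fitted by Newton-Raphson, with $\log E$ entering as a fixed offset. This Python sketch is not the mgcv fit used in the analysis: it has only an intercept and a sex indicator, with hypothetical counts and expected values. With a single binary covariate the estimates have a closed form, $e^{\hat\beta_0} = \sum_{s=0} O / \sum_{s=0} E$, which makes the result easy to check.

```python
import math

def fit_poisson_offset(y, x, offset, iters=25):
    """Newton-Raphson for log E(y) = offset + b0 + b1 * x (Poisson, log link)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        mu = [math.exp(o + b0 + b1 * xi) for o, xi in zip(offset, x)]
        # Score vector X'(y - mu)
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum(xi * (yi - m) for xi, yi, m in zip(x, y, mu))
        # Fisher information X' diag(mu) X (2 x 2, symmetric)
        h00 = sum(mu)
        h01 = sum(xi * m for xi, m in zip(x, mu))
        h11 = sum(xi * xi * m for xi, m in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += information^-1 * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical cells: observed cases O, expected cases E, sex indicator s
O = [3, 5, 20, 30]
E = [4.0, 6.0, 15.0, 20.0]
s = [0, 0, 1, 1]
b0, b1 = fit_poisson_offset(O, s, [math.log(e) for e in E])
print(math.exp(b0), math.exp(b0 + b1))  # relative risks for s = 0 and s = 1
```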

24 Our data: our spatiotemporal models. $O_{sati} \sim \mathcal{P}(E_{sati}\, e^{\mu_{sati}})$
1. Models for age, time and gender (model: $\mu$ =; specification)
M00: $f_1(a)$; cubic P-spline for age (9 knots)
M01: $f_1(a) + s\beta_s + t\beta_t + st\beta_{st}$; fixed main effects and interaction
M02: $f_1(a) + f_2(t)$; cubic P-spline for year (18 knots)
M03: $f_1(a, t)$; tensor product (9 and 18 knots)
M04: $f_1(a) + s\beta_s + f_2(t)$; cubic P-spline for year (18 knots) and fixed effect for gender
M05: $f_1(a) + s\,f_2(t)$; VCM

25 Our data: our spatiotemporal models. $O_{sati} \sim \mathcal{P}(E_{sati}\, e^{\mu_{sati}})$
2. Models for space and time (model: $\mu$ =; specification)
M00: $f_1(a)$; cubic P-spline for age (9 knots)
M02: $f_1(a) + f_2(t)$; cubic P-spline for year (18 knots)
M11: $f_1(a) + f_3(X, Y)$; tensor product (thin plate spline)
M12: $f_1(a) + f_2(t) + f_3(X, Y)$; M02 + M11
M13: $f_1(a) + f_4(X, Y, t)$; tensor product (thin plate spline for space and cubic spline for year)

26 Our data: models results
1. Models for age, time and gender (columns: model, $\mu$, AIC, $R^2$, edf of $f(a)$, edf of $f(t)$)
M00: $f_1(a)$; AIC 22,…
M01: $f_1(a) + s\beta_s + t\beta_t + st\beta_{st}$; AIC 22,…
M02: $f_1(a) + f_2(t)$; AIC 22,…
M03: $f_1(a, t)$; AIC 22,…
M04: $f_1(a) + s\beta_s + f_2(t)$; AIC 22,…
M05: $f_1(a) + s\,f_2(t)$; AIC 21,…; edf of $f(t)$: F …, M 7.191

27 Our data: models results. Cohort effect viewed as an age-time smoothed surface (type="response")

28 Our data: models results
2. Models for space and time (columns: model, $\mu$, AIC, $R^2$, edf of $f(a)$, $f(t)$, $f(X, Y)$)
M00: $f_1(a)$; AIC 22,…
M02: $f_1(a) + f_2(t)$; AIC 22,…
M11: $f_1(a) + f_3(X, Y)$; AIC 22,…
M12: $f_1(a) + f_2(t) + f_3(X, Y)$; AIC 22,…
M13: $f_1(a) + f_4(X, Y, t)$; fit crashed (out of memory)

29 Our data: models results
3. Model for age, gender, time and space (columns: model, $\mu$, AIC, $R^2$, edf of $f(a)$, $f(t)$, $f(X, Y)$)
M05: $f_1(a) + s\,f_2(t)$; AIC 21,…; edf of $f(t)$: F …, M …
M12: $f_1(a) + f_2(t) + f_3(X, Y)$; AIC 22,…
M20: $f_1(a) + f_3(X, Y) + s\,f_2(t)$; AIC 21,…; edf of $f(t)$: F …, M 7.182

30 Our data: results model M20
Parametric coefficients (Estimate, Std. Error, z value, Pr(>|z|)): (Intercept) … <2e-16 ***
Approximate significance of smooth terms (edf, Ref.df, Chi.sq, p-value):
s(age) … < 2e-16 ***
te(x,y) … < 2e-16 ***
s(an):sex … …e-05 ***
s(an):sex … < 2e-16 ***
R-sq.(adj) = …; Deviance explained = 27.9%; UBRE score = …; Scale est. = 1; n = …

31 Our data: results age effect

32 Our data: results period effect by gender

33 Our data: results spatial effect

34 Our data: results spatial effect Empirical semi-variogram of Pearson residuals for 1988, 1996 and 2005
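A sketch of the quantities behind this figure, assuming Pearson residuals $r = (O - \hat\mu)/\sqrt{\hat\mu}$ and the classical (Matheron) estimator $\hat\gamma(h) = \frac{1}{2|N(h)|}\sum_{N(h)} (r_i - r_j)^2$ over distance bins. The slides do not state which estimator was used (geoR's `variog` offers the classical one); the coordinates, counts and fitted values below are hypothetical.

```python
import math

def pearson_residuals(observed, fitted):
    """(O - mu_hat) / sqrt(mu_hat) for a Poisson fit."""
    return [(o - m) / math.sqrt(m) for o, m in zip(observed, fitted)]

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Classical estimator: half the mean squared difference per distance bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(math.dist(coords[i], coords[j]) // bin_width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    # (bin midpoint, semivariance or None if the bin is empty, number of pairs)
    return [(bin_width * (k + 0.5), s / (2 * c) if c else None, c)
            for k, (s, c) in enumerate(zip(sums, counts))]

# Hypothetical GU centroids and one year of observed / fitted counts
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
residuals = pearson_residuals([2, 0, 5], [1.0, 1.0, 4.0])
variogram = empirical_semivariogram(centroids, residuals, bin_width=1.5, n_bins=2)
print(variogram)
```

A flat semivariogram at the level of the residual variance would suggest that the smooth spatial term has absorbed the spatial autocorrelation.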

35 Our data: conclusions. First comprehensive analysis of these data. GAMs provide a framework for spatiotemporal modelling: a tool combining effects of different nature to estimate space-time trends with confidence bands. The model allows scientific questions to be addressed through the inclusion of covariates.

36 Outline: Cancer registries; Dataset and analyses aims; Known effects; Our data; Works in progress and problems

37 Works in progress and problems. Better handling of memory in R. Model comparisons: the likelihood ratio test requires nested models; penalized likelihood criteria such as AIC or BIC require the same data. 1. Model with age effect: data aggregated over all variables except age. 2. Model with age and sex effects: data aggregated over all variables except age and sex, giving a dataset twice as large.
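Why the same data matter: the Poisson log-likelihood of a disaggregated table differs from that of its aggregated version even when the fitted rates agree, by the log of a multinomial probability. The sketch below (hypothetical counts for two sex cells within one age group) makes that difference explicit, so AIC values computed on datasets aggregated differently are not comparable.

```python
import math

def poisson_loglik(observed, fitted):
    """Poisson log-likelihood sum(o log m - m - log o!)."""
    return sum(o * math.log(m) - m - math.lgamma(o + 1) for o, m in zip(observed, fitted))

# Hypothetical: female and male cells of one age group, same fitted rate for both
obs_by_sex = [1, 2]
fit_by_sex = [1.5, 1.5]

ll_detailed = poisson_loglik(obs_by_sex, fit_by_sex)                      # age x sex data
ll_aggregated = poisson_loglik([sum(obs_by_sex)], [sum(fit_by_sex)])      # age-only data

# The gap is log of the multinomial probability of splitting 3 cases as (1, 2)
# with probabilities (0.5, 0.5): 3!/(1! 2!) * 0.5**3 = 0.375.
print(ll_detailed, ll_aggregated, ll_detailed - ll_aggregated, math.log(0.375))
```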

38 Works in progress: ZIP models. Because of the covariates, the ENT counts were spread over 122,148 cells (2 sexes × 9 age groups × 18 years × 377 GUs), of which 119,324 (97.7%) were empty. There are more zeros than expected under a Poisson distribution, hence the zero-inflated Poisson distribution:
$\Pr(O \mid \mu, \omega) = \begin{cases} \omega + (1-\omega)\,e^{-\mu} & \text{if } O = 0 \\ (1-\omega)\,\dfrac{e^{-\mu}\mu^{O}}{O!} & \text{if } O > 0 \end{cases}$
The variance is about twice the mean (0.070 vs 0.032), so more appropriate distributions are the quasi-Poisson, negative binomial and Tweedie.
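A sketch of the ZIP probabilities and moments, with mean $(1-\omega)\mu$ and variance $(1-\omega)\mu(1+\omega\mu)$, followed by a rough zero-count check using the figures on this slide. The check assumes one common Poisson mean for all cells, which ignores the offsets; since heterogeneous means alone also add zeros, it is only indicative.

```python
import math

def zip_pmf(o, mu, omega):
    """Zero-inflated Poisson probability Pr(O = o | mu, omega)."""
    poisson = math.exp(-mu) * mu ** o / math.factorial(o)
    return omega + (1 - omega) * poisson if o == 0 else (1 - omega) * poisson

def zip_mean_var(mu, omega):
    """Mean (1 - omega) mu and variance (1 - omega) mu (1 + omega mu)."""
    mean = (1 - omega) * mu
    return mean, mean * (1 + omega * mu)

# Figures quoted on the slide
n_cells, n_zero, n_cases = 122_148, 119_324, 3_850
mean_count = n_cases / n_cells                     # about 0.032
expected_zeros = n_cells * math.exp(-mean_count)   # single common Poisson mean
print(f"observed zeros {n_zero}, Poisson-expected zeros {expected_zeros:.0f}")
print(zip_mean_var(2.0, 0.3))                      # hypothetical mu and omega
```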

39 Works in progress: multivariate analyses. The problem: ENT and lung cancers share common risk factors, with no individual measure of consumption; the geographical unit is used as a proxy (ecological bias). Is there a specific risk factor for ENT cancer? An idea: use the SIR for lung cancer as a covariate, estimating a VCM $I(\mathrm{SIR}_{\text{lung}} > 1)\, f_{\text{spat}}(X, Y)$.

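40 Works in progress: multivariate analyses. A second (and better) idea: a multivariate approach, with a model where
$\tilde O_{sati} = \begin{pmatrix} O^{(1)}_{sati} \\ O^{(2)}_{sati} \end{pmatrix} \sim \mathcal{P}\left(\tilde E_{sati}\, e^{\tilde\mu_{sati}}\right), \quad \tilde E_{sati} = \begin{pmatrix} E^{(1)}_{sati} \\ E^{(2)}_{sati} \end{pmatrix}, \quad \tilde\mu_{sati} = f_1(a) + f(X, Y) + \begin{pmatrix} f^{(1)}(X, Y) \\ f^{(2)}(X, Y) \end{pmatrix}$
Related approach: Bayesian models (shared components).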

41 Works in progress: more complex correlation structure. Time effect: $O_{sati} \sim \mathcal{P}(E_{sati}\, e^{\mu_{sati}})$ with $\log \mathbb{E}(O_{sati}) = \log E_{sati} + \mu_{sati} + \varepsilon_{sati}$, where $\varepsilon \sim \mathcal{N}(0, \Lambda)$ and the covariance matrix $\Lambda$ is modelled as a first-order autoregressive (AR1) process in time; memory problems. Spatial effect: mimic an autocorrelation like the one in the convolution prior model, replacing the thin plate splines with a "kriging component"; memory problem.
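A sketch of the AR(1) structure, assuming the stationary form $\Lambda_{jk} = \sigma^2\rho^{|j-k|}/(1-\rho^2)$ over the 18 years, with hypothetical $\rho$ and $\sigma^2$. Its inverse is tridiagonal, which suggests why sparse precision representations can ease memory use. The last lines give the size of a dense covariance matrix over all 122,148 cells, if one were stored that way (a hypothetical storage choice).

```python
def ar1_covariance(n, rho, sigma2=1.0):
    """Stationary AR(1) covariance: sigma2 * rho**|j - k| / (1 - rho**2)."""
    c = sigma2 / (1 - rho ** 2)
    return [[c * rho ** abs(j - k) for k in range(n)] for j in range(n)]

def ar1_precision(n, rho, sigma2=1.0):
    """Tridiagonal inverse of the AR(1) covariance (requires n >= 2)."""
    assert n >= 2
    q = [[0.0] * n for _ in range(n)]
    for j in range(n):
        q[j][j] = (1.0 if j in (0, n - 1) else 1 + rho ** 2) / sigma2
        if j + 1 < n:
            q[j][j + 1] = q[j + 1][j] = -rho / sigma2
    return q

def matmul(a, b):
    """Plain matrix product for small lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

n_years, rho, sigma2 = 18, 0.6, 0.5          # rho and sigma2 are hypothetical
Lam = ar1_covariance(n_years, rho, sigma2)
Qm = ar1_precision(n_years, rho, sigma2)
product = matmul(Lam, Qm)                    # should be the identity matrix

n_cells = 122_148
dense_gb = n_cells ** 2 * 8 / 1e9            # 8-byte doubles
print(f"dense {n_cells} x {n_cells} matrix: {dense_gb:.1f} GB")
```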

42 Works in progress Monotonic splines on time and spatial effects!!!

43 Thank you for your attention


More information

Approaches to the spatial modelling of cancer incidence and mortality in metropolitan Perth, Western Australia,

Approaches to the spatial modelling of cancer incidence and mortality in metropolitan Perth, Western Australia, Edith Cowan University Research Online Theses: Doctorates and Masters Theses 2011 Approaches to the spatial modelling of cancer incidence and mortality in metropolitan Perth, Western Australia, 1990-2005

More information

PRACTICAL STATISTICS FOR MEDICAL RESEARCH

PRACTICAL STATISTICS FOR MEDICAL RESEARCH PRACTICAL STATISTICS FOR MEDICAL RESEARCH Douglas G. Altman Head of Medical Statistical Laboratory Imperial Cancer Research Fund London CHAPMAN & HALL/CRC Boca Raton London New York Washington, D.C. Contents

More information

OPCS coding (Office of Population, Censuses and Surveys Classification of Surgical Operations and Procedures) (4th revision)

OPCS coding (Office of Population, Censuses and Surveys Classification of Surgical Operations and Procedures) (4th revision) Web appendix: Supplementary information OPCS coding (Office of Population, Censuses and Surveys Classification of Surgical Operations and Procedures) (4th revision) Procedure OPCS code/name Cholecystectomy

More information

Regression models, R solution day7

Regression models, R solution day7 Regression models, R solution day7 Exercise 1 In this exercise, we shall look at the differences in vitamin D status for women in 4 European countries Read and prepare the data: vit

More information

Generalized Estimating Equations for Depression Dose Regimes

Generalized Estimating Equations for Depression Dose Regimes Generalized Estimating Equations for Depression Dose Regimes Karen Walker, Walker Consulting LLC, Menifee CA Generalized Estimating Equations on the average produce consistent estimates of the regression

More information

Analysis of Vaccine Effects on Post-Infection Endpoints Biostat 578A Lecture 3

Analysis of Vaccine Effects on Post-Infection Endpoints Biostat 578A Lecture 3 Analysis of Vaccine Effects on Post-Infection Endpoints Biostat 578A Lecture 3 Analysis of Vaccine Effects on Post-Infection Endpoints p.1/40 Data Collected in Phase IIb/III Vaccine Trial Longitudinal

More information

Linear Regression Analysis

Linear Regression Analysis Linear Regression Analysis WILEY SERIES IN PROBABILITY AND STATISTICS Established by WALTER A. SHEWHART and SAMUEL S. WILKS Editors: David J. Balding, Peter Bloomfield, Noel A. C. Cressie, Nicholas I.

More information

Modeling The Count Data Of Emergency Department Use Among The Chronically Homeless Adults

Modeling The Count Data Of Emergency Department Use Among The Chronically Homeless Adults Yale University EliScholar A Digital Platform for Scholarly Publishing at Yale Public Health Theses School of Public Health January 2014 Modeling The Count Data Of Emergency Department Use Among The Chronically

More information

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA BIOSTATISTICAL METHODS AND RESEARCH DESIGNS Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA Keywords: Case-control study, Cohort study, Cross-Sectional Study, Generalized

More information

Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors

Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME, ISSUE 0, SEPTEMBER 01 ISSN 81 Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors Nabila Al Balushi

More information

COMPARING MODELS OF THE EFFECT OF AIR POLLUTANTS ON HOSPITAL ADMISSIONS AND SYMPTOMS FOR CHRONIC OBSTRUCTIVE PULMONARY DISEASE

COMPARING MODELS OF THE EFFECT OF AIR POLLUTANTS ON HOSPITAL ADMISSIONS AND SYMPTOMS FOR CHRONIC OBSTRUCTIVE PULMONARY DISEASE Cent Eur J Public Health 2012; 20 (4): 282 286 COMPARING MODELS OF THE EFFECT OF AIR POLLUTANTS ON HOSPITAL ADMISSIONS AND SYMPTOMS FOR CHRONIC OBSTRUCTIVE PULMONARY DISEASE Mehmet Ali Cengiz, Yuksel Terzi

More information

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes Content Quantifying association between continuous variables. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General

More information

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do. Midterm STAT-UB.0003 Regression and Forecasting Models The exam is closed book and notes, with the following exception: you are allowed to bring one letter-sized page of notes into the exam (front and

More information

Multiple Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Multiple Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Multiple Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Multiple Regression 1 / 19 Multiple Regression 1 The Multiple

More information

Dynamic prediction using joint models for recurrent and terminal events: Evolution after a breast cancer

Dynamic prediction using joint models for recurrent and terminal events: Evolution after a breast cancer Dynamic prediction using joint models for recurrent and terminal events: Evolution after a breast cancer A. Mauguen, B. Rachet, S. Mathoulin-Pélissier, S. Siesling, G. MacGrogan, A. Laurent, V. Rondeau

More information

FACULTY OF SCIENCES Master of Statistics

FACULTY OF SCIENCES Master of Statistics 2014 2015 FACULTY OF SCIENCES Master of Statistics Master's thesis Modelling the Evolution of CD4+ Cell Counts and Hemoglobin Concentration Level for HIV-1 Patients on Antiretroviral Therapy (ART) in Mildmay

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Assoc. Prof Dr Sarimah Abdullah Unit of Biostatistics & Research Methodology School of Medical Sciences, Health Campus Universiti Sains Malaysia Regression Regression analysis

More information

Journal of Physics: Conference Series. Related content PAPER OPEN ACCESS

Journal of Physics: Conference Series. Related content PAPER OPEN ACCESS Journal of Physics: Conference Series PAPER OPEN ACCESS Application of logistic regression models to cancer patients: a case study of data from Jigme Dorji Wangchuck National Referral Hospital (JDWNRH)

More information

Estimating drug effects in the presence of placebo response: Causal inference using growth mixture modeling

Estimating drug effects in the presence of placebo response: Causal inference using growth mixture modeling STATISTICS IN MEDICINE Statist. Med. 2009; 28:3363 3385 Published online 3 September 2009 in Wiley InterScience (www.interscience.wiley.com).3721 Estimating drug effects in the presence of placebo response:

More information

Investigation of relative survival from colorectal cancer between NHS organisations

Investigation of relative survival from colorectal cancer between NHS organisations School Cancer of Epidemiology something Group FACULTY OF OTHER MEDICINE AND HEALTH Investigation of relative survival from colorectal cancer between NHS organisations Katie Harris k.harris@leeds.ac.uk

More information

Cross-validation. Miguel Angel Luque Fernandez Faculty of Epidemiology and Population Health Department of Non-communicable Disease.

Cross-validation. Miguel Angel Luque Fernandez Faculty of Epidemiology and Population Health Department of Non-communicable Disease. Cross-validation Miguel Angel Luque Fernandez Faculty of Epidemiology and Population Health Department of Non-communicable Disease. August 25, 2015 Cancer Survival Group (LSH&TM) Cross-validation August

More information

Outline. Hierarchical Hidden Markov Models for HIV-Transmission Behavior Outcomes. Motivation. Why Hidden Markov Model? Why Hidden Markov Model?

Outline. Hierarchical Hidden Markov Models for HIV-Transmission Behavior Outcomes. Motivation. Why Hidden Markov Model? Why Hidden Markov Model? Hierarchical Hidden Markov Models for HIV-Transmission Behavior Outcomes Li-Jung Liang Department of Medicine Statistics Core Email: liangl@ucla.edu Joint work with Rob Weiss & Scott Comulada Outline Motivation

More information

An Introduction to Multiple Imputation for Missing Items in Complex Surveys

An Introduction to Multiple Imputation for Missing Items in Complex Surveys An Introduction to Multiple Imputation for Missing Items in Complex Surveys October 17, 2014 Joe Schafer Center for Statistical Research and Methodology (CSRM) United States Census Bureau Views expressed

More information

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes. Final review Based in part on slides from textbook, slides of Susan Holmes December 5, 2012 1 / 1 Final review Overview Before Midterm General goals of data mining. Datatypes. Preprocessing & dimension

More information

Constructing a mixed model using the AIC

Constructing a mixed model using the AIC Constructing a mixed model using the AIC The Data: The Citalopram study (PI Dr. Zisook) Does Citalopram reduce the depression in schizophrenic patients with subsyndromal depression Two Groups: Citalopram

More information

Self-assessment test of prerequisite knowledge for Biostatistics III in R

Self-assessment test of prerequisite knowledge for Biostatistics III in R Self-assessment test of prerequisite knowledge for Biostatistics III in R Mark Clements, Karolinska Institutet 2017-10-31 Participants in the course Biostatistics III are expected to have prerequisite

More information

MODELING AN SMT LINE TO IMPROVE THROUGHPUT

MODELING AN SMT LINE TO IMPROVE THROUGHPUT As originally published in the SMTA Proceedings MODELING AN SMT LINE TO IMPROVE THROUGHPUT Gregory Vance Rockwell Automation, Inc. Mayfield Heights, OH, USA gjvance@ra.rockwell.com Todd Vick Universal

More information

First of two parts Joseph Hogan Brown University and AMPATH

First of two parts Joseph Hogan Brown University and AMPATH First of two parts Joseph Hogan Brown University and AMPATH Overview What is regression? Does regression have to be linear? Case study: Modeling the relationship between weight and CD4 count Exploratory

More information

Early Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0.

Early Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0. The temporal structure of motor variability is dynamically regulated and predicts individual differences in motor learning ability Howard Wu *, Yohsuke Miyamoto *, Luis Nicolas Gonzales-Castro, Bence P.

More information

You must answer question 1.

You must answer question 1. Research Methods and Statistics Specialty Area Exam October 28, 2015 Part I: Statistics Committee: Richard Williams (Chair), Elizabeth McClintock, Sarah Mustillo You must answer question 1. 1. Suppose

More information

AP Statistics. Semester One Review Part 1 Chapters 1-5

AP Statistics. Semester One Review Part 1 Chapters 1-5 AP Statistics Semester One Review Part 1 Chapters 1-5 AP Statistics Topics Describing Data Producing Data Probability Statistical Inference Describing Data Ch 1: Describing Data: Graphically and Numerically

More information

Use of early longitudinal viral load as a surrogate to the virologic endpoint in Hepatitis C: a semi-parametric mixed effect approach using SAS.

Use of early longitudinal viral load as a surrogate to the virologic endpoint in Hepatitis C: a semi-parametric mixed effect approach using SAS. SP04 Use of early longitudinal viral load as a surrogate to the virologic endpoint in Hepatitis C: a semi-parametric mixed effect approach using SAS. Igwebuike Enweonye, B&D Life Sciences Clinical Research,

More information

Multivariate Multilevel Models

Multivariate Multilevel Models Multivariate Multilevel Models Getachew A. Dagne George W. Howe C. Hendricks Brown Funded by NIMH/NIDA 11/20/2014 (ISSG Seminar) 1 Outline What is Behavioral Social Interaction? Importance of studying

More information

bivariate analysis: The statistical analysis of the relationship between two variables.

bivariate analysis: The statistical analysis of the relationship between two variables. bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for

More information

Bias Adjustment: Local Control Analysis of Radon and Ozone

Bias Adjustment: Local Control Analysis of Radon and Ozone Bias Adjustment: Local Control Analysis of Radon and Ozone S. Stanley Young Robert Obenchain Goran Krstic NCSU 19Oct2016 Abstract Bias Adjustment: Local control analysis of Radon and ozone S. Stanley Young,

More information

Reveal Relationships in Categorical Data

Reveal Relationships in Categorical Data SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction

More information

Application of Cox Regression in Modeling Survival Rate of Drug Abuse

Application of Cox Regression in Modeling Survival Rate of Drug Abuse American Journal of Theoretical and Applied Statistics 2018; 7(1): 1-7 http://www.sciencepublishinggroup.com/j/ajtas doi: 10.11648/j.ajtas.20180701.11 ISSN: 2326-8999 (Print); ISSN: 2326-9006 (Online)

More information

Exploratory Spatial Analyses of Sexual Assaults in Anchorage

Exploratory Spatial Analyses of Sexual Assaults in Anchorage Exploratory Spatial Analyses of Sexual Assaults in Anchorage André B. Rosay and Robert H. Langworthy Justice Center, University of Alaska This research was supported by Grant No. 2000-RH-CX-K039 awarded

More information