Prediction and Inference under Competing Risks in High Dimension - An EHR Demonstration Project for Prostate Cancer
|
|
- Amice Armstrong
- 5 years ago
- Views:
Transcription
1 Prediction and Inference under Competing Risks in High Dimension - An EHR Demonstration Project for Prostate Cancer Ronghui (Lily) Xu Division of Biostatistics and Bioinformatics Department of Family Medicine and Public Health and Department of Mathematics University of California, San Diego rxu@ucsd.edu This is joint work with: Jiayi Hou, Ph.D., Anthony Paravati, M.D., James Murphy, M.D., MPH; and Jue (Marquis) Hou, Jelena Bradic, Ph.D. R. Xu (UCSD) Competing Risks in High Dimension 1 / 30
2 Linked SEER-Medicare database As an illustration project of how information contained in patients electronic medical records (EMR) can be harvested for the purposes of precision medicine, we consider the large data set linking: the Surveillance, Epidemiology and End Results (SEER) Program database of the National Cancer Institute with the federal health insurance program Medicare database for prostate cancer patients of age 65 or older. R. Xu (UCSD) Competing Risks in High Dimension 2 / 30
3 Cancer and non-cancer mortality Clinical guidelines for the management of prostate cancer based on two factors: an estimation of the aggressiveness of a patient s tumor; and estimation of a patient s overall life expectancy. Currently no tool exists to predict non-cancer survival for this patient population. Meanwhile non-cancer and cancer survival are not independent, i.e. competing risks, a comprehensive prediction tool should consider both types of risks simultaneously. R. Xu (UCSD) Competing Risks in High Dimension 3 / 30
4 Data We restrict our analysis to patients diagnosed between 2004 and 2009 in the SEER-Medicare database. After excluding additional patients with missing clinical records, we have a total of 57,011 patients 7 relevant clinical variables (age, PSA, Gleason score, AJCC stage, and AJCC stage T, N, M, respectively) 5 demographical variables (race, marital status, metro, registry and year of diagnosis) 8971 binary insurance claim codes, capturing medical diagnoses, procedures, hospitalization and outpatient activities, etc. 1 if the procedure/activity present, and 0 if absent. The clinical and demographic information are at the time of diagnosis, and insurance claims data during the year prior to diagnosis, so that survival prediction would occur at the time of diagnosis R. Xu (UCSD) Competing Risks in High Dimension 4 / 30
5 Data (cont d) Until December 2013 (end of follow-up) there were a total of 1,247 deaths due to cancer 5,221 deaths unrelated to cancer. As the number of events dictates the effective sample size, we have more predictors than the effective sample size. R. Xu (UCSD) Competing Risks in High Dimension 5 / 30
6 High dimensional problem Different approaches have been proposed to analyze survival data with high dimensional covariates: LASSO under the Cox model (Tibshirani 1997), adaptive LASSO under the Cox model (Zhang and Lu 2007), random forest for right-censoring data (Hothorn et al. 2006), non-concave penalized methods under ultra high dimension (Bradic et al. 2011). But very few high-dimensional methods have been developed in the presence of competing risks: Binder et al. (2009) first proposed a boosting approach for fitting the proportional subdistribution hazards model, Fu et al. (2017) considered penalized approaches under the same model (requires p n). R. Xu (UCSD) Competing Risks in High Dimension 6 / 30
7 The models Denote the observed data X i = min(t i, C i ), δ i = I(T i C i ), ɛ i = 1, 2 is the cause or type, Z i is a vector of covariates, i = 1,..., n. We are interested in the cumulative incidence function (CIF) F j (t Z) = P (T t, ɛ = j Z) for failure type j. The proportional cause-specific hazards (PCSH) model λ j (t Z) = λ 0j (t) exp(β jz), (1) for j = 1, 2, where 1 λ j (t) = lim t 0+ tp (t T < t + t, J = j T t) is the cause-specific hazard function. Then F j (t Z) = t 0 λ j(u Z)S(u Z)du, where S(t Z) = P (T > t Z) is the overall survival function. R. Xu (UCSD) Competing Risks in High Dimension 7 / 30
8 The models (cont d) The proportional subdistribution hazards (PSDH, Fine and Gray 1999) model λ 1 (t Z) = λ 0 (t) exp(β Z). (2) where λ 1 (t Z) = d dt log{1 F 1(t Z)}. Obviously F 1 (t Z) is readily estimated after fitting the PSDH model for cause 1 only. R. Xu (UCSD) Competing Risks in High Dimension 8 / 30
9 High-d algorithm simulation: PCSH model with LASSO (1) Figure: Continuous covariates: n = 500, the three columns correspond to p = 20, 500, and The rows are: 1) CV10, 2) CV+1SE, 3) minimum AIC, 4) minimum BIC, 5) elbow AIC and 6) elbow BIC. The true CIF 1 (2 z 0 ) = Ind Exch AR1 Oracle R. Xu (UCSD) Competing Risks in High Dimension 9 / 30
10 High-d algorithm simulation: PCSH model with LASSO (2) Figure: Balanced binary covariates: n = 500, the three columns correspond to p = 20, 500, and The rows are: 1) CV10, 2) CV+1SE, 3) minimum AIC, 4) minimum BIC, 5) elbow AIC and 6) elbow BIC. The true CIF 1 (2 z 0 ) = See Supplement See Supplement Ind Exch AR1 Oracle See Supplement See Supplement R. Xu (UCSD) Competing Risks in High Dimension 10 / 30
11 High-d algorithm simulation: PCSH model with LASSO (3) Figure: Sparse binary covariates: n = 500, the three columns correspond to p = 20, 500, and The rows are: 1) CV10, 2) CV+1SE, 3) minimum AIC, 4) minimum BIC, 5) elbow AIC and 6) elbow BIC. The true CIF 1 (2 z 0 ) = See Supplement See Supplement Ind Exch AR1 Oracle See Supplement See Supplement See Supplement See Supplement R. Xu (UCSD) Competing Risks in High Dimension 11 / 30
12 High-d algorithm simulation: PSDH with boosting (1) Figure: Continuous covariates: n = 500, the three columns correspond to p = 20, 500, and The rows are: 1) CV10, 2) minimum AIC, 3) minimum BIC, 4) elbow AIC and 5) elbow BIC. The true CIF 1 (2 z 0 ) = Ind Exch AR1 Oracle R. Xu (UCSD) Competing Risks in High Dimension 12 / 30
13 High-d algorithm simulation: PSDH with boosting (2) Figure: Balanced binary covariates: n = 500, the three columns correspond to p = 20, 500, and The rows are: 1) CV10, 2) minimum AIC, 3) minimum BIC, 4) elbow AIC and 5) elbow BIC. The true CIF 1 (2 z 0 ) = Ind Exch AR1 Oracle R. Xu (UCSD) Competing Risks in High Dimension 13 / 30
14 High-d algorithm simulation: PSDH with boosting (3) Figure: Sparse binary covariates: n = 500, the three columns correspond to p = 20, 500, and The rows are: 1) CV10, 2) minimum AIC, 3) minimum BIC, 4) elbow AIC and 5) elbow BIC. The true CIF 1 (2 z 0 ) = Ind Exch AR1 Oracle R. Xu (UCSD) Competing Risks in High Dimension 14 / 30
15 Apply to SEER-Medicare data: PCSH model with LASSO Random split into training (n = 28, 505) and test (n = 28, 506) datasets. On training data: p 1 = 2188 and p 2 = 1079 claim codes for non-cancer and cancer mortality after univariate screening; CV10 was used to choose the penalty parameter; final model contained 143 predictors for non-cancer mortality, and 9 predictors for cancer mortality. On test data: For each mortality type j = 1, 2, we divided into 4 risk strata: low (L), median low (ML), median high (MH) and high (H) by quartiles; this gives 16 strata for the combined two types of mortalities; after plotting the predicted CIF s for the 16 strata, 5 distinct risk groups emerged for both cancer and non-cancer mortalities. R. Xu (UCSD) Competing Risks in High Dimension 15 / 30
16 SEER-Medicare results: PCSH model with LASSO R. Xu (UCSD) Competing Risks in High Dimension 16 / 30
17 SEER-Medicare results: PCSH model with LASSO Risk Groups n # events Cause-Specific Non-cancer mortality: Non-cancer: Cancer: L L L, ML, MH, H ML ML L, ML, MH, H M MH L, ML, MH, H MH H L, ML, MH H H H Cancer mortality: Non-cancer: Cancer: L L, ML, MH, H L ML L, ML, MH, H ML M L, ML, MH, H MH MH L, ML, MH H H H H R. Xu (UCSD) Competing Risks in High Dimension 17 / 30
18 SEER-Medicare results: PCSH model with LASSO R. Xu (UCSD) Competing Risks in High Dimension 18 / 30
19 Apply to SEER-Medicare data: PSDH model with boosting Same training (n = 28, 505) and test (n = 28, 506) datasets. On training data: top 2000 claim codes (ranked by p-value) for each type of mortality (p 1 = 4634 and p 2 = 6088 after univariate screening); AIC was used, because it was a close second to CV10 and less computationally intensive; final model contained 53 predictors for non-cancer mortality, and 13 predictors for cancer mortality. On test data: For each mortality type j = 1, 2, we divided into 5 risk strata: low (L), median low (ML), median (M), median high (MH) and high (H) by quintiles. R. Xu (UCSD) Competing Risks in High Dimension 19 / 30
20 SEER-Medicare results: PSDH model with boosting R. Xu (UCSD) Competing Risks in High Dimension 20 / 30
21 Prediction, variable selection, and inference The field of high dimensional statistical theory has been advancing (Bühlmann, Peter and van de Geer, 2011). Consistent prediction can be relatively easily achieved (eg. LASSO can be consistent for estimating the underlying regression function); Consistent variable selection needs a beta-min condition (i.e. all non-zero coefficients are sufficiently large), in addition to the restrictive irrepresentable condition (cannot have too strongly correlated design); Recent developments by Zhang and Zhang (2014), van de Geer et al. (2014) allows confidence intervals, in the presence of many small non-zero coefficients. R. Xu (UCSD) Competing Risks in High Dimension 21 / 30
22 One-step estimator Consider the high-dimensional PSDH model where Z(t) may be time-dependent. λ 1 (t Z(t)) = λ 0 (t) exp{β Z(t)}. (3) The modified log partial likelihood that gives rise to the estimating equation for β (in low dim.) is n t n m(β) = n 1 β Z i (t) log ω j (t)y j (t)e β Z j (t) dn i o (t), i=1 0 where Ni o (t) = I(X i t, δ iɛ i = 1), ω i(t)y i(t) = r i(t)y i(t) Ĝ(t)/Ĝ(t Xi), G(t) = P (C > t) and r i(t)y i(t) = I(t X i) + I(t > X i)i(δ iɛ i > 1). j=1 (4) R. Xu (UCSD) Competing Risks in High Dimension 22 / 30
23 Initial LASSO estimator Following Zhang and Zhang (2014), van de Geer et al. (2014), let the initial estimator be } β(λ) = argmin { m(β) + λ β 1. (5) β Oracle inequality = for the regularization parameter λ choosen to be of the order log(n) log(p)/n, we have β β o 1 = O p (s o log(n) ) log(p)/n. R. Xu (UCSD) Competing Risks in High Dimension 23 / 30
24 One-step estimator (cont d) The one-step bias-corrected estimator is b := β + Θṁ( β), (6) where Θ is such that m(β o ) Θ I p, and is obtained by a version of nodewise LASSO based on the score vectors instead of the design matrix (Zhang and Zhang, 2014). R. Xu (UCSD) Competing Risks in High Dimension 24 / 30
25 Confidence intervals We can show that b β o = Θṁ(β o ) + o p(1), and that for any c R p such that c 1 = 1, c ṁ(β o ) d N(0, c Vc). An 1 α confidence interval for c β o is (c b Z1 α/2 c Θ V Θ c/n, c b ) + Z1 α/2 c Θ V Θ c/n. (7) R. Xu (UCSD) Competing Risks in High Dimension 25 / 30
26 Simulation results For cause 1, s o = 2. True Mean Est SD SE SE corrected Coverage Level/Power n=200, p=1000 β 1, β 1, β 1, n=500, p=1000 β 1, β 1, β 1, R. Xu (UCSD) Competing Risks in High Dimension 26 / 30
27 References 1 Hou J, Paravati A, Xu R, Murphy J. High-Dimensional Variable Selection and Prediction under Competing Risks with Application to SEER-Medicare Linked Data. 2 Hou J, Bradic J, Xu R, Fine-Gray Competing Risks Model with High Dimensional Covariates: Estimation and Inference. R. Xu (UCSD) Competing Risks in High Dimension 27 / 30
28 High dimensional algorithms: LASSO Tibshirani s (1997) LASSO can be applied to the PCSH model, as standard Cox regression software is used in low dimensions. Selection of penalty parameter: CV10: minimum 10-fold cross-validated (CV) negative predictive log partial likelihood (referred to as error in the following); CV+1SE: minimum 10-fold CV error plus one standard error of the CV estimated errors; min AIC / BIC; elbow AIC / BIC (Tibshirani et al. 2001). Under the Cox model AIC = 2 log(l) + 2s, where L is the partial likelihood and s = S( β) is the size of the active set (Verweij and van Houwelingen, 1993; Xu et al. 2009); BIC = 2 log(l) + 2s log(k), where k is the number of observed events (Volinsky and Raftery, 2000) from the cause of interest. R. Xu (UCSD) Competing Risks in High Dimension 28 / 30
29 LASSO (cont d) Fu et al. (R package crrp ) contains LASSO for the PSDH model, unfortunately was not able to apply to the linked SEER-Medicare data as it ran out of memory. (This becomes a challenge for the initial estimator in our 2nd paper.) R. Xu (UCSD) Competing Risks in High Dimension 29 / 30
30 High dimensional algorithms: boosting Binder et al. (2009) applied a likelihood based boosting approach to the weighted partial likelihood under the PSDH model. Selection of optimal number of boosting steps: CV10; min AIC / BIC; elbow AIC / BIC. Definitions of the criteria are similar to those under the PCSH model above, with the likelihood returned by the crr() function in R package cmprsk. R. Xu (UCSD) Competing Risks in High Dimension 30 / 30
Machine Learning to Inform Breast Cancer Post-Recovery Surveillance
Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Final Project Report CS 229 Autumn 2017 Category: Life Sciences Maxwell Allman (mallman) Lin Fan (linfan) Jamie Kang (kangjh) 1 Introduction
More informationModelling Spatially Correlated Survival Data for Individuals with Multiple Cancers
Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Dipak K. Dey, Ulysses Diva and Sudipto Banerjee Department of Statistics University of Connecticut, Storrs. March 16,
More informationStatistical Methods for the Evaluation of Treatment Effectiveness in the Presence of Competing Risks
Statistical Methods for the Evaluation of Treatment Effectiveness in the Presence of Competing Risks Ravi Varadhan Assistant Professor (On Behalf of the Johns Hopkins DEcIDE Center) Center on Aging and
More informationAnalysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach
University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School November 2015 Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach Wei Chen
More informationPart [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals
Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals Patrick J. Heagerty Department of Biostatistics University of Washington 174 Biomarkers Session Outline
More informationComputer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California
Computer Age Statistical Inference Algorithms, Evidence, and Data Science BRADLEY EFRON Stanford University, California TREVOR HASTIE Stanford University, California ggf CAMBRIDGE UNIVERSITY PRESS Preface
More informationSelection and Combination of Markers for Prediction
Selection and Combination of Markers for Prediction NACC Data and Methods Meeting September, 2010 Baojiang Chen, PhD Sarah Monsell, MS Xiao-Hua Andrew Zhou, PhD Overview 1. Research motivation 2. Describe
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 Exam policy: This exam allows one one-page, two-sided cheat sheet; No other materials. Time: 80 minutes. Be sure to write your name and
More informationSarwar Islam Mozumder 1, Mark J Rutherford 1 & Paul C Lambert 1, Stata London Users Group Meeting
2016 Stata London Users Group Meeting stpm2cr: A Stata module for direct likelihood inference on the cause-specific cumulative incidence function within the flexible parametric modelling framework Sarwar
More informationDynamic prediction using joint models for recurrent and terminal events: Evolution after a breast cancer
Dynamic prediction using joint models for recurrent and terminal events: Evolution after a breast cancer A. Mauguen, B. Rachet, S. Mathoulin-Pélissier, S. Siesling, G. MacGrogan, A. Laurent, V. Rondeau
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2007 Paper 221 Biomarker Discovery Using Targeted Maximum Likelihood Estimation: Application to the
More informationApplication of EM Algorithm to Mixture Cure Model for Grouped Relative Survival Data
Journal of Data Science 5(2007), 41-51 Application of EM Algorithm to Mixture Cure Model for Grouped Relative Survival Data Binbing Yu 1 and Ram C. Tiwari 2 1 Information Management Services, Inc. and
More informationChapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)
Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it
More informationUsing dynamic prediction to inform the optimal intervention time for an abdominal aortic aneurysm screening programme
Using dynamic prediction to inform the optimal intervention time for an abdominal aortic aneurysm screening programme Michael Sweeting Cardiovascular Epidemiology Unit, University of Cambridge Friday 15th
More informationBIOINFORMATICS ORIGINAL PAPER
BIOINFORMATICS ORIGINAL PAPER Vol. 21 no. 13 2005, pages 3001 3008 doi:10.1093/bioinformatics/bti422 Gene expression Penalized Cox regression analysis in the high-dimensional and low-sample size settings,
More informationWhat is Statistics? Ronghui (Lily) Xu (UCSD) An Introduction to Statistics 1 / 23
What is Statistics? Some definitions of Statistics: The science of Statistics is essentially a branch of Applied Mathematics, and maybe regarded as mathematics applied to observational data. Fisher The
More informationLandmarking, immortal time bias and. Dynamic prediction
Landmarking and immortal time bias Landmarking and dynamic prediction Discussion Landmarking, immortal time bias and dynamic prediction Department of Medical Statistics and Bioinformatics Leiden University
More informationScore Tests of Normality in Bivariate Probit Models
Score Tests of Normality in Bivariate Probit Models Anthony Murphy Nuffield College, Oxford OX1 1NF, UK Abstract: A relatively simple and convenient score test of normality in the bivariate probit model
More informationStatistical Models for Censored Point Processes with Cure Rates
Statistical Models for Censored Point Processes with Cure Rates Jennifer Rogers MSD Seminar 2 November 2011 Outline Background and MESS Epilepsy MESS Exploratory Analysis Summary Statistics and Kaplan-Meier
More informationModel Selection Methods for Cancer Staging and Other Disease Stratification Problems. Yunzhi Lin
Model Selection Methods for Cancer Staging and Other Disease Stratification Problems by Yunzhi Lin A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
More informationCatherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1
Welch et al. BMC Medical Research Methodology (2018) 18:89 https://doi.org/10.1186/s12874-018-0548-0 RESEARCH ARTICLE Open Access Does pattern mixture modelling reduce bias due to informative attrition
More informationVARIABLE SELECTION WHEN CONFRONTED WITH MISSING DATA
VARIABLE SELECTION WHEN CONFRONTED WITH MISSING DATA by Melissa L. Ziegler B.S. Mathematics, Elizabethtown College, 2000 M.A. Statistics, University of Pittsburgh, 2002 Submitted to the Graduate Faculty
More informationAssessment of a disease screener by hierarchical all-subset selection using area under the receiver operating characteristic curves
Research Article Received 8 June 2010, Accepted 15 February 2011 Published online 15 April 2011 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.4246 Assessment of a disease screener by
More informationA Bayesian Perspective on Unmeasured Confounding in Large Administrative Databases
A Bayesian Perspective on Unmeasured Confounding in Large Administrative Databases Lawrence McCandless lmccandl@sfu.ca Faculty of Health Sciences, Simon Fraser University, Vancouver Canada Summer 2014
More informationModelling prognostic capabilities of tumor size: application to colorectal cancer
Session 3: Epidemiology and public health Modelling prognostic capabilities of tumor size: application to colorectal cancer Virginie Rondeau, INSERM Modelling prognostic capabilities of tumor size : application
More informationThe Statistical Analysis of Failure Time Data
The Statistical Analysis of Failure Time Data Second Edition JOHN D. KALBFLEISCH ROSS L. PRENTICE iwiley- 'INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface xi 1. Introduction 1 1.1
More informationMultivariate Multilevel Models
Multivariate Multilevel Models Getachew A. Dagne George W. Howe C. Hendricks Brown Funded by NIMH/NIDA 11/20/2014 (ISSG Seminar) 1 Outline What is Behavioral Social Interaction? Importance of studying
More informationStatistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.
Final review Based in part on slides from textbook, slides of Susan Holmes December 5, 2012 1 / 1 Final review Overview Before Midterm General goals of data mining. Datatypes. Preprocessing & dimension
More informationAnalyzing diastolic and systolic blood pressure individually or jointly?
Analyzing diastolic and systolic blood pressure individually or jointly? Chenglin Ye a, Gary Foster a, Lisa Dolovich b, Lehana Thabane a,c a. Department of Clinical Epidemiology and Biostatistics, McMaster
More informationRidge regression for risk prediction
Ridge regression for risk prediction with applications to genetic data Erika Cule and Maria De Iorio Imperial College London Department of Epidemiology and Biostatistics School of Public Health May 2012
More informationTreatment effect estimates adjusted for small-study effects via a limit meta-analysis
Treatment effect estimates adjusted for small-study effects via a limit meta-analysis Gerta Rücker 1, James Carpenter 12, Guido Schwarzer 1 1 Institute of Medical Biometry and Medical Informatics, University
More informationDeterminants and Status of Vaccination in Bangladesh
Dhaka Univ. J. Sci. 60(): 47-5, 0 (January) Determinants and Status of Vaccination in Bangladesh Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka-000, Bangladesh Received
More informationLecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method
Biost 590: Statistical Consulting Statistical Classification of Scientific Studies; Approach to Consulting Lecture Outline Statistical Classification of Scientific Studies Statistical Tasks Approach to
More informationEstimating and comparing cancer progression risks under varying surveillance protocols: moving beyond the Tower of Babel
Estimating and comparing cancer progression risks under varying surveillance protocols: moving beyond the Tower of Babel Jane Lange March 22, 2017 1 Acknowledgements Many thanks to the multiple project
More informationPart [1.0] Introduction to Development and Evaluation of Dynamic Predictions
Part [1.0] Introduction to Development and Evaluation of Dynamic Predictions A Bansal & PJ Heagerty Department of Biostatistics University of Washington 1 Biomarkers The Instructor(s) Patrick Heagerty
More informationBIOSTATISTICAL METHODS
BIOSTATISTICAL METHODS FOR TRANSLATIONAL & CLINICAL RESEARCH PROPENSITY SCORE Confounding Definition: A situation in which the effect or association between an exposure (a predictor or risk factor) and
More informationMultivariate Regression with Small Samples: A Comparison of Estimation Methods W. Holmes Finch Maria E. Hernández Finch Ball State University
Multivariate Regression with Small Samples: A Comparison of Estimation Methods W. Holmes Finch Maria E. Hernández Finch Ball State University High dimensional multivariate data, where the number of variables
More informationEPI 200C Final, June 4 th, 2009 This exam includes 24 questions.
Greenland/Arah, Epi 200C Sp 2000 1 of 6 EPI 200C Final, June 4 th, 2009 This exam includes 24 questions. INSTRUCTIONS: Write all answers on the answer sheets supplied; PRINT YOUR NAME and STUDENT ID NUMBER
More informationPropensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER)
Propensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER) Yan Wu Advisor: Robert Pruzek Epidemiology and Biostatistics
More informationCSE 255 Assignment 9
CSE 255 Assignment 9 Alexander Asplund, William Fedus September 25, 2015 1 Introduction In this paper we train a logistic regression function for two forms of link prediction among a set of 244 suspected
More informationSpatiotemporal models for disease incidence data: a case study
Spatiotemporal models for disease incidence data: a case study Erik A. Sauleau 1,2, Monica Musio 3, Nicole Augustin 4 1 Medicine Faculty, University of Strasbourg, France 2 Haut-Rhin Cancer Registry 3
More informationMODEL SELECTION STRATEGIES. Tony Panzarella
MODEL SELECTION STRATEGIES Tony Panzarella Lab Course March 20, 2014 2 Preamble Although focus will be on time-to-event data the same principles apply to other outcome data Lab Course March 20, 2014 3
More informationA Predictive Chronological Model of Multiple Clinical Observations T R A V I S G O O D W I N A N D S A N D A M. H A R A B A G I U
A Predictive Chronological Model of Multiple Clinical Observations T R A V I S G O O D W I N A N D S A N D A M. H A R A B A G I U T H E U N I V E R S I T Y O F T E X A S A T D A L L A S H U M A N L A N
More informationBayesian Model Averaging for Propensity Score Analysis
Multivariate Behavioral Research, 49:505 517, 2014 Copyright C Taylor & Francis Group, LLC ISSN: 0027-3171 print / 1532-7906 online DOI: 10.1080/00273171.2014.928492 Bayesian Model Averaging for Propensity
More informationHeritability. The Extended Liability-Threshold Model. Polygenic model for continuous trait. Polygenic model for continuous trait.
The Extended Liability-Threshold Model Klaus K. Holst Thomas Scheike, Jacob Hjelmborg 014-05-1 Heritability Twin studies Include both monozygotic (MZ) and dizygotic (DZ) twin
More informationVisualizing Data for Hypothesis Generation Using Large-Volume Health care Claims Data
Visualizing Data for Hypothesis Generation Using Large-Volume Health care Claims Data Eberechukwu Onukwugha PhD, School of Pharmacy, UMB Margret Bjarnadottir PhD, Smith School of Business, UMCP Shujia
More informationCenter for Bioinformatics & Molecular Biostatistics
Center for Bioinformatics & Molecular Biostatistics (University of California, San Francisco) Year 2004 Paper L1Cox Penalized Cox Regression Analysis in the High-Dimensional and Low-sample Size Settings,
More informationOutlier Analysis. Lijun Zhang
Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based
More informationSubLasso:a feature selection and classification R package with a. fixed feature subset
SubLasso:a feature selection and classification R package with a fixed feature subset Youxi Luo,3,*, Qinghan Meng,2,*, Ruiquan Ge,2, Guoqin Mai, Jikui Liu, Fengfeng Zhou,#. Shenzhen Institutes of Advanced
More informationBayesian Dose Escalation Study Design with Consideration of Late Onset Toxicity. Li Liu, Glen Laird, Lei Gao Biostatistics Sanofi
Bayesian Dose Escalation Study Design with Consideration of Late Onset Toxicity Li Liu, Glen Laird, Lei Gao Biostatistics Sanofi 1 Outline Introduction Methods EWOC EWOC-PH Modifications to account for
More informationMEA DISCUSSION PAPERS
Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationTRIPODS Workshop: Models & Machine Learning for Causal I. & Decision Making
TRIPODS Workshop: Models & Machine Learning for Causal Inference & Decision Making in Medical Decision Making : and Predictive Accuracy text Stavroula Chrysanthopoulou, PhD Department of Biostatistics
More informationApplication of Cox Regression in Modeling Survival Rate of Drug Abuse
American Journal of Theoretical and Applied Statistics 2018; 7(1): 1-7 http://www.sciencepublishinggroup.com/j/ajtas doi: 10.11648/j.ajtas.20180701.11 ISSN: 2326-8999 (Print); ISSN: 2326-9006 (Online)
More informationAccommodating informative dropout and death: a joint modelling approach for longitudinal and semicompeting risks data
Appl. Statist. (2018) 67, Part 1, pp. 145 163 Accommodating informative dropout and death: a joint modelling approach for longitudinal and semicompeting risks data Qiuju Li and Li Su Medical Research Council
More informationBringing machine learning to the point of care to inform suicide prevention
Bringing machine learning to the point of care to inform suicide prevention Gregory Simon and Susan Shortreed Kaiser Permanente Washington Health Research Institute Don Mordecai The Permanente Medical
More informationFundamental Clinical Trial Design
Design, Monitoring, and Analysis of Clinical Trials Session 1 Overview and Introduction Overview Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics, University of Washington February 17-19, 2003
More informationWhat is Regularization? Example by Sean Owen
What is Regularization? Example by Sean Owen What is Regularization? Name3 Species Size Threat Bo snake small friendly Miley dog small friendly Fifi cat small enemy Muffy cat small friendly Rufus dog large
More informationTITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)
TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) AUTHORS: Tejas Prahlad INTRODUCTION Acute Respiratory Distress Syndrome (ARDS) is a condition
More informationMediation Analysis With Principal Stratification
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 3-30-009 Mediation Analysis With Principal Stratification Robert Gallop Dylan S. Small University of Pennsylvania
More informationarxiv: v1 [cs.lg] 29 Nov 2017
A Novel Data-Driven Framework for Risk Characterization and Prediction from Electronic Medical Records: A Case Study of Renal Failure arxiv:1711.11022v1 [cs.lg] 29 Nov 2017 Prithwish Chakraborty 1, Vishrawas
More informationLogistic regression. Department of Statistics, University of South Carolina. Stat 205: Elementary Statistics for the Biological and Life Sciences
Logistic regression Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 1 Logistic regression: pp. 538 542 Consider Y to be binary
More informationLecture Outline Biost 517 Applied Biostatistics I
Lecture Outline Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 2: Statistical Classification of Scientific Questions Types of
More informationCancer survivorship and labor market attachments: Evidence from MEPS data
Cancer survivorship and labor market attachments: Evidence from 2008-2014 MEPS data University of Memphis, Department of Economics January 7, 2018 Presentation outline Motivation and previous literature
More informationSupplement for: CD4 cell dynamics in untreated HIV-1 infection: overall rates, and effects of age, viral load, gender and calendar time.
Supplement for: CD4 cell dynamics in untreated HIV-1 infection: overall rates, and effects of age, viral load, gender and calendar time. Anne Cori* 1, Michael Pickles* 1, Ard van Sighem 2, Luuk Gras 2,
More information3. Model evaluation & selection
Foundations of Machine Learning CentraleSupélec Fall 2016 3. Model evaluation & selection Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr
More informationCase Studies of Signed Networks
Case Studies of Signed Networks Christopher Wang December 10, 2014 Abstract Many studies on signed social networks focus on predicting the different relationships between users. However this prediction
More informationPackage ORIClust. February 19, 2015
Type Package Package ORIClust February 19, 2015 Title Order-restricted Information Criterion-based Clustering Algorithm Version 1.0-1 Date 2009-09-10 Author Maintainer Tianqing Liu
More informationAdjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data
Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data Karl Bang Christensen National Institute of Occupational Health, Denmark Helene Feveille National
More informationUnit 1 Exploring and Understanding Data
Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile
More informationArticle from. Forecasting and Futurism. Month Year July 2015 Issue Number 11
Article from Forecasting and Futurism Month Year July 2015 Issue Number 11 Calibrating Risk Score Model with Partial Credibility By Shea Parkes and Brad Armstrong Risk adjustment models are commonly used
More informationModelling heterogeneity variances in multiple treatment comparison meta-analysis Are informative priors the better solution?
Thorlund et al. BMC Medical Research Methodology 2013, 13:2 RESEARCH ARTICLE Open Access Modelling heterogeneity variances in multiple treatment comparison meta-analysis Are informative priors the better
More informationStructured Association Advanced Topics in Computa8onal Genomics
Structured Association 02-715 Advanced Topics in Computa8onal Genomics Structured Association Lasso ACGTTTTACTGTACAATT Gflasso (Kim & Xing, 2009) ACGTTTTACTGTACAATT Greater power Fewer false posi2ves Phenome
More informationBiostatistics II
Biostatistics II 514-5509 Course Description: Modern multivariable statistical analysis based on the concept of generalized linear models. Includes linear, logistic, and Poisson regression, survival analysis,
More informationIdentification of Tissue Independent Cancer Driver Genes
Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important
More informationPackage speff2trial. February 20, 2015
Version 1.0.4 Date 2012-10-30 Package speff2trial February 20, 2015 Title Semiparametric efficient estimation for a two-sample treatment effect Author Michal Juraska , with contributions
More informationMathematisches Forschungsinstitut Oberwolfach. Statistische und Probabilistische Methoden der Modellwahl
Mathematisches Forschungsinstitut Oberwolfach Report No. 47/2005 Statistische und Probabilistische Methoden der Modellwahl Organised by James O. Berger (Durham) Holger Dette (Bochum) Gabor Lugosi (Barcelona)
More informationTutorial #7A: Latent Class Growth Model (# seizures)
Tutorial #7A: Latent Class Growth Model (# seizures) 2.50 Class 3: Unstable (N = 6) Cluster modal 1 2 3 Mean proportional change from baseline 2.00 1.50 1.00 Class 1: No change (N = 36) 0.50 Class 2: Improved
More informationPubH 7405: REGRESSION ANALYSIS
PubH 7405: REGRESSION ANALYSIS APLLICATIONS B: MORE BIOMEDICAL APPLICATIONS #. SMOKING & CANCERS There is strong association between lung cancer and smoking; this has been thoroughly investigated. A study
More informationRISK PREDICTION MODEL: PENALIZED REGRESSIONS
RISK PREDICTION MODEL: PENALIZED REGRESSIONS Inspired from: How to develop a more accurate risk prediction model when there are few events Menelaos Pavlou, Gareth Ambler, Shaun R Seaman, Oliver Guttmann,
More informationSupplementary Appendix
Supplementary Appendix This appendix has been provided by the authors to give readers additional information about their work. Supplement to: Rawshani Aidin, Rawshani Araz, Franzén S, et al. Risk factors,
More informationWeight Adjustment Methods using Multilevel Propensity Models and Random Forests
Weight Adjustment Methods using Multilevel Propensity Models and Random Forests Ronaldo Iachan 1, Maria Prosviryakova 1, Kurt Peters 2, Lauren Restivo 1 1 ICF International, 530 Gaither Road Suite 500,
More informationSurvival Prediction Models for Estimating the Benefit of Post-Operative Radiation Therapy for Gallbladder Cancer and Lung Cancer
Survival Prediction Models for Estimating the Benefit of Post-Operative Radiation Therapy for Gallbladder Cancer and Lung Cancer Jayashree Kalpathy-Cramer PhD 1, William Hersh, MD 1, Jong Song Kim, PhD
More informationEthnic Disparities in the Treatment of Stage I Non-small Cell Lung Cancer. Juan P. Wisnivesky, MD, MPH, Thomas McGinn, MD, MPH, Claudia Henschke, PhD,
Ethnic Disparities in the Treatment of Stage I Non-small Cell Lung Cancer Juan P. Wisnivesky, MD, MPH, Thomas McGinn, MD, MPH, Claudia Henschke, PhD, MD, Paul Hebert, PhD, Michael C. Iannuzzi, MD, and
More informationPersonalized Colorectal Cancer Survivability Prediction with Machine Learning Methods*
Personalized Colorectal Cancer Survivability Prediction with Machine Learning Methods* 1 st Samuel Li Princeton University Princeton, NJ seli@princeton.edu 2 nd Talayeh Razzaghi New Mexico State University
More informationDetecting Anomalous Patterns of Care Using Health Insurance Claims
Partially funded by National Science Foundation grants IIS-0916345, IIS-0911032, and IIS-0953330, and funding from Disruptive Health Technology Institute. We are also grateful to Highmark Health for providing
More informationCross-validation. Miguel Angel Luque Fernandez Faculty of Epidemiology and Population Health Department of Non-communicable Disease.
Cross-validation Miguel Angel Luque Fernandez Faculty of Epidemiology and Population Health Department of Non-communicable Disease. August 25, 2015 Cancer Survival Group (LSH&TM) Cross-validation August
More informationNon-homogenous Poisson Process for Evaluating Stage I & II Ductal Breast Cancer Treatment
Journal of Modern Applied Statistical Methods Volume 10 Issue 2 Article 23 11-1-2011 Non-homogenous Poisson Process for Evaluating Stage I & II Ductal Breast Cancer Treatment Chris P Tsokos University
More informationLecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics
Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 3: Overview of Descriptive Statistics October 3, 2005 Lecture Outline Purpose
More informationLogistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India
20th International Congress on Modelling and Simulation, Adelaide, Australia, 1 6 December 2013 www.mssanz.org.au/modsim2013 Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision
More informationPredictive Analytics for Retention in HIV Care
Predictive Analytics for Retention in HIV Care Jessica Ridgway, MD, MS; Arthi Ramachandran, PhD; Hannes Koenig, MS; Avishek Kumar, PhD; Joseph Walsh, PhD; Christina Sung; Rayid Ghani, MS; John A. Schneider,
More informationCounty-Level Small Area Estimation using the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS)
County-Level Small Area Estimation using the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS) Van L. Parsons, Nathaniel Schenker Office of Research and
More informationMultiple Mediation Analysis For General Models -with Application to Explore Racial Disparity in Breast Cancer Survival Analysis
Multiple For General Models -with Application to Explore Racial Disparity in Breast Cancer Survival Qingzhao Yu Joint Work with Ms. Ying Fan and Dr. Xiaocheng Wu Louisiana Tumor Registry, LSUHSC June 5th,
More informationBivariate Density Estimation With an Application to Survival Analysis
Bivariate Density Estimation With an Application to Survival Analysis Charles KOOPERBERG A procedure for estimating a bivariate density based on data that may be censored is described After the data are
More informationCan Family Strengths Reduce Risk of Substance Abuse among Youth with SED?
Can Family Strengths Reduce Risk of Substance Abuse among Youth with SED? Michael D. Pullmann Portland State University Ana María Brannan Vanderbilt University Robert Stephens ORC Macro, Inc. Relationship
More informationPackage xmeta. August 16, 2017
Type Package Title A Toolbox for Multivariate Meta-Analysis Version 1.1-4 Date 2017-5-12 Package xmeta August 16, 2017 Imports aod, glmmml, numderiv, metafor, mvmeta, stats A toolbox for meta-analysis.
More informationIndividualized Treatment Effects Using a Non-parametric Bayesian Approach
Individualized Treatment Effects Using a Non-parametric Bayesian Approach Ravi Varadhan Nicholas C. Henderson Division of Biostatistics & Bioinformatics Department of Oncology Johns Hopkins University
More informationarxiv: v2 [stat.ap] 7 Dec 2016
A Bayesian Approach to Predicting Disengaged Youth arxiv:62.52v2 [stat.ap] 7 Dec 26 David Kohn New South Wales 26 david.kohn@sydney.edu.au Nick Glozier Brain Mind Centre New South Wales 26 Sally Cripps
More informationReview: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections
Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections New: Bias-variance decomposition, biasvariance tradeoff, overfitting, regularization, and feature selection Yi
More informationA Comparison of Methods for Determining HIV Viral Set Point
STATISTICS IN MEDICINE Statist. Med. 2006; 00:1 6 [Version: 2002/09/18 v1.11] A Comparison of Methods for Determining HIV Viral Set Point Y. Mei 1, L. Wang 2, S. E. Holte 2 1 School of Industrial and Systems
More information