Regression Tree Methods for Precision Medicine
Wei-Yin Loh
Department of Statistics, University of Wisconsin, Madison
W-Y Loh July 12,

Subgroup identification: breast cancer trial
- Randomized trial of 686 subjects with primary node-positive breast cancer (Schumacher et al., 1994)
- Response is recurrence-free survival time in days (299 uncensored, 387 censored)
- Eight predictor variables with no missing values:
1. horth (hormone therapy, yes/no)
2. age (21-80 years)
3. tsize (tumor size, mm)
4. pnodes (number of positive lymph nodes, 1-51)
5. progrec (progesterone receptor status, fmol)
6. estrec (estrogen receptor status, fmol)
7. menostat (menopausal status, pre/post)
8. tgrade (tumor grade, 1, 2, 3)

Figure: survival probability versus days for horth = no and horth = yes.
Table: coefficient (Coef) and p-value for horth=yes, tsize, age, pnodes, meno=pre, progrec, tgrade and estrec.

cor(estrec, progrec) = 0.39
cor(ln(estrec+1), ln(progrec+1)) = 0.64
Figure: scatterplots of estrec against progrec on the original scale and on the (x + 1) log scale.

GUIDE model (2nd best variable is estrec)
Figure: tree with one split on progrec into Node 2 and Node 3; each node shows survival probability curves for horth = yes and horth = no.

Earlier subgroup identification methods
- Interaction trees (Su et al., 2008, 2009): for each X and split set S (e.g., $\{X < c\}$ or $\{X \in A\}$), fit $E(Y) = \beta_0 + \beta_1 Z + \beta_2 I(S) + \beta_3 Z\,I(S)$ to the data, where Z is the treatment variable. Find the split (X, S) with the most significant interaction ($\beta_3$).
- SIDES (Lipkovich et al., 2011; Lipkovich and Dmitrienko, 2014): find the split (X, S) with the most significant between-node difference in treatment effects.
- QUINT, qualitative interaction tree (Dusseldorp and Van Mechelen, 2014): find the split (X, S) that optimizes a function of effect size and subgroup size.
- VT, virtual twins (Foster et al., 2011):
1. Fit a random forest model (Breiman, 2001) to the observed outcomes $y_{obs}$
2. Use the model to predict the counterfactual outcomes $y_{unobs}$ (other treatment)
3. Fit a CART model to $(y_{obs} - y_{unobs})$ to find subgroups

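The interaction-tree criterion for a single ordinal X can be sketched as follows. This is a minimal illustration, not the published implementation: it assumes a binary treatment Z coded 0/1 and a continuous response, and the function names and the toy data are hypothetical. Because Z and I(S) are both binary, the four-parameter model is saturated, so least squares reproduces the four cell means and $\beta_3$ is a difference of treatment differences.

```python
import math
from statistics import mean


def interaction_t_stat(y, z, in_s):
    """t-statistic of beta_3 in E(Y) = b0 + b1*Z + b2*I(S) + b3*Z*I(S), Z in {0, 1}."""
    cells = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for yi, zi, si in zip(y, z, in_s):
        cells[(int(zi), int(si))].append(yi)
    if min(len(v) for v in cells.values()) < 2:
        return 0.0  # assumption: ignore splits that leave a nearly empty cell
    m = {k: mean(v) for k, v in cells.items()}
    # Saturated model: beta_3 = (treatment effect inside S) - (effect outside S).
    beta3 = (m[1, 1] - m[0, 1]) - (m[1, 0] - m[0, 0])
    sse = sum((yi - m[k]) ** 2 for k, v in cells.items() for yi in v)
    sigma2 = sse / (len(y) - 4)  # residual variance with 4 fitted parameters
    se = math.sqrt(sigma2 * sum(1 / len(v) for v in cells.values()))
    if se == 0:
        return 0.0 if beta3 == 0 else math.copysign(math.inf, beta3)
    return beta3 / se


def best_interaction_split(x, y, z):
    """Greedy search over cut points c for S = {X < c}; returns (c, t)."""
    values = sorted(set(x))
    best_c, best_t = None, 0.0
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2
        t = interaction_t_stat(y, z, [xi < c for xi in x])
        if abs(t) > abs(best_t):
            best_c, best_t = c, t
    return best_c, best_t


# Toy data (made up): the treatment adds 2 to the response only when x > 10.
x = [i // 2 + 1 for i in range(40)]
z = [i % 2 for i in range(40)]
y = [((7 * i) % 5 - 2) * 0.1 + (2.0 * zi if xi > 10 else 0.0)
     for i, (xi, zi) in enumerate(zip(x, z))]
cut, t_stat = best_interaction_split(x, y, z)
```

The search visits every cut point of every candidate variable, which is the greedy step that the later slides identify as the source of variable-selection bias.
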
Limitations
1. Most methods follow the CART approach of greedy search over all (X, S); the result is bias in variable selection
2. Many are applicable only to two treatment levels
3. Most require imputation to deal with missing covariate values, but imputation is possibly the hardest problem in statistics!
4. All are designed for a univariate response only; extension to multivariate or longitudinal (time-dependent) responses is not straightforward

Selection bias of CART and Random forest
- An ordinal X with n distinct values allows $(n - 1)$ splits of the form $\{X \le c\}$
- A categorical X with m levels has $(2^{m-1} - 1)$ splits of the form $\{X \in A\}$
- Bias: variables with large n or m have more chances to split a node

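The two counting formulas are easy to evaluate directly. The sketch below only illustrates the arithmetic; the function names are made up, and the example sizes (12 and 31 levels) are chosen for illustration.

```python
def ordinal_split_count(n_distinct: int) -> int:
    """Number of splits {X <= c} for an ordinal X with n distinct values."""
    return max(n_distinct - 1, 0)


def categorical_split_count(n_levels: int) -> int:
    """Number of splits {X in A} for an unordered categorical X with m levels."""
    return 2 ** (n_levels - 1) - 1 if n_levels >= 1 else 0


# The same 31 values give very different search spaces depending on
# whether the variable is treated as ordinal or as unordered categorical.
as_ordinal = ordinal_split_count(31)
as_categorical = categorical_split_count(31)
twelve_levels = categorical_split_count(12)
```

A greedy search that maximizes a criterion over all candidate splits gives the variable with the larger search space more chances to win by luck alone, and it also costs far more computation.
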
Example of selection bias: predicting heart disease
- 617 observations, no missing values
- Response is diagnosis of heart disease (5 levels)
- 52 predictor variables (29 ordinal, 23 categorical), including
1. ekgmo: month of electrocardiogram (12 values, splits)
2. ekgday: day of electrocardiogram (31 values, splits)

RPART tree (Breiman et al., 1984) (3.6 hrs)

GUIDE tree (Loh, 2002, 2009) (3 sec.)
Figure: GUIDE classification tree; the split variables include lmt, rcaprox, ladprox, rcadist, cxmain, laddist, ramus and om.

Many missing values: a retrospective candidate gene study
- 1504 subjects randomized to treatment or placebo
- Response is survival time in days, with 63% censored
- 23 baseline (17 ordered, 6 categorical) and 282 genetic (categorical) variables
- 95% of subjects have missing values; only 7 variables are complete
Figure: survival probability versus days for treatment and placebo.

GUIDE model with 95% bootstrap intervals for relative risk (treatment vs placebo)
- Root split: a2 ≤ 0.1 or NA
- Node with a2 ≤ 0.1 or NA: (0.73, 1.54)
- Node with a2 > 0.1: (0.45, 0.81)
Figure: survival probability versus days for treatment and placebo in each terminal node.
At each node, a case goes to the left child node if the stated condition is satisfied. Sample sizes are beside terminal nodes.

GUIDE method for subgroup identification (Loh, 2014; Loh et al., 2015)
1. Let Z = 1, 2, ..., be the treatment variable and X a split variable
2. Do for each X at each node:
(a) If X is a categorical variable, add a category to X for missing values and test lack of fit of the additive model:
$EY = \eta + \sum_j \beta_j I(X = j) + \sum_k \gamma_k I(Z = k)$
(b) If X is ordinal, convert it to categorical by discretization at quartiles
Compare with:
$EY = \eta + \sum_j \beta_j I(X_j < c_j) + \sum_k \gamma_k I(Z = k) + \sum_j \sum_k \omega_{jk} I(X_j < c_j, Z = k)$
3. Let X be the variable with the most significant chi-squared
4. Find the split on X that minimizes the sum of squared residuals of the model $EY = \eta + \sum_k \gamma_k I(Z = k)$ fitted to each subnode

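Step 4 can be sketched on its own, once a variable has been chosen. The code below is an illustration under simplifying assumptions, not the GUIDE software: X is ordinal with no missing values, the variable-selection tests of steps 2 and 3 are omitted, and the function names and toy data are hypothetical. Fitting $EY = \eta + \sum_k \gamma_k I(Z = k)$ in a subnode amounts to fitting one mean per treatment level there.

```python
from statistics import mean


def treatment_sse(y, z):
    """SSE of EY = eta + sum_k gamma_k I(Z = k): one fitted mean per treatment level."""
    groups = {}
    for yi, zi in zip(y, z):
        groups.setdefault(zi, []).append(yi)
    total = 0.0
    for values in groups.values():
        centre = mean(values)
        total += sum((yi - centre) ** 2 for yi in values)
    return total


def best_sse_split(x, y, z):
    """Cut point c for {X <= c} minimizing the summed SSE of the two subnodes."""
    values = sorted(set(x))
    best_c, best_sse = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2
        left = [(yi, zi) for xi, yi, zi in zip(x, y, z) if xi <= c]
        right = [(yi, zi) for xi, yi, zi in zip(x, y, z) if xi > c]
        sse = treatment_sse(*zip(*left)) + treatment_sse(*zip(*right))
        if sse < best_sse:
            best_c, best_sse = c, sse
    return best_c, best_sse


# Toy data (made up): the treatment adds 2 to the response only when x > 10.
x = [i // 2 + 1 for i in range(40)]
z = [i % 2 for i in range(40)]
y = [((7 * i) % 5 - 2) * 0.1 + (2.0 * zi if xi > 10 else 0.0)
     for i, (xi, zi) in enumerate(zip(x, z))]
split_point, split_sse = best_sse_split(x, y, z)
```

Because the split variable is fixed before this search starts, the number of candidate cut points no longer influences which variable is selected.
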
Type 2 diabetes longitudinal study with missing values in responses and covariates (Loh et al., 2016)
- 1249 subjects from a multi-center, randomized, double-blind trial (Charbonnel et al., 2004)
- Subjects randomized to a 52-week treatment period of drug G (Gliclazide) or P (Pioglitazone)
- 24 baseline (time 0) variables measured for each subject, as well as HbA1c at 10 time points (-2, 0, 4, 8, 12, 16, 24, 32, 42, and 52 weeks)
- Gliclazide increases the amount of insulin produced by the pancreas
- Pioglitazone improves how the body uses insulin ("insulin sensitizer")

HbA1c means for 747 subjects
Figure: mean A1C versus weeks for Pioglitazone and Gliclazide.

Baseline variables and their missing values
Variable                 #Missing   Variable                   #Missing
HDL                      7          Age                        0
LDL                      77         Weight                     1
Total cholesterol        6          BMI                        0
Triglycerides            6          Waist                      4
Creatinine               0          A1CBase                    0
Fasting insulin          46         HomaS                      62
ALT                      0          HomaIR                     62
AST                      0          HomaB                      62
GGT                      0          Diastolic blood pressure   0
C-peptide                593        Systolic blood pressure    0
Diabetes duration        0          Pulse                      0
Fasting blood glucose    0

GUIDE tree with 95% bootstrap CIs (Loh et al., 2016)
Figure: tree with splits on HOMAB and fasting blood glucose; each terminal node shows HbA1c versus weeks for Gliclazide and Pioglitazone.

Frequently (and not so frequently) asked questions
1. P(Type I error) controlled?
2. Subgroup correct?
3. Split points statistically significant?
4. Estimated subgroup treatment effects unbiased?
5. Estimated subgroup treatment effects statistically significant?
6. Estimated subgroup treatment effects confounded with covariates?

Q1. Does GUIDE control P(Type I error)?
- As n → ∞, the estimated regression function is asymptotically consistent (Chaudhuri et al., 1994, 1995; Chaudhuri and Loh, 2002).
- Hence P(Type I error) → 0

Q2. Is subgroup correctly identified?
Surprise! There is no correct subgroup
Figure: tree with one split on progrec into Node 2 and Node 3; each node shows survival probability curves for horth = yes and horth = no.

Model without progrec
Figure: tree with one split on estrec into two terminal nodes; each node shows survival probability curves for horth = yes and horth = no.

Where is the correct subgroup?
Figure: plots of estrec against progrec (axis labels estrec, progrec and progrec+1).

Q3. Are split points statistically significant?
Consider these two simulation models
Figure: Y versus X for the Jump model and for the Broken line model.

IT and SIDES vs GUIDE
Jump model (true subgroup marked by dotted line)
Figure: response versus biomarker for drug and placebo, with one panel each for Interaction Trees, SIDES and GUIDE.
- Interaction Trees maximizes the significance of the treatment-biomarker interaction
- SIDES minimizes the p-value of the difference between treatment effects
- GUIDE minimizes the sum of squared residuals

Mean of split point for two models
Model         Interaction Trees   SIDES     GUIDE
Jump          (0.003)             (0.005)   (0.002)
Broken line   (0.006)             (0.013)   (0.006)
Based on simulation iterations; simulation SEs in parentheses
- For the Jump model, the true split point is 5.0
- For the Broken line model, the true split point is undefined

Q4. Are estimated treatment effects unbiased?
Ans: Usually not, but some methods are better
Subgroup treatment effect bias
Model         Interaction Trees   SIDES     GUIDE
Jump          (0.001)             (0.001)   (0.001)
Broken line   (0.001)             (0.002)   (0.001)
Based on simulation iterations; simulation SEs in parentheses

Q5. Are treatment effects statistically significant?
1. Subgroups are random because they are results of search algorithms
2. Hence, unlike classical theory, true subgroup effects θ are also random
3. Statistical significance of the estimates θ̂ must account for the search
4. A p-value requires a null hypothesis H0, but what is H0?

Bootstrap calibration (Loh, 1987, 1991)
- Naïve intervals are too short: they do not account for the subgroup search
- Need to increase the nominal confidence level
- Use the bootstrap to estimate the true confidence levels
- Increase the nominal level of the intervals to reach the desired level

Tree from a bootstrap sample
Figure: a tree fitted to the real data and a tree fitted to a bootstrap sample, each splitting on X.

Bootstrap calibrated intervals (Loh et al., 2016)
1. Let F be the true (unknown) distribution of the data
2. Given a sample of data, construct a tree model
3. Given $\gamma$, construct a nominal $100\gamma\%$ interval at each terminal node
4. Let $C(F, \gamma)$ be the true average coverage of the nominal $100\gamma\%$ intervals
5. Let $\gamma_F$ be such that $C(F, \gamma_F) = 0.95$
6. If we know F, construct nominal $100\gamma_F\%$ intervals and we are finished
7. Because F is unknown, let $\hat{F}$ be its bootstrap estimate
8. Use simulation to find the calibrated level $\gamma_{\hat{F}}$ such that $C(\hat{F}, \gamma_{\hat{F}}) = 0.95$
9. Construct the desired intervals at nominal level $\gamma_{\hat{F}}$

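The calibration loop in steps 7 to 9 can be sketched with a much simpler estimator than a tree. In the illustration below a normal-theory interval for a sample mean stands in for the terminal-node intervals, the empirical distribution plays the role of $\hat{F}$, and the sample mean is the "true" parameter under $\hat{F}$. The function name, the grid of nominal levels and the data are all assumptions made for the example.

```python
import random
from statistics import NormalDist, mean, stdev


def calibrated_level(sample, desired=0.95, n_boot=2000, seed=0,
                     grid=(0.95, 0.96, 0.97, 0.98, 0.99, 0.995, 0.999)):
    """Smallest nominal level on the grid whose bootstrap coverage reaches `desired`."""
    rng = random.Random(seed)
    n = len(sample)
    target = mean(sample)  # parameter value under the bootstrap distribution F-hat
    # For each bootstrap sample, store |estimate - target| in standard-error units,
    # so every nominal level can be checked against the same resamples.
    ratios = []
    for _ in range(n_boot):
        boot = rng.choices(sample, k=n)
        se = stdev(boot) / n ** 0.5
        ratios.append(abs(mean(boot) - target) / se if se > 0 else float("inf"))
    level, coverage = grid[0], 0.0
    for level in grid:
        z_crit = NormalDist().inv_cdf(0.5 + level / 2)
        coverage = sum(r <= z_crit for r in ratios) / n_boot
        if coverage >= desired:
            break
    return level, coverage


# Made-up skewed sample, for which a nominal 95% interval tends to undercover.
data = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.3, 2.9, 3.8, 5.2, 7.9, 12.4]
gamma_hat, boot_coverage = calibrated_level(data)
```

In the tree setting the inner step would rebuild the tree on each bootstrap sample and average the coverage over its terminal nodes, which is what makes the calibrated level account for the subgroup search.
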
Bootstrap calibrated alpha for 95% confidence intervals
Figure: bootstrap coverage versus nominal alpha.

95% bootstrap intervals for RR (therapy vs none)
- Node 2: (0.56, 1.42)
- Node 3: (0.30, 0.89)
Figure: tree with one split on progrec into Node 2 and Node 3; each node shows survival probability curves for horth = yes and horth = no. The intervals use the bootstrap calibrated α̂.

Coverage of 95% CIs for treatment effect for breast cancer data
- Naïve t interval
- Bootstrap calibrated interval
Simulation trials with 25 bootstraps each (± 2 simulation SEs in parentheses)

Q6. How to ensure treatment effects are unconfounded within subgroups?
- Many studies include prognostic variables (e.g., age, tumor size)
- Treatment randomization balances the overall effects of these variables
- But balance may be upset within subgroups

95% bootstrap intervals for RR due to horth with linear control of prognostic variables
- Node 2: (0.56, 1.18)
- Node 3: (0.34, 0.82)
Figure: tree with one split on progrec into Node 2 and Node 3. The intervals use the bootstrap calibrated α̂.
Table: coefficients and unadjusted p-values for constant, pnodes and horth=yes in Node 2 and Node 3.

Coverage (± 2 SEs) of 95% CIs for treatment effect with local linear prognostic control for breast cancer data
- Naïve t interval
- Bootstrap calibrated interval
Based on 1200 simulation trials with 25 bootstraps per trial

Conclusions
1. Asking for the correct subgroup is naïve: often there is no unique subgroup
2. GUIDE handles missing values without imputation
3. GUIDE has no selection bias: it does not favor variables that have more splits
4. GUIDE seems to give less biased estimates of subgroup treatment effects
5. GUIDE can control for prognostic effects within subgroups
6. A simple way to assess statistical significance is bootstrap calibrated intervals
Some outstanding problems
1. Given a tree model, how to tell which node defines the subgroup?
2. How to remove the bias in the estimated treatment effect in the subgroup?
3. How to deal with longitudinal (time-dependent) covariates?

Acknowledgments
- Xu He, Michael Man and Lei Shen
- Probal Chaudhuri
- Yu-Shan Shih, Wei Zheng and 18 other PhD students
- US Army Research Office
- US National Science Foundation
- US Bureau of Labor Statistics
- US National Institutes of Health
- AbbVie, Eli Lilly, Gilead Sciences, Pfizer and Takeda

References
Breiman, L. (2001). Random forests. Machine Learning, 45:5-32.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Chapman & Hall/CRC.
Charbonnel, B. H., Matthews, D. R., Schernthaner, G., Hanefeld, M., and Brunetti, P. (2004). A long-term comparison of Pioglitazone and Gliclazide in patients with Type 2 diabetes mellitus: a randomized, double-blind, parallel-group comparison trial. Diabetic Medicine, 22.
Chaudhuri, P., Huang, M.-C., Loh, W.-Y., and Yao, R. (1994). Piecewise-polynomial regression trees. Statistica Sinica, 4.
Chaudhuri, P., Lo, W.-D., Loh, W.-Y., and Yang, C.-C. (1995). Generalized regression trees. Statistica Sinica, 5.
Chaudhuri, P. and Loh, W.-Y. (2002). Nonparametric estimation of conditional quantiles using quantile regression trees. Bernoulli, 8.
Dusseldorp, E. and Van Mechelen, I. (2014). Qualitative interaction trees: a tool to identify qualitative treatment-subgroup interactions. Statistics in Medicine, 33.
Foster, J. C., Taylor, J. M. G., and Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. Statistics in Medicine, 30.
Lipkovich, I. and Dmitrienko, A. (2014). Strategies for identifying predictive biomarkers and subgroups with enhanced treatment effect in clinical trials using SIDES. Journal of Biopharmaceutical Statistics, 24.
Lipkovich, I., Dmitrienko, A., Denne, J., and Enas, G. (2011). Subgroup identification based on differential effect search: a recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine, 30.
Loh, W.-Y. (1987). Calibrating confidence coefficients. Journal of the American Statistical Association, 82.
Loh, W.-Y. (1991). Bootstrap calibration for confidence interval construction and selection. Statistica Sinica, 1.
Loh, W.-Y. (2002). Regression trees with unbiased variable selection and interaction detection. Statistica Sinica, 12.
Loh, W.-Y. (2009). Improving the precision of classification trees. Annals of Applied Statistics, 3.
Loh, W.-Y. (2014). Fifty years of classification and regression trees (with discussion). International Statistical Review, 34.
Loh, W.-Y., Fu, H., Man, M., Champion, V., and Yu, M. (2016). Identification of subgroups with differential treatment effects for longitudinal and multiresponse variables. Statistics in Medicine, 35.
Loh, W.-Y., He, X., and Man, M. (2015). A regression tree approach to identifying subgroups with differential treatment effects. Statistics in Medicine, 34.
Schumacher, M., Bastert, G., Bojar, H., Hübner, K., Olschewski, M., Sauerbrei, W., Schmoor, C., Beyerle, C., Neumann, R. L. A., and Rauschecker, H. F. (1994). Randomized 2 × 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. Journal of Clinical Oncology, 12.
Su, X., Tsai, C. L., Wang, H., Nickerson, D. M., and Li, B. (2009). Subgroup analysis via recursive partitioning. Journal of Machine Learning Research, 10.
Su, X., Zhou, T., Yan, X., Fan, J., and Yang, S. (2008). Interaction trees with censored survival data. International Journal of Biostatistics, 4. Article 2.

Model for 10-week A1C without linear control
Figure: tree with splits on InsulinFastpmolLBase and A1CBase.
Sample sizes below nodes; treatment means for G and P beside nodes. Symbol ≤* stands for ≤ or missing. Red nodes indicate significant treatment effects.

Terminal-node output for Nodes 2, 6 and 7: coefficient, t-stat and p-val of the regressor Thera.P, and mean of A1C10 in each node.

Model for 10-week A1C with linear control
Figure: tree with splits on InsulinFastpmolLBase and A1CBase; the linear covariates in the nodes are A1CBase and FastBGBase.
Sample size, mean A1C10 and linear covariate below node. Red nodes indicate significant treatment effects.

Terminal-node output: coefficient, t-stat and p-val of each regressor, and mean of A1C10 in each node.
- Node 2: regressors A1CBase and Thera.P
- Node 6: regressors A1CBase and Thera.P
- Node 7: regressors FastBGBase and Thera.P

Extension to censored response data via Poisson regression
1. Let $U_i$ and $C_i$ be the survival and censoring times of subject i
2. Let $Y_i = \min(U_i, C_i)$ and let $\delta_i = I(U_i < C_i)$ be the event indicator
3. Let $\Lambda_0(\cdot)$ be the baseline cumulative hazard function of the proportional hazards (PH) model
4. Estimate the coefficients of the PH model by iteratively fitting a Poisson regression model with $\delta_i$ as response and $\log \Lambda_0(y_i)$ as offset:
(a) Use the Nelson-Aalen method to get an initial estimate of $\Lambda_0(\cdot)$
(b) Use GUIDE to construct a Poisson regression tree
(c) Update $\Lambda_0(\cdot)$ with the tree
(d) Repeat steps (b) and (c) four more times

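Step (a) and the offset can be sketched as follows. This is a minimal illustration of the Nelson-Aalen estimator and of the quantity $\log \Lambda_0(y_i)$, not the GUIDE code; the function names are hypothetical, and the small floor used to avoid $\log 0$ before the first event is an assumption of this example rather than something stated on the slide.

```python
import math


def nelson_aalen(times, events):
    """Nelson-Aalen estimate: list of (event time, cumulative hazard) steps."""
    steps, cum = [], 0.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, d in zip(times, events) if u == t and d)
        cum += n_events / at_risk  # hazard increment d_j / n_j at time t
        steps.append((t, cum))
    return steps


def cumulative_hazard(steps, y):
    """Value of the step function Lambda_0 at time y."""
    value = 0.0
    for t, cum in steps:
        if t > y:
            break
        value = cum
    return value


def log_offsets(steps, ys, floor=1e-8):
    """log Lambda_0(y_i), floored (assumption) so that log(0) never occurs."""
    return [math.log(max(cumulative_hazard(steps, y), floor)) for y in ys]


# Four subjects; the second one is censored at time 2.
y_obs = [1, 2, 3, 4]
delta = [1, 0, 1, 1]
steps = nelson_aalen(y_obs, delta)
offsets = log_offsets(steps, y_obs)
```

With these offsets, a Poisson regression of $\delta_i$ on the covariates has the same likelihood kernel as the PH model for a fixed $\Lambda_0$, which is why the slide alternates between fitting the tree and updating $\Lambda_0$.
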
Extension to multiple responses
Do at each node:
1. For each response variable $Y_j$, find the chi-squared of each X variable
2. Choose the variable X with the largest sum of chi-squared values over j
3. Choose the split on X that yields the smallest sum of squared residuals over all response variables
Extension to correlated response variables
- Apply principal components of the Y variables computed locally at each node
More informationMachine Learning to Inform Breast Cancer Post-Recovery Surveillance
Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Final Project Report CS 229 Autumn 2017 Category: Life Sciences Maxwell Allman (mallman) Lin Fan (linfan) Jamie Kang (kangjh) 1 Introduction
More informationGender Differences in Physical Inactivity and Cardiac Events in Men and Women with Type 2 Diabetes
Gender Differences in Physical Inactivity and Cardiac Events in Men and Women with Type 2 Diabetes Margaret M. McCarthy 1 Lawrence Young 2 Silvio Inzucchi 2 Janice Davey 2 Frans J Th Wackers 2 Deborah
More information2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%
Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3 Capstone #2: Review of Chapter 4 Capstone #3: Review of
More informationSince 1980, obesity has more than doubled worldwide, and in 2008 over 1.5 billion adults aged 20 years were overweight.
Impact of metabolic comorbidity on the association between body mass index and health-related quality of life: a Scotland-wide cross-sectional study of 5,608 participants Dr. Zia Ul Haq Doctoral Research
More informationDetecting Multiple Mean Breaks At Unknown Points With Atheoretical Regression Trees
Detecting Multiple Mean Breaks At Unknown Points With Atheoretical Regression Trees 1 Cappelli, C., 2 R.N. Penny and 3 M. Reale 1 University of Naples Federico II, 2 Statistics New Zealand, 3 University
More informationBiostatistics for Med Students. Lecture 1
Biostatistics for Med Students Lecture 1 John J. Chen, Ph.D. Professor & Director of Biostatistics Core UH JABSOM JABSOM MD7 February 14, 2018 Lecture note: http://biostat.jabsom.hawaii.edu/education/training.html