MODEL SELECTION STRATEGIES. Tony Panzarella

Lab Course March 20

Preamble
- Although the focus will be on time-to-event data, the same principles apply to other outcome data.

Developing a multivariable prediction model
- Select clinically relevant predictors for possible inclusion in the model.
- Evaluate the quality of the data and decide how to handle missing data.
- Make data handling decisions.
- Choose a strategy for selecting the important variables in the final model.
- Decide how to model continuous variables.
- Select measures of model performance or predictive accuracy.

AUTOMATIC SELECTION ROUTINES

Forward Selection
- Variables are added to the model one at a time.
- At each stage, the variable added is the one that gives the largest decrease in the value of -2LogL on its inclusion.
- The process ends when each of the remaining variables fails to reduce -2LogL by a pre-specified amount (typically couched as a significance level).
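The forward selection loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine any particular package uses: the names `forward_select` and `chi2_sf_1df` and the toy table of -2LogL values are hypothetical, and each candidate is assumed to add a single parameter, so the drop in -2LogL is referred to a chi-square distribution on 1 degree of freedom.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))

def forward_select(candidates, neg2loglik, alpha=0.05):
    """Add one variable at a time, choosing the largest drop in -2LogL.

    neg2loglik is a hypothetical callable that maps a frozenset of
    variable names to -2LogL; in practice it would refit the model.
    """
    selected = []
    remaining = list(candidates)
    current = neg2loglik(frozenset(selected))
    while remaining:
        # The best candidate is the one giving the smallest -2LogL.
        best = min(remaining,
                   key=lambda v: neg2loglik(frozenset(selected + [v])))
        new = neg2loglik(frozenset(selected + [best]))
        # Stop when even the best candidate is not significant at alpha.
        if chi2_sf_1df(current - new) >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
        current = new
    return selected

# Toy -2LogL values for every subset of three made-up covariates.
TOY = {
    frozenset(): 200.0,
    frozenset({"a"}): 190.0,
    frozenset({"b"}): 196.0,
    frozenset({"c"}): 199.5,
    frozenset({"a", "b"}): 185.0,
    frozenset({"a", "c"}): 189.8,
    frozenset({"b", "c"}): 195.6,
    frozenset({"a", "b", "c"}): 184.9,
}

print(forward_select(["a", "b", "c"], TOY.__getitem__))
```

With these made-up values, "a" enters first (drop of 10.0), then "b" (drop of 5.0), and "c" is left out because its drop of 0.1 is far from significant.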

Backward elimination
- The full model is fit first.
- Variables are excluded one at a time.
- At each stage, the variable omitted is the one whose exclusion increases -2LogL by the smallest amount.
- The process ends when the next candidate for deletion would increase the value of -2LogL by more than a pre-specified amount.
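Backward elimination mirrors the forward loop, starting from the full model. The sketch below is a hedged illustration only: `backward_eliminate`, `chi2_sf_1df`, and the toy -2LogL table are hypothetical, and each variable is assumed to correspond to one parameter so that a 1 degree of freedom chi-square reference applies.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))

def backward_eliminate(variables, neg2loglik, alpha=0.10):
    """Drop one variable at a time, choosing the smallest rise in -2LogL.

    neg2loglik is a hypothetical callable that maps a frozenset of
    variable names to -2LogL; in practice it would refit the model.
    """
    selected = list(variables)
    current = neg2loglik(frozenset(selected))
    while selected:
        def without(v):
            return frozenset(x for x in selected if x != v)
        # The weakest variable is the one whose removal costs the least.
        weakest = min(selected, key=lambda v: neg2loglik(without(v)))
        reduced = neg2loglik(without(weakest))
        # Stop when removing even the weakest variable is significant.
        if chi2_sf_1df(reduced - current) < alpha:
            break
        selected.remove(weakest)
        current = reduced
    return selected

# Toy -2LogL values for every subset of three made-up covariates.
TOY = {
    frozenset(): 200.0,
    frozenset({"a"}): 190.0,
    frozenset({"b"}): 196.0,
    frozenset({"c"}): 199.5,
    frozenset({"a", "b"}): 185.0,
    frozenset({"a", "c"}): 189.8,
    frozenset({"b", "c"}): 195.6,
    frozenset({"a", "b", "c"}): 184.9,
}

print(backward_eliminate(["a", "b", "c"], TOY.__getitem__))
```

Here "c" is removed first (its exclusion raises -2LogL by only 0.1), after which removing "b" would raise -2LogL by 5.0, which is significant at the 0.10 level, so the procedure stops.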

Stepwise
- Operates similarly to forward selection.
- However, a variable that has been included can be considered for exclusion at a later stage.
- Thus, after adding a variable, the procedure checks whether any previously included variable can be deleted.

Best Subsets
- Provides a computationally efficient way to screen all possible models.
- The procedure requires a criterion to judge a model.
- Given the criterion, the software screens all models containing q covariates and reports the covariates in the best, say n, models for q = 1, 2, 3, ..., p, where p denotes the number of covariates.
- SAS uses the score test:

    proc phreg data=myeloma;
      model Time*VStatus(0)=LogBUN HGB Platelet Age LogWBC Frac LogPBM Protein SCalc
            / selection=score best=3;
    run;

The PHREG Procedure
Regression Models Selected by Score Criterion
(The score chi-square values are missing from this transcription.)

Number of
Variables   Variables Included in Model
1           LogBUN
1           HGB
1           Platelet
2           LogBUN HGB
2           LogBUN Platelet
2           LogBUN SCalc
3           LogBUN HGB SCalc
3           LogBUN HGB Age
3           LogBUN HGB Frac
4           LogBUN HGB Age SCalc
4           LogBUN HGB Frac SCalc
4           LogBUN HGB LogPBM SCalc
5           LogBUN HGB Age Frac SCalc
5           LogBUN HGB Age LogPBM SCalc
5           LogBUN HGB Age LogWBC SCalc
6           LogBUN HGB Age Frac LogPBM SCalc
6           LogBUN HGB Age LogWBC Frac SCalc
6           LogBUN HGB Platelet Age Frac SCalc
7           LogBUN HGB Platelet Age Frac LogPBM SCalc
7           LogBUN HGB Age LogWBC Frac LogPBM SCalc
7           LogBUN HGB Platelet Age LogWBC Frac SCalc
8           LogBUN HGB Platelet Age LogWBC Frac LogPBM SCalc
8           LogBUN HGB Platelet Age Frac LogPBM Protein SCalc
8           LogBUN HGB Platelet Age LogWBC Frac Protein SCalc
9           LogBUN HGB Platelet Age LogWBC Frac LogPBM Protein SCalc
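To see why screening "all possible models" needs an efficient procedure, it helps to count them. The short Python sketch below uses the nine covariate names from the PROC PHREG call; the variable names `models_per_size`, `total_models`, and `reported` are illustrative only.

```python
from itertools import combinations
from math import comb

covariates = ["LogBUN", "HGB", "Platelet", "Age", "LogWBC",
              "Frac", "LogPBM", "Protein", "SCalc"]
p = len(covariates)

# Number of candidate models with exactly q covariates, for q = 1..p.
models_per_size = {q: comb(p, q) for q in range(1, p + 1)}

# Total number of non-empty subsets, which equals 2**p - 1.
total_models = sum(models_per_size.values())

# With best=3, at most three models are reported for each size.
reported = sum(min(3, n) for n in models_per_size.values())

# Explicit enumeration of, for example, all two-covariate models.
pairs = list(combinations(covariates, 2))

print(models_per_size)
print(total_models, reported, len(pairs))
```

With p = 9 there are 511 candidate models, of which the listing reports 25: three for each of the sizes 1 to 8, plus the single full model.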

Disadvantages of automatic routines
- They typically lead to one particular subset of variables, rather than a set of equally good ones.
- The subsets found might be different for different selection routines.
- They generally tend not to account for the hierarchic principle.
- The result depends on the stopping rule.
- They do not foster critical thinking about the problem.

Collett
- The model selection strategy depends to some extent on the purpose of the study.
- Chow et al. (2002). Main goal: investigate which explanatory variables, in a palliative care setting, are associated with overall survival.
- Fosker et al. (2013): The Importance of Poor Performance Status in Personalizing Palliative Radiotherapy Towards the End of Life.

- Step 0: Identify a set of explanatory variables that have the potential for being included in a model.
- This approach assumes that all variables are on an equal footing and that there is no a priori reason to include any specific variables (like treatment).
- Steps 1-4: Determine the combination of variables to be included.
- In practice, there will not be a unique combination of variables; there are likely to be a number of equally good models.

- If the number of potential explanatory variables (including interactions, non-linear terms, etc.) is not too large, it might be feasible to consider all combinations of terms.
- Pay due regard to the hierarchic principle and use the statistic -2Log(Likelihood).
- Use AIC to compare possible models.
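A comparison by AIC can be sketched as follows, assuming the usual definition AIC = -2LogL + 2q, where q is the number of estimated parameters. The function name `aic` and the three candidate models with their -2LogL values are hypothetical and serve only to show the arithmetic.

```python
def aic(neg2loglik, n_params):
    """AIC = -2 log L + 2 * (number of estimated parameters)."""
    return neg2loglik + 2 * n_params

# Hypothetical (-2LogL, number of parameters) for three nested models.
candidates = {
    "A": (190.0, 1),
    "A + B": (185.0, 2),
    "A + B + C": (184.9, 3),
}

scores = {name: aic(m2ll, q) for name, (m2ll, q) in candidates.items()}
best = min(scores, key=scores.get)

print(scores)
print(best)
```

The largest model has the smallest -2LogL, as it must for nested models, but the penalty of 2 per parameter makes "A + B" the preferred model here.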

- When the number of variables is relatively large, fitting all possible models can be computationally expensive.
- Automatic selection routines might seem to be an attractive option:
  - Forward selection
  - Backward elimination
  - Stepwise

- Step 1: Fit a univariate model for each covariate, and identify the predictors significant at some level p1, say 0.20.
- Step 2: Fit a multivariable model with all significant univariate predictors, and use backward selection to eliminate non-significant variables at some level p2, say 0.10.
- Step 3: Starting with the final model from Step 2, consider each of the non-significant variables from Step 1 using forward selection, with significance level p3, say 0.10.
- Step 4: Do final pruning of the main-effects model (omit variables that are non-significant, add any that are significant), using stepwise regression with significance level p4.
- At this stage, you may also consider adding interactions between any of the main effects currently in the model, under the hierarchic principle.

- Collett recommends using a likelihood ratio test for all variable inclusion/exclusion decisions.
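For two nested models, the likelihood ratio statistic is the difference in -2LogL, referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters. The sketch below covers only the 1 degree of freedom case (one added or removed parameter), which the standard library can handle directly; `lr_test_1df` and the two -2LogL values are hypothetical.

```python
import math

def lr_test_1df(neg2loglik_reduced, neg2loglik_full):
    """Likelihood ratio test for one extra parameter in the full model.

    Returns the test statistic and its p-value from a chi-square
    distribution with 1 degree of freedom.
    """
    stat = neg2loglik_reduced - neg2loglik_full
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Hypothetical values: the reduced model omits one covariate.
stat, p = lr_test_1df(190.0, 185.0)
print(stat, round(p, 4))
```

A drop of 5.0 in -2LogL gives a p-value of about 0.025, so the covariate would be retained at the 0.10 level used in Steps 2 and 3.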

- Statistical criteria alone should not guide the model selection strategy.
- It may not be appropriate to include particular combinations of variables.
- It might be unwise to omit some variables that are not statistically significant.

Hosmer, Lemeshow and May: Purposeful selection
- Step 1: Fit a multivariable model containing all variables significant in the univariable analysis at the 0.20 to 0.25 significance level, together with any other variables not selected using this criterion but judged to be of clinical importance.

- Note: If many covariates show a statistically significant association with survival, you can rank the covariates by p-value and use only the most highly significant ones. Include at most one covariate per ten events.
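The "one covariate per ten events" rule and the ranking by p-value can be combined in a small helper. This is a sketch only: `max_covariates`, `shortlist`, the covariate names, the p-values, and the event count are all hypothetical.

```python
def max_covariates(n_events, events_per_covariate=10):
    """Largest number of covariates allowed by the events-per-covariate rule."""
    return n_events // events_per_covariate

def shortlist(p_values, n_events):
    """Rank covariates by univariable p-value and keep as many as allowed."""
    ranked = sorted(p_values, key=p_values.get)
    return ranked[:max_covariates(n_events)]

# Hypothetical univariable p-values and number of observed events.
p_values = {"age": 0.001, "ecog": 0.0005, "sex": 0.04, "site": 0.15}
n_events = 25

print(max_covariates(n_events))
print(shortlist(p_values, n_events))
```

With 25 events only two covariates are allowed, so the two with the smallest p-values are kept.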

- Step 2: Use Wald test p-values of the individual coefficients to identify covariates that might be deleted.
  - Be cautious about deleting too many seemingly non-significant variables at one time.
  - Confirm each deletion using the partial likelihood ratio test.

- Step 3: Assess whether removal of the covariate has produced an important change in the coefficients of the variables remaining in the model.
  - A change of 20% is used as an indicator of important change.
  - If the excluded variable is an important confounder, reintroduce it into the model.
  - This process continues until no more variables can be deleted.
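The 20% check in Step 3 can be written as a percentage change in a remaining coefficient. The sketch below assumes the change is measured relative to the coefficient from the larger model (the one that still contains the candidate for deletion); the function names and coefficient values are hypothetical.

```python
def pct_change(beta_full, beta_reduced):
    """Percentage change in a coefficient after a covariate is removed.

    beta_full:    estimate from the model that still contains the covariate
    beta_reduced: estimate from the model without it
    """
    return 100.0 * (beta_reduced - beta_full) / beta_full

def is_important_change(beta_full, beta_reduced, threshold=20.0):
    """True when the coefficient moves by more than the threshold (in %)."""
    return abs(pct_change(beta_full, beta_reduced)) > threshold

# Hypothetical coefficients of a remaining variable, before and after removal.
print(pct_change(0.50, 0.65), is_important_change(0.50, 0.65))
print(pct_change(0.50, 0.55), is_important_change(0.50, 0.55))
```

In the first case the coefficient changes by about 30%, which suggests the removed covariate is a confounder and should be put back; in the second the change is about 10% and the deletion stands.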

- Step 4: Add to the model, one at a time, all variables excluded from the initial multivariable model to confirm that they are neither statistically significant nor confounders. The result is referred to as the preliminary main effects model.
- Step 5: Test the linearity of the continuous covariates. The result is referred to as the main effects model.
- Step 6: Are interactions needed? Use the 0.05 significance level. Use the Wald p-value and the partial likelihood ratio test as described earlier.
- Step 7: Final model. Check model assumptions and goodness-of-fit.

Machin, Cheung, Parmar
Explanatory variables are categorized as:
1. Fundamental to the research design (D)
2. Those that influence outcome or are confounders (K)
3. Of uncertain influence (Q)

Strategies
- Forced-entry
- Significance tests
- Change in estimates of hazard ratios

Forced-entry
- Include variables in the model according to research design or prior opinion. This could include a non-statistically significant variable, e.g. the treatment variable in an RCT.
- Include variables known to be influential in their ability to confound the primary association of interest.
- The resulting model (with statistically non-significant effects) could have reduced efficiency.

Significance testing
- Step-up or step-down procedures where selection is manual, not automated.

Change in estimates
- If the purpose is to obtain a suitable estimate of the HR for a key variable, the significance-testing strategy may not be successful in selecting confounders.
- Compare the crude estimate (HR Crude) with the adjusted estimate (HR Adjusted) for a clinically important difference. A 10% change is suggested.
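A change-in-estimates screen over several candidate confounders might look like the sketch below. It assumes the change is expressed relative to the adjusted hazard ratio, which is one of several conventions in use; `hr_pct_change`, `select_confounders`, and all hazard ratios shown are hypothetical.

```python
def hr_pct_change(hr_crude, hr_adjusted):
    """Percentage difference between crude and adjusted hazard ratios."""
    return 100.0 * (hr_crude - hr_adjusted) / hr_adjusted

def select_confounders(hr_crude, adjusted_by, threshold=10.0):
    """Keep candidates whose adjustment shifts the HR by at least the threshold.

    adjusted_by maps each candidate confounder to the HR of the key
    variable after adjusting for that candidate alone.
    """
    return [name for name, hr_adj in adjusted_by.items()
            if abs(hr_pct_change(hr_crude, hr_adj)) >= threshold]

# Hypothetical crude HR for the key variable and HRs after adjustment.
hr_crude = 2.00
adjusted_by = {"age": 1.70, "sex": 1.98, "stage": 2.30}

print(select_confounders(hr_crude, adjusted_by))
```

Adjusting for "age" or "stage" moves the estimate by more than 10% in one direction or the other, so both are retained, while "sex" changes it by about 1% and is dropped.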

Practical considerations
- Because of the effects of bias, if more than 20% of the data points are missing for a variable, exclude it from the modeling process.
- If missing data comprise < 5%, the bias introduced will likely be small.
- Check how any automatic selection routines handle missing data.
- In practice, one can start with missing data excluded at the early stages of the selection process but bring them back into the process as it becomes clearer which variables are likely to be in the final model.
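The two missing-data thresholds can be applied mechanically before modeling starts. In the sketch below, missing values are represented by `None`, and the label "review" for variables between 5% and 20% missing is an assumption added for illustration, since the guidance above does not say what to do in that range; the function names and the toy data are hypothetical.

```python
def missing_fraction(values):
    """Fraction of entries that are missing (represented here by None)."""
    return sum(v is None for v in values) / len(values)

def triage(data, exclude_above=0.20, small_below=0.05):
    """Label each variable according to its fraction of missing values."""
    labels = {}
    for name, values in data.items():
        frac = missing_fraction(values)
        if frac > exclude_above:
            labels[name] = "exclude"
        elif frac < small_below:
            labels[name] = "keep"
        else:
            labels[name] = "review"  # assumed label for the in-between range
    return labels

# Toy data set with ten patients.
data = {
    "age": [61, 70, None, 55, 48, 66, 72, 59, 63, 50],
    "albumin": [None, None, None, 3.9, 4.1, 3.5, 4.0, 3.8, 3.7, 4.2],
    "sex": ["F", "M", "M", "F", "F", "M", "F", "M", "F", "M"],
}

print(triage(data))
```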

- Significance level to use? Err on the side of caution: use 0.10 generally and 0.2 for the change-in-estimates method.

- Univariable analysis per se is not recommended.
- Rationale for univariable screening: if an explanatory variable is associated with an outcome variable, this association may be the result of confounding. However, if an explanatory variable is not associated with an outcome variable in a univariable analysis, there is no gain in examining it further in a multivariable analysis.
- This argument is flawed: it overlooks the possibility of confounding that suppresses a genuine relation, so-called negative confounding.

Positive vs. Negative Confounding
- Positive confounding: An association is found between an exposure variable and outcome, but in reality there is no association; the spurious association is caused by the confounder. Alternatively, the association appears stronger than it really is because of the confounder.
- Negative confounding: An association is not found between an exposure variable and outcome, but in reality there is an association; the true association is suppressed by the confounder. Alternatively, the association appears weaker than it is in reality because of the confounder.

[Figure: confounding diagram with higher education in women as the exposure, nulliparity as the confounder, and breast cancer incidence as the outcome, contrasting the true magnitude of the association with its apparent magnitude.]

Steyerberg
- The problem of overfitting already starts with considering too many candidate predictors in a data set.
- The problem is difficult to solve with the standard statistical techniques that are used by default in medical research.
- The uncertainty of model selection is an important source of overfitting.

- Improvements can be sought by limiting the need for selection through subject-matter knowledge, especially in relatively small data sets (also advocated by Harrell).
- Use better algorithms to discover patterns in the data (e.g. LASSO).
- LASSO is a penalized estimation technique in which the estimated regression coefficients are constrained such that the sum of their scaled absolute values falls below some constant k chosen by cross-validation.
- This type of constraint shrinks some regression coefficients towards zero (which helps with the overfitting problem) and sets some to exactly zero (which helps with variable selection).
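Both effects of the LASSO constraint, shrinkage towards zero and coefficients set exactly to zero, can be seen in the simplest possible setting. The sketch below is a deliberate simplification and not a Cox-model LASSO: it assumes the equivalent penalized form with a penalty weight `lam`, and a single standardized coefficient under a squared-error criterion, for which the penalized solution is the soft-thresholded unpenalized estimate. The coefficient values and `lam` are hypothetical; in practice the penalty would be chosen by cross-validation.

```python
def soft_threshold(estimate, lam):
    """Shrink an estimate towards zero by lam, setting it to zero if |estimate| <= lam."""
    if estimate > lam:
        return estimate - lam
    if estimate < -lam:
        return estimate + lam
    return 0.0

# Hypothetical unpenalized coefficients for three standardized covariates.
unpenalized = {"x1": 0.80, "x2": -0.30, "x3": 0.05}
lam = 0.10  # hypothetical penalty weight

shrunk = {name: soft_threshold(b, lam) for name, b in unpenalized.items()}
selected = [name for name, b in shrunk.items() if b != 0.0]

print(shrunk)
print(selected)
```

The two larger coefficients are pulled towards zero by the same amount, while the small coefficient of "x3" is set to exactly zero, which removes that covariate from the model.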

Royston et al.
- No consensus exists on the best method for selecting variables.
- Two main strategies:
  - Full model approach: all candidate variables are included. This model is claimed to avoid overfitting and selection bias and to provide correct standard errors and P values. However, the full model is not always easy to define.
  - Backward elimination approach: the choice of significance level has a major effect on the number of variables selected. Selection of predictors by significance testing is known to produce selection bias (regression coefficients overestimated) and optimism as a result of overfitting. Overfitting leads to worse prediction in independent data.

Example 1: Chow et al. (Collett approach)

Example 2: Fosker et al. (Harrell approach)
The Importance of Poor Performance Status in Personalising Palliative Radiotherapy Towards the End of Life
- The goal of our project is to define a clinically relevant, ECOG PS-based algorithm that would enable accurate prediction of patients with shorter life expectancies (< 3-4 months).

Multivariate Analysis: Cox Proportional Hazards model results
NOTE: ECOG=0 is the reference category for the variable ECOG.
(The numeric values are missing from this transcription; only the "<" of each P-value survives.)

Parameter    Level     P-value   Hazard Ratio   95% CI Lower   95% CI Upper
Age          Age 75+   <
Brain mets   Yes       <
ECOG         1         <
ECOG         2         <
ECOG         3         <
ECOG         4         <
Gender       Male      <
Primary      Lung      <

Conclusions
- One size doesn't fit all: it is hard to conclude that there is a best approach.
- "Cutting to the chase" is not an appropriate way to describe multivariable model building.
- A good model is one chosen by using a careful, well thought out covariate selection process that gives thoughtful consideration to issues of adjustment and interactions and thoroughly evaluates the model for assumptions, influential observations, and tests for goodness-of-fit (Hosmer and Lemeshow 2008).

References
- Collett D. Modelling Survival Data in Medical Research. Chapman and Hall.
- Hosmer DW, Lemeshow S, May S. Applied Survival Analysis: Regression Modeling of Time-to-Event Data, 2nd edition. Wiley.
- Machin D, Cheung YB, Parmar MKB. Survival Analysis: A Practical Approach. Wiley.
- Steyerberg EW. Clinical Prediction Models. Springer.
- Royston P, Moons KGM, Altman DG, Vergouwe Y. Prognosis and prognostic research: Developing a prognostic model. BMJ, June 2009, Volume 338.
- Harrell FE. Regression Modeling Strategies. Springer, New York.

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS) Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it

More information

Application of Cox Regression in Modeling Survival Rate of Drug Abuse

Application of Cox Regression in Modeling Survival Rate of Drug Abuse American Journal of Theoretical and Applied Statistics 2018; 7(1): 1-7 http://www.sciencepublishinggroup.com/j/ajtas doi: 10.11648/j.ajtas.20180701.11 ISSN: 2326-8999 (Print); ISSN: 2326-9006 (Online)

More information

CHAMP: CHecklist for the Appraisal of Moderators and Predictors

CHAMP: CHecklist for the Appraisal of Moderators and Predictors CHAMP - Page 1 of 13 CHAMP: CHecklist for the Appraisal of Moderators and Predictors About the checklist In this document, a CHecklist for the Appraisal of Moderators and Predictors (CHAMP) is presented.

More information

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,

More information

Part 8 Logistic Regression

Part 8 Logistic Regression 1 Quantitative Methods for Health Research A Practical Interactive Guide to Epidemiology and Statistics Practical Course in Quantitative Data Handling SPSS (Statistical Package for the Social Sciences)

More information

Multiple Analysis. Some Nomenclatures. Learning Objectives. A Weight Lifting Analysis. SCHOOL OF NURSING The University of Hong Kong

Multiple Analysis. Some Nomenclatures. Learning Objectives. A Weight Lifting Analysis. SCHOOL OF NURSING The University of Hong Kong Some Nomenclatures Multiple Analysis Daniel Y.T. Fong Dependent/ Outcome variable Independent/ Explanatory variable Univariate Analyses 1 1 1 2 Simple Analysis Multiple Analysis /Multivariable Analysis

More information

Reporting and Methods in Clinical Prediction Research: A Systematic Review

Reporting and Methods in Clinical Prediction Research: A Systematic Review Reporting and Methods in Clinical Prediction Research: A Systematic Review Walter Bouwmeester 1., Nicolaas P. A. Zuithoff 1., Susan Mallett 2, Mirjam I. Geerlings 1, Yvonne Vergouwe 1,3, Ewout W. Steyerberg

More information

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study STATISTICAL METHODS Epidemiology Biostatistics and Public Health - 2016, Volume 13, Number 1 Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation

More information

WELCOME! Lecture 11 Thommy Perlinger

WELCOME! Lecture 11 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression

More information

RISK PREDICTION MODEL: PENALIZED REGRESSIONS

RISK PREDICTION MODEL: PENALIZED REGRESSIONS RISK PREDICTION MODEL: PENALIZED REGRESSIONS Inspired from: How to develop a more accurate risk prediction model when there are few events Menelaos Pavlou, Gareth Ambler, Shaun R Seaman, Oliver Guttmann,

More information

Template 1 for summarising studies addressing prognostic questions

Template 1 for summarising studies addressing prognostic questions Template 1 for summarising studies addressing prognostic questions Instructions to fill the table: When no element can be added under one or more heading, include the mention: O Not applicable when an

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield d.boduszek@hud.ac.uk Introduction to Logistic Regression SPSS procedure of LR Interpretation of SPSS output Presenting results from LR Logistic regression is

More information

Prediction Research. An introduction. A. Cecile J.W. Janssens, MA, MSc, PhD Forike K. Martens, MSc

Prediction Research. An introduction. A. Cecile J.W. Janssens, MA, MSc, PhD Forike K. Martens, MSc Prediction Research An introduction A. Cecile J.W. Janssens, MA, MSc, PhD Forike K. Martens, MSc Emory University, Rollins School of Public Health, department of Epidemiology, Atlanta GA, USA VU University

More information

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA BIOSTATISTICAL METHODS AND RESEARCH DESIGNS Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA Keywords: Case-control study, Cohort study, Cross-Sectional Study, Generalized

More information

Selection and Combination of Markers for Prediction

Selection and Combination of Markers for Prediction Selection and Combination of Markers for Prediction NACC Data and Methods Meeting September, 2010 Baojiang Chen, PhD Sarah Monsell, MS Xiao-Hua Andrew Zhou, PhD Overview 1. Research motivation 2. Describe

More information

A MONTE CARLO STUDY OF MODEL SELECTION PROCEDURES FOR THE ANALYSIS OF CATEGORICAL DATA

A MONTE CARLO STUDY OF MODEL SELECTION PROCEDURES FOR THE ANALYSIS OF CATEGORICAL DATA A MONTE CARLO STUDY OF MODEL SELECTION PROCEDURES FOR THE ANALYSIS OF CATEGORICAL DATA Elizabeth Martin Fischer, University of North Carolina Introduction Researchers and social scientists frequently confront

More information

research methods & reporting

research methods & reporting Prognosis and prognostic research: Developing a prognostic model research methods & reporting Patrick Royston, 1 Karel G M Moons, 2 Douglas G Altman, 3 Yvonne Vergouwe 2 In the second article in their

More information

Multivariable Systems. Lawrence Hubert. July 31, 2011

Multivariable Systems. Lawrence Hubert. July 31, 2011 Multivariable July 31, 2011 Whenever results are presented within a multivariate context, it is important to remember that there is a system present among the variables, and this has a number of implications

More information

Chapter 17 Sensitivity Analysis and Model Validation

Chapter 17 Sensitivity Analysis and Model Validation Chapter 17 Sensitivity Analysis and Model Validation Justin D. Salciccioli, Yves Crutain, Matthieu Komorowski and Dominic C. Marshall Learning Objectives Appreciate that all models possess inherent limitations

More information

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India 20th International Congress on Modelling and Simulation, Adelaide, Australia, 1 6 December 2013 www.mssanz.org.au/modsim2013 Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision

More information

Statistical modelling for thoracic surgery using a nomogram based on logistic regression

Statistical modelling for thoracic surgery using a nomogram based on logistic regression Statistics Corner Statistical modelling for thoracic surgery using a nomogram based on logistic regression Run-Zhong Liu 1, Ze-Rui Zhao 2, Calvin S. H. Ng 2 1 Department of Medical Statistics and Epidemiology,

More information

National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2008 Formula Grant

National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2008 Formula Grant National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2008 Formula Grant Reporting Period July 1, 2011 December 31, 2011 Formula Grant Overview The National Surgical

More information

Summary. 20 May 2014 EMA/CHMP/SAWP/298348/2014 Procedure No.: EMEA/H/SAB/037/1/Q/2013/SME Product Development Scientific Support Department

Summary. 20 May 2014 EMA/CHMP/SAWP/298348/2014 Procedure No.: EMEA/H/SAB/037/1/Q/2013/SME Product Development Scientific Support Department 20 May 2014 EMA/CHMP/SAWP/298348/2014 Procedure No.: EMEA/H/SAB/037/1/Q/2013/SME Product Development Scientific Support Department evaluating patients with Autosomal Dominant Polycystic Kidney Disease

More information

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Final Project Report CS 229 Autumn 2017 Category: Life Sciences Maxwell Allman (mallman) Lin Fan (linfan) Jamie Kang (kangjh) 1 Introduction

More information

All Possible Regressions Using IBM SPSS: A Practitioner s Guide to Automatic Linear Modeling

All Possible Regressions Using IBM SPSS: A Practitioner s Guide to Automatic Linear Modeling Georgia Southern University Digital Commons@Georgia Southern Georgia Educational Research Association Conference Oct 7th, 1:45 PM - 3:00 PM All Possible Regressions Using IBM SPSS: A Practitioner s Guide

More information

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis EFSA/EBTC Colloquium, 25 October 2017 Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis Julian Higgins University of Bristol 1 Introduction to concepts Standard

More information

Applying Machine Learning Methods in Medical Research Studies

Applying Machine Learning Methods in Medical Research Studies Applying Machine Learning Methods in Medical Research Studies Daniel Stahl Department of Biostatistics and Health Informatics Psychiatry, Psychology & Neuroscience (IoPPN), King s College London daniel.r.stahl@kcl.ac.uk

More information

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 3: Overview of Descriptive Statistics October 3, 2005 Lecture Outline Purpose

More information

Protocol Development: The Guiding Light of Any Clinical Study

Protocol Development: The Guiding Light of Any Clinical Study Protocol Development: The Guiding Light of Any Clinical Study Susan G. Fisher, Ph.D. Chair, Department of Clinical Sciences 1 Introduction Importance/ relevance/ gaps in knowledge Specific purpose of the

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

A macro of building predictive model in PROC LOGISTIC with AIC-optimal variable selection embedded in cross-validation

A macro of building predictive model in PROC LOGISTIC with AIC-optimal variable selection embedded in cross-validation SESUG Paper AD-36-2017 A macro of building predictive model in PROC LOGISTIC with AIC-optimal variable selection embedded in cross-validation Hongmei Yang, Andréa Maslow, Carolinas Healthcare System. ABSTRACT

More information

BIOSTATISTICAL METHODS

BIOSTATISTICAL METHODS BIOSTATISTICAL METHODS FOR TRANSLATIONAL & CLINICAL RESEARCH PROPENSITY SCORE Confounding Definition: A situation in which the effect or association between an exposure (a predictor or risk factor) and

More information

PubH 7405: REGRESSION ANALYSIS. Propensity Score

PubH 7405: REGRESSION ANALYSIS. Propensity Score PubH 7405: REGRESSION ANALYSIS Propensity Score INTRODUCTION: There is a growing interest in using observational (or nonrandomized) studies to estimate the effects of treatments on outcomes. In observational

More information

Regression Discontinuity Analysis

Regression Discontinuity Analysis Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income

More information

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering Meta-Analysis Zifei Liu What is a meta-analysis; why perform a metaanalysis? How a meta-analysis work some basic concepts and principles Steps of Meta-analysis Cautions on meta-analysis 2 What is Meta-analysis

More information

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H 1. Data from a survey of women s attitudes towards mammography are provided in Table 1. Women were classified by their experience with mammography

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method Biost 590: Statistical Consulting Statistical Classification of Scientific Studies; Approach to Consulting Lecture Outline Statistical Classification of Scientific Studies Statistical Tasks Approach to

More information

Examining Relationships Least-squares regression. Sections 2.3

Examining Relationships Least-squares regression. Sections 2.3 Examining Relationships Least-squares regression Sections 2.3 The regression line A regression line describes a one-way linear relationship between variables. An explanatory variable, x, explains variability

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information
