RISK PREDICTION MODEL: PENALIZED REGRESSIONS

1 RISK PREDICTION MODEL: PENALIZED REGRESSIONS Inspired by: "How to develop a more accurate risk prediction model when there are few events", Menelaos Pavlou, Gareth Ambler, Shaun R Seaman, Oliver Guttmann, Perry Elliott, Michael King, Rumana Z Omar. BMJ 2015;351:h3868. Journal Club, January. Pawin Numthavaj, M.D., Section for Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital

2 RISK PREDICTION MODEL A statistical model that uses predictors to predict a health outcome

3 USUAL RISK PREDICTION MODEL DEVELOPMENT
1. Develop the model using patients from one group
2. Obtain outcome and predictor data
3. Create a mathematical model that predicts the outcome
4. Test the performance of the model

4 MODEL PERFORMANCE
1. Discrimination: the model's ability to discriminate between low-risk and high-risk patients
2. Calibration: agreement between observed outcomes and predictions

5 1. DISCRIMINATION Ability to distinguish low-risk from high-risk patients. Measured by the area under the ROC curve of model-predicted outcome versus actual outcome across different cut-off points of predicted risk. Concordance (C) statistic: the probability that a randomly selected subject with the outcome has a higher predicted probability of the outcome than a randomly selected subject without the outcome. Values are graded as acceptable, excellent, or outstanding.

6 C-STATISTICS $C = \dfrac{\text{concordant pairs} + (0.5 \times \text{ties})}{\text{all pairs}}$. Example numerator: $6 + (0.5 \times 3)$. Giovanni Tripepi et al. Nephrol. Dial. Transplant. 2010;25:
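
The pair counting behind the C statistic can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the cited paper: the function name `c_statistic` and the predicted risks are assumptions made up for the example.

```python
from itertools import product


def c_statistic(risks_with_event, risks_without_event):
    """Concordance over all (event, non-event) pairs; ties count as half."""
    pairs = list(product(risks_with_event, risks_without_event))
    concordant = sum(1 for with_ev, without_ev in pairs if with_ev > without_ev)
    ties = sum(1 for with_ev, without_ev in pairs if with_ev == without_ev)
    return (concordant + 0.5 * ties) / len(pairs)


# Hypothetical predicted risks, invented for illustration only.
events = [0.8, 0.6, 0.3]       # patients who had the outcome
non_events = [0.6, 0.2, 0.1]   # patients who did not

# 9 pairs in total: 7 concordant, 1 tie, 1 discordant.
c = c_statistic(events, non_events)
print(f"C = {c:.3f}")
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.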

7 2. CALIBRATION A measure of how close predicted probabilities are to observed rates of the positive outcome. Example: when the model predicts a 70% chance, is the outcome observed in 70% of the actual data? Commonly used technique: Hosmer-Lemeshow chi-square test. Partition the data into groups, then compare the average predicted probability with the outcome prevalence in each group using a chi-square statistic.

8 HOSMER-LEMESHOW TEST Table columns: deciles of estimated probability of death; sum of predicted deaths; sum of observed deaths. Giovanni Tripepi et al. Nephrol. Dial. Transplant. 2010;25:

9 HL test, computed from the sum of predicted deaths and the sum of observed deaths in each decile: $\chi^2 = \sum \left[ \dfrac{(\text{observed} - \text{estimated})^2}{\text{estimated}} \right] = 12$. A chi-square of 12 with $n - 2 = 8$ degrees of freedom gives p = 0.15: the proportion of deaths predicted by the model does not differ significantly from the observed deaths.
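
The p-value quoted on this slide can be checked with a short Python sketch. It follows the simplified statistic shown above (deaths only), and the helper names are assumptions for illustration. The tail probability uses the closed form that holds for an even number of degrees of freedom, so no statistics library is needed.

```python
import math


def hosmer_lemeshow_chi2(observed, estimated):
    """Sum of (observed - estimated)^2 / estimated over the risk groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, estimated))


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Values taken from the slide: chi-square = 12 with 8 degrees of freedom.
p_value = chi2_upper_tail_even_df(12.0, 8)
print(f"p = {p_value:.2f}")
```

Because p exceeds 0.05, the test gives no evidence of poor calibration, which matches the slide's conclusion.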

10 TYPICAL TECHNIQUES FOR MODEL VALIDATION Internal validation: bootstrapping methods. External validation: use patient data that were not used for model development.

11 EXAMPLE (BOX 1) Outcome: mechanical failure of heart valve (Y/N). Predictors: sex (1 = female); age (years); body surface area (BSA; m²); whether the replacement valve came from a batch with fractures (1 = valve came from a batch with fractures).

12 RISK MODEL: LOGISTIC REGRESSION MODEL Patient's risk of heart valve failure $= \dfrac{e^{\text{risk score}}}{1 + e^{\text{risk score}}}$. Patient's risk score $= \text{intercept} + (b_{\text{sex}} \times \text{sex}) + (b_{\text{age}} \times \text{age}) + (b_{\text{BSA}} \times \text{BSA}) + (b_{\text{fracture}} \times \text{fracture})$. The regression coefficients (b) can be obtained using various methods: standard logistic regression, ridge, or lasso.

13 Estimated coefficients: $b_{\text{sex}}$, $b_{\text{age}}$, $b_{\text{BSA}}$, $b_{\text{fracture}}$, and intercept = −4.25. The risk score for a 40-year-old female patient with a body surface area of 1.7 m² and an artificial valve from a batch with fractures would then be calculated as: intercept + ($b_{\text{sex}}$ × female sex) + ($b_{\text{age}}$ × age in years) + ($b_{\text{BSA}}$ × BSA in m²) + ($b_{\text{fracture}}$ × fracture present in batch) = −2.89. Therefore, her predicted risk would be: exp(−2.89) / (1 + exp(−2.89)) = 5.3%
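
The last step of this calculation, turning a risk score into a predicted risk, can be reproduced with a small Python sketch. The function name `predicted_risk` is an assumption for illustration. Only the final risk score from the slide is used, because the individual coefficient values are not reproduced here.

```python
import math


def predicted_risk(risk_score):
    """Logistic transformation: exp(score) / (1 + exp(score))."""
    return math.exp(risk_score) / (1.0 + math.exp(risk_score))


# Risk score of the example patient from the slide.
risk = predicted_risk(-2.89)
print(f"Predicted risk: {risk:.1%}")
```

A risk score of 0 maps to a risk of 50%; negative scores, as here, map to risks below 50%.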

14 BOOTSTRAP VALIDATION Use when no external cohort is available. A bootstrap dataset is an imitation of the original dataset, constructed by randomly sampling patients, with replacement, from the original dataset. Typically, a large number of bootstrap datasets (e.g., 200) is created. The model is fitted to each bootstrap dataset, and the estimated coefficients are used to obtain predictions for the patients in the original dataset. These predictions are used to calculate the calibration slope for the fitted model.
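
The resampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `bootstrap_datasets` and the toy patient records are hypothetical, and the model-fitting and calibration-slope steps are deliberately left out.

```python
import random


def bootstrap_datasets(patients, n_datasets=200, seed=0):
    """Draw n_datasets resamples, each the same size as the original data."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    n = len(patients)
    # random.choices samples with replacement, so a patient can appear
    # several times in one bootstrap dataset and not at all in another.
    return [rng.choices(patients, k=n) for _ in range(n_datasets)]


# Hypothetical toy data, invented for illustration only.
patients = [{"id": i, "event": int(i % 10 == 0)} for i in range(50)]
boot = bootstrap_datasets(patients)
print(len(boot), len(boot[0]))
```

In a full validation, the model would be refitted on each element of `boot` and then used to predict for `patients`.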

15 SOMETIMES, THERE ARE FEW EVENTS COMPARED TO NUMBER OF PREDICTORS Examples: structural failure of mechanical heart valves; sudden cardiac death in patients with hypertrophic cardiomyopathy. Predictions from such a model often perform less well in a new patient group.

16 WHY? The fitted model captures not only the association between outcome and predictors but also random variation (noise) in the development dataset. This is model overfitting: the model underestimates the probability of the event in low-risk patients and overestimates it in high-risk patients.

17 SAMPLE SIZE REQUIRED FOR RISK PREDICTION MODEL Rule of thumb: events per variable (EPV) ratio. $\text{EPV} = \dfrac{\text{Number of events}}{\text{Number of regression coefficients}}$. An EPV of at least 10 is needed to avoid overfitting.

18 EXAMPLE 60 events for a model with 6 regression coefficients. Predictors: structural heart disease, age, sex, hypertension (HT), family history of CVD, diabetes mellitus (DM). Outcome: CV death.
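
The EPV for this example follows directly from the rule of thumb. A minimal Python sketch, with the function name `events_per_variable` assumed for illustration:

```python
def events_per_variable(n_events, n_coefficients):
    """EPV = number of events / number of regression coefficients."""
    if n_coefficients <= 0:
        raise ValueError("need at least one regression coefficient")
    return n_events / n_coefficients


# Slide example: 60 events, 6 regression coefficients.
epv_example = events_per_variable(60, 6)
print(epv_example, "meets the rule of thumb" if epv_example >= 10 else "too low")
```

Note that the denominator counts regression coefficients, not predictors: a categorical predictor with several levels contributes more than one coefficient.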

19 WHEN EVENTS ARE RARE EPV of 10 may be difficult to achieve

20 PROBLEM OF RARE OUTCOME Models with few events compared to the number of predictors often underperform when applied to new patients. Model overfitting: underestimates the probability of the event in low-risk patients and overestimates it in high-risk patients.

21 COMMON STRATEGIES
1. Univariable screening: only include significant predictors in the model
2. Stepwise model selection, e.g., backwards elimination
Drawback: the process may not be stable. Small changes in the data or in the predictor selection process could lead to different predictors being included in the final model.

22 ANOTHER WAY TO ALLEVIATE MODEL OVERFITTING Shrinkage methods: methods that shrink the regression coefficients towards zero, moving poorly calibrated predicted risks towards the average risk.

23 SIMPLEST SHRINKAGE METHOD Shrink all coefficients by a common factor, e.g., by 20%. However, this approach does not perform well if the EPV is very low.
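
Uniform shrinkage amounts to multiplying every predictor coefficient by the same factor. The Python sketch below is an illustration only: the function name `shrink_uniformly` and the coefficient values are hypothetical, and a 20% shrinkage is read as a factor of 0.8.

```python
def shrink_uniformly(coefficients, factor=0.8):
    """Multiply every predictor coefficient by the same shrinkage factor."""
    return {name: factor * value for name, value in coefficients.items()}


# Hypothetical coefficients, invented for illustration only.
standard = {"age": 0.05, "sex": -0.40, "bsa": 1.20}
shrunk = shrink_uniformly(standard, factor=0.8)   # 20% shrinkage
print(shrunk)
```

Every coefficient moves towards zero by the same proportion, which is what distinguishes this approach from penalized regression, where the amount of shrinkage can differ between predictors.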

24 PENALIZED REGRESSION A flexible shrinkage approach that is effective when the EPV is low (<10). Process:
1. Specify the form of the risk model (e.g., logistic or Cox)
2. Fit the data to estimate the coefficients in a standard logistic or Cox model
3. The range of predicted risks is too wide as a result of overfitting
4. Shrink the regression coefficients toward zero by placing a constraint on their values (the penalty)
Penalized coefficient estimates are typically smaller than those of standard regression.

25 SEVERAL FORMS OF PENALIZED REGRESSION Ridge; lasso; extensions of ridge and lasso: elastic net, smoothly clipped absolute deviation, adaptive lasso, etc. Software: R (package penalized), SPSS, Stata (rxridge, firthlogit, overfit).

26 RIDGE REGRESSION Fits the model under the constraint that the sum of squared regression coefficients does not exceed a particular threshold. Penalizes the coefficients using the formula $l(\beta) - \lambda \sum_{j=1}^{p} \beta_j^2$, where $l(\beta)$ is the log-likelihood and $p$ is the number of coefficients. $\lambda$: scalar chosen by the investigator to control the amount of shrinkage; $\lambda = 0$ results in the standard regression model.

27 The threshold is chosen to maximize the model's predictive ability using cross-validation: the dataset is split into k groups; the model is fitted to (k−1) groups and validated on the omitted group; this is repeated k times, each time omitting a different group. Example: 10-fold cross-validation. Split the dataset into 10 subsets. Subset j is omitted, then the penalized model is fitted to the other nine subsets. Calculate predictions for all patients, calculate predictive ability, and compare with the full model.
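
The splitting scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the generator name `k_fold_splits` is hypothetical, patients are identified by index, and fitting the penalized model on each training set is left out.

```python
import random


def k_fold_splits(n_patients, k=10, seed=0):
    """Yield (training, validation) index lists, omitting one fold each time."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)          # random allocation to folds
    folds = [indices[i::k] for i in range(k)]     # k roughly equal subsets
    for j in range(k):
        validation = folds[j]
        training = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield training, validation


# Example: 100 patients, 10-fold cross-validation.
splits = list(k_fold_splits(100, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each patient appears in exactly one validation fold, so every patient receives a prediction from a model that was fitted without them. Repeating this over a grid of penalty values is one way to pick the value with the best cross-validated performance.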

28 LASSO REGRESSION Least Absolute Shrinkage and Selection Operator. Similar to ridge, but constrains the sum of the absolute values of the regression coefficients: $l(\beta) - \lambda \sum_{j=1}^{p} |\beta_j|$. Lasso can effectively exclude predictors from the final model by shrinking their coefficients to 0.
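
The ridge and lasso objectives differ only in how they measure the size of the coefficients. The Python sketch below evaluates both penalized log-likelihoods for a logistic model. It rests on assumptions not stated on the slides: the function names are hypothetical, the data are made up, and the intercept is left unpenalized, which is a common convention.

```python
import math


def log_likelihood(intercept, betas, rows, outcomes):
    """Logistic log-likelihood l(beta) for binary outcomes (0 or 1)."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        score = intercept + sum(b * v for b, v in zip(betas, x))
        total += y * score - math.log1p(math.exp(score))
    return total


def penalized_log_likelihood(intercept, betas, rows, outcomes, lam, penalty):
    """l(beta) minus lambda times the ridge or lasso penalty."""
    if penalty == "ridge":
        size = sum(b * b for b in betas)      # sum of squared coefficients
    elif penalty == "lasso":
        size = sum(abs(b) for b in betas)     # sum of absolute coefficients
    else:
        raise ValueError("penalty must be 'ridge' or 'lasso'")
    return log_likelihood(intercept, betas, rows, outcomes) - lam * size


# Hypothetical data and coefficients, invented for illustration only.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outcomes = [1, 0, 1]
ridge_value = penalized_log_likelihood(-0.5, [1.0, -2.0], rows, outcomes, 0.5, "ridge")
lasso_value = penalized_log_likelihood(-0.5, [1.0, -2.0], rows, outcomes, 0.5, "lasso")
print(ridge_value, lasso_value)
```

Fitting either model means finding the coefficients that maximize this quantity. With `lam` set to 0, both reduce to the standard log-likelihood.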

29 RIDGE OR LASSO? In health research, a set of prespecified predictors is often available, so ridge regression is usually the preferred option. Lasso: preferred when a simpler model with fewer predictors is wanted (e.g., to save time and resources by collecting less information on patients).

30 DETECTION OF MODEL OVERFITTING Assessment of model calibration, by internal or external validation. Divide patients into risk groups according to predicted risk, then compare the proportion of patients who had the event with the average predicted risk in each group, using a graph (calibration plot) or a table (with the Hosmer-Lemeshow goodness-of-fit test).

31 DEGREE OF OVERFITTING Quantified by a simple regression model: outcomes in the validation data are regressed, using logistic regression, on their predicted risk scores. Well-calibrated model: estimated slope (calibration slope) close to 1. Overfitted model: slope < 1 (low risks are underestimated, high risks are overestimated).
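
A calibration slope can be estimated by fitting a two-parameter logistic regression of the outcome on the predicted risk score. The sketch below does this with Newton-Raphson iterations in plain Python. The function name `calibration_slope` and the validation data are assumptions made up for illustration, and the routine expects data that are not perfectly separated.

```python
import math


def calibration_slope(risk_scores, outcomes, iterations=25):
    """Fit logit(P(event)) = a + b * risk_score; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, y in zip(risk_scores, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a + b * s)))
            w = p * (1.0 - p)
            g_a += y - p              # gradient for the intercept
            g_b += (y - p) * s        # gradient for the slope
            h_aa += w                 # information matrix entries
            h_ab += w * s
            h_bb += w * s * s
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical validation data: the model predicted extreme risk scores
# (-2 or +2), but only 1 of 4 and 3 of 4 patients had the event.
scores = [-2.0] * 4 + [2.0] * 4
events = [1, 0, 0, 0, 1, 1, 1, 0]
intercept, slope = calibration_slope(scores, events)
print(f"calibration slope = {slope:.2f}")
```

Here the slope is about 0.55, well below 1: the predicted risk scores are too extreme for the observed outcomes, which is the signature of overfitting described on the slide.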

32 EXAMPLE 1: MECHANICAL HEART VALVE FAILURE Data on 3118 patients with a mechanical heart valve. Outcome: failure of the artificial valve (56 events). Predictors: age, sex, BSA, fractures in the batch of the valve (Y/N), year of valve manufacture (<1981/>1981), valve size (10 coefficients). EPV = 56/10 = 5.6. Models compared: standard, ridge, and lasso regression.

33 Table: descriptive statistics and regression coefficient estimates (standard, ridge, and lasso regression) for each predictor
Intercept: (23) 6.65 (15)
Sex (female): 1337 (43) (41) 0.16 (34)
Age (years): 54.1 (10.8) (11) (4)
Body surface area (m²): 1.6 (0.3) (24) 1.75 (12)
Aortic size 23, 27, 29, 31 mm: (75) 0.61 (68)
Mitral size mm: (84) 0.43 (67)
Mitral size 29 mm: (59) 1.13 (42)
Mitral size 31 mm: (47) 1.77 (33)
Mitral size 33 mm: (45) 1.73 (33)
Fracture in batch (yes): ( 17) 0.64 ( 9)
Date of manufacture (after 1981): (26) 1.22 (12)

34 FIG 1: DISTRIBUTION OF PREDICTED RISK SCORES ESTIMATED USING STANDARD, RIDGE, AND LASSO REGRESSION Menelaos Pavlou et al. BMJ 2015;351:bmj.h3868

35 FIG 2: OBSERVED PROPORTIONS VERSUS AVERAGE PREDICTED RISK OF THE EVENT (USING STANDARD, RIDGE AND LASSO REGRESSION).

36 EXAMPLE 2: SUDDEN CARDIAC DEATH IN HYPERTROPHIC CARDIOMYOPATHY Data on 1000 patients. Outcome: sudden cardiac death (SCD) within 10 years of diagnosis (42 events). Continuous predictors: age, maximum LV wall thickness, fractional shortening, LA diameter, peak LV outflow tract gradient. Binary predictors: gender, family history of SCD, non-sustained VT, severity of HF by NYHA class, unexplained syncope. EPV = 4.2. The model was externally validated using data from different centers (2405 patients, 106 events).

37 COEFFICIENT TABLE Regression coefficient estimates from standard, ridge, and lasso regression for each predictor: age (years); maximum wall thickness (mm); fractional shortening (mm); LA diameter (mm); peak LVOT gradient (mmHg); sudden cardiac death in family; non-sustained VT; syncope; male sex; NYHA class III/IV

38 FIGURE

39 CONCLUSION When the number of events is low compared with the number of predictors in a risk model, standard regression may produce an overfitted risk model. Common methods such as stepwise selection and univariable screening are problematic and should be avoided. It is recommended that the use of penalized regression methods be explored. Other methods, such as incorporating existing evidence (from published risk models, meta-analyses, and expert opinion), could be better in some scenarios.

40 TAKE HOME MESSAGE Beware of prediction models with $\text{EPV} = \dfrac{\text{Number of events}}{\text{Number of regression coefficients}} < 10$. A standard model is usually overfitted when EPV < 10: it underestimates risk in low-risk patients and overestimates it in high-risk patients. Penalizing the coefficients using penalized regression methods such as ridge and lasso is a possible solution to this problem.

41 THANK YOU

42 NEXT JOURNAL CLUB REMINDER: "Factors influencing recruitment to research: qualitative study of the experiences and perceptions of research teams", by Threechada Boonchan. Friday, 19 February, 13:00-14:30, Room 905; lunch from 12:00 noon.


Clincial Biostatistics. Regression Regression analyses Clincial Biostatistics Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a

More information

Multiple Treatments on the Same Experimental Unit. Lukas Meier (most material based on lecture notes and slides from H.R. Roth)

Multiple Treatments on the Same Experimental Unit. Lukas Meier (most material based on lecture notes and slides from H.R. Roth) Multiple Treatments on the Same Experimental Unit Lukas Meier (most material based on lecture notes and slides from H.R. Roth) Introduction We learned that blocking is a very helpful technique to reduce

More information

Anale. Seria Informatică. Vol. XVI fasc Annals. Computer Science Series. 16 th Tome 1 st Fasc. 2018

Anale. Seria Informatică. Vol. XVI fasc Annals. Computer Science Series. 16 th Tome 1 st Fasc. 2018 HANDLING MULTICOLLINEARITY; A COMPARATIVE STUDY OF THE PREDICTION PERFORMANCE OF SOME METHODS BASED ON SOME PROBABILITY DISTRIBUTIONS Zakari Y., Yau S. A., Usman U. Department of Mathematics, Usmanu Danfodiyo

More information

CHAPTER 3 RESEARCH METHODOLOGY

CHAPTER 3 RESEARCH METHODOLOGY CHAPTER 3 RESEARCH METHODOLOGY 3.1 Introduction 3.1 Methodology 3.1.1 Research Design 3.1. Research Framework Design 3.1.3 Research Instrument 3.1.4 Validity of Questionnaire 3.1.5 Statistical Measurement

More information

Small Group Presentations

Small Group Presentations Admin Assignment 1 due next Tuesday at 3pm in the Psychology course centre. Matrix Quiz during the first hour of next lecture. Assignment 2 due 13 May at 10am. I will upload and distribute these at the

More information

Reporting and Methods in Clinical Prediction Research: A Systematic Review

Reporting and Methods in Clinical Prediction Research: A Systematic Review Reporting and Methods in Clinical Prediction Research: A Systematic Review Walter Bouwmeester 1., Nicolaas P. A. Zuithoff 1., Susan Mallett 2, Mirjam I. Geerlings 1, Yvonne Vergouwe 1,3, Ewout W. Steyerberg

More information

PRINCIPLES OF STATISTICS

PRINCIPLES OF STATISTICS PRINCIPLES OF STATISTICS STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

patients actual drug exposure for every single-day of contribution to monthly cohorts, either before or

patients actual drug exposure for every single-day of contribution to monthly cohorts, either before or SUPPLEMENTAL MATERIAL Methods Monthly cohorts and exposure Exposure to generic or brand-name drugs were captured at an individual level, reflecting each patients actual drug exposure for every single-day

More information

Overview of Multivariable Prediction Modelling. Methodological Conduct & Reporting: Introducing TRIPOD guidelines

Overview of Multivariable Prediction Modelling. Methodological Conduct & Reporting: Introducing TRIPOD guidelines Overview of Multivariable Prediction Modelling Methodological Conduct & Reporting: Introducing TRIPOD guidelines Gary Collins Centre for Statistics in Medicine www.csm-oxford.org.uk University of Oxford

More information

NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN)

NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN) NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia Produced by: National Cardiovascular Intelligence Network (NCVIN) Date: August 2015 About Public Health England Public Health England

More information

Part 8 Logistic Regression

Part 8 Logistic Regression 1 Quantitative Methods for Health Research A Practical Interactive Guide to Epidemiology and Statistics Practical Course in Quantitative Data Handling SPSS (Statistical Package for the Social Sciences)

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 Exam policy: This exam allows one one-page, two-sided cheat sheet; No other materials. Time: 80 minutes. Be sure to write your name and

More information

Aortic Stenosis and Perioperative Risk With Non-cardiac Surgery

Aortic Stenosis and Perioperative Risk With Non-cardiac Surgery Aortic Stenosis and Perioperative Risk With Non-cardiac Surgery Aortic stenosis (AS) is characterized as a high-risk index for cardiac complications during non-cardiac surgery. A critical analysis of old

More information

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes Content Quantifying association between continuous variables. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General

More information

Article from. Forecasting and Futurism. Month Year July 2015 Issue Number 11

Article from. Forecasting and Futurism. Month Year July 2015 Issue Number 11 Article from Forecasting and Futurism Month Year July 2015 Issue Number 11 Calibrating Risk Score Model with Partial Credibility By Shea Parkes and Brad Armstrong Risk adjustment models are commonly used

More information

Online Supplementary Appendix

Online Supplementary Appendix Online Supplementary Appendix This appendix has been provided by the authors to give readers additional information about their work. Supplement to: Lehman * LH, Saeed * M, Talmor D, Mark RG, and Malhotra

More information

Development, validation and application of risk prediction models

Development, validation and application of risk prediction models Development, validation and application of risk prediction models G. Colditz, E. Liu, M. Olsen, & others (Ying Liu, TA) 3/28/2012 Risk Prediction Models 1 Goals Through examples, class discussion, and

More information

Identifying Susceptibility in Epidemiology Studies: Implications for Risk Assessment. Joel Schwartz Harvard TH Chan School of Public Health

Identifying Susceptibility in Epidemiology Studies: Implications for Risk Assessment. Joel Schwartz Harvard TH Chan School of Public Health Identifying Susceptibility in Epidemiology Studies: Implications for Risk Assessment Joel Schwartz Harvard TH Chan School of Public Health Risk Assessment and Susceptibility Typically we do risk assessments

More information

Load and Function - Valvular Heart Disease. Tom Marwick, Cardiovascular Imaging Cleveland Clinic

Load and Function - Valvular Heart Disease. Tom Marwick, Cardiovascular Imaging Cleveland Clinic Load and Function - Valvular Heart Disease Tom Marwick, Cardiovascular Imaging Cleveland Clinic Indications for surgery in common valve lesions Risks Operative mortality Failed repair - to MVR Operative

More information

Systematic reviews of prediction modeling studies: planning, critical appraisal and data collection

Systematic reviews of prediction modeling studies: planning, critical appraisal and data collection Systematic reviews of prediction modeling studies: planning, critical appraisal and data collection Karel GM Moons, Lotty Hooft, Hans Reitsma, Thomas Debray Dutch Cochrane Center Julius Center for Health

More information

CSE 258 Lecture 2. Web Mining and Recommender Systems. Supervised learning Regression

CSE 258 Lecture 2. Web Mining and Recommender Systems. Supervised learning Regression CSE 258 Lecture 2 Web Mining and Recommender Systems Supervised learning Regression Supervised versus unsupervised learning Learning approaches attempt to model data in order to solve a problem Unsupervised

More information

Score Tests of Normality in Bivariate Probit Models

Score Tests of Normality in Bivariate Probit Models Score Tests of Normality in Bivariate Probit Models Anthony Murphy Nuffield College, Oxford OX1 1NF, UK Abstract: A relatively simple and convenient score test of normality in the bivariate probit model

More information

Bringing machine learning to the point of care to inform suicide prevention

Bringing machine learning to the point of care to inform suicide prevention Bringing machine learning to the point of care to inform suicide prevention Gregory Simon and Susan Shortreed Kaiser Permanente Washington Health Research Institute Don Mordecai The Permanente Medical

More information

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections New: Bias-variance decomposition, biasvariance tradeoff, overfitting, regularization, and feature selection Yi

More information

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Data Analysis in Practice-Based Research Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Multilevel Data Statistical analyses that fail to recognize

More information

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018 Introduction to Machine Learning Katherine Heller Deep Learning Summer School 2018 Outline Kinds of machine learning Linear regression Regularization Bayesian methods Logistic Regression Why we do this

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Lecture II: Difference in Difference and Regression Discontinuity

Lecture II: Difference in Difference and Regression Discontinuity Review Lecture II: Difference in Difference and Regression Discontinuity it From Lecture I Causality is difficult to Show from cross sectional observational studies What caused what? X caused Y, Y caused

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Name Date Per Key Vocabulary: response variable explanatory variable independent variable dependent variable scatterplot positive association negative association linear correlation r-value regression

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

12/30/2017. PSY 5102: Advanced Statistics for Psychological and Behavioral Research 2

12/30/2017. PSY 5102: Advanced Statistics for Psychological and Behavioral Research 2 PSY 5102: Advanced Statistics for Psychological and Behavioral Research 2 Selecting a statistical test Relationships among major statistical methods General Linear Model and multiple regression Special

More information

TOTAL HIP AND KNEE REPLACEMENTS. FISCAL YEAR 2002 DATA July 1, 2001 through June 30, 2002 TECHNICAL NOTES

TOTAL HIP AND KNEE REPLACEMENTS. FISCAL YEAR 2002 DATA July 1, 2001 through June 30, 2002 TECHNICAL NOTES TOTAL HIP AND KNEE REPLACEMENTS FISCAL YEAR 2002 DATA July 1, 2001 through June 30, 2002 TECHNICAL NOTES The Pennsylvania Health Care Cost Containment Council April 2005 Preface This document serves as

More information

SUPPLEMENTARY INFORMATION. Table 1 Patient characteristics Preoperative. language testing

SUPPLEMENTARY INFORMATION. Table 1 Patient characteristics Preoperative. language testing Categorical Speech Representation in the Human Superior Temporal Gyrus Edward F. Chang, Jochem W. Rieger, Keith D. Johnson, Mitchel S. Berger, Nicholas M. Barbaro, Robert T. Knight SUPPLEMENTARY INFORMATION

More information

Risk prediction in inherited conditions Laminopathies

Risk prediction in inherited conditions Laminopathies Risk prediction in inherited conditions Laminopathies Karim Wahbi Cochin hospital, Paris karim.wahbi@aphp.fr Risk prediction in laminopathies Current approach for risk stratification A new score to predict

More information

Supplementary appendix

Supplementary appendix Supplementary appendix This appendix formed part of the original submission and has been peer reviewed. We post it as supplied by the authors. Supplement to: Callegaro D, Miceli R, Bonvalot S, et al. Development

More information

Experience with 500 Stentless Aortic Valve Replacements

Experience with 500 Stentless Aortic Valve Replacements Experience with 500 Stentless Aortic Valve Replacements Dimitrios C. Iliopoulos, MD Cardiac Surgeon Ass. Professor of Surgery University of Athens, School of Medicine I declare no conflict of interest

More information

Classical Psychophysical Methods (cont.)

Classical Psychophysical Methods (cont.) Classical Psychophysical Methods (cont.) 1 Outline Method of Adjustment Method of Limits Method of Constant Stimuli Probit Analysis 2 Method of Constant Stimuli A set of equally spaced levels of the stimulus

More information

Introduction to ROC analysis

Introduction to ROC analysis Introduction to ROC analysis Andriy I. Bandos Department of Biostatistics University of Pittsburgh Acknowledgements Many thanks to Sam Wieand, Nancy Obuchowski, Brenda Kurland, and Todd Alonzo for previous

More information