Multiple Analysis. Some Nomenclatures. Learning Objectives. A Weight Lifting Analysis. SCHOOL OF NURSING The University of Hong Kong


Multiple Analysis
Daniel Y.T. Fong
NURS8222 Statistical Practice in Health Sciences
SCHOOL OF NURSING, The University of Hong Kong

Some Nomenclatures

                                 Dependent/         Independent/
                                 Outcome variable   Explanatory variable
  Univariate Analyses
    Simple Analysis              1                  1
    Multiple Analysis /          1                  2 or more
    Multivariable Analysis
  Multivariate Analyses          2 or more          1 or more            (Out of our scope!)

Learning Objectives
1. To understand the possible complications with multiple independent variables
2. To learn methods of selecting predictive factors

A Weight Lifting Analysis: A Simple Regression Analysis
[Figure: a weight-lifter standing on a floor and lifting a weight]
  Weight-lifter: the independent variable
  Weight: outcome data of all subjects
  Floor: data of the independent variable from all subjects
  A full-length line means the independent variable of the subject was measured.
  A shortened line means the independent variable of the subject was NOT measured.

Who is More Likely to Win?
  1 weight-lifter (Simple Analysis): weaker, lower R²
  2 weight-lifters (Multiple Analysis): stronger, higher R²

Multiple, Multivariate
  A comment from the Referee: [image not reproduced]
  The Published Version: [image not reproduced] ~ Hepatology (2002)

Multiple Regression: New Complications

Unadjusted and Adjusted Effects of Age

Simple Regression Analysis (Unadjusted Effect)
  $PF = a + b_1 \cdot \text{age}$ (age in years)
  For 2 subjects whose ages differ by 1 year, their PF will differ by $b_1$, on average.

Multiple Regression Analysis (Adjusted Effect)
  $PF = a + b_1 \cdot \text{age} + b_2 \cdot \text{gender}$
  For 2 subjects of the same gender, if their ages differ by 1 year, their PF will differ by $b_1$, on average.
  Age provides information on PF in addition to gender.

Multiple Regression: Ideal
  All independent variables provide additional information to explain the outcome variable.
  There are no missing values for any independent variable.
  The independent variables are truly independent.
  Increased strength means increased R².

Effect of Ginseng
  Setting: multi-center
  Subjects: 133 cancer patients
  Outcome: 12-week change (Week 12 − Week 0) of General Health (SF-36)
  Placebo-controlled
  Stratified randomization by center

              Ginseng   Placebo   Total
  Center C    31        32        63
  Center D    35        35        70
  Total       66        67        133

  Center C (ID, Group)
    C01 Placebo   C02 Placebo   C03 Ginseng   C04 Ginseng   C05 Ginseng   C06 Placebo
    C07 Placebo   C08 Ginseng   C09 Ginseng   C10 Placebo   C11 Placebo   C12 Placebo
    C13 Placebo   C14 Placebo   C15 Ginseng   C16 Placebo   C17 Ginseng   C18 Ginseng

  Center D (ID, Group)
    D01 Placebo   D02 Ginseng   D03 Ginseng   D04 Placebo   D05 Placebo   D06 Placebo
    D07 Ginseng   D08 Placebo   D09 Placebo   D10 Ginseng   D11 Placebo   D12 Placebo
    D13 Ginseng   D14 Ginseng   D15 Placebo   D16 Ginseng   D17 Ginseng   D18 Ginseng

Multiple Regression versus Simple Regression
  Multiple regression (Group and Center): n = 133, R² = 12.1% (higher precision)
  Simple regression (Group only): n = 133, R² = 6.8%
  Simple regression (Center only): n = 133, R² = 5.4%
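The reading of b_1 as an adjusted effect in the age-and-gender model for PF can be checked with a short worked step. The sketch below compares two hypothetical subjects of the same gender g whose ages are x and x + 1 years; the symbols g and x are introduced here purely for illustration and do not appear in the slides.

```latex
\begin{aligned}
PF_{x+1} - PF_{x}
  &= \bigl(a + b_1 (x + 1) + b_2\, g\bigr) - \bigl(a + b_1 x + b_2\, g\bigr) \\
  &= b_1 .
\end{aligned}
```

The gender term b_2 g cancels because both subjects share the same gender, which is why b_1 is described as the effect of age "for 2 subjects of the same gender".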

Reality 1: Non-useful Variables
  Variable 2 provides no additional information to explain the outcome variable.

  Age is Not Useful: n = 133, R² = 13.0%

             Independent variable(s)   n     R²
  Model 1    Group                     133   6.8%
  Model 2    Center                    133   5.4%
  Model 3    Age                       133   1.4%
  Model 4    Group, Center             133   12.1%

Reality 2: Missing Values
  Variables 1 and 2 (which have more missing values) may decrease the overall power despite giving additional information.
  Variable 3 (which has no missing values) can effectively give more information.

  Stage has Missing Values: n = 103, R² = 40.8%
  (Model 4, Group and Center: n = 133, R² = 12.1%)

Reality 3: Multi-collinearity
  Data from independent variables 2 and 3 are associated.
  The data may not allow the consideration of both variables simultaneously.

  An Awkward Result: n = 133, R² = 13.6%. Insignificant!!

             Independent variable(s)   n     R²      P
  Model 1    Group                     133   6.8%    0.020
  Model 2    Center                    133   5.4%    0.002
  Model 5    Compliance                133   8.1%    0.001
  Model 4    Group, Center             133   12.1%   <0.001

Were Group and Compliance Associated?
  By which test? P < 0.001
  "Only one of them can stay." "Sorry! I'm also useful!"
  Chinese proverb: One hill cannot shelter two tigers.

Detecting Multi-collinearity
  Variance Inflation Factor (VIF = 1/Tolerance)
  A high VIF, or equivalently a low Tolerance, indicates a severe multi-collinearity problem.
  There is no defining threshold for "large".
  Some have suggested a Tolerance cut-off of 0.4.
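The relation VIF = 1/Tolerance can be made concrete with a small Python sketch. It assumes only two predictors, in which case the tolerance of either one is 1 − r², where r is their Pearson correlation; with more predictors, the tolerance of a predictor is 1 − R² from regressing it on all the others. The 0/1 data below are a made-up toy example, not the Ginseng study data, and the function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tolerance_and_vif(x1, x2):
    """Two-predictor case only: tolerance = 1 - r^2 and VIF = 1 / tolerance."""
    r = pearson_r(x1, x2)
    tolerance = 1.0 - r * r
    return tolerance, 1.0 / tolerance

# Hypothetical toy data (NOT from the study): two strongly associated 0/1 variables.
group = [0, 0, 0, 0, 1, 1, 1, 1]
compliance = [0, 0, 0, 1, 1, 1, 1, 1]

tolerance, vif = tolerance_and_vif(group, compliance)
print(f"tolerance = {tolerance:.2f}, VIF = {vif:.2f}")  # tolerance = 0.40, VIF = 2.50
```

In this toy case r² = 0.6, so the tolerance sits right at the 0.4 value mentioned in the slides.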

For an Independent Variable,
  Table columns: Univariable Regression, Multivariable Regression, Possible Explanation(s)?

From Insignificant to Significant: An Example ~ Altman (1991)
  [Table comparing Simple Analyses with Multiple Analysis; cell entries as extracted, in order: Significant, Significant, Insignificant, Insignificant, Significant, Insignificant]

Q & A 1. In a multivariable analysis, true or false?
  It is preferable to consider only independent variables that are significant in their univariable analyses.
  An independent variable may be more likely to be significant than in a univariable analysis.
  An insignificant independent variable is a result of either it being not useful or missing values.

Choosing the Best Model

Selecting the Best Predictors? Can we use R²?
  [Figure: General Health (dependent variable) with candidate predictors stage, age, center, group, compliance]
  The R² of a regression model measures the usefulness of the model predictors in predicting/explaining the dependent variable.
  Adding predictors never decreases R², and in practice almost always increases it.
  Selecting on R² therefore results in more predictors than needed (over-fitting).

Adjusted R²
  $R^2_{\text{adj}} = 1 - (1 - R^2)\,\dfrac{n - 1}{n - k - 1}$, where $n$ is the sample size and $k$ is the number of predictors.
  It accounts for the number of model predictors.
  It can be negative.
  It can be reduced by the addition of predictors.
  We have not exhausted all possible models!! Optimal?

Searching through All Possible Models?!
  k predictors: 2^k possible models
  2 predictors: 4 possible models
  3 predictors: 8 possible models
  5 predictors: 32 possible models
  10 predictors: 1024 possible models
  Need a better strategy!
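The adjusted R² formula and the 2^k count of candidate models can both be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, the R² and n values are taken from the model table earlier in the transcript (Model 3, Age; Model 4, Group and Center), and the predictor names come from the General Health figure.

```python
from itertools import combinations

def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1), with k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def all_subsets(predictors):
    """Every possible model, from the empty model to the full model."""
    return [subset
            for size in range(len(predictors) + 1)
            for subset in combinations(predictors, size)]

# R^2 values as reported in the slides (n = 133).
adj_age = adjusted_r2(0.014, n=133, k=1)           # Model 3: Age
adj_group_center = adjusted_r2(0.121, n=133, k=2)  # Model 4: Group, Center
adj_useless = adjusted_r2(0.0, n=133, k=1)         # a predictor explaining nothing

predictors = ["stage", "age", "center", "group", "compliance"]
models = all_subsets(predictors)

print(f"Age: {adj_age:.4f}; Group+Center: {adj_group_center:.4f}")
print(f"R^2 = 0 with one predictor: {adj_useless:.4f}")  # slightly negative
print(f"{len(predictors)} predictors -> {len(models)} possible models")
```

The third call illustrates the point that adjusted R² can be negative when a predictor explains essentially nothing.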

Automatic Variable Selection Procedures

1. Forward entry
   (1) Start with the empty model, i.e. no independent variables.
   (2) Add the candidate variable that is the most significant when added to the model.
   (3) Repeat (2) until no candidate variable is significant when added.

2. Backward removal
   (1) Start with the full model, i.e. all independent variables.
   (2) Remove the variable in the model that is the most insignificant.
   (3) Repeat (2) until there are no insignificant variables in the model.

3. (Forward) Stepwise
   (1) Start with the empty model, i.e. no independent variables.
   (2) Add the candidate variable that is the most significant when added to the model.
   (3) Remove the variable in the model that is the most insignificant.
   (4) Repeat (3) until there are no insignificant variables in the model.
   (5) Go back to (2), unless no candidate variable is significant when added.

Contrasting the Variable Selection Procedures

Forward
  Key characteristics:
    1. Start with the empty model
    2. Once a predictor is included, it will never be excluded
  Remarks:
    1. Easier to implement than the stepwise method

Backward
  Key characteristics:
    1. Start with the full model
    2. Once a predictor is excluded, it will never be included
  Remarks:
    1. Not desirable when there are many independent variables

Forward Stepwise
  Key characteristics:
    1. Start with the empty model
    2. A predictor can be included in and excluded from the model
  Remarks:
    1. More flexible
    2. Less easy to implement
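The forward entry procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the routine of any particular statistics package: instead of p-values it ranks candidates by their partial F statistic (within one step this gives the same ordering as "most significant") and stops when the best F falls below a hypothetical `f_to_enter` threshold of 4.0, which only roughly corresponds to the 5% level in large samples. The data are a made-up toy example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(data, y, names):
    """Residual sum of squares from least squares of y on an intercept plus the named columns."""
    cols = [[1.0] * len(y)] + [data[name] for name in names]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def forward_entry(data, y, f_to_enter=4.0):
    """Forward entry; f_to_enter is an assumed stopping threshold, not from the slides."""
    selected, candidates = [], list(data)
    current = rss(data, y, selected)  # empty model: intercept only
    while candidates:
        best_name, best_f, best_rss = None, 0.0, None
        for name in candidates:
            new = rss(data, y, selected + [name])
            df = len(y) - len(selected) - 2  # residual df after adding the candidate
            f = float("inf") if new <= 0 else (current - new) / (new / df)
            if f > best_f:
                best_name, best_f, best_rss = name, f, new
        if best_name is None or best_f < f_to_enter:
            break  # no remaining candidate is "significant" when added
        selected.append(best_name)
        candidates.remove(best_name)
        current = best_rss
    return selected

# Hypothetical toy data: y depends strongly on x1; x2 is essentially noise.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [2, 2, 1, 1, 3, 3, 1, 1, 2, 2],
}
y = [2.5, 3.7, 6.2, 7.6, 10.1, 12.3, 13.8, 16.4, 17.5, 19.9]

print(forward_entry(data, y))  # ['x1']
```

Backward removal and stepwise selection follow the same pattern, with a removal loop that drops the variable with the smallest partial F while it stays below a removal threshold.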

Selecting the Best Predictors?
  The set of best predictors may NOT be unique, especially when there is multi-collinearity.
  An excluded predictor may be significant on its own (simple regression), e.g. group and center are significant factors.
  Selecting predictors based on theoretical principles is often desirable, e.g. one may prefer BMI to height and weight.

A Major Critique of Using Variable Selection Procedures
  The chance of a false positive error is inflated by multiple testing and by the presence of influential observations.

Be Clear About What You Want
  1. Independent variables with a pre-determined association to examine
     There is no need for, and one should not use, any automated variable selection procedure here.
  2. Independent variables for prediction
     Automated variable selection procedures may be used.
     Preferably consider independent or cross validation of results.
  3. Independent variables for exploration
     Be cautious with automatic variable selection procedures and perform validation by all means.
     Preferably incorporate clinical insights on the associations among the variables, especially their causal relationships, and apply some structured regression analysis.
     Alternatively, one may consider more advanced statistical procedures such as L1 regularization and least angle regression (need a friend in statistics!).

Q & A 2. In a regression analysis when a variable selection procedure is performed, true or false?
  All independent variables that are not selected have no effects on the outcome/dependent variable.
  We may only consider independent variables that are significant in their simple analyses.
  R² is always larger than adjusted R².

Effect of Ginseng (continued): Interaction / Effect Modification

12-week Change of GH

                  Placebo       Ginseng       Difference ± SE (P)
                  Mean (SD)     Mean (SD)     (Ginseng − Placebo)
  Center C + D    16.1 (14.8)   23.6 (13.0)   7.5 ± 2.4 (P = 0.002)
  Center C        15.4 (9.7)    17.2 (6.2)    1.7 ± 2.1 (P = 0.402)
  Center D        16.7 (18.5)   29.3 (14.7)   12.6 ± 4.0 (P = 0.002)

Is the Ginseng effect different between the two centers?

Interaction effect
  [Figure: 12-week change of GH for Placebo and Ginseng, plotted separately for Center C and Center D; the Ginseng effect is 1.7 in Center C and 12.6 in Center D]

  Interaction effect of Group and Center
    = effect of Ginseng in Center D − effect of Ginseng in Center C
    = 12.6 − 1.7
    = 10.9
  SE = 4.6 (based on the whole sample)
  P = 0.021
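The interaction arithmetic can be retraced in a few lines of Python. The effect sizes and standard errors are those reported in the table; the variable names are illustrative. The second calculation is only a rough cross-check that treats the two center-specific estimates as independent and combines their standard errors directly, which is an assumption made here and not the slides' method (the slides report SE = 4.6 based on the whole sample).

```python
import math

# Ginseng minus Placebo differences and their SEs, as reported per center.
effect_c, se_c = 1.7, 2.1    # Center C
effect_d, se_d = 12.6, 4.0   # Center D

# Interaction = effect in Center D minus effect in Center C.
interaction = effect_d - effect_c

# Rough cross-check (assumption): SE of a difference of two independent estimates.
se_approx = math.sqrt(se_c ** 2 + se_d ** 2)

# Ratio of the interaction to the SE reported in the slides (4.6).
ratio_reported = interaction / 4.6

print(f"interaction = {interaction:.1f}")
print(f"approximate SE = {se_approx:.2f} (slides report 4.6)")
print(f"interaction / reported SE = {ratio_reported:.2f}")
```

The approximate standard error comes out close to, but slightly below, the pooled value reported in the slides.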

Deciding the Best Model: Some Suggestions
  Automated variable selection should not include any interaction effects.
  Examination of interaction effects is often based on clinical interest.
  Remove all insignificant interaction terms before interpreting their corresponding main effects.
  Keep all main effects when their interaction is significant.
  Incorporate clinical theory.
  Check for model validity.

FAQs
1. Will R² be higher even if I add a non-useful independent variable?
2. Will two independent variables with multi-collinearity have interaction?