Analyzing binary outcomes, going beyond logistic regression

Similar documents
Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Table. A [a] Multiply imputed. Outpu

Simple Linear Regression the model, estimation and testing

Use of GEEs in STATA

Correlation and regression

Logistic regression. Department of Statistics, University of South Carolina. Stat 205: Elementary Statistics for the Biological and Life Sciences

Modeling Binary outcome

Limited dependent variable regression models

m 11 m.1 > m 12 m.2 risk for smokers risk for nonsmokers

Pitfalls in Linear Regression Analysis

The Pretest! Pretest! Pretest! Assignment (Example 2)

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.

MEASURES OF ASSOCIATION AND REGRESSION

7 Statistical Issues that Researchers Shouldn t Worry (So Much) About

Daniel Boduszek University of Huddersfield

Ordinary Least Squares Regression

1. You want to find out what factors predict achievement in English. Develop a model that

Daniel Boduszek University of Huddersfield

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

THE STATSWHISPERER. Introduction to this Issue. Doing Your Data Analysis INSIDE THIS ISSUE

Statistical reports Regression, 2010

SW 9300 Applied Regression Analysis and Generalized Linear Models 3 Credits. Master Syllabus

Simple Linear Regression

MEA DISCUSSION PAPERS

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale.

STATISTICS INFORMED DECISIONS USING DATA

Estimating Heterogeneous Choice Models with Stata

Problem Set 3 ECN Econometrics Professor Oscar Jorda. Name. ESSAY. Write your answer in the space provided.

investigate. educate. inform.

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine

Regression so far... Lecture 22 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

CLASSICAL AND. MODERN REGRESSION WITH APPLICATIONS

Daniel Boduszek University of Huddersfield

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Business Statistics Probability

appstats26.notebook April 17, 2015

Introduction to Econometrics

SUMMER 2011 RE-EXAM PSYF11STAT - STATISTIK

Moderation in management research: What, why, when and how. Jeremy F. Dawson. University of Sheffield, United Kingdom

Marno Verbeek Erasmus University, the Netherlands. Cons. Pros

Midterm project due next Wednesday at 2 PM

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Generalized Estimating Equations for Depression Dose Regimes

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach

IAPT: Regression. Regression analyses

How to analyze correlated and longitudinal data?

Kidane Tesfu Habtemariam, MASTAT, Principle of Stat Data Analysis Project work

Chapter 13 Estimating the Modified Odds Ratio

Applications of Regression Models in Epidemiology

Differential Item Functioning

NPTEL Project. Econometric Modelling. Module 14: Heteroscedasticity Problem. Module 16: Heteroscedasticity Problem. Vinod Gupta School of Management

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations)

Clincial Biostatistics. Regression

Fitting discrete-data regression models in social science

One-Way Independent ANOVA

Validity, Reliability and Classical Assumptions

Sociology Exam 3 Answer Key [Draft] May 9, 201 3

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Applied Medical. Statistics Using SAS. Geoff Der. Brian S. Everitt. CRC Press. Taylor Si Francis Croup. Taylor & Francis Croup, an informa business

In this module I provide a few illustrations of options within lavaan for handling various situations.

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Ball State University

A SAS Macro for Adaptive Regression Modeling

Poisson regression. Dae-Jin Lee Basque Center for Applied Mathematics.

Tutorial 3: MANOVA. Pekka Malo 30E00500 Quantitative Empirical Research Spring 2016

Before we get started:

Longitudinal and Hierarchical Analytic Strategies for OAI Data

CHAPTER TWO REGRESSION

Getting started with Eviews 9 (Volume IV)

Daniel Boduszek University of Huddersfield

Political Science 15, Winter 2014 Final Review

Modern Regression Methods

Applied Regression The University of Texas at Dallas EPPS 6316, Spring 2013 Tuesday, 7pm 9:45pm Room: FO 2.410

Analysis and Interpretation of Data Part 1

Technical Specifications

Regression Analysis II

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.

Dan Byrd UC Office of the President

STA 3024 Spring 2013 EXAM 3 Test Form Code A UF ID #

2 Assumptions of simple linear regression

Assessing Studies Based on Multiple Regression. Chapter 7. Michael Ash CPPA

Chapter 3: Examining Relationships

Substantive Significance of Multivariate Regression Coefficients

Basic Biostatistics. Chapter 1. Content

MS&E 226: Small Data

Regression Including the Interaction Between Quantitative Variables

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA

CHILD HEALTH AND DEVELOPMENT STUDY

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.

Media, Discussion and Attitudes Technical Appendix. 6 October 2015 BBC Media Action Andrea Scavo and Hana Rohan

Analytic Strategies for the OAI Data

Sample size and power calculations in Mendelian randomization with a single instrumental variable and a binary outcome

Objective: To describe a new approach to neighborhood effects studies based on residential mobility and demonstrate this approach in the context of

Index. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI /

Department of Allergy-Pneumonology, Penteli Children s Hospital, Palaia Penteli, Greece

Analysis of Confidence Rating Pilot Data: Executive Summary for the UKCAT Board

Transcription:

Analyzing binary outcomes, going beyond logistic regression 2018 EHE Forum presentation James O. Uanhoro Department of Educational Studies

Premise Obtaining relative risk using Poisson regression Obtaining risk difference using linear regression (OLS) Summary 2/33

3/33

Logistic regression review Researchers perform logistic regression to analyze binary outcomes: The coefficients obtained from the logistic regression model are logits or log odds Logits are difficult to interpret Researchers often exponentiate logits to obtain odds ratios to ease interpretation 4/33

Logistic regression review (example) We observed the referral rates of 189 children to remedial reading programs. On average, 31% of children were referred. We would hope that the only determinant of referral would be reading ability, so we test if there are sex differences in assignment, while controlling for reading ability. 5/33

Logistic regression review (model results) Results: 1. As reading has a negative coefficient, children with higher reading abilities were less likely to be recommended to the remedial program. 6/33

Logistic regression review (model results) Results: 2. Comparing children with the same reading ability, the odds of boys getting recommended to the program was on average 1.91 times the odds of girls getting recommended to the program, p =.046. 7/33

Logistic regression review (in closing) The problem: 1. I m yet to meet someone who understands odds intuitively - I don t know many gamblers... 2. We usually want to compare probabilities. We want to be able to say boys were x times more likely than girls to... This value is known as the relative risk or risk ratio. 8/33

Logistic regression review (in closing) The problem: 3. Many researchers simply interpret the odds ratio as if it were the risk ratio. This would only be acceptable if the rate of referral to the remedial program was under 5%. 4. This problem exists because in logistic regression, we model log odds. It is possible to calculate the risk ratio in logistic regression, but it requires extra math. 9/33

Poisson regression as an alternative Poisson regression can be used as an alternative to model binary data: In Poisson regression, we model the log probabilities. If we exponentiate the model coefficients, we obtain the relative risk. However in Poisson regression, we assume the mean of the data equals its variance. 10/33

Poisson regression (Mean-variance relationship) Mean Variance assumption for Poisson model Mean Variance relationship of binary data Variance (p) 0.0 0.2 0.4 0.6 0.8 1.0 Variance (p * (1 p)) 0.00 0.05 0.10 0.15 0.20 0.25 0.0 0.2 0.4 0.6 0.8 1.0 Mean (p) 0.0 0.2 0.4 0.6 0.8 1.0 Mean (p) 11/33

Problems: Poisson regression assumes the mean of the data equals the variance. In binary data, the mean exceeds the variance. If we proceed, we will be assuming the data are under-dispersed Poisson. The regression coefficients are fine, but the standard errors, hence p-values, are incorrect. We can attempt to address this problem using robust error variances (or quasi-poisson models - not available in SPSS). 12/33

Poisson regression (sample syntax) We can use SPSS GENLIN for Poisson regression with robust errors. Sample syntax: GENLIN Remedial BY Boy (ORDER=DESCENDING) WITH Reading /MODEL Boy Reading INTERCEPT=YES DISTRIBUTION=POISSON LINK=LOG /CRITERIA COVB=ROBUST /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION. These two lines are particularly important: DISTRIBUTION=POISSON LINK=LOG /CRITERIA COVB=ROBUST 13/33

Poisson regression (model results) 14/33

Poisson regression (Google to the rescue) Results: 1. Comparing children with the same reading ability, boys were on average 1.54 times more likely than girls to be recommended to the program, p =.053. 2. Result is not statistically significant. A problem with Poisson alternatives to the logistic regression model is a small loss of statistical power. 15/33

Poisson regression (Take-aways) If anyone interpreted the odds ratio as a comparison of probabilities, they would overstate the difference in referral between boys and girls. Poisson models and modifications may suffer from a loss in statistical power. If a study was well-powered to perform logistic regression, Poisson alternatives would likely be sufficiently powered. There is one more alternative that does not suffer the loss in statistical power. 16/33

Linear regression or the linear probability model Commonplace linear regression using OLS estimation directly models probabilities when the data are binary. In econometrics, this model is known as the Linear Probability Model (LPM). 17/33

Linear regression or the linear probability model Nice features of the LPM: 1. The coefficient of the sex variable would be the difference in probabilities of recommendation to remedial reading between boys and girls. 2. If the rates of recommendation to remedial reading lie between 20% and 80%, the OLS slope approximates the logistic slope. 3. Despite violating homoscedasticity with binary data, standard errors from OLS may still perform adequately. 18/33

Linear regression (sample syntax) We can use SPSS GENLIN for linear regression with robust errors (as a safety net). First, we center the reading scores to make the intercept meaningful. COMPUTE Reading_c = Reading - 65. EXECUTE. GENLIN Remedial BY Boy (ORDER=DESCENDING) WITH Reading_c /MODEL Boy Reading_c INTERCEPT=YES DISTRIBUTION=NORMAL LINK=IDENTITY /CRITERIA COVB=ROBUST /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION. 19/33

Linear regression (model results) Results: 1. The probability of referral to remedial reading for girls with average reading ability was, on average, 24.4%. 20/33

Linear regression (model results) Results: 2. The probability of referral to remedial reading for boys with average reading ability was on average 13.6% higher than it was for girls of average reading ability, p =.039. 21/33

Linear regression (model results) Results: 3. The probability of boys with average reading ability to be referred to remedial reading was, on average, 38% (.244 +.136). 22/33

Linear regression (model results) Results: 4. Boys with average reading ability were 56% (.136.244) more likely to be referred to remedial reading than girls with average reading ability. Poisson value was 54% more. 23/33

Linear regression (model results) You could also get the odds ratio, but why: Logistic regression gave us 1.91..38 1.38.244 1.244 = 1.90 (1) 24/33

Why doesn t anyone do this already? 1. They sometimes do, just not in education. Besides, if we have low or high probabilities, it is unreliable. 2. OLS can return predicted probabilities less than 0 and greater than 1. However, this is usually not a problem if we are interested in average effects. 25/33

Why doesn t anyone do this already? 3. The error of binary data is not homoskedastic - an assumption for correct standard errors with OLS. However, we can address this using robust standard errors as we have done above. Alternatives include weighted least squares. 4. Predictor variables with high leverage (outliers) can easily bias the model; the logistic model is more robust to outliers on the predictor variables. Most software packages can return leverage values on request, after conducting regression analysis. 26/33

Practical advice 1. Calculate the average probability of the outcome. If it is close to 50%, a linear regression model would work just as fine as a logistic regression model. If it is less than 20% or greater than 80%, a linear model would probably be inadequate. However, beware of cases with high leverage. They can throw off your linear model very easily. 27/33

Practical advice 2. If the probability of the outcome is less than 5%, you can freely interpret the exponentiated logistic regression coefficient using likely language. At low probabilities, odds approximately equal probabilities, so the odds ratio approximates the risk ratio. 28/33

Practical advice 3. If you run a Poisson regression so you can interpret your results using likely language, but your p-values are not statistically significant and they are stat. sig. in the logistic regression model, take the extra minute or two to do the math to obtain the risk ratio using the logistic regression model. You should probably center continous predictors before doing this. The intercept changed to -1.165 on doing this: 1 1+exp( ( 1.165+.647)) 1 1+exp( ( 1.165)) = 1.57 (2) 29/33

Practical advice 4. These results extend to multilevel models. If you find that your multilevel logistic model is taking too much time, a linear multilevel model might suffice. 30/33

Questions?? https://ghostbin.com/paste/ou6ds 31/33

Useful references Cheung, Y. B. (2007). A modified least-squares regression approach to the estimation of risk difference. American Journal of Epidemiology, 166(11), 1337 1344. https://doi.org/10.1093/aje/kwm223 Hellevik, O. (2009). Linear versus logistic regression when the dependent variable is a dichotomy. Quality & Quantity, 43(1), 59 74. https://doi.org/10.1007/s11135-007-9077-3 Zou, G. (2004). A modified Poisson regression approach to prospective studies with binary data. American Journal of Epidemiology, 159(7), 702 706. https://doi.org/10.1093/aje/kwh090 Zou, G., & Donner, A. (2013). Extension of the modified Poisson regression model to prospective studies with correlated binary data. Statistical Methods in Medical Research, 22(6), 661 670. https://doi.org/10.1177/0962280211427759 32/33

Table of Contents Premise Obtaining relative risk using Poisson regression Obtaining risk difference using linear regression (OLS) Summary 33/33