What is Multilevel Modelling Vs Fixed Effects. Will Cook Social Statistics


Intro: Multilevel models are commonly employed in the social sciences with hierarchically structured data. Estimated effects from multilevel models are easily criticised as being driven by confounding variables. Fixed effects models are an alternative that deals with this weakness and supports causal conclusions. There are, however, a number of drawbacks, including limits on the scope of the modelling that can be undertaken.

Intro: Preference for one of the two methods is partly driven by disciplinary norms: sociology and education favour multilevel models; economics favours fixed effects models. Note on terminology: multilevel models (sometimes abbreviated MLM) are also called random effects models. Confusingly, it is common to refer to the coefficient estimates on explanatory variables in multilevel models as fixed effects!

Examples used: cross-sectional hierarchies, illustrated by peer effects in schools (contextual effects); longitudinal hierarchies, illustrated by the effect of divorce on children.

Outline: brief overview of multilevel models; the causal inference problem; fixed effects models; problems with fixed effects models; which should I use?

Much social data is hierarchically structured

[Diagram: pupils nested in schools. School A and School B at the upper level; pupils in school A and pupils in school B below.]

Other cross-sectional hierarchical data: citizens within countries; people within neighbourhoods; siblings within families; players within a sports team.

Longitudinal hierarchical data: observations on an individual (or other unit of analysis) across time. [Diagram: Individual A and Individual B at the upper level, each with observations 1 to 6 below.]

More complex structures: individuals observed over time within a family, within a school; the same, accounting for changes in the school attended; the same, accounting for family members attending different schools.

Much social data is hierarchically structured: Multilevel modelling allows us to recognise this in our model by assuming that the error term in a regression (i.e. everything that is not explained by the explanatory variables) is structured according to the known hierarchy. E.g. $y_{ij} = b_0 + b_1 x_{ij} + b_2 p_j + e_{ij}$ becomes $y_{ij} = b_0 + b_1 x_{ij} + b_2 p_j + v_{ij} + u_j$. The error is partitioned into a level-one part $v_{ij}$ and a higher-level part $u_j$, with the same assumptions as in ordinary OLS (independence, normality, homoscedasticity). This means that the standard errors on the coefficients ($b$) are not biased downwards, which reduces the risk of type I error. It also provides a structure for estimating how relationships vary between contexts (random slopes and cross-level interactions).
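A minimal sketch of what partitioning variation means, assuming invented test scores for two schools. It splits the total sum of squares of the outcome into a between-school part and a within-school part. This is a descriptive decomposition only, not the variance component estimates that a multilevel package would report; the names `scores` and `partition_sum_of_squares` are illustrative.

```python
from statistics import mean

# Hypothetical test scores for pupils in two schools (illustrative numbers only).
scores = {
    "A": [52.0, 55.0, 58.0, 61.0],
    "B": [64.0, 66.0, 70.0, 72.0],
}


def partition_sum_of_squares(groups):
    """Split total variation in the outcome into between- and within-group parts."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    # Between: how far each school mean sits from the grand mean, weighted by size.
    between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Within: how far each pupil sits from their own school mean.
    within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    total = sum((y - grand_mean) ** 2 for y in all_values)
    return between, within, total


between, within, total = partition_sum_of_squares(scores)
share_between = between / total
print(f"between={between:.2f} within={within:.2f} total={total:.2f}")
print(f"share of variation lying between schools: {share_between:.2f}")
```

The larger the between-school share, the more the pupils in a school resemble one another, and the more misleading standard errors that treat pupils as independent are likely to be.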

Example I: Peer effects in schools

[Diagram: pupils nested in School A and School B; the comparison is between the two schools.]

Effect of SEN (special educational needs) inclusion on pupil test scores (attainment): Estimating the effect of SEN peers becomes a comparison between the outcomes in school A and school B. Controlling for observed variables, the estimated effect is simply the mean of pupil attainment in school A minus the mean of pupil attainment in school B.

Example II: Effect of divorce on children

Effect of divorce on children's behaviour. [Diagram: Individual A and Individual B, each with observations 1 to 6; comparisons can be made between individuals and within each individual's observations.]

The estimated effect is driven both by comparing outcomes between individuals and by the change in outcomes within individuals. Multilevel modelling generates the correct standard errors and efficiently weights the between and within variation, based on the residual variances within and between individuals, to generate the estimated effect.
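One way to see this weighting is the standard textbook "quasi-demeaning" form of the random-intercept estimator. The sketch below assumes balanced data with $n$ observations per higher-level unit; the symbols $\theta$, $n$, $\sigma_u^2$ (variance of $u_j$), $\sigma_v^2$ (variance of $v_{ij}$) and the bars for unit means are notation introduced here, not taken from the slides.

```latex
\theta = 1 - \sqrt{\frac{\sigma_v^2}{\sigma_v^2 + n\,\sigma_u^2}}

y_{ij} - \theta\,\bar{y}_j
  = (1-\theta)\,b_0
  + b_1\,(x_{ij} - \theta\,\bar{x}_j)
  + (1-\theta)\,b_2\,p_j
  + \bigl[(1-\theta)\,u_j + v_{ij} - \theta\,\bar{v}_j\bigr]
```

When $\sigma_u^2 = 0$ we get $\theta = 0$ and the model is ordinary pooled OLS, which uses between and within variation alike. As $n\,\sigma_u^2$ grows relative to $\sigma_v^2$, $\theta$ approaches 1 and the variables approach pure deviations from unit means, so the estimate leans almost entirely on within variation.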


Is your data appropriate for MLM? Higher-level units need to be treated as if they were a sample, which requires a sufficient sample size (N > 50?) and random sampling. Many MLM analyses do not conform to this, especially when the higher-level unit is a country (see Mohring, 2012).

Causal inference problem: To generate unbiased estimates in regression modelling, we assume that the explanatory variables are uncorrelated with the error term. The key problem in multilevel models is that the variation between higher-level units that generates your coefficient estimates might be related to unobserved confounding variables. Problem in Example I: we cannot distinguish the effect of being in a group (e.g. a school or neighbourhood) from the reason for being in that group (Hoxby, 2000). Problem in Example II: the propensity for individuals to experience a change in the variable of interest is often determined by pre-existing variables that vary between individuals and also affect outcomes.
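A small illustration of the problem, assuming invented data in which the true effect of the explanatory variable is fixed at 1 and an unobserved school-level factor is higher in schools where the explanatory variable is also higher. All names and numbers (`TRUE_EFFECT`, `unobserved_school_factor`, `explanatory`) are hypothetical.

```python
from statistics import mean

TRUE_EFFECT = 1.0

# Hypothetical unobserved school-level factor (the u_j that ends up in the error term).
unobserved_school_factor = {"A": 0.0, "B": 5.0, "C": 10.0}

# Hypothetical explanatory variable: its level is higher in schools with a higher u_j.
explanatory = {
    "A": [1.0, 2.0, 3.0],
    "B": [3.0, 4.0, 5.0],
    "C": [5.0, 6.0, 7.0],
}

# Outcome built without noise so the source of the bias is easy to see.
pairs = [
    (x, TRUE_EFFECT * x + unobserved_school_factor[school])
    for school, xs in explanatory.items()
    for x in xs
]


def ols_slope(pairs):
    """Slope from a simple regression of y on x, ignoring the grouping."""
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx


pooled_slope = ols_slope(pairs)
print(f"true effect: {TRUE_EFFECT}, pooled estimate: {pooled_slope:.2f}")
```

Because the unobserved factor moves with the explanatory variable across schools, the regression credits its influence to the explanatory variable, and the pooled estimate is well above the true effect.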

Problems with causal inference from observational studies. [Diagram: an unobserved confounder affects both the explanatory variable and the outcome variable.]

Example I: Sources of bias. SEN pupils might go to certain types of school. Pupils might be more likely to encounter SEN peers because of unobserved pupil characteristics that are related to attainment. Parents may actively choose schools that do not have SEN peers; parental motivation is an unobserved pupil characteristic.

Problems with causal inference in a peer effect study. [Diagram: parental motivation affects both the % SEN experienced by the child and test scores.]

Example II: Sources of bias: socio-economic deprivation; parenting style (linked to many other factors); number and gender of siblings.

Problems with causal inference in a longitudinal study. [Diagram: parental stress affects both divorce and the child's behaviour.]


Fixed effects overview: Fixed effects models eliminate any variation between higher-level units from coefficient estimation; they rule out between variation. This means that (fixed) unobserved differences between higher-level units (e.g. individuals, schools, neighbourhoods) no longer bias our estimates. Estimation is straightforward (OLS); in Stata it is implemented with the xtreg, fe command.

Fixed effects dummy variable model: In fixed effects models, the higher-level effect is no longer part of the residual. Instead, fixed effects models can be thought of as including a dummy variable for each higher-level unit. These dummy variables control for all variation (observed and unobserved) at the higher level.

Within transformation: Another way of thinking about fixed effects models is that they transform the variables into deviations from the higher-level unit mean. The question actually being put to the data is whether the deviation of the outcome variable from its (group- or individual-level) mean is related to the corresponding deviations of the variables of interest from their means. This is called the within transformation and is what statistical packages usually estimate. The overall effect is the same: all between variation is removed.
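The equivalence of the two views can be checked on a toy example. The sketch below assumes invented data for three schools and a single explanatory variable; it estimates the slope once from group-demeaned variables and once from OLS with a dummy per school. The data and the helper names (`within_slope`, `dummy_variable_slope`, `solve`) are illustrative, and the hand-written solver stands in for whatever a statistical package would do.

```python
from statistics import mean

# Hypothetical pupil data: (school, x, y). Numbers are illustrative only.
rows = [
    ("A", 1.0, 1.5), ("A", 2.0, 1.8), ("A", 3.0, 3.2),
    ("B", 3.0, 8.1), ("B", 4.0, 9.4), ("B", 5.0, 9.9),
    ("C", 5.0, 14.7), ("C", 6.0, 16.2), ("C", 7.0, 17.1),
]


def within_slope(rows):
    """Slope from regressing school-demeaned y on school-demeaned x."""
    groups = sorted({g for g, _, _ in rows})
    x_bar = {g: mean(x for h, x, _ in rows if h == g) for g in groups}
    y_bar = {g: mean(y for h, _, y in rows if h == g) for g in groups}
    sxy = sum((x - x_bar[g]) * (y - y_bar[g]) for g, x, y in rows)
    sxx = sum((x - x_bar[g]) ** 2 for g, x, _ in rows)
    return sxy / sxx


def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dummy_variable_slope(rows):
    """Coefficient on x from OLS of y on x plus one dummy per school (no intercept)."""
    groups = sorted({g for g, _, _ in rows})
    design = [[x] + [1.0 if g == h else 0.0 for h in groups] for g, x, _ in rows]
    y = [yy for _, _, yy in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yy for r, yy in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)[0]


b_within = within_slope(rows)
b_dummy = dummy_variable_slope(rows)
print(f"within transformation: {b_within:.6f}")
print(f"dummy variable model:  {b_dummy:.6f}")
```

The two slopes agree up to floating-point error, which is why packages can use the cheaper within transformation rather than estimating one dummy coefficient per unit.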

Example I: Group fixed effects


Sources of bias: SEN pupils might go to certain types of school. Pupils might be more likely to encounter SEN peers because of unobserved pupil characteristics that are related to attainment. Parents may actively choose schools that do not have SEN peers; parental motivation is an unobserved pupil characteristic. Comparing pupils between schools may therefore produce biased results.

Comparison between cohorts: School fixed effects models using multiple cohorts attempt to avoid these biases. [Diagram: School A and School B, each with the cohorts completing primary school in 2007, 2008 and 2009; no comparison between schools.]

Comparison between cohorts: Multilevel models use both between-school and within-school comparisons. [Diagram: School A and School B, each with the cohorts completing primary school in 2007, 2008 and 2009; comparison between schools.]

Fixed effects models of peer effects tend to find smaller effects than multilevel models. Note that the fixed effects model estimates the peer effect by comparing pupils within the same school but in different cohorts. This assumes that cohort-to-cohort variation in peer groups within a school is random.

Example II: Individual fixed effects

Comparison between years, within individuals: Longitudinal multilevel models use all sources of variation in estimation. [Diagram: individuals observed in 2007, 2008 and 2009; comparison between individuals.]


Comparison between years, within individuals: Individual fixed effects eliminate all comparisons between individuals. [Diagram: individuals observed in 2007, 2008 and 2009; no comparison between individuals.]

Fixed effects models of the effect of divorce on childhood outcomes (e.g. behaviour) tend to find smaller effects than multilevel models, and in some cases zero effects. Note that the fixed effects model here estimates the effect of divorce by comparing outcomes for a particular individual before and after parental divorce; it excludes cases where there is no change of state over the period of analysis. This assumes that the timing of the divorce is random.
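The exclusion of no-change cases follows directly from the within transformation, as this sketch with an invented four-wave panel suggests. A child whose divorce indicator never changes has a demeaned indicator of zero in every wave, so that child adds nothing to the estimate. The panel and the names `panel`, `within_slope` and `changers` are hypothetical.

```python
from statistics import mean

# Hypothetical panel: child -> list of (parents_divorced, behaviour_score) per wave.
panel = {
    "child_1": [(0, 10.0), (0, 11.0), (1, 14.0), (1, 15.0)],  # divorce mid-panel
    "child_2": [(0, 9.0), (1, 12.0), (1, 12.0), (1, 13.0)],   # divorce after wave 1
    "child_3": [(0, 8.0), (0, 9.0), (0, 8.0), (0, 9.0)],      # never divorced
    "child_4": [(1, 16.0), (1, 15.0), (1, 17.0), (1, 16.0)],  # divorced throughout
}


def within_slope(panel):
    """Fixed effects slope: pooled regression of child-demeaned y on child-demeaned x."""
    sxy = 0.0
    sxx = 0.0
    for waves in panel.values():
        x_bar = mean(x for x, _ in waves)
        y_bar = mean(y for _, y in waves)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in waves)
        sxx += sum((x - x_bar) ** 2 for x, _ in waves)
    return sxy / sxx


# Keep only children whose divorce indicator takes more than one value.
changers = {
    child: waves
    for child, waves in panel.items()
    if len({x for x, _ in waves}) > 1
}

slope_all = within_slope(panel)
slope_changers = within_slope(changers)
print(f"all children: {slope_all:.4f}, changers only: {slope_changers:.4f}")
```

Dropping the two children with no change in state leaves the estimate unchanged, which also shows why power can be low when few units change state.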


Problem 1: Less variation. Lack of variation; measurement error; non-linearities; low power.

Problem 2: Elimination of variation of interest. Fixed effects exclude useful variation, though note that cross-level interactions are still possible. The equivalent of a random slopes model is unwieldy. In education research, removing the school level from the analysis can be problematic. Fixed effects do not control for time-varying heterogeneity, even at the higher level.


Hausman test: Formally, you can test whether to use a multilevel model with the Hausman test. It tests whether the coefficient estimates from the multilevel model differ statistically significantly from the fixed effects estimates (which are assumed to be unbiased). The test assumes a well-specified model.
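For a single coefficient the test statistic is simple enough to compute by hand. The sketch below is a simplified one-coefficient version, assuming you already have a fixed effects and a random effects (multilevel) estimate with their standard errors; the full test compares the whole coefficient vector using covariance matrices. The function name `hausman_single` and the input numbers are hypothetical.

```python
CHI2_1DF_5PCT = 3.841  # 5% critical value of the chi-square distribution with 1 df


def hausman_single(b_fe, se_fe, b_re, se_re):
    """Hausman statistic for one coefficient, to compare with chi-square on 1 df."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        # Can occur in finite samples; the simple form of the test is then undefined.
        raise ValueError("Var(FE) - Var(RE) must be positive")
    return (b_fe - b_re) ** 2 / var_diff


# Hypothetical estimates, for illustration only.
h = hausman_single(b_fe=-0.10, se_fe=0.05, b_re=-0.25, se_re=0.03)
reject_multilevel = h > CHI2_1DF_5PCT
print(f"Hausman statistic: {h:.4f}, reject multilevel estimate at 5%: {reject_multilevel}")
```

A large statistic means the two estimates differ by more than sampling variation would explain, which points towards the fixed effects model.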

Using judgement: Clarke et al (2010) recommend: "when the selection mechanism is fairly well understood and the researcher has access to rich data, the random effects model should naturally be preferred because it can produce policy-relevant estimates while allowing a wider range of research questions to be addressed." This would seem to suggest that fixed effects models should be preferred in most cases.

Use both? Multilevel models have powerful descriptive uses. If the data allow, multilevel estimates should be checked against those from fixed effects models. Discrepancies and similarities between multilevel and fixed effects estimates are informative in themselves. Adding contextual-level variables to a multilevel model (e.g. the means of level-one variables) should eliminate some bias, but may be problematic if there is a small number of higher-level units. It is straightforward to test both approaches (e.g. xtreg, re vs. xtreg, fe in Stata).

Summary: In choosing between the two methods you should consider two questions. Are you interested in estimating a causal relationship? Do you have concerns that unobserved higher-level variables may affect the estimation of this relationship? If both answers are yes, then fixed effects models should be preferred. There is no harm in using both methods and, where possible, both should be considered.
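The point about adding the means of level-one variables can be illustrated with a toy regression. The sketch below uses plain OLS rather than a fitted multilevel model and invented data for three schools: once the school mean of x enters as a second regressor, the coefficient on x itself matches the within (fixed effects) slope. The names `rows`, `within_slope` and `coefs_with_group_mean` are illustrative.

```python
from statistics import mean

# Hypothetical pupil data: (school, x, y). Numbers are illustrative only.
rows = [
    ("A", 1.0, 1.5), ("A", 2.0, 1.8), ("A", 3.0, 3.2),
    ("B", 3.0, 8.1), ("B", 4.0, 9.4), ("B", 5.0, 9.9),
    ("C", 5.0, 14.7), ("C", 6.0, 16.2), ("C", 7.0, 17.1),
]


def group_means_of_x(rows):
    groups = {g for g, _, _ in rows}
    return {g: mean(x for h, x, _ in rows if h == g) for g in groups}


def within_slope(rows):
    """Fixed effects slope from school-demeaned x and y."""
    x_bar = group_means_of_x(rows)
    groups = {g for g, _, _ in rows}
    y_bar = {g: mean(y for h, _, y in rows if h == g) for g in groups}
    sxy = sum((x - x_bar[g]) * (y - y_bar[g]) for g, x, y in rows)
    sxx = sum((x - x_bar[g]) ** 2 for g, x, _ in rows)
    return sxy / sxx


def cross(a, b):
    """Sum of cross-products of two lists around their own means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def coefs_with_group_mean(rows):
    """OLS of y on x and the school mean of x; returns (coef on x, coef on mean)."""
    x_bar = group_means_of_x(rows)
    x1 = [x for _, x, _ in rows]
    x2 = [x_bar[g] for g, _, _ in rows]
    y = [yy for _, _, yy in rows]
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


b_x, b_school_mean = coefs_with_group_mean(rows)
b_within = within_slope(rows)
print(f"coef on x with school mean added: {b_x:.6f}")
print(f"within (fixed effects) slope:     {b_within:.6f}")
print(f"coef on school mean of x:         {b_school_mean:.6f}")
```

The coefficient on the school mean picks up the difference between the between-school and within-school relationships; with only three schools, as here, it rests on very little information, which is the small-number-of-units caveat noted above.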

References
Allison, P. (2009) Fixed Effects Regression Models, Sage Quantitative Applications in the Social Sciences, vol. 160.
Clarke, P., Crawford, C., Steele, F. & Vignoles, A. (2010) The choice between fixed and random effects models: some considerations for educational research, Institute of Education DoQSS Working Paper No. 10-10.
Goldstein, H. (2010) Multilevel Statistical Models, 4th Edition, Wiley.
Mohring, K. (2012) The fixed effect as an alternative to multilevel analysis for cross-national analyses, GK Soclife working paper.