Lecture 14: Adjusting for between- and within-cluster covariates in the analysis of clustered data May 14, 2009


Measurement, Design, and Analytic Techniques in Mental Health and Behavioral Sciences
Lecture 14: Adjusting for between- and within-cluster covariates in the analysis of clustered data
May 14, 2009
XH Andrew Zhou (azhou@u.washington.edu)
Professor, Department of Biostatistics, University of Washington

Adjustment for cluster-level covariates
If cluster-level covariates are available, we can directly adjust for their effects in a multilevel model.
If cluster-level covariates are not available, we can still adjust for cluster-level effects by including cluster means in the model, because variability in cluster means can confound the estimated association between the individual-level covariate and the outcome.
It has been shown that inference on the individual-level covariate can be misleading without adjusting for cluster-level means.

An example
Let us consider the impact of birth weight on childhood intelligence (IQ) as measured by the Wechsler Intelligence Scale for Children.
Most studies demonstrate that heavier babies tend to have higher IQs.
However, birth weight is known to be associated with family socio-economic status (SES), with families of higher status having larger babies.
The SES of a family is also related to the IQ measurements of the children in that family.
Thus, SES can act as a potential confounder of the relationship between birth weight and IQ (Begg and Parides, 2003).

Adjustment for cluster-level covariate
If we could measure SES by a covariate, we could include that cluster-level covariate in a regression model. However, SES is difficult to measure accurately.
Alternatively, if we had measures of birth weight and IQ for multiple siblings within the same family, we could use these data to make within-family comparisons that would be tightly controlled for SES.
We can also evaluate individual-level birth weight as a predictor of individual IQ, as well as the effect of family-averaged birth weight on IQ.

Adjustment, continued
This leads to a separation, or partitioning, of the effect of birth weight that corresponds to research hypotheses of interest.
Hence, by breaking down the birth weight effect into individual-level and family-level components, we are able to obtain better estimates of these effects and draw more accurate conclusions.

Notations
$Y_{ij}$: the outcome for subject $j$ in cluster $i$.
$X_{ij}$: a corresponding continuous covariate, $i = 1, \ldots, K$, $j = 1, \ldots, n_i$.
The mean covariate measurement for the group can be computed as
$$\bar{X}_i = \sum_{j=1}^{n_i} X_{ij} / n_i.$$
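As a minimal sketch of this notation, the snippet below computes the cluster mean $\bar{X}_i$ and the within-cluster deviation $X_{ij} - \bar{X}_i$ for each observation. The function names and the toy birth-weight values are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def cluster_means(cluster_ids, x):
    """Return {cluster id: mean of x over that cluster's members}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cid, value in zip(cluster_ids, x):
        totals[cid] += value
        counts[cid] += 1
    return {cid: totals[cid] / counts[cid] for cid in totals}

def split_covariate(cluster_ids, x):
    """Return (between, within): the cluster mean and the deviation from it."""
    means = cluster_means(cluster_ids, x)
    between = [means[cid] for cid in cluster_ids]
    within = [value - means[cid] for cid, value in zip(cluster_ids, x)]
    return between, within

# Hypothetical toy data: two families, birth weights in kg.
family = ["A", "A", "A", "B", "B"]
weight = [3.0, 3.5, 4.0, 2.5, 3.5]
between, within = split_covariate(family, weight)
print(between)
print(within)
```

The two returned lists are the covariates that enter the between- and within-cluster terms of the models considered in this lecture.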

Generalized linear models
To study the relationship between $Y$ and $X$, we consider the following five models, where $h$ is the link function:
$$h(E[Y_{ij} \mid X_{ij}]) = \beta_{01} + \beta_1 X_{ij}, \qquad (1)$$
$$h(E[Y_{ij} \mid X_{ij}]) = \beta_{02} + \beta_2 X_{ij} + \gamma_2 \bar{X}_i, \qquad (2)$$
$$h(E[Y_{ij} \mid X_{ij}]) = \beta_{03} + \beta_3 (X_{ij} - \bar{X}_i) + \gamma_3 \bar{X}_i, \qquad (3)$$
$$h(E[Y_{ij} \mid X_{ij}]) = \beta_{04} + \beta_4 (X_{ij} - \bar{X}_i), \qquad (4)$$
$$h(E[Y_{ij} \mid X_{ij}]) = \beta_{05} + \gamma_5 \bar{X}_i. \qquad (5)$$

Model interpretation
Model (2) and Model (3) are mathematically equivalent.
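The equivalence can be checked by expanding the centered term in Model (3). The short derivation below is a sketch that uses only the symbols already defined in Models (2) and (3).

```latex
\begin{align*}
\beta_{03} + \beta_3 (X_{ij} - \bar{X}_i) + \gamma_3 \bar{X}_i
  &= \beta_{03} + \beta_3 X_{ij} + (\gamma_3 - \beta_3)\,\bar{X}_i .
\end{align*}
% Matching terms with Model (2), \beta_{02} + \beta_2 X_{ij} + \gamma_2 \bar{X}_i:
\begin{align*}
\beta_{02} = \beta_{03}, \qquad
\beta_2 = \beta_3, \qquad
\gamma_2 = \gamma_3 - \beta_3
\quad\Longleftrightarrow\quad
\gamma_3 = \beta_2 + \gamma_2 .
\end{align*}
```

So the two models give the same fitted values and the same coefficient on the individual-level covariate; only the meaning of the coefficient on $\bar{X}_i$ changes.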

Interpretation of the coefficients
For Model (1), $\beta_1$ measures the change in the expectation corresponding to a one-unit increase in the covariate $X_{ij}$; this estimate is distorted by confounding due to cluster-level effects.
For Model (2), $\beta_2$ measures the change in the expectation corresponding to a one-unit increase in the covariate, given the same cluster-averaged $X$.
For Model (2), $\gamma_2$ can be interpreted as the effect of a one-unit increase in cluster-averaged $X$ on $Y$, holding the individual-level covariate $X_{ij}$ fixed. That is, given two subjects with the same individual covariate value, the subject whose cluster has a one-unit higher average $X$ is expected to have a $Y$ that is higher by about $\gamma_2$.

Interpretation of the coefficients, continued
For Model (3), $\beta_3$ measures the change in the expectation corresponding to a one-unit increase in the covariate, given the same cluster-averaged $X$.
For Model (3), $\gamma_3$ represents the mean difference in $Y$ associated with a one-unit increase in the cluster average $\bar{X}_i$ simultaneous with a one-unit increase in the individual $X_{ij}$.

Model comparison
Model (1) may lead to biased results and should not be used.
Model (2) is the model of choice for most purposes: it adjusts for cluster-level confounders and is easy to interpret.
Model (3) can also adjust for cluster-level confounders, but the interpretation of $\gamma_3$ is not easy.
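The comparison above can be illustrated with a small simulation. The sketch below assumes an identity link and a hypothetical data-generating process in which an unmeasured cluster effect `a` (playing the role of SES) drives both the covariate and the outcome; the true within-cluster slope is 1.0. The helper `ols` and all parameter values are assumptions made for this illustration, not part of the lecture.

```python
import random

def ols(columns, y):
    """Least-squares coefficients of y on the given columns (intercept first)."""
    X = [[1.0] + [col[r] for col in columns] for r in range(len(y))]
    p = len(X[0])
    # Normal equations A b = c with A = X'X and c = X'y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    c = [sum(row[i] * yr for row, yr in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [arj - f * akj for arj, akj in zip(A[r], A[k])]
                c[r] -= f * c[k]
    return [c[i] / A[i][i] for i in range(p)]

rng = random.Random(0)
K, n = 500, 4                      # hypothetical: 500 clusters of size 4
x, xbar, y = [], [], []
for _ in range(K):
    a = rng.gauss(0.0, 1.0)        # unmeasured cluster-level confounder
    xs = [0.8 * a + rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    for xv in xs:
        x.append(xv)
        xbar.append(m)
        y.append(2.0 * a + 1.0 * xv + rng.gauss(0.0, 1.0))
dev = [xv - m for xv, m in zip(x, xbar)]

fit1 = ols([x], y)                 # Model (1): X only
fit2 = ols([x, xbar], y)           # Model (2): X and cluster mean
fit3 = ols([dev, xbar], y)         # Model (3): centered X and cluster mean
print("Model 1 slope:", round(fit1[1], 3))
print("Model 2 beta, gamma:", round(fit2[1], 3), round(fit2[2], 3))
print("Model 3 beta, gamma:", round(fit3[1], 3), round(fit3[2], 3))
```

Under these assumptions the Model (1) slope is pulled well away from 1.0, while Models (2) and (3) give the same coefficient on the individual-level term and differ only in the coefficient on the cluster mean.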

Remarks
1. When covariate values vary within a cluster, the cluster-level mean covariate is often the most likely confounder of the association between the individual-level covariate and the response.
2. At the least, it may serve as a proxy for some relevant cluster-level characteristics.
3. When the clusters are relatively large (as in clinical center-based studies), the cluster mean can be estimated much more precisely than in data sets where cluster sizes are small (as in family studies).

Generalized linear mixed effects models (GLMM)
We can also consider the following five mixed effects models:
$$h(E[Y_{ij} \mid a_{1i}, X_{ij}]) = a_{1i} + \beta_1 X_{ij}, \qquad (7)$$
$$h(E[Y_{ij} \mid a_{2i}, X_{ij}]) = a_{2i} + \beta_2 X_{ij} + \gamma_2 \bar{X}_i, \qquad (8)$$
$$h(E[Y_{ij} \mid a_{3i}, X_{ij}]) = a_{3i} + \beta_3 (X_{ij} - \bar{X}_i) + \gamma_3 \bar{X}_i, \qquad (9)$$
$$h(E[Y_{ij} \mid a_{4i}, X_{ij}]) = a_{4i} + \beta_4 (X_{ij} - \bar{X}_i), \qquad (10)$$
$$h(E[Y_{ij} \mid a_{5i}, X_{ij}]) = a_{5i} + \gamma_5 \bar{X}_i, \qquad (11)$$
where $a_{ki} \sim N(\beta_{0k}, \sigma^2_{a_k})$ for $k = 1, \ldots, 5$, and $a_{ki}$ and $X_{ij}$ are independent.

SAS PROC GLIMMIX for GLMM
The METHOD= option (RSPL, MSPL, RMPL, or MMPL) specifies the estimation method in a generalized linear mixed model (GLMM). The default is METHOD=RSPL.
Estimation methods ending in "PL" are pseudo-likelihood techniques.
The first letter indicates whether estimation is based on a residual likelihood ("R") or a maximum likelihood ("M").
The second letter identifies the expansion locus for the underlying approximation: either the vector of random effects solutions ("S") or the mean of the random effects ("M").

NACC UDS
The National Alzheimer's Coordinating Center's (NACC) Uniform Data Set (UDS) is an ongoing longitudinal database of subjects seen at one of the 29 Alzheimer's Disease Centers (ADCs) funded by the National Institute on Aging and located throughout the USA.
Subjects seen at the ADCs represent a clinical sample of individuals who are referred to the clinic for evaluation of dementia, are self-referred to the clinic, or are recruited by clinics to participate in dementia research.
Longitudinal follow-up began in 2005. As of December 2008, 16,225 subjects aged 50 or older had at least one clinic visit. Up to four observations per subject were available, with 8,101 subjects having at least two observations.

Generalized linear models with GEE
We use the baseline visit of the NACC UDS data set.
Response ($Y$): Demented. Does the subject have dementia? (1 = Yes, 0 = No)
Covariate ($X$): Mini-Mental State Examination (MMSE) score (0-30).

MMSE
Any MMSE score over 27 (out of 30) is effectively normal.
Below this, a score of 20-26 indicates some cognitive impairment.
A score between 10 and 19 indicates moderate to severe cognitive impairment.
A score below 10 indicates very severe cognitive impairment.

Estimation for GLM Models 1 and 2

                           Model 1                          Model 2
Parameter                  Estimate  Std. err  p-value      Estimate  Std. err  p-value
Intercept                   1.0412   0.1551    <0.0001       1.5411   0.8654    0.0749
Cluster-mean MMSE                                           -0.0207   0.0323    0.5228
MMSE                       -0.0223   0.0069    <0.0013      -0.0217   0.0066    0.0009

Model 1: $\operatorname{logit}(P(Y_{ij} = 1)) = \beta_{01} + \beta_{11}\,\mathrm{MMSE}_{ij}$.
Model 2: $\operatorname{logit}(P(Y_{ij} = 1)) = \beta_{02} + \beta_{12}\,\mathrm{MMSE}_{ij} + \beta_{22}\,\overline{\mathrm{MMSE}}_i$.

Estimation for GLM Models 3 and 4

                           Model 3                          Model 4
Parameter                  Estimate  Std. err  p-value      Estimate  Std. err  p-value
Intercept                   1.5411   0.8654    0.0749        0.4855   0.1196    <0.0001
Cluster-mean MMSE          -0.0424   0.0350    0.2254
MMSE - cluster-mean MMSE   -0.0217   0.0066    0.0009       -0.0215   0.0062    0.0005

Model 3: $\operatorname{logit}(P(Y_{ij} = 1)) = \beta_{03} + \beta_{13}\,(\mathrm{MMSE}_{ij} - \overline{\mathrm{MMSE}}_i) + \beta_{23}\,\overline{\mathrm{MMSE}}_i$.
Model 4: $\operatorname{logit}(P(Y_{ij} = 1)) = \beta_{04} + \beta_{14}\,(\mathrm{MMSE}_{ij} - \overline{\mathrm{MMSE}}_i)$.

Estimation for GLM Model 5

Parameter                  Estimate  Std. err  p-value
Intercept                   1.5079   0.8235    0.0671
Cluster-mean MMSE          -0.0411   0.0330    0.2128

Model 5: $\operatorname{logit}(P(Y_{ij} = 1)) = \beta_{05} + \beta_{15}\,\overline{\mathrm{MMSE}}_i$.

GLMM
We now consider generalized linear mixed effects models.

Estimation for GLMM Models 1 and 2

                           Model 1                          Model 2
Parameter                  Estimate  Std. err  p-value      Estimate  Std. err  p-value
Intercept                   1.1001   0.1418    <0.0001       1.2721   1.2921    0.3327
Cluster-mean MMSE                                           -0.0068   0.05092   0.8937
MMSE                       -0.02335  0.00149   <0.0001      -0.02335  0.00149   <0.0001

Model 1: $\operatorname{logit}(P(Y_{ij} = 1)) = a_{i1} + \beta_{01} + \beta_{11}\,\mathrm{MMSE}_{ij}$.
Model 2: $\operatorname{logit}(P(Y_{ij} = 1)) = a_{i2} + \beta_{02} + \beta_{12}\,\mathrm{MMSE}_{ij} + \beta_{22}\,\overline{\mathrm{MMSE}}_i$.

Estimation for GLMM Models 3 and 4

                           Model 3                          Model 4
Parameter                  Estimate  Std. err  p-value      Estimate  Std. err  p-value
Intercept                   1.2721   1.2921    0.3327        0.5107   0.1373    0.0008
Cluster-mean MMSE          -0.03016  0.0509    0.5535
MMSE - cluster-mean MMSE   -0.02335  0.001487  <0.0001      -0.02335  0.001487  <0.0001

Model 3: $\operatorname{logit}(P(Y_{ij} = 1)) = a_{i3} + \beta_{03} + \beta_{13}\,(\mathrm{MMSE}_{ij} - \overline{\mathrm{MMSE}}_i) + \beta_{23}\,\overline{\mathrm{MMSE}}_i$.
Model 4: $\operatorname{logit}(P(Y_{ij} = 1)) = a_{i4} + \beta_{04} + \beta_{14}\,(\mathrm{MMSE}_{ij} - \overline{\mathrm{MMSE}}_i)$.
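Because Models 2 and 3 are reparameterizations of each other, the cluster-mean coefficient in Model 3 should equal the sum of the two MMSE coefficients in Model 2, up to rounding. The sketch below checks this against the estimates reported in the GLM and GLMM tables. The dictionary keys use the $\beta_2$, $\gamma_2$, $\gamma_3$ notation of Models (2) and (3); in the tables these correspond to $\beta_{12}$, $\beta_{22}$, and $\beta_{23}$.

```python
# Estimates copied from the GLM and GLMM tables (logit scale).
reported = {
    "GLM":  {"beta2": -0.0217,  "gamma2": -0.0207, "gamma3": -0.0424},
    "GLMM": {"beta2": -0.02335, "gamma2": -0.0068, "gamma3": -0.03016},
}

gaps = {}
for name, est in reported.items():
    implied = est["beta2"] + est["gamma2"]      # gamma3 implied by Model 2
    gaps[name] = abs(implied - est["gamma3"])   # discrepancy from the Model 3 table
    print(f"{name}: beta2 + gamma2 = {implied:.5f}, "
          f"reported gamma3 = {est['gamma3']:.5f}")
```

Any remaining gap is no larger than the rounding of the reported coefficients.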

Estimation for GLMM Model 5

Table 1: Estimation for Model 5

Parameter                  Estimate  Std. err  p-value
Intercept                   1.3003   1.2431    0.3039
Cluster-mean MMSE          -0.03182  0.04896   0.5158

Model 5: $\operatorname{logit}(P(Y_{ij} = 1)) = a_{i5} + \beta_{05} + \beta_{15}\,\overline{\mathrm{MMSE}}_i$.

Research questions
We are interested in assessing the effect of individual MMSE on the probability of having dementia, free of confounding by center-level influences.
We are also interested in assessing whether center-average MMSE has an independent effect on the probability of having dementia.

Violation of independence between random effects and covariates
Let us consider the following linear mixed effects model:
$$Y_{ij} = a_i + \beta X_{ij} + \epsilon_{ij},$$
where $X_{ij} \sim (0, \sigma_X^2)$, $a_i \sim (0, \sigma_b^2)$, $\epsilon_{ij} \sim (0, \sigma_\epsilon^2)$, $a_i$ and $\epsilon_{ij}$ are independent, and $X_{ij}$ and $\epsilon_{ij}$ are independent.
The ordinary least squares (OLS) estimator for $\beta$,
$$\hat{\beta}_{ols} = \Big(\sum_{i,j} X_{ij}^2\Big)^{-1} \sum_{i,j} X_{ij} Y_{ij},$$
is unbiased and consistent if the covariates and random effects are uncorrelated.

Violation of independence between random effects and covariates, continued
However, it is easy to show that when $n_i = n$,
$$\hat{\beta}_{ols} \to \beta + \frac{\sigma_{Xb}}{\sigma_X^2},$$
where $\sigma_{Xb}$ denotes the covariance between $X_{ij}$ and $a_i$.
This is a standard econometrics result: an error term that is correlated with a predictor can introduce bias in regression coefficients.
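The limit above can be checked numerically. The following sketch assumes a hypothetical design in which $X_{ij} = \delta a_i + e_{ij}$ with unit-variance normal $a_i$ and $e_{ij}$, so that $\sigma_{Xb} = \delta$ and $\sigma_X^2 = \delta^2 + 1$; the function name and all parameter values are illustrative assumptions.

```python
import random

def ols_limit_check(K=2000, n=5, beta=1.0, delta=0.5, seed=1):
    """Simulate Y_ij = a_i + beta * X_ij + eps_ij with X_ij correlated with a_i."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(K):
        a = rng.gauss(0.0, 1.0)                  # random effect, sigma_b^2 = 1
        for _ in range(n):
            x = delta * a + rng.gauss(0.0, 1.0)  # Cov(X, a) = delta
            y = a + beta * x + rng.gauss(0.0, 1.0)
            sxy += x * y
            sxx += x * x
    beta_ols = sxy / sxx                         # no-intercept OLS, as on the slide
    limit = beta + delta / (delta ** 2 + 1.0)    # beta + sigma_Xb / sigma_X^2
    return beta_ols, limit

beta_ols, limit = ols_limit_check()
print("OLS estimate:", round(beta_ols, 3), "theoretical limit:", round(limit, 3))
```

With these assumed values the estimate settles near the theoretical limit rather than near the true $\beta = 1$, which is the bias the formula describes.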

A shared random-effects model with normal distributions
$Y_i = (Y_{i1}, \ldots, Y_{in_i})$ are conditionally independent given $X_i = (X_{i1}, \ldots, X_{in_i})$ and $a_i$:
$$Y_{ij} = \beta_0 + a_i + \beta_B \bar{X}_i + \beta_W (X_{ij} - \bar{X}_i) + \epsilon_{ij},$$
where $\epsilon_{ij} \sim N(0, \sigma_w^2)$, $a_i \sim N(0, \sigma_b^2)$, and $X_{ij} \mid a_i \sim N(\delta_0 + \delta_1 a_i, \sigma_X^2)$.

A shared random-effects model, continued
Neuhaus and McCulloch (2006) showed that the likelihood contribution from the $i$th cluster is
$$\int_a f(y_i \mid x_i, a)\, f_X(x_i \mid a)\, dG(a) = f(y_i - \bar{y}_i \mid x_i - \bar{x}_i)\, f(x_i - \bar{x}_i) \int_a f(\bar{y}_i \mid \bar{x}_i, a)\, f(\bar{x}_i \mid a)\, dG(a).$$
They further showed that $f(y_i - \bar{y}_i \mid x_i - \bar{x}_i)$ involves only $\beta_W$ and not $\beta_B$, whereas $f(\bar{y}_i \mid \bar{x}_i, a)$ involves $\beta_B$ but not $\beta_W$.

Implications
We may still get a consistent estimator of $\beta_W$ based on $f(y_i - \bar{y}_i \mid x_i - \bar{x}_i)$, which is the contribution to the conditional likelihood proposed by Neuhaus and McCulloch (2006).
However, we cannot separate estimation of $\beta_B$ from the specification of the distribution for the random effects, $a$.
Misspecification of this distribution may bias the estimator of $\beta_B$.
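A rough numerical illustration of these implications follows. It simulates the shared random-effects model with assumed values $\beta_0 = \delta_0 = 0$, $\beta_B = 2$, $\beta_W = 1$, $\delta_1 = 0.5$, and unit variances, then compares a within-cluster least-squares slope (deviations from cluster means) with a naive between-cluster slope (cluster means of $Y$ on cluster means of $X$). This is only a sketch with plain least squares, not Neuhaus and McCulloch's likelihood analysis; the function name and parameter values are hypothetical.

```python
import random

def within_between(K=3000, n=5, beta_b=2.0, beta_w=1.0, delta1=0.5, seed=2):
    """Return (within-cluster slope, naive between-cluster slope)."""
    rng = random.Random(seed)
    sw_xy = sw_xx = 0.0            # within-cluster cross products
    xbars, ybars = [], []
    for _ in range(K):
        a = rng.gauss(0.0, 1.0)    # random effect shared by X and Y
        xs = [delta1 * a + rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        ys = [a + beta_b * xbar + beta_w * (x - xbar) + rng.gauss(0.0, 1.0)
              for x in xs]
        ybar = sum(ys) / n
        sw_xy += sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        sw_xx += sum((x - xbar) ** 2 for x in xs)
        xbars.append(xbar)
        ybars.append(ybar)
    mx = sum(xbars) / K
    my = sum(ybars) / K
    sb_xy = sum((x - mx) * (y - my) for x, y in zip(xbars, ybars))
    sb_xx = sum((x - mx) ** 2 for x in xbars)
    return sw_xy / sw_xx, sb_xy / sb_xx

within_est, between_est = within_between()
print("within slope:", round(within_est, 3), "(assumed beta_W = 1.0)")
print("between slope:", round(between_est, 3), "(assumed beta_B = 2.0)")
```

In this setup the within-cluster slope stays close to the assumed $\beta_W$ because $a_i$ cancels from the deviations, while the naive between-cluster slope absorbs the correlation between $\bar{X}_i$ and $a_i$ and overshoots the assumed $\beta_B$.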

Implications, continued
Models with common between- and within-cluster covariate effects do not distinguish information from the two sources.
Poor estimation of between-cluster covariate effects, due to features such as correlations between covariates and random effects, may yield inconsistent estimates of overall covariate effects.