Use of GEEs in STATA

1. When generalised estimating equations are used, and an example
2. Stata commands and options for GEEs
3. Results from Stata (and SAS!)
4. Another use of GEEs

Use of GEEs
- GEEs are one of the methods of analysis that account for correlated observations.
- Examples: repeated observations on individuals over time; clustered observations (e.g. data grouped by family or general practice).
- Using ordinary models to analyse data with correlated observations tends to produce incorrect SEs and p values for regression coefficients.
- Models that ignore clustering tend to underestimate the SEs of regression coefficients for covariates.
- However, with time-varying covariates, standard models may tend to overestimate SEs.

Use of GEEs
- GEEs can be used with a variety of models (linear, logistic, Poisson).
- GEEs use robust estimation of standard errors to allow for clustering.
- Robust standard errors are derived from the observed variability in the data rather than from the variability predicted by an underlying probability model (which produces model-based standard errors).
- A working correlation matrix is specified, reflecting the average dependence among correlated observations.
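The distinction between model-based and robust standard errors can be written out as a short sketch. The notation below is an assumption (the usual Liang and Zeger form), not something shown on the slides: cluster i has response vector y_i with mean mu_i, D_i is the matrix of derivatives of mu_i with respect to beta, A_i is the diagonal matrix of variance functions, R(alpha) is the working correlation matrix and phi is a scale parameter.

```latex
% GEE estimating equations with working covariance V_i
\[
  U(\beta) = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} (y_i - \mu_i) = 0,
  \qquad
  V_i = \phi \, A_i^{1/2} R(\alpha) A_i^{1/2}
\]
% Model-based variance: correct only if V_i is the true covariance
\[
  \widehat{\operatorname{Var}}_{\text{model}}(\hat\beta) = B^{-1},
  \qquad
  B = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i
\]
% Robust (sandwich) variance: uses the observed residuals
\[
  \widehat{\operatorname{Var}}_{\text{robust}}(\hat\beta) = B^{-1} M B^{-1},
  \qquad
  M = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} (y_i - \hat\mu_i)(y_i - \hat\mu_i)^{\top} V_i^{-1} D_i
\]
```

The middle term M is built from the observed residuals, which is why the robust standard error remains valid when the working correlation is misspecified.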

GEEs and multilevel models
- GEEs are useful to allow for non-independence in responses but not to investigate this variability further, i.e. the dependence is treated as a nuisance.
- Multilevel models (random effects models) are more flexible: useful when within-cluster variability is of intrinsic interest or when there are >=1 random effects.
- GEE: population-averaged approach, which models marginal distributions. Longitudinal data are treated as cross-sectional. Useful when the aim is to investigate differences in the population-averaged response.
- Multilevel models: subject-specific approach. The longitudinal nature of the data is preserved. Useful when the aim is to investigate change in individuals' responses.

GEEs and multilevel models
Example from the Stata website: effect of marriage on employment
- Outcome: employed/unemployed
- Predictor: married/unmarried
- Repeated data on subjects' marriage and employment status
Interpretation of the odds ratio from a GEE or a multilevel model
- Multilevel model: odds of a person being employed if married compared with the odds of the same person being employed if unmarried (does getting married affect a person's employment status?)
- GEE: odds of an average married person being employed compared with the odds of an average unmarried person being employed (do rates of employment differ for the average married person compared with the average unmarried person?)
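The difference between the two interpretations can be illustrated numerically. The sketch below is not from the slides: it assumes a random-intercept logistic model with hypothetical values (intercept -0.5, subject-specific log odds ratio 1.0, random-intercept SD 2.0) and averages the subject-specific probabilities over the random intercept to obtain the population-averaged log odds ratio.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_prob(linear_predictor, sigma, steps=2001, width=8.0):
    """Average expit(linear_predictor + b) over b ~ N(0, sigma^2).

    Uses the trapezoid rule on [-width*sigma, width*sigma]; sigma must be > 0.
    """
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for k in range(steps):
        b = lo + k * h
        weight = 0.5 if k in (0, steps - 1) else 1.0
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += weight * expit(linear_predictor + b) * density
    return total * h


# Hypothetical values chosen only for illustration.
intercept = -0.5
beta_conditional = 1.0   # subject-specific (multilevel) log odds ratio
sigma = 2.0              # SD of the random intercept

p_unexposed = marginal_prob(intercept, sigma)
p_exposed = marginal_prob(intercept + beta_conditional, sigma)
beta_marginal = logit(p_exposed) - logit(p_unexposed)  # population-averaged (GEE-type)

print(f"subject-specific log OR:   {beta_conditional:.3f}")
print(f"population-averaged log OR: {beta_marginal:.3f}")
```

Under these assumptions the population-averaged log odds ratio is smaller in magnitude than the subject-specific one, so the two kinds of model should not be expected to return the same coefficient.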

Example: Royal Free HIV database, n=3884 patients
- Includes all HIV-infected patients seen at RF
- CD4 count, viral load (VL) and antiretroviral treatment (ART) are recorded at each visit
Aims of analysis
- to investigate trends over time (1999 to 2004) in the clinic prevalence of low CD4 count and raised VL on ART
- to assess whether the prevalence of low CD4 count and raised VL on ART differ by demographic group
- to assess whether trends over time differ according to demographic group

Example: Royal Free HIV database
Outcomes for analysis (measured at the midpoint of 6-month intervals)
- Low CD4 count: CD4 < 200 cells/mm^3
- Raised VL on treatment: VL > 50 copies/ml, including subjects on ART only
Explanatory variables
- Calendar year in 11 six-month intervals (1999B to 2004B)
- Age at time of outcome measure
- Demographic group (1. MSM; 2. White heterosexual men; 3. Black heterosexual men; 4. White heterosexual women; 5. Black heterosexual women)

Format of data for analysis: example

Hospno  cal99_04  yr99_04  demog  sex  age10     cd4200
26891   1         1        2      1    3.374127  1
26891   2         1.5      2      1    3.399042  0
26891   .         .        2      1    .         .
26891   4         2.5      2      1    3.53128   0
26891   5         3        2      1    3.556194  0
26891   6         3.5      2      1    3.623272  0
26891   7         4        2      1    3.649829  0
26891   8         4.5      2      1    3.721013  0
26891   9         5        2      1    3.774675  0
26891   10        5.5      2      1    3.815469  0

Analysis with GEEs using Stata: the xtgee command
The main components to be specified are:
1. Assumed distribution of the response variable, specified in the family() option (e.g. normal, binomial, poisson)
2. Link between the response variable and the linear predictor, specified in the link() option (there is a default for each family, e.g. log for poisson, logit for binomial)
3. Structure of the working correlation matrix, specified in the corr() option
4. Clustering variable (the unit to which observations belong), specified in the i() option
5. Time period to which observations belong, specified in the t() option

Analysis with GEEs using Stata
Options for the type of correlation between observations:
- Independence: no correlation, so a GEE is not needed. corr(indep)
- Exchangeable: within a cluster any two observations are equally correlated, but there is no correlation between observations from different clusters. corr(exc)
- Autoregressive: repeated measures are most strongly correlated when close together in time and least correlated when furthest apart in time. corr(ar1)
- Unstructured: no constraints are placed on the correlations. corr(uns)
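To make the structures concrete, the following sketch builds the working correlation matrix implied by three of the options for a cluster of four observations. It is an illustration in plain Python rather than Stata output; the function name and the correlation parameter alpha = 0.5 are assumptions chosen for the example.

```python
def working_correlation(structure, n, alpha=0.0):
    """Return an n x n working correlation matrix as a list of rows.

    structure: "independence", "exchangeable" or "ar1"
    alpha:     the single correlation parameter (ignored for independence)
    """
    if structure == "independence":
        def off_diagonal(j, k):
            return 0.0
    elif structure == "exchangeable":
        def off_diagonal(j, k):
            return alpha                    # same correlation for every pair
    elif structure == "ar1":
        def off_diagonal(j, k):
            return alpha ** abs(j - k)      # decays with separation in time
    else:
        raise ValueError(f"unknown structure: {structure}")
    return [[1.0 if j == k else off_diagonal(j, k) for k in range(n)]
            for j in range(n)]


for name in ("independence", "exchangeable", "ar1"):
    print(name)
    for row in working_correlation(name, 4, alpha=0.5):
        print("  " + " ".join(f"{value:5.3f}" for value in row))
```

The unstructured option is not shown because it has no formula: each of the n(n-1)/2 pairs of time points gets its own estimated correlation.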

Summary of results from Stata and SAS
Adjusted log odds ratio and SE of the regression coefficient for demographic group (black heterosexual men vs MSM only)

                      Stata               SAS*
                      Beta     SE         Beta     SE
Ordinary logistic     1.0567   0.0748     1.0567   0.0748
EXC (robust)          1.2231   0.1384     1.2231   0.1384
  without robust      1.2231   0.1240     1.2231   0.1240
UNST (robust)         1.2235   0.1351     1.2243   0.1351
  without robust      1.2235   0.1260     1.2243   0.1261
AR1 (robust)          1.1667   0.1355
  without robust      1.1667   0.1148

*Default in SAS = robust option in Stata; MODELSE option in SAS = without robust in Stata

Summary of results from Stata and SAS
Adjusted log odds ratio and SE of the regression coefficient for calendar year (per 1 year later)

                      Stata               SAS*
                      Beta     SE         Beta     SE
Ordinary logistic     0.1799   0.0191     0.1799   0.0191
EXC (robust)          0.2282   0.0215     0.2282   0.0215
  without robust      0.2282   0.0164     0.2282   0.0164
UNST (robust)         0.2185   0.0222     0.2187   0.0221
  without robust      0.2185   0.0229     0.2187   0.0229
AR1 (robust)          0.2017   0.0233
  without robust      0.2017   0.0247

*Default in SAS = robust option in Stata; MODELSE option in SAS = without robust in Stata
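The tabulated coefficients are on the log odds scale. As a rough sketch of how they translate into odds ratios, the code below exponentiates a few of the Stata estimates and forms Wald 95% confidence intervals. The helper name and the use of the normal quantile 1.96 are assumptions for the illustration; the Beta and SE values are copied from the tables.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a log odds ratio and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# (Beta, SE) pairs taken from the Stata columns of the two summary tables.
estimates = {
    "Demographic group, ordinary logistic": (1.0567, 0.0748),
    "Demographic group, EXC (robust)": (1.2231, 0.1384),
    "Demographic group, EXC (without robust)": (1.2231, 0.1240),
    "Calendar year, ordinary logistic": (0.1799, 0.0191),
    "Calendar year, EXC (robust)": (0.2282, 0.0215),
    "Calendar year, EXC (without robust)": (0.2282, 0.0164),
}

for label, (beta, se) in estimates.items():
    odds_ratio, lower, upper = odds_ratio_ci(beta, se)
    print(f"{label:42s} OR {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The point estimate is identical with and without the robust option for a given working correlation; only the width of the interval changes.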

Another use of GEEs
- GEEs can be used to give correct SEs in Poisson models for binary outcome measures.
- The need arises when we have a binary outcome but want to express results as risk ratios rather than the odds ratios produced by logistic regression.
- Risk ratios are more meaningful: the odds ratio is not a good estimate of the risk ratio when the outcome is common.
- Ordinary Poisson regression with binary endpoint data will produce risk ratios, but the standard errors and p values will be too large (i.e. conservative results) because of underdispersion from using a Poisson model for binomial data.

Another use of GEEs
- Poisson regression with robust standard errors can be used to give correct SEs and p values for risk ratios.
- For data with one observation per cluster, a GEE with unstructured correlation will produce risk ratios with robust standard errors.
- In this case the robust standard error deals with variance overestimation when Poisson regression is applied to binary data (GEEs are usually used to deal with variance underestimation for correlated observations).
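Both points, the divergence of odds ratios from risk ratios and the variance overestimation, can be checked with simple arithmetic. The sketch below uses hypothetical risks (1% vs 2% for a rare outcome, 30% vs 60% for a common one) that are not taken from the slides.

```python
def risk_ratio(p_exposed, p_unexposed):
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return odds_exposed / odds_unexposed


def binomial_to_poisson_variance(p):
    """True variance of a binary outcome, p(1 - p), relative to the
    variance a Poisson model assumes for the same mean, p."""
    return (p * (1.0 - p)) / p


# Hypothetical risks: a rare outcome and a common outcome, both with RR = 2.
for p_unexposed, p_exposed in ((0.01, 0.02), (0.30, 0.60)):
    print(f"risk {p_unexposed:.2f} vs {p_exposed:.2f}: "
          f"RR = {risk_ratio(p_exposed, p_unexposed):.2f}, "
          f"OR = {odds_ratio(p_exposed, p_unexposed):.2f}")

for p in (0.05, 0.30, 0.60):
    print(f"p = {p:.2f}: true/Poisson variance = {binomial_to_poisson_variance(p):.2f}")
```

The variance ratio is 1 - p, so the more common the outcome, the more the Poisson model overstates the variance and the more the robust standard error shrinks relative to the model-based one.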

Example: factors predicting virological failure in subjects starting ART
n=3825 subjects, using Poisson regression

                                No GEE                   GEE (unstructured)       Logistic
                                Beta    SE      p        Beta    SE      p        p
Risk group
  MSM                           0                        0
  Het. men                      0.231   0.091   0.011    0.231   0.074   0.002    0.002
  Het. women                    0.147   0.087   0.092    0.147   0.071   0.039    0.040
  IDU                           0.619   0.075   <0.001   0.619   0.058   <0.001   <0.001
  Other                         0.382   0.130   0.003    0.382   0.107   <0.001   <0.001
Age (per yr older)              0.118   0.033   0.001    0.118   0.028   <0.001   <0.001
Previous AIDS                   0.171   0.073   0.020    0.171   0.056   0.003    0.003
Pre ART VL (per 1 log higher)   0.030   0.034   0.366    0.030   0.026   0.260    0.245
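The effect of the smaller robust SEs on the p values can be approximately reproduced from the table. The sketch below assumes two-sided Wald tests based on the normal distribution, which may not be exactly how the original p values were computed; because the tabulated Beta and SE are rounded, the results agree with the table only to about the third decimal place.

```python
import math


def wald_p_value(beta, se):
    """Two-sided p value for H0: beta = 0, using z = beta / se."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))


# (Beta, SE without GEE, SE with GEE) copied from the table above.
rows = {
    "Het. men": (0.231, 0.091, 0.074),
    "Het. women": (0.147, 0.087, 0.071),
    "Previous AIDS": (0.171, 0.073, 0.056),
}

for label, (beta, se_model, se_robust) in rows.items():
    print(f"{label:14s} risk ratio {math.exp(beta):.2f}  "
          f"p (no GEE) {wald_p_value(beta, se_model):.3f}  "
          f"p (GEE) {wald_p_value(beta, se_robust):.3f}")
```

For heterosexual women the model-based SE gives a p value above 0.05 while the robust SE gives one below it, close to the p value from logistic regression.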