Use of GEEs in Stata
1. When generalised estimating equations (GEEs) are used, and an example
2. Stata commands and options for GEEs
3. Results from Stata (and SAS!)
4. Another use of GEEs
Use of GEEs
- GEEs are one of the methods of analysis that account for correlated observations.
- Examples: repeated observations on individuals over time; clustered observations (e.g. data grouped by family or general practice).
- Using ordinary models to analyse data with correlated observations tends to produce incorrect SEs and p values for regression coefficients.
- Models that ignore clustering tend to underestimate the SEs of regression coefficients for covariates.
- However, with time-varying covariates, standard models may tend to overestimate SEs.
Use of GEEs
- GEEs can be used with a variety of models (linear, logistic, Poisson).
- GEEs use robust estimation of standard errors to allow for clustering.
- Robust standard errors are derived from the observed variability in the data rather than from the variability predicted by an underlying probability model (which produces model-based standard errors).
- A working correlation matrix is specified to reflect the average dependence among correlated observations.
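For reference, the contrast between model-based and robust standard errors can be written out. The notation is not from the slides; it follows the usual GEE presentation and is offered as a sketch. For cluster i, y_i is the response vector, mu_i its fitted mean, D_i the derivative of mu_i with respect to the coefficients, and V_i the working covariance (variance function, working correlation matrix and any scale parameter combined).

```latex
\[
\widehat{\operatorname{Var}}_{\text{model}}(\hat\beta) = B^{-1},
\qquad
\widehat{\operatorname{Var}}_{\text{robust}}(\hat\beta) = B^{-1} M B^{-1},
\]
\[
B = \sum_{i} D_i^{\top} V_i^{-1} D_i,
\qquad
M = \sum_{i} D_i^{\top} V_i^{-1}\,(y_i - \hat\mu_i)(y_i - \hat\mu_i)^{\top}\,V_i^{-1} D_i .
\]
```

The middle term M is built from the observed residuals, which is why the robust estimate does not rely on the working correlation being correct. Software may apply additional small-sample adjustments.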
GEEs and multilevel models
- GEEs are useful to allow for non-independence in responses, but not to investigate this variability further, i.e. the dependence is treated as a nuisance.
- Multilevel models (random effects models) are more flexible: useful when within-cluster variability is of intrinsic interest or when there are >=1 random effects.
- GEE: population-averaged approach
  - Models marginal distributions.
  - Longitudinal data are treated as cross-sectional.
  - Useful when the aim is to investigate differences in population-averaged response.
- Multilevel models: subject-specific approach
  - Longitudinal nature of the data is preserved.
  - Useful when the aim is to investigate change in individuals' responses.
GEEs and multilevel models
- Example from the Stata website: effect of marriage on employment
  - Outcome: employed/unemployed
  - Predictor: married/unmarried
  - Repeated data on subjects' marriage and employment status
- Interpretation of the odds ratio from a GEE or a multilevel model
  - Multilevel model: odds of a person being employed if married compared with the odds of the same person being employed if unmarried (does getting married affect a person's employment status?)
  - GEE: odds of an average married person being employed compared with the odds of an average unmarried person being employed (do rates of employment differ for the average married compared with the average unmarried person?)
Example: Royal Free HIV database, n=3884 patients
- Includes all HIV-infected patients seen at the Royal Free (RF)
- CD4 count, viral load (VL) and antiretroviral treatment (ART) are recorded at each visit
- Aims of analysis:
  - to investigate trends over time (1999 to 2004) in the clinic prevalence of low CD4 count and of raised VL on ART
  - to assess whether the prevalence of low CD4 count and of raised VL on ART differs by demographic group
  - to assess whether trends over time differ according to demographic group
Example: Royal Free HIV database
- Outcomes for analysis (measured at the midpoint of 6-month intervals)
  - Low CD4 count: CD4 < 200 /mm³
  - Raised VL on treatment: VL > 50 copies/ml, including subjects on ART only
- Explanatory variables
  - Calendar year, in 11 six-month intervals (1999B to 2004B)
  - Age at time of outcome measure
  - Demographic group (1. MSM; 2. White heterosexual men; 3. Black heterosexual men; 4. White heterosexual women; 5. Black heterosexual women)
Format of data for analysis: example

Hospno  cal99_04  yr99_04  demog  sex  age10     cd4200
26891   1         1        2      1    3.374127  1
26891   2         1.5      2      1    3.399042  0
26891   .         .        2      1    .         .
26891   4         2.5      2      1    3.53128   0
26891   5         3        2      1    3.556194  0
26891   6         3.5      2      1    3.623272  0
26891   7         4        2      1    3.649829  0
26891   8         4.5      2      1    3.721013  0
26891   9         5        2      1    3.774675  0
26891   10        5.5      2      1    3.815469  0
Analysis with GEEs using Stata
xtgee command. The main components to be specified are:
1. Assumed distribution of the response variable: specified in the family() option (e.g. normal, binomial, Poisson)
2. Link between the response variable and the linear predictor: specified in the link() option (the default depends on the family, e.g. log for Poisson, logit for binomial)
3. Structure of the working correlation matrix: specified in the corr() option
4. Clustering variable, i.e. the unit to which observations belong: i()
5. Time period to which observations belong: t()
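As a hedged illustration, the five components above might be combined as follows for the low CD4 outcome, using the variable names shown in the data-format example (Hospno, cal99_04, yr99_04, demog, age10, cd4200). The covariate list and the exchangeable structure are assumptions made for illustration, not necessarily the model behind the reported results. Recent Stata releases declare the cluster and time variables with xtset and accept factor-variable notation (i.demog); older releases used the i() and t() options together with the robust option.

```stata
* Sketch only: covariates and correlation structure are assumed.
* Run from a do-file so that the /// line continuation is recognised.

* Declare the clustering variable (patient) and the time variable
xtset Hospno cal99_04

* Logistic GEE: binomial family, logit link, exchangeable working
* correlation, robust standard errors
xtgee cd4200 yr99_04 age10 i.demog, ///
    family(binomial) link(logit) corr(exchangeable) vce(robust)
```

Adding the eform option would report odds ratios instead of log odds ratios.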
Analysis with GEEs using Stata
Options for the type of correlation between observations:
- Independence: no correlation, so a GEE is not needed. corr(indep)
- Exchangeable: within a cluster any two observations are equally correlated, but there is no correlation between observations from different clusters. corr(exc)
- Autoregressive: repeated measures are most strongly correlated when close together in time and least correlated when furthest apart in time. corr(ar 1)
- Unstructured: no constraints are placed on the correlations. corr(uns)
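The four structures above could be compared by refitting the same model with each one. This is a sketch under the same assumptions as before: variable names are taken from the data-format example and the covariate list is illustrative only.

```stata
* Sketch only: same assumed model under each working correlation
xtset Hospno cal99_04

xtgee cd4200 yr99_04 age10 i.demog, family(binomial) corr(independent) vce(robust)
xtgee cd4200 yr99_04 age10 i.demog, family(binomial) corr(exchangeable) vce(robust)
xtgee cd4200 yr99_04 age10 i.demog, family(binomial) corr(unstructured) vce(robust)

* ar 1 requires equally spaced observations; the force option lets the
* model be fitted when a patient has gaps in follow-up
xtgee cd4200 yr99_04 age10 i.demog, family(binomial) corr(ar 1) vce(robust) force

* Display the estimated working correlation matrix of the last model
estat wcorrelation
```

Omitting vce(robust) gives the model-based standard errors, which is one way to see how much the two kinds of SE differ for a given structure.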
Summary of results from Stata and SAS
Adjusted log odds ratio and SE of regression coefficient for demographic group (black heterosexual men vs MSM only)

                      Stata               SAS*
                      Beta     SE         Beta     SE
Ordinary logistic     1.0567   0.0748     1.0567   0.0748
EXC (robust)          1.2231   0.1384     1.2231   0.1384
  without robust      1.2231   0.1240     1.2231   0.1240
UNST (robust)         1.2235   0.1351     1.2243   0.1351
  without robust      1.2235   0.1260     1.2243   0.1261
AR1 (robust)          1.1667   0.1355
  without robust      1.1667   0.1148

*Default in SAS = robust option in Stata; MODELSE option in SAS = without robust in Stata
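The coefficients above are on the log odds scale. As a small worked example, the EXC (robust) estimate for black heterosexual men vs MSM (Beta 1.2231, SE 0.1384) can be converted to an odds ratio with a 95% confidence interval using Stata's display calculator. The normal-approximation interval is an assumption of this sketch.

```stata
* Odds ratio from the EXC (robust) row: Beta = 1.2231, SE = 0.1384
display exp(1.2231)

* Lower and upper 95% limits, computed on the log odds scale first
display exp(1.2231 - invnormal(0.975)*0.1384)
display exp(1.2231 + invnormal(0.975)*0.1384)
```

This gives an odds ratio of about 3.40, with a 95% confidence interval of roughly 2.59 to 4.46.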
Summary of results from Stata and SAS
Adjusted log odds ratio and SE of regression coefficient for calendar year (per 1 year later)

                      Stata               SAS*
                      Beta     SE         Beta     SE
Ordinary logistic     0.1799   0.0191     0.1799   0.0191
EXC (robust)          0.2282   0.0215     0.2282   0.0215
  without robust      0.2282   0.0164     0.2282   0.0164
UNST (robust)         0.2185   0.0222     0.2187   0.0221
  without robust      0.2185   0.0229     0.2187   0.0229
AR1 (robust)          0.2017   0.0233
  without robust      0.2017   0.0247

*Default in SAS = robust option in Stata; MODELSE option in SAS = without robust in Stata
Another use of GEEs
- GEEs can be used to give correct SEs in Poisson models for binary outcome measures.
- The need arises when we have a binary outcome but want to express results as risk ratios rather than the odds ratios produced by logistic regression.
- Risk ratios are more meaningful: the odds ratio is not a good estimate of the risk ratio when the outcome is common.
- Ordinary Poisson regression with binary endpoint data will produce risk ratios, but the standard errors and p values will be too large (i.e. conservative results) because of underdispersion when a Poisson model is used for binomial data.
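Both points can be made concrete with two short identities. The symbols are introduced here for illustration and are not from the slides: p_1 and p_0 are the risks of the outcome in the exposed and unexposed groups, and mu is the mean of a single binary observation.

```latex
\[
\mathrm{RR} = \frac{p_1}{p_0},
\qquad
\mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)} = \mathrm{RR}\times\frac{1-p_0}{1-p_1},
\]
\[
\operatorname{Var}_{\text{Poisson}}(Y) = \mu
\;\ge\;
\mu(1-\mu) = \operatorname{Var}_{\text{binomial}}(Y),
\qquad 0 \le \mu \le 1 .
\]
```

When both risks are small the factor (1 - p_0)/(1 - p_1) is close to 1 and the odds ratio approximates the risk ratio; when the outcome is common the odds ratio lies further from 1 than the risk ratio. The second line shows why the Poisson model overstates the variance of a binary outcome, and the gap widens as mu grows.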
Another use of GEEs
- Poisson regression with robust standard errors can be used to give correct SEs and p values for risk ratios.
- For data with one observation per cluster, a GEE with unstructured correlation will produce risk ratios with robust standard errors.
- In this case the robust standard error deals with variance overestimation when Poisson regression is applied to binary data (GEEs are usually used to deal with variance underestimation for correlated observations).
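A minimal sketch of this approach in Stata follows. All variable names (vfail, riskgrp, age, prevaids, logvl, id) are hypothetical, chosen only for illustration. With a single observation per cluster the working correlation has nothing to estimate, so the GEE line uses corr(independent), which avoids the need for a time variable; this is a substitution for the unstructured structure named above.

```stata
* Hypothetical variable names; vfail is a 0/1 outcome.

* Ordinary Poisson regression on a binary outcome: risk ratios are
* produced, but the model-based SEs are too large
poisson vfail i.riskgrp age prevaids logvl, irr

* Same model with robust (sandwich) standard errors
poisson vfail i.riskgrp age prevaids logvl, irr vce(robust)

* GEE form: each subject is its own cluster of size one
xtset id
xtgee vfail i.riskgrp age prevaids logvl, ///
    family(poisson) link(log) corr(independent) vce(robust) eform
```

The point estimates should be the same across the three fits; only the standard errors and p values are expected to change between the first fit and the other two.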
Example: factors predicting virological failure in subjects starting ART
n=3825 subjects, using Poisson regression

                              No GEE                   GEE (unstructured)       Logistic
                              Beta    SE      p        Beta    SE      p        p
Risk group
  MSM                         0                        0
  Het. men                    0.231   0.091   0.011    0.231   0.074   0.002    0.002
  Het. women                  0.147   0.087   0.092    0.147   0.071   0.039    0.040
  IDU                         0.619   0.075   <0.001   0.619   0.058   <0.001   <0.001
  Other                       0.382   0.130   0.003    0.382   0.107   <0.001   <0.001
Age (per yr older)            0.118   0.033   0.001    0.118   0.028   <0.001   <0.001
Previous AIDS                 0.171   0.073   0.020    0.171   0.056   0.003    0.003
Pre-ART VL (per 1 log higher) 0.030   0.034   0.366    0.030   0.026   0.260    0.245