Analytic Strategies for the OAI Data

Similar documents
Longitudinal and Hierarchical Analytic Strategies for OAI Data

How to analyze correlated and longitudinal data?

Generalized Estimating Equations for Depression Dose Regimes

Use of GEEs in STATA

OAI Data Users ACR Study Group

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

1. Objective: analyzing CD4 counts data using GEE marginal model and random effects model. Demonstrate the analysis using SAS and STATA.

Linear Regression in SAS

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale.

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Comparison And Application Of Methods To Address Confounding By Indication In Non- Randomized Clinical Studies

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA

Analyzing diastolic and systolic blood pressure individually or jointly?

Donna L. Coffman Joint Prevention Methodology Seminar

Data Analysis Using Regression and Multilevel/Hierarchical Models

Chapter 13 Estimating the Modified Odds Ratio

Analysis of TB prevalence surveys

LOGLINK Example #1. SUDAAN Statements and Results Illustrated. Input Data Set(s): EPIL.SAS7bdat ( Thall and Vail (1990)) Example.

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

MAKING THE NSQIP PARTICIPANT USE DATA FILE (PUF) WORK FOR YOU

Methods to control for confounding - Introduction & Overview - Nicolle M Gatto 18 February 2015

Daniel Boduszek University of Huddersfield

Certificate Courses in Biostatistics

Simple Linear Regression

Applied Medical. Statistics Using SAS. Geoff Der. Brian S. Everitt. CRC Press. Taylor Si Francis Croup. Taylor & Francis Croup, an informa business

Introduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015

Lecture 14: Adjusting for between- and within-cluster covariates in the analysis of clustered data May 14, 2009

Selected Topics in Biostatistics Seminar Series. Missing Data. Sponsored by: Center For Clinical Investigation and Cleveland CTSC

Planning knee OA prevention studies in obese populations Lessons from the Multicenter Osteoarthritis Study (MOST) and Osteoathritis Initiative (OAI)

Index. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI /

Biostatistics II

Basic Biostatistics. Chapter 1. Content

Analyzing binary outcomes, going beyond logistic regression

Multi-level approaches to understanding and preventing obesity: analytical challenges and new directions

cloglog link function to transform the (population) hazard probability into a continuous

Intro to SPSS. Using SPSS through WebFAS

Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX

Models for HSV shedding must account for two levels of overdispersion

IAPT: Regression. Regression analyses

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

Things you need to know about the Normal Distribution. How to use your statistical calculator to calculate The mean The SD of a set of data points.

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Evaluating health management programmes over time: application of propensity score-based weighting to longitudinal datajep_

Inverse Probability of Censoring Weighting for Selective Crossover in Oncology Clinical Trials.

Experimental Studies. Statistical techniques for Experimental Data. Experimental Designs can be grouped. Experimental Designs can be grouped

Logistic Regression Predicting the Chances of Coronary Heart Disease. Multivariate Solutions

Session 3: Dealing with Reverse Causality

THE UNIVERSITY OF OKLAHOMA HEALTH SCIENCES CENTER GRADUATE COLLEGE A COMPARISON OF STATISTICAL ANALYSIS MODELING APPROACHES FOR STEPPED-

investigate. educate. inform.

MEDICATION ADHERENCE: TAILORING THE ANALYSIS

First of two parts Joseph Hogan Brown University and AMPATH

Introducing a SAS macro for doubly robust estimation

Analysis and Interpretation of Data Part 1

Weight Adjustment Methods using Multilevel Propensity Models and Random Forests

V. LAB REPORT. PART I. ICP-AES (section IVA)

Complex Regression Models with Coded, Centered & Quadratic Terms

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach

Zheng Yao Sr. Statistical Programmer

Epidemiologic Methods I & II Epidem 201AB Winter & Spring 2002

Technical appendix Strengthening accountability through media in Bangladesh: final evaluation

The Australian longitudinal study on male health sampling design and survey weighting: implications for analysis and interpretation of clustered data

Module 14: Missing Data Concepts

The AB of Random Effects Models

Clinical Trials A Practical Guide to Design, Analysis, and Reporting

Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1

Using GLIMMIX and GENMOD Procedures to Analyze Longitudinal Data from a Department of Veterans Affairs Multisite Randomized Controlled Trial

Available from Deakin Research Online:

Section on Survey Research Methods JSM 2009

Relationship of nighttime arousals and nocturnal cortisol in IBS and normal subjects. Miranda Bradford. A thesis submitted in partial fulfillment

The SAS SUBTYPE Macro

Logistic regression. Department of Statistics, University of South Carolina. Stat 205: Elementary Statistics for the Biological and Life Sciences

Investigation of relative survival from colorectal cancer between NHS organisations

Biostat/Stat 571: Coursework 5

Mammographic density and risk of breast cancer by tumor characteristics: a casecontrol

Mediation analysis a tool to move from estimating treatment effect to understanding treatment mechanism

Today Retrospective analysis of binomial response across two levels of a single factor.

AClass: A Simple, Online Probabilistic Classifier. Vikash K. Mansinghka Computational Cognitive Science Group MIT BCS/CSAIL

Introduction to ROC analysis

HZAU MULTIVARIATE HOMEWORK #2 MULTIPLE AND STEPWISE LINEAR REGRESSION

Daniel Boduszek University of Huddersfield

BEST PRACTICES FOR IMPLEMENTATION AND ANALYSIS OF PAIN SCALE PATIENT REPORTED OUTCOMES IN CLINICAL TRIALS

Quantitative Research Methods and Tools

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Central Reading of Knee X-rays for Kellgren & Lawrence Grade and Individual Radiographic Features of Tibiofemoral Knee OA

What s New in SUDAAN 11

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Quadriceps Muscle and Intermuscular Fat Volumes in the Thighs of Men in the OAI are Associated with Physical Function and Knee Pain

RANDOMIZATION. Outline of Talk

Matched Cohort designs.

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.

Is there an association between waist circumference and type 2 diabetes or impaired fasting glucose in US adolescents?

Project 15 Test-Retest Reliability of Semi-quantitative Readings from Knee Radiographs

MEA DISCUSSION PAPERS

Xingzhong (Jason) Jin

Transcription:

Analytic Strategies for the OAI Data Charles E. McCulloch, Division of Biostatistics, Dept of Epidemiology and Biostatistics, UCSF ACR October 2008

Outline 1. Introduction and examples. 2. General analysis considerations. 3. Accommodating correlations between knees within a person 4. Accommodating correlations over time 5. Analyzing change. 6. Questions from the participants.

Introduction Analysis technique depends on nature of the outcome variable and research question. Binary: logistic regression (e.g., presence of osteophytes) Odds ratios, area under ROC curve Numeric: linear regression (e.g., WOMAC pain) Also time to event (Cox model or pooled logistic regression), count outcomes (Poisson regression) Methods need to be modified if there are clustered data or repeated measures.

Prototypical examples Example 1: (cross sectional) Is KOOS quality of life related to BMI at baseline? Example 2: (clustered by knee) Is difference between men and women in the WOMAC pain score the same for those with and without symptomatic knee OA at baseline?

Prototypical examples Example 3: (clustered data) Is the presence of osteophytes at baseline predicted by knee pain? Example 4: (longitudinal/change) Is the 18 month change in WOMAC pain score the same or different for those with symptomatic knee OA at baseline?

Ex 1: Is KOOS QoL related to baseline BMI? KOOS QoL 0 20 40 60 80 100 Lowess smoother 10 20 30 40 50 BM I bandwidth =.8

Ex 1: Is KOOS QoL related to baseline BMI? Analysis: linear regression. Regression coefficient is -1.01 with a SE of 0.09 and a p-value of <0.0001. Not clustered data.

Accommodating clustered or repeated measures data Important to accommodate clustering and repeated measures. Otherwise SEs, p-values and confidence intervals can be incorrect, sometimes grossly so. Not possible to predict how the results will change when the proper analysis is used.

Efficiency of analyses of clustered data For between person predictors (e.g. BMI), the proper, clustered-data (e.g.,outcome measured on two knees) analysis will usually have larger SEs. Intuition: for between person predictors an analysis that assumes all knees are independent over-represents the information content. For within person predictors (e.g., kneespecific), the proper, clustered-data analysis will usually have smaller SEs. Intuition: Using each person as their own control increases efficiency.

Ex 2: Is there a sex by baseline SX OA interaction for the WOMAC pain score? When analyzing knees, effect of failing to allow correlation between a person s knees Analysis Coeff SE p-value Assume indep -0.87 0.27 0.001 Allow correlation -0.87 0.37 0.02 Mean WOMAC pain score Males Females Difference No Knee OA 1.52 1.57 0.05 Knee OA 3.58 4.49 0.91

Accommodating clustered or repeated measures data Many methods exist to accommodate Mixed models (e.g., SAS Proc MIXED, NLMIXED) Repeated measures ANOVA (e.g., SAS Proc GLM) Alternating logistic regression (in SAS Proc GENMOD) Generalized Estimating Equations (GEEs). Invoked in SAS Proc GENMOD using the REPEATED statement.

Accommodating clustered or repeated measures data Repeated measures/clustering is an issue for the outcome variable, not the predictor. Example: Are days missed from work predicted by knee pain (separate values for left and right knee). Does not have repeated measures on the outcome. Can accommodate by including both left and right knee values as predictors or by calculating summary measure(s) (e.g., average knee pain).

Desirable features for an analysis method Can accommodate a variety of outcome types (e.g., binary and numeric). Can accommodate clustering by knee, person (over time) and perhaps even different regions of interest (ROI) within a knee. Does not require extensive modeling of the correlation over time or between knees or between ROI in the knee.

Recommended analysis strategy - GEEs Works with many types of outcomes. Robust variance estimate obviates need to model correlation structure. Works well with not too many repeated measures per subject and a large number of subjects. So ideal for analyses incorporating multiple knees and time points. Somewhat less good if there are also multiple ROI per knee treated as outcomes (e.g., tibial and femoral cartilage loss).

Recommended analysis strategy - GEEs Accommodates unbalanced data, e.g., some subjects contribute one knee while others contribute two. Accommodates unequally spaced data, e.g., missed visits. BUT always be wary of the pattern of missing data. If the fact that the data are missing is informative (e.g., those with missed visits are in extreme pain), virtually no standard statistical method will get the right answer.

Ex 2: Is there a sex by baseline SX OA interaction for the WOMAC pain score? Effect of different analysis methods: Analysis Coeff SE p-value Assume indep -0.87 0.27 0.0013 GEE - robust -0.87 0.37 0.02 Mixed -0.87 0.32 0.01 Mixed - empirical -0.87 0.37 0.02 SAS GENMOD REPEATED option SAS MIXED EMPIRICAL option Stata cluster() or vce(robust) or robust

Ex 2: Does pain predict presence of osteophytes at baseline? The odds of an osteophyte increase by 12.5% with each increase in pain score of 1 (0-10 scale). Odds ratio of 1.12 (95% CI 1.09, 1.15). Accounts for clustering by subject.

Analyzing change with longitudinal data Including a variable for time (or visit) describes the change over time, e.g., progression. Inclusion of time (or visit) interactions with baseline predictors allows analysis of whether baseline predictors are associated with change over time. Inclusion of a time-varying predictor (e.g., MRI findings at sequential visits) allows analysis of whether change in that predictor is associated with change in the outcome.

Analyzing change with longitudinal data Can use lagged variables to ask if prior values of risk factors predict later onset of disease (Is it prognostic?) Helps to strengthen inference of causation.

Ex 3: Does 18 month change in WOMAC pain depend on baseline SX K OA? Include a SX K OA by visit interaction in the model.

Analyzing change over time: What about analyzing change scores? An excellent and simple method when there are only two time points of interest and most subjects have complete data. Not as attractive with multiple time points or unbalanced data. Some loss of efficiency. If you do analyze change scores, be very wary of adjusting for baseline values of the change scores. Doing so will usually bias estimates of change.

Analyzing change over time: What about analyzing change scores? Mean WOMAC knee pain (N) SX KOA Visit No Yes BL 1.55 4.13 1,954 730 12 month 1.42 3.72 1,860 676

Analyzing change over time: What about analyzing change scores? Using a longitudinal analysis the difference in change (BL to 12 month visit) between the OA and non-oa groups is 0.268 with a SE of 0.1333 and a p-value of 0.045. Using the change score analysis the difference is 0.264 with a SE of 0.1339 and a p-value of 0.049. Adjusting for the baseline value gives a difference of 0.42 with a p-value approximately 0. So the adjusted analysis addresses a different question.

Would more detailed workshops be useful? Hands-on in a computer lab? What would you like to see? How to explore the data to find variables How to generate simple descriptive statistics from the web site How to work with the image data More specific guidance on statistical analysis methods Sign up sheet in back to give us feedback

Questions and Answers? Contact information: chuck@biostat.ucsf.edu

Data layouts for longitudinal/clustered data For longitudinal analyses: long format one row of data per outcome (multiple rows per subject) id visit knee bmi sxkoa 9000296 0 L 29.8 0 9000296 0 R 29.8 1 9000296 12 L 29.4 0 9000296 12 R 29.4 1 9000798 0 R 32.4 0 9000798 12 L 32.3 0 9000798 12 R 32.3 0 9000798 18 L 32.5 0 9000798 18 R 32.5 1

Data layouts for longitudinal/clustered data For change score analyses: wide format. Append all the variables as multiple columns, one row per subject. id p01bmi v01bmi v02bmi 9000296 29.8 29.4 29.1 9000798 32.4 32.3 32.5

Below is sample SAS code that was used to generate the examples in the talk. These are intended to indicate the variety of analyses possible with the OAI data with a focus on accommodating the repeated measures/clustered data nature of the dataset. They do not represent thorough data analyses. data all; set cemwork.oai_example; /* Pick out half the subjects so examples run faster*/ if id<9547618; /* Symptomatic knee OA */ sxkoa=p01sxkoa>0; /* Define definite presence of osteophytes */ defosteo=(svgkost=2); run; /* For repeated measures analyses, data should be sorted by id */ proc sort; by id visit; run; /* Extract baseline data only for first three analyses */ data bl; set work.all; if visit=0; run; title "Cross-sectional analysis of KOOS QoL"; /* Example 1: Association of KOOS QoL as a function of baseline BMI */ proc glm data=work.bl; model koosqol=bmi; run; title "Clustered baseline of WOMAC pain score"; /* Example 2: Comparing association of WOMAC pain score with symptomatic knee OA between men and women at baseline only */ /* Using GEE methods assuming independence working correlation */ proc genmod data=work.bl; class p02sex id; model womkpg=sxkoa p02sex SXKOA*p02sex; repeated subject=id; run; /* Example 2: GEE specifying an exchangeable working correlation structure */ proc genmod data=work.bl; class p02sex id; model womkpg=sxkoa p02sex sxkoa*p02sex; repeated subject=id / type=exch; run; /* Example 2: using mixed model */ proc mixed data=work.bl noclprint; class p02sex id; model womkpg=sxkoa p02sex sxkoa*p02sex/solution; random intercept/ subject=id; run;

/* Ex ample 2: using mixed model with EMPIRICAL option */ proc mixed data=work.bl empirical noclprint; class p02sex id; model womkpg=sxkoa p02sex sxkoa*p02sex/solution; random intercept/subject=id; run; title "Binary outcome clustered"; /* Example 3: Does pain scale predict presence of osteophytes at baseline */ proc genmod data=work.bl descending; class id; model defosteo=p7gkrcv / dist=bin; estimate "pain effect" p7gkrcv 1 / exp; repeated subject=id / type=exch; run; title "Longitudinal"; /* Example 4: Does change in WOMAC pain depend on baseline SX KOA?*/ /* Using GEEs */ proc genmod data=work.all; class p02sex id visit; model womkpg=sxkoa p02sex visit sxkoa*p02sex sxkoa*visit; repeated subject=id; run; /* Example 4: GEE specifying an exchangeable working correlation structure */ proc genmod data=work.all; class p02sex id visit; model womkpg=sxkoa p02sex visit sxkoa*p02sex sxkoa*visit; repeated subject=id/type=exch; run; /* Example 4: mixed model specifying an exchangeable working correlation structure */ proc mixed data=work.all noclprint; class p02sex id visit; model womkpg=sxkoa p02sex visit sxkoa*p02sex sxkoa*visit/solution; random id; run; /* Example 4: mixed model using EMPIRICAL option */ proc mixed data=work.all noclprint empirical; class p02sex id visit; model womkpg=sxkoa p02sex visit sxkoa*p02sex sxkoa*visit/solution; random intercept/subject=id; run; /* E ample x 4: mixed model specifying a nested error structure */ proc mixed data=work.all noclprint; class p02sex id visit; model womkpg=sxkoa p02sex visit sxkoa*p02sex sxkoa*visit/solution; random id visit(id); run;