Tutorial #7A: Latent Class Growth Model (# seizures)

[Figure 1. A reanalysis of a longitudinal data set on counts of epileptic seizures using latent class growth models. The plot shows the mean proportional change from baseline (vertical axis) at TIME 1 to 4 (horizontal axis) for three modal clusters: Class 1: No change (N = 36), Class 2: Improved (N = 17), Class 3: Unstable (N = 6). Data source: Thall and Vail, 1990.]

What you will learn:

- To use Latent GOLD to identify distinct latent class growth trajectories in the data.
- To name the identified latent class subgroups based on their growth patterns:
  - classify 36 cases as Unchanged (Class 1 above)
  - classify 17 cases as Improved (Class 2 above)
  - classify the remaining 6 cases as Unstable (Class 3 above)
- To estimate a latent class growth model (Poisson mixture model) for these data. The model shows that those receiving the drug treatment were significantly more likely than the placebo group to improve, and significantly less likely to show no change from their baseline seizure rate (p = .02).

The Data

In this study, 59 epileptics were randomly assigned to either the anti-seizure medication Progabide (TRT = 1; n = 31) or a placebo (TRT = 0; n = 28) as an adjunct to other chemotherapy. For each participant, the number of seizures was recorded for the 8 weeks before the drugs were administered (BASE) and then for four consecutive 2-week periods (TIME), each preceding a visit to the clinic. We also know the age (AGE) of each participant. Data for the first 3 cases are shown below:

Figure 2. Data for the first 3 cases

The Poisson Mixture Model

Let $Y_{ixt}$ denote the number of epileptic seizures during the 2 weeks prior to visit T = t for case i. Case i is assumed to belong to one of K unobservable latent classes, x = 1, 2, ..., K. We assume that $Y_{ixt}$ follows the Poisson mixture model given below:

$$\ln(Y_{ixt}) = \beta_0 \ln(Y_{0i}) + \alpha_{.x} + \beta^{T}_{t.x} + \varepsilon_{ixt}, \qquad t = 1, 2, 3, 4$$

where

- $Y_{0i}$ = base rate of seizures for case i per 2-week baseline period = BASE/4 (labeled AVGBASE in the .sav file shown above),
- $\alpha_{.x}$ is the random intercept measuring the overall change from the baseline, and
- $\beta^{T}_{t.x}$ is the random effect associated with the t-th 2-week treatment period.

Hence, each latent class x identifies a distinct pattern of change in the seizure rate from the baseline to the treatment period. For identification of the $\beta^{T}_{t.x}$, we use effect coding:

$$\sum_{t=1}^{4} \beta^{T}_{t.x} = 0$$

In our final model, $\beta_0$ will be modeled as a class-independent offset, i.e., $\beta_0 = 1$, which implies that the expected proportional change in the seizure rate from the baseline to period t (the ratio of the period-t rate to the baseline rate) is:

$$E(Y_{ixt} / Y_{0i}) = \exp(\alpha_{.x} + \beta^{T}_{t.x})$$

Thus, the growth trajectory associated with latent class x is given by:

$$\exp(\alpha_{.x} + \beta^{T}_{t.x}), \qquad t = 1, 2, 3, 4$$

Tutorials #7A and #7B illustrate somewhat different approaches for using Latent GOLD to estimate 1) the trajectories and 2) the treatment effect. These different approaches agree on the following results:

- Latent class #1 contains approximately 57% of the cases, who show basically no change from the baseline rate.
- Class #2 (31%-32% of the cases) shows a significant reduction in seizure rate.
- The remaining class(es) consist of 6 unstable cases who show a substantial increase in at least one of the treatment periods. These 6 cases were all identified as outliers in an earlier analysis of these data by Rabe-Hesketh and Skrondal (2004).
- Those treated with Progabide were significantly less likely to be in class 1 ("no change") and significantly more likely to be in class 2 ("improved") than the Placebo group.

In tutorial #7A, our analysis consists of two steps. First, we estimate a pure (unsupervised) growth model to identify the K different growth patterns over the four post-baseline time periods. This growth model is pure in the sense that covariate information (AGE and TREATMENT status of the cases) is not taken into account. In step 2, we assess the relationship between these covariates and the different classes.

In tutorial #7B, we estimate the treatment effect and all the model parameters simultaneously (i.e., in a single step) by specifying TREATMENT as an active, as opposed to inactive, covariate.

The models and methods illustrated here are simpler to apply than the approaches based on Generalized Estimating Equations that others have used with these data (Diggle et al., 1994; Lee, 2004). All estimates are maximum likelihood.
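To make the trajectory formula concrete, the following minimal Python sketch evaluates exp(alpha + beta_t) for one class and converts it into expected 2-week counts under the offset restriction (beta_0 = 1). The parameter values, the BASE value, and the function names `trajectory` and `expected_counts` are hypothetical illustrations. They are not taken from the Latent GOLD output.

```python
import math

def trajectory(alpha, betas):
    """Ratio of the expected period-t rate to the baseline rate: exp(alpha + beta_t)."""
    # Effect coding requires the time effects to sum to zero within a class.
    if abs(sum(betas)) > 1e-9:
        raise ValueError("effect-coded betas must sum to zero")
    return [math.exp(alpha + b) for b in betas]

def expected_counts(base, alpha, betas):
    """Expected 2-week counts when ln(Y0) enters as an offset (beta_0 = 1)."""
    y0 = base / 4.0  # BASE covers 8 weeks, so BASE/4 is the 2-week baseline rate (AVGBASE)
    return [y0 * r for r in trajectory(alpha, betas)]

# Hypothetical values for a single class (assumed for illustration only).
alpha = -0.5
betas = [0.1, 0.2, -0.1, -0.2]

ratios = trajectory(alpha, betas)
counts = expected_counts(base=12, alpha=alpha, betas=betas)
for t, (r, c) in enumerate(zip(ratios, counts), start=1):
    print(f"t={t}: ratio={r:.3f}, expected count={c:.2f}")
```

A ratio below 1 at every period corresponds to an "improved" pattern, a ratio near 1 to "no change", and a ratio well above 1 in some period to the "unstable" pattern.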

Setting up the Model

To retrieve the setups for these models:

- From the File menu, select Open and choose epil.lgf
- Double-click on Model1

The Variables tab appears as follows:

Figure 3. Variables Tab for epil.lgf

The scale type for the dependent variable Y is set to Count, which means that a Poisson regression model will be estimated for each latent class. ID2 is used in the Case ID box, indicating that there are multiple records for each case.

The predictors TIME and LBASE are included in the Predictors box:

- TIME is specified as a nominal predictor, which allows the estimated time trend to take on any pattern (such as a reduction during period 1 followed by an increase during period 2). Separate, distinct time trends are identified for each class (i.e., random effects are estimated for TIME).

- The predictor LBASE is treated as class independent, which means that this estimate is restricted to be equal in each class. This is indicated by the symbol "=" which appears next to the variable LBASE in the Predictors box. Thus, a single fixed effect will be estimated for this predictor. Later, we will restrict this parameter estimate to 1.

Four additional variables (TRT, BASE, AGE and AVGBASE) are included in the Covariates box and treated as inactive, as indicated by the symbol < I >. Thus, these variables will not affect the estimation of the parameters, but they will be cross-tabulated by class in the output.

Notice that the symbol 1-4 appears in the Classes box. This indicates that we will estimate a 1-class, 2-class, 3-class and 4-class model.

- Click Estimate to estimate these 4 models

After the estimation has completed:

- Click the data file name growth1.sav to display the model summary table
- Right-click in the model summary output table to retrieve the Model Summary Display
- Click to remove the checkmarks for L-square, df and p-value in the Model Summary Display

Figure 4. Model Summary Output and Model Summary Display

Notice that the BIC statistic is lowest for the 4-class model. We will examine the output for both the 3- and 4-class models here. The R² statistic is .84 for the 3-class model and .92 for the 4-class model, and the misclassification error is approximately 6% for each of these models.

- Click Parameters associated with the 4-class model
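As a side note on the BIC comparison above, the sketch below shows how a log-likelihood-based BIC, -2*LL + ln(N)*npar, ranks competing models. It assumes that the sample size in the penalty is the number of cases (59). The log-likelihoods and parameter counts are hypothetical placeholders, not the values from the Model Summary Output.

```python
import math

def bic(log_likelihood, n_params, n_cases):
    """BIC based on the log-likelihood: -2*LL + ln(N)*npar. Lower is better."""
    return -2.0 * log_likelihood + math.log(n_cases) * n_params

N_CASES = 59  # number of cases in the epilepsy data

# Hypothetical (log-likelihood, number of parameters) pairs, for illustration only.
candidates = {
    "1-class": (-900.0, 5),
    "2-class": (-700.0, 11),
    "3-class": (-640.0, 17),
    "4-class": (-600.0, 23),
}

scores = {name: bic(ll, npar, N_CASES) for name, (ll, npar) in candidates.items()}
best = min(scores, key=scores.get)

for name, value in scores.items():
    print(f"{name}: BIC = {value:.2f}")
print("Lowest BIC:", best)
```

Each added class improves the log-likelihood but also adds parameters, so BIC favors a larger model only when the gain in fit outweighs the ln(N) penalty per parameter.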

Figure 5. Parameters Output for 4-class Model

Note the following:

- The coefficient for LBASE is almost 1.
- The TIME estimates for class 1 are very close to zero, suggesting that this class shows no change from the baseline seizure rate. We will refer to the growth pattern for class 1 as "no change".
- The estimate for the Intercept (alpha) for class 2 is a large negative value, suggesting an overall reduction in the seizure rate for this class. Despite the small setback between periods 1 and 2 (indicated by the beta increasing from 0.0973 to 0.2976), we will refer to this as the "improved" class.
- Classes 3 and 4 show a large positive alpha, suggesting an overall increase (worsening) in the seizure rate. The betas for these classes show an abrupt increase in the number of seizures at some point during the follow-up period (time 1 for class 3 shows a beta of 0.3740; time 3 for class 4 shows a beta of 0.8410). We will see later that the cases classified into one of these classes are the same ones that were identified as outliers in previous studies. We will refer to these classes as "unstable".

To display standard errors for these estimates, right-click in the output table and select Standard Errors from the popup menu.

Figure 6. Parameters Output for 4-class model with Standard Errors

For class 1, the estimates for the alpha and beta parameters are all smaller in magnitude than 1 standard error. Later, we will further refine class 1 by setting the TIME estimates (betas) to zero. The estimates for alpha for the other classes are well above 2 standard errors in magnitude.

Note that the two right-most columns provide the Overall mean and standard deviation for the alpha and beta parameters. For LBASE, the standard deviation of the Overall effect is 0 because this is a fixed effect (i.e., it is class independent). The other estimates have non-zero standard deviations because they differ depending upon the class (i.e., they are treated as class dependent, or random, effects).

- Click on Profile to display the Profile output for the 4-class model.

Figure 7. Profile Output for 4-class Model
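The comparison of estimates with their standard errors described above amounts to looking at the ratio estimate / standard error. A minimal sketch, using hypothetical estimates and standard errors (the actual values appear only in the Parameters output, which is not reproduced here):

```python
def z_ratio(estimate, std_error):
    """Ratio of an estimate to its standard error."""
    return estimate / std_error

def describe(z):
    """Rough label matching the 1-SE and 2-SE rules of thumb used in the text."""
    if abs(z) < 1:
        return "within 1 SE of zero"
    if abs(z) > 2:
        return "more than 2 SE from zero"
    return "between 1 and 2 SE from zero"

# Hypothetical (estimate, standard error) pairs, for illustration only.
params = {
    "class 1 alpha": (0.03, 0.08),
    "class 2 alpha": (-0.60, 0.12),
}

labels = {}
for name, (est, se) in params.items():
    z = z_ratio(est, se)
    labels[name] = describe(z)
    print(f"{name}: z = {z:.2f} ({labels[name]})")
```

An estimate within 1 standard error of zero gives little evidence of a nonzero effect, which is the motivation for later fixing the class 1 TIME effects at zero.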

The row labeled Class Size indicates that Class 1 (the "no change" class) represents 57% of the cases. Class 2 (those who improved) contains about 31% of the cases. The remaining 12% of the cases (classes 3 and 4) are the unstable ones.

- Click on the icon to the left of Model 3 to expand it
- Click on Profile to open the Profile output for the 3-class model

Figure 8. Profile Output for 3-class Model

Notice that the class sizes for classes 1 and 2 are about the same as the corresponding classes for the 4-class model.

- Click Parameters to display the Parameters output for the 3-class model

Figure 9. Parameters Output for 3-class Model

Notice that the alpha and beta estimates for classes 1 and 2 are also similar to those for the 4-class solution. Thus, we see that classes 1 and 2 are basically identical in the 3- and 4-class solutions.

To see that Class 3 contains both unstable classes from the 4-class solution:

- Click on Standard Classification Output for the 3-class model
- Scroll down to identify the cases for which Modal = 3

These cases are #112, #126, #135, #207, #225, and #227. (The posterior membership probabilities for five of these cases are shown below.)

Figure 10. Standard Classification Output for the 3-class Model

- Compare this with the Standard Classification Output for the 4-class model

Notice that the 6 cases assigned to class 3 in the 3-class solution have posterior membership probabilities of 1.000 of being in class 3. These cases are the same as those assigned to class 3 or class 4 in the 4-class solution, again with posterior probabilities equal to 1.000.
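The Modal column in the classification output is the class with the highest posterior membership probability. The sketch below illustrates that rule in Python. The six case IDs and their posterior probability of 1.000 for class 3 come from the text above; case 999 and its probabilities are hypothetical, added only to show a case that is not classified with certainty.

```python
def modal_class(posteriors):
    """Return the 1-based class with the largest posterior membership probability."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k]) + 1

# Cases reported in the tutorial as class 3 with posterior probability 1.000.
unstable_ids = [112, 126, 135, 207, 225, 227]
posterior = {case_id: [0.0, 0.0, 1.0] for case_id in unstable_ids}

# Hypothetical case (ID and probabilities invented for illustration).
posterior[999] = [0.90, 0.08, 0.02]

modal = {case_id: modal_class(p) for case_id, p in posterior.items()}
class3_cases = sorted(cid for cid, m in modal.items() if m == 3)

# Probability that the modal assignment is wrong for a case: 1 - max posterior.
error = {case_id: 1.0 - max(p) for case_id, p in posterior.items()}

print("Cases with Modal = 3:", class3_cases)
print("Classification error for case 999:", round(error[999], 2))
```

Averaging `1 - max posterior` over all cases gives an expected misclassification rate, which is one way to read the classification error reported in the model summary.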

These 6 cases are the same as those identified as outliers in the analyses of these data by Rabe-Hesketh and Skrondal (see http://fmwww.bc.edu/repec/usug2003/diag.pdf) and excluded from their study.

While strict adherence to the BIC criterion would cause us to conclude that there are more than 4 distinct time trends for these data, for our purposes in this tutorial we will focus on the 3-class solution, which identifies 3 classes that are of substantive interest. Class 1 shows no change, class 2 shows substantial improvement, and class 3 contains a small number of unstable outliers who show a substantial increase at some follow-up time period. From a substantive perspective, we will be interested in determining to what extent treatment with the drug Progabide vs. the Placebo can explain the growth pattern exhibited by class 2 as opposed to that exhibited by class 1.

We will now refine the 3-class solution by applying the following restrictions:

- Restrict the beta for LBASE to 1
- Restrict the betas for TIME to 0 for class 1 (note that we might also choose to set alpha to 0 for this class)

To implement the first restriction:

- Double-click on Model 3
- Click on the Model tab
- From the Model tab, select the row of 1s associated with LBASE
- Right-click to retrieve the restrictions popup menu
- Select Offset

Figure 11. Model Tab for 3-class Model

The 1s change to *s.

To implement the zero restriction:

- Select the 1 associated with the Class 1 TIME effect
- Right-click to retrieve the restrictions popup menu
- Select No Effect

The 1 changes to the symbol "-" to indicate that this effect is restricted to 0.

Figure 12. Implementing the zero restriction

- Click Estimate
- Click Parameters to display the parameters output

Figure 13. Parameters Output for new 3-class Model

The estimates are similar to those for Model 3, except that the restrictions have been incorporated into the model.

To examine the treatment effect:

- Click Profile

Figure 14. Profile Output

These column proportions show that most of those in the No Change class received the placebo, while most of those in the Improved class received the drug treatment. From the means associated with AVGBASE, we also see that the Unstable group had a substantially larger base rate (12.4825) than the other classes.

To display the corresponding row proportions:

- Click ProbMeans

Figure 15. ProbMeans Output

We see that of those who received the drug, 47.31% are in the Improved class, compared to only 14.32% of those who received the placebo. Similarly, only 42.28% of the drug group showed No change, compared to 73.09% of the placebo group. These numbers are in the direction expected if the drug reduced the seizure rate.

To test a slightly different treatment effect by using Latent GOLD with an active covariate, see Tutorial #7B.

To rename this model for future use:

- Click Model 5 to select it. Model 5 is now highlighted.
- To enter edit mode, click the highlighted Model 5
- Replace Model 5 by typing Final 3-class model

Figure 16. Renaming Model 5

To save this model for future use:

- Click Final 3-class model to select it
- From the File menu, select Save Definition
- Click Save

The model definition is saved as the file Final 3-class model.lgf

3/10/05
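As a check on the treatment-effect discussion in the Profile and ProbMeans steps, the following Python sketch converts the row proportions quoted from the ProbMeans output, P(class | TRT), into overall class sizes and Profile-style column proportions, P(TRT | class). It assumes the quoted percentages can be weighted by the treatment group sizes (31 drug, 28 placebo). The Unstable proportions are not quoted in the text and are derived here as the remainder, so treat them as an assumption.

```python
# Row proportions P(class | TRT) quoted from the ProbMeans output.
p_class_given_trt = {
    "drug":    {"No change": 0.4228, "Improved": 0.4731},
    "placebo": {"No change": 0.7309, "Improved": 0.1432},
}
# Assumption: the three classes sum to 1 within each group, so Unstable is the remainder.
for probs in p_class_given_trt.values():
    probs["Unstable"] = 1.0 - probs["No change"] - probs["Improved"]

n_trt = {"drug": 31, "placebo": 28}  # group sizes from the data description
n_total = sum(n_trt.values())
classes = ["No change", "Improved", "Unstable"]

# Expected number of cases per class within each treatment group.
expected = {
    trt: {c: n_trt[trt] * p_class_given_trt[trt][c] for c in classes}
    for trt in n_trt
}

# Overall class sizes P(class).
class_size = {c: sum(expected[trt][c] for trt in n_trt) / n_total for c in classes}

# Column proportions P(TRT | class), as in the Profile output.
p_trt_given_class = {
    c: {trt: expected[trt][c] / (class_size[c] * n_total) for trt in n_trt}
    for c in classes
}

for c in classes:
    print(f"{c}: size = {class_size[c]:.3f}, "
          f"P(drug | class) = {p_trt_given_class[c]['drug']:.3f}")
```

Under these assumptions the implied class sizes are close to the sizes reported earlier (roughly 57% No change and 31%-32% Improved), and the column proportions agree with the Profile reading that most No change cases received the placebo while most Improved cases received the drug.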