SPSS Note

The GLM Multivariate procedure is based on the General Linear Model procedure, in which factors and covariates are assumed to have a linear relationship to the dependent variables.

Factors. Categorical predictors should be selected as factors in the model. Each level of a factor can have a different linear effect on the value of the dependent variable. Fixed effects factors are generally thought of as variables whose values of interest are all represented in the data file. Random effects factors are variables whose values in the data file can be considered a random sample from a larger population of values. They are useful for explaining excess variability in the dependent variable.

Covariates. Scale predictors should be selected as covariates in the model. Within combinations of factor levels (or cells), values of covariates are assumed to be linearly correlated with values of the dependent variables.

Interactions. By default, the GLM Multivariate procedure produces a model with all factorial interactions, which means that each combination of factor levels can have a different linear effect on the dependent variable. Additionally, you may specify factor-covariate interactions if you believe that the linear relationship between a covariate and the dependent variable changes for different levels of a factor.

For the purposes of testing hypotheses concerning parameter estimates, the GLM Multivariate procedure assumes:
1. The errors are independent of each other across observations and independent of the independent variables in the model. Good study design generally avoids violation of this assumption.
2. The covariance matrix of the dependent variables is constant across cells. This can be particularly important when there are unequal cell sizes; that is, different numbers of observations across factor level combinations.
3. Across the dependent variables, the errors have a multivariate normal distribution with a mean of 0.

(C) Jamalludin Ab Rahman
As part of the initial treatment for myocardial infarction (MI, or "heart attack"), a thrombolytic drug is usually administered to help clear the patient's arteries before surgery*. Three of the available drugs are alteplase, reteplase, and streptokinase. Alteplase and reteplase are newer, more expensive drugs, and a regional health care system wants to determine whether they are cost effective enough to adopt in place of streptokinase.

One of the benefits of thrombolytic drugs is that surgery generally proceeds more smoothly, resulting in a shorter recovery period. If the newer drugs are effective, then patients given those drugs should have shorter lengths of stay in the hospital. Ideally, the shorter lengths of stay will help to make up for the greater initial cost of the newer drugs.

The hypothetical data file patlos_sample.sav contains the treatment records of a sample of patients who received thrombolytics during treatment for MI. Each case corresponds to a separate patient and records many variables related to that patient's hospital stay.

* PTCA = percutaneous transluminal coronary angioplasty, CABG = coronary artery bypass grafting (Which is better? http://www.ctsnet.org/doc/60)
1. Open patlos_sample.sav.
2. Click Analyze > General Linear Model > Multivariate.
3. Transfer Length of stay and Treatment costs to the Dependent Variables box.
4. Transfer Clot dissolving drugs and Surgical treatment to the Fixed Factor(s) box.
5. Click the Contrasts button.
We want to compare each thrombolytic drug (clotsolv) against a reference drug, but we do not want contrasts for the surgery (proc).

6. Select clotsolv, choose Simple as the contrast, set Reference Category to First, and click Change. Leave proc as None because we don't want to compare CABG to PTCA.
7. Click Continue.
8. Click the Options button.
9. Check Descriptive statistics, Estimates of effect size, Observed power, Homogeneity tests, and Spread vs. level plot.
10. Click Continue.
11. Finally, click OK to see the results.
Check the Between-Subjects Factors table. Is the sample size correct? Note which groups will be compared in the analysis.
MANOVA and MANCOVA assume that the covariance matrix of the dependent variables is the same in every group (every cell of the factor design). Box's M tests this assumption. We want M to be non-significant, so that we can conclude there is insufficient evidence that the covariance matrices differ. Here M is significant, so the assumption is violated: the covariance matrices of Length of stay and Treatment costs differ across cells. Note, however, that the F test is quite robust even when there are departures from this assumption.
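To make the idea behind Box's M concrete, here is a minimal sketch for the case of two dependent variables. It computes only the M statistic by comparing each cell's covariance matrix with the pooled covariance matrix; it does not compute the F approximation or significance value that SPSS reports. The function names and the example data are hypothetical and are not taken from patlos_sample.sav.

```python
import math

def covariance_2x2(rows):
    """Sample covariance matrix (divisor n - 1) for a list of (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def det_2x2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def box_m(cells):
    """Box's M for a list of cells, each a list of (x, y) observations."""
    total_n = sum(len(c) for c in cells)
    g = len(cells)
    covs = [covariance_2x2(c) for c in cells]
    # Pooled covariance: each cell weighted by its degrees of freedom.
    pooled = [
        [sum((len(c) - 1) * s[i][j] for c, s in zip(cells, covs)) / (total_n - g)
         for j in range(2)]
        for i in range(2)
    ]
    return ((total_n - g) * math.log(det_2x2(pooled))
            - sum((len(c) - 1) * math.log(det_2x2(s)) for c, s in zip(cells, covs)))

# Hypothetical (length of stay, cost) pairs for two cells.
cell_a = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0)]
cell_b = [(2 * x, 2 * y) for x, y in cell_a]  # same shape, doubled spread

m_equal = box_m([cell_a, cell_a])    # identical covariance matrices
m_unequal = box_m([cell_a, cell_b])  # covariance matrices differ
```

When the cell covariance matrices are identical, M is zero; the more they differ, the larger M becomes.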
This table answers the question: is each effect significant? The multivariate tests assess the effect of each factor on the dependent variables simultaneously. The multivariate formula for F is based not only on the sums of squares between and within groups, as in ANOVA, but also on the sums of cross-products; that is, it takes covariance into account as well as group means.

The statistics:
1. Pillai's trace, also called the Pillai-Bartlett trace, is a positive-valued statistic. Increasing values of the statistic indicate effects that contribute more to the model. There is evidence that Pillai's trace is more robust than the other statistics to violations of model assumptions (Olson, 1974).
2. Wilks' Lambda is a positive-valued statistic that ranges from 0 to 1. Decreasing values of the statistic indicate effects that contribute more to the model. It is usually used when there are more than 2 dependent variables.
3. Hotelling's trace is the sum of the eigenvalues of the test matrix. It is a positive-valued statistic for which increasing values indicate effects that contribute more to the model. Hotelling's trace is always larger than Pillai's trace, but when the eigenvalues of the test matrix are small, these two statistics will be nearly equal. This indicates that the effect probably does not contribute much to the model. It is usually used for models with 2 dependent variables.
4. Roy's largest root is the largest eigenvalue of the test matrix. Thus, it is a positive-valued statistic for which increasing values indicate effects that contribute more to the model. Roy's largest root is always less than or equal to Hotelling's trace. When these two statistics are equal, the effect is predominantly associated with just one of the dependent variables, there is a strong correlation between the dependent variables, or the effect does not contribute much to the model.

How to interpret the result?

The significance values of the main effects, CLOTSOLV and PROC, are less than 0.05, indicating that the effects contribute to the model. By contrast, their interaction effect does not contribute to the model. However, although CLOTSOLV does contribute to the model, the value of Pillai's trace is close to Hotelling's trace, so it does not contribute very much.

A more straightforward way to see this is to look at partial eta squared. Eta squared is the proportion of the total variability in the dependent variable accounted for by the variation in the independent variable. The partial eta squared statistic reports the "practical" significance (effect size) of each term, based on the ratio of the variation accounted for by the effect to the sum of the variation accounted for by the effect and the variation left to error. Larger values of partial eta squared indicate a greater amount of variation accounted for by the model effect, up to a maximum of 1. A commonly used benchmark is 0.14. Since partial eta squared is very small (<0.14) for CLOTSOLV, it does not contribute very much to the model. By comparison, partial eta squared for PROC is quite large, which is to be expected: the surgical procedure a patient must undergo for MI treatment is going to have a much greater effect on the length of the hospital stay and final costs than the type of thrombolytic received.

For the initial objective of the analysis, it is enough for the multivariate tests to show that CLOTSOLV is significant, which means that the effect of at least one of the drugs is different from the others. The contrast results will show you where the differences are.
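The relationships among the four multivariate statistics are easier to see when they are computed from the same set of eigenvalues. The following is a minimal sketch: the function names and the eigenvalues are hypothetical, and the partial eta squared function uses the univariate sums-of-squares form described above rather than the exact multivariate computation SPSS performs.

```python
import math

def multivariate_statistics(eigenvalues):
    """Compute the four multivariate test statistics from the eigenvalues
    of the test matrix (hypothesis matrix times inverse error matrix)."""
    return {
        "pillai": sum(ev / (1 + ev) for ev in eigenvalues),
        "wilks": math.prod(1 / (1 + ev) for ev in eigenvalues),
        "hotelling": sum(eigenvalues),
        "roy": max(eigenvalues),
    }

def partial_eta_squared(ss_effect, ss_error):
    """Effect variation divided by effect variation plus error variation."""
    return ss_effect / (ss_effect + ss_error)

# Hypothetical small eigenvalues, as for a weak effect.
weak = multivariate_statistics([0.05, 0.01])

# Hypothetical larger eigenvalues, as for a strong effect.
strong = multivariate_statistics([2.0, 0.5])

# Hypothetical sums of squares for one effect and the error term.
eta = partial_eta_squared(ss_effect=10.0, ss_error=90.0)
```

With small eigenvalues, Pillai's trace and Hotelling's trace are nearly equal; with large eigenvalues they move apart, while Wilks' Lambda falls toward 0.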
This table answers the question: is the model significant for each dependent variable? The "corrected model" effect reflects the variation in the dependent variable attributed to all effects in the model other than the intercept, after correcting for the mean. It is possible for one or more univariate tests of an effect to be significant without the multivariate test being significant, and vice versa.
Level 2 vs. Level 1 (Reteplase vs. Streptokinase)
The contrast estimates show that, on average, patients given reteplase spend 0.382 fewer days in the hospital and incur almost 600 dollars more in treatment costs than patients given streptokinase. Since the significance value for Length of stay is less than 0.05, you can conclude that this difference is unlikely to be due to chance. But the significance value for Treatment costs is greater than 0.10, so this difference may be entirely due to chance variation.

Level 3 vs. Level 1 (Alteplase vs. Streptokinase)
The contrast estimates show that, on average, patients given alteplase spend about half a day less in the hospital and incur slightly over 700 dollars more in treatment costs than patients given streptokinase. Since the significance value for Length of stay is less than 0.05, you can conclude that this difference is unlikely to be due to chance. The significance value for Treatment costs is greater than 0.10, so this difference may be entirely due to chance variation.

Conclusion: The contrast results show that alteplase and reteplase seem to reduce patient length of stay. Moreover, the reduction is enough to equalize the treatment costs, or at least bring the difference within the random variation. Thus, the model suggests that alteplase and reteplase should be used in place of streptokinase. However, before adopting this plan, you should check some tests of the model assumptions.
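A simple contrast compares the mean of each level with the mean of a reference level. The sketch below illustrates the idea with plain group means. It is a simplification: in the factorial model SPSS estimates the contrasts from the model parameters (taking proc into account), so the numbers would not match in general. The function name and the length-of-stay values are hypothetical.

```python
from statistics import mean

def simple_contrasts(values_by_level, reference):
    """Difference between each level's mean and the reference level's mean."""
    ref_mean = mean(values_by_level[reference])
    return {
        level: mean(values) - ref_mean
        for level, values in values_by_level.items()
        if level != reference
    }

# Hypothetical lengths of stay (days); streptokinase is the reference level.
length_of_stay = {
    "streptokinase": [6.0, 7.0, 8.0],
    "reteplase": [5.5, 6.5, 7.5],
    "alteplase": [5.0, 6.5, 7.0],
}

contrasts = simple_contrasts(length_of_stay, reference="streptokinase")
```

A negative estimate means that patients at that level stay, on average, fewer days than patients at the reference level.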
Box's M tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups. The significance value of the test is less than 0.05, suggesting that the assumption is not met, and thus the model results are suspect. However, remember that the F test is quite robust to deviations from this assumption. Box's M is also sensitive to large data files: when there are a large number of cases, it can detect even small departures from homogeneity. Moreover, it can be sensitive to departures from the assumption of normality. As an additional check of the diagonals of the covariance matrices, look at Levene's tests.

Levene's Test of Equality of Error Variances tests equality of the error variances across the cells defined by the combination of factor levels. The significance value for Length of stay is greater than 0.10, so there is no reason to believe that the equal variances assumption is violated for this variable. However, the significance value for the test of Treatment costs is less than 0.05, indicating that the equal variances assumption is violated for this variable. Like Box's M, Levene's test can be sensitive to large data files, so look at the spread vs. level plot for Treatment costs for visual confirmation.
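For intuition about what Levene's test measures, here is a minimal sketch of the mean-based Levene statistic: it is a one-way ANOVA F ratio computed on the absolute deviations of each observation from its own cell mean. The sketch returns only the statistic, not its significance value, and the function name and example data are hypothetical.

```python
from statistics import mean

def levene_statistic(groups):
    """Mean-based Levene statistic for a list of groups of observations."""
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    z_group_means = [mean(zg) for zg in z]
    z_grand_mean = sum(sum(zg) for zg in z) / n_total
    # Between-group and within-group variation of the absolute deviations.
    between = sum(len(zg) * (m - z_grand_mean) ** 2
                  for zg, m in zip(z, z_group_means))
    within = sum((v - m) ** 2
                 for zg, m in zip(z, z_group_means) for v in zg)
    return ((n_total - k) / (k - 1)) * between / within

# Hypothetical cells: same spread, different level.
w_equal = levene_statistic([[1.0, 2.0, 4.0], [11.0, 12.0, 14.0]])

# Hypothetical cells: spread grows with the level.
w_unequal = levene_statistic([[1.0, 2.0, 4.0], [10.0, 20.0, 40.0]])
```

The statistic is near zero when the cells have the same spread and grows as the spreads diverge.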
The spread versus level plot is a scatterplot of the cell means and standard deviations. It provides a visual test of the equal variances assumption, with the added benefit of helping you to assess whether violations of the assumption are due to a relationship between the cell means and standard deviations.
This plot agrees with the result of Levene's test: the equal variances assumption is violated for Treatment costs. There is also a clear positive relationship in the scatterplot, showing that as the cell mean increases, so does the variability. This relationship suggests a possible solution to the problem. Since Treatment costs is a positive-valued variable, you could propose that the error term has a multiplicative, rather than additive, effect on cost. Instead of modeling Treatment costs, you will analyze Log cost.

Now run the same analysis again, this time using Log cost rather than Treatment costs.
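The reasoning behind the log transformation can be checked with a small numerical sketch. If the error is multiplicative, the standard deviation of each cell grows with its mean on the raw scale but is constant on the log scale. The cost values below are hypothetical and chosen only to show the pattern; the function name is also an assumption.

```python
import math
from statistics import mean, stdev

def spread_vs_level(cells):
    """Return (mean, standard deviation) for each cell, as plotted in a
    spread vs. level plot."""
    return {name: (mean(values), stdev(values)) for name, values in cells.items()}

# Hypothetical treatment costs; the second cell is the first scaled by 2.
costs = {
    "PTCA": [20000.0, 25000.0, 31250.0],
    "CABG": [40000.0, 50000.0, 62500.0],
}

raw = spread_vs_level(costs)
logged = spread_vs_level(
    {name: [math.log(v) for v in values] for name, values in costs.items()}
)
```

On the raw scale the cell with the higher mean also has the larger standard deviation; after taking logs the two standard deviations coincide.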
Replace cost with lncost.
The results for Log cost are slightly different from those for Treatment costs. The significance values for both contrasts are less than 0.05, suggesting that the differences in costs between the newer drugs and streptokinase are not due to chance.

The contrast estimate for the difference between reteplase and streptokinase is 0.0217. Since you are looking at differences in log-transformed cost, this means that the ratio of costs is e^0.0217 ≈ 1.0219. That is, the costs incurred by patients given reteplase are approximately 2.19% higher than the costs incurred by patients given streptokinase. If the typical MI patient incurs 25,000 to 35,000 dollars in treatment costs, that means reteplase patients incur, roughly, an extra 550 to 770 dollars in costs.

The contrast estimate for the difference between alteplase and streptokinase is 0.0243. Since you are looking at differences in log-transformed cost, this means that the ratio of costs is e^0.0243 ≈ 1.0246. That is, the costs incurred by patients given alteplase are approximately 2.46% higher than the costs incurred by patients given streptokinase. If the typical MI patient incurs 25,000 to 35,000 dollars in treatment costs, that means alteplase patients incur, roughly, an extra 600 to 860 dollars in costs.

These contrast results show that while alteplase and reteplase do seem to reduce patient length of stay, the reduction is not enough to equalize the treatment costs. Thus, determining whether alteplase and reteplase should be used in place of streptokinase will require further study of the cost of these drugs versus their effectiveness at increasing the success of surgery.
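The back-transformation from a difference in log cost to a cost ratio and a dollar amount can be reproduced in a few lines. This is a minimal sketch: the contrast estimates (0.0217 and 0.0243) and the 25,000 to 35,000 dollar range come from the text above, while the function names are hypothetical.

```python
import math

def cost_ratio(log_difference):
    """Ratio of costs implied by a difference in log-transformed cost."""
    return math.exp(log_difference)

def extra_cost(log_difference, baseline_cost):
    """Additional cost relative to a baseline patient cost."""
    return baseline_cost * (cost_ratio(log_difference) - 1)

estimates = {"reteplase": 0.0217, "alteplase": 0.0243}

for drug, estimate in estimates.items():
    ratio = cost_ratio(estimate)
    low = extra_cost(estimate, 25000)
    high = extra_cost(estimate, 35000)
    print(f"{drug}: ratio {ratio:.4f}, "
          f"{(ratio - 1) * 100:.2f}% higher, "
          f"extra cost {low:.0f} to {high:.0f} dollars")
```

Because the estimates are small, the percentage increase is close to, but slightly larger than, 100 times the log difference.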