(C) Jamalludin Ab Rahman


SPSS Note

The GLM Multivariate procedure is based on the General Linear Model procedure, in which factors and covariates are assumed to have a linear relationship to the dependent variables.

Factors. Categorical predictors should be selected as factors in the model. Each level of a factor can have a different linear effect on the value of the dependent variable. Fixed-effects factors are generally thought of as variables whose values of interest are all represented in the data file. Random-effects factors are variables whose values in the data file can be considered a random sample from a larger population of values. They are useful for explaining excess variability in the dependent variable.

Covariates. Scale predictors should be selected as covariates in the model. Within combinations of factor levels (or cells), values of covariates are assumed to be linearly correlated with values of the dependent variables.

Interactions. By default, the GLM Multivariate procedure produces a model with all factorial interactions, which means that each combination of factor levels can have a different linear effect on the dependent variable. Additionally, you may specify factor-by-covariate interactions if you believe that the linear relationship between a covariate and the dependent variable changes for different levels of a factor.

For the purposes of testing hypotheses concerning parameter estimates, the GLM Multivariate procedure assumes:
1. The values of errors are independent of each other across observations and of the independent variables in the model. Good study design generally avoids violation of this assumption.
2. The covariance of dependent variables is constant across cells. This can be particularly important when there are unequal cell sizes; that is, different numbers of observations across factor level combinations.
3. Across the dependent variables, the errors have a multivariate normal distribution with a mean of 0.


As part of the initial treatment for myocardial infarction (MI, or "heart attack"), a thrombolytic drug is usually administered to help clear the patient's arteries before surgery*. Three of the available drugs are alteplase, reteplase, and streptokinase. Alteplase and reteplase are newer, more expensive drugs, and a regional health care system wants to determine whether they are cost-effective enough to adopt in place of streptokinase.

One of the benefits of thrombolytic drugs is that surgery generally proceeds more smoothly, resulting in a shorter recovery period. If the newer drugs are effective, then patients given those drugs should have shorter lengths of stay in the hospital. Ideally, the shorter lengths of stay will help to make up for the greater initial cost of the newer drugs.

The hypothetical data file patlos_sample.sav contains the treatment records of a sample of patients who received thrombolytics during treatment for MI. Each case corresponds to a separate patient and records many variables related to their hospital stay.

* PTCA = percutaneous transluminal coronary angioplasty, CABG = coronary artery bypass grafting (Which is better? http://www.ctsnet.org/doc/60)


1. Open patlos_sample.sav.
2. Click Analyze > General Linear Model > Multivariate.

3. Transfer Length of stay and Treatment costs to the Dependent Variables box.
4. Transfer Clot dissolving drugs and Surgical treatment to the Fixed Factor(s) box.
5. Click the Contrasts button.

We would like to create an indicator variable for the thrombolytic drugs (clotsolv) but not for the surgery (proc).

6. Select clotsolv, choose Simple as the contrast, set the Reference Category to First, and click Change. Leave proc as None because we don't want to compare CABG to PTCA.
7. Click Continue.
8. Click the Options button.
9. Check Descriptive statistics, Estimates of effect size, Observed power, Homogeneity tests and Spread vs. level plot.
10. Click Continue.
11. Finally, click OK to see the result.

Check the Between-Subjects Factors table. Is the sample size correct? Observe what will be compared in the analysis.

MANOVA and MANCOVA assume that the covariance matrix of the dependent variables is the same in every group (every cell of the factor design). Box's M tests this assumption. We want M to be non-significant, because a non-significant result means there is insufficient evidence that the covariance matrices differ. Here M is significant, so the assumption is violated: the covariance matrices of length of stay and cost differ across cells. Note, however, that the F test is quite robust to departures from this assumption.
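To make the quantity behind Box's M more concrete, the following is a minimal sketch for the two-dependent-variable case. It assumes the usual definition M = (N - g) ln|S_pooled| - sum of (n_i - 1) ln|S_i|, and it computes the raw statistic only, not the significance value SPSS reports. The function names and the small (length of stay, cost) samples are hypothetical and are not taken from patlos_sample.sav.

```python
import math


def covariance_2x2(rows):
    """Sample covariance matrix (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return sxx, sxy, syy


def determinant(cov):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2


def box_m(cells):
    """Raw Box's M for a list of cells, each a list of (x, y) observations."""
    total_df = sum(len(cell) - 1 for cell in cells)
    covs = [covariance_2x2(cell) for cell in cells]
    # Pooled covariance: cell matrices weighted by their degrees of freedom.
    pooled = tuple(
        sum((len(cell) - 1) * cov[k] for cell, cov in zip(cells, covs)) / total_df
        for k in range(3)
    )
    return total_df * math.log(determinant(pooled)) - sum(
        (len(cell) - 1) * math.log(determinant(cov))
        for cell, cov in zip(cells, covs)
    )


# Hypothetical (length of stay, cost in thousands) observations.
cell_a = [(4.0, 20.0), (5.0, 24.0), (6.0, 23.0), (7.0, 29.0)]
cell_b = [(x + 1.0, y + 5.0) for x, y in cell_a]  # same spread, shifted means
cell_c = [(4.0, 10.0), (5.0, 30.0), (6.0, 20.0), (7.0, 50.0)]  # wider cost spread

m_equal = box_m([cell_a, cell_b])
m_unequal = box_m([cell_a, cell_c])
print(m_equal, m_unequal)
```

When the cells share the same covariance matrix, M is zero; it grows as the matrices diverge, which is why a large, significant M signals a violated assumption.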

This table answers the question: is each effect significant? The multivariate tests assess the effect of each factor on all the dependent variables simultaneously. The multivariate formula for F is based not only on the sums of squares between and within groups, as in ANOVA, but also on the sums of cross-products; that is, it takes covariance into account as well as group means.

The statistics:
1. Pillai's trace, also called the Pillai-Bartlett trace, is a positive-valued statistic. Increasing values of the statistic indicate effects that contribute more to the model. There is evidence that Pillai's trace is more robust than the other statistics to violations of model assumptions (Olson, 1974).
2. Wilks' lambda is a positive-valued statistic that ranges from 0 to 1. Decreasing values of the statistic indicate effects that contribute more to the model. It is usually used when there are more than 2 dependent variables.
3. Hotelling's trace is the sum of the eigenvalues of the test matrix. It is a positive-valued statistic for which increasing values indicate effects that contribute more to the model. Hotelling's trace is always larger than Pillai's trace, but when the eigenvalues of the test matrix are small, the two statistics will be nearly equal. This indicates that the effect probably does not contribute much to the model. It is usually used for models with 2 dependent variables.
4. Roy's largest root is the largest eigenvalue of the test matrix. Thus, it is a positive-valued statistic for which increasing values indicate effects that contribute more to the model. Roy's largest root is always less than or equal to Hotelling's trace. When these two statistics are equal, the effect is predominantly associated with just one of the dependent variables, there is a strong correlation between the dependent variables, or the effect does not contribute much to the model.

How to interpret the result? The significance values of the main effects, CLOTSOLV and PROC, are less than 0.05, indicating that the effects contribute to the model. By contrast, their interaction effect does not contribute to the model. However, although CLOTSOLV does contribute to the model, the value of Pillai's trace is close to Hotelling's trace, so it doesn't contribute very much.

A more straightforward way to see this is to look at partial eta squared. Eta squared is the proportion of the total variability in the dependent variable accounted for by the variation in the independent variable. The partial eta squared statistic reports the "practical" significance of each term, based on the ratio of the variation accounted for by the effect to the sum of the variation accounted for by the effect and the variation left to error. In other words, partial eta squared reports effect size (meaningfulness). Larger values of partial eta squared indicate a greater amount of variation accounted for by the model effect, to a maximum of 1. A commonly cited rule of thumb treats values of about 0.14 or more as a large effect. Since partial eta squared is very small (<0.14) for CLOTSOLV, it does not contribute very much to the model. By comparison, partial eta squared for PROC is quite large, which is to be expected: the surgical procedure a patient must undergo for MI treatment is going to have a much greater effect on the length of their hospital stay and final costs than the type of thrombolytic they receive.

Looking at our initial objective of the analysis, it is enough for the multivariate tests to show that CLOTSOLV is significant, which means that the effect of at least one of the drugs is different from the others. The contrast results will show you where the differences are.
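The relationships among the four statistics can be checked numerically. The sketch below assumes the standard definitions in terms of the eigenvalues of the test matrix: Pillai's trace is the sum of ev / (1 + ev), Wilks' lambda is the product of 1 / (1 + ev), Hotelling's trace is the sum of the eigenvalues, and Roy's largest root is the largest eigenvalue. The eigenvalues and sums of squares used here are hypothetical, not values from the SPSS output, and the partial eta squared helper uses the sum-of-squares ratio described above.

```python
import math


def multivariate_statistics(eigenvalues):
    """Four MANOVA test statistics from the eigenvalues of the test matrix."""
    return {
        "pillai": sum(ev / (1 + ev) for ev in eigenvalues),
        "wilks": math.prod(1 / (1 + ev) for ev in eigenvalues),
        "hotelling": sum(eigenvalues),
        "roy": max(eigenvalues),
    }


def partial_eta_squared(ss_effect, ss_error):
    """Variation explained by the effect relative to effect plus error."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical eigenvalues: a weak effect and a strong effect.
weak = multivariate_statistics([0.02, 0.001])
strong = multivariate_statistics([1.5, 0.3])

for label, stats in (("weak", weak), ("strong", strong)):
    print(label, {name: round(value, 4) for name, value in stats.items()})

# Hypothetical sums of squares that land exactly on the 0.14 rule of thumb.
print(partial_eta_squared(14.0, 86.0))
```

For the weak effect, Pillai's trace and Hotelling's trace are almost identical, which matches the interpretation given for CLOTSOLV; for the strong effect the two move well apart.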

This table answers the question: is the model significant for each dependent variable? The "corrected model" effect reflects the variation in the dependent variable attributed to all effects in the model other than the intercept, after correcting for the mean. It is possible for one or more univariate tests on an effect to be significant when the multivariate test is not, and vice versa.

Level 2 vs. Level 1 (Reteplase vs. Streptokinase)
The contrast estimates show that, on average, patients given reteplase spend 0.382 fewer days in the hospital and incur almost 600 dollars more in treatment costs than patients given streptokinase. Since the significance value for Length of stay is less than 0.05, you can conclude that this difference is unlikely to be due to chance. But the significance value for Treatment costs is greater than 0.10, so this difference may be entirely due to chance variation.

Level 3 vs. Level 1 (Alteplase vs. Streptokinase)
The contrast estimates show that, on average, patients given alteplase spend about half a day less in the hospital and incur slightly over 700 dollars more in treatment costs than patients given streptokinase. Since the significance value for Length of stay is less than 0.05, you can conclude that this difference is unlikely to be due to chance. The significance value for Treatment costs is greater than 0.10, so this difference may be entirely due to chance variation.

Conclusion: The contrast results show that alteplase and reteplase seem to reduce patient length of stay. Moreover, the reduction is enough to equalize the treatment costs, or at least to bring the difference within random variation. Thus, the model suggests that alteplase and reteplase should be used in place of streptokinase. However, before adopting this plan, you should check some tests of the model assumptions.
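As a rough illustration of what a simple contrast with the first category as reference measures, the sketch below compares each drug's mean with the reference drug's mean in a one-factor layout. This is a simplification: SPSS estimates the contrasts from the fitted two-factor model, so its values will generally differ from raw differences in group means. The function name and the length-of-stay figures are hypothetical.

```python
from statistics import mean


def simple_contrasts(groups, reference):
    """Difference between each level's mean and the reference level's mean."""
    reference_mean = mean(groups[reference])
    return {
        level: mean(values) - reference_mean
        for level, values in groups.items()
        if level != reference
    }


# Hypothetical lengths of stay in days, keyed by thrombolytic drug.
length_of_stay = {
    "streptokinase": [6.0, 7.0, 8.0],
    "reteplase": [5.5, 6.5, 7.5],
    "alteplase": [5.0, 6.0, 7.0],
}

contrasts = simple_contrasts(length_of_stay, reference="streptokinase")
for level, estimate in contrasts.items():
    print(f"{level} vs. streptokinase: {estimate:+.2f} days")
```

A negative estimate means shorter stays than the reference drug, which is the direction reported for both newer drugs above.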

Box's M tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups. The significance value of the test is less than 0.05, suggesting that the assumption is not met, and thus the model results are suspect. However, remember that the F test is quite robust to departures from this assumption. Box's M is also sensitive to large data files, meaning that when there are a large number of cases, it can detect even small departures from homogeneity. Moreover, it can be sensitive to departures from the assumption of normality. As an additional check of the diagonals of the covariance matrices, look at Levene's tests.

Levene's Test of Equality of Error Variances tests equality of the error variances across the cells defined by the combination of factor levels. The significance value for Length of stay is greater than 0.10, so there is no reason to believe that the equal variances assumption is violated for this variable. However, the significance value for the test of Treatment costs is less than 0.05, indicating that the equal variances assumption is violated for this variable. Like Box's M, Levene's test can be sensitive to large data files, so look at the spread vs. level plot for Treatment costs for visual confirmation.
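The idea behind Levene's test can be sketched as a one-way ANOVA F statistic computed on absolute deviations from each cell mean. The code below assumes the mean-based form of the test and returns the statistic only, without a significance value. The function name and the cost figures are hypothetical.

```python
from statistics import mean


def levene_statistic(groups):
    """Levene's W: one-way ANOVA F on absolute deviations from group means."""
    deviations = [[abs(value - mean(group)) for value in group] for group in groups]
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = mean(d for group in deviations for d in group)
    between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in deviations)
    within = sum((d - mean(group)) ** 2 for group in deviations for d in group)
    return (between / (k - 1)) / (within / (n_total - k))


# Hypothetical treatment costs (in thousands) for two cells.
similar_spread = [[10.0, 12.0, 14.0, 16.0], [20.0, 22.0, 24.0, 26.0]]
different_spread = [[10.0, 12.0, 14.0, 16.0], [0.0, 20.0, 40.0, 60.0]]

w_similar = levene_statistic(similar_spread)
w_different = levene_statistic(different_spread)
print(w_similar, round(w_different, 3))
```

Cells with the same spread give a statistic of zero regardless of their means, while a cell with a much wider spread pushes the statistic up, which is the pattern reported for Treatment costs.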

The spread versus level plot is a scatterplot of the cell means and standard deviations. It provides a visual test of the equal variances assumption, with the added benefit of helping you to assess whether violations of the assumption are due to a relationship between the cell means and standard deviations.

This plot agrees with the result of Levene's test: the equal variances assumption is violated for Treatment costs. There is also a clear positive relationship in the scatterplot, showing that as the cell mean increases, so does the variability. This relationship suggests a possible solution to the problem. Since Treatment costs is a positive-valued variable, you could propose that the error term has a multiplicative, rather than additive, effect on cost. Instead of modeling Treatment costs, you will analyze Log cost. Now rerun the same analysis, this time using Log cost rather than Treatment costs.
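Why the log transformation helps can be seen in a small sketch. It assumes a purely multiplicative error: each cell's costs are a typical cost multiplied by the same set of error factors. The cell names, typical costs and error factors are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical multiplicative errors applied to each cell's typical cost.
error_factors = [0.8, 0.9, 1.0, 1.1, 1.25]
cells = {
    "low cost cell": [20000 * factor for factor in error_factors],
    "high cost cell": [40000 * factor for factor in error_factors],
}


def level_and_spread(values):
    """Cell mean (level) and standard deviation (spread)."""
    return mean(values), stdev(values)


raw = {name: level_and_spread(values) for name, values in cells.items()}
logged = {
    name: level_and_spread([math.log(value) for value in values])
    for name, values in cells.items()
}

for name in cells:
    print(name, "raw SD:", round(raw[name][1], 1), "log SD:", round(logged[name][1], 4))
```

On the raw scale the spread doubles when the level doubles, as in the spread vs. level plot; on the log scale the spread is the same in both cells, which is the behaviour the transformation is meant to achieve.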

Replace cost with lncost.


The results for Log cost are slightly different from those for Treatment costs. The significance values for both contrasts are less than 0.05, suggesting that the differences in costs between the newer drugs and streptokinase are not due to chance.

The contrast estimate for the difference between reteplase and streptokinase is 0.0217. Since you are looking at differences in log-transformed cost, this means that the ratio of costs is e^0.0217 ≈ 1.0219. That is, the costs incurred by patients given reteplase are approximately 2.19% higher than the costs incurred by patients given streptokinase. If the typical MI patient incurs 25,000 to 35,000 dollars in treatment costs, reteplase patients incur, roughly, an extra 550 to 770 dollars in costs.

The contrast estimate for the difference between alteplase and streptokinase is 0.0243. Since you are looking at differences in log-transformed cost, this means that the ratio of costs is e^0.0243 ≈ 1.0246. That is, the costs incurred by patients given alteplase are approximately 2.46% higher than the costs incurred by patients given streptokinase. If the typical MI patient incurs 25,000 to 35,000 dollars in treatment costs, alteplase patients incur, roughly, an extra 600 to 860 dollars in costs.

These contrast results show that while alteplase and reteplase do seem to reduce patient length of stay, the reduction is not enough to equalize the treatment costs. Thus, determining whether alteplase and reteplase should be used in place of streptokinase will require further study of the cost of these drugs versus their effectiveness at increasing the success of surgery.
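The back-transformation from a log-scale contrast to a percentage and a dollar range can be reproduced with a few lines of arithmetic. The contrast estimates (0.0217 and 0.0243) and the 25,000 to 35,000 dollar range come from the text; the function names are hypothetical.

```python
import math


def percent_increase(log_contrast):
    """Percentage cost increase implied by a difference in log cost."""
    return (math.exp(log_contrast) - 1) * 100


def extra_cost_range(log_contrast, low=25000, high=35000):
    """Extra dollars implied at the low and high end of typical costs."""
    ratio = math.exp(log_contrast)
    return (ratio - 1) * low, (ratio - 1) * high


log_contrasts = {"reteplase": 0.0217, "alteplase": 0.0243}

for drug, estimate in log_contrasts.items():
    low_extra, high_extra = extra_cost_range(estimate)
    print(
        f"{drug}: ratio {math.exp(estimate):.4f}, "
        f"{percent_increase(estimate):.2f}% higher, "
        f"about {low_extra:.0f} to {high_extra:.0f} dollars extra"
    )
```

Because the estimates are small, the percentage increase is close to 100 times the contrast itself, but exponentiating gives the exact ratio.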
