Applied Statistics: From Bivariate Through Multivariate Techniques

Rebecca M. Warner
University of New Hampshire

SAGE Publications
Los Angeles | London | New Delhi | Singapore
Copyright © 2008 by Sage Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

For information:

Sage Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: order@sagepub.com

Sage Publications Ltd.
1 Oliver's Yard
55 City Road
London EC1Y 1SP
United Kingdom

Sage Publications India Pvt. Ltd.
B 1/11 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

Sage Publications Asia-Pacific Pte. Ltd.
33 Pekin Street #02-01
Far East Square
Singapore 048763

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Warner, Rebecca M.
Applied statistics: from bivariate through multivariate techniques / Rebecca M. Warner.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-7619-2772-3 (cloth)
1. Social sciences - Statistical methods. 2. Psychology - Statistical methods. 3. Multivariate analysis. I. Title.
HA31.35.W37 2007
519.5'35 dc22
2006033700

This book is printed on acid-free paper.

09 10 11 10 9 8 7 6 5 4 3 2

Acquisitions Editor: Vicki Knight
Associate Editor: Sean Connelly
Editorial Assistant: Lauren Habib
Production Editor: Laureen A. Shea
Copy Editors: Linda Gray and QuADS
Typesetter: C&M Digitals (P) Ltd.
Indexer: Will Ragsdale
Cover Designer: Candice Harman
Marketing Manager: Stephanie Adams
Contents

Preface xxi
Acknowledgments xxv

Chapter 1. Review of Basic Concepts 1
1.1 Introduction 1
1.2 A Simple Example of a Research Problem 2
1.3 Discrepancies Between Real and Ideal Research Situations 2
1.4 Samples and Populations 3
1.5 Descriptive Versus Inferential Uses of Statistics 4
1.6 Levels of Measurement and Types of Variables 6
1.7 The Normal Distribution 10
1.8 Research Design 15
1.8.1 Experimental Design 16
1.8.2 Quasi-Experimental Design 19
1.8.3 Nonexperimental Research Design 19
1.8.4 Between-Subjects Versus Within-Subjects or Repeated Measures 20
1.9 Parametric Versus Nonparametric Statistics 21
1.10 Additional Implicit Assumptions 25
1.11 Selection of an Appropriate Bivariate Analysis 26
1.12 Summary 29
Comprehension Questions 37

Chapter 2. Introduction to SPSS: Basic Statistics, Sampling Error, and Confidence Intervals 41
2.1 Introduction 41
2.2 Research Example: Description of a Sample of HR Scores 43
2.3 Sample Mean (M) 48
2.4 Sum of Squared Deviations and Sample Variance (s²) 54
2.5 Degrees of Freedom (df) for a Sample Variance 55
2.6 Why Is There Variance? 57
2.7 Sample Standard Deviation (s) 58
2.8 Assessment of Location of a Single X Score Relative to a Distribution of Scores 59
2.9 A Shift in Level of Analysis: The Distribution of Values of M Across Many Samples From the Same Population 62
2.10 An Index of Amount of Sampling Error: The Standard Error of the Mean (σ_M) 63
2.11 Effect of Sample Size (N) on the Magnitude of the Standard Error (σ_M) 64
2.12 Sample Estimate of the Standard Error of the Mean (SE_M) 67
2.13 The Family of t Distributions 70
2.14 Confidence Intervals 71
2.14.1 The General Form of a CI 71
2.14.2 Setting Up a CI for M When σ Is Known 71
2.14.3 Setting Up a CI for M When the Value of σ Is Not Known 73
2.14.4 Reporting CIs 74
2.15 Summary 75
Appendix on SPSS 76
Comprehension Questions 77

Chapter 3. Statistical Significance Testing 81
3.1 The Logic of Null Hypothesis Significance Testing (NHST) 81
3.2 Type I Versus Type II Error 84
3.3 Formal NHST Procedures: The z Test for a Null Hypothesis About One Population Mean 85
3.3.1 Obtaining a Random Sample From the Population of Interest 86
3.3.2 Formulating a Null Hypothesis (H₀) for the One-Sample z Test 86
3.3.3 Formulating an Alternative Hypothesis (H₁) 87
3.3.4 Choosing a Nominal Alpha Level 89
3.3.5 Determining the Range of z Scores Used to Reject H₀ 89
3.3.6 Determining the Range of Values of M Used to Reject H₀ 90
3.3.7 Reporting an "Exact" p Value 92
3.4 Common Research Practices Inconsistent With Assumptions and Rules for NHST 94
3.4.1 Use of Convenience Samples 95
3.4.2 Modification of Decision Rules After the Initial Decision 95
3.4.3 Conducting Large Numbers of Significance Tests 96
3.4.4 Impact of Violations of Assumptions on Risk of Type I Error 96
3.5 Strategies to Limit Risk of Type I Error 97
3.5.1 Use of Random and Representative Samples 97
3.5.2 Adherence to the Rules for NHST 97
3.5.3 Limit the Number of Significance Tests 97
3.5.4 Bonferroni-Corrected Per-Comparison Alpha Levels 98
3.5.5 Replication of Outcome in New Samples 98
3.5.6 Cross-Validation 99
3.6 Interpretation of Results 100
3.6.1 Interpretation of Null Results 100
3.6.2 Interpretation of Statistically Significant Results 101
3.7 When Is a t Test Used Instead of a z Test? 102
3.8 Effect Size 103
3.8.1 Evaluation of "Practical" (vs. Statistical) Significance 103
3.8.2 Formal Effect Size Index: Cohen's d 104
3.9 Statistical Power Analysis 106
3.10 Numerical Results for a One-Sample t Test Obtained From SPSS 115
3.11 Guidelines for Reporting Results 118
3.12 Summary 119
3.12.1 Logical Problems With NHST 119
3.12.2 Other Applications of the t Ratio 120
3.12.3 What Does It Mean to Say "p < .05"? 122
Comprehension Questions 123

Chapter 4. Preliminary Data Screening 125
4.1 Introduction: Problems in Real Data 125
4.2 Quality Control During Data Collection 126
4.3 Example of an SPSS Data Worksheet 126
4.4 Identification of Errors and Inconsistencies 132
4.5 Missing Values 133
4.6 Empirical Example of Data Screening for Individual Variables 135
4.6.1 Frequency Distribution Tables 135
4.6.2 Removal of Impossible or Extreme Scores 137
4.6.3 Bar Chart for a Categorical Variable 140
4.6.4 Histogram for a Quantitative Variable 141
4.7 Identification and Handling of Outliers 152
4.8 Screening Data for Bivariate Analyses 156
4.8.1 Bivariate Data Screening for Two Categorical Variables 156
4.8.2 Bivariate Data Screening for One Categorical and One Quantitative Variable 160
4.8.3 Bivariate Data Screening for Two Quantitative Variables 162
4.9 Nonlinear Relations 166
4.10 Data Transformations 169
4.11 Verifying That Remedies Had the Desired Effects 172
4.12 Multivariate Data Screening 173
4.13 Reporting Preliminary Data Screening 173
4.14 Summary and Checklist for Data Screening 176
Comprehension Questions 179

Chapter 5. Comparing Group Means Using the Independent Samples t Test 181
5.1 Research Situations Where the Independent Samples t Test Is Used 181
5.2 A Hypothetical Research Example 182
5.3 Assumptions About the Distribution of Scores on the Quantitative Dependent Variable 185
5.3.1 Quantitative, Approximately Normally Distributed 185
5.3.2 Equal Variances of Scores Across Groups (the Homogeneity of Variance Assumption) 185
5.3.3 Independent Observations Both Between and Within Groups 186
5.3.4 Robustness to Violations of Assumptions 186
5.4 Preliminary Data Screening 188
5.5 Issues in Designing a Study 191
5.6 Formulas for the Independent Samples t Test 191
5.6.1 The Pooled Variances t Test 193
5.6.2 Computation of the Separate Variances t Test and Its Adjusted df 195
5.6.3 Evaluation of Statistical Significance of a t Ratio 195
5.6.4 Confidence Interval Around M₁ - M₂ 197
5.7 Conceptual Basis: Factors That Affect the Size of the t Ratio 197
5.7.1 Design Decisions That Affect the Difference Between Group Means, M₁ - M₂ 198
5.7.2 Design Decisions That Affect Pooled Within-Group Variance 199
5.7.3 Design Decisions About Sample Sizes, n₁ and n₂ 200
5.7.4 Summary: Factors That Influence the Size of t 200
5.8 Effect Size Indexes for t 201
5.8.1 Eta Squared (η²) 201
5.8.2 Cohen's d 202
5.8.3 Point Biserial r (r_pb) 202
5.9 Statistical Power and Decisions About Sample Size for the Independent Samples t Test 203
5.10 Describing the Nature of the Outcome 205
5.11 SPSS Output and Model Results Section 206
5.12 Summary 209
Comprehension Questions 211

Chapter 6. One-Way Between-Subjects Analysis of Variance 215
6.1 Research Situations Where One-Way Between-Subjects Analysis of Variance (ANOVA) Is Used 215
6.2 Hypothetical Research Example 217
6.3 Assumptions About Scores on the Dependent Variable for One-Way Between-S ANOVA 217
6.4 Issues in Planning a Study 218
6.5 Data Screening 220
6.6 Partition of Scores Into Components 221
6.7 Computations for the One-Way Between-S ANOVA 225
6.7.1 Comparison Between the Independent Samples t Test and One-Way Between-S ANOVA 225
6.7.2 Summarizing Information About Distances Between Group Means: Computing MS_between 227
6.7.3 Summarizing Information About Variability of Scores Within Groups: Computing MS_within 228
6.7.4 The F Ratio: Comparing MS_between With MS_within 230
6.7.5 Patterns of Scores Related to the Magnitudes of MS_between and MS_within 231
6.7.6 Expected Value of F When H₀ Is True 233
6.7.7 Confidence Intervals (CIs) for Group Means 234
6.8 Effect-Size Index for One-Way Between-S ANOVA 234
6.9 Statistical Power Analysis for One-Way Between-S ANOVA 235
6.10 Nature of Differences Among Group Means 236
6.10.1 Planned Contrasts 236
6.10.2 Post Hoc or "Protected" Tests 239
6.11 SPSS Output and Model Results 241
6.12 Summary 248
Comprehension Questions 251

Chapter 7. Bivariate Pearson Correlation 255
7.1 Research Situations Where Pearson r Is Used 255
7.2 Hypothetical Research Example 260
7.3 Assumptions for Pearson r 261
7.4 Preliminary Data Screening 264
7.5 Design Issues in Planning Correlation Research 269
7.6 Computation of Pearson r 269
7.7 Statistical Significance Tests for Pearson r 271
7.7.1 Testing the Hypothesis That ρ_XY = 0 271
7.7.2 Testing Other Hypotheses About ρ_XY 273
7.7.3 Assessing Differences Between Correlations 275
7.7.4 Reporting Many Correlations: Need to Control Inflated Risk of Type I Error 277
7.7.4.1 Limiting the Number of Correlations 277
7.7.4.2 Cross-Validation of Correlations 278
7.7.4.3 Bonferroni Procedure: A More Conservative Alpha Level for Tests of Individual Correlations 278
7.8 Setting Up CIs for Correlations 278
7.9 Factors That Influence the Magnitude and Sign of Pearson r 279
7.9.1 Pattern of Data Points in the X, Y Scatter Plot 279
7.9.2 Biased Sample Selection: Restricted Range or Extreme Groups 281
7.9.3 Correlations for Samples That Combine Groups 284
7.9.4 Control of Extraneous Variables 284
7.9.5 Disproportionate Influence by Bivariate Outliers 285
7.9.6 Shapes of Distributions of X and Y 287
7.9.7 Curvilinear Relations 290
7.9.8 Transformations of Data 290
7.9.9 Attenuation of Correlation Due to Unreliability of Measurement 291
7.9.10 Part-Whole Correlations 292
7.9.11 Aggregated Data 292
7.10 Pearson r and r² as Effect Size Indexes 292
7.11 Statistical Power and Sample Size for Correlation Studies 294
7.12 Interpretation of Outcomes for Pearson r 295
7.12.1 "Correlation Does Not Necessarily Imply Causation" (So What Does It Imply?) 295
7.12.2 Interpretation of Significant Pearson r Values 296
7.12.3 Interpretation of a Nonsignificant Pearson r Value 297
7.13 SPSS Output and Model Results Write-Up 297
7.14 Summary 304
Comprehension Questions 305

Chapter 8. Alternative Correlation Coefficients 309
8.1 Correlations for Different Types of Variables 309
8.2 Two Research Examples 312
8.3 Correlations for Rank or Ordinal Scores 317
8.4 Correlations for True Dichotomies 318
8.4.1 Point Biserial r (r_pb) 319
8.4.2 Phi Coefficient (φ) 321
8.5 Correlations for Artificially Dichotomized Variables 323
8.5.1 Biserial r (r_b) 323
8.5.2 Tetrachoric r (r_tet) 324
8.6 Assumptions and Data Screening for Dichotomous Variables 324
8.7 Analysis of Data: Dog Ownership and Survival After a Heart Attack 325
8.8 Chi-Square Test of Association (Computational Methods for Tables of Any Size) 329
8.9 Other Measures of Association for Contingency Tables 329
8.10 SPSS Output and Model Results Write-Up 330
8.11 Summary 334
Comprehension Questions 335

Chapter 9. Bivariate Regression 338
9.1 Research Situations Where Bivariate Regression Is Used 338
9.2 A Research Example: Prediction of Salary From Years of Job Experience 340
9.3 Assumptions and Data Screening 342
9.4 Issues in Planning a Bivariate Regression Study 342
9.5 Formulas for Bivariate Regression 344
9.6 Statistical Significance Tests for Bivariate Regression 347
9.7 Setting Up Confidence Intervals Around Regression Coefficients 350
9.8 Factors That Influence the Magnitude and Sign of b 351
9.8.1 Factors That Affect the Size of the b Coefficient 352
9.8.2 Comparison of Coefficients for Different Predictors or for Different Groups 352
9.9 Effect Size/Partition of Variance in Bivariate Regression 353
9.10 Statistical Power 356
9.11 Raw Score Versus Standard Score Versions of the Regression Equation 356
9.12 Removing the Influence of X From the Y Variable by Looking at Residuals From Bivariate Regression 357
9.13 Empirical Example Using SPSS 358
9.13.1 Information to Report From a Bivariate Regression 365
9.14 Summary 369
Comprehension Questions 374

Chapter 10. Adding a Third Variable: Preliminary Exploratory Analyses 378
10.1 Three-Variable Research Situations 378
10.2 First Research Example 380
10.3 Exploratory Statistical Analyses for Three-Variable Research Situations 381
10.4 Separate Analysis of the X₁, Y Relationship for Each Level of the Control Variable X₂ 382
10.5 Partial Correlation Between X₁ and Y, Controlling for X₂ 387
10.6 Understanding Partial Correlation as the Use of Bivariate Regression to Remove Variance Predictable by X₂ From Both X₁ and Y 389
10.7 Computation of Partial r From Bivariate Pearson Correlations 390
10.8 Intuitive Approach to Understanding Partial r 394
10.9 Significance Tests, Confidence Intervals, and Statistical Power for Partial Correlations 395
10.9.1 Statistical Significance of Partial r 395
10.9.2 Confidence Intervals for Partial r 395
10.9.3 Effect Size, Statistical Power, and Sample Size Guidelines for Partial r 395
10.10 Interpretation of Various Outcomes for r_Y1.2 and r_Y1 396
10.11 Two-Variable Causal Models 399
10.12 Three-Variable Models: Some Possible Patterns of Association Among X₁, Y, and X₂ 401
10.12.1 X₁ and Y Are Not Related Whether You Control for X₂ or Not 402
10.12.2 X₂ Is Irrelevant to the X₁, Y Relationship 403
10.12.3 When You Control for X₂, the X₁, Y Correlation Drops to 0 or Close to 0 403
10.12.3.1 Completely Spurious Correlation 404
10.12.3.2 Completely Mediated Association Between X₁ and Y 405
10.12.4 When You Control for X₂, the Correlation Between X₁ and Y Becomes Smaller (but Does Not Drop to 0 and Does Not Change Sign) 407
10.12.4.1 X₂ Partly Accounts for the X₁, Y Association, or X₁ and X₂ Are Correlated Predictors of Y 407
10.12.4.2 X₂ Partly Mediates the X₁, Y Relationship 408
10.12.5 When You Control for X₂, the X₁, Y Correlation Becomes Larger Than r_1Y or Becomes Opposite in Sign Relative to r_1Y 409
10.12.5.1 Suppression of Error Variance in a Predictor Variable 410
10.12.5.2 A Second Type of Suppression 413
10.12.6 "None of the Above" 414
10.13 Mediation Versus Moderation 415
10.13.1 Preliminary Analysis to Identify Possible Moderation 415
10.13.2 Preliminary Analysis to Detect Possible Mediation 417
10.13.3 Experimental Tests for Mediation Models 417
10.14 Model Results 418
10.15 Summary 419
Comprehension Questions 421

Chapter 11. Multiple Regression With Two Predictor Variables 423
11.1 Research Situations Involving Regression With Two Predictor Variables 423
11.2 Hypothetical Research Example 425
11.3 Graphic Representation of Regression Plane 426
11.4 Semipartial (or "Part") Correlation 427
11.5 Graphic Representation of Partition of Variance in Regression With Two Predictors 428
11.6 Assumptions for Regression With Two Predictors 432
11.7 Formulas for Regression Coefficients, Significance Tests, and Confidence Intervals 435
11.7.1 Formulas for Standard Score Beta Coefficients 435
11.7.2 Formulas for Raw Score (b) Coefficients 437
11.7.3 Formula for Multiple R and Multiple R² 438
11.7.4 Test of Significance for Overall Regression: Overall F Test for H₀: R = 0 438
11.7.5 Test of Significance for Each Individual Predictor: t Test for H₀: bᵢ = 0 439
11.7.6 Confidence Interval for Each b Slope Coefficient 439
11.8 SPSS Regression Results 440
11.9 Conceptual Basis: Factors That Affect the Magnitude and Sign of β and b Coefficients in Multiple Regression With Two Predictors 441
11.10 Tracing Rules for Causal Model Path Diagrams 445
11.11 Comparison of Equations for β, b, pr, and sr 447
11.12 Nature of Predictive Relationships 448
11.13 Effect Size Information in Regression With Two Predictors 449
11.13.1 Effect Size for Overall Model 449
11.13.2 Effect Size for Individual Predictor Variables 449
11.14 Statistical Power 450
11.15 Issues in Planning a Study 451
11.15.1 Sample Size 451
11.15.2 Selection of Predictor Variables 451
11.15.3 Multicollinearity Among Predictors 452
11.15.4 Range of Scores 453
11.16 Use of Regression With Two Predictors to Test Mediated Causal Models 453
11.17 Results 456
11.18 Summary 458
Comprehension Questions 462

Chapter 12. Dummy Predictor Variables and Interaction Terms in Multiple Regression 465
12.1 Research Situations Where Dummy Predictor Variables Can Be Used 465
12.2 Empirical Example 467
12.3 Screening for Violations of Assumptions 470
12.4 Issues in Planning a Study 472
12.5 Parameter Estimates and Significance Tests for Regressions With Dummy Variables 473
12.6 Group Mean Comparisons Using One-Way Between-S ANOVA 474
12.6.1 Gender Differences in Mean Salary 474
12.6.2 College Differences in Mean Salary 475
12.7 Three Methods of Coding for Dummy Variables 478
12.7.1 Regression With Dummy-Coded Dummy Predictor Variables 478
12.7.1.1 Two-Group Example With a Dummy-Coded Dummy Variable 478
12.7.1.2 Multiple-Group Example With Dummy-Coded Dummy Variables 481