Understanding Regression Analysis
Michael Patrick Allen
Washington State University
Pullman, Washington

Plenum Press
New York and London
Library of Congress Cataloging-in-Publication Data

Allen, Michael Patrick.
Understanding regression analysis / Michael Patrick Allen.
p. cm.
Includes bibliographical references and index.
ISBN 0-306-45648-6
1. Regression analysis. I. Title.
QA278.2.A434 1997
519.5'36--dc21
97-20373
CIP

© 1997 Plenum Press, New York
A Division of Plenum Publishing Corporation
233 Spring Street, New York, N.Y. 10013

http://www.plenum.com

All rights reserved

10 9 8 7 6 5 4 3 2 1

No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission from the Publisher.

Printed in the United States of America
Preface

This book is intended to introduce readers who know relatively little about statistics and who have only a basic understanding of mathematics to one of the most powerful techniques in all of statistics: regression analysis. Regression analysis is one of the most widely used techniques in statistics because it can answer many different analytical questions. Although regression analysis is used widely, it is not always used correctly. Unfortunately, many researchers use regression analysis without understanding either the basic logic or the statistical intricacies of this technique. It is hardly surprising, then, that regression analysis is often misused and its results misinterpreted.

This introduction to regression analysis requires only a rudimentary understanding of mathematics and statistics. Indeed, it requires only that the reader have some familiarity with basic algebra and some understanding of probability. This book proceeds on the assumption that it is possible to understand much of regression analysis without fully comprehending all of the mathematical proofs and statistical theory that underpin this technique. Although it assumes that the reader has only a basic background in mathematics, this book does attempt to introduce the reader to many of the fundamental conventions and operations of matrix algebra. A familiarity with matrix algebra not only renders regression analysis more comprehensible but also enables the reader to understand more advanced multivariate statistical techniques.

The logic of this book is relatively straightforward. It reviews descriptive statistics using vector notation and presents the components of the simple regression model. Only then does it discuss the logic of sampling distributions and simple hypothesis testing. Next, it presents the basic operations of matrix algebra and develops the properties of the multiple regression model.
It then goes on to examine the logic of testing compound hypotheses and the application of the regression model to the analysis of variance and the analysis of covariance. Finally, it discusses a series of more advanced and specialized topics in regression analysis, such as structural equation models and influence statistics.

This book seeks to provide the reader with an intuitive understanding of regression analysis. Consequently, it defers any discussion of statistical inference until the reader has gained some grasp of the purely descriptive properties of the regression model. Indeed, many of the more complicated issues associated with statistical inference are discussed in only the most general terms. Mathematical derivations of some of the most important and most accessible properties of the regression model are presented in the appendices. Moreover, this book omits many of the equations typically found in discussions of regression analysis. One of the advantages of the regression model is that there are many mathematical relationships among the various quantities associated with it. These mathematical properties may be useful to the trained statistician, but they only serve to confuse the average reader.

Finally, this book demonstrates the properties of the regression model using empirical examples that comprise only a few cases. As the reader will eventually discover, statistical analysis is not very useful in extremely small samples. However, examples comprising only a few cases enable the reader to follow more closely the calculations required to compute the various statistical measures.

Any competent statistician is likely to find this introduction to regression analysis overly simplistic. Indeed, many topics and issues that are near and dear to the hearts of statisticians, such as the power of statistical tests, have simply been ignored in this book. The omission of these issues from this presentation of regression analysis is not an attempt to belittle their importance.
However, most of these issues are not especially relevant to the reader who wishes to gain no more than a basic understanding of regression analysis. Moreover, it is hoped that this book will give the reader a firm understanding of the basic logic of regression analysis and that this understanding will provide the intellectual foundation for pursuing more advanced issues and topics in statistics.

I am indebted to Scott Long, Tom Rotolo, Jean Stockard, and Lisa Barnett for their comments and suggestions on earlier drafts of this manuscript. They are not responsible, of course, for any remaining errors or omissions. Last but not least, I must acknowledge the contributions of the many students whose responses to my lectures on regression analysis over the years gave shape to this book. The empirical data used to demonstrate path analysis were taken from Socioeconomic Background and Achievement by Otis Duncan, David Featherman, and Beverly Duncan (New York: Seminar Press, 1972).
Contents

1 The Origins and Uses of Regression Analysis
2 Basic Matrix Algebra: Manipulating Vectors
3 The Mean and Variance of a Variable
4 Regression Models and Linear Functions
5 Errors of Prediction and Least-Squares Estimation
6 Least-Squares Regression and Covariance
7 Covariance and Linear Independence
8 Separating Explained and Error Variance
9 Transforming Variables to Standard Form
10 Regression Analysis with Standardized Variables
11 Populations, Samples, and Sampling Distributions
12 Sampling Distributions and Test Statistics
13 Testing Hypotheses Using the t Test
14 The t Test for the Simple Regression Coefficient
15 More Matrix Algebra: Manipulating Matrices
16 The Multiple Regression Model
17 Normal Equations and Partial Regression Coefficients
18 Partial Regression and Residualized Variables
19 The Coefficient of Determination in Multiple Regression
20 Standard Errors of Partial Regression Coefficients
21 The Incremental Contributions of Variables
22 Testing Simple Hypotheses Using the F Test
23 Testing Compound Hypotheses Using the F Test
24 Testing Hypotheses in Nested Regression Models
25 Testing for Interaction in Multiple Regression
26 Nonlinear Relationships and Variable Transformations
27 Regression Analysis with Dummy Variables
28 One-Way Analysis of Variance Using the Regression Model
29 Two-Way Analysis of Variance Using the Regression Model
30 Testing for Interaction in Analysis of Variance
31 Analysis of Covariance Using the Regression Model
32 Interpreting Interaction in Analysis of Covariance
33 Structural Equation Models and Path Analysis
34 Computing Direct and Total Effects of Variables
35 Model Specification in Regression Analysis
36 Influential Cases in Regression Analysis
37 The Problem of Multicollinearity
38 Assumptions of Ordinary Least-Squares Estimation
39 Beyond Ordinary Regression Analysis
Appendix A: Derivation of the Mean and Variance of a Linear Function
Appendix B: Derivation of the Least-Squares Regression Coefficient
Appendix C: Derivation of the Standard Error of the Simple Regression Coefficient
Appendix D: Derivation of the Normal Equations
Appendix E: Statistical Tables
Suggested Readings
Index