A Handbook of Statistical Analyses using SAS
SECOND EDITION

Geoff Der
Statistician
MRC Social and Public Health Sciences Unit
University of Glasgow
Glasgow, Scotland

and

Brian S. Everitt
Professor of Statistics in Behavioural Science
Institute of Psychiatry
University of London
London, U.K.

CHAPMAN & HALL/CRC
Boca Raton London New York Washington, D.C.
Contents

1 A Brief Introduction to SAS 1
  1.1 Introduction 1
  1.2 The Microsoft Windows User Interface 2
    1.2.1 The Editor Window 3
    1.2.2 The Log and Output Windows 4
    1.2.3 Other Menus 4
  1.3 The SAS Language 5
    1.3.1 All SAS Statements Must End with a Semicolon 6
    1.3.2 Program Steps 6
    1.3.3 Variable Names and Data Set Names 7
    1.3.4 Variable Lists 7
  1.4 The Data Step 11
    1.4.1 Creating SAS Data Sets from Raw Data 11
    1.4.2 The Data Statement 12
    1.4.3 The Infile Statement 12
    1.4.4 The Input Statement 13
    1.4.5 Reading Data from an Existing SAS Data Set 17
    1.4.6 Storing SAS Data Sets on Disk 17
  1.5 Modifying SAS Data 18
    1.5.1 Creating and Modifying Variables 18
    1.5.2 Deleting Variables 21
    1.5.3 Deleting Observations 21
    1.5.4 Subsetting Data Sets 22
    1.5.5 Concatenating and Merging Data Sets 22
    1.5.6 Merging Data Sets: Adding Variables 23
    1.5.7 The Operation of the Data Step 24
  1.6 The proc Step 25
    1.6.1 The proc Statement 25
    1.6.2 The var Statement 25
    1.6.3 The where Statement 25
    1.6.4 The by Statement 26
    1.6.5 The class Statement 26
  1.7 Global Statements 26
  1.8 ODS: The Output Delivery System 28
  1.9 SAS Graphics 28
    1.9.1 Proc gplot 28
    1.9.2 Overlaid Graphs 31
    1.9.3 Viewing and Printing Graphics 31
  1.10 Some Tips for Preventing and Correcting Errors 32

2 Data Description and Simple Inference: Mortality and Water Hardness in the U.K. 35
  2.1 Description of Data 35
  2.2 Methods of Analysis 36
  2.3 Analysis Using SAS 36
  Exercises 55

3 Simple Inference for Categorical Data: From Sandflies to Organic Particulates in the Air 57
  3.1 Description of Data 57
  3.2 Methods of Analysis 60
  3.3 Analysis Using SAS 61
    3.3.1 Cross-Classifying Raw Data 61
    3.3.2 Sandflies 63
    3.3.3 Acacia Ants 66
    3.3.4 Piston Rings 68
    3.3.5 Oral Contraceptives 70
    3.3.6 Oral Cancers 72
    3.3.7 Particulates and Bronchitis 75
  Exercises 78

4 Multiple Regression: Determinants of Crime Rate in the United States 79
  4.1 Description of Data 79
  4.2 The Multiple Regression Model 81
  4.3 Analysis Using SAS 83
  Exercises 99

5 Analysis of Variance I: Treating Hypertension 101
  5.1 Description of Data 101
  5.2 Analysis of Variance Model 102
  5.3 Analysis Using SAS 103
  Exercises 116

6 Analysis of Variance II: School Attendance Amongst Australian Children 117
  6.1 Description of Data 117
  6.2 Analysis of Variance Model 119
    6.2.1 Type I Sums of Squares 120
    6.2.2 Type III Sums of Squares 120
  6.3 Analysis Using SAS 122
  Exercises 130

7 Analysis of Variance of Repeated Measures: Visual Acuity 131
  7.1 Description of Data 131
  7.2 Repeated Measures Data 131
  7.3 Analysis of Variance for Repeated Measures Designs 133
  7.4 Analysis Using SAS 134
  Exercises 142

8 Logistic Regression: Psychiatric Screening, Plasma Proteins, and Danish Do-It-Yourself 143
  8.1 Description of Data 143
  8.2 The Logistic Regression Model 146
  8.3 Analysis Using SAS 147
    8.3.1 GHQ Data 147
    8.3.2 ESR and Plasma Levels 153
    8.3.3 Danish Do-It-Yourself 158
  Exercises 164

9 Generalised Linear Models: School Attendance Amongst Australian School Children 165
  9.1 Description of Data 165
  9.2 Generalised Linear Models 165
    9.2.1 Model Selection and Measure of Fit 168
  9.3 Analysis Using SAS 169
  Exercises 176

10 Longitudinal Data I: The Treatment of Postnatal Depression 179
  10.1 Description of Data 179
  10.2 The Analyses of Longitudinal Data 181
  10.3 Analysis Using SAS 181
    10.3.1 Graphical Displays 184
    10.3.2 Response Feature Analysis 188
  Exercises 195

11 Longitudinal Data II: The Treatment of Alzheimer's Disease 197
  11.1 Description of Data 197
  11.2 Random Effects Models 199
  11.3 Analysis Using SAS 201
  Exercises 212

12 Survival Analysis: Gastric Cancer and Methadone Treatment of Heroin Addicts 213
  12.1 Description of Data 213
  12.2 Describing Survival and Cox's Regression Model 218
    12.2.1 Survival Function 218
    12.2.2 Hazard Function 219
    12.2.3 Cox's Regression 220
  12.3 Analysis Using SAS 222
    12.3.1 Gastric Cancer 222
    12.3.2 Methadone Treatment of Heroin Addicts 229
  Exercises 235

13 Principal Components Analysis and Factor Analysis: The Olympic Decathlon and Statements about Pain 237
  13.1 Description of Data 237
  13.2 Principal Components and Factor Analyses 239
    13.2.1 Principal Components Analysis 239
    13.2.2 Factor Analysis 241
    13.2.3 Factor Analysis and Principal Components Compared 242
  13.3 Analysis Using SAS 243
    13.3.1 Olympic Decathlon 243
    13.3.2 Statements about Pain 252
  Exercises 261

14 Cluster Analysis: Air Pollution in the U.S.A. 263
  14.1 Description of Data 263
  14.2 Cluster Analysis 265
  14.3 Analysis Using SAS 266
  Exercises 284

15 Discriminant Function Analysis: Classifying Tibetan Skulls 287
  15.1 Description of Data 287
  15.2 Discriminant Function Analysis 289
  15.3 Analysis Using SAS 291
  Exercises 304

16 Correspondence Analysis: Smoking and Motherhood, Sex and the Single Girl, and European Stereotypes 305
  16.1 Description of Data 305
  16.2 Displaying Contingency Table Data Graphically Using Correspondence Analysis 307
  16.3 Analysis Using SAS 310
    16.3.1 Boyfriends 310
    16.3.2 Smoking and Motherhood 315
    16.3.3 Are the Germans Really Arrogant? 319
  Exercises 325

Appendix A: SAS Macro to Produce Scatterplot Matrices 327
Appendix B: Answers to Selected Chapter Exercises 331
References 347
Index 351