Regression Analysis II
Lee D. Walker
University of South Carolina
e-mail: walker23@gwm.sc.edu

COURSE OVERVIEW

This course focuses on the theory, practice, and application of linear regression. As Agresti and Finlay argue of social science generally, the goal of political science research is to understand, explain, and make inferences about social phenomena. The goal of this course is to provide students with the intermediate-level tools needed to design and implement studies using regression analysis, to read and examine literature that uses regression analysis, and to pursue advanced methods in quantitative political analysis. This course assumes a basic understanding of statistics, probability, and bivariate regression. Nevertheless, the course begins with a review of basic statistics and the bivariate regression model. We then study multiple regression in depth, covering both the theory and the practice of its major aspects. Specifically, we will discuss basic statistical and probability distributions, the bivariate regression model, the multiple regression model, model building, regression diagnostics, what to do when ordinary least squares (OLS) regression assumptions are violated, linear alternatives to the OLS regression model, and two generalized linear model alternatives to the OLS model that do not assume linearity.

I come to this course from a comparative politics background and take an applied approach to statistical methods. I attempt to minimize mathematics in favor of application and interpretation. That being said, some mathematics is unavoidable. I seek to make this class as accessible as possible for students who solely want a firm foundation in statistical application and interpretation. At the same time, I hope that the course will encourage students to seek instruction in more advanced statistical methodologies and approaches.

We will use Agresti and Finlay's Statistical Methods for the Social Sciences as the primary text. I also use five Sage monographs that are invaluable as secondary texts for the course and for your personal libraries. From a computing standpoint, you may complete assignments in any reliable statistical software package. In addition, we will read several articles throughout the course. A few of these articles are useful applications, while others are methodological innovations. These articles are listed at the end of the syllabus.

Literature

Main Text:
Agresti, Alan and Barbara Finlay. (1997). Statistical Methods for the Social Sciences. 3rd Edition. Upper Saddle River, NJ: Prentice Hall, Inc.

Secondary Texts:
Berry, William. (1993). Understanding Regression Assumptions. London: Sage.
Fox, John. (1991). Regression Diagnostics: An Introduction. London: Sage.
Gill, Jeff. (2000). Generalized Linear Models: A Unified Approach. London: Sage.
Hardy, Melissa A. (1993). Regression with Dummy Variables. London: Sage.
Namboodiri, K. (1984). Matrix Algebra: An Introduction. London: Sage.

Optional but important and good:
Fox, John. (2002). An R and S-PLUS Companion to Applied Regression. Thousand Oaks: Sage Publications.
Kennedy, P. (2003). A Guide to Econometrics. Fifth Edition. Cambridge: MIT Press.

Assignments: You will be asked to complete five homework assignments during the course, roughly one assignment each week. These assignments emphasize application and interpretation. Generally, you will replicate findings from a previous analysis and extend the analysis based on the specifications of the assignment. These assignments will involve the use of a statistical package. From time to time, you may have smaller assignments, which are designed to aid your understanding of regression concepts or computing operations.

My Availability: I am at your disposal. Please do not hesitate to attend my office hours or make an appointment to meet with me. I am excited about working with you and welcome the interaction. My teaching assistant is also available to assist you with substantive, statistical, or computing questions.

COURSE SCHEDULE

Class 1 and Prior Information: Introduction, Sampling, Descriptive Statistics, and Probability Distributions
1. Introduction to Class
2. Description and Inference
3. Sampling
4. Measures of Central Tendency
5. Measures of Variation
6. Population Parameters
7. Probability Distributions for Discrete and Continuous Variables
8. Theoretical Probability Distributions
9. Sampling Distributions
10. Population, Sample, and Sampling Distributions
11. Readings
   a. Agresti and Finlay (Chapters 1, 2, 3)
   b. King, Keohane, and Verba (Chapter 1)
   c. King (1986)

Statistical Inference
1. Point Estimation
2. Confidence Interval for a Mean
3. Confidence Interval for a Proportion
4. Choice of Sample Size
5. Confidence Interval for a Median
6. Introduction to STATA and R: Data Manipulation in STATA and R
7. Readings
   a. Agresti and Finlay (Chapters 4 and 5)
   b. Fox (2002: 85-106)
   c. Fox (2002: 34-84) (Reading and Manipulating Data in R)

Hypothesis Testing
1. Decisions and Errors in Tests of Hypotheses
2. Small-Sample Inference for Mean and Proportion
3. Test of Independence
4. Association in 2x2 Tables
5. Proportional Reduction of Error
6. Hypothesis Testing in STATA and R
7. Readings
   a. Agresti and Finlay (Chapters 6 and 7)
   b. Gill (1999)
   c. Fox (2002: 85-106)
   d. Optional: Fox (2002: 34-84) (Reading and Manipulating Data in R)
   e. Handout: Hanushek and Jackson (1978)

Pre-Assignment 1: Probability and Statistics

Class 2: The Bivariate Linear Regression Model
Regression Coefficients, R-square and Correlation
1. Least Squares Prediction Equation
2. Linear Regression Model
3. Measuring Linear Association
4. Inference for Slope and Correlation
5. Test of Independence and Confidence Intervals
6. Coefficient of Determination R-square
7. Readings
   a. Agresti and Finlay (Chapter 9)
   b. Fox (2002: 18-34)

Class 3: The Bivariate Linear Regression Model 2
1. Model Assumptions and Violations
2. Extrapolation
3. Outliers
4. Residuals
5. Data Transformation and Graphical Presentations
6. Readings
   a. Berry: 3-12
   b. Fox (1991: 46-48)
   c. Fox (2002: 106-117)

Pre-Assignment 1 due
Assignment 2: Bivariate Regression

Class 4: Multiple Linear Regression: Correlation and Multiple Regression
1. Multiple Regression Model
2. Multiple Correlation and R-Squared
3. Inference for Multiple Regression Coefficients
4. Modeling Interaction
5. Readings
   a. Agresti and Finlay: Chapter 11
   b. Berry: 13-22
   c. Brambor, Clark, and Golder (2005)

Class 5: Multiple Linear Regression 2
1. Comparing Regression Models: The F-Test
2. Partial Correlation-Partial Effects
3. Inference for Partial Correlations
4. Standardized Regression Coefficient
5. Problems with Standardized Regression Coefficients
6. Readings
   a. Agresti and Finlay (Chapter 11)

Class 6: Model Building: Interactions and Dummy Variables
1. Comparing Means and Regression Lines
2. Analysis of Covariance Models
3. Inference for Analysis of Covariance Model
4. Comparing Regression Models: The F-Test
5. Readings
   a. Hardy (1-21)
   b. Agresti and Finlay (Chapter 13)

Assignment 2 due
Assignment 3: Multiple Linear Regression

Class 7: Model Building: More on Categorical Independent Variables
1. Testing Hypotheses with Categorical Independent Variables
2. Nominal Categorical Independent Variables
3. Ordinal Categorical Independent Variables
4. Comparing Models: F-Test
5. Treating Categorical Variables as Interval
6. Readings
   a. Hardy: 21-29, 48-53, 78-85
   b. Fox: 61-65
   c. McDaniel (1996)

Class 8: Polynomial Regression as a Solution to Nonlinear Relationships
1. The Linearity Assumption
2. Detection of Nonlinear Relationships
3. Scatter Diagrams and Residual and Partial Residual Plots
4. Quadratic Regression
5. Readings
   a. Bollen and Jackman (1985)
   b. Agresti and Finlay (1996): Chapter 14, 543-550
   c. Fox (1991): 53-61

Assignment 3 due
Assignment 4: Bollen/Jackman replication

Class 9: Multiple Regression Diagnostics and Model Building
Residual Analysis, Multicollinearity, Heteroscedasticity, and Influential Observations
1. Model Selection Procedures
2. Omitted Variable Bias
3. Missing Data
4. Measurement Error
5. Collinearity and Multicollinearity in Models
6. Readings
   a. Berry: 24-27
   b. Fox (1991: 10-21)
   c. King, Keohane, and Verba: Chapter 5
   d. Agresti and Finlay (Chapter 14: pages 527-542)

Class 10: Regression Diagnostics and Social Science Questions
1. Measures of Influence: Cook's Distance, DFFITS, and DFBETAS
2. Measure of Leverage: Hat Value
3. Measures of Distance: Residuals, Standardized Residuals, Studentized Residuals
4. Use of Regression Diagnostics as Explanatory Statistics
5. Readings
   a. Fox (1991: 21-34)
   b. Berry (6-11)
   c. Agresti and Finlay (Chapter 14: pages 527-542)
   d. Bollen and Jackman (1985)

Class 11: Multiple Regression Using Matrices
1. Expressing Linear Equations in Matrix Form
2. The Linear Model
3. OLS Assumptions in Matrix Form
4. OLS Using Matrices
5. Residuals
6. Variance-Covariance Matrices
7. Readings
   a. Namboodiri (1984): 7-55
   b. Fox (80-82 and 83-85)
   c. Bekker and Wansbeek (1996)

Assignment 4 due
Assignment 5: Georgia Violent Crime Replication

Class 12: Heteroscedasticity and Autocorrelation Detection and Weighted Least Squares
1. Effects of Non-constant Errors on Estimation and Inference
2. Detection of Non-constant Errors
3. The Omega Matrix
4. Breusch-Pagan/Cook-Weisberg Test for Heteroscedasticity
5. White's Test for Heteroscedasticity
6. Weighted Least Squares and Generalized Least Squares as Solutions to Heteroscedasticity
7. Readings
   a. Berry: 67-80
   b. Gill (2001: 42-44)
   c. Fox: 49-52
   d. Lewis and Linzer (2005)

Class 13: Non-Normally Distributed Errors and Robust Estimation
1. The Effects of Non-normal Errors on Estimation and Inference
2. Detection of Non-normal Errors
3. qq-normal Plots
4. Huber M-Estimator
5. Readings
   a. Chatterjee and Wiseman (1983)

Class 14: The Linear Model, Discrete Data, and Limited Dependent Variables
1. Consequences of Violating the Continuous Variable Assumption
2. Why and When to Choose the Linear Model
3. Strengths of Alternative Approaches to the Discrete Outcome Variable
4. OLS versus WLS versus Proportional Odds
5. Readings
   a. Fujimoto (2005)
   b. Meier et al. (1999)
   c. Fox (1991): 61-66

Class 15: The Generalized Linear Model and Analysis of Count Data: The Poisson Model
1. Comparison to OLS
2. MLE Estimation
3. The Poisson Regression Model
4. Interpreting Poisson Coefficients
5. Model Checking
6. Wald and Likelihood Ratio Tests
7. Residuals and Overdispersion
8. Example: Nonresponse Count Data
9. Readings
   a. Gill (2000): 1-7, 9-32, 39-49
   b. King (1988)
   c. Long (1997) Chapter 8: Count Outcomes

Assignment 5 due

Class 16: Modeling Highly Skewed Data: The Gamma Regression Model
1. Gamma Distribution
2. Exponential Family and Exponential Regression
3. Gamma Link Function
4. Gamma Regression Model
5. Interpreting Gamma Regression Coefficient Estimates
6. Model Checking
7. Wald and Likelihood Ratio Tests
8. Residuals and Overdispersion
9. Readings
   a. Gill (2000): 1-7, 9-32, 49-51, 69-70
   b. Agresti and Finlay (1996): 550-555, 556-561

Class 17: Wrap Up, Extensions, and a Quick Look at Logistic Regression
1. Dichotomous Choice Model
2. Logistic Regression Model
3. Interpreting Logit Coefficients
4. MLE Estimates
5. Newton-Raphson Algorithm
6. Likelihood Ratio Test
7. Readings
   a. Gill (2000): 1-7, 9-32, 39-45
   b. Agresti and Finlay: Chapter 15

Articles and Other Useful References

Agresti, Alan. (1996). An Introduction to Categorical Data Analysis. New York: John Wiley and Sons.
Beck, Nathaniel and Jonathan N. Katz. (1995). What to do (and not to do) with Time-Series Cross-Section Data. American Political Science Review 89(3): 634-647.
Bekker, Paul A. and Tom J. Wansbeek. (1996). Proxies versus Omitted Variables in Regression Analysis. Linear Algebra and its Applications 237/238: 301-312.
Bollen, Kenneth A. and Robert W. Jackman. (1985). Regression Diagnostics: An Expository Treatment of Outliers and Influential Cases. Sociological Methods and Research 13(4): 510-542.
Brambor, Thomas, William Roberts Clark, and Matt Golder. (2005). Understanding Interaction Models: Improving Empirical Analysis. Political Analysis 16(1): 63-82.
Lewis, Jeffrey B. and Drew A. Linzer. (2005). Estimating Regression Models in Which the Dependent Variable is Based on Estimates. Political Analysis 13: 345-364.
Chatterjee, Sangit and Frederick Wiseman. (1983). Use of Regression Diagnostics in Political Science. American Journal of Political Science 27(3): 601-613.
Fujimoto, Kayo. (2005). From Women's College to Work: Inter-Organizational Networks in the Japanese Female Labor Market. Social Science Research 34: 651-681.
Gibson, James L., Gregory A. Caldeira, and Vanessa A. Baird. (1998). On the Legitimacy of National High Courts. American Political Science Review 92(2): 343-358.
Gill, Jeff. (1999). The Insignificance of Null Hypothesis Significance Testing. Political Research Quarterly 52: 647-674.
King, Gary. (1986). How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science. American Journal of Political Science 30 (August): 666-687.
King, Gary. (1988). Statistical Models for Political Science Event Counts: Bias in Conventional Procedures and Evidence for the Exponential Poisson Regression Model. American Journal of Political Science 32(3): 838-863.
King, Gary, Robert O. Keohane, and Sidney Verba. (1994). Designing Social Inquiry. Princeton: Princeton University Press. Chapter 1 and Chapter 5.
Long, J. Scott. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage.
McDaniel, Timothy. (1996). Categorical Independent Variables in Ordinary Least Squares Regression.
Meier, Kenneth J., Robert D. Wrinkle, and J.L. Polinard. (1999). Equity Versus Excellence in Organizations: A Substantively Weighted Least Squares Analysis. American Review of Public Administration 29(1): 5-18.
Samuels, David J. (2000). The Gubernatorial Coattails Effect: Federalism and Congressional Elections in Brazil. Journal of Politics 62(1): 240-253.
Venables, W.N. and B.D. Ripley. (2001). Modern Applied Statistics with S-Plus. New York: Springer.