Regression Analysis II

Lee D. Walker
University of South Carolina
e-mail: walker23@gwm.sc.edu

COURSE OVERVIEW

This course focuses on the theory, practice, and application of linear regression. As Agresti and Finlay argue about the social sciences in general, the goal of political science research is to understand, explain, and make inferences about social phenomena. The goal of this course is to give students the intermediate-level tools needed for three things:
- designing and implementing studies that use regression analysis;
- reading and examining literature that uses regression analysis;
- pursuing advanced methods in quantitative political analysis.

This course assumes a basic understanding of statistics, probability, and bivariate regression. Nevertheless, the course begins with a review of basic statistics and the bivariate regression model. We then study multiple regression in depth, covering both the theory and the practice of its major aspects. Specifically, we will discuss:
- basic statistics and probability distributions;
- the bivariate regression model and the multiple regression model;
- model building and regression diagnostics;
- what to do when ordinary least squares (OLS) regression assumptions are violated;
- linear alternatives to the OLS regression model;
- two generalized linear model alternatives to the OLS model that do not assume linearity.

I come to this course from a comparative politics background and take an applied approach to statistical methods. I try to minimize mathematics in favor of application and interpretation. That said, some mathematics is unavoidable. I seek to make this class as accessible as possible for students who want only a firm foundation in statistical application and interpretation. At the same time, I hope the course will encourage students to seek instruction in more advanced statistical methodologies and approaches.

We will use Agresti and Finlay's Statistical Methods for the Social Sciences as the primary text. I also use five Sage monographs that are invaluable as secondary texts for the course and for your personal libraries. From a computing standpoint, you may complete assignments in any reliable statistical software package. In addition, we will read several articles throughout the course. Some of these articles are useful applications, while others are methodological innovations. They are listed at the end of the syllabus.

Literature

Main Text:
Agresti, Alan and Barbara Finlay. (1997). Statistical Methods for the Social Sciences. 3rd ed. Upper Saddle River, NJ: Prentice Hall.

Secondary Texts:
Berry, William. (1993). Understanding Regression Assumptions. London: Sage.
Fox, John. (1991). Regression Diagnostics: An Introduction. London: Sage.
Gill, Jeff. (2000). Generalized Linear Models: A Unified Approach. London: Sage.
Hardy, Melissa A. (1993). Regression with Dummy Variables. London: Sage.
Namboodiri, K. (1984). Matrix Algebra: An Introduction. London: Sage.

Optional but important and good:
Fox, John. (2002). An R and S-PLUS Companion to Applied Regression. Thousand Oaks: Sage Publications.
Kennedy, P. (2003). A Guide to Econometrics. 5th ed. Cambridge: MIT Press.

Assignments: You will complete five homework assignments during the course, roughly one each week. These assignments emphasize application and interpretation. Generally, you will replicate findings from a previous analysis and then extend that analysis according to the specifications of the assignment. The assignments require a statistical package. From time to time you may also have smaller assignments, designed to aid your understanding of regression concepts or computing operations.

My Availability: I am at your disposal. Please do not hesitate to attend my office hours or to make an appointment to meet with me. I am excited about working with you and welcome the interaction. My teaching assistant is also available to help with substantive, statistical, or computing questions.

COURSE SCHEDULE

Class 1 and Prior Information: Introduction, Sampling, Descriptive Statistics, and Probability Distributions
  1. Introduction to Class
  2. Description and Inference
  3. Sampling
  4. Measures of Central Tendency
  5. Measures of Variation
  6. Population Parameters
  7. Probability Distributions for Discrete and Continuous Variables
  8. Theoretical Probability Distributions
  9. Sampling Distributions
  10. Population, Sample, and Sampling Distributions
  11. Readings
     a. Agresti and Finlay (Chapters 1, 2, and 3)
     b. King, Keohane, and Verba (Chapter 1)
     c. King (1986)

Statistical Inference
  1. Point Estimation
  2. Confidence Interval for a Mean
  3. Confidence Interval for a Proportion
  4. Choice of Sample Size
  5. Confidence Interval for a Median
  6. Introduction to Stata and R: Data Manipulation in Stata and R
  7. Readings
     a. Agresti and Finlay (Chapters 4 and 5)
     b. Fox (2002: 85-106)
     c. Fox (2002: 34-84) (Reading and Manipulating Data in R)

Hypothesis Testing
  1. Decisions and Errors in Tests of Hypotheses
  2. Small-Sample Inference for Means and Proportions
  3. Tests of Independence
  4. Association in 2x2 Tables
  5. Proportional Reduction of Error
  6. Hypothesis Testing in Stata and R
  7. Readings
     a. Agresti and Finlay (Chapters 6 and 7)
     b. Gill (1999)
     c. Fox (2002: 85-106)
     d. Optional: Fox (2002: 34-84) (Reading and Manipulating Data in R)
     e. Handout: Hanushek and Jackson (1978)

Pre-Assignment 1: Probability and Statistics

Class 2: The Bivariate Linear Regression Model: Regression Coefficients, R-Square, and Correlation
  1. Least Squares Prediction Equation
  2. Linear Regression Model
  3. Measuring Linear Association
  4. Inference for Slope and Correlation
  5. Test of Independence and Confidence Intervals
  6. Coefficient of Determination (R-Square)
  7. Readings
     a. Agresti and Finlay (Chapter 9)
     b. Fox (2002: 18-34)

Class 3: The Bivariate Linear Regression Model, Part 2
  1. Model Assumptions and Violations
  2. Extrapolation
  3. Outliers
  4. Residuals
  5. Data Transformation and Graphical Presentations
  6. Readings
     a. Berry (pages 3-12)
     b. Fox (1991: 46-48)
     c. Fox (2002: 106-117)

Pre-Assignment 1 due; Assignment 2: Bivariate Regression

Class 4: Multiple Linear Regression: Correlation and Multiple Regression
  1. Multiple Regression Model
  2. Multiple Correlation and R-Squared
  3. Inference for Multiple Regression Coefficients
  4. Modeling Interaction
  5. Readings
     a. Agresti and Finlay (Chapter 11)
     b. Berry (13-22)
     c. Brambor, Clark, and Golder (2005)

Class 5: Multiple Linear Regression, Part 2
  1. Comparing Regression Models: The F-Test
  2. Partial Correlation and Partial Effects
  3. Inference for Partial Correlations
  4. Standardized Regression Coefficients
  5. Problems with Standardized Regression Coefficients
  6. Readings
     a. Agresti and Finlay (Chapter 11)

Class 6: Model Building: Interactions and Dummy Variables
  1. Comparing Means and Regression Lines
  2. Analysis of Covariance Models
  3. Inference for Analysis of Covariance Models
  4. Comparing Regression Models: The F-Test
  5. Readings
     a. Hardy (1-21)
     b. Agresti and Finlay (Chapter 13)

Assignment 2 due; Assignment 3: Multiple Linear Regression

Class 7: Model Building: More on Categorical Independent Variables
  1. Testing Hypotheses with Categorical Independent Variables
  2. Nominal Categorical Independent Variables
  3. Ordinal Categorical Independent Variables
  4. Comparing Models: The F-Test
  5. Treating Categorical Variables as Interval
  6. Readings
     a. Hardy (21-29, 48-53, 78-85)
     b. Fox (61-65)
     c. McDaniel (1996)

Class 8: Polynomial Regression as a Solution to Nonlinear Relationships
  1. The Linearity Assumption
  2. Detection of Nonlinear Relationships
  3. Scatter Diagrams and Residual and Partial Residual Plots
  4. Quadratic Regression
  5. Readings
     a. Bollen and Jackman (1985)
     b. Agresti and Finlay (1996): Chapter 14, pages 543-550
     c. Fox (1991): pages 53-61

Assignment 3 due; Assignment 4: Bollen/Jackman Replication

Class 9: Multiple Regression Diagnostics and Model Building: Residual Analysis, Multicollinearity, Heteroscedasticity, and Influential Observations
  1. Model Selection Procedures
  2. Omitted Variable Bias
  3. Missing Data
  4. Measurement Error
  5. Collinearity and Multicollinearity in Models
  6. Readings
     a. Berry (24-27)
     b. Fox (1991: 10-21)
     c. King, Keohane, and Verba (Chapter 5)
     d. Agresti and Finlay (Chapter 14: pages 527-542)

Class 10: Regression Diagnostics and Social Science Questions
  1. Measures of Influence: Cook's Distance, DFFITS, and DFBETAS
  2. Measure of Leverage: Hat Value
  3. Measures of Distance: Residuals, Standardized Residuals, and Studentized Residuals
  4. Use of Regression Diagnostics as Explanatory Statistics
  5. Readings
     a. Fox (1991: 21-34)
     b. Berry (6-11)
     c. Agresti and Finlay (Chapter 14: pages 527-542)
     d. Bollen and Jackman (1985)

Class 11: Multiple Regression Using Matrices
  1. Expressing Linear Equations in Matrix Form
  2. The Linear Model
  3. OLS Assumptions in Matrix Form
  4. OLS Using Matrices
  5. Residuals
  6. Variance-Covariance Matrices
  7. Readings
     a. Namboodiri (1984: 7-55)
     b. Fox (80-82 and 83-85)
     c. Bekker and Wansbeek (1996)

Assignment 4 due; Assignment 5: Georgia Violent Crime Replication

Class 12: Heteroscedasticity and Autocorrelation: Detection and Weighted Least Squares
  1. Effects of Non-constant Error Variance on Estimation and Inference
  2. Detection of Non-constant Error Variance
  3. The Omega Matrix
  4. Breusch-Pagan/Cook-Weisberg Test for Heteroscedasticity
  5. White's Test for Heteroscedasticity
  6. Weighted Least Squares and Generalized Least Squares as Solutions to Heteroscedasticity
  7. Readings
     a. Berry (67-80)
     b. Gill (2001: 42-44)
     c. Fox (49-52)
     d. Lewis and Linzer (2005)

Class 13: Non-Normally Distributed Errors and Robust Estimation
  1. The Effects of Non-normal Errors on Estimation and Inference
  2. Detection of Non-normal Errors
  3. Q-Q Normal Plots
  4. The Huber M-Estimator
  5. Readings
     a. Chatterjee and Wiseman (1983)

Class 14: The Linear Model with Discrete Data and Limited Dependent Variables
  1. Consequences of Violating the Continuous-Variable Assumption
  2. Why and When to Choose the Linear Model
  3. Strengths of Alternative Approaches to Discrete Outcome Variables
  4. OLS versus WLS versus Proportional Odds
  5. Readings
     a. Fujimoto (2005)
     b. Meier et al. (1999)
     c. Fox (1991): pages 61-66

Class 15: The Generalized Linear Model and Analysis of Count Data: The Poisson Model
  1. Comparison to OLS
  2. Maximum Likelihood Estimation (MLE)
  3. The Poisson Regression Model
  4. Interpreting Poisson Coefficients
  5. Model Checking
  6. Wald and Likelihood Ratio Tests
  7. Residuals and Overdispersion
  8. Example: Nonresponse Count Data
  9. Readings
     a. Gill (2000): pages 1-7, 9-32, 39-49
     b. King (1988)
     c. Long (1997): Chapter 8, Count Outcomes

Assignment 5 due

Class 16: Modeling Highly Skewed Data: The Gamma Regression Model
  1. The Gamma Distribution
  2. The Exponential Family and Exponential Regression
  3. The Gamma Link Function
  4. The Gamma Regression Model
  5. Interpreting Gamma Regression Coefficient Estimates
  6. Model Checking
  7. Wald and Likelihood Ratio Tests
  8. Residuals and Overdispersion
  9. Readings
     a. Gill (2000): pages 1-7, 9-32, 49-51, 69-70
     b. Agresti and Finlay (1996): pages 550-555, 556-561

Class 17: Wrap-Up, Extensions, and a Quick Look at Logistic Regression
  1. The Dichotomous Choice Model
  2. The Logistic Regression Model
  3. Interpreting Logit Coefficients
  4. MLE Estimates
  5. The Newton-Raphson Algorithm
  6. The Likelihood Ratio Test
  7. Readings
     a. Gill (2000): pages 1-7, 9-32, 39-45
     b. Agresti and Finlay (Chapter 15)
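Class 17 lists maximum likelihood estimation, the Newton-Raphson algorithm, and the likelihood ratio test for the logit model. The sketch below is an illustration only and is not part of the original course materials. It is written in plain Python under three assumptions: the outcome is coded 0/1, each row of the design matrix starts with a constant 1, and the data show no complete separation (under separation the MLE does not exist). The function names (`fit_logit`, `lr_test`, and so on) and the toy data are hypothetical. For the assignments, the logit routines in Stata or R would normally be used instead.

```python
import math


def sigmoid(z):
    """Logistic CDF: maps the linear predictor x'b to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a x = b for a small k x k system (Gaussian elimination, partial pivoting)."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        partial = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - partial) / m[r][r]
    return x


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of the logit model at coefficients beta."""
    ll = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll


def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson MLE for a logit model (hypothetical helper, starts at b = 0)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k                     # gradient X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # information X'WX with W = p(1 - p)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1 - p)
            for j in range(k):
                score[j] += (yi - p) * xi[j]
                for c in range(k):
                    info[j][c] += w * xi[j] * xi[c]
        step = solve(info, score)             # Newton step (X'WX)^{-1} X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def lr_test(X, y):
    """LR statistic 2(ll_full - ll_null) against an intercept-only model (assumes 0 < mean(y) < 1)."""
    beta = fit_logit(X, y)
    ybar = sum(y) / len(y)
    ll_null = sum(yi * math.log(ybar) + (1 - yi) * math.log(1 - ybar) for yi in y)
    return beta, 2.0 * (log_likelihood(beta, X, y) - ll_null)


if __name__ == "__main__":
    x = list(range(1, 11))
    y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]  # hypothetical toy data
    X = [[1.0, float(v)] for v in x]     # constant plus one predictor
    beta, lr = lr_test(X, y)
    print("intercept, slope:", beta)
    print("LR statistic (1 df; 5% critical value 3.84):", lr)
```

Because the logit link is canonical, this Newton-Raphson update coincides with Fisher scoring, also known as iteratively reweighted least squares. The likelihood ratio statistic has degrees of freedom equal to the number of slope coefficients, so with one predictor it is compared with a chi-square critical value of 3.84 at the 5% level.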

Articles and Other Useful References

Agresti, Alan. (1996). An Introduction to Categorical Data Analysis. New York: John Wiley and Sons.
Beck, Nathaniel and Jonathan N. Katz. (1995). What to Do (and Not to Do) with Time-Series Cross-Section Data. American Political Science Review 89(3): 634-647.
Bekker, Paul A. and Tom J. Wansbeek. (1996). Proxies versus Omitted Variables in Regression Analysis. Linear Algebra and Its Applications 237/238: 301-312.
Bollen, Kenneth A. and Robert W. Jackman. (1985). Regression Diagnostics: An Expository Treatment of Outliers and Influential Cases. Sociological Methods and Research 13(4): 510-542.
Brambor, Thomas, William Roberts Clark, and Matt Golder. (2005). Understanding Interaction Models: Improving Empirical Analyses. Political Analysis 14(1): 63-82.
Chatterjee, Sangit and Frederick Wiseman. (1983). Use of Regression Diagnostics in Political Science. American Journal of Political Science 27(3): 601-613.
Fujimoto, Kayo. (2005). From Women's College to Work: Inter-Organizational Networks in the Japanese Female Labor Market. Social Science Research 34: 651-681.
Gibson, James L., Gregory A. Caldeira, and Vanessa A. Baird. (1998). On the Legitimacy of National High Courts. American Political Science Review 92(2): 343-358.
Gill, Jeff. (1999). The Insignificance of Null Hypothesis Significance Testing. Political Research Quarterly 52: 647-674.
King, Gary. (1986). How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science. American Journal of Political Science 30(3): 666-687.
King, Gary. (1988). Statistical Models for Political Science Event Counts: Bias in Conventional Procedures and Evidence for the Exponential Poisson Regression Model. American Journal of Political Science 32(3): 838-863.
King, Gary, Robert O. Keohane, and Sidney Verba. (1994). Designing Social Inquiry. Princeton: Princeton University Press. Chapters 1 and 5.
Lewis, Jeffrey B. and Drew A. Linzer. (2005). Estimating Regression Models in Which the Dependent Variable Is Based on Estimates. Political Analysis 13: 345-364.
Long, J. Scott. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage.
McDaniel, Timothy. (1996). Categorical Independent Variables in Ordinary Least Squares Regression.
Meier, Kenneth J., Robert D. Wrinkle, and J. L. Polinard. (1999). Equity Versus Excellence in Organizations: A Substantively Weighted Least Squares Analysis. American Review of Public Administration 29(1): 5-18.
Samuels, David J. (2000). The Gubernatorial Coattails Effect: Federalism and Congressional Elections in Brazil. Journal of Politics 62(1): 240-253.
Venables, W. N. and B. D. Ripley. (2001). Modern Applied Statistics with S-Plus. New York: Springer.