Multiple Linear Regression (Dummy Variable Treatment) CIVL 7012/8012

Similar documents
Simple Linear Regression the model, estimation and testing

Political Science 15, Winter 2014 Final Review

Multiple Regression with Qualitative Information ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

Introduction to Econometrics

Marno Verbeek Erasmus University, the Netherlands. Cons. Pros

Carrying out an Empirical Project

EWMBA 296 (Fall 2015)

Class 7 Everything is Related

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Business Statistics Probability

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Chapter 14: More Powerful Statistical Methods

full file at

bivariate analysis: The statistical analysis of the relationship between two variables.

PRINCIPLES OF STATISTICS

INTRODUCTION TO ECONOMETRICS (EC212)

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

11/24/2017. Do not imply a cause-and-effect relationship

MULTIPLE REGRESSION OF CPS DATA

POL 242Y Final Test (Take Home) Name

Multiple Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Ordinary Least Squares Regression

Problem Set 3 ECN Econometrics Professor Oscar Jorda. Name. ESSAY. Write your answer in the space provided.

Methods for Addressing Selection Bias in Observational Studies

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

EC352 Econometric Methods: Week 07

Lecture II: Difference in Difference and Regression Discontinuity

POLS 5377 Scope & Method of Political Science. Correlation within SPSS. Key Questions: How to compute and interpret the following measures in SPSS

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale.

1 Simple and Multiple Linear Regression Assumptions

Unit 1 Exploring and Understanding Data

Lecture II: Difference in Difference. Causality is difficult to Show from cross

Problem Set 5 ECN 140 Econometrics Professor Oscar Jorda. DUE: June 6, Name

Statistical Reasoning in Public Health 2009 Biostatistics 612, Homework #2

August 29, Introduction and Overview

The Steps for Research process THE RESEARCH PROCESS: THEORETICAL FRAMEWORK AND HYPOTHESIS DEVELOPMENT. Theoretical Framework

Still important ideas

Assessing Studies Based on Multiple Regression. Chapter 7. Michael Ash CPPA

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.

Chapter 01 What Is Statistics?

Cancer survivorship and labor market attachments: Evidence from MEPS data

MEA DISCUSSION PAPERS

Those Who Tan and Those Who Don t: A Natural Experiment of Employment Discrimination

Understandable Statistics

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Folland et al Chapter 4

9 research designs likely for PSYC 2100

CHAPTER 3 RESEARCH METHODOLOGY

Correlation Ex.: Ex.: Causation: Ex.: Ex.: Ex.: Ex.: Randomized trials Treatment group Control group

Prentice Hall. A Survey of Mathematics with Applications, 7th Edition Mississippi Mathematics Framework 2007 Revised,

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

Lecture (chapter 12): Bivariate association for nominal- and ordinal-level variables

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations)

STATISTICS & PROBABILITY

Chapter 2 Organizing and Summarizing Data. Chapter 3 Numerically Summarizing Data. Chapter 4 Describing the Relation between Two Variables

Review and Wrap-up! ESP 178 Applied Research Methods Calvin Thigpen 3/14/17 Adapted from presentation by Prof. Susan Handy

Final Exam - section 2. Thursday, December hours, 30 minutes

UNIVERSITY OF THE FREE STATE DEPARTMENT OF COMPUTER SCIENCE AND INFORMATICS CSIS6813 MODULE TEST 2

IAPT: Regression. Regression analyses

HOW STATISTICS IMPACT PHARMACY PRACTICE?

Psychology Research Process

Constructing a Bivariate Table:

MTH 225: Introductory Statistics

Limited dependent variable regression models

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1

Pitfalls in Linear Regression Analysis

Economics 345 Applied Econometrics

7 Statistical Issues that Researchers Shouldn t Worry (So Much) About

Announcement. Homework #2 due next Friday at 5pm. Midterm is in 2 weeks. It will cover everything through the end of next week (week 5).

While correlation analysis helps

How to Model Mediating and Moderating Effects. Hsueh-Sheng Wu CFDR Workshop Series Summer 2012

First Hourly Quiz. SW 430: Research Methods in Social Work I

Seid M. Zekavat, Loyola Marymount University

Firming Up Inequality

3.2A Least-Squares Regression

Still important ideas

Psychology Research Process

Firming Up Inequality

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Simple Linear Regression One Categorical Independent Variable with Several Categories

Wesleyan Economics Working Papers

Chapter 11 Nonexperimental Quantitative Research Steps in Nonexperimental Research

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H

Student Performance Q&A:

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Introduction to Econometrics

AP Statistics Practice Test Ch. 3 and Previous

Importance of factors contributing to work-related stress: comparison of four metrics

6. Unusual and Influential Data

STATISTICS AND RESEARCH DESIGN

Relationships. Between Measurements Variables. Chapter 10. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.

BOOTSTRAPPING CONFIDENCE LEVELS FOR HYPOTHESES ABOUT REGRESSION MODELS

Chapter 2--Norms and Basic Statistics for Testing

On the purpose of testing:

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

Causation. Victor I. Piercey. October 28, 2009

Transcription:

Multiple Linear Regression (Dummy Variable Treatment) CIVL 7012/8012

2 In Today s Class Recap Single dummy variable Multiple dummy variables Ordinal dummy variables Dummy-dummy interaction Dummy-continuous/discrete interaction Binary dependent variables

Introducing Dummy Independent Variable Qualitative Information Examples: gender, race, industry, region, rating grade, A way to incorporate qualitative information is to use dummy variables They may appear as the dependent or as independent variables A single dummy independent variable = the wage gain/loss if the person is a woman rather than a man (holding other things fixed) Dummy variable: =1 if the person is a woman =0 if the person is man

Illustrative Example Graphical Illustration Alternative interpretation of coefficient: i.e. the difference in mean wage between men and women with the same level of education. Intercept shift

Specification of Dummy Variables Dummy variable trap This model cannot be estimated (perfect collinearity) When using dummy variables, one category always has to be omitted: The base category are men The base category are women Alternatively, one could omit the intercept: Disadvantages: 1) More difficult to test for differences between the parameters 2) R-squared formula only valid if regression contains intercept

Interpretation of Dummy Variables Estimated wage equation with intercept shift Holding education, experience, and tenure fixed, women earn 1.81$ less per hour than men Does that mean that women are discriminated against? Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.

Model with only dummy variables- (Example-1) Comparing means of subpopulations described by dummies Discussion Not holding other factors constant, women earn 2.51$ per hour less than men, i.e. the difference between the mean wage of men and that of women is 2.51$. It can easily be tested whether difference in means is significant The wage difference between men and women is larger if no other things are controlled for; i.e. part of the difference is due to differences in education, experience and tenure between men and women

Model with only dummy variables- (Example-2) Further example: Effects of training grants on hours of training Hours training per employee Dummy indicating whether firm received training grant This is an example of program evaluation Treatment group (= grant receivers) vs. control group (= no grant) Is the effect of treatment on the outcome of interest causal?

Dependent log(y) and Dummy Independent Using dummy explanatory variables in equations for log(y) Dummy indicating whether house is of colonial style As the dummy for colonial style changes from 0 to 1, the house price increases by 5.4 percentage points

Dummy variables for multiple categories Using dummy variables for multiple categories 1) Define membership in each category by a dummy variable 2) Leave out one category (which becomes the base category) Holding other things fixed, married women earn 19.8% less than single men (= the base category)

Ordinal Dummy Variables Incorporating ordinal information using dummy variables Example: City credit ratings and municipal bond interest rates Municipal bond rate Credit rating from 0-4 (0=worst, 4=best) This specification would probably not be appropriate as the credit rating only contains ordinal information. A better way to incorporate this information is to define dummies: Dummies indicating whether the particular rating applies, e.g. CR 1 =1 if CR=1 and CR 1 =0 otherwise. All effects are measured in comparison to the worst rating (= base category).

Interactions among dummy variables Interactions involving dummy variables Allowing for different slopes Interaction term = intercept men = intercept women = slope men = slope women Interesting hypotheses The return to education is the same for men and women The whole wage equation is the same for men and women

Graphical illustration Interacting both the intercept and the slope with the female dummy enables one to model completely independent wage equations for men and women

Dummy-Continuous /Discrete Interaction (1) Estimated wage equation with interaction term No evidence against hypothesis that the return to education is the same for men and women Does this mean that there is no significant evidence of lower pay for women at the same levels of educ, exper, and tenure? No: this is only the effect for educ = 0. To answer the question one has to recenter the interaction term, e.g. around educ = 12.5 (= average education).

Dummy-Continuous /Discrete Interaction (2) Testing for differences in regression functions across groups Unrestricted model (contains full set of interactions) College grade point average Standardized aptitude test score High school rank percentile Restricted model (same regression for both groups) Total hours spent in college courses

Dummy-Continuous /Discrete Interaction (3) Null hypothesis All interaction effects are zero, i.e. the same regression coefficients apply to men and women Estimation of the unrestricted model Tested individually, the hypothesis that the interaction effects are zero cannot be rejected

Restricted and Unrestricted Models (with Dummy Variables) Joint test with F-statistic Null hypothesis is rejected Alternative way to compute F-statistic in the given case Run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSR of these two regressions Run regression for the restricted model and store SSR If the test is computed in this way it is called the Chow-Test Important: Test assumes a constant error variance accross groups

Binary dependent variable A Binary dependent variable: the linear probability model Linear regression when the dependent variable is binary If the dependent variable only takes on the values 1 and 0 Linear probability model (LPM) In the linear probability model, the coefficients describe the effect of the explanatory variables on the probability that y=1

Binary dependent variable:example-1 Example: Labor force participation of married women =1 if in labor force, =0 otherwise Non-wife income (in thousand dollars per year) Does not look significant (but see below) If the number of kids under six years increases by one, the pro- probability that the woman works falls by 26.2%

Binary dependent variable:example-2 Example: Female labor participation of married women (cont.) Graph for nwifeinc=50, exper=5, age=30, kindslt6=1, kidsge6=0 The maximum level of education in the sample is educ=17. For the given case, this leads to a predicted probability to be in the labor force of about 50%. Negative predicted probability but no problem because no woman in the sample has educ < 5.

Disadvantages: Linear probability model Disadvantages of the linear probability model Predicted probabilities may be larger than one or smaller than zero Marginal probability effects sometimes logically impossible The linear probability model is necessarily heteroskedastic Heterosceasticity consistent standard errors need to be computed Advantanges of the linear probability model Easy estimation and interpretation Variance of Bernoulli variable Estimated effects and predictions often reasonably good in practice

Treatment and Control Groups More on policy analysis and program evaluation Example: Effect of job training grants on worker productivity Percentage of defective items =1 if firm received training grant, =0 otherwise No apparent effect of grant on productivity Treatment group: grant reveivers, Control group: firms that received no grant Grants were given on a first-come, first-served basis. This is not the same as giving them out randomly. It might be the case that firms with less productive workers saw an opportunity to improve productivity and applied first.

Self Selection and Endogeneity Self-selection into treatment as a source for endogeneity In the given and in related examples, the treatment status is probably related to other characteristics that also influence the outcome The reason is that subjects self-select themselves into treatment depending on their individual characteristics and prospects Experimental evaluation In experiments, assignment to treatment is random In this case, causal effects can be inferred using a simple regression The dummy indicating whether or not there was treatment is unrelated to other factors affecting the outcome.

Endogenuous dummy regressor Further example of an endogenuous dummy regressor Are nonwhite customers discriminated against? Dummy indicating whether loan was approved Race dummy Credit rating It is important to control for other characteristics that may be important for loan approval (e.g. profession, unemployment) Omitting important characteristics that are correlated with the nonwhite dummy will produce spurious evidence for discriminiation