ECON Introductory Econometrics Seminar 7

ECON4150 - Introductory Econometrics
Seminar 7: Stock and Watson EE11.2
April 28, 2015

E. 11.2 b

    clear
    set more off
    clear all
    cd M:\pc\Desktop\courses\introductory_econometrics\seminar_7

    // E. 11.2 b and E. 11.3 b
    // one observation per year of education, 0 to 18
    set obs 19
    gen edu=0
    replace edu=_n-1
    // predicted probabilities from the logit, probit and linear probability models
    gen pred_log= 1/(1+exp(8.146-0.551*edu))
    gen pred_prob= normal(-4.107+0.272*edu)
    gen pred_lin= 0.035*edu-0.172
    // plot the probit and logit regression functions against education
    twoway (line pred_prob edu) (line pred_log edu)

E. 11.2 b

The shapes of the probit and logit regression functions are similar.

E. 11.3 b

In this case the linear probability model (LPM) produces nonsensical results: some predicted probabilities are smaller than 0.
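The negative predictions can be checked directly from the fitted line 0.035*edu - 0.172 used in the code for E. 11.2 b. The following is a minimal sketch that rebuilds the same education grid; the variable names follow that code, and the `yline(0)` reference line is an addition made here for illustration.

```stata
clear
set obs 19
gen edu = _n - 1                      // years of education, 0 to 18
gen pred_lin = 0.035*edu - 0.172      // LPM fitted values from E. 11.3 b

// plot the LPM line with a reference line at zero
twoway (line pred_lin edu), yline(0)

// list and count the fitted "probabilities" that fall below zero
list edu pred_lin if pred_lin < 0
count if pred_lin < 0
```

Because 0.172/0.035 is roughly 4.9, the fitted value is negative for every education level from 0 to 4 years.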

Empirical exercise E11.2: Data

Smoking.dta is a cross-sectional data set with observations on 10,000 indoor workers. The data set contains information on whether individuals were or were not subject to a workplace smoking ban, whether or not they smoked, and other individual characteristics. We are going to investigate the effect of smoking bans on smoking behavior.

Empirical exercise E11.2: Data

    Variable    Description
    smoker      =1 if current smoker, =0 otherwise
    smkban      =1 if there is a work area smoking ban, =0 otherwise
    age         age in years
    hsdrop      =1 if high school dropout, =0 otherwise
    hsgrad      =1 if high school graduate, =0 otherwise
    colsome     =1 if some college, =0 otherwise
    colgrad     =1 if college graduate, =0 otherwise
    black       =1 if black, =0 otherwise
    hispanic    =1 if Hispanic, =0 otherwise
    female      =1 if female, =0 otherwise

Empirical exercise E11.2: Data

    use "Smoking.dta",clear
    summ

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
          smoker |     10000       .2423    .4284963          0          1
          smkban |     10000       .6098    .4878194          0          1
             age |     10000     38.6932    12.11378         18         88
          hsdrop |     10000       .0912    .2879077          0          1
          hsgrad |     10000       .3266     .468993          0          1
    -------------+--------------------------------------------------------
         colsome |     10000       .2802    .4491193          0          1
         colgrad |     10000       .1972    .3979045          0          1
           black |     10000       .0769     .266446          0          1
        hispanic |     10000       .1134     .317097          0          1
          female |     10000       .5637    .4959505          0          1

a)

    // a
    // the estimated probability of smoking for all workers is the proportion of smokers in the sample
    summarize smoker

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
          smoker |     10000       .2423    .4284963          0          1

    // estimated probability for all workers affected by a smoking ban
    summarize smoker if smkban==1

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
          smoker |      6098    .2120367    .4087842          0          1

    // estimated probability for all workers not affected by a smoking ban
    summarize smoker if smkban==0

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
          smoker |      3902    .2895951    .4536326          0          1

b)

One way is to compute the difference between the means computed in part a): 0.2120 - 0.2896 = -0.0776. We can estimate a linear probability model to check whether the difference is significant:

    reg smoker smkban, robust

    Linear regression                                      Number of obs =   10000
                                                           F(  1,  9998) =   75.06
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.0078
                                                           Root MSE      =  .42684

    ------------------------------------------------------------------------------
                 |               Robust
          smoker |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          smkban |  -.0775583    .008952    -8.66   0.000    -.0951061   -.0600106
           _cons |   .2895951   .0072619    39.88   0.000     .2753604    .3038298
    ------------------------------------------------------------------------------

The coefficient on smkban equals the difference in means, and its t-statistic of -8.66 shows that the difference is statistically significant.
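Why the regression reproduces the difference in means can be seen from the conditional expectation of a binary outcome on a single binary regressor. The following is a short worked sketch using the slide's variable names; the symbols $\beta_0$ and $\beta_1$ for the intercept and slope are notation introduced here.

```latex
\begin{align*}
E(smoker \mid smkban) &= \beta_0 + \beta_1\, smkban \\
\Pr(smoker = 1 \mid smkban = 0) &= \beta_0 \\
\Pr(smoker = 1 \mid smkban = 1) &= \beta_0 + \beta_1 \\
\beta_1 &= \Pr(smoker = 1 \mid smkban = 1) - \Pr(smoker = 1 \mid smkban = 0)
\end{align*}
```

With the reported estimates, 0.2896 + (-0.0776) = 0.2120, which matches the sample proportion of smokers among workers subject to a ban in part a).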

c

    gen age2=age^2
    regress smoker smkban female age age2 hsdrop hsgrad colsome colgrad black hispanic, robust

    Linear regression                                      Number of obs =   10000
                                                           F( 10,  9989) =   68.75
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.0570
                                                           Root MSE      =  .41631

    ------------------------------------------------------------------------------
                 |               Robust
          smoker |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          smkban |  -.0472399   .0089661    -5.27   0.000    -.0648153   -.0296645
          female |  -.0332569   .0085683    -3.88   0.000    -.0500525   -.0164612
             age |   .0096744   .0018954     5.10   0.000      .005959    .0133898
            age2 |  -.0001318   .0000219    -6.02   0.000    -.0001747   -.0000889
          hsdrop |   .3227142   .0194885    16.56   0.000     .2845128    .3609156
          hsgrad |   .2327012   .0125903    18.48   0.000     .2080217    .2573807
         colsome |   .1642968   .0126248    13.01   0.000     .1395495     .189044
         colgrad |   .0447983   .0120438     3.72   0.000       .02119    .0684066
           black |  -.0275658   .0160785    -1.71   0.086    -.0590828    .0039513
        hispanic |  -.1048159   .0139748    -7.50   0.000    -.1322093   -.0774226
           _cons |  -.0141099   .0414228    -0.34   0.733    -.0953069    .0670872
    ------------------------------------------------------------------------------

c

The coefficient on smkban in the multiple regression model (-0.047) is quite a bit smaller in absolute value than in the simple regression (-0.078). This indicates that the coefficient from the simple regression suffers from omitted variable bias. The characteristics of workers in an office with a smoking ban differ from those of workers in an office without one. For example, workers with a college degree are more likely than high school dropouts to work in an office with a smoking ban, and college graduates are less likely to smoke than high school dropouts.

d

We can compute the t-statistic for the hypotheses

    $H_0: \beta_{smkban} = 0$
    $H_1: \beta_{smkban} \neq 0$

    $t = \frac{-0.0472399 - 0}{0.0089661} = -5.27$

Since $|t| = 5.27 > 1.96$, the null hypothesis that the coefficient on smkban is equal to zero is rejected at the 5% significance level.
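The same arithmetic can be reproduced in Stata without re-estimating the model. This is a minimal sketch that hard-codes the coefficient and robust standard error from the regression table in part c; the scalar names are hypothetical and chosen here only for illustration.

```stata
// Coefficient and robust standard error on smkban, copied from the output in part c
scalar b_lpm  = -.0472399
scalar se_lpm = .0089661

// t-statistic for H0: coefficient = 0
scalar t_lpm = (scalar(b_lpm) - 0)/scalar(se_lpm)
display "t = " %6.2f scalar(t_lpm)

// two-sided p-value from the t distribution with 9989 degrees of freedom
scalar p_lpm = 2*ttail(9989, abs(scalar(t_lpm)))
display "p = " %6.4f scalar(p_lpm)
```

Immediately after running the regression, the same ratio is also available as `_b[smkban]/_se[smkban]`.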

e

To test the hypothesis that the probability of smoking does not depend on education, we test the following joint null hypothesis:

    $H_0: \beta_{hsdrop} = \beta_{hsgrad} = \beta_{colsome} = \beta_{colgrad} = 0$
    $H_1$: at least one of $\beta_{hsdrop}$, $\beta_{hsgrad}$, $\beta_{colsome}$, $\beta_{colgrad}$ is not equal to 0

    test hsdrop=hsgrad=colsome=colgrad=0

     ( 1)  hsdrop - hsgrad = 0
     ( 2)  hsdrop - colsome = 0
     ( 3)  hsdrop - colgrad = 0
     ( 4)  hsdrop = 0

           F(  4,  9989) =  140.09
                Prob > F =    0.0000

The p-value is 0.0000, so the null hypothesis is rejected at the 1% significance level.
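Stata's `test` command also accepts a plain list of coefficient names, which tests that all listed coefficients are jointly zero. The following is a minimal sketch of that equivalent form, assuming Smoking.dta is in the working directory as on the earlier slides. It has not been re-run here, so the F-statistic it prints should be compared against the output above.

```stata
// Assumes Smoking.dta is in the current working directory
use "Smoking.dta", clear
gen age2 = age^2
regress smoker smkban female age age2 hsdrop hsgrad colsome colgrad black hispanic, robust

// listing the variables tests that all four education coefficients are jointly zero
test hsdrop hsgrad colsome colgrad
```

The number of restrictions and the F-statistic are stored in `r(df)` and `r(F)` after `test`.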

e

The omitted education category is a master's degree or higher. Thus the coefficients show the increase in the probability of smoking relative to someone with a master's degree or higher.

For example, the coefficient on colgrad is 0.045, so the probability of smoking for a college graduate is 0.045 (4.5 percentage points) higher than for someone with a master's degree or higher. Similarly, the coefficient on hsdrop is 0.323, so the probability of smoking for a high school dropout is 0.323 (32.3 percentage points) higher than for someone with a master's degree or higher.

Because the coefficients are all positive and get smaller as educational attainment increases, the probability of smoking falls as educational attainment increases.

f

    probit smoker smkban female age age2 hsdrop hsgrad colsome colgrad black hispanic, robust

    Iteration 0:   log pseudolikelihood = -5537.1662
    Iteration 1:   log pseudolikelihood = -5238.7464
    Iteration 2:   log pseudolikelihood =  -5235.868
    Iteration 3:   log pseudolikelihood = -5235.8679

    Probit regression                                 Number of obs   =      10000
                                                      Wald chi2(10)   =     542.93
                                                      Prob > chi2     =     0.0000
    Log pseudolikelihood = -5235.8679                 Pseudo R2       =     0.0544

    ------------------------------------------------------------------------------
                 |               Robust
          smoker |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          smkban |    -.15863   .0291099    -5.45   0.000    -.2156843   -.1015757
          female |  -.1117313    .028841    -3.87   0.000    -.1682585    -.055204
             age |   .0345114   .0068839     5.01   0.000     .0210192    .0480035
            age2 |  -.0004675   .0000826    -5.66   0.000    -.0006295   -.0003056
          hsdrop |    1.14161   .0729708    15.64   0.000     .9985902    1.284631
          hsgrad |   .8826708   .0603706    14.62   0.000     .7643467    1.000995
         colsome |   .6771192   .0614448    11.02   0.000     .5566896    .7975488
         colgrad |   .2346839   .0654163     3.59   0.000     .1064703    .3628976
           black |  -.0842789   .0534536    -1.58   0.115    -.1890461    .0204883
        hispanic |  -.3382743   .0493523    -6.85   0.000     -.435003   -.2415457
           _cons |  -1.734926   .1519802   -11.42   0.000    -2.032802   -1.437051
    ------------------------------------------------------------------------------

f

We can compute the t-statistic for the hypotheses

    $H_0: \beta_{smkban} = 0$
    $H_1: \beta_{smkban} \neq 0$

    $t = \frac{-0.15863 - 0}{0.0291099} = -5.45$

Since $|t| = 5.45 > 1.96$, the null hypothesis that the coefficient on smkban is equal to zero is rejected at the 5% significance level. The t-statistic is very similar to the one in the linear probability model.

f

    test hsdrop=hsgrad=colsome=colgrad=0

     ( 1)  [smoker]hsdrop - [smoker]hsgrad = 0
     ( 2)  [smoker]hsdrop - [smoker]colsome = 0
     ( 3)  [smoker]hsdrop - [smoker]colgrad = 0
     ( 4)  [smoker]hsdrop = 0

               chi2(  4) =  447.34
             Prob > chi2 =    0.0000

The p-value is 0.0000, so the null hypothesis is rejected at the 1% significance level.

g

    logit smoker smkban female age age2 hsdrop hsgrad colsome colgrad black hispanic, robust

    Iteration 0:   log pseudolikelihood = -5537.1662
    Iteration 1:   log pseudolikelihood = -5245.1234
    Iteration 2:   log pseudolikelihood = -5234.0374
    Iteration 3:   log pseudolikelihood = -5233.9986
    Iteration 4:   log pseudolikelihood = -5233.9986

    Logistic regression                               Number of obs   =      10000
                                                      Wald chi2(10)   =     515.53
                                                      Prob > chi2     =     0.0000
    Log pseudolikelihood = -5233.9986                 Pseudo R2       =     0.0548

    ------------------------------------------------------------------------------
                 |               Robust
          smoker |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          smkban |  -.2620287    .049477    -5.30   0.000    -.3590019   -.1650555
          female |  -.1907725   .0491763    -3.88   0.000    -.2871563   -.0943887
             age |   .0599366   .0118227     5.07   0.000     .0367645    .0831087
            age2 |  -.0008182   .0001431    -5.72   0.000    -.0010987   -.0005376
          hsdrop |   2.016853   .1340272    15.05   0.000     1.754165    2.279542
          hsgrad |   1.578504   .1157682    13.64   0.000     1.351602    1.805405
         colsome |   1.229978   .1176499    10.45   0.000     .9993881    1.460567
         colgrad |   .4465832   .1263721     3.53   0.000     .1988985    .6942679
           black |  -.1560341     .09125    -1.71   0.087    -.3348809    .0228126
        hispanic |  -.5971731   .0861865    -6.93   0.000    -.7660955   -.4282508
           _cons |  -2.999182   .2652433   -11.31   0.000     -3.51905   -2.479315
    ------------------------------------------------------------------------------

g

We can compute the t-statistic for the hypotheses

    $H_0: \beta_{smkban} = 0$
    $H_1: \beta_{smkban} \neq 0$

    $t = \frac{-0.2620287 - 0}{0.049477} = -5.30$

Since $|t| = 5.30 > 1.96$, the null hypothesis that the coefficient on smkban is equal to zero is rejected at the 5% significance level. The t-statistic is very similar to the one in the linear probability model.

g

    test hsdrop=hsgrad=colsome=colgrad=0

     ( 1)  [smoker]hsdrop - [smoker]hsgrad = 0
     ( 2)  [smoker]hsdrop - [smoker]colsome = 0
     ( 3)  [smoker]hsdrop - [smoker]colgrad = 0
     ( 4)  [smoker]hsdrop = 0

               chi2(  4) =  420.14
             Prob > chi2 =    0.0000

    log close

h, i.

Mr. A (male, white, non-Hispanic, 20 years old, high school dropout).

Probability of smoking with no ban, based on the probit results:

    $\Pr(smoker = 1 \mid smkban = 0)_A = \Phi(0.0345 \times 20 - 0.0004675 \times 20^2 + 1.142 - 1.735) = \Phi(-0.090) = 0.464$

Probability of smoking with a smoking ban, based on the probit results:

    $\Pr(smoker = 1 \mid smkban = 1)_A = \Phi(0.0345 \times 20 - 0.0004675 \times 20^2 + 1.142 - 0.159 - 1.735) = \Phi(-0.249) = 0.402$

The effect of the smoking ban on Mr. A's probability of smoking is a decrease of 6.2 percentage points.

h, ii.

Ms. B (female, black, 40 years old, college graduate).

Probability of smoking with no ban, based on the probit results:

    $\Pr(smoker = 1 \mid smkban = 0)_B = \Phi(-0.112 + 0.0345 \times 40 - 0.0004675 \times 40^2 - 0.084 + 0.235 - 1.735) = \Phi(-1.064) = 0.144$

Probability of smoking with a smoking ban, based on the probit results:

    $\Pr(smoker = 1 \mid smkban = 1)_B = \Phi(-0.112 + 0.0345 \times 40 - 0.0004675 \times 40^2 - 0.084 + 0.235 - 1.735 - 0.159) = \Phi(-1.223) = 0.111$

The effect of the smoking ban on Ms. B's probability of smoking is a decrease of 3.3 percentage points. The effect of introducing the ban is different for different individuals.
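The probit probabilities for Mr. A and Ms. B can be reproduced with Stata's `normal()` function. This is a minimal sketch that copies the unrounded coefficients from the probit output in part f; the scalar names are hypothetical and introduced here for illustration only.

```stata
// Probit coefficients copied from the output in part f
scalar b_smkban  = -.15863
scalar b_female  = -.1117313
scalar b_age     =  .0345114
scalar b_age2    = -.0004675
scalar b_hsdrop  = 1.14161
scalar b_colgrad =  .2346839
scalar b_black   = -.0842789
scalar b_cons    = -1.734926

// index values without a smoking ban
scalar zA = 20*scalar(b_age) + 20^2*scalar(b_age2) + scalar(b_hsdrop) + scalar(b_cons)
scalar zB = scalar(b_female) + 40*scalar(b_age) + 40^2*scalar(b_age2) ///
          + scalar(b_black) + scalar(b_colgrad) + scalar(b_cons)

// predicted probabilities without (0) and with (1) the ban
scalar pA0 = normal(scalar(zA))
scalar pA1 = normal(scalar(zA) + scalar(b_smkban))
scalar pB0 = normal(scalar(zB))
scalar pB1 = normal(scalar(zB) + scalar(b_smkban))

display "Mr. A: " %5.3f scalar(pA0) " -> " %5.3f scalar(pA1)
display "Ms. B: " %5.3f scalar(pB0) " -> " %5.3f scalar(pB1)
```

The same coefficient on smkban shifts both indices by the same amount, but the change in probability differs because the normal CDF is steeper near an index of zero (Mr. A) than in the tail (Ms. B).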

h, iii.

Linear probability model:

    $\Pr(smoker = 1 \mid smkban = 0)_A = 0.009674 \times 20 - 0.0001318 \times 20^2 + 0.3227 - 0.0141 = 0.449$
    $\Pr(smoker = 1 \mid smkban = 1)_A = 0.009674 \times 20 - 0.0001318 \times 20^2 + 0.3227 - 0.0472 - 0.0141 = 0.402$
    $\Pr(smoker = 1 \mid smkban = 0)_B = -0.0333 - 0.0276 + 0.0448 + 0.009674 \times 40 - 0.0001318 \times 40^2 - 0.0141 = 0.146$
    $\Pr(smoker = 1 \mid smkban = 1)_B = -0.0333 - 0.0276 + 0.0448 + 0.009674 \times 40 - 0.0001318 \times 40^2 - 0.0141 - 0.0472 = 0.099$

In the linear probability model, the marginal impact of workplace smoking bans on the probability of an individual smoking does not depend on the other characteristics of the individual. In both cases the ban reduces the probability of smoking by 0.047.

h, iv.

Logit model:

    $\Pr(smoker = 1 \mid smkban = 0)_A = \frac{1}{1 + e^{-(0.0599 \times 20 - 0.000818 \times 20^2 + 2.017 - 2.999)}} = 0.472$
    $\Pr(smoker = 1 \mid smkban = 1)_A = 0.408$

The difference is 0.064.

    $\Pr(smoker = 1 \mid smkban = 0)_B = 0.141$
    $\Pr(smoker = 1 \mid smkban = 1)_B = 0.112$

The difference is 0.029.
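The logit probabilities, including those reported above without their formulas, can be reproduced with Stata's `invlogit()` function. This is a minimal sketch using the unrounded coefficients from the logit output in part g; the scalar names are hypothetical and introduced here for illustration only.

```stata
// Logit coefficients copied from the output in part g
scalar l_smkban  = -.2620287
scalar l_female  = -.1907725
scalar l_age     =  .0599366
scalar l_age2    = -.0008182
scalar l_hsdrop  = 2.016853
scalar l_colgrad =  .4465832
scalar l_black   = -.1560341
scalar l_cons    = -2.999182

// index values without a smoking ban
scalar xA = 20*scalar(l_age) + 20^2*scalar(l_age2) + scalar(l_hsdrop) + scalar(l_cons)
scalar xB = scalar(l_female) + 40*scalar(l_age) + 40^2*scalar(l_age2) ///
          + scalar(l_black) + scalar(l_colgrad) + scalar(l_cons)

// invlogit(x) = exp(x)/(1 + exp(x))
scalar qA0 = invlogit(scalar(xA))
scalar qA1 = invlogit(scalar(xA) + scalar(l_smkban))
scalar qB0 = invlogit(scalar(xB))
scalar qB1 = invlogit(scalar(xB) + scalar(l_smkban))

display "Mr. A: " %5.3f scalar(qA0) " -> " %5.3f scalar(qA1)
display "Ms. B: " %5.3f scalar(qB0) " -> " %5.3f scalar(qB1)
```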

h, v.

The linear probability model assumes that the marginal impact of workplace smoking bans on the probability of smoking is independent of the individual's other characteristics. In the probit and logit models, the predicted marginal impact depends on individual characteristics. In this sense the probit and logit models are likely more appropriate, and they give very similar answers.

The estimated impacts can be regarded as large. For example, for people with characteristics like Mr. A's, the reduction in the probability of smoking is greater than 6 percentage points (from the probit and logit models).
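Instead of computing the predicted probabilities by hand, Stata's `margins` command can evaluate them at chosen covariate values after estimation. The following is a minimal sketch for Mr. A under the probit model, assuming Smoking.dta is in the working directory. Because age2 is a separately generated variable in these slides, its value (20^2 = 400) has to be set by hand. The block has not been re-run here, so its output should be checked against the hand calculations in part h, i.

```stata
// Assumes Smoking.dta is in the current working directory
use "Smoking.dta", clear
gen age2 = age^2
probit smoker smkban female age age2 hsdrop hsgrad colsome colgrad black hispanic, robust

// predicted probability for Mr. A without (smkban=0) and with (smkban=1) a ban
margins, at(smkban=(0 1) female=0 black=0 hispanic=0 age=20 age2=400 ///
            hsdrop=1 hsgrad=0 colsome=0 colgrad=0)
```

Replacing `probit` with `logit` or `regress` in the estimation line gives the corresponding predictions for the other two models.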