Econometrics, ECON312
San Francisco State University
Michael Bar
Fall 2011

Final Exam - section 2
Thursday, December 15
2 hours, 30 minutes

Name:

Instructions
1. This is a closed-book, closed-notes exam.
2. No calculators of any kind are allowed.
3. Show all calculations.
4. If you need more space, use the back of the page.
5. Fully label all graphs.

Good Luck

1. (35 points). Meder studies the factors affecting the individual demand for cigarettes. He collected a sample of 807 individuals, with the following variables:

cigs        number of cigarettes smoked per day
lcigpric    log of cigarette price
lincome     log of income
educ        years of schooling
restaurn    dummy variable (= 1 if smoking in restaurants is restricted in the state, 0 otherwise)
age         age in years
agesq       age squared ($agesq = age^2$)
black       dummy variable (= 1 if person is black, 0 otherwise)

Meder's Stata command and regression output are presented below.

. regress cigs lcigpric lincome educ restaurn age agesq black

      Source |       SS       df       MS              Number of obs =     807
-------------+------------------------------           F(  7,   799) =    6.38
       Model |  8029.43629     7  1147.06233           Prob > F      =  0.0000
    Residual |  143724.246   799  179.880158           R-squared     =  0.0529
-------------+------------------------------           Adj R-squared =  0.0446
       Total |  151753.683   806  188.280003           Root MSE      =  13.412

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8508953   5.782322    -0.15   0.883    -12.20123    10.49944
     lincome |   .8690151   .7287642     1.19   0.233    -.5615034    2.299534
        educ |  -.5017533   .1671677    -3.00   0.003     -.829893   -.1736135
    restaurn |  -2.865621   1.117406    -2.56   0.011    -5.059019   -.6722235
         age |   .7745021   .1605158     4.83   0.000     .4594196    1.089585
       agesq |  -.0090686   .0017481    -5.19   0.000    -.0124999   -.0056373
       black |   .5592361   1.459461     0.38   0.702    -2.305595    3.424067
       _cons |  -3.241715   24.11391    -0.13   0.893    -50.57581    44.09238
------------------------------------------------------------------------------

a. Interpret the estimated coefficient on lincome.

$b_3/100 \approx 0.0087$ means that when income goes up by 1%, the demand for cigarettes is predicted to increase by 0.0087 cigarettes per day, holding all other regressors constant.
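The summary statistics in the regression output and the interpretation in part a can be cross-checked by hand. The following is a minimal sketch in Python (not part of the original exam, which uses Stata); every number is copied from the output above, and the variable names are illustrative choices.

```python
import math

# Sums of squares and degrees of freedom copied from the Stata ANOVA block.
ss_model, df_model = 8029.43629, 7
ss_resid, df_resid = 143724.246, 799
ss_total, df_total = 151753.683, 806

ms_model = ss_model / df_model   # mean square, model
ms_resid = ss_resid / df_resid   # mean square, residual
ms_total = ss_total / df_total   # mean square, total

r_squared = ss_model / ss_total          # share of variation explained
adj_r_squared = 1 - ms_resid / ms_total  # penalizes extra regressors
f_stat = ms_model / ms_resid             # overall F(7, 799)
root_mse = math.sqrt(ms_resid)           # standard error of the regression

# Coefficient and standard error on lincome, copied from the output.
b_lincome, se_lincome = 0.8690151, 0.7287642
t_lincome = b_lincome / se_lincome

# Level-log model: a 1% rise in income changes cigs by about b/100.
effect_of_1pct_income = b_lincome / 100

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F statistic   = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.3f}")
print(f"t (lincome)   = {t_lincome:.2f}")
print(f"1% income     = {effect_of_1pct_income:.4f} cigarettes per day")
```

The recomputed values should agree with the reported R-squared, adjusted R-squared, F statistic, Root MSE and t statistic up to rounding.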

b. Based on Meder's results, what is the estimated income elasticity of demand for cigarettes for someone who smokes 1 cigarette per day, and for someone who smokes 10 cigarettes per day?

$$\hat{\eta}_{cigs,income}\Big|_{cigs=1} = \frac{b_3}{cigs} = \frac{0.87}{1} = 0.87$$

$$\hat{\eta}_{cigs,income}\Big|_{cigs=10} = \frac{b_3}{cigs} = \frac{0.87}{10} = 0.087$$

You are supposed to learn from the above that, for heavier smokers, income becomes a less important factor in their demand for cigarettes.

c. Interpret the estimated coefficient on restaurn.

In states that restrict smoking in restaurants, the demand for cigarettes is 2.87 cigarettes per day lower than in states that do not, holding all other characteristics the same.

d. Suppose that Meder wants to test whether restricting smoking in restaurants is an effective policy for reducing smoking. Write the null and alternative hypotheses for this test.

$$H_0: \beta_5 = 0$$
$$H_1: \beta_5 < 0$$
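The elasticity formula used in part b follows from the level-log form of the model. A short derivation is sketched below; the symbol $\eta$ for the elasticity is an assumption, since the exam's own symbol did not survive extraction.

```latex
\begin{align*}
cigs &= \dots + \beta_3 \ln(income) + \dots \\
\frac{\partial\, cigs}{\partial\, income} &= \frac{\beta_3}{income} \\
\eta_{cigs,income} &= \frac{\partial\, cigs}{\partial\, income} \cdot \frac{income}{cigs}
  = \frac{\beta_3}{cigs}
\end{align*}
```

Because cigs appears in the denominator, the same coefficient implies a smaller elasticity for heavier smokers.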

e. Based on the reported p-values, what is your conclusion about the test in the last section? Explain your answer.

First, the sign of the estimated coefficient is negative ($b_5 = -2.87 < 0$), which is consistent with the alternative hypothesis. Second, the reported p-value of 0.011 is for the two-sided test, $H_0: \beta_5 = 0$ vs. $H_1: \beta_5 \neq 0$, so for the one-tailed test the relevant p-value is $0.011/2 = 0.0055 < 0.05$. Therefore, we reject the null hypothesis at the 0.05 significance level and conclude that restricting smoking is effective in reducing smoking.

f. Interpret the estimated coefficient on black.

$b_8 = 0.56$ means that black individuals (African Americans) smoke 0.56 more cigarettes per day than individuals of other races, holding all other characteristics the same.

g. Suppose that you wish to test whether the number of cigarettes smoked per day is the same for black as for other races. Write the null and alternative hypotheses for this test.

$$H_0: \beta_8 = 0$$
$$H_1: \beta_8 \neq 0$$

h. Based on the reported 95% confidence intervals, what is your conclusion about the test in the last section? Explain your answer.

The 95% confidence interval for $\beta_8$ is [-2.305595, 3.424067], which contains the value 0. Therefore, we fail to reject the null hypothesis $H_0: \beta_8 = 0$ at the 5% significance level. Recall that the confidence intervals reported in Stata contain all the hypothesized values of the true parameter that are not rejected by the current sample. We conclude that there is no significant difference between the smoking of black individuals and that of other races.
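The two decision rules used in parts e and h (halving a two-sided p-value for a one-tailed test, and checking whether a confidence interval contains the hypothesized value) can be written out explicitly. This is a minimal sketch in Python; the function names are hypothetical, and the numbers are those reported for restaurn and black in the Stata output.

```python
def one_sided_p(two_sided_p, estimate, alternative):
    """Convert a two-sided p-value into a one-sided one.

    alternative is "less" (H1: beta < 0) or "greater" (H1: beta > 0).
    If the estimate has the sign predicted by H1, the one-sided p-value
    is half the two-sided one; otherwise it is 1 minus that half.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    sign_agrees = estimate < 0 if alternative == "less" else estimate > 0
    half = two_sided_p / 2
    return half if sign_agrees else 1 - half


def ci_contains(lower, upper, value=0.0):
    """True if the hypothesized value lies inside the interval."""
    return lower <= value <= upper


# restaurn: one-tailed test of H1: coefficient < 0 at the 5% level.
p_restaurn = one_sided_p(0.011, -2.865621, "less")
reject_restaurn = p_restaurn < 0.05

# black: two-sided test at the 5% level via the 95% confidence interval.
reject_black = not ci_contains(-2.305595, 3.424067, 0.0)

print(p_restaurn, reject_restaurn, reject_black)
```

The halving shortcut is only valid when the estimate has the sign stated in the alternative hypothesis, which is why the sign check comes first in the answer to part e.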

i. Suppose that you wish to test whether the effect of education on demand for cigarettes is the same for black as for other races. How would you change the original model to allow for that test?

We would need to add another regressor: an interaction term educ*black. The coefficient on that regressor gives the difference between the effect of education on smoking for black individuals and the effect of education on smoking for other races.

j. Based on Meder's results, the demand for cigarettes is increasing in age. True/False, circle the correct answer and prove mathematically.

False. The effect of age on smoking is not monotone (cigs is a quadratic function of age, increasing up to a certain age and declining afterwards):

$$\frac{\partial\, cigs}{\partial\, age} = b_6 + 2 b_7 \cdot age = 0.774 - 2 \cdot 0.009 \cdot age$$

Remark. In fact, one can find the age at which smoking is at its maximum:

$$\frac{\partial\, cigs}{\partial\, age} = b_6 + 2 b_7 \cdot age^* = 0 \quad\Rightarrow\quad age^* = -\frac{b_6}{2 b_7} = \frac{0.774}{2 \cdot 0.009} = 43$$

After the age of 43 smoking declines, perhaps because people become more mature and start feeling the negative consequences of smoking on their health.
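The turning-point calculation in part j can be checked numerically. A minimal sketch in Python, using the coefficients on age and agesq from the Stata output; the function names are hypothetical.

```python
# Coefficients on age and agesq, copied from the regression output.
b_age, b_agesq = 0.7745021, -0.0090686


def marginal_effect_of_age(age, b6=b_age, b7=b_agesq):
    """Derivative of predicted cigs with respect to age: b6 + 2*b7*age."""
    return b6 + 2 * b7 * age


def turning_point(b6=b_age, b7=b_agesq):
    """Age at which the marginal effect is zero: -b6 / (2*b7)."""
    return -b6 / (2 * b7)


age_star = turning_point()                        # full-precision coefficients
age_star_rounded = turning_point(0.774, -0.009)   # rounded values used in the answer

for age in (20, 40, 60):
    print(age, round(marginal_effect_of_age(age), 3))
print(round(age_star, 2), round(age_star_rounded, 2))
```

With the unrounded coefficients the turning point comes out slightly below 43, so the value in the answer is best read as approximate. The marginal effect is positive at age 20 and negative at age 60, which is what makes the statement in part j false.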

2. (20 points). Seung studies the factors affecting the choice of high school students to go to college. She collected data on 1000 former high school students, with the following variables:

college     dummy variable (= 1 if the student enrolled in college, 0 otherwise)
grades      average high school grade of math, English and social studies
faminc      gross annual family income (in $1000)
famsiz      number of family members
parcoll     dummy variable (= 1 if most educated parent graduated from college or had an advanced degree, 0 otherwise)
female      dummy variable (= 1 if a person is female, 0 otherwise)
black       dummy variable (= 1 if a person is black, 0 otherwise)

Seung's Stata commands and output are presented below.

. probit college grades faminc famsiz parcoll female black

Probit regression                                 Number of obs   =       1000
                                                  LR chi2(6)      =     226.42
                                                  Prob > chi2     =     0.0000
Log likelihood = -416.21967                       Pseudo R2       =     0.2138

------------------------------------------------------------------------------
     college |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      grades |   .2945521   .0274882    10.72   0.000     .2406761     .348428
      faminc |    .005393   .0018099     2.98   0.003     .0018457    .0089404
      famsiz |  -.0531059   .0374572    -1.42   0.156    -.1265207    .0203089
     parcoll |   .4765344   .1424817     3.34   0.001     .1972755    .7557933
      female |   .0237927   .1014679     0.23   0.815    -.1750806    .2226661
       black |   .6109028   .2176202     2.81   0.005      .184375    1.037431
       _cons |  -1.135516   .2250342    -5.05   0.000    -1.576574   -.6944567
------------------------------------------------------------------------------

. mfx

Marginal effects after probit
      y  = Pr(college) (predict)
         =  .84535426
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
  grades |   .0700821      .00618   11.35   0.000   .057978  .082186   6.46961
  faminc |   .0012832      .00043    3.02   0.003    .00045  .002116   51.3935
  famsiz |  -.0126354      .00889   -1.42   0.155  -.030052  .004781     4.206
parcoll* |   .1030921      .02745    3.76   0.000   .049284    .1569      .308
 female* |   .0056604      .02414    0.23   0.815  -.041661  .052981      .496
  black* |    .107392      .02648    4.06   0.000   .055491  .159293      .056
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
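The marginal effects reported by mfx can be reproduced approximately from the probit coefficients and the sample means in the X column. The sketch below is in Python rather than Stata and uses only the standard library; it assumes the standard formulas (normal density at the mean index times the coefficient for a continuous regressor, and a difference of normal CDFs for a dummy), and the helper names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Probit coefficients and sample means (X column), copied from the output.
cons = -1.135516
coef = {"grades": 0.2945521, "faminc": 0.005393, "famsiz": -0.0531059,
        "parcoll": 0.4765344, "female": 0.0237927, "black": 0.6109028}
means = {"grades": 6.46961, "faminc": 51.3935, "famsiz": 4.206,
         "parcoll": 0.308, "female": 0.496, "black": 0.056}


def index(x):
    """Linear index x'b for a dictionary of regressor values."""
    return cons + sum(coef[name] * x[name] for name in coef)


z_bar = index(means)             # index evaluated at the sample means
p_bar = std_normal.cdf(z_bar)    # predicted Pr(college) at the means


def continuous_effect(name):
    """dy/dx for a continuous regressor: pdf(index at means) * coefficient."""
    return std_normal.pdf(z_bar) * coef[name]


def dummy_effect(name):
    """Change in Pr(college) when a dummy goes from 0 to 1, others at means."""
    x1 = dict(means, **{name: 1.0})
    x0 = dict(means, **{name: 0.0})
    return std_normal.cdf(index(x1)) - std_normal.cdf(index(x0))


print(f"Pr(college) at means: {p_bar:.4f}")
for name in ("grades", "faminc", "famsiz"):
    print(f"{name:8s} dy/dx = {continuous_effect(name):.4f}")
for name in ("parcoll", "female", "black"):
    print(f"{name:8s} dy/dx = {dummy_effect(name):.4f}")
```

Small discrepancies from the mfx table are to be expected, because the means are printed with limited precision.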

a. Interpret the estimated marginal effect of grades.

A one-point increase in the average high school grade increases the probability of attending college by 0.07 (7 percentage points), with all other characteristics at their sample mean values.

b. Interpret the estimated marginal effect of black.

The probability of attending college is higher by about 0.1 (10 percentage points) for black than for non-black high school graduates, with all other characteristics at their sample mean values.

c. Between family income (faminc) and family size (famsiz), which is the more important factor in determining the chances of attending college, based on the above estimates? Explain your answer.

Family income is more important because its effect on the probability of attending college is significant (p-value = 0.003 < 0.05), while family size has an insignificant effect on the probability of attending college (p-value = 0.155 > 0.05).

d. The main reason why logit and probit models are preferred to the linear probability model is (circle the correct answer):
i. Logit and probit are easier to estimate than the linear probability model.
ii. Logit and probit models allow for calculating the marginal effects on the predicted probability of an outcome.
iii. The fitted values in the probit and logit models are always between 0 and 1.
iv. The logit and probit estimators are BLUE (Best Linear Unbiased Estimators).
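Option iii in part d concerns the range of the fitted values. A minimal Python sketch of the contrast follows, using hypothetical toy data that is not taken from the exam. The linear probability model is fitted by OLS, while the probit-style probabilities use an arbitrary illustrative index (not a fitted probit) passed through the standard normal CDF.

```python
from statistics import NormalDist

# Hypothetical toy data: one regressor x and a binary outcome y.
x = list(range(11))
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]


def ols_line(xs, ys):
    """Intercept and slope of a simple OLS regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Linear probability model: fitted values are a straight line in x.
intercept, slope = ols_line(x, y)
lpm_fitted = [intercept + slope * xi for xi in x]

# Probit-style probabilities: a linear index passed through the normal CDF.
# The index -2.5 + 0.5*x is an assumption chosen for illustration only.
std_normal = NormalDist()
probit_probs = [std_normal.cdf(-2.5 + 0.5 * xi) for xi in x]

print("LPM fitted range:   ", round(min(lpm_fitted), 3), round(max(lpm_fitted), 3))
print("Probit-style range: ", round(min(probit_probs), 3), round(max(probit_probs), 3))
```

In this toy sample the OLS line falls below 0 at the smallest x and rises above 1 at the largest, whereas any index passed through a CDF stays between 0 and 1.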

3. (15 points). Suppose that, according to the theory, the health of a nation's population is an important factor in determining the country's economic growth.

a. A researcher estimates a regression model with growth as the dependent variable, but she does not include health as one of the regressors. What are the likely consequences for the other estimates? Circle the correct answer.
i. The OLS estimators of the other coefficients are biased but consistent.
ii. The OLS estimators are unbiased but inconsistent.
iii. The OLS estimators are unbiased and consistent, but inefficient.
iv. The OLS estimators are biased and inconsistent.
v. The OLS estimators are biased and inefficient.

b. Suppose that the researcher realizes that health should be included as one of the regressors in the model, but unfortunately there is no data on the variable health. Propose a solution to this problem. Be specific.

The researcher can use a proxy for health, e.g. life expectancy.

c. In general, what is the most important source of guidance for model specification (i.e. determining which variable is dependent and which are the regressors)?

Economic (or other) theory.
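The effect of leaving out a relevant regressor, as in part a, can be illustrated with a small noise-free example. This is a minimal Python sketch on hypothetical data: the variable x stands in for any included regressor, and the coefficient values are assumptions chosen for illustration. The slope from the regression that omits health equals the true coefficient on x plus the coefficient on health times the slope from regressing health on x.

```python
def slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs (with a constant)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return sxy / sxx


# Hypothetical data: health is correlated with the included regressor x.
x = [1, 2, 3, 4, 5]
health = [2, 1, 4, 3, 6]

# Assumed true model (no noise): growth = 1 + 2*x + 3*health.
beta_x, beta_health = 2.0, 3.0
growth = [1.0 + beta_x * xi + beta_health * hi for xi, hi in zip(x, health)]

short_slope = slope(x, growth)   # regression of growth on x, omitting health
delta = slope(x, health)         # how health moves with x
implied = beta_x + beta_health * delta

print(short_slope, delta, implied)
```

The gap between the short-regression slope and the assumed true coefficient does not depend on the sample size; it vanishes only if health is uncorrelated with x or has no effect on growth.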

4. (5 points). A researcher estimates a regression model and finds that the estimated individual coefficients are not significant, but at the same time the overall fit of the model is good. What is the likely reason for her results? Circle the correct answer.
a. Multicollinearity.
b. Heteroscedasticity.
c. Omitted variable bias.
d. Serial correlation.

5. (10 points). Suppose a researcher is using time series data in regression analysis.

a. Which problem is the researcher more likely to face? Circle the correct answer.
i. Heteroscedasticity.
ii. Multicollinearity.
iii. Autocorrelation.

b. Suppose the dependent variable and some of the regressors exhibit time trends. Briefly explain what problem is likely to arise in this research project, and provide a solution to it.

The problem is spurious regression (spurious meaning false or fake): the estimated coefficients do not measure the causal effect of the regressors on the dependent variable. Instead, the model picks up the effect of the time trend on the dependent variable. The simplest solution is to include time as a regressor.
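The spurious-regression problem in question 5b, and the fix of including time as a regressor, can be illustrated with a small deterministic example. This is a minimal Python sketch on hypothetical series constructed so that x and y share a time trend but are otherwise unrelated. It relies on the Frisch-Waugh-Lovell result that the coefficient on x in a regression that also includes time equals the slope from regressing detrended y on detrended x.

```python
def slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs (with a constant)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return sxy / sxx


def detrend(series, t):
    """Residuals from an OLS regression of series on a constant and t."""
    n = len(t)
    b = slope(t, series)
    a = sum(series) / n - b * sum(t) / n
    return [s - (a + b * ti) for s, ti in zip(series, t)]


# Hypothetical data: both series trend upward, with unrelated fluctuations.
t = list(range(10))
u = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]   # fluctuation in x
v = [1, 1, -1, -1, 1, 1, -1, -1, 1, 1]    # fluctuation in y, unrelated to u
x = [1.0 * ti + ui for ti, ui in zip(t, u)]
y = [2.0 * ti + vi for ti, vi in zip(t, v)]

naive_slope = slope(x, y)                             # y on x, time omitted
partial_slope = slope(detrend(x, t), detrend(y, t))   # y on x, controlling for time

print(round(naive_slope, 3), round(partial_slope, 3))
```

The regression that ignores time reports a large slope even though x has no effect on y by construction, and once the trend is partialled out the slope is essentially zero.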