The Prevalence of HIV in Botswana


James Levinsohn, Yale University and NBER
Justin McCrary, University of California, Berkeley and NBER

January 6, 2010

We are grateful to Taryn Dinkelman and Zoe McLaren for research assistance. Levinsohn acknowledges support from NICHD.

Abstract

This paper implements five methods to correct naïve HIV seroprevalence rates for the selection bias induced by voluntary testing. We apply the methodologies to estimate the prevalence of HIV in Botswana.

I. Introduction

Household surveys increasingly include a request for a bio-specimen from some household members. Respondents sometimes refuse to provide the requested bio-specimen, raising issues of representativeness. This issue is especially relevant in the case of HIV. In several nationally representative surveys conducted to estimate HIV prevalence rates, opt-out rates of 25 to 50 percent are not uncommon. If opt-out is not random, then prevalence rates will be subject to selection bias.

In this paper, we use the 2004 Botswana AIDS Impact Survey (BAIS) to estimate prevalence rates corrected for selection. The richness of this particular data set makes it especially suitable for an analysis predicated on an assumption of selection on observables. Invoking a selection on observables assumption, we estimate a population prevalence rate based on the selected subsample of those who agreed to provide a bio-specimen.

We apply five methodologies to estimate the HIV prevalence rate, as there is currently no consensus on the best way to address selection on observables. We implement a propensity score reweighting estimator (see, for example, DiNardo, Fortin and Lemieux 1996 and Hirano, Imbens and Ridder 2003), a matching estimator (see, for example, Heckman, Ichimura, Smith and Todd 1998), a control function estimator (see, for example, Heckman and Navarro-Lozano 2004), the double robust approach (see Robins and Rotnitzky 1995), and an application of Blinder-Oaxaca (Blinder 1973 and Oaxaca 1973). This results in both a robust estimate of the corrected-for-selection HIV prevalence rate (our principal focus) and a comparison of alternative econometric methodologies (a secondary issue).

Although the methods proposed address an issue that arises regularly in the context of HIV testing, they are potentially applicable to any study in which background information on respondents is available but some respondents decline to provide data on a variable that is key to the question at hand.

We find that all our methodologies result in HIV prevalence rates that are only slightly lower than the prevalence rate that is estimated when selection bias is ignored. We then apply non-parametric graphical methods to investigate why this is so.

In the next section, we introduce the data and use a bounds test to illustrate the bias that may result from sample selection when computing HIV prevalence rates. In Section III, we explain the five approaches we implement. Section IV presents results, while Section V concludes.

II. Country Context, Data, and Bounded Prevalence Rates

Botswana is a southern African country with a population of about 1.63 million. According to UNAIDS, the overall HIV prevalence rate is 19%. For those in the years old cohort, the prevalence rate is estimated to be 38.5%, and by 2010 it is projected that 20% of all children will be orphans. [1] Few countries are as impacted by HIV/AIDS as Botswana. Only tiny Swaziland has a higher estimated overall prevalence rate.

In 2004, the second wave of the Botswana AIDS Impact Survey (BAIS-2) was collected. This survey was the first nationally representative survey in Botswana to include a test for HIV. After data cleaning, the sample included 7,669 households, comprising 27,521 individuals. The survey included detailed questions on sexual practices and knowledge of HIV/AIDS, as well as basic demographic information. Respondents 18 months and older were asked to provide an HIV test specimen. [2] In all, 13,997 respondents provided a valid specimen, while 13,524 did not. [3] With almost half of our sample opting out of the HIV test, there is substantial uncertainty about what the true HIV prevalence rate might be.
We begin by computing the bounds on the true prevalence rate, as well as the standard errors around those bounds, using the approach of Imbens and Manski. Table 1 reports the upper bound (if all those who opted out were HIV+) and the lower bound (if all those who opted out were HIV-) as well as bootstrapped standard errors for both bounds. It is also possible to construct a confidence interval for the prevalence rate that lies within these bounds. This is the Imbens-Manski confidence interval, also reported in Table 1. That interval is [ ] using a 95 percent confidence level. [4] With prevalence rates bounded by 8.1 percent to 59.6 percent, testing opt-out clearly may be quite important.

Finally, we report the prevalence rate computed for those individuals who provided a bio-sample. We refer to this as the naïve prevalence rate. That rate is percent. We turn next to our multiple methodologies for estimating just where in the Imbens-Manski confidence interval the true prevalence rate lies.

[1] See: (accessed Feb. 3, 2009).
[2] HIV testing of respondents under 18 years old required consent of a guardian.
[3] Individuals who provided an invalid bio-sample and for whom HIV status is not measured are coded as not having been tested.
[4] See equations (6) and (7) in Imbens and Manski (2004) for the derivation.
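The worst-case bounds described in Section II can be sketched with simple arithmetic. The snippet below is only an illustration: the tested and untested counts come from the text, but the number of positive tests (`n_positive`) is a hypothetical value, and the function name `worst_case_bounds` is ours. It ignores survey weights and does not compute the Imbens-Manski confidence interval.

```python
def worst_case_bounds(n_tested, n_untested, n_positive):
    """Return (naive, lower, upper) prevalence under voluntary testing.

    lower: every untested respondent is assumed HIV-negative.
    upper: every untested respondent is assumed HIV-positive.
    """
    if not 0 <= n_positive <= n_tested:
        raise ValueError("n_positive must lie between 0 and n_tested")
    n = n_tested + n_untested
    naive = n_positive / n_tested           # prevalence among the tested only
    lower = n_positive / n                  # opt-outs all negative
    upper = (n_positive + n_untested) / n   # opt-outs all positive
    return naive, lower, upper


# Counts of tested / untested respondents are from the text.
# n_positive is HYPOTHETICAL, chosen only to make the arithmetic concrete.
naive, lower, upper = worst_case_bounds(
    n_tested=13_997, n_untested=13_524, n_positive=3_000
)
print(f"naive={naive:.3f}  lower={lower:.3f}  upper={upper:.3f}")
```

Whatever the number of positives, the width of the interval equals the opt-out share, which is why an opt-out rate near one half leaves the prevalence rate so loosely bounded.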

III. Methodology

Four of our methodologies (propensity score reweighting, pair matching, control function, and double robust) are based on the propensity score. After establishing notation, each is briefly discussed. Our fifth methodology is an application of the more established Blinder-Oaxaca approach. We include this since it is simple to implement, has a long history, and hence provides a benchmark to which the propensity score-based methods can be compared.

A. Set-up

Denote an individual's characteristics by $X$. These include demographic attributes as well as information on sexual practices and knowledge about HIV/AIDS. $D$ is an indicator variable set to one if a respondent eligible for the HIV test agreed to be tested. $Y$ is an indicator variable set to one if a person tested positive for HIV, so $Y$ is only observed for respondents with $D = 1$. There are $n$ individuals in the sample. Denote the propensity score, a scalar, by $P(D = 1 \mid X) \equiv p(X)$. Hence the propensity score is the predicted probability of taking the HIV test conditional on an individual's characteristics.

The naïve prevalence rate will be correct if seropositivity is independent of whether one opted out of the HIV test. This independence is denoted by:

$$Y \perp D. \qquad (1)$$

When (1) does not hold, naïve prevalence rates may be inaccurate. For the case at hand, even the sign of the bias when (1) does not hold is unclear. For example, if HIV-negative individuals are keenly aware of how HIV is transmitted and are hesitant to provide a stranger with a blood sample, the naïve prevalence rate is likely to be over-estimated. If individuals who believe in traditional religions that forbid giving a piece of one's self to a stranger (and whose lack of knowledge about HIV may make them more likely to be HIV+) opt out, then the naïve prevalence rate is likely to be under-estimated.
Each of the five methodologies assumes that selection is on observables, and this implies a conditional independence assumption (CIA) given by:

$$Y \perp D \mid X. \qquad (2)$$

That is, individual characteristics, $X$, explain the relationship between an individual's willingness to be tested, $D$, and that individual's HIV status, $Y$, so that when one conditions on $X$, the decision to be tested is independent of HIV status. The appropriateness of the CIA in practice will depend on the available $X$'s. We return to the empirical plausibility of the CIA in the results section. We also adopt the common support assumption, that $p(X) > c$ for some $c > 0$. [5]

[5] This is sufficient to ensure the root-n consistency of the estimator of reweighted prevalence rates. See Bussos, DiNardo and McCrary for details.
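To make the distinction between (1) and (2) concrete, the following sketch uses a hypothetical population with two observable types. All shares, testing probabilities, and prevalence rates are illustrative assumptions, not estimates from the BAIS. Within each type, testing is unrelated to HIV status, so (2) holds by construction, yet (1) fails because the type that tests less often also has higher prevalence.

```python
# HYPOTHETICAL two-type population; every number here is an assumption.
# share:      fraction of the population of this type (X)
# p_test:     probability of agreeing to test, p(X)
# prevalence: P(Y = 1 | X), the same for testers and non-testers (CIA holds)
population = {
    "type_a": {"share": 0.6, "p_test": 0.8, "prevalence": 0.10},
    "type_b": {"share": 0.4, "p_test": 0.3, "prevalence": 0.40},
}


def true_prevalence(pop):
    """E[Y]: prevalence averaged over the whole population."""
    return sum(t["share"] * t["prevalence"] for t in pop.values())


def naive_prevalence(pop):
    """E[Y | D = 1]: prevalence among those who agree to test."""
    tested = sum(t["share"] * t["p_test"] for t in pop.values())
    tested_positive = sum(
        t["share"] * t["p_test"] * t["prevalence"] for t in pop.values()
    )
    return tested_positive / tested


print(f"true={true_prevalence(population):.3f}  "
      f"naive={naive_prevalence(population):.3f}")
```

In this example the naïve rate understates prevalence because the high-prevalence type is under-represented among the tested; swapping the two testing probabilities reverses the sign of the bias.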

B. Propensity Score Reweighting

The first approach to estimating the HIV prevalence rate is propensity score reweighting. Given the CIA and the support assumption, it is possible to estimate the population prevalence rate using only a weighted average of data from those who agreed to take the HIV test. The proof of this result is straightforward and clarifies the content of the support and conditional independence assumptions. We have:

$$
\begin{aligned}
E[Y] &= E\big[E[Y \mid X]\big] \\
&= E\left[E[Y \mid X]\, E[D \mid X]\, \frac{1}{p(X)}\right] \\
&= E\left[E[YD \mid X]\, \frac{1}{p(X)}\right] \\
&= E\left[E\left[YD\,\frac{1}{p(X)} \,\Big|\, X\right]\right] \\
&= E\left[YD\,\frac{1}{p(X)}\right] \\
&= E\left[YD\,\frac{1}{p(X)} \,\Big|\, D = 1\right] p + E\left[YD\,\frac{1}{p(X)} \,\Big|\, D = 0\right] (1 - p) \\
&= E\left[Y\,\frac{p}{p(X)} \,\Big|\, D = 1\right], \qquad (3)
\end{aligned}
$$

where $p \equiv P(D = 1)$. The first equality is the Law of Iterated Expectations. The second equality follows from the definition of the propensity score, $p(X)$, and makes use of the support assumption. Note that the support assumption rules out the possibility that there are some individuals whose background guarantees that they are unwilling to be tested. The third equality follows from the CIA. The fourth equality follows because $p(X)$ is a function of $X$, and the fifth is again the Law of Iterated Expectations. The sixth equality states that the overall average is the average over both $D = 1$ types (those who took the HIV test) and $D = 0$ types (those who did not take the HIV test). The last equality reflects the fact that only those with $D \neq 0$ matter for the calculation.

The final result shows that the overall prevalence rate $E[Y]$ can be computed by reweighting the observed prevalence rate for those who agree to take the test. The appropriate weight is simply the unconditional probability relative to the propensity score, and the average is taken over all those who took the HIV test.

C. Matching

An alternative methodology, also based on the propensity score, is matching. There are several matching estimators, including pair matching, nearest neighbor matching, kernel matching, local linear matching, and ridge matching.
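As a numerical check on the reweighting identity (3) from Section B, the sketch below builds a small hypothetical data set of (x, d, y) records and applies the weight p / p(X) to the tested observations only. It is a simplification under stated assumptions: the propensity score is estimated by cell frequencies of a single discrete characteristic rather than by a logit, and the counts in `make_records` are invented so that the true prevalence is known to be 0.22.

```python
from collections import Counter


def make_records():
    """Build HYPOTHETICAL (x, d, y) records; y is None when d == 0,
    mirroring the fact that HIV status is unobserved for opt-outs."""
    cells = [
        # (x, tested & positive, tested & negative, untested)
        ("a", 48, 432, 120),
        ("b", 48, 72, 280),
    ]
    records = []
    for x, pos, neg, untested in cells:
        records += [(x, 1, 1)] * pos
        records += [(x, 1, 0)] * neg
        records += [(x, 0, None)] * untested
    return records


def naive_prevalence(records):
    """Unweighted mean of Y among the tested (D = 1)."""
    tested_y = [y for _, d, y in records if d == 1]
    return sum(tested_y) / len(tested_y)


def reweighted_prevalence(records):
    """Mean of Y * p / p(X) among the tested, as in the last line of (3)."""
    n = len(records)
    n_by_x = Counter(x for x, _, _ in records)
    tested_by_x = Counter(x for x, d, _ in records if d == 1)
    # Cell-frequency propensity score (stand-in for a fitted logit).
    p_x = {x: tested_by_x[x] / n_by_x[x] for x in n_by_x}
    p = sum(tested_by_x.values()) / n  # unconditional P(D = 1)
    tested = [(x, y) for x, d, y in records if d == 1]
    return sum(y * p / p_x[x] for x, y in tested) / len(tested)


records = make_records()
naive = naive_prevalence(records)
reweighted = reweighted_prevalence(records)
print(f"naive={naive:.3f}  reweighted={reweighted:.3f}")
```

The weight p / p(X) is large for tested respondents who resemble those who usually opt out, which is also why the estimator becomes unstable when p(X) is close to zero for some cells.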

With the matching estimators, the prevalence rate is given by:

$$E[Y] = \frac{1}{n} \sum_{i=1}^{n} \left[ D_i Y_i + (1 - D_i)\,\hat{Y}_i \right] \qquad (4)$$

and the different matching estimators define $\hat{Y}_i$ differently. We employ pair matching.

The intuition behind matching is to use data from individuals who did take the HIV test to infer the status of those who opted out. Pair matching simply infers the status of the opt-out from the status of the tested individual with the most similar propensity score.

D. Control Function

The third methodology, again based on the propensity score, is a control function approach. Because we maintain the selection on observables assumption, we do not impose an exclusion restriction.

The control function approach is best motivated by a figure. Suppose one plotted the average prevalence rate on the vertical axis against each value of the propensity score on the horizontal axis. If such a plot were a flat line, there would be no relationship between the likelihood of taking the test and one's HIV status and hence no selection bias. The core idea of the control function approach (without an exclusion restriction) is to flexibly estimate the expected HIV status conditional on the estimated propensity score. [6] This estimation can of course only be done for those for whom HIV status is observed ($D = 1$). One then predicts the HIV status for those who opted out based on the estimated flexible relationship. The estimated prevalence rate is then simply the average across all individuals of the observed and predicted HIV status. This is shown more formally below.

$$
\begin{aligned}
E[Y] &= E\big[E[Y \mid p(X)]\big] \\
&= E\big[E[Y \mid p(X), D = 1]\big] \\
&= \int E[Y \mid p(X) = q, D = 1]\, f_{p(X)}(q)\, dq \qquad (5)
\end{aligned}
$$

The first line follows from the Law of Iterated Expectations. The second line makes use of the CIA since, having conditioned on $p(X)$, nothing is changed by additionally conditioning on $D = 1$.
The third line computes the expectation by integrating against the density of propensity scores, $f_{p(X)}$, where $q$ is a specific value of the propensity score and the integration is across $q$.

[6] In practice, we employ a fifth order polynomial of the propensity score.

E. Double Robust

We also implement the double robust approach of Robins and Rotnitzky (1995). Imbens (2004) provides an intuitive discussion of this approach in the context of average treatment effect estimation. Essentially, this is a way of combining reweighting with regression adjustment. It is believed to be a more robust method, in the sense that if the propensity score model is misspecified, there is still a possibility that the regression model is correctly specified, and vice versa. It turns out that only one of these models needs to be correct for the double robust estimator to deliver an accurate estimate.

In our context, the double robust approach involves estimating the propensity score weights for each observation, $p / p(X)$, and then regressing HIV status, $Y$, on those variables that appeared in the propensity score specification using the propensity score weights. This regression is then used to project HIV status for those who did not take the HIV test. The estimated prevalence rate is taken over the actual HIV status for those who were tested and the predicted HIV status for those who were not, as in (4).

F. Blinder-Oaxaca

Well before the propensity score literature developed, Blinder (1973) and Oaxaca (1973) developed a straightforward methodology to predict values that variables might take in the case of a well-specified counterfactual. In our context, we regress HIV status, $Y$, on those variables, $X$, that entered the propensity score and then use the estimated parameters to predict HIV status for those who did not take the test, $\hat{Y}_{D=0}$. In particular,

$$Y = X_{D=1}\,\hat{\beta} \qquad (6)$$

$$\hat{Y}_{D=0} = X_{D=0}\,\hat{\beta} \qquad (7)$$

The first regression is a simple OLS linear probability model run, by construction, only using those individuals who took the HIV test ($D = 1$). The overall prevalence rate is then given by (4).

One potential advantage of the Blinder-Oaxaca approach is that it is not as dependent on the common support assumption required by the reweighting estimators. In particular, when the distributions of the propensity scores are not sufficiently overlapping between the treatment and control groups, some of the reweighting estimators are problematic.
This is because reweighting results in division by zero (or near zero). In this situation, the Blinder-Oaxaca estimator will be more robust, since it forecasts from the area for which there is common support all the way out to the boundary.

G. Standard Errors of Estimated Prevalence Rates

The naïve prevalence rate and the prevalence rates resulting from each of our five methodologies each have an associated standard error. All five of our methodologies, though, involve estimated regressions, and this introduces another source of noise that contributes to the standard error of these prevalence rates. (The propensity score is estimated for all but Blinder-Oaxaca, and an HIV prediction equation is estimated for Blinder-Oaxaca.) Although for some estimators (e.g., propensity score reweighting; see Hirano et al. (2003)) there are results showing that under particular circumstances it may be conservative to ignore this estimation error, we elect to report standard errors that account for it. For propensity score reweighting, the control function approach, the double robust approach, and Blinder-Oaxaca, we report bootstrapped standard errors. [7] In the case of pair matching, we use subsampling, as the bootstrap is known to be inconsistent (Abadie and Imbens 2006). Whereas the bootstrap involves repeatedly computing the same estimator on resamples of size n drawn randomly with replacement from the original n observations, subsampling involves repeatedly computing the same estimator on resamples of size b drawn randomly without replacement from the original n observations.

IV. Results

All of our approaches except Blinder-Oaxaca require, as a first step, estimating the propensity score. We begin, then, with a discussion of the estimated propensity score. Each propensity score-based approach also assumes that the CIA holds, while reweighting, pair matching, and double robust also require the support assumption. [8] We next investigate the appropriateness of these assumptions. We then present our estimated prevalence rates. We conclude this section with an investigation into the particular patterns of selection that are driving the results. [9]

A. Estimating the propensity score

The estimated propensity score is the predicted value of a logit regression of whether one took the HIV test on vectors of attributes, $X$. We adopt a relatively parsimonious specification and then check to see if the reweighted data are balanced. Results of the propensity score logit are given in Table 2, and predicted values of this regression comprise the propensity scores.

The first column of Table 2 gives the variables included in the propensity score. [10] The second column gives the excluded category for the case of categorical variables.
The third column reports the coefficient from the logit and the standard error of the coefficient. These coefficients, though, are not readily interpretable. The fourth column reports the average change in the probability of taking the HIV test resulting from a specific counterfactual, as well as the standard error of this difference. This is not the same as the marginal effects that are frequently computed by standard statistical packages, since for variables that enter other than linearly and for categorical variables that take on more than two values, the standard marginal effects are difficult (or impossible) to interpret.

[7] Results are reported for standard errors computed with 500 replications.
[8] It is not clear that the control function approach requires this assumption.
[9] Although our data are confidential, all programs used for estimation will be available on the web sites of each co-author.
[10] Not all respondents answered every question. For variables with missing values, a value was imputed when an observation was missing. An indicator variable was created for every variable with any missing observations. This indicator was set to one if a particular observation was imputed. This procedure allows one to still use the full sample for the logit (and hence propensity scores) while sweeping out the impact of the (arbitrary) imputation for the missing values. Such indicators were included in the logit for age, marital status, whether one had protected oneself from HIV, whether one had heard of HIV, whether one had previously taken an HIV test, and years of education. We do not report the coefficients on these variables for purposes of brevity.

For the case of continuous variables (e.g., age and years of education), we increase the value of that variable by one year for everyone in the sample and then compute the average impact on the likelihood of testing, taking into account the fact that the variable enters the propensity score quadratically. [11] Hence, there is only one value in column 4 for the continuous variables age and education. (Although standard statistical packages will compute a marginal effect for, say, age squared, doing so holding age constant makes little sense.)

For the case of categorical variables, we compute the average change in the probability of testing when the categorical variable is changed from its actual value to the value of the excluded characteristic. For example, there are six values that the variable for marital status can take. The excluded category is married. For the case of, say, living together but not married, we compute the change in the likelihood of testing by comparing the average probability of testing for the original logit with the average probability of testing when all individuals who were living together are instead coded as being married (the omitted category). The computation is similar to that given in footnote 11. The key difference is in the set of observations over which the average change is computed. For a continuous variable such as age, the counterfactual has age increased for everyone, so the average change is computed across all observations. For a categorical variable, only those observations coded as 1 (e.g., living together) are changed to zero (e.g., married), so the average is only taken across those observations whose value changed in the counterfactual.
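The two counterfactual calculations described above can be sketched as follows. This is a minimal illustration rather than the authors' program: the logit coefficients in `COEF`, the sample in `rows`, and the function names are all hypothetical, and only age (entering quadratically) and an urban indicator are included.

```python
import math

# HYPOTHETICAL logit coefficients, chosen only for illustration.
COEF = {"const": 0.5, "age": -0.04, "age_sq": 0.0003, "urban": -0.25}


def prob_tested(row, coef=COEF):
    """Predicted probability of testing from a logit with a quadratic in age."""
    index = (
        coef["const"]
        + coef["age"] * row["age"]
        + coef["age_sq"] * row["age"] ** 2  # age squared is rebuilt from age
        + coef["urban"] * row["urban"]
    )
    return 1.0 / (1.0 + math.exp(-index))


def avg_change_age_plus_one(rows):
    """Continuous case: add one year to everyone; average over ALL rows."""
    diffs = [
        prob_tested({**r, "age": r["age"] + 1}) - prob_tested(r) for r in rows
    ]
    return sum(diffs) / len(diffs)


def avg_change_urban_to_rural(rows):
    """Categorical case: recode urban as rural; average over RECODED rows only."""
    changed = [r for r in rows if r["urban"] == 1]
    diffs = [prob_tested({**r, "urban": 0}) - prob_tested(r) for r in changed]
    return sum(diffs) / len(diffs)


# HYPOTHETICAL sample of (age, urban) pairs.
rows = [
    {"age": a, "urban": u}
    for a, u in [(5, 0), (17, 1), (23, 1), (31, 0), (44, 1), (58, 0)]
]
print(f"age +1 year:    {avg_change_age_plus_one(rows):+.4f}")
print(f"urban to rural: {avg_change_urban_to_rural(rows):+.4f}")
```

Because age squared is recomputed from age inside `prob_tested`, a one-year increase moves both terms together, which is the reason a single number is reported for each quadratic variable.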
The first column of Table 2 lists the variables that enter the propensity score for the base case. These are the respondent's age and age squared as well as the guardian's age and age squared. (Minors do not give their own permission to take the HIV test. Rather, their parent or guardian does, hence we include the attributes of the guardian for those respondents who are minors.) Next is an indicator variable for whether the respondent is female, followed by indicator variables for five values of marital status. Next are indicator variables for whether the respondent lives in an urban area, is a citizen, has ever had sex, used protection against HIV, has heard of HIV, and has previously had an HIV test. Finally, years of education and its square are included.

The last column reports the results in a form that is readily interpretable. For example, increasing the respondent's age by one year results on average in a 0.7 percentage point decrease in the likelihood of taking the HIV test. If one lived in an urban area ("Urban"), the coefficient in the logit is negative. Hence living in an urban area makes one less likely to take the HIV test. If we then take all urban residents and code them as rural (the counterfactual), these recoded respondents are 6.2 percentage points more likely to provide a bio-sample. This table also indicates that one is less likely to take the HIV test if one has already been tested and that more education makes one less likely to take the HIV test. While an entire literature has developed in the public health field around who agrees to be tested for HIV, this table is really only the first step to our analysis of selection.

[11] This is done in four steps. Step one is to compute the vector of predicted probabilities using the specified logit. Step two is to increase the variable, say age, by one year for everyone in the sample, updating age squared accordingly, and, using the coefficients from the original logit, predict the new vector of probabilities. Step three is to take the difference between the vector of new probabilities and the original probabilities. Step four is to compute the average of these differences across individuals and the standard error of that average. We use the paired (or nonparametric) bootstrap to compute the standard errors.

B. Evaluating the CIA

The CIA implies selection entirely on observables. We examine the appropriateness of this assumption by conducting balancing tests on variables in the logit (internal variables). [12] The intuition behind balancing tests is appealing. Selection means that the values for variables included in the logit are systematically different for those who provided a bio-specimen compared to those who opted out. If the CIA obtains, then reweighting variables by $p / p(X)$, as in (3), should erase these systematic differences.

To implement the balancing test, for every independent variable in the logit, we regress that variable on $D$ using the original weights and then again weighting by $p / p(X)$. [13] Comparing results with the reweighted data to the results with the originally weighted data, we investigate whether the reweighting is addressing the selection. To the extent that the reweighting is effective, the CIA obtains, and the regression with reweighted data should show a coefficient on $D$ that is insignificantly different from zero.

Results are reported in Table 3. The first row of each entry gives the difference in the means between those who provided a bio-sample and those who did not. We report the difference in the mean, the standard error of that difference, and the t-statistic for the null hypothesis that the mean is the same for those who provided a specimen as for those who did not. For example, females are 3.17 percent more likely to provide a bio-sample in the unweighted data, and this difference is highly statistically significant (the t-statistic is reported in Table 3). When the data are reweighted, the difference becomes statistically insignificant. This is true for every variable in the propensity score specification. [14] We conclude that the logit specified in Table 2 is generally consistent with the CIA. [15]

[12] In previous versions of this paper, we also conducted external balancing tests, that is, balancing tests using variables that themselves did not enter the propensity score. These results were also generally consistent with the CIA. They are not reported here for expositional ease.
[13] By using a regression framework, it is simple to do hypothesis tests for whether those tested differed in a given attribute from those not tested. There are of course other ways to equivalently test this.
[14] Age squared remains significantly different, but age does not. Taken together, age, flexibly specified, does not differ across those who provided a sample and those who did not in the reweighted data.
[15] There are of course more saturated logit models that could be used. In results not reported here, we experimented with adding up to 87 more variables to the logit and the estimated prevalence rates using propensity score reweighting did not change by more than about

C. The Distribution of Propensity Scores

Kernel density estimates of the distributions of propensity scores for those who provided a bio-specimen and for those who opted out are presented in Figure 1. This figure serves two functions. First, it informs one that a selection problem is present. This is evidenced by the fact that the two distributions in Figure 1 are not approximately coincident. (While Figure 1 shows that a selection problem exists, it does not speak directly to the empirical magnitude of the problem.) Second, Figure 1 provides empirical support for the support assumption.

D. Prevalence Rates

Estimated HIV prevalence rates are given in Table 4. Standard errors are bootstrapped for every estimator except pair matching, for which they are based on subsampling. [16]

There are two striking findings in Table 4. First, all of the selection-corrected prevalence rates are very close to one another. Across four different methodologies based on the propensity score and one (Blinder-Oaxaca) that takes a different approach altogether, the results are mutually confirming. Furthermore, all of the selection-corrected prevalence rates are precisely measured. While it is straightforward to do hypothesis tests on whether the rates are statistically different from one another, it is clear that most lie within two standard deviations of one another.

The second striking fact in Table 4 is that all the selection-corrected prevalence rates are very close to, and only modestly lower than, the naïve rate. Hence, even though the Imbens-Manski confidence interval given in Table 1 was quite large, correcting for selection into testing (five ways) makes almost no difference. The remaining question is: why?

We investigate this question using nonparametric methods to examine the empirical relationship between a respondent's propensity score and their actual HIV status. If the likelihood of taking the HIV test (the propensity score) is independent of one's HIV status, then one can obtain accurate estimates of the population prevalence rate from any random sample (of sufficient size) of the survey respondents. In this case, a graph with the HIV prevalence rate on the vertical axis and the propensity score on the horizontal axis would simply be a flat line: the likelihood of being HIV+ would not vary with the propensity score.
If such a graph sloped upward, then the more likely one is to take the test, the more likely one is to be HIV+, so the actual prevalence rate would be less than the naïve rate. Conversely, if the graph sloped downward, those likely to take the test are more likely to be HIV-, and the actual prevalence rate would be greater than the naïve rate.

We investigate this issue in our data in Figure 2. Figure 2 presents a locally weighted linear regression of HIV status on the respondent's propensity score. The upper and lower lines give the bounds defined by two standard errors of the estimated regression coefficient. Figure 2 indicates that those who are especially likely to opt out are more likely to be HIV+ and those who are especially likely to test are also more likely to be HIV+. Were it only the case that those likely to opt out were more likely to be positive, the naïve prevalence rate would be lower than the corrected rate. Were it only the case that those likely to be tested were more likely to be positive, the naïve prevalence rate would be higher than the corrected rate. The fact that both those very likely and those very unlikely to take the test are more likely to be HIV positive illustrates that, while selection is present, the two types of selection countervail one another so that, in this instance, the naïve and corrected rates are about the same.

[16] We drew 10,000 samples to compute standard errors.
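The sign argument above can be checked with a stylized calculation. The sketch below assumes three equally sized groups with low, middle, and high propensity scores; the scores and the three prevalence profiles are hypothetical and are chosen only to reproduce the upward-sloping, downward-sloping, and U-shaped cases discussed in the text.

```python
# HYPOTHETICAL propensity scores for three equally sized groups.
PROPENSITY = [0.2, 0.5, 0.8]

# HYPOTHETICAL prevalence profiles across those groups.
PROFILES = {
    "upward": [0.1, 0.2, 0.3],    # likely testers are more often HIV+
    "downward": [0.3, 0.2, 0.1],  # likely opt-outs are more often HIV+
    "u_shaped": [0.3, 0.1, 0.3],  # both extremes are more often HIV+
}


def true_rate(prevalence):
    """Population prevalence: equal-weight average over the three groups."""
    return sum(prevalence) / len(prevalence)


def naive_rate(prevalence, propensity=PROPENSITY):
    """Prevalence among the tested: groups weighted by their testing rate."""
    tested_positive = sum(p * y for p, y in zip(propensity, prevalence))
    return tested_positive / sum(propensity)


for name, prevalence in PROFILES.items():
    print(f"{name:9s} true={true_rate(prevalence):.3f} "
          f"naive={naive_rate(prevalence):.3f}")
```

In the symmetric U-shaped case the over-representation of the high-propensity group exactly offsets the under-representation of the low-propensity group; with an asymmetric U shape the two rates would be close but not identical.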

An important message from Figure 2 is that it would be wrong to simply ignore selection bias ex ante. One can only determine whether the selection implied by the bi-modal pattern in Figure 2 results in a prevalence rate that differs from the naïve rate by employing one or more of the methodologies used above.

V. Conclusions

Adopting a selection-on-observables assumption, we have illustrated five ways to correct HIV prevalence rates for non-random opt-out of testing. Using a nationally representative survey from Botswana, we show that opt-out is non-random. Indeed, both individuals who are especially likely to provide a bio-sample and individuals who are especially likely to opt out are more likely to be HIV+. These potential sources of bias countervail one another, and prevalence rates corrected for self-selection are only modestly lower than the naïve rate.

References

Abadie, Alberto and Guido Imbens, "On the Failure of the Bootstrap for Matching Estimators," June 2006. NBER Working Paper No. T0325.

Blinder, Alan, "Wage Discrimination: Reduced Form and Structural Estimates," Journal of Human Resources, 1973, 8 (4).

Bussos, Mathias, John DiNardo, and Justin McCrary, "Testing for Parametric Consistency of Semiparametric Treatment Effects Estimators," University of Michigan, Working Paper.

DiNardo, John, Nicole M. Fortin, and Thomas Lemieux, "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach," Econometrica, September 1996, 64 (5).

Heckman, James and Salvador Navarro-Lozano, "Using Matching, Instrumental Variables, and Control Functions to Estimate Economic Choice Models," The Review of Economics and Statistics, February 2004, 86 (1).

Heckman, James J., Hidehiko Ichimura, Jeff Smith, and Petra Todd, "Characterizing Selection Bias Using Experimental Data," Econometrica, September 1998, 66 (5).

Hirano, Keisuke, Guido W. Imbens, and Geert Ridder, "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, July 2003, 71 (4).

Imbens, Guido and Charles F. Manski, "Confidence Intervals for Partially Identified Parameters," Econometrica, November 2004, 72 (6).

Imbens, G.W., "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review," Review of Economics and Statistics, 2004, 86 (1).

Oaxaca, R., "Male-female wage differentials in urban labor markets," International Economic Review, 1973, 14.

Robins, J.M. and A. Rotnitzky, "Semiparametric Efficiency in Multivariate Regression Models with Missing Data," Journal of the American Statistical Association, 1995, 90 (429).

Table 1: Bounds for HIV Prevalence
Columns: Lower Bound; Upper Bound; Imbens-Manski Confidence Interval; Naïve Estimate
Point estimates and interval endpoints are missing from the transcription. Standard errors as transcribed: (.0029) (.0070) (.0036)

Figure 1. Density of Propensity Scores, by Tested Status
Series: Density for Tested; Density for Untested
The density for accepted is the line that initially lies below refused and then is above the refused line for higher values on the X-axis.

Figure 2. Prevalence Rates and Propensity Scores
X-axis: Propensity Score. Series: HIV prevalence rate; Upper Bound; Lower Bound
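The logic of a lower and upper bound on prevalence can be sketched as follows, assuming the bounds are of the worst-case type (every untested person is HIV- for the lower bound, HIV+ for the upper bound). The transcription does not confirm that this is exactly how Table 1 was computed, and the function name and the counts below are hypothetical.

```python
def worst_case_bounds(n_total, n_tested, n_positive):
    """Return (lower, upper) worst-case bounds on population prevalence."""
    if not 0 < n_tested <= n_total or not 0 <= n_positive <= n_tested:
        raise ValueError("counts must satisfy 0 <= n_positive <= n_tested <= n_total")
    share_tested = n_tested / n_total
    prev_tested = n_positive / n_tested
    lower = share_tested * prev_tested      # all untested assumed HIV-
    upper = lower + (1.0 - share_tested)    # all untested assumed HIV+
    return lower, upper


# Invented example: 10,000 people surveyed, 6,800 tested, 2,060 test positive.
lower, upper = worst_case_bounds(n_total=10000, n_tested=6800, n_positive=2060)
naive = 2060 / 6800
print(f"lower: {lower:.3f}  naive: {naive:.3f}  upper: {upper:.3f}")
```

The width of the interval equals the share of people who were not tested, which is why high opt-out rates make such bounds wide and motivate the selection-on-observables corrections.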

Table 2: Propensity Score Results
Columns: Variable; Excluded Category; Logit Coefficient; Impact of Specific Counterfactual
Point estimates are missing from the transcription; the standard errors that survive are listed in the order transcribed.

Age: (.004) (.0004)
Age²: (.00005)
Guardian's Age: (.007) (.0004)
Guardian's Age²: (.00008)
Female (excluded: Male): (.029) (.006)
Female Guardian (excluded: Male Guardian): (.050) (.010)
Living Together (excluded: Married): (.067) (.013)
Separated (excluded: Married): (.201) (.042)
Divorced (excluded: Married): (.173) (.035)
Widowed (excluded: Married): (.092) (.019)
Never Married (excluded: Married): (.061) (.013)
Urban (excluded: Rural): (.050) (.010)
Citizen (excluded: Non-citizen): (.099) (.019)
Ever Had Sex (excluded: Never Had Sex): (.553) (.101)
Used Protection (excluded: Didn't Protect): (.051) (.011)
Heard of HIV (excluded: Not Heard): (.554) (.119)
Guardian Heard of HIV (excluded: Not Heard): (1.554) (.362)
Had a Previous HIV Test (excluded: Not Tested): (.050) (.011)
Guardian Previously Had HIV Test (excluded: Not Tested): (.064) (.014)
Years of Educ.: (.018) (.001)
Years of Educ.²: (.001)
Guardian Years of Educ.: (.033) (.001)
Guardian Years of Educ.²

Notes: Standard errors are adjusted for household clusters. Dummy variables for imputed values were included in the regression but are not reported.

Table 3: Re-Balancing the Data
Columns: Variable; Coefficient; S.E.; t-statistic
The numeric entries are missing from the transcription.
Variables (each with an unweighted row and an RW row): Age; Age²; Guardian Age; Guardian Age²; Female; Female Guardian; Living Together; Separated; Divorced; Widowed; Never Married; Urban; Citizen; Ever Had Sex; Used Protection; Heard of HIV; Guardian Heard; Previously Tested; Guardian Tested; Years Educ.; Years Educ.²; Guardian Educ.; Guardian Educ.²

Notes: The first row of each entry gives the results without re-weighting while the second row gives results after re-weighting.

Table 4: HIV Prevalence Rates
Columns: Methodology; Prevalence Rate; Standard Error
The numeric entries are missing from the transcription.
Methodologies: Naïve; Propensity Score Reweighting; Pair Matching; Control Function; Double Robust; Blinder-Oaxaca


Making comparisons. Previous sessions looked at how to describe a single group of subjects However, we are often interested in comparing two groups Making comparisons Previous sessions looked at how to describe a single group of subjects However, we are often interested in comparing two groups Data can be interpreted using the following fundamental

More information

Understanding Regression Discontinuity Designs As Observational Studies

Understanding Regression Discontinuity Designs As Observational Studies Observational Studies 2 (2016) 174-182 Submitted 10/16; Published 12/16 Understanding Regression Discontinuity Designs As Observational Studies Jasjeet S. Sekhon Robson Professor Departments of Political

More information

Hierarchy of Statistical Goals

Hierarchy of Statistical Goals Hierarchy of Statistical Goals Ideal goal of scientific study: Deterministic results Determine the exact value of a ment or population parameter Prediction: What will the value of a future observation

More information

Class 1: Introduction, Causality, Self-selection Bias, Regression

Class 1: Introduction, Causality, Self-selection Bias, Regression Class 1: Introduction, Causality, Self-selection Bias, Regression Ricardo A Pasquini April 2011 Ricardo A Pasquini () April 2011 1 / 23 Introduction I Angrist s what should be the FAQs of a researcher:

More information

Sex and the Classroom: Can a Cash Transfer Program for Schooling decrease HIV infections?

Sex and the Classroom: Can a Cash Transfer Program for Schooling decrease HIV infections? Sex and the Classroom: Can a Cash Transfer Program for Schooling decrease HIV infections? Sarah Baird, George Washington University Craig McIntosh, UCSD Berk Özler, World Bank Education as a Social Vaccine

More information

Lessons in biostatistics

Lessons in biostatistics Lessons in biostatistics The test of independence Mary L. McHugh Department of Nursing, School of Health and Human Services, National University, Aero Court, San Diego, California, USA Corresponding author:

More information

ICPSR Causal Inference in the Social Sciences. Course Syllabus

ICPSR Causal Inference in the Social Sciences. Course Syllabus ICPSR 2012 Causal Inference in the Social Sciences Course Syllabus Instructors: Dominik Hangartner London School of Economics Marco Steenbergen University of Zurich Teaching Fellow: Ben Wilson London School

More information

Testing for non-response and sample selection bias in contingent valuation: Analysis of a combination phone/mail survey

Testing for non-response and sample selection bias in contingent valuation: Analysis of a combination phone/mail survey Whitehead, J.C., Groothuis, P.A., and Blomquist, G.C. (1993) Testing for Nonresponse and Sample Selection Bias in Contingent Valuation: Analysis of a Combination Phone/Mail Survey, Economics Letters, 41(2):

More information

Statistical Power Sampling Design and sample Size Determination

Statistical Power Sampling Design and sample Size Determination Statistical Power Sampling Design and sample Size Determination Deo-Gracias HOUNDOLO Impact Evaluation Specialist dhoundolo@3ieimpact.org Outline 1. Sampling basics 2. What do evaluators do? 3. Statistical

More information

INTRODUCTION TO ECONOMETRICS (EC212)

INTRODUCTION TO ECONOMETRICS (EC212) INTRODUCTION TO ECONOMETRICS (EC212) Course duration: 54 hours lecture and class time (Over three weeks) LSE Teaching Department: Department of Economics Lead Faculty (session two): Dr Taisuke Otsu and

More information

EC352 Econometric Methods: Week 07

EC352 Econometric Methods: Week 07 EC352 Econometric Methods: Week 07 Gordon Kemp Department of Economics, University of Essex 1 / 25 Outline Panel Data (continued) Random Eects Estimation and Clustering Dynamic Models Validity & Threats

More information

Econometric Game 2012: infants birthweight?

Econometric Game 2012: infants birthweight? Econometric Game 2012: How does maternal smoking during pregnancy affect infants birthweight? Case A April 18, 2012 1 Introduction Low birthweight is associated with adverse health related and economic

More information

What is: regression discontinuity design?

What is: regression discontinuity design? What is: regression discontinuity design? Mike Brewer University of Essex and Institute for Fiscal Studies Part of Programme Evaluation for Policy Analysis (PEPA), a Node of the NCRM Regression discontinuity

More information

Introduction to Program Evaluation

Introduction to Program Evaluation Introduction to Program Evaluation Nirav Mehta Assistant Professor Economics Department University of Western Ontario January 22, 2014 Mehta (UWO) Program Evaluation January 22, 2014 1 / 28 What is Program

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology Sylvia Richardson 1 sylvia.richardson@imperial.co.uk Joint work with: Alexina Mason 1, Lawrence

More information

What Are We Weighting For?

What Are We Weighting For? What Are We Weighting For? Gary Solon Steven J. Haider Jeffrey M. Wooldridge Solon, Haider, and Wooldridge abstract When estimating population descriptive statistics, weighting is called for if needed

More information

SELECTED FACTORS LEADING TO THE TRANSMISSION OF FEMALE GENITAL MUTILATION ACROSS GENERATIONS: QUANTITATIVE ANALYSIS FOR SIX AFRICAN COUNTRIES

SELECTED FACTORS LEADING TO THE TRANSMISSION OF FEMALE GENITAL MUTILATION ACROSS GENERATIONS: QUANTITATIVE ANALYSIS FOR SIX AFRICAN COUNTRIES Public Disclosure Authorized Public Disclosure Authorized Public Disclosure Authorized Public Disclosure Authorized ENDING VIOLENCE AGAINST WOMEN AND GIRLS SELECTED FACTORS LEADING TO THE TRANSMISSION

More information