Alan S. Gerber and Donald P. Green, Yale University. January 4, 2003


Technical Note on the Conditions Under Which It is Efficient to Discard Observations Assigned to Multiple Treatments in an Experiment Using a Factorial Design

Much of Imai's (2003) paper takes umbrage at the New Haven study's (Gerber and Green 2000) use of factorial design. Gerber and Green (2004, Table A) show that none of the substantive conclusions of the New Haven study hinge on the inclusion or exclusion of those assigned to multiple treatments. Nevertheless, the issue of whether to use factorial designs is an important one, and it is important to dispel the misconceptions that Imai generates concerning the design and analysis of factorial experiments.

Experimental researchers routinely use factorial designs in order to gauge interactions between factors. The use of factorial design is particularly common among researchers launching new research agendas. As Mead (1988, pp. 584-5) explains in The Design of Experiments: Statistical Principles for Practical Application: A major question will usually be how many aspects of variation, or factors, to consider simultaneously in an experiment. This will depend on the stage of the experiment within the research programme. At the beginning of the programme, it is important to include many factors, because failure to investigate the interaction of factors at an early stage can easily lead to wasting resources through pursuing too narrow a line of research before eventually checking on the effects of other

factors. It is difficult to conceive of an investigation with too many factors at an early stage.

This research trajectory is typically combined with a hypothesis-testing approach in which analysts look for evidence of interactions and, failing to find them, focus on main effects. This approach figures prominently in virtually all textbooks on experimental design and data analysis (Box, Hunter, and Hunter 1978; Cochran and Cox 1957; Mead 1988; Montgomery 2003), and it is the framework within which the Gerber and Green experiment was designed and analyzed. The variegated factorial design of the study enabled us to investigate a wide variety of potential interactions suggested in previous work that also used factorial design (Eldersveld 1956; Miller et al. 1981). Factorial design also enabled us to approximate real-world variation in background campaign conditions. When examining the effects of phone calls for the population that also receives three mailings, one seeks to learn about the effects of phone calls in environments where other campaigns have sent out mailings.

Breaking with conventional thinking about factorial design, Imai derides the use of multiple treatments as incorrect and inefficient. He declares on p. 19: "In principle, it is advisable to minimize the number of treatments in field experiments." He cannot imagine why we would waste observations on multiple treatments, apparently forgetting that the treatment and interaction effects reported in the studies that he cites with approval (Eldersveld 1956; Miller et al. 1981) were often very large. It is important to note that our experiment had the power to detect the massive interactions that Miller et al. report (1981, p. 450). It turned out, however, that our experiment revealed no significant interactions (p. 660), and so we focused our write-up on main effects; but the absence of robust interactions is a finding, not a design failure.
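The interactions-first, main-effects-later sequence described above can be illustrated with a small simulation. This is only a sketch with invented data and effect sizes (it is not the New Haven analysis, and the variable names are ours): two factors are randomized independently, a regression with an interaction term is fit, and the main effects are reported once the interaction proves negligible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Hypothetical 2x2 factorial: two treatments assigned independently at random.
mail = rng.integers(0, 2, n)
phone = rng.integers(0, 2, n)

# Simulated outcome: main effects only, no true interaction (made-up coefficients).
y = 0.40 + 0.06 * mail + 0.03 * phone + rng.normal(0, 0.5, n)

# OLS with an interaction term: [constant, mail, phone, mail*phone].
X = np.column_stack([np.ones(n), mail, phone, mail * phone])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# If the interaction is indistinguishable from zero, focus on main effects.
t_interaction = beta[3] / se[3]
print(f"interaction: {beta[3]:+.3f} (t = {t_interaction:+.2f})")
```

The point of the sketch is the testing sequence, not the numbers: a factorial design makes the interaction estimable at all, and an insignificant interaction is itself a finding.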
After denouncing factorial design, Imai proposes to discard all the observations in the New Haven study that were assigned to more than one type of treatment. Imai provides no analytics to bolster the efficiency of this statistical approach, nor does he cite

any authorities on this point. His argument rests entirely on the substantive assertion that treatment effects are likely to be smaller for those who have already received other treatments (footnote 6). This leaves the reader to fill in the missing statistical argument raised by the question: under what circumstances is it more efficient to discard or retain observations assigned to multiple treatments? Because this question has broad implications for experimental design and analysis, we welcome the opportunity to discuss the answer, which follows directly from standard econometric treatments of when to include or exclude potential regressors. Here, we show the implications of this analytic result for the New Haven experiment, with special reference to the effects of phone calls.

Let $\hat{b}_1$ be the instrumental variables estimator of the phone treatment effect, $\beta$, based solely on what Imai terms the correct groups (i.e., the control group that receives no treatment whatsoever and the treatment group that receives only phone calls). This estimator is what Imai presents in Table 5. Let $\hat{b}_2$ denote the instrumental variables estimator of the phone treatment effect based on a comparison of incorrect groups (e.g., the control group that received mail and the treatment group that received both mail and phone calls). We will use $\gamma$ to denote the bias associated with $\hat{b}_2$, allowing for the possibility that "treatment effects are likely to be small for those who have already received other treatments" (Imai 2003, footnote 6). Call $\hat{b}_T$ the instrumental variables estimator of the phone treatment effect, $\beta$, based on both types of data; this estimator is what Gerber and Green (2000) use. It is helpful to express $\hat{b}_T$ as a weighted average of $\hat{b}_1$ and $\hat{b}_2$. Using $\sigma_1^2$ to denote the sampling variance of $\hat{b}_1$, and $\sigma_2^2$ to denote the sampling variance of $\hat{b}_2$, we express $\hat{b}_T$ as follows:

$$\hat{b}_T = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\,\hat{b}_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\,\hat{b}_2.$$
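The weighted-average expression for $\hat{b}_T$ can be checked by simulation. The sketch below assumes precision weighting (weights proportional to the inverse sampling variances, as in the expression above) and uses made-up values for the standard errors; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

beta_true, gamma = 5.0, 0.0   # true effect; bias of the incorrect-group estimator
s1, s2 = 6.0, 2.6             # made-up sampling SDs of b1_hat and b2_hat
v1, v2 = s1**2, s2**2
w1, w2 = v2 / (v1 + v2), v1 / (v1 + v2)   # precision weights (sum to 1)

# Draw many realizations of the two estimators and pool them.
b1 = beta_true + rng.normal(0.0, s1, 200_000)
b2 = beta_true + gamma + rng.normal(0.0, s2, 200_000)
bT = w1 * b1 + w2 * b2

# The pooled estimator's sampling variance matches s1^2 * s2^2 / (s1^2 + s2^2).
print(bT.var(), v1 * v2 / (v1 + v2))
```

With zero bias the pooled estimator is centered on the true effect and has a smaller variance than either ingredient, which is what motivates retaining all of the data.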

Having defined the competing estimators, we now compare their properties. The estimator $\hat{b}_1$ is an unbiased estimator of $\beta$ with variance $\sigma_1^2$. Thus, its mean squared error (MSE) is also $\sigma_1^2$. By comparison, $\hat{b}_T$ has an expectation of $\beta + \gamma\,\sigma_1^2/(\sigma_1^2+\sigma_2^2)$ and a variance of $\sigma_1^2\sigma_2^2/(\sigma_1^2+\sigma_2^2)$. The MSE of $\hat{b}_T$ is therefore

$$\mathrm{MSE}(\hat{b}_T) = \gamma^2\,\frac{\sigma_1^4}{(\sigma_1^2+\sigma_2^2)^2} + \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}.$$

A comparison of $\mathrm{MSE}(\hat{b}_1)$ and $\mathrm{MSE}(\hat{b}_T)$ shows that using all of the data is more efficient than using only the correct groups when

$$|\gamma| < \sqrt{\sigma_1^2 + \sigma_2^2}.$$

The right-hand side of this inequality is recognizable as the standard error of the difference between the estimated treatment effects in the correct and incorrect groups. This formula thus provides a simple decision rule: discard the incorrect group if one suspects that the treatment effect in the incorrect group lies more than one standard error away from the treatment effect in the correct group.

Is this decision rule satisfied in the New Haven study? From Imai's analysis of the original New Haven data (Table 5), we learn that $\sqrt{\hat{\sigma}_1^2 + \hat{\sigma}_2^2}$ is 6.6, which means that the bias due to the use of incorrect groups must be at least 6.6 percentage points in order to warrant discarding the observations assigned to multiple treatments. If the true parameter were 5, as Imai contends in his abstract, the ex ante claim that the use of multiple groups leads to a downward bias of more than 6.6 means that phone calls diminish turnout among those who receive other treatments. As it turns out, $\hat{b}_1 < \hat{b}_2$ in the original data,

which is the opposite of what Imai's diminishing-returns argument supposes. Using the corrected data and robust standard errors, we noted earlier that $\hat{b}_1$ is -0.9 and $\hat{b}_T$ is .6, which implies that $\hat{b}_2$ equals . percentage points and the bias ($\hat{\gamma}$) is .. Since $\hat{\sigma}_1$ is 6.0 and $\hat{\sigma}_2$ is 2.6, one would not discard the observations assigned to multiple treatment groups unless the bias were greater than 6.5, and empirically the absolute bias (.) is nowhere near this value. Thus, a mean-squared-error analysis demonstrates that discarding the incorrect groups in the New Haven study cannot be justified on grounds of efficiency.

One could argue that Imai acted in accordance with his own very strong priors about the bias associated with multiple treatments, even if his ideas about efficiency were not well thought out. But a closer look at his views on the subject of bias reveals them to be inconsistent as well. Imai is dead set against factorial designs on pages 19-20, but by page 21 he has changed his mind and applauds his discovery that "sending a postcard three times is much more effective than mailing it once or twice. This provides evidence against the assumption of Gerber and Green (2000) that the effect of mail canvassing is linear in the number of postcards sent." Imai's conviction about diminishing returns has now reversed itself. Leaving aside the fact that this alleged departure from linearity is not statistically significant, the irony of Imai's comment is that he would not have had the opportunity to make this observation had we obeyed his principle of minimizing the number of treatment groups.

In sum, Imai's statistical practice of discarding observations assigned to multiple treatments cannot be justified empirically, and the theoretical justification for doing so rests on mercurial assumptions. Nor do we endorse Imai's stricture (p. 20) that "In principle, it is advisable to minimize the number of treatments in field experiments." This strikes us as shortsighted advice.
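The mean-squared-error comparison derived earlier reduces to a one-line decision rule, which can be written out directly. This is an illustrative sketch under the precision-weighting setup above; the function names and the standard errors plugged in at the end are ours, chosen only to show that the two MSEs coincide exactly at the threshold.

```python
import math

def pooled_mse(gamma, s1, s2):
    """MSE of the precision-weighted pooled estimator, where gamma is the bias
    of the incorrect-group estimator and s1, s2 are the two standard errors."""
    v1, v2 = s1 ** 2, s2 ** 2
    bias = gamma * v1 / (v1 + v2)   # bias of the pooled estimator
    var = v1 * v2 / (v1 + v2)       # variance of the pooled estimator
    return bias ** 2 + var

def retain_all_observations(gamma, s1, s2):
    """Pooling beats discarding iff |gamma| < sqrt(s1^2 + s2^2),
    i.e. iff MSE(pooled) < MSE(correct groups alone) = s1^2."""
    return abs(gamma) < math.sqrt(s1 ** 2 + s2 ** 2)

# Illustrative standard errors; at the threshold the two MSEs are equal.
s1, s2 = 6.0, 2.6
threshold = math.sqrt(s1 ** 2 + s2 ** 2)
print(round(threshold, 1))                      # prints 6.5
print(pooled_mse(threshold, s1, s2), s1 ** 2)   # equal up to rounding error
```

A bias smaller than the threshold argues for keeping the multiply treated observations; only a suspected bias exceeding one standard error of the between-group difference justifies discarding them.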
The fact that so many of the interactions in the New Haven study proved to be inconsequential is a research finding made possible by the

exploration of many factors. To dub the use of factorial design as inefficient fails to appreciate the aims of the study.

References

Box, George E. P., William G. Hunter, and J. Stuart Hunter. 1978. Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. New York: John Wiley & Sons.

Cochran, William G., and Gertrude M. Cox. 1957. Experimental Designs. New York: John Wiley & Sons.

Eldersveld, Samuel J. 1956. "Experimental Propaganda Techniques and Voting Behavior." American Political Science Review 50 (March): 154-65.

Gerber, Alan S., and Donald P. Green. 2000. "The Effects of Personal Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment." American Political Science Review 94 (3): 653-64.

Gerber, Alan S., and Donald P. Green. 2004. "Brief Paid Phone Calls Are Ineffective: Why Experiments Succeed and Matching Fails To Produce Accurate Estimates." Unpublished manuscript, Institution for Social and Policy Studies, Yale University.

Imai, Kosuke. 2003. "Do Get-Out-The-Vote Calls Reduce Turnout? The Importance of Statistical Methods for Field Experiments." American Political Science Review, final manuscript submission, August 9th.

Mead, R. 1988. The Design of Experiments: Statistical Principles for Practical Application. New York: Cambridge University Press.

Miller, Roy E., David A. Bositis, and Denise L. Baer. 1981. "Stimulating Voter Turnout in a Primary: Field Experiment with a Precinct Committeeman." International Political Science Review 2 (4): 445-60.