Response to Comment on Cognitive Science in the field: Does exercising core mathematical concepts improve school readiness?

Similar documents
Bayesian Inference Bayes Laplace

A Brief Introduction to Bayesian Statistics

Advanced Bayesian Models for the Social Sciences. TA: Elizabeth Menninga (University of North Carolina, Chapel Hill)

Advanced Bayesian Models for the Social Sciences

AB - Bayesian Analysis

Combining Risks from Several Tumors Using Markov Chain Monte Carlo

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm

BAYESIAN HYPOTHESIS TESTING WITH SPSS AMOS

Estimating Bayes Factors for Linear Models with Random Slopes. on Continuous Predictors

Estimation of contraceptive prevalence and unmet need for family planning in Africa and worldwide,

Bayesian and Frequentist Approaches

Ordinal Data Modeling

Bayesian and Classical Approaches to Inference and Model Averaging

Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions

BayesRandomForest: An R

Multivariate Regression with Small Samples: A Comparison of Estimation Methods W. Holmes Finch Maria E. Hernández Finch Ball State University

Bayesian Estimation of a Meta-analysis model using Gibbs sampler

Georgetown University ECON-616, Fall Macroeconometrics. URL: Office Hours: by appointment

Introduction to Survival Analysis Procedures (Chapter)

Rejoinder to discussion of Philosophy and the practice of Bayesian statistics

A Bayes Factor for Replications of ANOVA Results

Econometrics II - Time Series Analysis

Language: English Course level: Doctoral level

How do we combine two treatment arm trials with multiple arms trials in IPD metaanalysis? An Illustration with College Drinking Interventions

Ecological Statistics

Applications with Bayesian Approach

BAYESIAN ESTIMATORS OF THE LOCATION PARAMETER OF THE NORMAL DISTRIBUTION WITH UNKNOWN VARIANCE

You must answer question 1.

Introduction to Bayesian Analysis 1

In addition, a working knowledge of Matlab programming language is required to perform well in the course.

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

THE INDIRECT EFFECT IN MULTIPLE MEDIATORS MODEL BY STRUCTURAL EQUATION MODELING ABSTRACT

Bayes Factors for t tests and one way Analysis of Variance; in R

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

An Exercise in Bayesian Econometric Analysis Probit and Linear Probability Models

Bayesian Tolerance Intervals for Sparse Data Margin Assessment

Bayesian Analysis of Between-Group Differences in Variance Components in Hierarchical Generalized Linear Models

STATISTICAL INFERENCE 1 Richard A. Johnson Professor Emeritus Department of Statistics University of Wisconsin

Bayesian Variable Selection Tutorial

The matching effect of intra-class correlation (ICC) on the estimation of contextual effect: A Bayesian approach of multilevel modeling

Regression Discontinuity Designs: An Approach to Causal Inference Using Observational Data

BOOTSTRAPPING CONFIDENCE LEVELS FOR HYPOTHESES ABOUT QUADRATIC (U-SHAPED) REGRESSION MODELS

Bayesian Methodology to Estimate and Update SPF Parameters under Limited Data Conditions: A Sensitivity Analysis

Commentary: Practical Advantages of Bayesian Analysis of Epidemiologic Data

MS&E 226: Small Data

Kelvin Chan Feb 10, 2015

Methods and Criteria for Model Selection

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison

Using Bayes Factors to Get the Most out of Linear Regression: A Practical Guide Using R

Mediation Analysis With Principal Stratification

Bayesian Mediation Analysis

Probabilistic Modeling to Support and Facilitate Decision Making in Early Drug Development

Data Analysis Using Regression and Multilevel/Hierarchical Models

Using Test Databases to Evaluate Record Linkage Models and Train Linkage Practitioners

The Century of Bayes

Individual Differences in Attention During Category Learning

Why Hypothesis Tests Are Essential for Psychological Science: A Comment on Cumming. University of Groningen. University of Missouri

Bayesian Variable Selection Tutorial

ST440/550: Applied Bayesian Statistics. (10) Frequentist Properties of Bayesian Methods

Method Comparison for Interrater Reliability of an Image Processing Technique in Epilepsy Subjects

A COMPARISON OF BAYESIAN MCMC AND MARGINAL MAXIMUM LIKELIHOOD METHODS IN ESTIMATING THE ITEM PARAMETERS FOR THE 2PL IRT MODEL

Supplementary Materials Order Matters: Alphabetizing In-Text Citations Biases Citation Rates Jeffrey R. Stevens and Juan F. Duque

Undesirable Optimality Results in Multiple Testing? Charles Lewis Dorothy T. Thayer

arxiv: v2 [stat.ap] 7 Dec 2016

Some Examples of Using Bayesian Statistics in Modeling Human Cognition

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA

Comparison of Meta-Analytic Results of Indirect, Direct, and Combined Comparisons of Drugs for Chronic Insomnia in Adults: A Case Study

Response to the ASA s statement on p-values: context, process, and purpose

A Hierarchical Linear Modeling Approach for Detecting Cheating and Aberrance. William Skorupski. University of Kansas. Karla Egan.

A Tutorial on a Practical Bayesian Alternative to Null-Hypothesis Significance Testing

Bayesian Inference in Public Administration Research: Substantive Differences from Somewhat Different Assumptions

Learning from data when all models are wrong

Statistical Evidence in Experimental Psychology: An Empirical Comparison Using 855 t Tests

Small Sample Bayesian Factor Analysis. PhUSE 2014 Paper SP03 Dirk Heerwegh

Introduction. Patrick Breheny. January 10. The meaning of probability The Bayesian approach Preview of MCMC methods

Analyzing data from educational surveys: a comparison of HLM and Multilevel IRT. Amin Mousavi

2014 Modern Modeling Methods (M 3 ) Conference, UCONN

A Bayesian alternative to null hypothesis significance testing

Bayesians methods in system identification: equivalences, differences, and misunderstandings

Application of Multinomial-Dirichlet Conjugate in MCMC Estimation : A Breast Cancer Study

CISC453 Winter Probabilistic Reasoning Part B: AIMA3e Ch

Bayesian and Classical Hypothesis Testing: Practical Differences for a Controversial Area of Research

Detecting and Modeling Spatial Disease Clustering. A Bayesian Approach

Impact Evaluation. Session 2, June 22 nd, 2010 Microsoft Research India Summer School on Computing for Socioeconomic Development

Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data

Running head: VARIABILITY AS A PREDICTOR 1. Variability as a Predictor: A Bayesian Variability Model for Small Samples and Few.

Bayesian Modelling on Incidence of Pregnancy among HIV/AIDS Patient Women at Adare Hospital, Hawassa, Ethiopia

Bayesian mixture modeling for blood sugar levels of diabetes mellitus patients (case study in RSUD Saiful Anwar Malang Indonesia)

Bayesian Estimations from the Two-Parameter Bathtub- Shaped Lifetime Distribution Based on Record Values

Bayesian Model Averaging for Propensity Score Analysis

Jake Bowers Wednesdays, 2-4pm 6648 Haven Hall ( ) CPS Phone is

The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective

Running Head: BAYESIAN MEDIATION WITH MISSING DATA 1. A Bayesian Approach for Estimating Mediation Effects with Missing Data. Craig K.

For general queries, contact

Bivariate lifetime geometric distribution in presence of cure fractions

Model-Based fmri Analysis. Will Alexander Dept. of Experimental Psychology Ghent University

What is probability. A way of quantifying uncertainty. Mathematical theory originally developed to model outcomes in games of chance.

County-Level Small Area Estimation using the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS)

Salmonella outbreaks: Assessing causes and trends

An Ideal Observer Model of Visual Short-Term Memory Predicts Human Capacity Precision Tradeoffs

Transcription:

Response to Comment on Cognitive Science in the field: Does exercising core mathematical concepts improve school readiness? Authors: Moira R. Dillon 1 *, Rachael Meager 2, Joshua T. Dean 3, Harini Kannan 5, Elizabeth S. Spelke 4 *, Esther Duflo 3,5 * Affiliations: 1 Department of Psychology, New York University, New York, NY USA 2 Department of Economics, London School of Economics and Political Science, London UK 3 Department of Economics, Massachusetts Institute of Technology, Cambridge, MA USA 4 Department of Psychology, Harvard University, Cambridge, MA USA 5 Abdul Latif Jameel Poverty Action Lab *Correspondence to: moira.dillon@nyu.edu, spelke@wjh.harvard.edu, or eduflo@mit.edu We are pleased that the data we published in Science (1) is being used for complementary analyses. Millroth (2) does not question the validity of our results as presented, but analyzes our data using a different methodology, leading to conclusions that appear to be less strong than the conclusions supported by our own preregistered analyses, which use the classical statistics framework. While we are sympathetic to any further research done with our data, we think there are some issues with the analyses presented in this comment. The method used by Millroth is known in statistics to be very sensitive to the choice of algorithms and priors, and it is impossible to judge from the information provided whether the reported results are robust. We suspect that they are not because our own new analyses using more robust Bayesian methods lead to results that are broadly consistent with the analyses that we presented in our paper. We first describe the problems with Millroth s analyses and then present our own.

Millroth relies on one particular tool, Bayes Factors, which is highly controversial in the Bayesian statistics community. Bayes Factors do not have a straightforward interpretation, and they are not recommended for testing purposes in modern Bayesian statistics because they use marginal likelihoods, the value of which can be highly sensitive even to diffuse priors. A key takeaway from Kass and Raftery s seminal work (3) on Bayes Factors was, It is important, and feasible, to assess the sensitivity of conclusions to the prior distributions used. Problematic sensitivity to small changes in the priors is even more likely when point nulls are tested, as they are here, because typical prior choices such as spike and slab distributions or g-priors have undesirable properties (see, e.g., 4). Moreover, Bayes Factors often do not allow stable computation because marginalizing the likelihood numerically is challenging even with Markov chain Monte Carlo methods. These problems are much more severe than the challenges of computing raw posterior probabilities the typical output of Bayesian analysis because Bayes Factors, unlike posterior probabilities, require accurate calculation of the posterior normalizing constant. Millroth does not discuss the computational method that he used, nor does he demonstrate its reliability via citations to the computational literature or to simulation studies (e.g., 5-7). His comment never discusses any of the well-established problems with these tools, nor does it provide prior sensitivity analysis, which is the recommended practice when using Bayes Factors. The computational properties of the algorithms that he used are not discussed, despite the fact that the computational stability of these algorithms is often unclear. The documentation of the package that is used (8) suggests that the package uses the g-priors which, have several undesirable consistency issues (4).

Given these issues, the tool that is recommended in Bayesian statistics by most experts, and by standard textbooks on Bayesian statistics for psychology and economic analyses (see, e.g., 9, 10), is to compute the 95% credibility interval, which is the closest concept in Bayesian statistics to the 95% confidence interval in frequentist statistics. To test for the robustness of our results to a Bayesian framework, therefore, we fit a linear Bayesian model equivalent to our second regression specification for each endline using a Metropolis-Hastings algorithm, and we computed 95% credibility intervals. We used a normal likelihood combined with weakly informative priors (the prior specified that the coefficients were independently normally distributed each with a mean of zero and a variance of 10,000 and that the variance of the error terms followed an inverse gamma distribution with the shape and scale parameters both equal to 0.01.) The results of this analysis are consistent with those reported in our paper. Fig. 1 shows the means of the posterior distribution for the treatment coefficients for each outcome. For comparison, Fig. 2 shows the original frequentist coefficients and the 95% confidence intervals that we reported. The results lead to the same conclusions. In particular, as in the frequentist analysis reported in the paper, zero is outside of the 95% credibility interval in the Bayesian analysis for the mean at all three endlines for our prespecified all math and all nonsymbolic composites. Moreover, zero is outside of the 95% credibility interval for the symbolic composite at our first endline, but not at endlines two and three, again in accord with our published findings. When comparing the math games to the social, active control games, we observe that the credibility intervals of the two treatments do not overlap at the first or second endlines, either for the composite measure of all math assessments or for the composite measure of the nonsymbolic math assessments. At the third endline, they do overlap substantially on the

composite measure of all math assessments but only slightly on the composite measure of the nonsymbolic math assessments. These analyses therefore confirm both the positive and the negative effects that we originally reported. In summary, we welcome further analyses of our work, if the analyses are well-founded. We agree with Millroth that Bayesian statistics are a valuable tool for field experiments like our own. When such analyses focus on posterior probabilities rather than Bayes Factors, in accord with the recommendations of the leading statistics experts recognized in economics and psychology, they confirm rather than undermine our original conclusions.

Main Outome Measures EL1 Math Treatment EL2 Math Treatment EL3 Math Treatment EL1 Social Treatment EL2 Social Treatment EL3 Social Treatment 0.403 0.435 0.324 0.287 0.242 0.162 0.119 0.093 0.074 0.016 0.158 0.021 0.077 0.136 0.036 0.047 0.026 0.045 0.111 0.026 0.015 0.171 0.145 0.000 Z-Score of Math Composite Z-Score of Non-Symbolic Composite Z-Score of Symbolic Composite Z-Score of Social Composite Fig. 1. Means of the posterior distribution of the coefficients and the 95% credibility intervals from a linear Bayesian model of the main outcome measures. Main Outome Measures EL1 Math Treatment EL2 Math Treatment EL3 Math Treatment EL1 Social Treatment EL2 Social Treatment EL3 Social Treatment 0.415 0.450 0.249 0.322 0.287 0.138 0.121 0.094 0.059 0.012 0.164 0.023 0.075 0.132 0.122 0.040 0.044 0.050 0.027 0.025 0.005 0.012 0.165 0.139 Z-Score of Math Composite Z-Score of Non-Symbolic Composite Z-Score of Symbolic Composite Z-Score of Social Composite Fig. 2. Linear regression coefficients and the 95% confidence intervals reported in the original paper (1) for the main outcome measures.

References 1. M. R. Dillon, H. Kannan, J. T. Dean, E. S. Spelke, & E Duflo, Cognitive science in the field: A preschool intervention durably enhances intuitive but not formal mathematics. Science 357, 47-55 (2017). 2. P. Millroth, Descriptive statistics and Bayesian hypothesis testing show that the intervention enhances only geometric sensitivity: Comment on Dillon et al. (E-Letter, 2017); http://science.sciencemag.org/content/357/6346/47/tab-e-letters. 3. R. E. Kass, A. E. Raftery, Bayes Factors. JASA 90, 773-795 (1995). 4. F. Liang, R. Paulo, G. Molina, M. A. Clyde, J. O. Berger, Mixtures of g priors for Bayesian variable selection. JASA 103, 410-423 (2008). 5. T. J. DiCiccio, R. E. Kass, A. Raftery, L. Wasserman, Computing Bayes factors by combining simulation and asymptotic approximations. JASA 92, 903-915 (1997). 6. C. Han, B. P. Carlin, Markov chain Monte Carlo methods for computing Bayes factors: A comparative review. JASA 9, 1122-1132 (2001). 7. W. Xie, P. O. Lewis, Y. Fan, L. Kuo, M. H. Chen, Improving marginal likelihood estimation for Bayesian phylogenetic model selection. System. Biol. 60, 150-160 (2010). 8. R. D. Morey, J. N. Rouder, Package BayesFactor (2015); https://cran.r-project.org/web/packages/bayesfactor/bayesfactor.pdf 9. A. Gelman, D. B. Rubin, Avoiding model selection in Bayesian social research. Sociol. Methodol. 25, 165-173 (1995). 10. J. Kruschke, Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. (Academic Press, 2014).