Imputation classes as a framework for inferences from non-random samples. 1

Similar documents
Missing Data and Imputation

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

Effects of propensity score overlap on the estimates of treatment effects. Yating Zheng & Laura Stapleton

By: Mei-Jie Zhang, Ph.D.

ST440/550: Applied Bayesian Statistics. (10) Frequentist Properties of Bayesian Methods

Discussion. Ralf T. Münnich Variance Estimation in the Presence of Nonresponse

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Propensity scores: what, why and why not?

Nonresponse Rates and Nonresponse Bias In Household Surveys

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Subject index. bootstrap...94 National Maternal and Infant Health Study (NMIHS) example

Appendix 1. Sensitivity analysis for ACQ: missing value analysis by multiple imputation

A Review of Hot Deck Imputation for Survey Non-response

Imputation approaches for potential outcomes in causal inference

Statistical data preparation: management of missing values and outliers

MS&E 226: Small Data

Causal Validity Considerations for Including High Quality Non-Experimental Evidence in Systematic Reviews

Sequential nonparametric regression multiple imputations. Irina Bondarenko and Trivellore Raghunathan

Selected Topics in Biostatistics Seminar Series. Missing Data. Sponsored by: Center For Clinical Investigation and Cleveland CTSC

Methods for Computing Missing Item Response in Psychometric Scale Construction

Propensity score methods to adjust for confounding in assessing treatment effects: bias and precision

Introduction to Observational Studies. Jane Pinelis

heterogeneity in meta-analysis stats methodologists meeting 3 September 2015

A Potential Outcomes View of Value-Added Assessment in Education

Comparing Alternatives for Estimation from Nonprobability Samples

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

UN Handbook Ch. 7 'Managing sources of non-sampling error': recommendations on response rates

Data Analysis Using Regression and Multilevel/Hierarchical Models

On Missing Data and Genotyping Errors in Association Studies

Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy

Carrying out an Empirical Project

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

Practical propensity score matching: a reply to Smith and Todd

The Impact of Confounder Selection in Propensity. Scores for Rare Events Data - with Applications to. Birth Defects

An Introduction to Bayesian Statistics

Advanced Bayesian Models for the Social Sciences

Examining Relationships Least-squares regression. Sections 2.3

arxiv: v2 [stat.ap] 7 Dec 2016

Advanced Bayesian Models for the Social Sciences. TA: Elizabeth Menninga (University of North Carolina, Chapel Hill)

Module 14: Missing Data Concepts

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling, Sensitivity Analysis, and Causal Inference

Instrumental Variables Estimation: An Introduction

Bayesian methods for combining multiple Individual and Aggregate data Sources in observational studies

County-Level Small Area Estimation using the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS)

Stephen G. West and Felix Thoemmes

Miguel Angel Luque-Fernandez, Aurélien Belot, Linda Valeri, Giovanni Ceruli, Camille Maringe, and Bernard Rachet

Causal Methods for Observational Data Amanda Stevenson, University of Texas at Austin Population Research Center, Austin, TX

Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values

Comparison of External and Internal Validity Bias in Experimental and Quasi- Experimental Approaches Mark White, Ben Hansen, Tim Lycurgus, Brian Rowan

SESUG Paper SD

Donna L. Coffman Joint Prevention Methodology Seminar

Rise of the Machines

Kelvin Chan Feb 10, 2015

S Imputation of Categorical Missing Data: A comparison of Multivariate Normal and. Multinomial Methods. Holmes Finch.

Missing by Design: Planned Missing-Data Designs in Social Science

Propensity Score Analysis Shenyang Guo, Ph.D.

AnExaminationoftheQualityand UtilityofInterviewerEstimatesof HouseholdCharacteristicsinthe NationalSurveyofFamilyGrowth. BradyWest

The Late Pretest Problem in Randomized Control Trials of Education Interventions

Propensity Score Methods with Multilevel Data. March 19, 2014

Generalization and Theory-Building in Software Engineering Research

Missing data in medical research is

How should the propensity score be estimated when some confounders are partially observed?

Introduction to Applied Research in Economics

Multiple imputation for handling missing outcome data when estimating the relative risk

Propensity Score Matching with Limited Overlap. Abstract

Strategies for handling missing data in randomised trials

Modern Regression Methods

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Methods for Addressing Selection Bias in Observational Studies

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012

Weight Adjustment Methods using Multilevel Propensity Models and Random Forests

The essential focus of an experiment is to show that variance can be produced in a DV by manipulation of an IV.

Help! Statistics! Missing data. An introduction

Regression Discontinuity Designs: An Approach to Causal Inference Using Observational Data

JSM Survey Research Methods Section

Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

Quantitative Methods. Lonnie Berger. Research Training Policy Practice

Combining machine learning and matching techniques to improve causal inference in program evaluation

(b) empirical power. IV: blinded IV: unblinded Regr: blinded Regr: unblinded α. empirical power

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

Section on Survey Research Methods JSM 2009

Methods for treating bias in ISTAT mixed mode social surveys

Is Random Sampling Necessary? Dan Hedlin Department of Statistics, Stockholm University

Using machine learning to assess covariate balance in matching studies

Linear Regression Analysis

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Assessing the efficacy of mobile phone interventions using randomised controlled trials: issues and their solutions. Dr Emma Beard

SCHIATTINO ET AL. Biol Res 38, 2005, Disciplinary Program of Immunology, ICBM, Faculty of Medicine, University of Chile. 3

EMPIRICAL STRATEGIES IN LABOUR ECONOMICS

Propensity Score Methods to Adjust for Bias in Observational Data SAS HEALTH USERS GROUP APRIL 6, 2018

Dichotomizing partial compliance and increased participant burden in factorial designs: the performance of four noncompliance methods

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods

Randomized Controlled Trials Shortcomings & Alternatives icbt Leiden Twitter: CAAL 1

Dr. Allen Back. Sep. 30, 2016

On Test Scores (Part 2) How to Properly Use Test Scores in Secondary Analyses. Structural Equation Modeling Lecture #12 April 29, 2015

Using historical data for Bayesian sample size determination

TESTING FOR COVARIATE BALANCE IN COMPARATIVE STUDIES

Transcription:

Imputation classes as a framework for inferences from non-random samples. 1 Vladislav Beresovsky (hvy4@cdc.gov) National Center for Health Statistics, CDC 1 Disclaimer: The findings and conclusions in this presentation are those of the author(s) and do not necessarily represent the views of the Centers for Disease Control and Prevention. 1 / 7

Proliferation of web surveys and the future of NHIS Is it possible to make reliable estimates from web surveys, planned as extensions to the regular NHIS? Current problems with conventional randomized surveys: Increasingly difficult to define frame with good coverage; Growing unit non-response; Expensive data colection. Advantages of web surveys: Growing number of vendors offering relatively low cost web panels; Quick turnaround of sampling and data collection; Downside of inferences with web survey data: Possibly biased estimates; Unclear how to estimate variances; 2 / 7

Data description, inferential goals and complications Two samples based on NHIS questionnaire collected at the same time (last quarter of 2014): Nonrandom web survey sample s W. Core questions X C, detail questions Y D, unknown weights. NHIS random reference sample s R. Core questions X C, no Y D, known NHIS sampling weights. The challenge: How to make inferences for Y D using X C as covariates utilizing data from s W and s R? Similar problem as inferences with missing data and case-control studies. It requires model-based solution. Which model to use: predictive model for Y D or propensity model of responding to nonrandom web survey? How to account for modeling variability in variance estimation? How to provide for any kind of data analysis with Y D, rather than just estimating totals? 3 / 7

Imputation classes framework for inferences from nonrandom samples Proposed imputation classes framework unifies predictive and propensity models. Create imputation classes as areas of homogeneous prediction by both propensity and predictive models. Similar idea was advocated for missing data in Haziza & Beaumont Int. Stat. Rev. (2007) paper; Use hot-deck to impute Y D from s W to s R within these classes; Make all kind of estimates from s R with imputed Y D and known sampling weights w r ; Calculate variances using delete-a-group modification of jackknife with imputed data, described in Rao & Shao Biometrika (1992) paper; 4 / 7

Results of simulation studies Very basic simulation study demonstrated: Estimates based on imputation classes are double robust- they are unbiased if just one of the models is correct; Valid inferences for population mean and median using delete-a-group version of Rao & Shao jackknife; Simulation study motivated by Kang & Schafer Statistical Science (2007) paper Demystifying Double Robustness... : When both predictive and propensity models are incorrect, estimates based on imputation classes defined by both models are more robust against model misspecification than estimates based on any single model; Estimates by hotdeck within imputation classes are more robust than deterministic imputation; Machine learning have advantage over model-based methods only for large samples or stronger core covariates X C. 5 / 7

Awesome power of response and propensity models ) BR (Ŷmod = Ŷdir Ŷmod 100% Ŷ dir Y pop Figure: pred LM.y PSA LM.p imp5 LM.yp imp10 LM.yp It s impossible, probably you are doing something wrong... Julie Gershunskaya, BLS statistician 6 / 7

It s all about robustness against model misspecification...... Therefore Double robust (DR) methods are better than just propensity (PR) or outcome regression (OR) models; Robust estimation methods (m-estimation, splines, LASSO, Least Angle Regression, Bayesian spike and slabs, model cross-validation) are preferable at every stage; Hotdeck imputation or pair matching techniques are better than deterministic prediction because they may capture information from model residuals (Rubin (1973,1979); Dehejia and Wahba (1999)). Biostatisticians have developed great experience working with nonrandom samples in the context of causal inference. There is much to learn from them and to figure out its application to survey samples. 7 / 7