Selected Topics in Biostatistics Seminar Series. Missing Data. Sponsored by: Center For Clinical Investigation and Cleveland CTSC

1 Selected Topics in Biostatistics Seminar Series Missing Data Sponsored by: Center For Clinical Investigation and Cleveland CTSC Brian Schmotzer, MS Biostatistician, CCI Statistical Sciences Core June 23, 2010

2 Outline. Missing data: What is it, and what does it look like? How did we get in this mess? What are the consequences? Goals for analyzing data in the presence of missingness. Missing data assumptions: What types of missing data are there? Traditional approaches: What have people typically done in the past? What are the consequences of these approaches? Newer approach: What is the state of the art now? How is it better than traditional approaches?

3 Missing Data Warnings. Missing data is the single most pervasive analytical problem in research studies. Most medical research papers do not refer to an adequate analysis approach for dealing with missing data. Are authors unaware or untrained? Are journals and/or reviewers not savvy?

4 What is Missing Data? Any value for any variable that you do not have. It can arise due to: subjects lost to follow-up; missed or skipped visits; instrument errors or failures; misplaced data extraction sheets; "we just didn't collect that value"; etc.

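5 Example: coronary artery bypass grafting. Table columns: ID, Age, # Diseased Vessels, Previous Surgery, Pump Type, Mortality Status. Seven rows, with Previous Surgery / Pump Type / Mortality Status reading: No / Off / Alive; No / Off / Alive; Yes / On / Dead; No / Off / Alive; No / On / Alive; No / Off / Alive; No / On / Alive.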

6 Example: Some Missing Data. The same table (ID, Age, # Diseased Vessels, Previous Surgery, Pump Type, Mortality Status) with some cells blank.

7 Example: More Missing Data. The same table with more cells blank.

8 Consequences of Missing Data. The default for software packages is to throw out observations with any missing data (complete case, or completers, analysis). Best case: reduced sample size, with loss of power and poorer estimates of the parameters of interest. Worst case: no sample size at all.
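As a minimal sketch of what that software default does, the snippet below applies complete case (listwise) deletion to a handful of records. The records, field names, and the `complete_cases` helper are hypothetical and are not the values from the slides.

```python
# Hypothetical bypass-surgery records; None marks a missing value.
records = [
    {"id": 1, "age": 65, "vessels": None, "prev_surgery": "No", "pump": "Off", "status": "Alive"},
    {"id": 2, "age": 71, "vessels": 3, "prev_surgery": "No", "pump": "Off", "status": "Alive"},
    {"id": 3, "age": None, "vessels": 4, "prev_surgery": "Yes", "pump": "On", "status": "Dead"},
    {"id": 4, "age": 62, "vessels": 2, "prev_surgery": "No", "pump": None, "status": "Alive"},
    {"id": 5, "age": 58, "vessels": 3, "prev_surgery": "No", "pump": "On", "status": "Alive"},
]


def complete_cases(rows):
    """Listwise deletion: keep only rows with no missing value in any field."""
    return [row for row in rows if all(value is not None for value in row.values())]


kept = complete_cases(records)
print(len(records), "records,", len(kept), "complete cases")
```

Three scattered missing cells cost three of five subjects here. If every record had at least one missing field, nothing would be left to analyze.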

9 More Subtle Consequence. Bias: a systematic distortion of an estimate away from its true value. Selection bias: bias due to systematic differences between the subjects in the sample and the target population.

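10 Populations. Figure: distributions of Male Weight and Female Weight, each with its mean.

11 Full samples. Figure: Male Weight and Female Weight samples, each with its mean.

12 Missing values. Figure: Male Weight and Female Weight samples with missing values marked.

13 Available samples. Figure: the remaining Male Weight and Female Weight values, each with its mean.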

14 Analysis Goals. Maintain the relationships among the variables so that we may: minimize any bias; maximize the utilization of available information; get good estimates of uncertainty.

15 NOT the Goals. To try to impute values that are close to, plausible replacements for, representative of, or that might mirror the real, unknown, missing data values. We are not here to recreate the truth.

16 Missing Data Assumptions. Missing Completely At Random (MCAR); Missing At Random (MAR); Not Missing At Random (NMAR), also called Non-Ignorable Missingness (NIM).

17 MCAR. Y is a variable with some values missing. Assume MCAR if: the probability that Y is missing is unrelated to the value of Y, and the probability that Y is missing is unrelated to the set of other observed X variables. P(Y is missing | X, Y) = P(Y is missing).

18 MCAR Example. In a laboratory experiment, a test tube is dropped and the cholesterol level that would have been measured from the blood sample is lost. The probability that this data would be lost does not depend on the cholesterol level of the blood in the test tube, nor on the age, gender, race, etc. of the subject whose blood it is.

19 MCAR Consequences. MCAR is the strongest assumption. In real world situations, MCAR is rare, and it is difficult to convince the world of MCAR. If MCAR, then complete case analysis is unbiased: it is essentially analyzing a random sub-sample of the original data sample.
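The claim that complete case analysis is unbiased under MCAR can be checked with a small simulation. This is only a sketch: the cholesterol distribution, the 30% loss rate, and the `mcar_demo` helper are illustrative assumptions, not part of the seminar.

```python
import random
from statistics import mean


def mcar_demo(n=20000, p_missing=0.3, seed=0):
    """Simulate dropped test tubes: every value is lost with the same probability."""
    rng = random.Random(seed)
    cholesterol = [rng.gauss(200.0, 30.0) for _ in range(n)]
    # MCAR: the chance of loss ignores both the value itself and any covariate.
    observed = [c for c in cholesterol if rng.random() >= p_missing]
    return mean(cholesterol), mean(observed), len(observed)


full_mean, cc_mean, n_obs = mcar_demo()
print(round(full_mean, 2), round(cc_mean, 2), n_obs)
```

The complete case mean should land very close to the full-sample mean. What is lost is sample size, and therefore precision, not validity.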

20 MAR. Y is a variable with some values missing. Assume MAR if: the probability that Y is missing is unrelated to the value of Y after controlling for other observed variables X. P(Y is missing | X, Y) = P(Y is missing | X).

21 MAR Example. In a survey, the probability of missing income depends on marital status, but within each marital status, the probability of missing income does not depend on income.

22 MAR. Figure panels: Individual Income (Single); Individual Income (Married).

23 MAR Example. One can test whether missingness of income depends on marital status with a chi-square test on the 2x2 table of marital status (Single, Married) by income status (Missing Income, Not Missing Income). This evidence refutes MCAR, but does not prove MAR.

24 MAR Consequences. MAR is a weaker assumption than MCAR, and it is easier to convince the world that data is MAR. Complete case analysis is likely to be biased if MAR. Tractable solutions exist for analyzing data under the MAR assumption.
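A small simulation of the income survey illustrates both points: the complete case mean is biased under MAR, yet conditioning on the observed variable recovers it. All numbers here (income levels in thousands, the missingness rates, and the direction of the dependence) are assumptions made for illustration.

```python
import random
from statistics import mean


def mar_demo(n=20000, seed=1):
    """Income is missing more often for married respondents (an assumed direction)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        married = rng.random() < 0.5
        income = rng.gauss(60.0 if married else 40.0, 10.0)  # in thousands
        # MAR: missingness depends only on the fully observed marital status.
        p_missing = 0.6 if married else 0.1
        observed = rng.random() >= p_missing
        rows.append((married, income, observed))

    full_mean = mean(inc for _, inc, _ in rows)
    cc_mean = mean(inc for _, inc, obs in rows if obs)

    # Stratum-weighted estimate: observed mean within each marital status,
    # weighted by that status's share of the whole sample.
    adjusted = 0.0
    for status in (False, True):
        share = sum(1 for m, _, _ in rows if m == status) / n
        stratum_mean = mean(inc for m, inc, obs in rows if m == status and obs)
        adjusted += share * stratum_mean
    return full_mean, cc_mean, adjusted


full_mean, cc_mean, adjusted = mar_demo()
print(round(full_mean, 1), round(cc_mean, 1), round(adjusted, 1))
```

Under these assumptions the complete case mean underestimates income because married respondents are under-represented among the observed values. The stratum-weighted estimate is one simple instance of a tractable MAR analysis.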

25 NMAR. Y is a variable with some values missing. Assume NMAR if: the probability that Y is missing is related to the value of Y even after controlling for other observed variables X. P(Y is missing | X, Y) cannot be simplified.

26 NMAR Example. In a study of body self image, it is found that women and men are equally likely to not self-report their weight, but it is suspected that heavier women are even more likely to not report their weight.

27 NMAR. Figure panels: Male Weight; Female Weight.

28 NMAR Example. One can test whether missingness of weight depends on gender (chi-square test):

          Missing Weight   Not Missing Weight
Male             9                 21
Female           9                 21

This evidence fails to refute MCAR, but the data could still be NMAR.
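The chi-square test on the weight-by-gender counts can be worked by hand. The sketch below computes the Pearson statistic without a continuity correction and uses the closed-form tail probability for one degree of freedom. The helper name `chi_square_2x2` is made up for this example.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: Male, Female; columns: Missing Weight, Not Missing Weight.
chi2, p_value = chi_square_2x2([[9, 21], [9, 21]])
print(chi2, p_value)  # prints "0.0 1.0"
```

Identical rows give a statistic of exactly zero, so the test finds no dependence on gender. It says nothing about whether missingness depends on the unobserved weights themselves, which is why NMAR cannot be ruled out this way.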

29 NMAR Consequences. NMAR is impossible to prove (it relies on unknown data values), but easy to suspect. No good, canned solutions exist for analyzing data under NMAR. It is an open area of research, with some success in specific situations. It requires strong, situation-specific assumptions about how the data is missing.

30 Assumptions Summary. The most important missing data assumptions are untestable. You will almost never have real data that is MCAR. MAR is a common assumption to make: it leads to tractable analysis solutions and can usually be defended to the world. Note: the defense is logical and based on subject knowledge rather than statistical in nature.

31 Analysis Approaches. Traditional: listwise deletion (complete case analysis); replacement with means; dummy variable adjustment; replacement with conditional means; hot deck imputation; last observation carried forward (longitudinal). Modern: multiple imputation (MI).

32 Listwise Deletion. Delete any case with missing data. Strengths: easy to implement (the default for most software); works for all types of analyses; unbiased if MCAR, because the retained data is a simple random sample of the original data; standard error estimates are usually conservative.

33 Listwise Deletion. Weaknesses: likely to introduce bias if MAR instead of MCAR; loss of power due to deleting observations; doesn't utilize all the information that is available.

34 Replacement with Means. Replace all missing values of variable X with the sample mean of X from the available cases. Example (table of BMI values): sample mean BMI = 31.5.

35 Replacement with Means. Strengths: easy to implement; comforting use of statistics. Weaknesses: inclusion of many repeated constant values at the mean guarantees a crippling bias towards a too-low estimate of variability; the variable is now useless for any future analysis you may have planned for it; in general, a biased approach under MAR.
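The downward bias in variability is easy to see on a toy example. The BMI values below are hypothetical, chosen only so that the observed mean equals the 31.5 shown on the slide.

```python
from statistics import mean, stdev

# Hypothetical BMI values; None marks a missing value.
bmi = [24.0, None, 31.0, 36.5, None, 28.0, None, 38.0]

observed = [v for v in bmi if v is not None]
sample_mean = mean(observed)

# Mean replacement: every missing value becomes the same constant.
filled = [sample_mean if v is None else v for v in bmi]

print(sample_mean, round(stdev(observed), 2), round(stdev(filled), 2))
```

The filled-in values add nothing to the sum of squared deviations but do add to the sample size, so the standard deviation can only shrink.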

36 Dummy Variable Adjustment. In a regression predicting Y, suppose there are missing values of predictor X. Create a new variable D, with D = 1 if X is missing and D = 0 if X is present. When X is missing, set X = c, where c is some constant (usually the sample mean of X). Regress Y on both X and D.

37 Dummy Variable Adjustment. Tables: Serum Vitamin D and BMI, before and after adding the indicator D. Model: VitD = b0 + b1 BMI + b2 D.
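The data preparation step for dummy variable adjustment can be sketched as follows. The BMI values and the `dummy_variable_adjust` helper are hypothetical, and the regression itself is left to whatever fitting routine is in use.

```python
from statistics import mean


def dummy_variable_adjust(x):
    """Return (x_filled, d): missing x set to the observed mean, d flags missingness."""
    c = mean(v for v in x if v is not None)   # the constant c from the slide
    d = [1 if v is None else 0 for v in x]    # D = 1 if X is missing, else 0
    x_filled = [c if v is None else v for v in x]
    return x_filled, d


# Hypothetical BMI values; None marks a missing value.
bmi = [24.0, None, 31.0, 36.5, None, 28.0, None, 38.0]
bmi_filled, d = dummy_variable_adjust(bmi)
print(bmi_filled)
print(d)
```

Both returned columns would then enter the regression as predictors, so that b2 absorbs the average difference between cases with and without an observed BMI.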

38 Dummy Variable Adjustment. Strengths: adjusts for using the mean as the imputation value; may be OK for "not applicable" (skip pattern) types of missing data (Allison, 1999). Weaknesses: still biased under MAR; produces biased coefficient estimates (Jones, JASA, 1996).

39 Replacement with Conditional Means. Replace missing values with predictions from an estimated regression equation. Tables: Serum Vitamin D and BMI. Imputation model: BMI = a0 + a1 VitD.

40 Replacement with Conditional Means. Use the full dataset to estimate the regression model of interest: VitD = b0 + b1 BMI.

41 Figure: scatter plot of Serum Vitamin D against BMI. Sample size 100, missingness 30%.

42 Figure: the same scatter plot annotated with the complete data correlation and the imputed data correlation. Sample size 100, missingness 30%.

43 Replacement with Conditional Means. Strengths: better than replacement with means; can utilize auxiliary information from other covariates. Weaknesses: ruins the relationships among the variables; still produces biased estimates.
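One way to see how conditional mean replacement distorts relationships is to compare correlations before and after imputation. The sketch below assumes a made-up linear relationship between vitamin D and BMI, and for simplicity deletes BMI values completely at random. The sample size and missingness rate follow the slides; everything else is hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares intercept and slope for y = a0 + a1 * x."""
    mx, my = mean(x), mean(y)
    a1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - a1 * mx, a1


def conditional_mean_demo(n=100, p_missing=0.3, seed=2):
    rng = random.Random(seed)
    vitd = [rng.gauss(30.0, 8.0) for _ in range(n)]
    bmi = [35.0 - 0.2 * v + rng.gauss(0.0, 4.0) for v in vitd]
    missing = [rng.random() < p_missing for _ in range(n)]

    x_cc = [v for v, m in zip(vitd, missing) if not m]
    y_cc = [b for b, m in zip(bmi, missing) if not m]
    a0, a1 = fit_line(x_cc, y_cc)  # imputation model: BMI = a0 + a1 * VitD

    # Imputed BMI values sit exactly on the fitted line.
    bmi_imputed = [a0 + a1 * v if m else b for v, b, m in zip(vitd, bmi, missing)]
    return pearson(vitd, bmi), pearson(x_cc, y_cc), pearson(vitd, bmi_imputed)


r_full, r_cc, r_imputed = conditional_mean_demo()
print(round(r_full, 3), round(r_cc, 3), round(r_imputed, 3))
```

Because the imputed points have zero residual, the correlation in the imputed dataset is at least as strong as in the complete cases, which is the overstated relationship the slide warns about.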

44 Conditional Means Plus Error. Same as before, except randomly wiggle the estimates away from a straight line. How much wiggle?

45 Conditional Means Plus Error. Tables: Serum Vitamin D and BMI. Imputation model: BMI = a0 + a1 VitD.

46 Conditional Means Plus Error. The wiggle for each imputed BMI is chosen randomly based on the residual standard error for the BMI prediction model. Model of interest: VitD = b0 + b1 BMI.

47 Figure: scatter plot of Serum Vitamin D against BMI with wiggled imputations. Sample size 100, missingness 30%.

48 Figure: the same scatter plot annotated with the complete data correlation and the imputed data correlation. Sample size 100, missingness 30%.

49 Conditional Means Plus Error. Strengths: better than conditional means; an attempt is made to adjust the variability upwards. Weaknesses: the attempt is insufficient; still produces biased estimates; the method is inefficient because of the introduced variability (i.e., the random "wiggles").
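The "wiggle" can be sketched as a normal draw scaled by the residual standard error of the imputation model. The data-generating numbers and helper names below are hypothetical. The sketch only contrasts the spread of the imputed values with and without the added error.

```python
import random
from math import sqrt
from statistics import mean, stdev


def fit_line(x, y):
    """Least-squares fit of y = a0 + a1 * x; also returns the residual standard error."""
    mx, my = mean(x), mean(y)
    a1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - a1 * mx
    sse = sum((b - (a0 + a1 * a)) ** 2 for a, b in zip(x, y))
    return a0, a1, sqrt(sse / (len(x) - 2))


def impute_with_error(n=100, p_missing=0.3, seed=3):
    rng = random.Random(seed)
    vitd = [rng.gauss(30.0, 8.0) for _ in range(n)]
    bmi = [35.0 - 0.2 * v + rng.gauss(0.0, 4.0) for v in vitd]
    missing = [rng.random() < p_missing for _ in range(n)]

    x_cc = [v for v, m in zip(vitd, missing) if not m]
    y_cc = [b for b, m in zip(bmi, missing) if not m]
    a0, a1, resid_se = fit_line(x_cc, y_cc)

    x_mis = [v for v, m in zip(vitd, missing) if m]
    on_line = [a0 + a1 * v for v in x_mis]                     # conditional means
    wiggled = [p + rng.gauss(0.0, resid_se) for p in on_line]  # plus random error
    return stdev(on_line), stdev(wiggled), resid_se


sd_on_line, sd_wiggled, resid_se = impute_with_error()
print(round(sd_on_line, 2), round(sd_wiggled, 2), round(resid_se, 2))
```

Adding the error restores much of the spread that pure conditional means remove. It still treats a0, a1, and the residual standard error as known exactly, which is one reason the slide calls the attempt insufficient.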

50 Multiple Imputation. Do single imputation (previous example) several times and combine the results. Combining several results increases efficiency. The size of the wiggle needs to be purposely inflated. There are many flavors of MI where the details differ (areas of open research).

51 Imputation 1. Figure: scatter plot of Serum Vitamin D against BMI with the imputed data correlation.

52 Imputation 2. Figure: scatter plot of Serum Vitamin D against BMI with the imputed data correlation.

53 Imputation 3. Figure: scatter plot of Serum Vitamin D against BMI with the imputed data correlation.

54 Imputation 4. Figure: scatter plot of Serum Vitamin D against BMI with the imputed data correlation.

55 Combine Results. Figure: the four imputed data correlations for Serum Vitamin D and BMI and their average.
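The slides combine the imputations by averaging the estimates. The standard combining rules (Rubin's rules, which the slides do not spell out) also give a standard error that reflects the variation between imputations. The sketch below assumes each imputed dataset yields a point estimate and its squared standard error. The `pool_rubin` name and the numbers passed to it are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate: the simple average
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, sqrt(total)


# Hypothetical estimates and squared standard errors from four imputed datasets.
pooled, pooled_se = pool_rubin([0.42, 0.38, 0.45, 0.40], [0.010, 0.011, 0.009, 0.010])
print(round(pooled, 4), round(pooled_se, 4))
```

The between-imputation term is what accounts for the uncertainty added by imputing. In practice, correlations are often pooled on a transformed scale, which this sketch does not attempt.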

56 Multiple Imputation. Strengths: unbiased for MAR; available as a canned procedure. Weaknesses: requires specialized software; complicated.

57 Example: Compare Methods. Simulate the truth: after bypass surgery, mortality depends on age, number of diseased vessels, previous surgery, and pump type (the factor of primary interest). Force missing values (MAR). Compare analysis methods.

58 Example: Compare Methods. Table 1: Summary Statistics

Variable                % missing data   Mean ± SD or %
Mortality               0.0%             10.7%
Age                     37.8%            69.8 ± 6.7
# of Diseased Vessels   22.9%            3.4 ± 1.5
Previous Surgery        22.8%            67.6%
On-pump                 8.4%             68.9%

59 Example: Compare Methods. Table 2: Results. Columns: Method; Odds Ratio of On-Pump; Relative Difference (%). Rows: Full dataset; Complete Case Analysis; Replace with Means; Dummy Variable Adjustment; Replace with Conditional Means; Multiple Imputation.

60 Remaining Issues with MI. Assumptions: multivariate normality. This is a harmless assumption for variables with no missing data, and the method is robust, working well even if the assumption is violated. Software: SAS (PROC MI and MIANALYZE); Stata; R (mice or rms packages).

61 Remaining Issues. Consult an expert for more about: How much missingness can MI handle? Should we use the response (dependent variable) for multiple imputation? Should we impute the response itself? What about dichotomous, nominal, and ordinal variables? How should we impute when the model includes interactions and other non-linearities? What should we do with non-ignorable missingness?

62 Conclusions. You will encounter missing data in your research. Inappropriate methods will make a bad situation worse. Good methods will maximize the information you can get from your data. Your data is not MCAR. Traditional methods are insufficient for MAR. Multiple imputation has optimal properties for MAR (unbiased and efficient).

63 Conclusions. The goal is not to recreate the truth. The goal is to maintain relationships and to: minimize bias; maximize utilization of information; get good estimates of uncertainty. "You statisticians are making up data!" "Yes, and we are adjusting for the fact that we have made up data."


WELCOME! Lecture 11 Thommy Perlinger

WELCOME! Lecture 11 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Sequential nonparametric regression multiple imputations. Irina Bondarenko and Trivellore Raghunathan

Sequential nonparametric regression multiple imputations. Irina Bondarenko and Trivellore Raghunathan Sequential nonparametric regression multiple imputations Irina Bondarenko and Trivellore Raghunathan Department of Biostatistics, University of Michigan Ann Arbor, MI 48105 Abstract Multiple imputation,

More information

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

Addendum: Multiple Regression Analysis (DRAFT 8/2/07) Addendum: Multiple Regression Analysis (DRAFT 8/2/07) When conducting a rapid ethnographic assessment, program staff may: Want to assess the relative degree to which a number of possible predictive variables

More information

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA PharmaSUG 2014 - Paper SP08 Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA ABSTRACT Randomized clinical trials serve as the

More information

ExperimentalPhysiology

ExperimentalPhysiology Exp Physiol 97.5 (2012) pp 557 561 557 Editorial ExperimentalPhysiology Categorized or continuous? Strength of an association and linear regression Gordon B. Drummond 1 and Sarah L. Vowler 2 1 Department

More information

A Strategy for Handling Missing Data in the Longitudinal Study of Young People in England (LSYPE)

A Strategy for Handling Missing Data in the Longitudinal Study of Young People in England (LSYPE) Research Report DCSF-RW086 A Strategy for Handling Missing Data in the Longitudinal Study of Young People in England (LSYPE) Andrea Piesse and Graham Kalton Westat Research Report No DCSF-RW086 A Strategy

More information

SISCR Module 7 Part I: Introduction Basic Concepts for Binary Biomarkers (Classifiers) and Continuous Biomarkers

SISCR Module 7 Part I: Introduction Basic Concepts for Binary Biomarkers (Classifiers) and Continuous Biomarkers SISCR Module 7 Part I: Introduction Basic Concepts for Binary Biomarkers (Classifiers) and Continuous Biomarkers Kathleen Kerr, Ph.D. Associate Professor Department of Biostatistics University of Washington

More information

The RoB 2.0 tool (individually randomized, cross-over trials)

The RoB 2.0 tool (individually randomized, cross-over trials) The RoB 2.0 tool (individually randomized, cross-over trials) Study design Randomized parallel group trial Cluster-randomized trial Randomized cross-over or other matched design Specify which outcome is

More information

An Empirical Study of Nonresponse Adjustment Methods for the Survey of Doctorate Recipients Wilson Blvd., Suite 965, Arlington, VA 22230

An Empirical Study of Nonresponse Adjustment Methods for the Survey of Doctorate Recipients Wilson Blvd., Suite 965, Arlington, VA 22230 An Empirical Study of Nonresponse Adjustment Methods for the Survey of Doctorate Recipients 1 Fan Zhang 1 and Stephen Cohen 1 Donsig Jang 2, Amang Suasih 2, and Sonya Vartivarian 2 1 National Science Foundation,

More information

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,

More information

Abstract. Introduction A SIMULATION STUDY OF ESTIMATORS FOR RATES OF CHANGES IN LONGITUDINAL STUDIES WITH ATTRITION

Abstract. Introduction A SIMULATION STUDY OF ESTIMATORS FOR RATES OF CHANGES IN LONGITUDINAL STUDIES WITH ATTRITION A SIMULATION STUDY OF ESTIMATORS FOR RATES OF CHANGES IN LONGITUDINAL STUDIES WITH ATTRITION Fong Wang, Genentech Inc. Mary Lange, Immunex Corp. Abstract Many longitudinal studies and clinical trials are

More information

Missing Data: Our View of the State of the Art

Missing Data: Our View of the State of the Art Psychological Methods Copyright 2002 by the American Psychological Association, Inc. 2002, Vol. 7, No. 2, 147 177 1082-989X/02/$5.00 DOI: 10.1037//1082-989X.7.2.147 Missing Data: Our View of the State

More information

LOCF and MMRM: Thoughts on Comparisons

LOCF and MMRM: Thoughts on Comparisons LOCF and MMRM: Thoughts on Comparisons Raymond J. Carroll Texas A&M University http://stat.tamu.edu/~carroll carroll@stat.tamu.edu Outline Brief rehash of the talks comparing LOCF and mixed models Defense

More information

Should a Normal Imputation Model Be Modified to Impute Skewed Variables?

Should a Normal Imputation Model Be Modified to Impute Skewed Variables? Sociological Methods and Research, 2013, 42(1), 105-138 Should a Normal Imputation Model Be Modified to Impute Skewed Variables? Paul T. von Hippel Abstract (169 words) Researchers often impute continuous

More information

Instrumental Variables Estimation: An Introduction

Instrumental Variables Estimation: An Introduction Instrumental Variables Estimation: An Introduction Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA The Problem The Problem Suppose you wish to

More information

Multiple imputation for handling missing outcome data when estimating the relative risk

Multiple imputation for handling missing outcome data when estimating the relative risk Sullivan et al. BMC Medical Research Methodology (2017) 17:134 DOI 10.1186/s12874-017-0414-5 RESEARCH ARTICLE Open Access Multiple imputation for handling missing outcome data when estimating the relative

More information

Module Overview. What is a Marker? Part 1 Overview

Module Overview. What is a Marker? Part 1 Overview SISCR Module 7 Part I: Introduction Basic Concepts for Binary Classification Tools and Continuous Biomarkers Kathleen Kerr, Ph.D. Associate Professor Department of Biostatistics University of Washington

More information

1.4 - Linear Regression and MS Excel

1.4 - Linear Regression and MS Excel 1.4 - Linear Regression and MS Excel Regression is an analytic technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear

More information

Clincial Biostatistics. Regression

Clincial Biostatistics. Regression Regression analyses Clincial Biostatistics Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a

More information

How should the propensity score be estimated when some confounders are partially observed?

How should the propensity score be estimated when some confounders are partially observed? How should the propensity score be estimated when some confounders are partially observed? Clémence Leyrat 1, James Carpenter 1,2, Elizabeth Williamson 1,3, Helen Blake 1 1 Department of Medical statistics,

More information

Missing data in clinical trials: making the best of what we haven t got.

Missing data in clinical trials: making the best of what we haven t got. Missing data in clinical trials: making the best of what we haven t got. Royal Statistical Society Professional Statisticians Forum Presentation by Michael O Kelly, Senior Statistical Director, IQVIA Copyright

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Comparison of imputation and modelling methods in the analysis of a physical activity trial with missing outcomes

Comparison of imputation and modelling methods in the analysis of a physical activity trial with missing outcomes IJE vol.34 no.1 International Epidemiological Association 2004; all rights reserved. International Journal of Epidemiology 2005;34:89 99 Advance Access publication 27 August 2004 doi:10.1093/ije/dyh297

More information

Part 8 Logistic Regression

Part 8 Logistic Regression 1 Quantitative Methods for Health Research A Practical Interactive Guide to Epidemiology and Statistics Practical Course in Quantitative Data Handling SPSS (Statistical Package for the Social Sciences)

More information

Maintenance of weight loss and behaviour. dietary intervention: 1 year follow up

Maintenance of weight loss and behaviour. dietary intervention: 1 year follow up Institute of Psychological Sciences FACULTY OF MEDICINE AND HEALTH Maintenance of weight loss and behaviour change Dropouts following and a 12 Missing week healthy Data eating dietary intervention: 1 year

More information

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Today s Class: Features of longitudinal data Features of longitudinal models What can MLM do for you? What to expect in this

More information

Bayesian approaches to handling missing data: Practical Exercises

Bayesian approaches to handling missing data: Practical Exercises Bayesian approaches to handling missing data: Practical Exercises 1 Practical A Thanks to James Carpenter and Jonathan Bartlett who developed the exercise on which this practical is based (funded by ESRC).

More information

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions.

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions. Greenland/Arah, Epi 200C Sp 2000 1 of 6 EPI 200C Final, June 4 th, 2009 This exam includes 24 questions. INSTRUCTIONS: Write all answers on the answer sheets supplied; PRINT YOUR NAME and STUDENT ID NUMBER

More information

Regression Discontinuity Analysis

Regression Discontinuity Analysis Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income

More information

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4. Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

Survey research (Lecture 1)

Survey research (Lecture 1) Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

research methods & reporting

research methods & reporting research methods & reporting Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls Jonathan A C Sterne, 1 Ian R White, 2 John B Carlin, 3 Michael Spratt,

More information

Missing by Design: Planned Missing-Data Designs in Social Science

Missing by Design: Planned Missing-Data Designs in Social Science Research & Methods ISSN 1234-9224 Vol. 20 (1, 2011): 81 105 Institute of Philosophy and Sociology Polish Academy of Sciences, Warsaw www.ifi span.waw.pl e-mail: publish@ifi span.waw.pl Missing by Design:

More information

Design and Analysis Plan Quantitative Synthesis of Federally-Funded Teen Pregnancy Prevention Programs HHS Contract #HHSP I 5/2/2016

Design and Analysis Plan Quantitative Synthesis of Federally-Funded Teen Pregnancy Prevention Programs HHS Contract #HHSP I 5/2/2016 Design and Analysis Plan Quantitative Synthesis of Federally-Funded Teen Pregnancy Prevention Programs HHS Contract #HHSP233201500069I 5/2/2016 Overview The goal of the meta-analysis is to assess the effects

More information

Multivariable Systems. Lawrence Hubert. July 31, 2011

Multivariable Systems. Lawrence Hubert. July 31, 2011 Multivariable July 31, 2011 Whenever results are presented within a multivariate context, it is important to remember that there is a system present among the variables, and this has a number of implications

More information

Comparison And Application Of Methods To Address Confounding By Indication In Non- Randomized Clinical Studies

Comparison And Application Of Methods To Address Confounding By Indication In Non- Randomized Clinical Studies University of Massachusetts Amherst ScholarWorks@UMass Amherst Masters Theses 1911 - February 2014 Dissertations and Theses 2013 Comparison And Application Of Methods To Address Confounding By Indication

More information

ethnicity recording in primary care

ethnicity recording in primary care ethnicity recording in primary care Multiple imputation of missing data in ethnicity recording using The Health Improvement Network database Tra Pham 1 PhD Supervisors: Dr Irene Petersen 1, Prof James

More information

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Ball State University

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Ball State University PEER REVIEW HISTORY BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (see an example) and are provided with free text boxes to

More information

(C) Jamalludin Ab Rahman

(C) Jamalludin Ab Rahman SPSS Note The GLM Multivariate procedure is based on the General Linear Model procedure, in which factors and covariates are assumed to have a linear relationship to the dependent variable. Factors. Categorical

More information

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Today s Class: Features of longitudinal data Features of longitudinal models What can MLM do for you? What to expect in this

More information

Data harmonization tutorial:teaser for FH2019

Data harmonization tutorial:teaser for FH2019 Data harmonization tutorial:teaser for FH2019 Alden Gross, Johns Hopkins Rich Jones, Brown University Friday Harbor Tahoe 22 Aug. 2018 1 / 50 Outline Outline What is harmonization? Approach Prestatistical

More information