THE EFFECT OF NUMBER OF IMPUTATIONS ON PARAMETER ESTIMATES IN MULTIPLE IMPUTATION WITH SMALL SAMPLE SIZES MAURICE KAVANAGH DISSERTATION


THE EFFECT OF NUMBER OF IMPUTATIONS ON PARAMETER ESTIMATES IN MULTIPLE IMPUTATION WITH SMALL SAMPLE SIZES

by

MAURICE KAVANAGH

DISSERTATION

Submitted to the Graduate School of Wayne State University, Detroit, Michigan, in partial fulfillment of the requirements for the degree of

DOCTOR OF EDUCATION

2018

MAJOR: EVALUATION AND RESEARCH

Approved By: Advisor Date

ACKNOWLEDGEMENTS

I would like to acknowledge my advisor, Dr. Shlomo Sawilowsky, for his guidance, encouragement, and sense of humor throughout the process. I would also like to acknowledge and thank my committee members, Dr. Irwin Jopps for his kindness and gentle guidance, and Dr. Monte Piliawsky for his sparkling conversations, wit, and encouragement, especially with our shared passion and sanity preserving addiction, running. I would like to acknowledge the support and friendship of Tiana Bosley, the yin to my yang throughout the many courses we did together. I would like to acknowledge Janet Leslie for her encouragement and support as I began the process, and Tinysha McCord for her energy, encouragement and support as I completed it. I especially want to acknowledge the gentle, unwavering support and mentoring of Nelia Afonso, one of the warmest, kindest human beings I have ever had the privilege of encountering. Thank you.

TABLE OF CONTENTS

Acknowledgements ii
List of Tables vii
List of Figures xiv
CHAPTER 1 Introduction 1
Impact of Missing Data 2
Remediating Missing Data 3
Statement of the Problem 4
Limitations 5
Assumptions 6
Definitions 6
CHAPTER 2 Literature Review 9
Classifications of Missing Data 9
Proportion of Missing Data 10
Ignorable Versus Non-ignorable Missingness 10
Classic Data Handling Methods 12
Deletion 12

Weighting 14
Data Replacement 15
Explicit Methods of Replacement 15
Implicit Methods 17
Contemporary Methods of Replacement: Multiple Imputation and Maximum Likelihood Estimation 19
Multiple Imputation 19
Likelihood Estimation 24
Numbers of Imputations 26
CHAPTER 3 Methodology 29
Sampling 29
Procedure 30
CHAPTER 4 Results 35
Data Set Adjustments 35
Tables of Organization 35
Regression Analyses, Intact Samples 35
Imputation Procedures 36

Regression Analysis, Imputed Data Sets 37
Changes to Alpha Values, Imputed Data Sets 37
Out of s Beta, Post Imputation 38
Confidence Interval Changes, Imputed Data Sets 40
CHAPTER 5 Discussion 43
Narrowed Confidence Intervals, Imputed Data Sets 43
Changes to Alpha Values, Imputed Data Sets 45
Manipulating Number of Imputations, Sample Size, and Percent Missing 45
Out of s Beta, Post Imputation 47
Parent Data Set 48
Software Issues 49
Conclusion 50
Future Directions 50
APPENDIX A: Computer Coding and Syntax 52
APPENDIX B: Regression Coefficient Tables 56
APPENDIX C: Tables of Confidence Interval Widths 116
References 128

Abstract 134
Autobiographical Statement 135

LIST OF TABLES

Table 1: Total Number of Observations, Eight Variables With Intact Data Set 29
Table 2: Number of Missing Observations, Eight Variables With Data Missing on Six 30
Table 3: Example Intact 20 Record Sample Chosen From Original Database 31
Table 4: Field Name Descriptions for Variables Used in Study 32
Table 5: Example Regression Analysis Results 33
Table 6: Data Set N=20 After 20% MAR Deletion On Age, Education, Family, Race, Sex, And Income 34
Table 7: Regression Analysis, Confidence Interval Width Predicted By Sample Size, Percent Missing, And Number Imputations 41
Table B1: Intact data, sample size 20, 10% deletion rate to be applied 56
Table B2: Sample size 20, 10% deletion rate, 20 imputations 56
Table B3: Sample size 20, 10% deletion rate, 40 imputations 57
Table B4: Sample size 20, 10% deletion rate, 60 imputations 57
Table B5: Sample size 20, 10% deletion rate, 100 imputations 58
Table B6: Intact data, sample size 20, 20% deletion rate to be applied 58
Table B7: Sample size 20, 20% deletion rate, 20 imputations 59
Table B8: Sample size 20, 20% deletion rate, 40 imputations 59
Table B9: Sample size 20, 20% deletion rate, 60 imputations 60
Table B10: Sample size 20, 20% deletion rate, 100 imputations 60
Table B11: Intact data, sample size 20, 30% deletion rate to be applied 61

Table B12: Sample size 20, 30% deletion rate, 20 imputations 61
Table B13: Sample size 20, 30% deletion rate, 40 imputations 62
Table B14: Sample size 20, 30% deletion rate, 60 imputations 62
Table B15: Sample size 20, 30% deletion rate, 100 imputations 63
Table B16: Intact data, sample size 20, 40% deletion rate to be applied 63
Table B17: Sample size 20, 40% deletion rate, 20 imputations 64
Table B18: Sample size 20, 40% deletion rate, 40 imputations 64
Table B19: Sample size 20, 40% deletion rate, 60 imputations 65
Table B20: Sample size 20, 40% deletion rate, 100 imputations 65
Table B21: Intact data, sample size 40, 10% deletion rate to be applied 66
Table B22: Sample size 40, 10% deletion rate, 20 imputations 66
Table B23: Sample size 40, 10% deletion rate, 40 imputations 67
Table B24: Sample size 40, 10% deletion rate, 60 imputations 67
Table B25: Sample size 40, 10% deletion rate, 100 imputations 68
Table B26: Intact data, sample size 40, 20% deletion rate to be applied 68
Table B27: Sample size 40, 20% deletion rate, 20 imputations 69
Table B28: Sample size 40, 20% deletion rate, 40 imputations 69
Table B29: Sample size 40, 20% deletion rate, 60 imputations 70
Table B30: Sample size 40, 20% deletion rate, 100 imputations 70
Table B31: Intact data, sample size 40, 30% deletion rate to be applied 71
Table B32: Sample size 40, 30% deletion rate, 20 imputations 71
Table B33: Sample size 40, 30% deletion rate, 40 imputations 72

Table B34: Sample size 40, 30% deletion rate, 60 imputations 72
Table B35: Sample size 40, 30% deletion rate, 100 imputations 73
Table B36: Intact data, sample size 40, 40% deletion rate to be applied 73
Table B37: Sample size 40, 40% deletion rate, 20 imputations 74
Table B38: Sample size 40, 40% deletion rate, 40 imputations 74
Table B39: Sample size 40, 40% deletion rate, 60 imputations 75
Table B40: Sample size 40, 40% deletion rate, 100 imputations 75
Table B41: Intact data, sample size 50, 10% deletion rate to be applied 76
Table B42: Sample size 50, 10% deletion rate, 20 imputations 76
Table B43: Sample size 50, 10% deletion rate, 40 imputations 77
Table B44: Sample size 50, 10% deletion rate, 60 imputations 77
Table B45: Sample size 50, 10% deletion rate, 100 imputations 78
Table B46: Intact data, sample size 50, 20% deletion rate to be applied 78
Table B47: Sample size 50, 20% deletion rate, 20 imputations 79
Table B48: Sample size 50, 20% deletion rate, 40 imputations 79
Table B49: Sample size 50, 20% deletion rate, 60 imputations 80
Table B50: Sample size 50, 20% deletion rate, 100 imputations 80
Table B51: Intact data, sample size 50, 30% deletion rate to be applied 81
Table B52: Sample size 50, 30% deletion rate, 20 imputations 81
Table B53: Sample size 50, 30% deletion rate, 40 imputations 82
Table B54: Sample size 50, 30% deletion rate, 60 imputations 82
Table B55: Sample size 50, 30% deletion rate, 100 imputations 83

Table B56: Intact data, sample size 50, 40% deletion rate to be applied 83
Table B57: Sample size 50, 40% deletion rate, 20 imputations 84
Table B58: Sample size 50, 40% deletion rate, 40 imputations 84
Table B59: Sample size 50, 40% deletion rate, 60 imputations 85
Table B60: Sample size 50, 40% deletion rate, 100 imputations 85
Table B61: Intact data, sample size 100, 10% deletion rate to be applied 86
Table B62: Sample size 100, 10% deletion rate, 20 imputations 86
Table B63: Sample size 100, 10% deletion rate, 40 imputations 87
Table B64: Sample size 100, 10% deletion rate, 60 imputations 87
Table B65: Sample size 100, 10% deletion rate, 100 imputations 88
Table B66: Intact data, sample size 100, 20% deletion rate to be applied 88
Table B67: Sample size 100, 20% deletion rate, 20 imputations 89
Table B68: Sample size 100, 20% deletion rate, 40 imputations 89
Table B69: Sample size 100, 20% deletion rate, 60 imputations 90
Table B70: Sample size 100, 20% deletion rate, 100 imputations 90
Table B71: Intact data, sample size 100, 30% deletion rate to be applied 91
Table B72: Sample size 100, 30% deletion rate, 20 imputations 91
Table B73: Sample size 100, 30% deletion rate, 40 imputations 92
Table B74: Sample size 100, 30% deletion rate, 60 imputations 92
Table B75: Sample size 100, 30% deletion rate, 100 imputations 93
Table B76: Intact data, sample size 100, 40% deletion rate to be applied 93
Table B77: Sample size 100, 40% deletion rate, 20 imputations 94
Table B78: Sample size 100, 40% deletion rate, 40 imputations 94

Table B79: Sample size 100, 40% deletion rate, 60 imputations 95
Table B80: Sample size 100, 40% deletion rate, 100 imputations 95
Table B81: Intact data, sample size 200, 10% deletion rate to be applied 96
Table B82: Sample size 200, 10% deletion rate, 20 imputations 96
Table B83: Sample size 200, 10% deletion rate, 40 imputations 97
Table B84: Sample size 200, 10% deletion rate, 60 imputations 97
Table B85: Sample size 200, 10% deletion rate, 100 imputations 98
Table B86: Intact data, sample size 200, 20% deletion rate to be applied 98
Table B87: Sample size 200, 20% deletion rate, 20 imputations 99
Table B88: Sample size 200, 20% deletion rate, 40 imputations 99
Table B89: Sample size 200, 20% deletion rate, 60 imputations 100
Table B90: Sample size 200, 20% deletion rate, 100 imputations 100
Table B91: Intact data, sample size 200, 30% deletion rate to be applied 101
Table B92: Sample size 200, 30% deletion rate, 20 imputations 101
Table B93: Sample size 200, 30% deletion rate, 40 imputations 102
Table B94: Sample size 200, 30% deletion rate, 60 imputations 102
Table B95: Sample size 200, 30% deletion rate, 100 imputations 103
Table B96: Intact data, sample size 200, 40% deletion rate to be applied 103
Table B97: Sample size 200, 40% deletion rate, 20 imputations 104
Table B98: Sample size 200, 40% deletion rate, 40 imputations 104
Table B99: Sample size 200, 40% deletion rate, 60 imputations 105
Table B100: Sample size 200, 40% deletion rate, 100 imputations 105
Table B101: Intact data, sample size 500, 10% deletion rate to be applied 106

Table B102: Sample size 500, 10% deletion rate, 20 imputations 106
Table B103: Sample size 500, 10% deletion rate, 40 imputations 107
Table B104: Sample size 500, 10% deletion rate, 60 imputations 107
Table B105: Sample size 500, 10% deletion rate, 100 imputations 108
Table B106: Intact data, sample size 500, 20% deletion rate to be applied 108
Table B107: Sample size 500, 20% deletion rate, 20 imputations 109
Table B108: Sample size 500, 20% deletion rate, 40 imputations 109
Table B109: Sample size 500, 20% deletion rate, 60 imputations 110
Table B110: Sample size 500, 20% deletion rate, 100 imputations 110
Table B111: Intact data, sample size 500, 30% deletion rate to be applied 111
Table B112: Sample size 500, 30% deletion rate, 20 imputations 111
Table B113: Sample size 500, 30% deletion rate, 40 imputations 112
Table B114: Sample size 500, 30% deletion rate, 60 imputations 112
Table B115: Sample size 500, 30% deletion rate, 100 imputations 113
Table B116: Intact data, sample size 500, 40% deletion rate to be applied 113
Table B117: Sample size 500, 40% deletion rate, 20 imputations 114
Table B118: Sample size 500, 40% deletion rate, 40 imputations 114
Table B119: Sample size 500, 40% deletion rate, 60 imputations 115
Table B120: Sample size 500, 40% deletion rate, 100 imputations 115
Table C1: Confidence Interval Widths, Data Set N=20, Missing 10% 116
Table C2: Confidence Interval Widths, Data Set N=20, Missing 20% 116
Table C3: Confidence Interval Widths, Data Set N=20, Missing 30% 117

Table C4: Confidence Interval Widths, Data Set N=20, Missing 40% 117
Table C5: Confidence Interval Widths, Data Set N=40, Missing 10% 118
Table C6: Confidence Interval Widths, Data Set N=40, Missing 20% 118
Table C7: Confidence Interval Widths, Data Set N=40, Missing 30% 119
Table C8: Confidence Interval Widths, Data Set N=40, Missing 40% 119
Table C9: Confidence Interval Widths, Data Set N=50, Missing 10% 120
Table C10: Confidence Interval Widths, Data Set N=50, Missing 20% 120
Table C11: Confidence Interval Widths, Data Set N=50, Missing 30% 121
Table C12: Confidence Interval Widths, Data Set N=50, Missing 40% 121
Table C13: Confidence Interval Widths, Data Set N=100, Missing 10% 122
Table C14: Confidence Interval Widths, Data Set N=100, Missing 20% 122
Table C15: Confidence Interval Widths, Data Set N=100, Missing 30% 123
Table C16: Confidence Interval Widths, Data Set N=100, Missing 40% 123
Table C17: Confidence Interval Widths, Data Set N=200, Missing 10% 124
Table C18: Confidence Interval Widths, Data Set N=200, Missing 20% 124
Table C19: Confidence Interval Widths, Data Set N=200, Missing 30% 125
Table C20: Confidence Interval Widths, Data Set N=200, Missing 40% 125
Table C21: Confidence Interval Widths, Data Set N=500, Missing 10% 126
Table C22: Confidence Interval Widths, Data Set N=500, Missing 20% 126
Table C23: Confidence Interval Widths, Data Set N=500, Missing 30% 127
Table C24: Confidence Interval Widths, Data Set N=500, Missing 40% 127

LIST OF FIGURES

Figure 1: Regression Analysis, Confidence Interval Vs Sample Size 42

CHAPTER 1

INTRODUCTION

Surveys with missing observations are an expected occurrence in social science research (Rubin, 1976), as noted by Widaman (2006), who opined, "The presence of non-trivial levels of attrition is the norm in longitudinal studies" (p. 43). Newman (2014) suggested research reports should be viewed with suspicion if data sets were claimed to be complete. Contrary to Enders (2004), who asserted that missing data in social science constitute "a statistical nuisance" (p. 431) to be eliminated, others suggested missingness in a data set may constitute part of the total story. Consider, for example, timed performance tests: comparing sequential blank responses to the number of completed items may itself form part of a performance measure (Glas & Pimentel, 2008). Rubin (1987) suggested that items which preclude all but specific responses, for example job title, can be expected to return a portion of blank responses. Designs where subsets of available data are collected from larger samples (Baraldi & Enders, 2010) are also examples where useful information is available even when the response cell is empty. Rutkowski (2011), however, counseled that biases conferred by missing data may be present even when missingness is due to study design. Rubin (1987) posited that missing data are observable rather than inferred, and are therefore finite; missing data should thus be thought of as data which exist but were not recorded. Lee and Cai (2012) supported this counsel, equating missing data to latent or unobserved variables.

Impact of Missing Data

Analyses that do not consider all potentially available data when creating estimates risk returning misleading or faulty inferences. Although a value may be supplied for a missing response based on speculation, ignorance of the true value invokes an additional level of uncertainty. One consequence of missing data pertains to statistical power, which is in part a function of sample size; hence, missing data immediately beget a loss of power. Data missing from comparison groups in an asymmetrical fashion may represent differences between the groups, while symmetrical patterns of missingness may conceal common, potentially confounding, unexplored issues. Either pattern may suggest biasing influences that can affect study outcomes. Even when missingness is built into the design, Rutkowski (2011) suggested that instruments with intentionally missing data are subject to unintended biases. Not all missing data are equally problematic, however. Occasional random missing observations are likely of little consequence, while patterns of missingness may constitute a minor nuisance, a major concern, or even a focus for study. Remediating missing data without considering the background and causes of the missingness may appear to address the problem at hand but may actually mask substantial issues (Hair, Black, Babin, Anderson, & Tatham, 2006). Knowledge and understanding of the study population is an important consideration when remediating missing data, as indicated by Little and Rubin (2002), who suggested that data replacement by the data collector, as opposed to the data analyst, is preferable, as not

only is the collector likely more familiar with the data source, but the collector may also be the best or only person with access to the original data pool. In support of Little and Rubin (2002), Widaman (2006) opined that the optimal way of handling missing data depends on the rate of missingness. Similarly, Rubin (2003) suggested that even though groups of missing and collected data may have the same mean, they likely have different distributions. This lent credence to the contention that missing observations are a subset of the larger data pool, or may even form a unique subset of the population. The common message conferred by the above is that familiarity with the data and its background will serve the researcher well when assessing both the impact and remediation of missing data.

Remediating Missing Data

The three most common approaches for handling missing data are 1) deletion, 2) weighting, and 3) replacement (Little & Rubin, 1989). Deletion methods remove or ignore partial or entire cases where observations are missing. Weighting adds emphasis to the remaining observations to compensate for the loss. Replacement methods attempt to repair the data set by imputing surrogate values in place of blank observations. Data replacement may be accomplished using either readily available values of convenience or purposively generated ones. Like deletion and weighting, classic ad hoc data replacement techniques are relatively straightforward to use; however, all three methods tend to distort parameter estimates, especially as sample sizes decrease, rates of missingness increase, or patterns of missingness become more complex.
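This distortion is easy to demonstrate numerically. The following minimal Python sketch (simulated data, not drawn from the present study) deletes 40% of a variable completely at random and then fills the blanks with the observed mean, the simplest ad hoc replacement; the filled-in sample clusters around the mean, understating the true spread.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=200)  # intact "income-like" variable

# Delete 40% of observations completely at random, then mean-substitute.
mask = rng.random(x.size) < 0.40
filled = x.copy()
filled[mask] = x[~mask].mean()

# The mean-substituted sample consolidates around the mean: its standard
# deviation is visibly smaller than that of the intact data.
print("intact SD:", round(x.std(ddof=1), 2),
      "after mean substitution:", round(filled.std(ddof=1), 2))
assert filled.std(ddof=1) < x.std(ddof=1)
```

Any downstream statistic that depends on variance (standard errors, confidence intervals, correlations) inherits this shrinkage, which is the distortion the classic methods share.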

Contemporary purposive data replacement techniques, including multiple imputation (MI) and maximum likelihood estimation (ML), address concerns of parameter distortion by using existing data to estimate replacement values. Unlike ad hoc methods, purposive methods "track the data" (Rubin, 1996, p. 478), potentially resulting in more accurate estimates. Contemporary purposive data handling methods can ameliorate problems inherent in the older methods; however, they too have issues. Multiple imputation procedures can be somewhat unwieldy to implement, return a slightly different result each time, and run the risk of increased parameter estimation error as sample size decreases and assumption violations increase. Likelihood estimation procedures require intact covariates to be applied successfully; however, covariates can restrict the generalizability of the model and may introduce unwanted interaction effects or unpredictable biases. As MI has become increasingly accessible through commercial software packages such as SPSS or SAS, guidelines may be beneficial in helping to gauge the optimal number of imputations, balancing under- against overly precise estimates. Although estimate biases tend to diminish as sample size increases, the problem remains or is exacerbated as sample size decreases. Smaller samples will require more effort to generate greater numbers of imputations in an attempt to neutralize bias. Some of these issues appear to have been addressed in the multiple imputation option available in more recent versions of SPSS. While many users will likely address missing data using the SPSS defaults of five imputations and ten iterations, users are able to

specify much higher numbers of each, which risks the creation of overly precise and potentially inaccurate estimates.

Statement of the Problem

Although the number of imputations required to obtain confidence intervals within 10% bias of those from the original complete data set will increase as the rate of missingness increases and sample size decreases, there is a point beyond which further increasing the number of imputations may spawn overly precise and potentially misleading estimates. It is therefore hypothesized that there exists an optimal range of imputations which balances over- and under-specified parameter estimates. It is further hypothesized that this number of imputations will vary with 1) decreases in sample size and 2) increased rates of missingness. The purpose of the current study, therefore, is to explore the effect of the number of imputations on parameter estimate biases in smaller samples with varying levels of missingness. The effect of varying the sample size and level of missingness on the number of imputations needed to return parameter estimates with biases within 10% of the original complete data will be explicated.

Limitations

For the current study, a small sample will be drawn from a larger data set. The randomly drawn sample may not be truly representative of the population. Data deletion and replacement will be simulated using a computer and, as such, may not accurately reflect reality. By its nature, multiple imputation is expected to return a different set of estimates each time it is conducted; therefore, the exact imputation results and resulting parameter estimates may not be readily reproducible.
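The simulate-delete-impute-pool cycle described above can be sketched in outline. The Python sketch below is an illustration only, not the study's SPSS procedure: it uses a hypothetical bivariate data set, a simple MAR deletion mechanism, stochastic regression imputation as a stand-in for a full MI engine, and Rubin's rules to pool a regression slope across m imputations.

```python
import numpy as np

rng = np.random.default_rng(42)

def mar_delete(x, y, rate):
    """Delete y-values with probability increasing in x (a MAR mechanism)."""
    p = rate * 2 * (x - x.min()) / (x.max() - x.min())  # averages near `rate`
    y = y.copy()
    y[rng.random(y.size) < p] = np.nan
    return y

def impute_once(x, y):
    """One stochastic-regression imputation of the missing y-values."""
    obs = ~np.isnan(y)
    b1, b0 = np.polyfit(x[obs], y[obs], 1)
    resid_sd = np.std(y[obs] - (b0 + b1 * x[obs]), ddof=2)
    y = y.copy()
    miss = np.isnan(y)
    y[miss] = b0 + b1 * x[miss] + rng.normal(0, resid_sd, miss.sum())
    return y

def pooled_slope(x, y_miss, m):
    """Impute m times and pool the slope estimate by Rubin's rules."""
    ests, variances = [], []
    for _ in range(m):
        y_imp = impute_once(x, y_miss)
        b1, b0 = np.polyfit(x, y_imp, 1)
        resid = y_imp - (b0 + b1 * x)
        sxx = (x - x.mean()) @ (x - x.mean())
        variances.append((resid @ resid / (x.size - 2)) / sxx)
        ests.append(b1)
    qbar = np.mean(ests)                # pooled point estimate
    ubar = np.mean(variances)           # within-imputation variance
    b = np.var(ests, ddof=1)            # between-imputation variance
    total = ubar + (1 + 1 / m) * b      # Rubin's total variance
    return qbar, np.sqrt(total)

n, rate = 50, 0.30                      # one small-sample, 30%-missing cell
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)        # true slope = 2
y_miss = mar_delete(x, y, rate)

for m in (5, 20, 100):
    est, se = pooled_slope(x, y_miss, m)
    print(f"m={m:3d}  slope={est:5.2f}  pooled SE={se:4.2f}")
```

Looping the final block over sample sizes and deletion rates, as the study design does, would yield a grid of pooled estimates and confidence interval widths comparable against the intact-data fit.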

Assumptions

The following assumptions are made. The chosen sample and parent data set are approximately normally distributed. Variables chosen for study are continuous and approximately normally distributed. Missing data may be classified as missing at random.

Definitions

Bayesian: A typology of inference where prior information contributes to the likelihood of subsequent outcomes.
Bias: Systematic variation of a parameter estimator from one created using full information.
Comparison group: A population subset where participants are assigned to the experimental condition.
Confidence interval (C.I.): A range of probabilities within a specified interval from the parameter of interest.
Convergence: The process of two variables approaching or converging upon the same value.
Covariate: An ancillary or auxiliary variable used to predict the value or outcome of a variable of interest.
Distribution: The range and frequency of observed values and the expected frequency of unobserved values.
Efficiency: A measure of the accuracy or precision of a parameter estimator.
Estimate (noun): The projected value of a parameter.
Factor analysis: A methodology whereby groups of variables are assessed for patterns of correlation.
Heteroscedasticity: An inconsistent and/or unpredictable pattern of variance in a distribution.
Imputation: A procedure or procedures whereby values are added to the data set.
Inference, statistical: A conclusion based upon the assessment of likelihood.
Interaction: Influence or confounding effect of variables upon one another.
Latent variable: A variable which is unobserved but assumed to exist.
Log likelihood: Use of the natural logarithm to estimate probability or likelihood.
Misspecification: An incorrect or mismatched variable used for creating parameter estimates.
Monotone: A pattern of missingness in longitudinal studies where data are consistently missing after a specific point in time.
Monte Carlo: A methodology which uses random sampling to generate arrays of simulated values.
Observation: A datum or single data point.
Parameter: A value which defines or describes a characteristic of a distribution, e.g., the mean.
Population: An entire group of interest.
Posterior distribution: A distribution which uses prior information as conditions or components of parameter predictors.
Power, statistical: The ability of a statistical test to detect an effect if one exists.
Predictor line: A trend line used in regression analysis to predict or estimate a value.
Regression: A method of estimating or predicting a value by comparing its fit with known patterns, usually through graphical means and the use of predictor or regression lines.
Sample: A selection or subset of a population.
Specification: The choice of variables used to generate parameter estimates.
Standard error (SE): The average magnitude of difference between a random observation and the true or hypothesized value.
Stochastic: Random; of or pertaining to a random variable.
Type I error: Incorrectly inferring that an effect exists.
Type II error: Incorrectly inferring that an effect does not exist.
Univariate: Single variable, usually in reference to descriptive statistics.

CHAPTER 2

LITERATURE REVIEW

Classifications of Missing Data

Missing data may be classified as occurring primarily at the item, scale, or subject level. Item level missing data refer to single empty data cells where no between-cell connection to the missingness is evident. Blank entries may result from participants omitting items with the intention to return but not doing so, choosing not to answer, or simply skipping items unintentionally. Single item missingness may also be due to factors outside of individual participant control, such as coding or recording anomalies, among myriad other possible causes. Scale or construct level missingness refers to subsets of items within individual questionnaires (Newman, 2014). Subsets may include construct-linked or thematically linked groupings of questions that participants feel unable or unwilling to answer. Scale level missingness may also refer to patterns where blank responses are returned on all items beyond a certain point within the survey. Such patterns of missingness may be due to participant fatigue or insufficient time for completion (Glas & Pimentel, 2008), the latter being a common feature of timed surveys or tests where the gross count of responses forms part of the final measure. Subject level missingness includes surveys returned devoid of responses, often a result of deliberate or inadvertent participant non-response. Potential participants may choose not to respond, they may be unable to respond, or their responses may not have been recorded. Data loss for technical reasons, e.g., loss of groups of records, may also be a root cause of subject level missingness.

Proportion of Missing Data

As the proportion of missing data increases, so do the challenges of deriving accurate parameter estimates and, in turn, inferences. Determining the approximate level of missingness at which inferences are considered unacceptably compromised can prove challenging, as no clear distinction between low, moderate, and high levels of missingness is evident in the literature. Widaman (2006) described missingness in the 1-2% range as relatively minor, and Schafer (1997) categorized as minor any level of missingness below 5%. Rubin (2003) described levels of missingness below 30% as modest, while Widaman (2006) considered levels greater than 25% relatively high. As the rate of missingness of a variable approaches extreme levels, it may be beneficial to consider the contention by Rubin (2003) that data sets with levels of missingness in excess of 50% may no longer be considered to present a problem of missing data: if less than half of the data of interest is observed, then the data which have been gathered essentially become a subsample of the overall sample space.

Ignorable Versus Non-Ignorable Missingness

Data loss at any of the item, scale, or subject levels may or may not be a random occurrence. Data may also be missing in non-random patterns if interactions within and between levels exist. For example, a subset of potential respondents may refuse to respond to an entire survey if they find parts of it objectionable. Understanding patterns of missingness and the reasons behind nonresponse can aid the researcher considering remediation of missing data. Rubin (1976) suggested the answer to the question of when it is appropriate to ignore missing data lies in the distinction between random and nonrandom patterns of missingness. Rubin (1976)

postulated tenets for the classification of missing data into ignorable and non-ignorable categories. He suggested that classification as ignorable connotes that any fluctuation between the unobserved and the observed data may be attributable to random variation. Rubin (1976) cautioned, however, that non-responses are rarely ignorable. Data described as missing completely at random (MCAR) are those where there is no discernible pattern to the missingness, while data missing at random (MAR) exhibit a pattern unconnected with the dependent variable (Little & Rubin, 2002). Data missing not at random (MNAR) exhibit an evident relationship between the dependent variable and the pattern of missingness. Widaman (2006) suggested both MAR and MCAR be grouped as ignorable, and MNAR be considered non-ignorable. Newman (2014) suggested mechanisms of missingness, e.g., MCAR, MAR, and MNAR, may be more usefully viewed on a continuum, and argued that rarely, if ever, are data entirely MCAR or MNAR. Along similar lines, Little and Zanganeh (2013) advocated relaxing the rules surrounding ignorability of missing data, and suggested it may be acceptable to follow guidelines for the MAR condition for data sets whose patterns of missingness are MNAR, provided the variables or parameters under consideration meet the criteria for MAR. Even when no evident relationship exists among the patterns of missingness, it may be prudent not to dismiss the existence of unaccounted-for interactions. It may also be advisable to consider groupings of participants whose surveys are missing data as a de facto subgroup. It then becomes incumbent upon the researcher to uncover any patterns of missingness and to determine whether a pattern is ignorable or non-ignorable. For example, Glas and Pimentel (2008) suggested classifying missing

data as non-ignorable in surveys where the missingness is related to survey design, e.g., surveys which track counts of completed items as a measure of proficiency. They argued that ignoring missing data in this context itself constitutes a bias. Ultimately, when considering the ignorability of missing data, it is incumbent upon the researcher to determine whether patterns of missingness represent a meaningful threat to potential statistical inferences.

Classic Data Handling Methods

Deletion

Deletion, or complete case, methods use only records where complete data are available. These methods include listwise deletion and pairwise deletion. With listwise deletion, only complete cases are included and any with missing values are ignored. With pairwise deletion, only cases where both variables are available for an individual comparison are included in that analysis. Listwise deletion discards entirely any case missing any variable information. Although such a method appears to be an expedient way to circumvent the complication of incomplete records, Roth, Switzer, and Switzer (1999) argued that listwise deletion ensures only part of the available data are used. This aligned with Enders (2001), who made the sobering observation that, given a 5% rate of missingness in a three-predictor regression model, a researcher employing listwise deletion can expect to lose an estimated 18.6% of the subject data (p. 716). Additionally, the smaller sample size resulting from such deletion may not only decrease statistical power; it may also consolidate the remaining cases into tighter clusters with less variance, thus increasing the likelihood of Type I errors (Baraldi & Enders, 2010).
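The two deletion methods can be contrasted on simulated data. In this minimal Python sketch (illustrative data, not the study's), 5% MCAR missingness is applied independently to each of three variables; listwise deletion drops every row with any blank, while a pairwise correlation uses whichever rows are complete for that particular pair, so each statistic can rest on a different n.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
data = rng.normal(size=(n, k))
# 5% MCAR missingness, independently on each of three variables.
data[rng.random((n, k)) < 0.05] = np.nan

# Listwise deletion: keep only fully observed rows.
complete = data[~np.isnan(data).any(axis=1)]
print(f"listwise deletion keeps {complete.shape[0]} of {n} rows")

def pairwise_corr(a, i, j):
    """Correlation of columns i and j using pairwise-complete rows only."""
    ok = ~np.isnan(a[:, i]) & ~np.isnan(a[:, j])
    return np.corrcoef(a[ok, i], a[ok, j])[0, 1], ok.sum()

# Pairwise deletion retains at least as many rows for any single pair,
# but different pairs rest on different row counts (and different n's).
r, n_used = pairwise_corr(data, 0, 1)
print(f"pairwise r(0,1) uses {n_used} rows")
assert n_used >= complete.shape[0]
```

The row counts illustrate the direction of Enders' (2001) observation: even modest per-variable missingness compounds across variables under listwise deletion, while pairwise deletion trades that loss for statistics computed on inconsistent sample sizes.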

According to Enders (2001), Peugh and Enders (2004), Baraldi and Enders (2010), and Thoemmes and Mohan (2015), successful use of listwise deletion requires the strict assumption of MCAR and, according to Cheema (2014), is likely to function only at lower levels of missingness. Newman (2014) stated that records converted from item to record level missingness subsequently become more difficult to restore due to the lack of information available from which to make estimates. Widaman (2006) furthered this argument, suggesting that additional biases may be introduced by the use of listwise deletion, especially if the deleted cases have something in common. This agreed with Rubin (2003), who indicated that subjects with missing information should be considered a subset of the population. In a study on predictive risk factors in the intensive care unit (ICU), Chevret, Seaman, and Resche-Rigon (2015) implied that listwise deletion can mask the possibility of excluded patients being MNAR, thus introducing potential biases which can erode the precision of the parameter estimator model. In limited support of listwise deletion, Schafer (1997) opined that its use at levels of missingness below 5% may be acceptable, but is only practical for univariate missing data. Pairwise deletion removes cases from individual cell pair comparisons where a subject's data are missing from one of the two variables. Although this method may appear more conserving of data, it introduces its own concerns. Roth et al. (1999), Baraldi and Enders (2010), and Newman (2014) indicated that each variable will contribute differing numbers of unmatched comparisons, such mismatches making total scale comparisons difficult while yielding inaccurate standard errors. As with listwise deletion, the discounting of blank cells by

pairwise deletion may bias parameter estimates if the emptiness of the cell itself forms part of the overall data profile. Although either method will necessarily erode statistical power simply by virtue of data loss, Widamin (2006), Newman (2014), and Thoemmes and Mohan (2015) agreed the losses are markedly greater with listwise deletion. This aligned with Little and Rubin (2002), who were dismissive of either method of complete case analysis except in very limited circumstances. Jones (1996) described deletion as wasteful of data, while Newman (2014) went further, decrying deliberate deletion of data as disrespectful of the human beings who gave their time and effort to provide the information.
Weighting
According to Little and Rubin (1989), weighting of cases to replace missing data involves discarding records with any missing components and assigning a weighting value to the remaining intact cases to compensate for those lost to deletion. Rubin (1987) opined that weighting cases in order to replace missing or discarded ones assumes all other values in the weighted case are similar to the replaced one, and further stated the assumptions become invalid when multivariate statistics are involved. Similarly, Little and Rubin (1989) noted that case weighting can be useful in limited instances and with monotone data patterns; however, using weightings will complicate computation of standard error estimates. Cheema (2014) argued more directly against weighting, and stated missing values due to item non-response "cannot be fixed by using weights" (p. 489).
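As a rough illustration of the weighting idea described by Little and Rubin (1989), the sketch below up-weights complete cases by the inverse of the within-stratum response rate. The data, the stratum variable, and the particular weighting scheme are illustrative assumptions, not a prescribed procedure:

```python
import numpy as np
import pandas as pd

# Hypothetical survey: response status varies by a fully observed stratum.
df = pd.DataFrame({
    "stratum": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "score":   [10.0, 12.0, np.nan, 14.0, 20.0, np.nan, np.nan, 22.0],
})

complete = df["score"].notna()

# Within each stratum, estimate the probability that a case is complete.
p_complete = df.groupby("stratum")["score"].transform(lambda s: s.notna().mean())

# Complete cases are up-weighted by the inverse of that probability, so the
# retained cases stand in for those discarded in their own stratum.
df.loc[complete, "weight"] = 1.0 / p_complete[complete]

weighted_mean = np.average(df.loc[complete, "score"],
                           weights=df.loc[complete, "weight"])
```

Here stratum b, with the lower response rate, receives the larger weights. The multivariate complications Rubin (1987) warned of arise because a single weight per case cannot be appropriate for every variable at once.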

Data Replacement
Replacement of missing values may be accomplished in myriad ways, using procedures of widely varying levels of complexity. Little and Rubin (2002) referred to explicit methods, where the missing value is replaced with an existing one, and implicit methods, where a value is inserted with the assumption of its compatibility.
Explicit Methods of Replacement
The most straightforward method for replacing missing values involves substituting, or imputing, the arithmetic mean. Although an easy procedure to implement, mean substitution creates several fundamental problems. Widamin (2006) stated that substituting the mean for an actual value missing from the sample pool contributes a zero measure of variance to parameter estimates. As occurrences of mean substitution increase in a sample, the estimated sample distribution will increasingly consolidate around the mean, resulting in underestimates of variance and standard errors and increases in the likelihood of Type I errors (Newman, 2014; Rubin, 1987). Widamin (2006) suggested that such narrowing of estimated distributions may also play havoc with estimates of correlation, which rely on overlaps between variables. Another issue with imputing the mean in place of missing observations was voiced by Rubin (1987), who suggested that each occurrence of mean imputation not only artificially reduces estimates of standard error, but the additional variable counts yield erroneous degrees of freedom measures, thus further biasing parameter estimates. Little and Rubin (2002) described mean imputation as a special case of regression imputation, essentially where the mean functions as a placeholder for missing observations. Not unlike mean substitution, regression imputation uses existing

values to predict the value of missing observations. Although this may seem a convenient way to plug an empty spot with what appears to be an ideal placeholder, Widamin (2006) suggested that regression imputation is a double-edged sword. Predictions made using regression lines produce "too good" (p. 52) to be true values which fall directly on the prediction line. This aligns with Azur, Stuart, Frangakis, and Leaf (2011), who cautioned about misleading "overly precise results" (p. 41) resulting from single imputations such as mean substitution. Again, the narrowed range of estimated standard error in such distributions will increase the chance of Type I error, or the "mirage" (Newman, 2014, p. 378) of an effect when none is present. For either of the above types of single imputation, where a surrogate missing value is fabricated from existing observations, both Widamin (2006) and Cheema (2014) raised doubts about validity. Widamin (2006) expressed concern that the real, unobserved values may actually be different from the imputed ones. This aligns with concerns expressed by Cheema (2014), who suggested that replacing missing values with what amounts to a guess at the mean raises the question of whether the estimated value is actually the mean at all. Rubin (2003) furthered the argument of representativeness, proposing that even if the collection of missing records has the same mean as those which have been observed, they may have a different distribution pattern. Rubin (1978) succinctly summarized the problem with single value imputation, stating, "imputing one value for a missing datum cannot be correct in general, because we don't know what value to impute with certainty (if we did, it wouldn't be missing)" (p. 21).
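The variance-shrinking effect of mean substitution described above is easy to demonstrate. The following sketch uses simulated normal data with values deleted completely at random; all parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated variable with roughly 30% of values deleted completely at random.
full = rng.normal(loc=50.0, scale=10.0, size=1000)
observed = full.copy()
observed[rng.random(1000) < 0.3] = np.nan

# Mean substitution: every hole is filled with the same value, which adds
# zero variance while inflating the apparent number of observations.
filled = np.where(np.isnan(observed), np.nanmean(observed), observed)

sd_observed = np.nanstd(observed, ddof=1)   # close to the true value of 10
sd_filled = np.std(filled, ddof=1)          # noticeably shrunk
```

With about 30% of cases imputed at the mean, the standard deviation shrinks by roughly a factor of sqrt(0.7): the consolidation around the mean that Widamin (2006) and Newman (2014) describe, while the mean itself is untouched.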

A third method of explicit replacement modeling, stochastic regression imputation, was described by Little and Rubin (2002) as a regression imputation "plus a residual, drawn to reflect uncertainty in the predicted value" (p. 60). Baraldi and Enders (2010) agreed that although the stochastic procedure does add an error term, it does so only once, yielding an artificially small confidence interval and a commensurate uptick in the likelihood of Type I error. The opposite concern was raised by Widamin (2006), who worried that the addition of a random component to the imputed value may yield an entry that is out of realistic range or otherwise implausible. Although their admonition that, to prevent this from happening, the researcher must be familiar with the characteristics of the original data appears to have been addressed by Little and Rubin's (2002) description of the component as a residual of the predicted value, their caution to remain mindful of potential unforeseen interactions does provide food for thought on overall context when choosing to replace missing data.
Implicit Methods
Hot deck imputation involves replacing a missing value with one copied from a different respondent with a similar response pattern from the same population. In a Monte Carlo simulation, Roth et al. (1999) counseled not to immediately discard the hot deck procedure, as it does actually introduce some level of variability. Schenker and Welsh (1988) also saw hot deck imputation as potentially useful when used with large sample sizes, relatively few covariates, and small numbers of possible values. Daniel and Kenward (2012) considered hot deck helpful under MAR conditions; however, they suggested that it may actually become a hindrance under MNAR, especially if other participant variates are actually different.

Although the schemata behind this fairly popular procedure can be rather complex (Little & Rubin, 2002), in the end the substitute value is at best only a static estimate of the true value (Rubin, 1987). Rubin (1987) suggested that straight draws from the hot deck may not be sufficient to capture the dynamic nature of sampling; instead, it was suggested to follow a "Bayesian paradigm" (p. 126) in which variability between draws is built into the model. Cold deck imputation is similar to hot deck except that the replacement is drawn from a comparable but preexisting data set (Little & Rubin, 2002). The preexisting and static nature of the data source sampled using cold deck imputation may be even further removed from the population from which the replacement is sought. Other replacement methods include straight substitution of one missing participant with another from the population; the zero imputation method of replacing missing values with zero; dummy variable adjustment; and last observation carried forward procedures, among others (Little & Rubin, 2002; Cheema, 2014; Barnes, Lindborg, & Seaman Jr., 2006). These methods are straightforward to implement, but can yield parameter estimates biased towards Type I errors. A serious problem with replacement techniques, according to Rubin (1987), is that although the substituted data are only estimates of the true values, they are almost always treated by researchers as real responses. The flaw lies in unwittingly making inferences and decisions with potentially inaccurate information. Peugh and Enders (2004) suggested all such ad hoc replacement techniques are fundamentally flawed and should be abandoned.
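A minimal hot deck sketch, assuming a single fully observed matching variable defines the adjustment classes (the variable names and values are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical respondents; "group" is a fully observed matching variable.
df = pd.DataFrame({
    "group":  ["a", "a", "a", "b", "b", "b"],
    "income": [30.0, 32.0, np.nan, 55.0, np.nan, 58.0],
})

# Hot deck: each missing value is replaced by a random draw from the observed
# donors in the same adjustment class (here, the same group).
for idx in df.index[df["income"].isna()]:
    same_class = (df["group"] == df.loc[idx, "group"]) & df["income"].notna()
    donors = df.loc[same_class, "income"].to_numpy()
    df.loc[idx, "income"] = rng.choice(donors)
```

The imputed value is always one actually observed in the data, which is why a single draw remains, at best, the static estimate of the true value that Rubin (1987) described.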

Contemporary Methods of Replacement: Multiple Imputation and Maximum Likelihood Estimation
Rubin (1978) and Rubin (1987) described a method, multiple imputation (MI), in which the researcher creates a series of copies of the original data, replacing individual missing observations with values randomly drawn from the existing data pool. The result is a series of similar but unique data sets, each representing a plausible version of the intact data set. The analysis procedures envisioned for the complete data are applied separately to each copy and the results pooled. The pooled estimates are then averaged to create final parameter estimates. Maximum likelihood (ML) methods such as expectation-maximization (EM) algorithms (Little & Rubin, 2002) and full information maximum likelihood (FIML) use existing information to essentially triangulate on unobserved values (Peugh & Enders, 2004). The resulting complete data set contains a mix of observed values and formerly empty cells occupied by a best guess at the real value.
Multiple Imputation
Rubin (1987), Little and Rubin (2002), and Carlin (2014) referred to multiple imputation as a Bayesian-grounded approach where information from prior sampling informs subsequent draws. Plausible imputation values and error terms are based upon knowledge of the existing data, with each sampling from the data set consisting of a random draw which, when recorded, is added back to the pool for use in subsequent draws. Unlike static imputation methods such as mean and regression imputation or hot and cold deck imputations, in MI the sample space updates with each imputation (Baraldi & Enders, 2010). Over subsequent sampling draws, the adaptive behavior of the MI process

yields a more distinct sample space, while preserving the element of randomness within the sampling pool. The random element allows, according to Rubin (1987), the imputed collection of values to represent "the uncertainty" (p. vii) about which of the group values corresponds to the actual missing item. Subsequent imputations may or may not more closely approximate the actual value of the missing item; however, van Buuren (2007) noted that the pooling of parameter estimates tends to offset the effects of any extreme values which may appear. MI resembles stochastic regression imputation (Baraldi & Enders, 2010; Enders, 2010). However, the potential for implausible entries (Widamin, 2006) and the increase in likelihood of Type I error (Baraldi & Enders, 2010) inherent in stochastic imputation are tempered by the multiple and likely divergent values returned by MI procedures. The aggregated effect of multiple imputations is in agreement with Widamin (2006) who, like van Buuren (2007), noted that the convergent behavior of MI confers the benefits of stochastic regression imputation while mitigating the effects of extreme values. Although multiple imputation by design adds a measure of uncertainty to parameter estimates, there is support for the scaffolding influence provided by the inclusion of covariates and/or auxiliary variables in the imputation model. Rubin (1978), Rubin (1996), and Crawford, Tennstedt, and McKinlay (1995) counseled using auxiliary variables as a form of scaffolding. Van Buuren (2007) suggested that including auxiliary variables may help support the ignorability assumption of missing data. This aligned with Schenker and Welsh (1988), who suggested the ignorability assumption becomes more plausible with the addition of covariates to help explain or contextualize

nonresponses, and Enders and Gottschall (2011), who advocated for carefully selected auxiliary variables to satisfy the MAR assumption (p. 37). Although Rubin (2003) indicated most MI models function under the assumption of ignorability, in reality, as noted earlier, nonresponse is rarely truly ignorable (Rubin, 1976; 1987). In an example of nonignorable missingness, Engels and Diehr (2003) proposed using participants' prior data to predict those subsequently lost to attrition in longitudinal studies. They eschewed use of MI, suggesting that the simulations which purport to prove the utility of MI methodology are conducted using unrealistic patterns of missingness. Although their allegation may appear superficially logical in the context of their focus, the conclusion was based on the extreme MNAR nature of their target participants. Comparing MI to data augmentation (DA), Merkle (2011) suggested that DA, essentially MI draws alternating with simulated (i.e., Monte Carlo) values, may in some circumstances function better than MI. They found DA returned less biased estimates with smaller sample sizes and higher levels of missingness. Of note is their discussion of the differences in the narrower context of factor analysis, with the procedure functioning for them in both imputation and analysis roles. They acknowledged, however, that this advantage is modest and most evident with the better specified DA imputation model. In essence, they conceded that the less precise nature of MI methodology may actually make it more tolerant of misspecified models, and thus of the inherent messiness of real-world data. In comparing MI to doubly robust estimators, Carpenter et al. (2006) appeared to contradict Merkle (2011). Carpenter et al. (2006) suggested MI works better with correctly specified models. Their model, based on inverse probabilities, essentially

weightings, more strongly supported the choice of parameter estimators and disallowed some of the apparent flexibility inherent in the uncertainty aspect of MI. This ran contrary to Rubin (1987), who suggested an advantage of MI was that the data manager no longer needed to risk the bias of misleadingly constrained values potentially conferred by the use of weightings to shore up holes in the data. Although limited support for use of weightings may be found in Little and Rubin (2002), Carpenter et al. (2006) undercut the utility of their version of weighting by offering caveats on the restrictive circumstances under which it may be valid. Hughes, Sterne, and Tilling (2014), not unlike Carpenter et al. (2006), suggested a mismatch between assumptions around the data set and the imputation model used can yield misleading results. They expressed concerns regarding overly precise estimates in the presence of heteroscedasticity, especially as sample size decreases. This aligned with Schafer (1997), who cautioned that an overly focused imputation model may restrict the utility of a data set. Hughes et al. (2014) discussed an estimator proposed by Robins and Wang (2000) which appeared more robust in the presence of specification and compatibility mismatches. Both Robins and Wang (2000) and Hughes et al. (2014) conceded the Robins and Wang (2000) estimator appeared to function best in the presence of such mismatches, and Robins and Wang (2000) further admitted their procedure may not function with smaller data sets. Steele, Wang, and Rafferty (2010) also proposed a method of enhancing the imputations with samples taken from a simulated normal posterior distribution. Although

this appeared to narrow their confidence intervals, they indicated their method for improving upon Rubin (1987) had limited utility. Kim (2004) noted the Rubin (1987) estimators carried large biases with smaller sample sizes in linear regression applications. They expressed concern in instances of stratified data, suggesting that such further subdivision would exacerbate biases. They proposed use of an estimator which, not unlike a logarithmic coefficient, seemed to better attenuate variance fluctuations with smaller sample sizes. In line with Kim (2004), Barnard and Rubin (1999) suggested an adjustment to reduce the biases inherent in smaller sample sizes, as did Graham and Schafer (1999), who suggested transforming variables back to their original scale afterwards (p. 9). Although Kim (2004) showed their estimator increasing efficiency and shrinking biases in a simulation study, on closer inspection the most notable differences begin to emerge only with the 40% response rate condition. This raised the question of the practicality of the Kim (2004) variant, given that Rubin (2003) suggested MI can work, but was not intended to, at levels of missingness above 50%, due to concerns of loss of efficiency and validity. In comparing MI to ML in small univariate normal samples, von Hippel (2013) argued against MI on the grounds of inefficiency and bias. This aligns with Yuan, Yang-Wallentin, and Bentler (2012), who stated that MI will tend to return biased, less efficient estimates than ML in the presence of violations of distribution conditions. Their suggestion that MI may fare better than ML with smaller sample sizes when performed under conditions of a properly specified posterior distribution agreed with Carpenter et al. (2006) and their belief that MI functions better with more precisely specified models.
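The MI workflow described in this section (create m completed copies, analyze each, and pool) can be sketched as follows. The imputation model here, random draws from the observed values, is deliberately crude and serves only to show the pooling step, in which Rubin's (1987) rules inflate the between-imputation variance by (1 + 1/m):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative incomplete sample; the analysis of interest is simply the mean.
data = np.array([4.1, 5.0, np.nan, 6.2, np.nan, 5.5, 4.8, np.nan, 5.9, 5.1])
obs = data[~np.isnan(data)]
m = 20                                    # number of imputations

means, variances = [], []
for _ in range(m):
    completed = data.copy()
    # Fill the holes with random draws from the observed pool; a deliberately
    # crude imputation model, used here only to demonstrate the pooling step.
    completed[np.isnan(completed)] = rng.choice(obs, size=int(np.isnan(data).sum()))
    means.append(completed.mean())
    variances.append(completed.var(ddof=1) / completed.size)  # var. of the mean

# Rubin's rules: average the estimates, then combine within- and
# between-imputation variance, inflating the latter by (1 + 1/m).
q_bar = float(np.mean(means))             # pooled point estimate
u_bar = float(np.mean(variances))         # within-imputation variance
b = float(np.var(means, ddof=1))          # between-imputation variance
total_var = u_bar + (1.0 + 1.0 / m) * b
```

The between-imputation component b is exactly what single imputation discards; its presence in total_var is the "measure of uncertainty" attributed to MI above.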


ISPOR Task Force Report: ITC & NMA Study Questionnaire

ISPOR Task Force Report: ITC & NMA Study Questionnaire INDIRECT TREATMENT COMPARISON / NETWORK META-ANALYSIS STUDY QUESTIONNAIRE TO ASSESS RELEVANCE AND CREDIBILITY TO INFORM HEALTHCARE DECISION-MAKING: AN ISPOR-AMCP-NPC GOOD PRACTICE TASK FORCE REPORT DRAFT

More information

Methods for Computing Missing Item Response in Psychometric Scale Construction

Methods for Computing Missing Item Response in Psychometric Scale Construction American Journal of Biostatistics Original Research Paper Methods for Computing Missing Item Response in Psychometric Scale Construction Ohidul Islam Siddiqui Institute of Statistical Research and Training

More information

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian Recovery of Marginal Maximum Likelihood Estimates in the Two-Parameter Logistic Response Model: An Evaluation of MULTILOG Clement A. Stone University of Pittsburgh Marginal maximum likelihood (MML) estimation

More information

Missing by Design: Planned Missing-Data Designs in Social Science

Missing by Design: Planned Missing-Data Designs in Social Science Research & Methods ISSN 1234-9224 Vol. 20 (1, 2011): 81 105 Institute of Philosophy and Sociology Polish Academy of Sciences, Warsaw www.ifi span.waw.pl e-mail: publish@ifi span.waw.pl Missing by Design:

More information

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Educational Psychology Papers and Publications Educational Psychology, Department of 7-1-2001 The Relative Performance of

More information

The RoB 2.0 tool (individually randomized, cross-over trials)

The RoB 2.0 tool (individually randomized, cross-over trials) The RoB 2.0 tool (individually randomized, cross-over trials) Study design Randomized parallel group trial Cluster-randomized trial Randomized cross-over or other matched design Specify which outcome is

More information

Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy

Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy Jee-Seon Kim and Peter M. Steiner Abstract Despite their appeal, randomized experiments cannot

More information

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha attrition: When data are missing because we are unable to measure the outcomes of some of the

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

PubH 7405: REGRESSION ANALYSIS. Propensity Score

PubH 7405: REGRESSION ANALYSIS. Propensity Score PubH 7405: REGRESSION ANALYSIS Propensity Score INTRODUCTION: There is a growing interest in using observational (or nonrandomized) studies to estimate the effects of treatments on outcomes. In observational

More information

Data Analysis Using Regression and Multilevel/Hierarchical Models

Data Analysis Using Regression and Multilevel/Hierarchical Models Data Analysis Using Regression and Multilevel/Hierarchical Models ANDREW GELMAN Columbia University JENNIFER HILL Columbia University CAMBRIDGE UNIVERSITY PRESS Contents List of examples V a 9 e xv " Preface

More information

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods Dean Eckles Department of Communication Stanford University dean@deaneckles.com Abstract

More information

Using Test Databases to Evaluate Record Linkage Models and Train Linkage Practitioners

Using Test Databases to Evaluate Record Linkage Models and Train Linkage Practitioners Using Test Databases to Evaluate Record Linkage Models and Train Linkage Practitioners Michael H. McGlincy Strategic Matching, Inc. PO Box 334, Morrisonville, NY 12962 Phone 518 643 8485, mcglincym@strategicmatching.com

More information

Cochrane Pregnancy and Childbirth Group Methodological Guidelines

Cochrane Pregnancy and Childbirth Group Methodological Guidelines Cochrane Pregnancy and Childbirth Group Methodological Guidelines [Prepared by Simon Gates: July 2009, updated July 2012] These guidelines are intended to aid quality and consistency across the reviews

More information

Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, 2nd Ed.

Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, 2nd Ed. Eric Vittinghoff, David V. Glidden, Stephen C. Shiboski, and Charles E. McCulloch Division of Biostatistics Department of Epidemiology and Biostatistics University of California, San Francisco Regression

More information

CFSD 21 st Century Learning Rubric Skill: Critical & Creative Thinking

CFSD 21 st Century Learning Rubric Skill: Critical & Creative Thinking Comparing Selects items that are inappropriate to the basic objective of the comparison. Selects characteristics that are trivial or do not address the basic objective of the comparison. Selects characteristics

More information

Analysis of TB prevalence surveys

Analysis of TB prevalence surveys Workshop and training course on TB prevalence surveys with a focus on field operations Analysis of TB prevalence surveys Day 8 Thursday, 4 August 2011 Phnom Penh Babis Sismanidis with acknowledgements

More information

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis EFSA/EBTC Colloquium, 25 October 2017 Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis Julian Higgins University of Bristol 1 Introduction to concepts Standard

More information

Cross-Lagged Panel Analysis

Cross-Lagged Panel Analysis Cross-Lagged Panel Analysis Michael W. Kearney Cross-lagged panel analysis is an analytical strategy used to describe reciprocal relationships, or directional influences, between variables over time. Cross-lagged

More information

VALIDITY OF QUANTITATIVE RESEARCH

VALIDITY OF QUANTITATIVE RESEARCH Validity 1 VALIDITY OF QUANTITATIVE RESEARCH Recall the basic aim of science is to explain natural phenomena. Such explanations are called theories (Kerlinger, 1986, p. 8). Theories have varying degrees

More information

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES Sawtooth Software RESEARCH PAPER SERIES The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? Dick Wittink, Yale University Joel Huber, Duke University Peter Zandan,

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Appendix 1. Sensitivity analysis for ACQ: missing value analysis by multiple imputation

Appendix 1. Sensitivity analysis for ACQ: missing value analysis by multiple imputation Appendix 1 Sensitivity analysis for ACQ: missing value analysis by multiple imputation A sensitivity analysis was carried out on the primary outcome measure (ACQ) using multiple imputation (MI). MI is

More information

Psychology 205, Revelle, Fall 2014 Research Methods in Psychology Mid-Term. Name:

Psychology 205, Revelle, Fall 2014 Research Methods in Psychology Mid-Term. Name: Name: 1. (2 points) What is the primary advantage of using the median instead of the mean as a measure of central tendency? It is less affected by outliers. 2. (2 points) Why is counterbalancing important

More information

Evaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY

Evaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY 2. Evaluation Model 2 Evaluation Models To understand the strengths and weaknesses of evaluation, one must keep in mind its fundamental purpose: to inform those who make decisions. The inferences drawn

More information

Multiple imputation for handling missing outcome data when estimating the relative risk

Multiple imputation for handling missing outcome data when estimating the relative risk Sullivan et al. BMC Medical Research Methodology (2017) 17:134 DOI 10.1186/s12874-017-0414-5 RESEARCH ARTICLE Open Access Multiple imputation for handling missing outcome data when estimating the relative

More information

Running head: SELECTION OF AUXILIARY VARIABLES 1. Selection of auxiliary variables in missing data problems: Not all auxiliary variables are

Running head: SELECTION OF AUXILIARY VARIABLES 1. Selection of auxiliary variables in missing data problems: Not all auxiliary variables are Running head: SELECTION OF AUXILIARY VARIABLES 1 Selection of auxiliary variables in missing data problems: Not all auxiliary variables are created equal Felix Thoemmes Cornell University Norman Rose University

More information

What to do with missing data in clinical registry analysis?

What to do with missing data in clinical registry analysis? Melbourne 2011; Registry Special Interest Group What to do with missing data in clinical registry analysis? Rory Wolfe Acknowledgements: James Carpenter, Gerard O Reilly Department of Epidemiology & Preventive

More information

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY Chapter 2- Methods of Psychology OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology -observation, case study, surveys, psychological tests, experimentation

More information

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis Advanced Studies in Medical Sciences, Vol. 1, 2013, no. 3, 143-156 HIKARI Ltd, www.m-hikari.com Detection of Unknown Confounders by Bayesian Confirmatory Factor Analysis Emil Kupek Department of Public

More information

SESUG Paper SD

SESUG Paper SD SESUG Paper SD-106-2017 Missing Data and Complex Sample Surveys Using SAS : The Impact of Listwise Deletion vs. Multiple Imputation Methods on Point and Interval Estimates when Data are MCAR, MAR, and

More information

Chapter 11. Experimental Design: One-Way Independent Samples Design

Chapter 11. Experimental Design: One-Way Independent Samples Design 11-1 Chapter 11. Experimental Design: One-Way Independent Samples Design Advantages and Limitations Comparing Two Groups Comparing t Test to ANOVA Independent Samples t Test Independent Samples ANOVA Comparing

More information

Chapter 3 Missing data in a multi-item questionnaire are best handled by multiple imputation at the item score level

Chapter 3 Missing data in a multi-item questionnaire are best handled by multiple imputation at the item score level Chapter 3 Missing data in a multi-item questionnaire are best handled by multiple imputation at the item score level Published: Eekhout, I., de Vet, H.C.W., Twisk, J.W.R., Brand, J.P.L., de Boer, M.R.,

More information

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned?

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? BARRY MARKOVSKY University of South Carolina KIMMO ERIKSSON Mälardalen University We appreciate the opportunity to comment

More information

Evaluators Perspectives on Research on Evaluation

Evaluators Perspectives on Research on Evaluation Supplemental Information New Directions in Evaluation Appendix A Survey on Evaluators Perspectives on Research on Evaluation Evaluators Perspectives on Research on Evaluation Research on Evaluation (RoE)

More information

Measuring and Assessing Study Quality

Measuring and Assessing Study Quality Measuring and Assessing Study Quality Jeff Valentine, PhD Co-Chair, Campbell Collaboration Training Group & Associate Professor, College of Education and Human Development, University of Louisville Why

More information

Recognizing Ambiguity

Recognizing Ambiguity Recognizing Ambiguity How Lack of Information Scares Us Mark Clements Columbia University I. Abstract In this paper, I will examine two different approaches to an experimental decision problem posed by

More information

European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)

European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) Page 1 of 14 European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) COMMENTS ON DRAFT FDA Guidance for Industry - Non-Inferiority Clinical Trials Rapporteur: Bernhard Huitfeldt (bernhard.huitfeldt@astrazeneca.com)

More information

Remarks on Bayesian Control Charts

Remarks on Bayesian Control Charts Remarks on Bayesian Control Charts Amir Ahmadi-Javid * and Mohsen Ebadi Department of Industrial Engineering, Amirkabir University of Technology, Tehran, Iran * Corresponding author; email address: ahmadi_javid@aut.ac.ir

More information

Checking the counterarguments confirms that publication bias contaminated studies relating social class and unethical behavior

Checking the counterarguments confirms that publication bias contaminated studies relating social class and unethical behavior 1 Checking the counterarguments confirms that publication bias contaminated studies relating social class and unethical behavior Gregory Francis Department of Psychological Sciences Purdue University gfrancis@purdue.edu

More information

Vocabulary. Bias. Blinding. Block. Cluster sample

Vocabulary. Bias. Blinding. Block. Cluster sample Bias Blinding Block Census Cluster sample Confounding Control group Convenience sample Designs Experiment Experimental units Factor Level Any systematic failure of a sampling method to represent its population

More information

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data Karl Bang Christensen National Institute of Occupational Health, Denmark Helene Feveille National

More information

CHAPTER 4 RESULTS. In this chapter the results of the empirical research are reported and discussed in the following order:

CHAPTER 4 RESULTS. In this chapter the results of the empirical research are reported and discussed in the following order: 71 CHAPTER 4 RESULTS 4.1 INTRODUCTION In this chapter the results of the empirical research are reported and discussed in the following order: (1) Descriptive statistics of the sample; the extraneous variables;

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

Considerations for requiring subjects to provide a response to electronic patient-reported outcome instruments

Considerations for requiring subjects to provide a response to electronic patient-reported outcome instruments Introduction Patient-reported outcome (PRO) data play an important role in the evaluation of new medical products. PRO instruments are included in clinical trials as primary and secondary endpoints, as

More information

ASSESSING THE EFFECTS OF MISSING DATA. John D. Hutcheson, Jr. and James E. Prather, Georgia State University

ASSESSING THE EFFECTS OF MISSING DATA. John D. Hutcheson, Jr. and James E. Prather, Georgia State University ASSESSING THE EFFECTS OF MISSING DATA John D. Hutcheson, Jr. and James E. Prather, Georgia State University Problems resulting from incomplete data occur in almost every type of research, but survey research

More information

Using multiple imputation to mitigate the effects of low examinee motivation on estimates of student learning

Using multiple imputation to mitigate the effects of low examinee motivation on estimates of student learning James Madison University JMU Scholarly Commons Dissertations The Graduate School Spring 2017 Using multiple imputation to mitigate the effects of low examinee motivation on estimates of student learning

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

26:010:557 / 26:620:557 Social Science Research Methods

26:010:557 / 26:620:557 Social Science Research Methods 26:010:557 / 26:620:557 Social Science Research Methods Dr. Peter R. Gillett Associate Professor Department of Accounting & Information Systems Rutgers Business School Newark & New Brunswick 1 Overview

More information

Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA

Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA The uncertain nature of property casualty loss reserves Property Casualty loss reserves are inherently uncertain.

More information

Review of Pre-crash Behaviour in Fatal Road Collisions Report 1: Alcohol

Review of Pre-crash Behaviour in Fatal Road Collisions Report 1: Alcohol Review of Pre-crash Behaviour in Fatal Road Collisions Research Department Road Safety Authority September 2011 Contents Executive Summary... 3 Introduction... 4 Road Traffic Fatality Collision Data in

More information

HPS301 Exam Notes- Contents

HPS301 Exam Notes- Contents HPS301 Exam Notes- Contents Week 1 Research Design: What characterises different approaches 1 Experimental Design 1 Key Features 1 Criteria for establishing causality 2 Validity Internal Validity 2 Threats

More information

Multivariable Systems. Lawrence Hubert. July 31, 2011

Multivariable Systems. Lawrence Hubert. July 31, 2011 Multivariable July 31, 2011 Whenever results are presented within a multivariate context, it is important to remember that there is a system present among the variables, and this has a number of implications

More information

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

Supplement 2. Use of Directed Acyclic Graphs (DAGs) Supplement 2. Use of Directed Acyclic Graphs (DAGs) Abstract This supplement describes how counterfactual theory is used to define causal effects and the conditions in which observed data can be used to

More information

Quality Digest Daily, March 3, 2014 Manuscript 266. Statistics and SPC. Two things sharing a common name can still be different. Donald J.

Quality Digest Daily, March 3, 2014 Manuscript 266. Statistics and SPC. Two things sharing a common name can still be different. Donald J. Quality Digest Daily, March 3, 2014 Manuscript 266 Statistics and SPC Two things sharing a common name can still be different Donald J. Wheeler Students typically encounter many obstacles while learning

More information

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Number XX An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Prepared for: Agency for Healthcare Research and Quality U.S. Department of Health and Human Services 54 Gaither

More information

CHAMP: CHecklist for the Appraisal of Moderators and Predictors

CHAMP: CHecklist for the Appraisal of Moderators and Predictors CHAMP - Page 1 of 13 CHAMP: CHecklist for the Appraisal of Moderators and Predictors About the checklist In this document, a CHecklist for the Appraisal of Moderators and Predictors (CHAMP) is presented.

More information

WELCOME! Lecture 11 Thommy Perlinger

WELCOME! Lecture 11 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression

More information