Challenges and Implications of Missing Data on the Validity of Inferences and Options for Choosing the Right Strategy in Handling Them


International Journal of Statistical Distributions and Applications 2017; 3(4): doi: /j.ijsd ISSN: (Print); ISSN: (Online)

Nicholas Pindar Dibal 1, *, Ray Okafor 2, Hamadu Dallah 3

1 Department of Mathematical Sciences, University of Maiduguri, Maiduguri, Nigeria
2 Department of Mathematics, University of Lagos, Lagos, Nigeria
3 Department of Actuarial Science, University of Lagos, Lagos, Nigeria

Email address: pndibal@gmail.com (N. P. Dibal), okaforray@yahoo.com (R. Okafor), dallara2014@gmail.com (H. Dallah)

To cite this article: Nicholas Pindar Dibal, Ray Okafor, Hamadu Dallah. Challenges and Implications of Missing Data on the Validity of Inferences and Options for Choosing the Right Strategy in Handling Them. International Journal of Statistical Distributions and Applications. Vol. 3, No. 4, 2017, pp doi: /j.ijsd

Received: September 19, 2017; Accepted: October 17, 2017; Published: November 20, 2017

Abstract: Missing data are a common occurrence in surveys and experimental research and have serious implications for the validity of inferences. Advances in statistical procedures provide better and more efficient methods of handling missing data, yet many researchers still handle incomplete data in ways that affect the results negatively. We review in detail the mechanisms that generate missingness and the appropriate methods to account for the missing values, so that the researcher has adequate knowledge to make an informed decision on the choice of method.

Keywords: Missing Data, Inference, Missingness Mechanisms, Ignorable, Non-Ignorable Missingness, Multiple Imputation

1. Introduction

Missing values create serious problems for researchers in the field and for other data users during statistical analysis and data interpretation, especially when the assumption of ignorability of the missing data does not hold, because most statistical methods are not designed to handle incomplete data ([33]; [28]; [18]). Improper methods of accounting for missing data usually result in biased estimates ([29]; [36]; [42]). Missing data are unobserved values; they occur when the data we intended to measure could not be measured for some reason, leaving us with incomplete data. Missing data are common in surveys and experiments when something goes wrong for unanticipated reasons. Data may be missing for a wide range of reasons, some of which can be partially controlled by the researcher while others cannot. The ethical imperative that participation in any study is voluntary means that participants are free to skip any question on issues sensitive to them, or even to withdraw from the study whenever they wish, thereby increasing the possibility of missing values [1].

Missing values are unavoidable in many instances, and their occurrence distorts the representativeness of the sample, thereby affecting inferences about the population of study. Incomplete data signify that some information about the population parameter is missing, which is likely to influence the validity of results ([44]; [38]; [39]). Deciding how to account for missing values is a challenging task that requires a good understanding of why the data went missing and of the pattern of missingness. Unethical and unprofessional practices have led many to fill in the missing values with zero, assuming that unobserved values are equivalent to zero, or to replace them with arbitrary values. Others simply resort to the default option in some statistical software, which deletes units with missing values; however, the assumption behind this default does not apply to all types of missing data.

Incomplete data pose great challenges in analysis and can result in invalid inferences, hence missing values should be handled with great care. In many surveys and experiments, researchers rarely have detailed information about why parts of the data are missing; hence, many rely on the default option in some statistical software to account for missing data without regard to what caused the missingness ([32]). Incomplete data yield results that do not adequately describe the population of interest ([4]; [1]; [11]; [15]; [25]; [22]). Accounting for missing values sometimes takes longer than necessary because the missingness mechanisms are not properly understood ([32]).

In [31], several journals were reviewed to assess how missing values were handled in research. For the 1087 studies in 918 articles with quantitative components, attention was given to the sample size, the df of the test statistics reported, and how missing data were treated. The survey showed that out of 1087 studies, 305 (28%) made no report on missing data, 587 (54%) showed evidence of missing data, and the remaining 195 (18%) did not provide sufficient information to determine whether or not the data used were incomplete. Of the studies with missing values, 569 (97%) reported dealing with the problem; of these, 509 (89.5%) used the listwise deletion (LD) method, 43 (7.6%) used the pairwise deletion (PD) method, and only 2.9% used other methods to account for the missing data.

The overuse of listwise deletion indicated by the survey is possibly due to its ease of application. Being the default option in most statistical software, it comes in handy for researchers who do not know which method to use. Listwise deletion reduces statistical power because it reduces the sample size ([5]). Even when the statistical power of a test is not of interest, the accuracy of predicted values may still be affected. For instance, if 10% of the values are eliminated at random and independently from each of 5 variables in a dataset, only about 59% of the cases remain complete, so roughly 41% of the observations are lost under listwise deletion, seriously reducing the statistical power of the test ([26]; [14]).
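The arithmetic behind this example can be checked directly. The sketch below assumes that values are removed independently across variables at a common rate, in which case the expected share of complete cases is (1 - 0.10)^5, about 0.59, so about 41% of cases are dropped by listwise deletion. The function name is hypothetical and chosen only for illustration.

```python
def complete_case_fraction(miss_rate: float, n_vars: int) -> float:
    """Expected share of cases with no missing value, assuming each of
    n_vars variables loses values independently at rate miss_rate."""
    return (1.0 - miss_rate) ** n_vars


# 10% missing on each of 5 variables, as in the example in the text.
retained = complete_case_fraction(0.10, 5)
lost = 1.0 - retained
print(f"complete cases retained: {retained:.3f}")
print(f"cases lost to listwise deletion: {lost:.3f}")
```

If missingness tends to cluster in the same cases, fewer cases are lost than this independence calculation suggests, so the figure is best read as a rough guide.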
Other recent methods for handling missing data include the work of [12], which uses quantile regression to replace the missing values; the method combines quantile regression imputation with general estimating equation methods and has competitive advantages over some of the most widely used parametric and non-parametric imputation estimators. [41] also reviewed methods of handling missing data, with emphasis on the application of imputation methods.

2. Methodology

The necessary steps and procedures to account for missing values are outlined in [22] and summarized as follows: i) understand the analytic objective and identify the data structure and study design; ii) make appropriate assumptions about the missing data mechanism; iii) identify the variables and construct the imputation model. In developing the model to handle missing data, it is important to balance sophistication, feasibility of models and achievable results.

To account adequately for missing data, descriptive analysis should be performed on each variable to distinguish the missing data pattern in the data matrix ([17]; [37]). The missing data patterns are useful in determining whether the survey was administered and entered correctly; they also help in choosing variables for inclusion in the imputation model and analyses. [37] developed a SAS 9.2 macro which identifies the missing data pattern faster in four ways, by evaluating: the proportion of subjects with each pattern of missing data, the number and percentage of missing data for each individual variable, the concordance of missingness in any pair of variables, and possible unit nonresponse. [32] suggested that at the preliminary and diagnostic stages of data preprocessing, a statement giving the range of missing data, such as "missing data ranged from a low of 2% for Cancer to a high of 12% for HIV/AIDS", together with any other relevant information, should be included and presented in a tabular format as in Table 1.

Table 1. Summary of Missing Units on Each Variable across Seven (7) Communities.
Variable        Reported Cases    Mean    Std. Error    Missing Cases    % of Missing Cases
Cancer
Malaria
Tuberculosis
HIV/AIDS
Pneumonia

The performance of the different methods of handling missing data should be evaluated so that an informed decision can be made about which method to use; in doing so, the variables and the scale of measurement should be considered. It should be noted, however, that no single method works best in all situations. Graphical plots, where applicable, can also be used to help identify missing data patterns and to assess how many values are missing (there is no strict guideline on how much is too much). If no pattern can be found by mere visualisation, a randomness test is suggested.

Assumptions About Missing Data

Data may be missing for many different reasons, such as accidentally skipping an item, wrong data entry, lacking knowledge of the issue, becoming frustrated and losing interest in the whole exercise, refusing to respond to a particular question for some reason, or relocating to another city and so being unable to continue with the on-going research; data may also be missing by design ([2]). Whichever method is chosen to account for missing data, it is important to understand why and how the values went missing in the first place ([44]; [40]; [43]). Missing data can be thought of as being caused in one or some combination of ways, which [20] outlined as: random processes, processes that are measured, and processes that are not measurable. [22] pointed out that identifying the exact mechanism that generates the missing values is helpful in choosing the appropriate method for handling missing data.

Setup and Notations

The notation and setup for the missingness mechanism are as follows. The complete dataset is denoted by $Y = (Y_{obs}, Y_{mis})$, where $Y_{obs}$ represents the observed part of the data and $Y_{mis}$ the missing part. Let the data matrix with $n$ cases for $p$ variables be denoted by $Y = (y_{ij})$. It is also assumed that $R = (r_{ij})$ is a matrix of dummy variables that mirrors the data matrix; for this response matrix, the missingness indicator is given as

$$r_{ij} = \begin{cases} 1, & \text{if } y_{ij} \text{ is observed} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

A missing value is then defined as follows. Let $x_i = \mathrm{NA}$ denote that the $i$th observation on a variable $X$ is not available (NA). The probability of not observing the variable $X$ lying between two specified numbers $a$ and $b$ for the $i$th observation, written $\Pr_{mis}\{x_i \in (a,b)\}$, is given by the following conditional probability:

$$\Pr\nolimits_{mis}\{x_i \in (a,b)\} = \Pr\{x_i = \mathrm{NA} \mid x_i \in (a,b)\} \tag{2}$$

In view of this probability, [28] and [7] described the mechanisms which cause the data to be missing more formally as falling into one of the following three categories according to their dependency structure: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).

Missing Completely at Random (MCAR)

Data are said to be missing completely at random (MCAR) when the probability that a value is missing on one variable is unrelated to the unit's score on that variable or on any other variable, whether the other variables are observed or not; that is, $P(R \mid Y_{obs}, Y_{mis}) = P(R)$. The MCAR assumption is stringent, often unreasonable, and rarely holds in real-life situations because missingness is usually triggered by other variables in the dataset ([36]; [1]). The assumption may be reasonable when values are missing by design ([20]; [19]). Missingness is completely random only when the probability $\Pr_{mis}\{x_i \in (a,b)\}$ is unrelated to the value of the variable $X$ (or to the value of any other variable in the dataset) in the $i$th observation, that is,

$$\forall i: \ \Pr\nolimits_{mis}\{x_i \in (a,b)\} = \Pr\{x_i = \mathrm{NA}\} \tag{3}$$

where $i \in \{1, 2, \ldots, n\}$ and $\Pr\{x_i = \mathrm{NA}\}$ is the probability of the observation $x_i$ being missing regardless of its value.
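The MCAR condition says that the chance of a value being missing is the same whatever interval the value falls in. A minimal simulation sketch of this idea follows; it assumes a normally distributed variable, a hypothetical missingness probability of 0.2, and the convention that the indicator equals 1 for an observed value and 0 for a missing one. The check is only possible in a simulation, where the values behind the missing cells are known.

```python
import random

random.seed(2017)  # fixed seed so the sketch is reproducible

n = 20000
miss_prob = 0.2  # hypothetical MCAR missingness probability (an assumption)

# Simulated variable X and its response indicator r (1 = observed, 0 = missing).
x = [random.gauss(50, 10) for _ in range(n)]
r = [0 if random.random() < miss_prob else 1 for _ in range(n)]


def missing_rate_in(a, b):
    """Empirical Pr(x_i = NA | a <= x_i < b), using the simulated true values."""
    flags = [ri for xi, ri in zip(x, r) if a <= xi < b]
    return 1 - sum(flags) / len(flags)


overall = 1 - sum(r) / n  # empirical Pr(x_i = NA)
intervals = [(20, 45), (45, 55), (55, 80)]
rates = {iv: missing_rate_in(*iv) for iv in intervals}

for iv, rate in rates.items():
    print(f"interval {iv}: missing rate {rate:.3f}")
print(f"overall missing rate: {overall:.3f}")
```

Under MCAR the interval-specific rates should all be close to the overall rate, apart from sampling noise.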
The idea of values missing completely at random appears in virtually every technical paper on missing values; this missingness mechanism can be assessed using Little's MCAR test ([27]).

Missing at Random (MAR)

Data are said to be missing at random (MAR) when the probability of missingness does not depend on the unobserved data ($Y_{mis}$) but depends on the available information ($Y_{obs}$); equivalently, the missing values are not randomly distributed across all observations but are randomly distributed within one or more subsamples. The MAR assumption is less stringent than the MCAR assumption ([34]); it depends on observed values and can be justified by including auxiliary variables that either explain why values are missing or predict the scores for the missing values. In reality, few auxiliary variables can both predict the missing values and explain why the values went missing. The condition $P(R \mid Y_{obs}, Y_{mis}) = P(R \mid Y_{obs})$ says that the probability that $X$ is not observed in the interval $(a,b)$, written $\Pr_{mis}\{x_i \in (a,b)\}$, does not depend on the value of the variable $X$ after controlling for another variable $Y$ in the dataset; therefore

$$\forall i \in \{i : y_i = c\}: \ \Pr\nolimits_{mis}\{x_i \in (a,b)\} = \Pr\{x_i = \mathrm{NA}\} \tag{4}$$

where $\{i : y_i = c\}$ is the subset of those observations $y_i$ on the variable $Y$ in which $Y$ is equal to some constant value $c$. The MAR condition is sometimes referred to as ignorable missingness because unbiased parameter estimates can be obtained using direct maximum likelihood (DML) or multiple imputation (MI) without the need to incorporate an explicit model that explains why the data are missing ([15]).

Missing Not at Random (MNAR)

The missing not at random (MNAR) assumption is the most problematic among the missingness mechanisms, as the missing values are not randomly distributed across all the observations, nor are they randomly distributed within any subset(s) that can be drawn from the given dataset ([7]).
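The practical difference between the three mechanisms can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not a method from the paper: the data-generating model, the missingness probabilities and the function names are all hypothetical. It compares the complete-case mean of X with a simple regression-based estimate that uses a fully observed variable Y.

```python
import random
import statistics

random.seed(42)  # fixed seed for reproducibility
n = 20000

# Y is fully observed; X is correlated with Y and is the variable that loses values.
y = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * yi + random.gauss(0, 0.6) for yi in y]
true_mean = statistics.fmean(x)


def keep_mask(p_missing):
    """True where X stays observed; p_missing(i) is the probability
    that X is missing for case i."""
    return [random.random() >= p_missing(i) for i in range(n)]


def complete_case_mean(mask):
    """Mean of X over the cases where X is observed."""
    return statistics.fmean([xi for xi, keep in zip(x, mask) if keep])


def regression_mean(mask):
    """Fit X on Y by least squares using observed cases, then predict
    the mean of X at the mean of Y over all cases."""
    xs = [xi for xi, keep in zip(x, mask) if keep]
    ys = [yi for yi, keep in zip(y, mask) if keep]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (b - my) ** 2 for b in ys
    )
    intercept = mx - slope * my
    return intercept + slope * statistics.fmean(y)


masks = {
    "MCAR": keep_mask(lambda i: 0.3),                       # constant probability
    "MAR": keep_mask(lambda i: 0.6 if y[i] > 0 else 0.1),   # depends on observed Y
    "MNAR": keep_mask(lambda i: 0.6 if x[i] > 0 else 0.1),  # depends on X itself
}
results = {
    name: (complete_case_mean(m), regression_mean(m)) for name, m in masks.items()
}

print(f"true mean of X: {true_mean:.3f}")
for name, (cc, reg) in results.items():
    print(f"{name}: complete-case mean {cc:.3f}, regression-based mean {reg:.3f}")
```

In this setup the complete-case mean is close to the truth only under MCAR. Under MAR, using the observed Y recovers the mean, which is the sense in which the mechanism is ignorable; under MNAR neither estimate is reliable.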
The probability of missingness cannot be easily predicted from the variables in the model; that is, $P(R \mid Y_{obs}, Y_{mis})$ cannot be quantified or simplified, since the missingness depends on the missing value itself. The probability that $X$ is not observed in the interval $(a,b)$, written $\Pr_{mis}\{x_i \in (a,b)\}$, depends on the unobserved value of the variable $X$, i.e.,

$$\forall i: \ \Pr\nolimits_{mis}\{x_i \in (a,b)\} = \frac{\Pr\{x_i = \mathrm{NA} \,\cap\, x_i \in (a,b)\}}{\Pr\{x_i \in (a,b)\}} \tag{5}$$

where $\Pr\{x_i \in (a,b)\}$ is the probability of the variable being in the interval $(a,b)$ in the $i$th observation, regardless of whether this observation will be missing or not, and the joint probability $\Pr\{x_i = \mathrm{NA} \cap x_i \in (a,b)\}$ is the probability of the observation $x_i$ being missing while in the interval $(a,b)$. With the missing observation being dependent on events or items which the researcher has not measured, it is difficult or impossible to evaluate the probability of the missing values. The missing not at random mechanism is referred to as non-ignorable missingness.

Methods of Handling Missing Data

Most standard statistical methods of data analysis are not applicable to incomplete data; therefore, to obtain valid inferences about the population of study, the researcher needs to understand the implications of missing data and decide on the best approach to account for them ([29]; [34]; [28]; [36]; [10]; [40]; [16]; [42]). In choosing the most appropriate method for handling the challenges posed by missing data, it is important to understand how and why observations are missing and how much influence they have on the results of the study, as pointed out by [40] and [7]. The best way to handle missing values and ensure valid inference is to come up with a good design that prevents or reduces their occurrence in the first place ([24]), or to repeat the experiment to generate the complete data set again, which is not feasible where readings are taken at set times or the cost of retesting is prohibitive. It should be noted that however hard we may try, values may still be missing for unanticipated reasons in some surveys.

The procedures and methods used to account for missing values are not meant to recreate the missing values exactly but to make valid and efficient inferences about the population of interest with or without missing data ([36]). The choice of method for handling missing data is often related to particular data characteristics and to the goal of imputation ([23]). [36] pointed out that the performance of the different methods of analysing incomplete data depends upon the ultimate goals of the analysis. In deciding which imputation method to use, [22] suggests that diagnostic checks be carried out on the imputation model to help identify model defects and facilitate model improvement. In the following sections, several ad hoc and statistically principled methods of handling missing data, together with their benefits and drawbacks, are discussed in detail to enable researchers and data analysts to make a good decision on the best method to use ([34]).

Listwise Deletion

The most commonly used method of handling missing values is listwise deletion (LD). This method simply discards cases with missing values, thereby yielding a complete data set with a reduced sample size. It is easy to implement, very fast to conduct and does not require recreation of the missing data.
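A minimal sketch of listwise deletion is given below. The toy records are hypothetical (the variable names echo the disk-polishing example used later in the paper, but the numbers are invented for illustration), with `None` marking a missing value.

```python
# Each dictionary is one case; None marks a missing value (hypothetical toy data).
rows = [
    {"time": 36.0, "diameter": 12.0, "thickness": 4.0, "hardener": 2.0},
    {"time": 41.5, "diameter": None, "thickness": 5.0, "hardener": 2.5},
    {"time": 29.0, "diameter": 10.0, "thickness": 3.5, "hardener": None},
    {"time": 33.5, "diameter": 11.0, "thickness": 4.5, "hardener": 2.0},
    {"time": 38.0, "diameter": 12.5, "thickness": None, "hardener": None},
    {"time": 30.5, "diameter": 10.5, "thickness": 3.0, "hardener": 1.5},
]


def listwise_delete(cases):
    """Keep only the cases that have no missing value on any variable."""
    return [case for case in cases if all(v is not None for v in case.values())]


complete_cases = listwise_delete(rows)

# Per-variable count of missing values, useful as a first diagnostic.
missing_per_variable = {
    name: sum(case[name] is None for case in rows) for name in rows[0]
}
n_cells = len(rows) * len(rows[0])
cell_missing_rate = sum(missing_per_variable.values()) / n_cells

print(f"cases before: {len(rows)}, after listwise deletion: {len(complete_cases)}")
print(f"missing per variable: {missing_per_variable}")
print(f"share of cells missing: {cell_missing_rate:.1%}")
```

In this toy example about 17% of the individual cells are missing, yet half of the cases are discarded, which shows how quickly listwise deletion erodes the sample.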
Most researchers, especially those with little or no understanding of the implications of missing data for the validity of results, adopt this technique. Being the default in most statistical software, it allows for the application of any statistical method of analysis, and its use has grown greatly without verification of the missingness mechanism ([3]; [32]; [1]; [31]). The loss of observations reduces the sample size, thereby inflating the standard errors and eventually resulting in invalid inferences. Unbiased parameter estimates are possible only when data are MCAR, a condition which rarely holds in practice.

Pairwise Deletion (PD)

Pairwise deletion is another method of handling missing values, in which all observed values on a subject are retained and missing values are deleted only in pairs as the analysis is carried out. For each pair of variables, a case is dropped if one or both values are unavailable, and only cases with non-missing values are used to compute means and variances. Under this method, different calculations use different cases with different sample sizes, which produces undesirable effects ([3]; [9]). Pairwise deletion is applicable and useful when data are MCAR, the number of missing cases on each variable is small relative to the total sample size, and a large number of variables is involved ([8]; [3]). Because of the varying sample sizes for each analysis, this method is not recommended for measures such as correlation and covariance, which are highly influenced by sample size.

Mean Substitution (MS)

The mean substitution (MS) method replaces each missing value, once, with the mean of the observed cases. The analysis is based on complete data, since the sample size is not altered for any variable and no collected information is discarded ([35]). It is fast and easy to implement; however, it has several drawbacks.
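A minimal sketch of mean substitution and its effect on the sample variance is shown below. The values are hypothetical and chosen only to make the arithmetic easy to follow; `None` marks a missing value.

```python
import statistics

# Hypothetical scores on one variable; None marks a missing value.
values = [4.0, None, 7.0, 5.0, None, 9.0, 6.0, None]

observed = [v for v in values if v is not None]
mean_obs = statistics.fmean(observed)

# Mean substitution: every missing value is replaced by the observed mean.
filled = [mean_obs if v is None else v for v in values]

var_obs = statistics.variance(observed)   # sample variance of the observed values
var_filled = statistics.variance(filled)  # sample variance after substitution

print(f"observed mean: {mean_obs:.2f}, mean after substitution: {statistics.fmean(filled):.2f}")
print(f"variance of observed values: {var_obs:.3f}")
print(f"variance after mean substitution: {var_filled:.3f}")
```

The mean is unchanged, but the imputed values add nothing to the sum of squared deviations while increasing the apparent sample size, so the variance shrinks.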
The major shortcoming of mean substitution is that it does not take into account the uncertainty associated with the missing data, thereby overstating the effective sample size and artificially decreasing the variability between individuals' responses ([28]; [1]). This usually results in confidence intervals that are too narrow and in correlations between pairs of variables that are attenuated (biased toward zero). A different and better approach is to replace the missing values with a sub-group mean within the data set; for example, when handling longitudinal data, a missing score can be replaced with the mean of the individual's responses on other waves.

Regression-Based Imputation (RI)

Regression-based imputation is another method of handling missing data, in which the missing cases are replaced with predicted values derived from a regression equation based on the observed values of the variables in the data set that are complete. Also referred to as conditional mean imputation, it is probably one of the best among the simple ad hoc methods because cases with missing values are preserved and the sample size is maintained. It is more informative since all existing information is utilized. The shortcomings of this method are: values beyond the logical range of the data may be imputed, thereby distorting inferences; choosing the right regression model to fit the given data is challenging; it is not suitable for multivariate data having more than one variable with missing values; and a large sample size is required to produce valid estimates.

Hot-Deck and Cold-Deck Imputation

Hot-deck imputation replaces missing values with non-missing values taken from a randomly selected, closely matched observation in the same data set as the observation containing the missing value. Cold-deck imputation replaces missing values with values from observations matched in a different data set. The hot-deck method is slightly more complex than the other ad hoc procedures discussed and involves several steps. First, completely observed variables are separated from incomplete ones.
Next, both the incomplete variables and the completely observed variables with similar attributes are grouped together on the basis of some characteristics (e.g., household size, job category, income, educational level, etc.). The missing values are then replaced with values randomly drawn from fully observed individuals having similar factors that predict missingness. Where multiple individuals match the item with missing values, the mean score of the matching individuals or a random draw from the distribution of the donor cases can be used. The problem with the hot-deck method is that it grossly underestimates the variability in the sample as compared to the complete data; however, it is better than mean imputation.

Multiple Imputation (MI)

Multiple imputation is a flexible predictive approach that replaces each missing value with $m > 1$ plausible replacements; it was developed by [34] to address the shortcomings associated with the single imputation methods. The procedure takes into account the uncertainty associated with the missing values and is applicable when data are missing at random. Unlike the ad hoc methods, multiple imputation combines classical and Bayesian statistical techniques, using suitable models to create $m$ imputed datasets which reflect the uncertainty associated with the missing values ([34]). The usefulness of multiple imputation in accounting for missing values has been documented in many research works ([35]; [3]; [28]; [1]; [7]). The imputation model should be specified accurately so that the degree of uncertainty about the missing data is adequately reflected and the relationships and associations in the data are preserved ([3]; [30]; [22]).

When using multiple imputation, we assume that observations are missing at random (MAR), which makes it possible to ignore the process that causes the missing data ([30]). Under MAR, the multiple imputation approach retains the advantages of the maximum likelihood (ML) method and allows the uncertainty caused by the imputation to be incorporated into the complete-data analysis ([22]). Multiple imputation consists of three basic steps: i) an imputation step to create $m$ complete datasets using the chosen imputation model; ii) analysing each of the $m$ imputed datasets separately; and iii) combining the results into a single estimate using Rubin's rules. These steps are presented in Figure 1.

Figure 1. Multiple Imputation Steps.
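The three steps can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the authors' procedure: the toy data are simulated, the imputation model is a straight-line regression of X on a fully observed Y, and parameter uncertainty is approximated by refitting the model on a bootstrap resample of the observed cases before each imputation. The pooling step uses the standard form of Rubin's rules.

```python
import math
import random
import statistics

random.seed(15)  # fixed seed so the results are reproducible

# Toy data (hypothetical): y is fully observed, x is missing at random given y.
n = 300
y = [random.gauss(0, 1) for _ in range(n)]
x_true = [2.0 + 1.5 * yi + random.gauss(0, 1) for yi in y]
x = [None if (yi > 0 and random.random() < 0.5) else xi for xi, yi in zip(x_true, y)]

observed = [(yi, xi) for yi, xi in zip(y, x) if xi is not None]


def fit_line(pairs):
    """Least-squares fit of x on y; returns intercept, slope, residual sd."""
    my = statistics.fmean(p[0] for p in pairs)
    mx = statistics.fmean(p[1] for p in pairs)
    slope = sum((a - my) * (b - mx) for a, b in pairs) / sum(
        (a - my) ** 2 for a, _ in pairs
    )
    intercept = mx - slope * my
    rss = sum((b - intercept - slope * a) ** 2 for a, b in pairs)
    return intercept, slope, math.sqrt(rss / (len(pairs) - 2))


def impute_once():
    """Step (i): return one completed copy of x."""
    boot = random.choices(observed, k=len(observed))  # bootstrap resample
    intercept, slope, sd = fit_line(boot)
    return [
        xi if xi is not None else intercept + slope * yi + random.gauss(0, sd)
        for yi, xi in zip(y, x)
    ]


m = 15  # number of imputed datasets
estimates, variances = [], []
for _ in range(m):
    completed = impute_once()
    # Step (ii): complete-data analysis, here the mean of x and its squared SE.
    estimates.append(statistics.fmean(completed))
    variances.append(statistics.variance(completed) / n)

# Step (iii): pool with Rubin's rules.
q_bar = statistics.fmean(estimates)   # pooled estimate
u_bar = statistics.fmean(variances)   # within-imputation variance
b = statistics.variance(estimates)    # between-imputation variance
t = u_bar + (1 + 1 / m) * b           # total variance
df = (m - 1) * (1 + m * u_bar / ((m + 1) * b)) ** 2

print(f"pooled mean: {q_bar:.3f}, standard error: {math.sqrt(t):.3f}, df: {df:.1f}")
print(f"mean of the full simulated data: {statistics.fmean(x_true):.3f}")
```

The pooled standard error is larger than the average within-imputation standard error because the between-imputation variance carries the extra uncertainty due to the missing values.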
Multiple imputation (MI) allows for the use of complete-data analysis methods and incorporates random errors to account for the uncertainty associated with the missing values ([34]). Multiple imputation performs better than the single imputation methods, giving more efficient estimates with standard errors that reflect the missing-data uncertainty, and it can be implemented using any model on any data without requiring specialized software. A random seed should be set to ensure reproducibility of results. Several multiple imputation (MI) approaches have been proposed for dealing with missing data problems. [13] combine the advantages of kernel estimators, where kernel-based sampling weights were developed to create imputations, with the popular doubly robust methods developed to handle misspecification of the outcome model. They used these two strategies to develop a kernel-based doubly robust MI method which is more robust than parametric alternatives against misspecification of the outcome model.

Pooling the Parameter Estimates for Inference

Each of the imputed datasets is analysed, under identical conditions, using the complete-data analysis method specified for the research. To draw inference about the population, the $m$ estimates of each parameter being considered are combined into a single value ([34]; [38]), yielding

$$\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i \tag{6}$$

where $\bar{Q}$ is the pooled parameter estimate and $\hat{Q}_i$ ($1 \le i \le m$) is the parameter estimate for the $i$th imputed complete dataset. The combined standard error using Rubin's rules ([34]) is slightly more complex, as there are two components that make up the total error. Let $\hat{Q}_i$ be the estimate of a scalar quantity of interest obtained from the imputed dataset $i$ ($1 \le i \le m$) and $U_i$ the variance (squared standard error) associated with $\hat{Q}_i$.

The overall variance for the imputation process combines the within- and between-imputation variances. The within-imputation variance is represented by

$$\bar{U} = \frac{1}{m} \sum_{i=1}^{m} U_i \tag{7}$$

and the between-imputation variance by

$$B = \frac{1}{m-1} \sum_{i=1}^{m} \left(\hat{Q}_i - \bar{Q}\right)^2 \tag{8}$$

The total variance is then calculated by adding the within and between variances ([21]) as follows:

$$T = \bar{U} + \left(1 + \frac{1}{m}\right) B \tag{9}$$

The overall standard error ($\sqrt{T}$) and the confidence intervals associated with the parameters of interest can be estimated in the normal way, while the degrees of freedom are given as

$$df = (m-1)\left[1 + \frac{m\,\bar{U}}{(m+1)\,B}\right]^2 \tag{10}$$

The rate of missing data is used to determine the number of data sets to impute. [34] and [3] suggest that, for practical purposes, a small number ($m \le 5$) of repeated imputations is adequate to produce estimates which give valid inferences. Some scholars, however, recommend that $m$ range from 20 to 100 or more, as using a large number of imputed datasets improves the stability of results, especially when estimating small effect sizes ([21]; [2]). The relative efficiency of the estimates can be evaluated using

$$RE = \left(1 + \frac{\gamma}{m}\right)^{-1} \tag{11}$$

where $m$ is the number of imputed datasets and $\gamma$ the proportion of missing values in the data set. Inferences based on multiple imputation are more efficient than those based on the ad hoc methods, which either discard cases with missing values or impute once without taking into account the uncertainty associated with the missing values ([28]).

3. Simulation and Results

Simulated data on disk polishing were used to illustrate some methods for handling missing data. The performance of the different methods was compared using the following four settings:
a) a complete data set (to serve as control);
b) part of the complete data set to missing at 5% and 35%, analyzed using listwise deletion;
c) mean substitution applied to the two datasets in (b);
d) multiple imputation applied to the two datasets in (b).

The usefulness of some of the methods of handling missing data is demonstrated using simulated data on the time it takes to polish a disk based on the thickness of the disk (mm), the diameter of the disk (cm) and the amount of hardener (grams) added to the cast. Fifty-nine (59) sets of measurements, with 5% and 35% of values set missing at random, were used. Regression analysis was applied to the complete data to model the time it takes to polish a disk, to serve as a control. Three methods of handling missing data, Listwise Deletion (LD), Mean Substitution (MS) and Multiple Imputation (MI), were studied and compared with the control. The parameter estimates together with their standard errors are presented in Table 2.

Table 2. Parameter Estimates under Different Imputation Methods (standard errors in parentheses).

Imputation Method   Control   LD        LD        MS        MS        MI        MI
Missing Rate        0%        5%        35%       5%        35%       5%        35%
Intercept           (6.78)    (7.48)    (15.31)   (7.74)    (9.36)    (7.2)     (8.53)
Diameter            (0.51)    (0.53)    (0.99)    (0.55)    (0.64)    (0.53)    (0.53)
Thickness           (1.59)    (1.59)    (4.07)    (1.67)    (3.14)    (1.64)    (2.45)
Hardener            (3.05)    (3.59)    (4.42)    (3.42)    (3.59)    (3.27)    (3.14)

4. Discussion

With LD, any case with at least one missing value is omitted. With 5% of the observations missing, 11 observations were deleted and the residual standard error was calculated on 44 degrees of freedom, giving estimates comparable with the complete case. With a 35% missingness rate, 49 of the observations were deleted, giving biased estimates. With mean substitution (MS), no observations were lost, but substituting the mean of the variables reduces the variation, which eventually biases the estimates at both 5% and 35%. Estimates obtained using MI were comparable to the complete case under both small and large rates of missing values, as all missing values were replaced predictively 15 times and no case was deleted. Handling missing values therefore requires a good understanding of the pattern of missingness and of the reason why the observations went missing, to enable the choice of the best method to use.

5. Conclusion

The methods discussed are all useful in handling missing data; however, their application depends on how much of the data is missing and what causes the data to be missing. This paper provides researchers, especially those with little or no understanding of statistics, with guidance on how to handle missing data efficiently. The detailed description of the missing data mechanisms, the pattern of missing values and why the data went missing will adequately equip the researcher to make an informed decision in the design of the research and the choice of method. In all, the choice of method for handling missing data should be guided by the need to preserve the essential characteristics of the data, maintain the representativeness of the analyzed data, provide valid statistical inference (control Type I error), maximize the statistical power of the test (minimize Type II error), and avoid bias. It is nevertheless recommended that researchers put in place measures that minimise the occurrence of missing values in the first instance, to enhance the quality and validity of inferences obtained from incomplete data. Consultation with professionals and experts in survey design and experimentation could be the first step in overcoming the challenges of missing data.

References

[1] Acock, A. C. (2005). Working with missing values. Journal of Marriage and Family, 67.
[2] Ader, H. J. (2008). Missing data. In Ader, H. J. & Mellenbergh, G. J. (Eds.), Advising on Research Methods: A Consultant's Companion. Huizen, The Netherlands: Johannes van Kessel Publishing.
[3] Allison, P. D. (2000). Multiple imputation for missing data: A cautionary tale. Sociological Methods & Research, 28(3).
[4] Allison, P. D. (2003). Missing data techniques for structural equation modeling. Journal of Abnormal Psychology, 112(4).
[5] Beasley, T. M. (1988).
Comments on the analysis of data with missing values. Multiple Linear Regression Viewpoints, 25.
[6] Blackwell, M., Honaker, J. and King, G. (2011). Multiple Overimputation: A Unified Approach to Measurement Error and Missing Data.
[7] Bolotin, A. (2010). A new method of multiple imputation for completely (or almost completely) missing data. MACMESE'10: Proceedings of the 12th WSEAS International Conference on Mathematical and Computational Methods in Science and Engineering.
[8] Burke, S. (1998). Missing values, outliers, robust statistics & non-parametric methods. LC GC Europe Online Supplement, Scientific Data Management, 2(2).
[9] Carpenter, J. R. (2010). Statistical modeling with missing data using multiple imputation. www.missingdata.org.uk.
[10] Carpenter, J. R. and Kenward, M. G. (2008). Missing data in randomized controlled trials: a practical guide. Birmingham: National Health Service Coordinating Centre for Research Methodology.
[11] Carter, R. L. (2006). Solutions for missing data in structural equation modeling. Research & Practice in Assessment, 1(1).
[12] Chen, S. (2014). Imputation of missing values using quantile regression. Unpublished Graduate Theses and Dissertations. Iowa State University.
[13] Chiu-Hsieh, H., He, Y., Li, Y., Long, Q. and Friese, R. (2016). Doubly robust multiple imputation using kernel-based techniques. Biom J., 58(3).
[14] Davey, A. and Savla, J. (2010). Statistical power analysis with missing data: A structural equation modeling approach. NY: Routledge.
[15] Enders, C. K. (2006). A primer on the use of modern missing-data methods in psychosomatic medicine research. Psychosomatic Medicine, 68.
[16] Fisher, A. and Waclawski, A. (2009). A survey of techniques for identifying and handling outliers and missing values in time series data. 29th International Symposium on Forecasting. Hong Kong. ww.forecasters.org/isf.
[17] Foster, P. J., Mami, M. A. and Bala, A. M. (2009). On treatment of the multivariate missing data. Research Report No. 13, Probability and Statistics Group, School of Mathematics. The University of Manchester.
[18] Glas, C. A. W. and Pimentel, J. L. (2008). Modeling Nonignorable Missing Data in Speeded Tests. Educational and Psychological Measurement, 68(6).
[19] Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60.
[20] Graham, J. W., Cumsille, P. E. and Elek-Fisk, E. (2003). Methods for handling missing data. In J. A. Schinka & W. F. Velicer (Eds.), Research Methods in Psychology. Handbook of Psychology, New York: John Wiley & Sons.
[21] Graham, J. W., Olchowski, A. E., and Gilreath, T. D. (2007). How many imputations are really needed: Some practical clarifications of multiple imputation theory. Prevention Science, 8.
[22] He, Y. (2010). Missing data analysis: Getting to the heart of the matter. Journal of the American Heart Association, 3.
[23] Horton, N. J. and Kleinman, K. P. (2007). Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician, 61(1).
[24] Kang, H. (2013). The prevention and handling of the missing data. Korean J Anesthesiol., 64(5).
[25] Kenward, M. (2007). Missing Data with MLwiN: An overview. A Paper Presented at Researcher Development Initiative Workshop. London School of Hygiene and Tropical Medicine.

8 94 Nicholas Pindar Dibal et al.: Challenges and Iplications of Missing Data on the Validity of Inferences and Options for Choosing the Right Strategy in Handling The [26] Ki, J. O. and Curry, J. (1977). The treatent of issing data in ultivariate analysis. Sociological Methods & Research, 6 (2); [27] Litttle, R. J. A. (1988). A test of issing copletely at rando data with issing values. Journal of the Aerica Statistical Association. 83 (404); [28] Little, R. J. A. and Rubin, D. B. (2002). Statistical analysis with issing data, (2 nd ed.). New York: John Wiley & Sons. [29] Okafor, R. (1982). Bias due to logistic non-response in saple survey. (Unpublished Ph. D. Thesis Subitted to the Departent of Statistics, Harvard University. Cabridge, Massachusetts). [30] Patrcian, A. P. (2002). Focus on research ethods: Multiple iputation for issing data. Research in Nursing and Health, 25, [31] Peng, C. Y., Harwell, M. R., Liou, S. M., & Ehan, L. H. (2006). Advances in issing data ethods and iplications for educational research. In S. S. Sawilowsky (Ed.), Real Data Analysis. (pp ). New York. [32] Pigott, T. D. (2001). A review of ethods for issing data. Educational Research and Evaluation, 7 (4); [33] Rubin, D. B. (1976). Inference and issing data. Bioetrika, 63; [34] Rubin, D. B. (1987). Multiple iputation for non-response in Surveys. New York: Wiley. [35] Schafer, J. L. (1997). Analysis of incoplete ultivariate data, New York: Chapan and Hall. [36] Schafer, J. L. and Graha, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2); [37] Schwartz, T., Chen, NY Q. and Duan, NY N. (2011). Studying issing data patterns using a SAS Macro. Statistics and Data Analysis, SAS Global Foru. Paper 339. [38] Song, Q., Shepperd, M., Cartwright M. and Twala, B. (2005), A New iputation ethod for sall software project data sets. [39] Stratton, I. M., and Aldington, S. J. (2007). Missing data eans lost opportunities. Journal of Clinical Research Best Practices. 
3, (5). [40] Stuart, E. A., Azur, M., Frangakis, C. and Leaf, P. (2009). Multiple iputation with large data set: A case study of the children s ental health initiative. Aerican Journal of Epideiology, 169 (9); [41] Swetha, S. (2016). An Integral Study on Missing Value Data Iputation. International Journal of Engineering Sciences & Research Technology. 5(2) [42] Todorov, V., Tepl, M. and Filzoser, P. (2011). Detection of ultivariate outliers in business survey data with incoplete inforation. Advance Data Analyses and Classification. 5; [43] vonhippel, P. T. (2013). Should a noral iputation odel be odified to ipute skewed variables? Sociological Methods and Research, 42 (1); [44] Wayan, J. C. (2003). Multiple iputation for issing data: What is it and how can I use it? Paper presented at the Annual Meeting of the Aerican Educational Research Association, Chicago, IL.
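The conclusion's point that the choice of method depends on how much of the data are missing, and on the pattern of missing values, can be illustrated with a short diagnostic. The following is only a minimal sketch in Python using the standard library; the toy dataset, the variable names (age, income, score) and the helper functions missing_fraction and missing_patterns are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy dataset (not from the paper): each row is one respondent,
# and None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "score": 7},
    {"age": 29, "income": None, "score": 6},
    {"age": None, "income": None, "score": 8},
    {"age": 41, "income": 61000, "score": None},
    {"age": 38, "income": None, "score": 5},
]
variables = ["age", "income", "score"]


def missing_fraction(rows, variables):
    """Return the fraction of missing values for each variable."""
    n = len(rows)
    return {v: sum(r[v] is None for r in rows) / n for v in variables}


def missing_patterns(rows, variables):
    """Count each distinct pattern of observed (1) and missing (0) values."""
    return Counter(
        tuple(0 if r[v] is None else 1 for v in variables) for r in rows
    )


fractions = missing_fraction(rows, variables)
patterns = missing_patterns(rows, variables)
# A row is a complete case when every variable is observed.
complete_cases = patterns[tuple(1 for _ in variables)]

for v in variables:
    print(f"{v}: {fractions[v]:.0%} missing")
for pattern, count in patterns.most_common():
    print(pattern, count)
print("complete cases:", complete_cases, "of", len(rows))
```

A summary of this kind shows how much of each variable is missing and how many complete cases would survive listwise deletion. It says nothing about why the values are missing, so the missingness mechanism still has to be judged from the study design and subject-matter knowledge.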
