Skew and Internal Consistency


Journal of Applied Psychology, 2006, Vol. 91, No. 6
Copyright 2006 by the American Psychological Association

RESEARCH REPORTS

Tammy Greer, The University of Southern Mississippi
Samuel T. Hunter, University of Oklahoma
William P. Dunlap, Tulane University
Mitchell E. Berman, The University of Southern Mississippi

The effects of skew on the standardized item alpha were examined with Monte Carlo techniques. Alphas computed from normal variables were compared with alphas from lognormal variables, ranks, and skewed versus normal Likert-type variables. The extent and direction of skew were varied, as were the size of the population interitem correlation (rho), the number of items, and the number of categories for Likert-type variables. Because the average interitem correlation affects alpha and skew affects the average interitem correlation, the effect of skew on the average interitem correlation also was examined. Results indicated that skew decreased the average interitem correlation and produced small decreases in alpha that were largest when skew was large, rho was small, items were skewed in opposite directions, and there were fewer items.

Keywords: skew, internal consistency, Monte Carlo simulation

A considerable and growing literature is devoted to the development and validation of indices of reliability (Hattie, 1985). Despite differences among these indices, most measures of reliability depend in some form on correlation. Whether the relation examined is that between the same test given at different times (test-retest reliability), between two similar tests given at different times (parallel forms reliability), between two halves of the same test (split-half reliability), or among items within one test (internal consistency reliability), the measure of the strength of the relation is usually constructed from a ratio of covariance to variance, which is fundamentally a correlation.
Tammy Greer and Mitchell E. Berman, Department of Psychology, The University of Southern Mississippi; William P. Dunlap, Department of Psychology, Tulane University; Samuel T. Hunter, Department of Psychology, University of Oklahoma.
William P. Dunlap passed away in [...].
Correspondence concerning this article should be addressed to Tammy Greer, Department of Psychology, The University of Southern Mississippi, 118 College Drive, Box 5025, Hattiesburg, MS. E-mail: tammy.greer@usm.edu

As correlations are the foundation of most measures of reliability, it is reasonable to assume that conditions that attenuate correlations will attenuate estimates of reliability. Nonnormality in the sample (or population) distribution is one condition that limits the upper bound of the correlation coefficient (Kowalski, 1972). One indicator of nonnormality is the amount of skew in the distribution.

Skewed distributions are quite common. Micceri (1989), in his article creatively titled "The Unicorn, the Normal Curve, and Other Improbable Creatures," investigated "440 large-sample achievement and psychometric measures" (p. 156) and found that all 440 distributions differed significantly from normality. Although Micceri's interest was in any deviation from normality, including kurtosis and multimodality as well as skew, skewed distributions were a very common source of nonnormality. A total of 71.6% of the distributions in Micceri's study were classified as exhibiting moderate to extreme asymmetry, and 15.9% of these exhibited exponential skew.

The effects of skew on Pearson's product-moment correlation are well documented (Calkins, 1974; Corballis, 1968; Dunlap, Burke, & Greer, 1995; Dunlap, Chen, & Greer, 1994) and fairly straightforward: Increases in skew of either or both variables decrease the magnitude of the correlation coefficient. The greatest reductions in Pearson's product-moment correlation occur when skew is in opposite directions.
When a variable is skewed in a given direction, the correlation between that variable and another variable will be at its maximum when the distribution of the second variable exhibits some skew in the same direction as the first variable but of a lesser amount.

The effects of skew on test-retest reliability also have been explored. Dunlap et al. (1994) examined continuous measures that were considerably skewed from a large-sample national survey given twice to the same group of individuals (with 2 years intervening). Skew was then minimized by a power transformation. Results indicated that transforming the variables to reduce skew improved test-retest reliability. The greatest improvements in reliability occurred with larger decreases in skew.

In addition to the impact of skew on Pearson's product-moment correlation and test-retest reliability, skew also may affect factor analysis results. When variables that are skewed in opposite directions are factor analyzed, factors may appear that are not content based but artifacts of the different distribution shapes (Corballis, 1968). It has long been known that item skew in dichotomous items affects estimates of internal consistency and, when these items are factor analyzed, produces an additional factor, commonly known as a difficulty factor (because dichotomous items are often scored pass-fail). The problem of artifactual factors resulting from differences in item distributions and their effects on estimates of internal consistency has been present in the literature since at least Loevinger (1947), who criticized the use of Kuder-Richardson 20 (Kuder & Richardson, 1937), a special case of Cronbach's coefficient alpha for dichotomous items, because alpha could not attain unity unless all items had distributions of the same shape (Cronbach's rephrasing of Loevinger's criticism; Cronbach, 1951). Issues concerning the presence of difficulty factors, including their impact and possible remedies for their effects, are reviewed extensively in the literature (Corballis, 1968; Dingman, 1958; Horst, 1953; TenVergert, Kingma, & Gillespie, 1990). Dichotomous items are not amenable to skew correction by transformation, however, and hence are beyond the scope of this study.

These same issues, though, are generalizable to continuous items with skewed item distributions. For example, Cortina (1993) discussed the effects of multiple factors on internal consistency reliability and concluded that internal consistency is reduced when multiple factors exist. The effects of skew on reliability estimates, therefore, are both direct (skew attenuates correlation) and indirect (skew may produce factors that adversely affect measures of internal consistency). Moreover, skew may affect the overall interpretability of the factors themselves. As noted by Tabachnick and Fidell (2001) regarding the effect of skew on factor analyses, "To the extent that normality fails, the solution is degraded" (p. 588).

Further evidence that measures of internal consistency may be affected by skew was provided by Bandalos and Enders (1996).
Using Monte Carlo techniques, these authors examined the effects on Cronbach's alpha of normal and skewed categorical item distributions when sample items were obtained from normal versus skewed underlying continuous distributions. The results revealed that "higher values of coefficient alpha were obtained for situations in which the observed categorical distribution was most similar to the underlying distribution" (p. 157). These authors acknowledged, however, that if items in one scale have different item distributions, regardless of the underlying distribution shapes, reliability will be decreased. They then cautioned that deleting skewed items from the scale to maximize internal consistency may affect the measure's validity, because naturally skewed items often allow for discriminability in the tails of a distribution (Bandalos & Enders, 1996; Nunnally & Bernstein, 1994).

In a similar study, Enders and Bandalos (1999) examined the role of heterogeneously skewed item distributions on coefficient alpha. The results indicated that heterogeneous skew among items was associated with lower internal consistency and that this decrease was greatest when skew was large and in opposite directions. The authors suggested, however, that even the largest decreases in internal consistency were "relatively small in magnitude" (p. 147). These results, consistent with those of their earlier work, led them to again caution readers about removing items to maximize internal consistency. Although we agree with their suggestion to be wary about deleting items, it is arguable whether the reductions in internal consistency they observed are negligible, as reductions of that magnitude may have substantial practical implications, a point addressed later in this article. Further, heterogeneous skew creates heterogeneity in item variances, and differences in item variances also affect Cronbach's alpha.
Therefore, in the Bandalos and Enders (1996) and Enders and Bandalos (1999) studies, the effects of skew and item heterogeneity are confounded.

A growing body of research is devoted to refining old reliability indices and developing new, more robust indices that are affected less by variations in the shape of test or item distributions (e.g., Wagner, Adair, & Alexander, 1990; Wilcox, 1992). Although the development of more robust indices of reliability is laudable, these indices tend to be mathematically complex and difficult for most researchers to use. Given this liability, it would seem efficient, at least initially, to attempt to address the issue of skew in more familiar and simple ways. For skew resulting from a lognormal distribution, one simple method of correcting skew in the sample distribution is to apply a natural log transformation. Another straightforward way to address a skewed distribution is to apply a rank transformation. The effect of these simple transformations on internal consistency indices of reliability for interval-scaled variables has yet to be examined in a systematic way. One purpose of this research is to investigate this question by Monte Carlo techniques.

Items from many surveys, however, are not interval scaled but are developed with item responses that map onto some form of ordinal-level scale, most often a Likert-type scale. For example, in a recent compilation of 111 measures of religiosity (Hill & Hood, 1999), the response formats for 80% of the measures were ordinal scaled. The remaining 20% of the measures used mostly open-ended questions or some form of qualitative forced-choice response. For none of the measures was the predominant response format an interval scale, although some items, like church attendance and monetary contributions, were ratio scaled.
Of the measures with a preponderance of Likert-type items, approximately one third used Likert-type scales with fewer than five categories (true-false included), another one third of the scales had exactly five categories, and the final one third had more than five categories.

It is well-known that categorizing a normally distributed interval-level variable has an adverse impact on the magnitude of the Pearson's product-moment correlation coefficient relating the categorized variable with other variables (Wert & Ahmann, 1954, Table 7, p. 429). Multiple studies indicate that, generally, normally distributed Likert-type items with more response categories, at least five or seven (Nunnally, 1978; Bandalos & Enders, 1996), result in greater reliability when compared with items with fewer response categories. However, research concerning the impact of skew for Likert-type items on reliability estimates is limited. Therefore, a second purpose of this research is to explore the effects of skew on internal consistency indices of reliability when items are Likert scaled.

Although, as Cortina (1993) expressed, "Coefficient alpha (Cronbach, 1951) is certainly one of the most important and pervasive statistics in research involving test construction and use" (p. 98), the index of internal consistency that we chose to simulate was the standardized item alpha. Cronbach's alpha will equal standardized item alpha if all item variances are equal. When items are skewed by different amounts, the items will tend to have different variances and Cronbach's alpha will be smaller than standardized item alpha. This occurs because Cronbach's alpha is adversely affected by the heterogeneity of item variances. However, this is not the case for standardized item alpha because items are standardized before any reliability computations. Cronbach's alpha computed on standard scores is equivalent to the standardized item alpha. Therefore, standardized item alpha is the more reasonable measure of internal consistency for use in this study, where the focus is on the effects of skew and not on the effects of heterogeneity of variances produced by skew.

Both indices are related to the average of split-half correlations. In the case where the variances of all items are equal, Cronbach's alpha is equal to the average of all possible split-half correlations corrected to full test length by the Spearman-Brown prophecy formula (Brown, 1910; Spearman, 1910), and both are equal to the standardized item alpha. When item variances are unequal, which is a more common condition, Cronbach's alpha is equal to the average of all possible split-half correlations using the Flanagan (1937) and Rulon (1939) formula for split-half reliability.

A Note of Caution

It is often the case that researchers are leery, and legitimately so, of arbitrarily changing the scale of a variable and, in effect, the relation of the variable with other variables by transforming the variable to achieve normality in the sample distribution. This concern is most warranted when sample sizes are small and samples are less likely to represent the population distribution with precision. However, other concerns exist even when sample sizes are large. After a variable is transformed, the relation between that variable and any other variable will be misleading if it is interpreted in terms of the original variable. This inconvenience must be weighed against the possible misinformation obtained when one estimates the reliability of skewed variables with a procedure that is not robust to violations of the normality assumption. Of course, if theoretical reasons for measuring a variable in a given way exist, one may not want to compromise the interpretability of the variable in the original units by transforming the variable.
Transformations to symmetry of one or both variables also may sometimes induce a nonlinear relation between the variables (Dunlap et al., 1994). In fact, Dunlap et al. (1995) found some increases and some decreases in curvilinearity between variables when those variables were transformed to reduce skew. The magnitude of the correlation nevertheless increased after transformation to normality in all 20 bivariate relations examined. Even in light of all of these concerns, given the extent of nonnormality in distributions of variables (Micceri, 1989) and given that most of our parametric statistical procedures assume normality, transformation to symmetry prior to analyzing the variables should at least be considered.

Method

To determine whether item skew affects the standardized item alpha, we conducted two series of simulations. In the first series, standard normal items were generated, then subsets of items were skewed with known amounts of population skew. In the second series of simulations, standard normal items were generated and transformed to skewed Likert-type items. For all simulations, 10,000 samples were generated and the average sample skews, interitem correlations, and standardized item alphas across the samples were computed.

When simulation conditions called for item correlations that were greater than 0, a set of variables, $y_i$, was generated with a common population intercorrelation, rho ($\rho$). Rho among the $y$ variables was produced using a procedure from Knapp and Swoyer (1967). Independent standard normal values $w$ and $x_i$ first were generated, and $y_i$ was computed as

$$y_i = \rho^{1/2}\, w + (1 - \rho)^{1/2}\, x_i. \tag{1}$$

When simulation conditions called for skewed items to be produced from standard normal items, skew in the population was produced by exponential transformations applied to normal values.
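Equation 1 can be illustrated with a minimal Python sketch. This is not the authors' FORTRAN code; the function name `correlated_normal_items` and its parameters are hypothetical, and only the Python standard library is assumed. Each observation shares one common value `w` across items, which is what gives every pair of items the same population correlation rho.

```python
import random
from math import sqrt


def correlated_normal_items(n_obs, n_items, rho, rng):
    """Return n_items lists of n_obs scores with common population correlation rho.

    Applies y_i = sqrt(rho) * w + sqrt(1 - rho) * x_i (Equation 1), where w is
    shared by all items within an observation and x_i is unique to each item.
    """
    items = [[] for _ in range(n_items)]
    for _ in range(n_obs):
        w = rng.gauss(0.0, 1.0)  # common part for this observation
        for item in items:
            x = rng.gauss(0.0, 1.0)  # item-specific part
            item.append(sqrt(rho) * w + sqrt(1.0 - rho) * x)
    return items


if __name__ == "__main__":
    rng = random.Random(2006)  # arbitrary seed, for repeatability only
    items = correlated_normal_items(n_obs=100, n_items=6, rho=0.3, rng=rng)
    print(len(items), "items with", len(items[0]), "observations each")
```

Because `w` and `x_i` are independent with unit variance, each `y_i` also has unit variance and any two items have covariance rho, so the construction needs no further rescaling.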
If $x$ is a normal variable, then an exponential transformation of the form $y = e^{x}$, where $e$ is the base of the Napierian logarithms, produces a lognormal variable that has positive skew. To produce negative skew, the transformation is $y = -(e^{-x})$. If the variance of $x$ is $v$, then skew for the lognormal distribution is given by the following equation (from Aitchison & Brown, 1957, and simplified by Dunlap et al., 1994):

$$\text{skew} = (e^{v} - 1)^{1/2}\,(e^{v} + 2). \tag{2}$$

Theoretical skew, however, had little practical importance in defining the extent of sample skew in these simulations. This is because the extent of skew in a sample is dependent on the sample size. To achieve a stable and accurate approximation of the population skew given in Equation 2 from sample data, the sample size must be very large (greater than or equal to 1,000,000 from preliminary simulations), and the larger the population skew, the larger the sample size must be. The simulations presented here are based on samples of 100 observations, so the population skew is underestimated by the actual sample skew. Therefore, for these simulations, the empirical average sample skew was reported as an indicator of the extent of skew in the samples.

To produce skewed Likert-type variables, first standard normal variables were generated as before, having a mean of 0 and a standard deviation of 1. The variables were correlated, and the mean of the variables was shifted away from 0 by adding or subtracting a constant from each score. This transformation changed only the mean of the scores. Next, cutpoints were established across the range of a standard normal distribution (M = 0, SD = 1) such that when the cutpoints were applied to the distribution having a nonzero mean, a skewed Likert variable was produced. When the cutpoints were applied to a distribution having a mean greater than 0, negative skew was produced in the distribution of the Likert variable, and for a mean less than 0, positive skew resulted.
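The gap between population skew and sample skew described above can be explored with a short sketch. The helper names are hypothetical, and the moment-based skew estimator used here is one common definition, assumed for illustration; the article does not state which estimator its FORTRAN code used.

```python
import random
from math import exp, sqrt


def lognormal_skew(v):
    """Population skew of exp(x) when x is normal with variance v (Equation 2)."""
    return sqrt(exp(v) - 1.0) * (exp(v) + 2.0)


def sample_skew(values):
    """Moment-based sample skew, m3 / m2**1.5 (an assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((a - mean) ** 2 for a in values) / n
    m3 = sum((a - mean) ** 3 for a in values) / n
    return m3 / m2 ** 1.5


def mean_sample_skew(v, n_obs=100, n_samples=500, seed=1):
    """Average sample skew of lognormal samples of size n_obs."""
    rng = random.Random(seed)
    sd = sqrt(v)
    total = 0.0
    for _ in range(n_samples):
        sample = [exp(rng.gauss(0.0, sd)) for _ in range(n_obs)]
        total += sample_skew(sample)
    return total / n_samples


if __name__ == "__main__":
    v = 1.0
    print("population skew:", round(lognormal_skew(v), 2))  # about 6.18 for v = 1
    print("mean sample skew, n = 100:", round(mean_sample_skew(v), 2))
```

With 100 observations the average sample skew falls well short of the population value from Equation 2, which is consistent with the authors' decision to report empirical average sample skew instead.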
The code for all computations was written in the FORTRAN programming language, compiled with an XL FORTRAN Compiler/6000 Version 3, and all computations were executed on an IBM AIX RISC System/6000. Standard normal variables were generated using the subroutine RNNOF from the International Mathematical and Statistical Library (Visual Numerics, Inc., 1997), which contains mathematical and statistical functions for use with a FORTRAN compiler. All variables and all functions were executed in double precision, resulting in an approximate precision of 15 decimal digits. The accuracy of the output generated with the FORTRAN code was checked by submitting results to the SAS procedures UNIVARIATE (which served as a check on the skew computations) and CORR with ALPHA as an option (which served as a check on the pairwise Pearson product-moment correlations computed between all items as well as a check on computations of the standardized item alpha). Comparisons of results from the FORTRAN code used for these simulations and results from SAS indicated that results from the code were accurate to the decimal places output by the SAS procedures.

Simulation Set 1: Continuous Items

In the first series of simulations, the effects of skew were investigated by inducing various amounts of skew in different sets of 6, 10, and 20 items. The samples had the following characteristics: 100 observations; interitem population correlations of .1, .3, .5, or .7; items placed into one group of 6 (or 10 or 20) items or divided into two groups of 3 (or 5 or 10) items each, for which the average sample skew differed in both direction and magnitude. The grouping and skew conditions were as follows: When items were placed into a single group, average sample skew values were 2.7, 4.4, or 5.9. When items were divided into two groups, opposite skew was produced, with skew values of ±2.7, ±4.4, or ±5.9.
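One cell of this design can be sketched in Python as follows. This is a scaled-down illustration under stated assumptions, not a reproduction of the authors' FORTRAN program: the `scale` parameter that controls the amount of skew, the default of 200 samples (the article used 10,000), and all function names are hypothetical. Negative skew is produced with the order-preserving reflection `-exp(-scale * y)` so that items in the two groups stay positively correlated.

```python
import random
from itertools import combinations
from math import exp, sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def std_alpha(items):
    """Standardized item alpha from the average interitem correlation."""
    pairs = list(combinations(items, 2))
    r_bar = sum(pearson_r(a, b) for a, b in pairs) / len(pairs)
    k = len(items)
    return k * r_bar / (1 + (k - 1) * r_bar)


def ranks(values):
    """Ranks 1..n; ties are not expected for continuous scores."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def simulate(k=6, rho=0.3, scale=1.0, opposite=True,
             n_obs=100, n_samples=200, seed=0):
    """Average alpha for normal items, their skewed versions, and ranks of the latter."""
    rng = random.Random(seed)
    totals = {"normal": 0.0, "lognormal": 0.0, "ranks": 0.0}
    for _ in range(n_samples):
        normal = [[] for _ in range(k)]
        for _ in range(n_obs):
            w = rng.gauss(0.0, 1.0)  # shared part, as in Equation 1
            for item in normal:
                item.append(sqrt(rho) * w + sqrt(1 - rho) * rng.gauss(0.0, 1.0))
        skewed = []
        for j, item in enumerate(normal):
            if opposite and j >= k // 2:
                skewed.append([-exp(-scale * y) for y in item])  # negative skew
            else:
                skewed.append([exp(scale * y) for y in item])  # positive skew
        totals["normal"] += std_alpha(normal)
        totals["lognormal"] += std_alpha(skewed)
        totals["ranks"] += std_alpha([ranks(item) for item in skewed])
    return {name: total / n_samples for name, total in totals.items()}


if __name__ == "__main__":
    for name, alpha in simulate().items():
        print(f"{name:10s} {alpha:.3f}")
```

Setting `opposite=False` gives the single-group, same-direction condition, and raising `scale` increases the skew of the transformed items.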
When all items were placed into only one group and then skewed, this produced a group of items similarly skewed (i.e., having the same population skew). This situation served to simulate the condition of an easy (or hard) test or, alternatively, a group of lenient (or severe) raters. Groups of opposite-skewed items simulated the condition of a test high in discriminability with both easy and hard items, such as an IQ test, or a mixture of lenient and severe raters.

Simulation Set 2: Likert-Type Items

In a second set of simulations, Likert-type variables were generated, correlated, and then skewed. Means of 0.0, 0.5, 1.0, 1.5, and 2.0, which correspond to the number of standard deviation units away from the population mean of 0, were used to introduce the following levels of skew into the Likert-type variables: 0.0, 0.6, 1.3, 2.2, and 3.5, respectively. These were the skew values produced when rho equaled 0.0 and the number of Likert categories equaled five. Often with five categories, larger skew resulted in no variance among the category frequencies for items. When this happened, observations were piled at the lowest category value when the mean was set at -2.0 and at the highest category value when the mean was set at 2.0. When this occurred, the iteration was repeated until there was variance among observations (see Footnote 1).

The samples from this series of Likert simulations had the following characteristics: 100 observations; interitem population correlations of .1, .3, .5, or .7 for the standard normal variables prior to transformation to Likert-type variables; items placed into one group of 6 (or 10 or 20) items or divided into two groups of 3 (or 5 or 10) items each, for which the average sample skew differed in both direction and magnitude; five levels for the Likert-type items. The grouping and skew conditions were as follows: When items were placed into a single group, average sample skew values were 0.0, 0.6, 1.3, 2.2, or 3.5. When items were divided into two groups, opposite skew was produced, with skew values of ±0.6, ±1.3, ±2.2, or ±3.5.
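The shift-then-cut procedure for a single Likert-type item can be sketched as below. The article does not list its cutpoints, so the equally spaced values in `CUTPOINTS` are purely an assumption and will not reproduce the reported skew values; the function names are also hypothetical. Correlating the underlying normal scores is omitted here to keep the sketch short.

```python
import random
from bisect import bisect_right

# Hypothetical cutpoints on the standard normal scale; four cuts give five categories.
CUTPOINTS = (-1.5, -0.5, 0.5, 1.5)


def to_likert(score, cutpoints=CUTPOINTS):
    """Map a continuous score to a category from 1 to len(cutpoints) + 1."""
    return bisect_right(cutpoints, score) + 1


def likert_item(n_obs, shift, rng, cutpoints=CUTPOINTS):
    """Standard normal scores with their mean moved to `shift`, then categorized."""
    return [to_likert(rng.gauss(0.0, 1.0) + shift, cutpoints) for _ in range(n_obs)]


def sample_skew(values):
    """Moment-based sample skew; returns None when the item has no variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((a - mean) ** 2 for a in values) / n
    if m2 == 0:
        return None  # every response fell in one category
    m3 = sum((a - mean) ** 3 for a in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(5)
    for shift in (-2.0, -1.0, 0.0, 1.0, 2.0):
        item = likert_item(100, shift, rng)
        print(f"shift {shift:+.1f}: skew = {sample_skew(item)}")
```

A positive shift piles responses into the upper categories and yields negative skew, and a negative shift does the reverse, matching the direction described in the Method. The `None` return marks the zero-variance case for which the authors repeated the iteration.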
Results

Simulation Set 1: Continuous Items

In the first set of simulations, correlated items were placed into one group or divided into two groups of items and evaluated at three skew ranges. For 100 observations and 6, 10, and 20 items, the average interitem correlations and the standardized item alphas were computed on lognormal, standard normal, and ranked variables at population correlations ranging from 0 to .7 and at each of the three skew ranges. Because skew and rho affect interitem correlations and alpha is related to the average of interitem correlations, the effects of skew and rho on interitem correlations are also presented, as is the relation between alpha and the average of interitem correlations.

Standardized item alpha. The effects of skew on alpha are evident in Figure 1. As skew increased, alpha decreased. The impact of skew was most dramatic when there were fewer items (6), rho was small, skew was larger (5.9), and items were divided into two groups and skewed in opposite directions. Alphas from normal variables and from ranks were larger than alpha from lognormal variables in all cases. The difference in alphas between normal and skewed data ranged from .02 for 20 items when there was same-direction skew and rho equaled .70, to .31 for 6 items when there was opposite-direction skew and rho equaled .30. Ranking the skewed items produced alphas that differed from those of standard normal data by no more than .01.

Average interitem correlation. The relation between skew and the average interitem correlation has been investigated elsewhere (Calkins, 1974; Carroll, 1961; Dunlap et al., 1995; Kowalski, 1972), with the same results as obtained from the simulations presented here. Increases in skew resulted in decreases in average pairwise correlations among lognormal variables. Not surprisingly, increases in rho resulted in increases in average intercorrelations for both standard normal and lognormal variables. For all conditions, however, average interitem correlations for lognormal variables were smaller than for standard normal variables and for ranks, as is evident in Figure 2. The relation between the average interitem correlation and the standardized item alpha is presented for 6, 10, and 20 items in Figure 3.

Simulation Set 2: Likert-Type Items

Figure 4 contains results from the second set of simulations, where Likert-type items with five categories were skewed in the same or opposite directions and evaluated at three skew ranges. For each of these conditions and for each of the population correlations (.1, .3, .5, .7), standardized item alphas and average interitem correlations were computed. With increases in skew, there were decreases in standardized item alphas. The largest difference in alpha resulting from skew occurred when scores were skewed in opposite directions and rho equaled .3. In that condition, the difference in alpha for no skew versus skew of 3.5 was .27 when the number of items was six. As can be seen in Figure 4, opposite-direction skew had a greater impact on the standardized item alpha than did same-direction skew. Both forms of skew, however, affected the standardized item alpha. In fact, for the largest skew value simulated (3.5), when there were only six items and the items were skewed in opposite directions, the standardized item alpha did not reach .8 even with a large value for rho (.7).

As would be predicted, even with no skew, the standardized item alphas for Likert-type variables were decreased compared with those for continuous variables. For example, standardized item alphas ranged from .40 to .93 for continuous variables versus .37 to .92 for Likert-type variables when the number of items was 6. These are relatively small differences, and even less of a discrepancy was observed with a larger number of items (20 items; see Footnote 2).

Footnote 1. This lack of item variance was not so problematic with seven and nine levels of the Likert-type variable.
Footnote 2. Even less of a discrepancy in alpha between nonskewed and skewed data occurred when there were seven and nine levels of the Likert-type variable.

Discussion

The main purpose of this study was to investigate, with Monte Carlo simulation techniques, the effects of skew on the standardized item alpha, an index of internal consistency that is related to Cronbach's alpha. Further, because the foundation for the standardized item alpha is the Pearson product-moment correlation and skew has been shown to affect this statistic (Calkins, 1974; Carroll, 1961; Dunlap et al., 1995; Kowalski, 1972), the effects of skew on the average interitem correlation also were examined. Two series of simulations were designed. The first examined the effects of skew on continuous variables; the extent of skew, the number of different groups of skew, and the population correlation were varied. In the second series of simulations, Likert-type items were skewed and the effects on standardized item alpha were examined.
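The relation between the standardized item alpha, the average interitem correlation, and Cronbach's alpha can be made concrete with a small sketch. The Spearman-Brown form used in `standardized_alpha_from_r` is the usual textbook expression and is assumed here; the article does not print the formula, although it reproduces the .40 and .93 endpoints reported in the Results for six continuous items at rho = .1 and .7. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def variance(values):
    """Sample variance with n - 1 in the denominator."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / ((len(x) - 1) * sqrt(variance(x) * variance(y)))


def standardized_alpha_from_r(k, r_bar):
    """Spearman-Brown form: k * r_bar / (1 + (k - 1) * r_bar)."""
    return k * r_bar / (1 + (k - 1) * r_bar)


def standardized_alpha(items):
    """Standardized item alpha computed from the average interitem correlation."""
    pairs = list(combinations(items, 2))
    r_bar = sum(pearson_r(a, b) for a, b in pairs) / len(pairs)
    return standardized_alpha_from_r(len(items), r_bar)


def cronbach_alpha(items):
    """Cronbach's alpha from item variances and the variance of the total score."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))


def standardize(values):
    """Convert scores to z scores."""
    mean, sd = sum(values) / len(values), sqrt(variance(values))
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    for rho in (0.1, 0.3, 0.5, 0.7):
        print(rho, round(standardized_alpha_from_r(6, rho), 2))
```

Applying `cronbach_alpha` to items passed through `standardize` gives the same value as `standardized_alpha` on the raw items, which is the equivalence the article relies on when it chooses the standardized index to separate skew from variance heterogeneity.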

Figure 1. Results from Simulation Set 1: The impact of same- versus opposite-direction skew for lognormal variables having 6, 10, and 20 items; interitem population correlation (rho) = .1, .3, .5, and .7; skew values ranging from 0 to 5.9; and ranks of skewed items.

Figure 2. The effect of rho and skew on the average interitem correlation.

Results from all simulations revealed that skew produced decreases in alpha and decreases in the average interitem correlation. Also, in the simulations with Likert-type variables, larger skew values sometimes resulted in no variance among the 100 scores for a five-category item. Additional simulations revealed that this happened less frequently for Likert-type items with more than five categories.

Because the average sample skews were fairly large for these simulations (ranging from 2.9 to 5.7 for continuous variables and from 0.6 to 3.6 for Likert-type variables), these results may not be equally applicable to all areas of research. For instance, among economic variables, skew values of 6 may not be aberrant, but among behavioral measures, skew of that magnitude may be (and probably is) quite rare. Further, opposite-direction skew may only occur on some specific types of measures. Measures high in discriminability may have items that are severely skewed, with skew differing in direction for different items. For example, items from the Information subtest of Wechsler's Adult Intelligence Scale III, administered to 136 individuals and collected as part of a larger study examining personality variables and aggression, had skew values for items ranging from [...] to 3.30 (Berman & Greer, 2001). Other IQ subtests had items with similar skew and skew that varied in direction. In general, items that were negatively skewed represented those items administered early on in the subtest that were answered correctly by most individuals. Items that were answered incorrectly by most individuals were positively skewed. Several items were answered correctly (or incorrectly) by everyone, and skew could not be computed because there was no variance among scores.
Results from these simulations are encouraging for researchers using the standardized item alpha who have minor skew and skew in one direction. Even more encouraging is the knowledge that transforming skewed item distributions to meet the normality assumption or ranking scores on continuous items may further improve the estimate of internal consistency.

Practical Implications

Even seemingly trivial decreases in internal consistency, such as those found for alpha computed on items with same-direction skew, however, may have important practical implications. Many decisions about measures are based on their relative levels of internal consistency (Stanton, Sinclair, Balzer, & Smith, 2002). With other things held constant, a measure with a higher level of internal consistency is generally viewed as being better than a measure with lower internal consistency. Put in more concrete terms, a practitioner choosing between two measures with similar validity but different levels of internal consistency will invariably choose the measure with greater reliability. Moreover, decisions about further developing a measure or halting development in favor of an alternate measure can, in large part, be based on the scale's internal consistency. These yes-no decisions about measures, for better or worse, are largely determined by internal consistency standards, which often stand as the border between an unacceptable measure and an acceptable one.

Nunnally and Bernstein (1994) suggested that, in practice, a scale is acceptable if it achieves internal consistency of .70 in its early or developmental stages. The same scale, after corrections and revisions, generally requires an alpha of .80 to be considered appropriate and fit for use. Finally, a scale used for making individual decisions such as those in selection and hiring should have an alpha of at least .90 to ensure proper and appropriate decision making.
Although these cutoffs were initially established as guidelines, they have, in practice, become more rules than guidelines. For example, the difference between a scale achieving an alpha of .66 and a scale achieving an alpha of .75 (coefficients observed in the study by Enders and Bandalos, 1999) may easily be the difference between further pursuing the measure or abandoning it in favor of a new one. Differences that are "relatively small in magnitude" (Enders & Bandalos, 1999, p. 147), therefore, may have a profound consequence for measurement development.

Conclusions and Recommendations

Skew may negatively affect scale development, interpretability, and application in several ways. First, skew appears to decrease internal consistency, and we would argue that the magnitude of the decrease has practical implications. Item skew may affect factor analysis results and may produce skew factors that are not content based. Deleting skewed items to maximize internal consistency may lower scale validity (Enders & Bandalos, 1999; Stanton et al., 2002) or affect discriminability. These issues are important for researchers and may be of even greater significance to practitioners who develop measures based on these statistical analyses.

In light of the findings from this study, we offer these recommendations to help ameliorate the negative effects of skew. First, if a practitioner chooses to use Likert-type scaling, we recommend using more rather than fewer levels. Results from additional simulations not presented in this article, as well as past research, have demonstrated that the negative effects of skew are reduced as the number of levels for the items increases, even for increases beyond five categories. Second, transforming skewed variables may reduce the negative effects of skew. If the shape of the underlying skewed distribution is unknown, then ranking is a simple, straightforward transformation, and the standardized item alpha is minimally affected by ranking.

Figure 3. The relation between the average interitem correlation and standardized item alpha for lognormal variables, averaged across skew levels, having 6, 10, and 20 items with interitem population correlation (rho) = .1, .3, .5, and .7.

Figure 4. Results from Simulation Set 2: The impact of same- versus opposite-direction skew for Likert-type variables having 6, 10, and 20 items; interitem population correlation (rho) = .1, .3, .5, and .7; and skew values ranging from 0 to 3.5.

References

Aitchison, J., & Brown, J. A. C. (1957). The lognormal distribution with special reference to its uses in economics. Cambridge, England: Cambridge University Press.
Bandalos, D. L., & Enders, C. K. (1996). The effects of nonnormality and number of response categories on reliability. Applied Measurement in Education, 9.
Berman, M. E., & Greer, T. (2001). [IQ data collected as part of a larger study on the relation between neurobiology and adult psychopathology]. Unpublished raw data.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3.
Calkins, D. S. (1974). Some effects of non-normal distribution shape on the magnitude of the Pearson product-moment correlation coefficient. Interamerican Journal of Psychology, 8.
Carroll, J. B. (1961). The nature of the data, or how to choose a correlation coefficient. Psychometrika, 26.
Corballis, M. C. (1968). Some difficulties with difficulty: Note on Horst's matrix factoring and test theory. Psychological Reports, 22.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16.
Dingman, H. F. (1958). The relation between coefficients of correlation and difficulty factors. British Journal of Statistical Psychology, 11.
Dunlap, W. P., Burke, M. J., & Greer, T. (1995). The effect of skew on the magnitude of product-moment correlations. Journal of General Psychology, 122.
Dunlap, W. P., Chen, R., & Greer, T. (1994). Skew reduces test-retest reliability. Journal of Applied Psychology, 79.
Enders, C. K., & Bandalos, D. L. (1999). The effects of heterogeneous item distributions on reliability. Applied Measurement in Education, 12.
Flanagan, J. C. (1937). A proposed procedure for increasing the efficiency of objective tests. Journal of Educational Psychology, 28.
Hattie, J. A. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9.
Hill, P. C., & Hood, R. W., Jr. (1999). Measures of religiosity. Birmingham, AL: Religious Education Press.
Horst, P. (1953). Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychological Bulletin, 50.
Knapp, T. R., & Swoyer, V. H. (1967). Some empirical results concerning the power of Bartlett's test of the significance of a correlation matrix. American Educational Research Journal, 4.
Kowalski, C. J. (1972). On the effects of non-normality on the distribution of the sample product-moment correlation coefficient. Applied Statistics, 21.
Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2.
Loevinger, J. (1947). A systematic approach to the construction and evaluation of tests of ability. Psychological Monographs, 61(4).
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105.
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Rulon, P. J. (1939). A simplified procedure for determining the reliability of a test by split-halves. Harvard Educational Review, 9.
Spearman, C. (1910). Correlation calculated with faulty data. British Journal of Psychology, 3.
Stanton, J. M., Sinclair, E. F., Balzer, W. K., & Smith, P. C. (2002). Issues and strategies for reducing the length of self-report scales. Personnel Psychology, 55.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Needham Heights, MA: Allyn & Bacon.
TenVergert, E., Kingma, J., & Gillespie, M. W. (1990). Dichotomous items and extreme item difficulties: Factor analysis of the conflict tactics scale. Methodika, 4.
Visual Numerics, Inc. (1997). International Mathematical and Statistical Library [Computer software]. Houston, TX: Author.
Wagner, E. E., Adair, H. E., & Alexander, R. A. (1990). An empirical demonstration of the stability of the maximized correlation as an internal-consistency reliability estimate for tests of small item size. Educational and Psychological Measurement, 50.
Wert, J. E., & Ahmann, C. O. (1954). Statistical methods in educational and psychological research. New York: Appleton-Century-Crofts.
Wilcox, R. R. (1992). Robust generalizations of classical test reliability and Cronbach's alpha. British Journal of Mathematical and Statistical Psychology, 45.

Received May 29, 2003
Revision received May 20, 2005
Accepted September 13, 2005


accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian Recovery of Marginal Maximum Likelihood Estimates in the Two-Parameter Logistic Response Model: An Evaluation of MULTILOG Clement A. Stone University of Pittsburgh Marginal maximum likelihood (MML) estimation

More information

DATA is derived either through. Self-Report Observation Measurement

DATA is derived either through. Self-Report Observation Measurement Data Management DATA is derived either through Self-Report Observation Measurement QUESTION ANSWER DATA DATA may be from Structured or Unstructured questions? Quantitative or Qualitative? Numerical or

More information

VARIABLES AND MEASUREMENT

VARIABLES AND MEASUREMENT ARTHUR SYC 204 (EXERIMENTAL SYCHOLOGY) 16A LECTURE NOTES [01/29/16] VARIABLES AND MEASUREMENT AGE 1 Topic #3 VARIABLES AND MEASUREMENT VARIABLES Some definitions of variables include the following: 1.

More information

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT

More information

An Examination of Culture Bias in the Wonderlic Personnel Test*

An Examination of Culture Bias in the Wonderlic Personnel Test* INTELLIGENCE 1, 51--64 (1977) An Examination of Culture Bias in the Wonderlic Personnel Test* ARTHUR R. JENSEN University of California, Berkeley Internal evidence of cultural bias, in terms of various

More information

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that

More information

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Educational Psychology Papers and Publications Educational Psychology, Department of 7-1-2001 The Relative Performance of

More information

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE 1. When you assert that it is improbable that the mean intelligence test score of a particular group is 100, you are using. a. descriptive

More information

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Vs. 2 Background 3 There are different types of research methods to study behaviour: Descriptive: observations,

More information

Are people with Intellectual disabilities getting more or less intelligent II: US data. Simon Whitaker

Are people with Intellectual disabilities getting more or less intelligent II: US data. Simon Whitaker Are people with Intellectual disabilities getting more or less intelligent II: US data By Simon Whitaker Consultant Clinical Psychologist/Senior Visiting Research Fellow The Learning Disability Research

More information

Chapter 3 Psychometrics: Reliability and Validity

Chapter 3 Psychometrics: Reliability and Validity 34 Chapter 3 Psychometrics: Reliability and Validity Every classroom assessment measure must be appropriately reliable and valid, be it the classic classroom achievement test, attitudinal measure, or performance

More information

Page 1 of 11 Glossary of Terms Terms Clinical Cut-off Score: A test score that is used to classify test-takers who are likely to possess the attribute being measured to a clinically significant degree

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Chapter 7: Descriptive Statistics

Chapter 7: Descriptive Statistics Chapter Overview Chapter 7 provides an introduction to basic strategies for describing groups statistically. Statistical concepts around normal distributions are discussed. The statistical procedures of

More information

how good is the Instrument? Dr Dean McKenzie

how good is the Instrument? Dr Dean McKenzie how good is the Instrument? Dr Dean McKenzie BA(Hons) (Psychology) PhD (Psych Epidemiology) Senior Research Fellow (Abridged Version) Full version to be presented July 2014 1 Goals To briefly summarize

More information

Methods for Computing Missing Item Response in Psychometric Scale Construction

Methods for Computing Missing Item Response in Psychometric Scale Construction American Journal of Biostatistics Original Research Paper Methods for Computing Missing Item Response in Psychometric Scale Construction Ohidul Islam Siddiqui Institute of Statistical Research and Training

More information

WELCOME! Lecture 11 Thommy Perlinger

WELCOME! Lecture 11 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression

More information

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * by J. RICHARD LANDIS** and GARY G. KOCH** 4 Methods proposed for nominal and ordinal data Many

More information

Discrimination Weighting on a Multiple Choice Exam

Discrimination Weighting on a Multiple Choice Exam Proceedings of the Iowa Academy of Science Volume 75 Annual Issue Article 44 1968 Discrimination Weighting on a Multiple Choice Exam Timothy J. Gannon Loras College Thomas Sannito Loras College Copyright

More information

Six Modifications Of The Aligned Rank Transform Test For Interaction

Six Modifications Of The Aligned Rank Transform Test For Interaction Journal of Modern Applied Statistical Methods Volume 1 Issue 1 Article 13 5-1-2002 Six Modifications Of The Aligned Rank Transform Test For Interaction Kathleen Peterson Macomb Intermediate School District

More information

Manifestation Of Differences In Item-Level Characteristics In Scale-Level Measurement Invariance Tests Of Multi-Group Confirmatory Factor Analyses

Manifestation Of Differences In Item-Level Characteristics In Scale-Level Measurement Invariance Tests Of Multi-Group Confirmatory Factor Analyses Journal of Modern Applied Statistical Methods Copyright 2005 JMASM, Inc. May, 2005, Vol. 4, No.1, 275-282 1538 9472/05/$95.00 Manifestation Of Differences In Item-Level Characteristics In Scale-Level Measurement

More information

1 The conceptual underpinnings of statistical power

1 The conceptual underpinnings of statistical power 1 The conceptual underpinnings of statistical power The importance of statistical power As currently practiced in the social and health sciences, inferential statistics rest solidly upon two pillars: statistical

More information