Journal of Applied Psychology


Effect Size Indices for Analyses of Measurement Equivalence: Understanding the Practical Importance of Differences Between Groups

Christopher D. Nye and Fritz Drasgow

Online First Publication, April 4, 2011. doi: /a

CITATION: Nye, C. D., & Drasgow, F. (2011, April 4). Effect Size Indices for Analyses of Measurement Equivalence: Understanding the Practical Importance of Differences Between Groups. Journal of Applied Psychology. Advance online publication. doi: /a

Journal of Applied Psychology
© 2011 American Psychological Association
2011, Vol., No., /11/$12.00 DOI: /a

Effect Size Indices for Analyses of Measurement Equivalence: Understanding the Practical Importance of Differences Between Groups

Christopher D. Nye and Fritz Drasgow
University of Illinois at Urbana-Champaign

Because of the practical, theoretical, and legal implications of differential item functioning (DIF) for organizational assessments, studies of measurement equivalence are a necessary first step before scores can be compared across individuals from different groups. However, commonly recommended criteria for evaluating results from these analyses have several important limitations. The present study proposes an effect size index for confirmatory factor analytic (CFA) studies of measurement equivalence to address 1 of these limitations. The application of this index is illustrated with personality data from American English, Greek, and Chinese samples. Results showed a range of nonequivalence across these samples, and these differences were linked to the observed effects of DIF on the outcomes of the assessment (i.e., group-level mean differences and adverse impact).

Keywords: differential item functioning, measurement equivalence, effect size, employee selection

Practitioners and organizational researchers confront a vast number of questions that involve comparing scores on assessment instruments across groups. Are workers more satisfied in organizations with empowerment programs? Are successful salespersons more extraverted? Are employees in a multinational organization more satisfied in one country than employees in another? Moreover, because of the legal and practical implications of using selection assessments that advantage one group over another, group comparisons may be particularly salient during the hiring process.
For all of these comparisons to be meaningful, it is essential that the tests and scales provide equivalent measurement across groups. Equivalent measurement is obtained when individuals with the same standing on the trait assessed by the test or scale, but sampled from different groups, have equal expected observed scores (Drasgow, 1984). As such, measurement invariance can be examined by a differential item functioning (DIF) analysis using item-response theory (IRT) or with confirmatory factor analytic (CFA) mean and covariance structure (MACS) analysis. The latter method is the focus of this article. Although several articles have proposed various decision rules for determining if measurement nonequivalence exists with MACS analysis (Cheung & Rensvold, 2002; Hu & Bentler, 1999; Meade, Johnson, & Braddy, 2008), these rules generally involve empirically derived cutoffs or statistical significance tests. As such, the analysis does not address the practical importance of observed differences between groups and does not provide users with information about the effects of nonequivalence on the organizational outcomes of an assessment. In the broader psychological literature, effect size statistics have been proposed to overcome this limitation (Cohen, 1990, 1994; Kirk, 2006; Schmidt, 1996). However, effect size indices for CFA evaluations of measurement equivalence have not yet been developed.

Author note: Christopher D. Nye and Fritz Drasgow, Department of Psychology, University of Illinois at Urbana-Champaign. An earlier version of this article was presented at the annual meeting of the Academy of Management, Montréal, Quebec, Canada, in August. We would like to thank Brent W. Roberts and Gerard Saucier for the use of their data in the empirical example. Correspondence concerning this article should be addressed to Christopher D. Nye, Department of Psychology, University of Illinois at Urbana-Champaign, 603 East Daniel Street, Champaign, IL. E-mail: cnye2@cyrus.psych.illinois.edu
In the present study, we propose such an index and examine its application to real-world data. To illustrate its practical importance, we also demonstrate the effects of measurement nonequivalence on the observed outcomes (e.g., means, adverse impact) of group-level comparisons. This information will enable researchers and practitioners to further evaluate the theoretical and practical importance of observed differences.

The Importance of Measurement Equivalence

Measurement invariance techniques can and should be applied prior to testing between-groups differences. For example, these methods are commonly used to examine the equivalence of tests and assessments across cultures (e.g., Wasti, Bergman, Glomb, & Drasgow, 2000), races (e.g., D. Chan, 1997), sexes (e.g., Parker, Baltes, & Christiansen, 1997), and other demographic groups or over time (e.g., K.-Y. Chan, Drasgow, & Sawin, 1999; Ryan, West, & Carr, 2003). Therefore, these techniques have been used to address questions in a number of substantive organizational research areas such as job attitudes (Ryan et al., 2003), employee selection (Stark, Chernyshenko, Chan, Lee, & Drasgow, 2001), organizational citizenship behaviors (Lam, Hui, & Law, 1999), motivation (Sagie, Elizur, & Yamauchi, 1996), performance ratings (Woehr, Sheehan, & Bennett, 2005), leadership (M. S. Cole, Bedeian, & Field, 2006), and sexual harassment (Wasti et al., 2000), among others. However, studies of measurement equivalence have garnered the most attention in areas where consistent group-level differences are observed.

Within the cognitive ability domain, some content has been found to function differently across groups (Kuncel & Hezlett, 2007). Specifically, men tend to perform better than women on science-related questions (Lawrence, Curley, & McHale, 1988), and some verbal stimuli tend to favor European American over Hispanic individuals (Schmitt & Dorans, 1988). Other research has demonstrated that the psychometric properties of these measures can vary over time. For example, Lievens, Reeve, and Heggestad (2007) showed that allowing job applicants to retake a cognitive ability test could affect the functioning of the test in a selection context. In other words, results from their study showed that retesting resulted in nonequivalent test scores and biased prediction. Another phenomenon, known as item drift, may also occur as items on a selection test become obsolete or outdated as a result of educational, technological, and/or cultural changes (Bock, Muraki, & Pfeiffenberger, 1988). For example, K.-Y. Chan et al. (1999) examined the Armed Services Vocational Aptitude Battery (ASVAB) and found that items requiring greater technical knowledge (e.g., Electrical Information or General Sciences tests) exhibited the most DIF over time. In addition, Drasgow, Nye, and Guo (2008) found significant levels of item drift on the National Council Licensure Examination for Registered Nurses (NCLEX-RN). Interestingly, in this study, DIF canceled out at the test level, suggesting that nonequivalence would not have a substantial impact on overall test scores. Because of their predictive validity and smaller subgroup differences, some have recommended using personality measures to supplement cognitive ability tests in selection settings (Cascio, Jacobs, & Silva, 2009; Maxwell & Arvey, 1993).
In addition, personality variables play a key role in a variety of theories and models of organizational behavior, including, but not limited to, leadership (Judge, Bono, Ilies, & Gerhardt, 2002), motivation (Kanfer & Heggestad, 1997), organizational justice (Colquitt, Scott, Judge, & Shaw, 2006), job satisfaction (Judge, Heller, & Mount, 2002), and turnover (Salgado, 2000). Although personality measures may exhibit smaller subgroup differences within a single culture, recent research suggests that problems may occur in multicultural contexts (Ghorpade, Hattrup, & Lackritz, 1999; Oishi, 2006). For example, Nye, Roberts, Saucier, and Zhou (2008) found a lack of invariance for the majority of items on a common measure of personality when compared across three cultures. Because of these differences, scores may not be comparable across cultural groups. Given the increasing importance of cross-cultural psychology and international organizations, this issue has growing significance.

Measurement Equivalence

A variety of methods have been developed for examining measurement equivalence. Some have suggested using t tests, analysis of variance (ANOVA), or other statistical tests of observed score differences to evaluate DIF. However, these methods are inappropriate for this purpose because they confound DIF with true differences (referred to as impact; Stark, Chernyshenko, & Drasgow, 2004) between groups. Stated differently, these methods require the assumption of equal latent trait distributions across groups, which is unlikely to be true in practice (Hulin, Drasgow, & Parsons, 1983). Showing the inadequacy of these tests, Camilli and Shepard (1987) demonstrated that when true mean differences exist between groups, ANOVA comparisons were incapable of detecting DIF. In fact, even when measurement nonequivalence accounted for 35% of the observed mean difference, the effects suggested by ANOVA were negligible.
More important, these authors found that the presence of true group differences can result in high Type I error rates. Mean and covariance structure (MACS) analysis is a more appropriate method for examining measurement equivalence (Cheung & Rensvold, 2000; Little, 1997; Stark, Chernyshenko, & Drasgow, 2006; Vandenberg & Lance, 2000) because it has important advantages over the alternative methods described above. First, it does not assume equal distributions of the latent trait across groups (Drasgow & Kanfer, 1985). Thus, a MACS analysis allows researchers to differentiate DIF from impact. Second, the adequacy of models can be evaluated using several well-established indices of fit. Therefore, this approach applies a more comprehensive definition of nonequivalence to the data.

Mean and Covariance Structure (MACS) Analysis

Although the number and order of the steps in MACS analysis can vary across studies (Vandenberg & Lance, 2000), researchers are generally interested in tests of configural, metric, and scalar invariance. However, a number of additional tests for invariance are available, and the exact forms of invariance that are assessed should be linked to the purposes of the study (Steenkamp & Baumgartner, 1998). Nevertheless, these additional tests should be preceded by a confirmation of configural, metric, and scalar equivalence. Configural invariance is the first step to assessing measurement equivalence (Vandenberg & Lance, 2000). Here, the pattern of fixed (at zero) and free loadings is compared across groups. Essentially, this test assesses the extent to which items in the scale or test load on the same factors in both samples and determines whether individuals in these samples employ the same conceptualization of the focal constructs.
Therefore, this type of invariance has been particularly important in the personality literature, where there have been substantial debates over the latent structure of individual differences (see Saucier & Goldberg, 2001). If the pattern of zero and nonzero loadings differs across groups, no further tests will be justified; constructs that are conceptualized differently are not comparable across groups. In contrast, if configural invariance is confirmed, assessments of metric invariance should proceed. Metric invariance is tested by constraining the factor loadings to be equivalent across groups. In LISREL notation, metric equivalence tests the hypothesis that Λ_xg = Λ_xg′, where Λ_xg is the loading matrix for the gth group. Items that are found invariant at the metric level can then be assessed for scalar equivalence. In this step, the model for the data is

X_g = τ_xg + Λ_xg ξ_g + δ_g, (1)

where X_g is the vector of observed variables, τ_xg is the vector of intercepts of the regressions of the observed variables on the latent factor, ξ_g is the vector of latent factors, and δ_g is the vector of measurement errors. To test for scalar invariance, the item intercepts are constrained in addition to the factor loadings (i.e., Λ_xg = Λ_xg′ and τ_xg = τ_xg′). As such, this test assesses the comparability of scores across groups. A failure to support the null hypothesis suggests that the scores, and hence the

group means, are not directly comparable. Therefore, tests of scalar invariance have critical importance for drawing conclusions about group differences. Although metric and scalar equivalence are generally assessed sequentially, Stark et al. (2006) suggested that it may be useful to assess these forms of invariance simultaneously. Examining metric and scalar equivalence separately increases the number of comparisons and, therefore, also increases the risk of Type I errors. Moreover, the sequential process may propagate errors from one step (e.g., metric invariance) to another (i.e., scalar invariance).

Interpreting Results of CFA Studies of Measurement Equivalence

To test measurement equivalence, it is common to use statistical significance tests based on a chi-square distribution. In addition to the traditional chi-square difference tests, some authors (D. Chan, 2000; González-Romá, Hernández, & Gómez-Benito, 2006) have recommended examining modification indices that represent a chi-square estimate of the improvement in fit when the corresponding parameter is freed (Bollen, 1989). However, it is well known that chi-square significance tests are affected by sample size (Meade et al., 2008). Thus, in large samples, even small differences will be identified as statistically significant. González-Romá et al. (2006) illustrated this problem with modification indices. Under some conditions, these authors showed that power was only .29 when N = 100 but increased to 1.00 when N = 800. In other conditions, Type I error rates for the modification indices increased by .15 when samples of 100 and 800 were compared. Because of the limitations of chi-square tests, other indices (i.e., changes in fit statistics) have been suggested for evaluating equivalence in the CFA framework. For example, Cheung and Rensvold (2002) suggested that a change in the comparative fit index (ΔCFI) greater than .01 be used as a cutoff for identifying nonequivalence.
More recently, Meade et al. (2008) showed that this cutoff was too liberal and did not detect some forms of nonequivalence. Instead, these authors recommended that a ΔCFI of .002 be used. This emphasis on statistical significance tests and empirically derived cutoff values has been criticized for several reasons (Cohen, 1990, 1994; Harlow, Mulaik, & Steiger, 1997; Kirk, 1996; Schmidt, 1996). Kirk (2006) critiqued these criteria because they force researchers to turn a decision continuum into a dichotomous reject/do not reject decision. As Kirk pointed out, this practice treats a p value that is only slightly larger than the cutoff (e.g., p = .055) the same as a much larger value. Another important criticism is that these statistical tests do not reflect the practical significance of a difference. Meade et al. (2008) differentiated between detectable and practically significant DIF, noting that they were independent issues. These authors stated that the development of conservative cutoffs is primarily focused on the detection of DIF and not its practical significance. Thus, observing ΔCFI ≥ .002 indicates that DIF exists but does not reflect the importance of the difference. As a result of these criticisms (and others), it is widely believed that the interpretation of empirical results should be based on an evaluation of effect sizes rather than tests of statistical significance. Therefore, a number of effect size statistics have been developed for ANOVAs (e.g., Hays, 1963), t tests (e.g., Cohen, 1988), and other traditional statistical tests. However, no such indices exist for CFA analyses. Stark et al. (2004) proposed an effect size for IRT analyses of differential test functioning (DTF) that they referred to as d_DTF. Despite the usefulness of this measure, it is not applicable to CFA methodology, where no viable alternatives exist for these techniques. In addition, it does not address nonequivalence at the item level, which can be informative for test development.
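To make the two conventional decision rules concrete, the following is a minimal Python sketch (our own illustration, not code from the article): a 1-df chi-square difference test and the ΔCFI cutoff of Meade et al. (2008). The function names and the example fit values are hypothetical.

```python
import math

def chi2_diff_p(delta_chi2):
    # p value of a chi-square difference test with 1 df:
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(delta_chi2 / 2.0))

def flags_dif(cfi_baseline, cfi_constrained, cutoff=0.002):
    # Meade et al. (2008): flag nonequivalence when constraining an item's
    # parameters drops the CFI by more than the cutoff
    return (cfi_baseline - cfi_constrained) > cutoff

# A chi-square difference of 3.84 on 1 df sits right at the p = .05 boundary
print(round(chi2_diff_p(3.84), 3))
print(flags_dif(0.950, 0.945))  # CFI drop of .005 exceeds .002
print(flags_dif(0.950, 0.949))  # CFI drop of .001 does not
```

Note how both rules reduce a continuum to a dichotomy, which is exactly the criticism raised above: a drop of .0021 and a drop of .10 receive the same "flag."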
By assessing the magnitude of DIF, one can more easily detect items and/or specific content that may be problematic for a specific group. For this reason, single-item tests of measurement equivalence are frequently recommended to diagnose the source of any nonequivalence at the scale level (Vandenberg & Lance, 2000). Moreover, although CFA analyses have traditionally been conducted by placing simultaneous constraints on all of the items in a scale (i.e., the constrained-baseline approach), single-item constraints (i.e., the free-baseline approach) provide lower Type I and II error rates (Stark et al., 2006). Consequently, we propose an item-level effect size measure for CFA analyses of measurement equivalence.

An Effect Size Index for MACS Analyses

As suggested by Stark et al. (2004) for their IRT effect size, an index of practically important nonequivalence can be defined as the contribution that DIF makes to expected score differences for each item. In CFA methodology, the mean predicted response X̂_iR to item i for an individual with a score of θ on the latent variable in the reference group (Group R) is given by

X̂_iR = τ_iR + λ_iR θ, (2)

where τ_iR is the intercept and λ_iR is the item's loading. Here, we use the language of IRT to differentiate the groups being compared. In this terminology, the reference group is the majority or baseline group, and the focal group (Group F) is the sample we are comparing to it. Thus, DIF is reflected in the area between the regression lines for the reference and focal groups (see Figures 1 and 2 for an illustration). Consequently, an effect size for MACS analysis can be defined as

d_MACS = (1 / SD_ip) √[ ∫ (X̂_iR − X̂_iF)² f_F(θ) dθ ], (3)

Figure 1. Comparing the adjective bold (d_MACS = 0.26) across the American English and Greek samples.
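Equations 2 and 3 (and the pooled item standard deviation defined in Equation 4) can be sketched numerically. The fragment below is our own minimal Python illustration, not the authors' program; it approximates the integral in Equation 3 by summing over a grid of nodes, and it assumes the parameter estimates (tau, lam, kappa_f, phi_f) have already been obtained from SEM software.

```python
import math

def pooled_sd(sd_r, n_r, sd_f, n_f):
    # Equation 4: pooled within-group standard deviation of item i
    return math.sqrt(((n_r - 1) * sd_r ** 2 + (n_f - 1) * sd_f ** 2)
                     / ((n_r - 1) + (n_f - 1)))

def d_macs(tau_r, lam_r, tau_f, lam_f, sd_ip, kappa_f, phi_f, nodes=2001):
    """Equation 3: standardized expected-score difference for one item,
    integrating over the focal group's (assumed normal) latent distribution."""
    sigma = math.sqrt(phi_f)
    lo, hi = kappa_f - 6 * sigma, kappa_f + 6 * sigma
    step = (hi - lo) / (nodes - 1)
    total = 0.0
    for k in range(nodes):
        theta = lo + k * step
        # Equation 2: mean predicted responses in the reference and focal groups
        diff = (tau_r + lam_r * theta) - (tau_f + lam_f * theta)
        density = math.exp(-0.5 * ((theta - kappa_f) / sigma) ** 2) \
                  / (sigma * math.sqrt(2 * math.pi))
        total += diff ** 2 * density * step
    return math.sqrt(total) / sd_ip

# Uniform DIF of half a raw-score point in the intercept, pooled item SD of 1.0
print(round(d_macs(0.5, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0), 2))
```

With equal loadings, the squared difference is constant, so the index reduces to the intercept difference in pooled-SD units; with unequal loadings, the integral also picks up DIF that grows or shrinks across the trait range.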

where SD_ip is the pooled within-group standard deviation of item i across Groups R and F given by

SD_ip = √{ [(N_R − 1)SD_R² + (N_F − 1)SD_F²] / [(N_R − 1) + (N_F − 1)] }. (4)

In addition, f_F(θ) is the distribution of the latent trait in the focal group,¹ which is assumed to have a normal distribution with a mean and variance estimated from the latent factor in the focal group (i.e., κ_F and φ_F, respectively, in LISREL notation). It is also worth noting that the integral in Equation 3 can be approximated by summing across quadrature nodes. However, for practical use, a computer program is available from the first author for calculating this index using parameter estimates from commonly used structural equation modeling (SEM) software (e.g., LISREL, MPlus, AMOS).

Dividing by the pooled standard deviation puts this measure in a standardized metric similar to other effect size indices. Although some indices use the pooled standard deviation from the reference and focal groups to standardize the raw differences between them (e.g., Cohen, 1988), others have suggested pooling the standard deviations across all g groups for increased comparability and more precise estimates of the population standard deviation (e.g., Hedges, 1981). If the magnitude of measurement nonequivalence is compared across three or more groups, pooling the standard deviations for the reference group and a single focal group (cf. Cohen, 1988) would result in effect sizes from the same study that are in different metrics and are not comparable to tests with other focal groups. However, if the pooled standard deviation for all groups is used as the denominator for Equation 3, effect size indices can be compared to evaluate the relative size of nonequivalence in each of the focal groups. Therefore, this approach is suggested here.

Although the present study focuses on the pooled standard deviation across all groups, we also note that alternative denominators can be used. For example, Glass (1976) recommended using the standard deviation in the reference group² as the denominator for his effect size. This approach has a similar advantage to the pooled standard deviation in that effect sizes will be in the same metric when a single reference group is compared with multiple focal groups. In his taxonomy of IRT effect sizes, Meade (2010) also suggested using the average effect across respondents in the focal group or the raw differences between groups to provide additional information about the magnitude of an effect. Of course, if different denominators are used in the literature, effect size estimates will need to be adjusted for meta-analytic comparisons (cf. Morris & DeShon, 2002).

Figure 2. Comparing the adjective bold (d_MACS = 1.11) across the American English and Chinese samples.

Practical Consequences of Measurement Nonequivalence

Although this effect size measure can be used to describe the magnitude of an effect, it still provides little information about the observed consequences of measurement nonequivalence. For example, what effects does item-level nonequivalence have on the mean and the variance of the scale? Or, how will nonequivalence affect the outcomes of the selection process? To address these issues, equations were derived to calculate the effects of DIF on the mean and variance of a measure. These equations will help researchers and practitioners to further understand the effects of nonequivalence. In group-level comparisons, observed mean differences can be defined as

Observed differences = DIF + impact. (5)

To quantify the effects of DIF on the mean of a scale, one can calculate

Δmean(X_S) = Σ_{i=1}^{n} ∫ (X̂_iR − X̂_iF) f_F(θ) dθ, (6)

where X_S is the scale score. Notice that the integral in this equation is similar to that in Equation 3 except that the differences between the mean predicted responses in Groups R and F at each ability level are not squared so that DIF in opposite directions can cancel. In addition, because we are interested in the change in the mean of the scale, item-level differences are summed across all n items to obtain the overall mean difference in raw score points. In sum, Δmean(X_S) refers to the amount of the observed difference that can be attributed to DIF; impact is not a factor in this calculation.

Differences between the variances of a scale in the reference and focal groups due to DIF can also be calculated. Using the item-level parameters from the CFA model, these effects are defined as

Δvar(x_i) = 2C_i λ_iR φ_F − C_i² φ_F, (7)

¹ The distribution of the latent factor in the focal group was used in Equation 3 because we are interested in DIF relative to this group. In other words, analyses of measurement equivalence are designed to detect DIF across groups, and we are interested in determining the magnitude of these differences across the range of the latent trait displayed by the focal group's members. This approach is consistent with other similar indices (Stark et al., 2004).

² Glass discussed his effect size measure in the context of experimental manipulations where experimental and control groups are being compared. Thus, he suggested using the standard deviation of the control group as the denominator. In the language of DIF, the control group is most analogous to the reference group.

where λ_iR is the factor loading of item i in the reference group, φ_F is the variance of the latent factor in the focal group, and C_i is the difference between the factor loadings for item i in the reference and focal groups. As illustrated in the Appendix, two key assumptions were made in this derivation. First, because we are interested only in identifying differences due to DIF, we assumed φ_R = φ_F. When this is the case, Δvar is not influenced by true group-level differences in the latent construct. Instead, only metric nonequivalence (i.e., differences in the factor loadings) can result in Δvar ≠ 0. The second simplifying assumption is var(δ_iR) = var(δ_iF). Several authors have suggested that requiring equivalent error variances is the least important hypothesis to test and is generally unnecessary for analyses of measurement equivalence (Bentler, 1995; Byrne, 1998; Jöreskog, 1971). Speaking about constraining the error variances and covariances to equality in multigroup tests, Byrne (1998) noted that it is now widely accepted that to do so "represents an overly restrictive test of the data" (p. 261). Therefore, we do not include differences in δ when evaluating the effects of DIF.

To estimate the total effect of DIF on the variance of a scale, the Δvar(x_i) can be aggregated across all n items in the scale using the formula for calculating the variance of a composite. In other words,

Δvar(X_S) = Δvar(x_1) + Δvar(x_2) + ... + Δvar(x_n) + 2[Δcov(x_1, x_2) + ... + Δcov(x_{n−1}, x_n)], (8)

where

Δcov(x_i, x_j) = λ_jR C_i φ_F + C_j λ_iR φ_F − C_i C_j φ_F (9)

is the covariance of items i and j (see Appendix for derivations of Δvar and Δcov).

Adverse Impact

Because DIF can result in mean differences between groups (see Equation 6), selection decisions may be affected. Indeed, mean differences between groups are the primary source of the differential selection outcomes experienced by members of various groups (Newman, Jacobs, & Bartram, 2007).
Therefore, it is important to examine the consequences of DIF for adverse impact (AI). Although AI has been defined in a number of ways (Gatewood, Field, & Barrick, 2007), courts generally recognize two forms of evidence that it exists: statistical significance tests and the four-fifths rule (Bobko & Roth, 2009). Because of the problems with significance tests noted above, the present study focuses solely on the four-fifths rule. Using the four-fifths rule, AI is identified by the ratio of selection ratios for the majority and minority groups. If the value of this ratio is less than .80, AI is said to occur. Although selection ratios are typically calculated using the results of the selection process, a prospective ratio can be estimated from the CFA model by assuming that the latent trait is normally distributed. Here, the AI ratio is defined as

AI ratio = P_F(Z_XF > Z_Cut) / P_R(Z_XR > Z_Cut), (10)

where Z_XF and Z_XR are the standardized scores on a selection measure in the focal and reference groups, Z_Cut is the standardized cut score used to select employees, and P_F(Z_XF > Z_Cut) is the probability of an individual in Group F obtaining an observed score on the assessment that is greater than the cut score. The denominator in this equation is the same probability for an individual in the reference group. Note that the probabilities in the denominator and numerator are calculated using the model-based means and standard deviations from the reference and focal groups, respectively, but assuming equivalent distributions for the latent trait. In other words, differences between the numerator and denominator are entirely the result of DIF in the measure because differences due to impact (i.e., differences in the latent trait distribution) are not incorporated into these calculations. Thus, if this ratio is less than .80, AI will occur solely because of DIF in the measure.
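As a worked sketch of the consequence formulas (Equations 5-10), the fragment below is our own illustration with hypothetical names; it assumes item parameters and latent-factor estimates are already available. Because the predicted responses are linear in θ, the integral in Equation 6 has the closed form Δτ + Δλ·κ_F, which the code uses directly; the probabilities in Equation 10 use the standard normal survival function.

```python
import math

def delta_mean_scale(items, kappa_f):
    # Equation 6: DIF contribution to the scale mean. Each item is
    # (tau_r, lam_r, tau_f, lam_f); integrating the unsquared difference in
    # predicted responses over f_F reduces to dtau + dlam * kappa_f per item.
    return sum((tr - tf) + (lr - lf) * kappa_f for tr, lr, tf, lf in items)

def delta_var_item(lam_r, lam_f, phi_f):
    c = lam_r - lam_f                               # C_i
    return 2 * c * lam_r * phi_f - c ** 2 * phi_f   # Equation 7

def delta_cov(lam_ir, lam_if, lam_jr, lam_jf, phi_f):
    ci, cj = lam_ir - lam_if, lam_jr - lam_jf
    return lam_jr * ci * phi_f + cj * lam_ir * phi_f - ci * cj * phi_f  # Eq. 9

def delta_var_scale(loadings, phi_f):
    # Equation 8: composite-variance aggregation across all n items
    total = sum(delta_var_item(lr, lf, phi_f) for lr, lf in loadings)
    for i in range(len(loadings)):
        for j in range(i + 1, len(loadings)):
            total += 2 * delta_cov(loadings[i][0], loadings[i][1],
                                   loadings[j][0], loadings[j][1], phi_f)
    return total

def ai_ratio(mean_f, sd_f, mean_r, sd_r, z_cut):
    # Equation 10: ratio of model-implied selection rates above the cut score,
    # holding the latent distribution equal so any shortfall reflects DIF alone
    sf = lambda z: 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z)
    return sf((z_cut - mean_f) / sd_f) / sf((z_cut - mean_r) / sd_r)
```

For example, with identical model-implied distributions the AI ratio is 1.0, whereas a half-SD deficit in the focal group produced entirely by DIF drives the ratio below the four-fifths threshold at a mean-level cut score.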
The Current Study

The primary goal of the present study was to provide an empirical illustration of the effect size index described above and to examine the magnitude of DIF in a measure of the Big Five personality traits. In a reanalysis of data from Nye et al. (2008), the Mini-Markers Scale (Saucier, 1994) was used to determine the extent of CFA nonequivalence across American English, Greek, and Chinese cultures. Next, Equations 6-9 were used to show the effects of DIF on the observed scale-level properties of this measure. Finally, the AI ratio shown in Equation 10 was calculated and compared with the four-fifths rule to illustrate the consequences of DIF for employee selection.

An Empirical Example

Samples

Because SEM models generally require large samples for accurate parameter estimates, even small effects may be statistically significant. Therefore, we chose samples for our empirical example that were large enough to obtain accurate parameters and to illustrate the advantages of effect size indices for MACS analysis. In the American English sample, responses were provided by 727 undergraduate students from two large Midwestern universities. The sample contained 388 women and 339 men, and the mean age was years. Because the measure examined in this study was developed in the United States and has been researched extensively in this country (Saucier, 1994), this sample was used as the reference group, and all analyses were conducted as two-group comparisons between the U.S. sample and a focal group. The Greek sample was composed of 991 undergraduate students from several Greek universities. In this sample, there were substantially more women (N = 751) than men (N = 224; 16 did not report their gender). The Chinese sample consisted of 433 undergraduate students from a large university in Shanghai. The sample contained approximately 49% women and 51% men.

Measure

Participants in all three samples responded to Saucier's (1994) Mini-Marker Scale.
This scale contains 40 adjective items assessing the five-factor personality structure. In their original study, Nye et al. (2008) found that a majority of the items were nonequivalent across all three cultures. However, the magnitudes of the ΔCFI and chi-square difference tests suggested that effect sizes may vary across items in the scale.

Analyses

MACS analysis was used to examine the equivalence of the personality scales across the three samples. Specifically, configural, metric, and scalar invariance were assessed using the maximum-likelihood estimator and the multigroup function in LISREL 8.7. Interestingly, Nye et al. (2008) found that all five of the personality scales were multidimensional, with the two latent factors represented by positive and negative items, respectively. Thus, the four positive items in the Extraversion and Agreeableness scales composed one factor, and the four negative items defined the second factor. Although the two latent factors in the Conscientiousness scale were largely defined by positive and negative items, the adjective inefficient was the only item that did not conform to this structure. Despite the negative wording, inefficient loaded negatively on the positive factor. Given these results, we tested both one- and two-factor models in the American English sample and used the best fitting models to assess configural invariance. Metric and scalar equivalence were assessed simultaneously, and parameters were constrained using the free-baseline approach (i.e., a single item was constrained at a time). In MACS analysis, the referent item plays an important role in statistically identifying the latent trait scale. Because the latent factors being modeled are unobservable, they do not have an inherent scale and must be given one for the model to be identified. The most common approach to doing this is to constrain the loading of a single item to 1.00 for each factor. These items are referred to as the referent items.
When the mean structure is estimated, as it is for tests of scalar equivalence, a scale must also be given to the mean of the latent factor. In the present study, we did this by constraining the intercept of the referent item to zero as suggested by Bollen (1989). Thus, the latent factor will have a mean and variance equal to that of the referent item. An alternative approach to scaling the mean of the latent factor is to constrain this parameter to zero (i.e., κ = 0) in one of the groups, typically the reference group. With this approach, the means of the latent factors in the unconstrained groups represent their deviation from the reference group's mean. Either of these methods of scaling the latent factor can be used when calculating effect size indices. In studies of measurement equivalence, it is essential that the referent items are equivalent across groups. A nonequivalent referent will confound results and render the analysis meaningless. For example, Johnson, Meade, and DuVernet (2009) showed that a referent item that functions differently across groups can either mask or exacerbate nonequivalence in other items, resulting in low power or high Type I error rates, respectively. Thus, for each of the two factors in the scales examined here, only items found to be equivalent by Nye et al. (2008) were used as referent items. However, all of the items in the Neuroticism and Openness scales exhibited nonequivalence and, therefore, were excluded from the present analyses. In addition, Nye et al. were able to identify an equivalent referent item for only one of the latent factors in the Conscientiousness scale. Specifically, none of the negatively worded items were equivalent across cultures. Therefore, effect size estimates will be accurate for the positive Conscientiousness items but not for the negatively worded items.
Nevertheless, we calculate effect sizes for the positive items and use the negative items to illustrate the effects of a nonequivalent referent item on the magnitude of an effect. Reliabilities for Extraversion ranged from .59 (Greece) to .84 (U.S.), Agreeableness ranged from .65 (Greece) to .78 (U.S.), and Conscientiousness ranged from .59 (U.S.) to .74 (China).

Results

To facilitate the use of the indices we propose here, we first provide step-by-step instructions for calculating these measures with examples from the Extraversion scale. The purpose of providing this level of detail is to help readers understand how to calculate these indices and to illustrate their application to complex survey data. Following this discussion, we present the results for all three personality scales.

Calculating Effect Size Indices

Step 1. The first step in the process of calculating effect sizes for MACS analyses is to identify the factor structure that will be tested across groups. On the basis of the results presented by Nye et al. (2008), we tested two-factor models for each of the scales, with every item loading on a single latent factor. However, we also hypothesized additional relationships between several of the items that were not tested by Nye et al. In all three scales, some items were included with their antonyms and/or synonyms. For example, the antonyms sympathetic and unsympathetic were both included in the Agreeableness scale. Similarly, the antonyms talkative and quiet were included in the Extraversion scale. Consequently, we also modeled correlated uniqueness terms for the antonyms and synonyms in each scale. Because of the inherent methodological relationship between these types of items, freeing these parameters seems justified (D. A. Cole, Ciesla, & Steiger, 2007) and is a common practice in personality research (Hopwood & Donnellan, 2010). Table 1 shows the fit indices for both the one- and two-factor models in the American English sample.
Table 1
Fit Statistics for the One- and Two-Factor Models of the Extraversion, Agreeableness, and Conscientiousness Scales
[The χ², df, RMSEA, NNFI, CFI, and SRMR values for the one- and two-factor models of each scale were lost in transcription.]
Note. RMSEA = root mean square error of approximation; NNFI = nonnormed fit index; CFI = comparative fit index; SRMR = standardized root mean square residual.

As shown here, the Conscientiousness scale was clearly not unidimensional, but the two-factor model fit well. In addition, although the single-factor models fit moderately well in the Extraversion and Agreeableness scales, the two-factor models fit better in both cases. Thus, the two-factor models were used to test for measurement equivalence.

Step 2. After identifying a factor structure, multigroup analyses were applied to test for measurement equivalence. For the Extraversion scale, separate referent items were identified by Nye et al. (2008) for each of the two latent factors using the method suggested by Cheung and Rensvold (1999). With this approach, n − 1 tests of equivalence are conducted for each of the items in the scale, with a different item serving as the referent in each test. An appropriate referent item is identified if an item is equivalent across each of the n − 1 tests. Using this approach, the adjective extraverted was selected as the referent for the positively worded items, and the adjective shy was used to scale the negative items. After a referent item has been identified and the mean and variance of the latent factor set, tests for configural, metric, and scalar invariance can proceed as normal (see Vandenberg & Lance, 2000, for a comprehensive review of these analyses). In this process, effect size indices will be calculated using the unconstrained parameters that are estimated in each of the groups.

Step 3. Next, the item-level pooled standard deviations were estimated. To calculate these for the Extraversion scale, the observed standard deviations for each item in the American English, Greek, and Chinese samples were pooled using Equation 4. As described above, the advantage of calculating the pooled standard deviation across all groups is that effect sizes will be on the same metric for comparisons of different groups. Thus, the magnitude of nonequivalence can be compared across groups.
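Equation 4 itself is not reproduced in this excerpt, but a degrees-of-freedom-weighted pooled standard deviation of this kind can be sketched as follows (the item SDs and sample sizes below are hypothetical):

```python
import numpy as np

def pooled_sd(sds, ns):
    """Pool one item's standard deviations across k groups, weighting each
    group's variance by its degrees of freedom (n_g - 1)."""
    sds = np.asarray(sds, dtype=float)
    ns = np.asarray(ns, dtype=float)
    return float(np.sqrt(np.sum((ns - 1) * sds**2) / np.sum(ns - 1)))

# Hypothetical SDs for one Extraversion item in the U.S., Greek, and
# Chinese samples, with hypothetical sample sizes
print(pooled_sd([1.10, 0.95, 1.02], [300, 250, 280]))
```

Because the pooling is done over all three groups at once, the resulting denominator puts the Greek and Chinese effect sizes on a common metric, as noted in Step 3.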
Step 4. After obtaining the item parameters and the pooled standard deviations, these values can be input to the dMACS computer program developed by and freely available from the first author. With this software, item parameters estimated in LISREL, Mplus, or any other statistical program can be used to estimate d_MACS, ΔMean, and ΔVar. LISREL estimates of the item parameters that were used to calculate effect sizes for the Extraversion scale are provided in Table 2. To illustrate the link between parameter differences and the magnitude of an effect, Figures 1 and 2 plot the mean predicted responses for the adjective bold in the Greek and Chinese comparisons, respectively. Note that when a scale is multidimensional, as is the case in the present study, the effect size indices must be calculated separately for each latent factor. Because group-level differences are integrated over the assumed normal distribution of the latent trait in the focal group (i.e., with a mean of μ_F and a variance of σ_F²), the distributions will not necessarily be the same for different dimensions. Thus, the parameters used to estimate the effect size will not be the same for each latent factor, and effect sizes must be estimated separately for items loading on different factors.

Table 2
Item Parameters for the Extraversion Scale
[Factor loadings, intercepts, latent means, latent variances, and d_MACS values for the American English, Greek, and Chinese samples were lost in transcription. The items were extraverted (referent), talkative, bold, and energetic on the positive factor and shy (referent), quiet, bashful, and withdrawn on the negative factor.]
Note. MACS = mean and covariance structure.

Tables 3, 4, and 5 show the results for the Extraversion, Agreeableness, and Conscientiousness scales, respectively. Although there were eight items in each scale, the referent items are excluded from the tables because the parameters for these items are fixed and, therefore, identical across groups. The column headed Δχ² contains the increase in overall chi-square obtained when item parameters were constrained to be equal across the reference and focal groups. The columns headed ΔRMSEA and ΔCFI show the corresponding increases in the root mean square error of approximation (RMSEA) and CFI. Significant chi-square tests are in bold, and changes in CFI greater than Meade et al.'s (2008) recommended cutoff (i.e., .002) are marked by an asterisk. Modification indices (MIs) are also provided for comparison, and indices greater than 3.84 are significant. The values reported for the MIs depend on the presence of metric or scalar equivalence. If significant, the MI for the factor loading of the item is reported. If not, then the MI for the intercept is provided.

As shown, nearly all of the items in these scales were nonequivalent in one or both of the focal groups. Moreover, the chi-square difference tests and the changes in CFI agreed in most cases. Although the MIs were generally consistent with the chi-square difference test and the change in CFI, this was not always the case. For example, the adjective talkative in the Extraversion scale was flagged as nonequivalent by both the chi-square difference test and the change in CFI. However, the MI did not identify significant DIF for this item at the metric or scalar level. Despite these discrepancies, nonequivalence appears to be pervasive in this measure of personality when used cross-culturally.

Table 3
Measurement Equivalence of Extraversion Across Cultures
[Δχ², ΔRMSEA, ΔCFI, MI, and d_MACS values for the items talkative, bold, energetic, quiet, bashful, and withdrawn in the Greek (G) and Chinese (C) samples were lost in transcription.]
Note. Bold values represent significant chi-square differences. All modification indices greater than 3.84 are significant and suggest that differential item functioning (DIF) is present. Indices are not presented for the referent items because these items are constrained across groups and, therefore, the values in each column are zero. RMSEA = root mean square error of approximation; CFI = comparative fit index; MI = modification index; MACS = mean and covariance structure; G = Greek sample; C = Chinese sample.
a Modification index identifying metric nonequivalence. All significant modification indices suggested metric nonequivalence, and, therefore, scalar nonequivalence is not identified here. b Nonsignificant modification indices are represented by the values for the intercept of the item. * ΔCFI > .002.

The final two columns in Tables 3–5 show the effect sizes of nonequivalence for each of the items. In both the Greek and Chinese samples, a range of nonequivalence was found. Within a single scale, the broadest range of effect sizes was observed for the Extraversion scale in Table 3, where the magnitude of effects ranged from 0.26 for the adjective bold in the Greek sample to 1.11 for the same adjective in the Chinese sample. If Cohen's (1988) guidelines (i.e., values greater than 0.20 are considered small, 0.50 medium, and 0.80 or greater large) are used, nonequivalence on the Extraversion scale ranged from small to large. The effects for the adjective bold are also graphed in Figures 1 and 2.
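Effect sizes like these can be approximated numerically: as described above, d_MACS is the root-mean-square difference between the two groups' predicted item responses, weighted by the focal group's (assumed normal) latent distribution and scaled by the pooled item standard deviation. A sketch with hypothetical parameters follows (the authors' dMACS program should be used for real analyses):

```python
import numpy as np
from scipy.stats import norm

def d_macs(lam_r, nu_r, lam_f, nu_f, mu_f, sd_f, sd_pooled, n=20001):
    """Grid approximation of d_MACS: integrate the squared difference between
    the reference (nu_r + lam_r*eta) and focal (nu_f + lam_f*eta) predicted
    responses over the focal group's normal latent distribution."""
    eta = np.linspace(mu_f - 6 * sd_f, mu_f + 6 * sd_f, n)
    w = norm.pdf(eta, loc=mu_f, scale=sd_f)
    w /= w.sum()  # normalize the grid weights so they sum to 1
    diff_sq = ((nu_r + lam_r * eta) - (nu_f + lam_f * eta)) ** 2
    return float(np.sqrt(np.sum(diff_sq * w)) / sd_pooled)

# Equal parameters -> no DIF -> effect size of zero
print(d_macs(1.0, 2.5, 1.0, 2.5, mu_f=0.0, sd_f=1.0, sd_pooled=1.0))
# Intercept shifted by 0.5 -> uniform DIF of 0.5 pooled-SD units
print(d_macs(1.0, 2.5, 1.0, 3.0, mu_f=0.0, sd_f=1.0, sd_pooled=1.0))
```

Because the differences are squared before integrating, d_MACS is nonnegative and captures loading (nonuniform) as well as intercept (uniform) differences, which is why it can be read against Cohen's (1988) small/medium/large benchmarks.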
Figure 1 shows that the regression lines for the American English and Greek samples do not differ much at any point on the latent trait continuum. Thus, although this item may display significant nonequivalence, the magnitude of the effect appears small. In contrast, large differences are evident in the Chinese sample, particularly at the upper end of the latent trait continuum. In other words, highly extraverted respondents in the American English and Chinese samples are likely to respond differently to this item.

Tables 4 and 5 present the results for the Agreeableness and Conscientiousness scales, respectively. Overall, results were generally consistent with those for the Extraversion scale. For both Agreeableness and positive Conscientiousness, a range of nonequivalence was identified. The smallest effects were observed on the Agreeableness scale. Here, the effect sizes for four of the six items were 0.20 or below in the Greek sample. However, substantially larger effects were observed in the Chinese sample and on several of the positively worded Conscientiousness items.

Note that d_MACS provides useful information for interpreting the magnitudes of the chi-square test and the ΔCFI. In particular, using d_MACS to evaluate results does not force a dichotomous interpretation that an effect either exists or does not exist (cf. Kirk, 2006). Instead, this index can be used to evaluate the magnitude of the differences between groups on a continuum of nonequivalence. The adjective unsympathetic in the Agreeableness scale (presented in Table 4) provides a compelling example of the information that can be gained from using d_MACS. Both the chi-square index and the ΔCFI indicate that this item displays significant nonequivalence. However, the effect size presented for the Greek sample suggests that the magnitude of the difference between American English and Greek respondents is small. Similar results were obtained for the adjectives rude and harsh.

Table 4
Measurement Equivalence of Agreeableness Across Cultures
[Δχ², ΔRMSEA, ΔCFI, MI, and d_MACS values for the items warm, kind, unsympathetic, cooperative, rude, and harsh in the Greek (G) and Chinese (C) samples were lost in transcription.]
Note. Bold values represent significant chi-square differences. All modification indices greater than 3.84 are significant and suggest that differential item functioning (DIF) is present. Indices are not presented for the referent items because these items are constrained across groups and, therefore, the values in each column are zero. RMSEA = root mean square error of approximation; CFI = comparative fit index; MI = modification index; MACS = mean and covariance structure; G = Greek sample; C = Chinese sample.
a Modification index identifying metric nonequivalence. All significant modification indices suggested metric nonequivalence, and, therefore, scalar nonequivalence is not identified here. b Nonsignificant modification indices are represented by the values for the intercept of the item. * ΔCFI > .002.

Table 5
Measurement Equivalence of Conscientiousness Across Cultures
[Δχ², ΔRMSEA, ΔCFI, MI, and d_MACS values for the items organized, efficient, systematic, inefficient, sloppy, and careless in the Greek (G) and Chinese (C) samples were lost in transcription.]
Note. Bold values represent significant chi-square differences. All modification indices greater than 3.84 are significant and suggest that differential item functioning (DIF) is present. Indices are not presented for the referent items because these items are constrained across groups and, therefore, the values in each column are zero. RMSEA = root mean square error of approximation; CFI = comparative fit index; MI = modification index; MACS = mean and covariance structure; G = Greek sample; C = Chinese sample.
a Modification index identifying metric nonequivalence. All significant modification indices suggested metric nonequivalence, and, therefore, scalar nonequivalence is not identified here. b Nonsignificant modification indices are represented by the values for the intercept of the item. * ΔCFI > .002.

Because an equivalent referent item was not available for the negatively worded factor in the Conscientiousness scale, accurate estimates of the effect sizes for these items are not possible. Therefore, we use these items to illustrate the influence of the referent item on the magnitude of the differences between groups. Table 6 provides effect sizes for the three negatively worded items using each of the other items on this factor as the referent.
The bolded values represent the largest differences between effect sizes for the same item. For example, with the adjective disorganized as the referent item, the effect size for sloppy was 0.85 in the Chinese sample. However, when the adjective careless was used as the referent, the effect size was markedly different. Similar results were obtained for the other items as well. Thus, using a nonequivalent referent item can have a substantial effect on the magnitude of differences between groups.

Table 6
Demonstrating the Importance of an Equivalent Referent Item Using the Conscientiousness Scale
[d_MACS values for disorganized, sloppy, and careless in the Greek (G) and Chinese (C) samples, computed with each of the other negatively worded items serving as the referent, were lost in transcription.]
Note. Bolded values identify the item-level analyses that were affected most by the choice of the referent indicator. MACS = mean and covariance structure; G = Greek sample; C = Chinese sample.

The Effects of Nonequivalence on Scale Characteristics

Although the item-level effect size indices were calculated separately for items loading on a single latent factor, ΔMean and ΔVar were aggregated across the two latent factors in the present study. Although this practice may not always be appropriate, we used this approach for the present study because the two factors represented methodological rather than substantive dimensions. As a consequence, research typically reports results at the scale level rather than for each methodological subfactor. Therefore, the effect of nonequivalence at the scale level is arguably more important than differences in the method factors.

The effects of DIF on the scales' properties are shown in Table 7. The first column shows the differences between the means of the reference and focal groups due to DIF. As shown in Equation 6, the focal group's mean was subtracted from the reference group's to obtain ΔMean(X_S). Thus, negative values in this column suggest that DIF will result in a higher mean for the focal group.
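Equation 6 itself is not reproduced in this excerpt, but under the linear MACS model the signed item-level DIF effect it describes reduces to a closed form: the difference in intercepts plus the difference in loadings times the focal group's latent mean, summed over items to give the scale-level figure. A sketch with hypothetical parameter values:

```python
def delta_mean_item(lam_r, nu_r, lam_f, nu_f, mu_f):
    """Signed DIF effect on one item's expected score: the reference group's
    predicted response minus the focal group's, averaged over a focal latent
    distribution with mean mu_f."""
    return (nu_r - nu_f) + (lam_r - lam_f) * mu_f

def scale_delta_mean(items):
    """Scale-level DIF effect: the sum of the item-level effects."""
    return sum(delta_mean_item(**item) for item in items)

# Two hypothetical items; a negative total means DIF inflates the focal mean
items = [
    dict(lam_r=1.0, nu_r=2.5, lam_f=0.8, nu_f=2.9, mu_f=-0.3),
    dict(lam_r=1.1, nu_r=3.0, lam_f=1.1, nu_f=2.8, mu_f=-0.3),
]
dif = scale_delta_mean(items)
observed = 2.0  # hypothetical observed mean difference (DIF + impact)
print(dif, 100 * dif / observed)  # raw-score DIF and its share of the gap
```

Dividing the DIF component by the total observed mean difference gives the percentage attributable to DIF, with the remainder reflecting true impact, as in Table 7.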
The next column shows the total observed differences between groups (i.e., DIF + impact), and the following column provides the percentage of the observed difference that is accounted for by DIF; the remaining difference can be attributed to impact. The fourth column gives the differences between the variances of the scales in the reference and focal groups due to DIF, and the next two columns give the corresponding observed differences and the percentage of this difference attributable to DIF, respectively. The final column shows the range of d_MACS for each focal group. As shown in Table 7, there was a range of differences between the means of the reference and focal groups due to DIF. The largest difference was 2.08 points for the Extraversion scale in the Chinese sample, and the smallest was 0.26 points for the Agreeableness scale in the Greek sample. These differences are in raw score points and indicate that the U.S. group would be expected to have a mean that was 2.08 points higher (or 0.26 points lower in the smallest case) than the Chinese group because of DIF. For the Extraversion and Agreeableness scales, these differences are at the scale level and, therefore, should be interpreted relative to a 40-point maximum score (i.e., eight items with five response options). In contrast, the effects of DIF on the Conscientiousness


More information

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz This study presents the steps Edgenuity uses to evaluate the reliability and validity of its quizzes, topic tests, and cumulative

More information

Journal of Applied Developmental Psychology

Journal of Applied Developmental Psychology Journal of Applied Developmental Psychology 35 (2014) 294 303 Contents lists available at ScienceDirect Journal of Applied Developmental Psychology Capturing age-group differences and developmental change

More information

Assessing Measurement Invariance of the Teachers Perceptions of Grading Practices Scale across Cultures

Assessing Measurement Invariance of the Teachers Perceptions of Grading Practices Scale across Cultures Assessing Measurement Invariance of the Teachers Perceptions of Grading Practices Scale across Cultures Xing Liu Assistant Professor Education Department Eastern Connecticut State University 83 Windham

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Using a multilevel structural equation modeling approach to explain cross-cultural measurement noninvariance

Using a multilevel structural equation modeling approach to explain cross-cultural measurement noninvariance Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2012 Using a multilevel structural equation modeling approach to explain cross-cultural

More information

Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models

Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models Jin Gong University of Iowa June, 2012 1 Background The Medical Council of

More information

The Bilevel Structure of the Outcome Questionnaire 45

The Bilevel Structure of the Outcome Questionnaire 45 Psychological Assessment 2010 American Psychological Association 2010, Vol. 22, No. 2, 350 355 1040-3590/10/$12.00 DOI: 10.1037/a0019187 The Bilevel Structure of the Outcome Questionnaire 45 Jamie L. Bludworth,

More information

REVERSED ITEM BIAS: AN INTEGRATIVE MODEL

REVERSED ITEM BIAS: AN INTEGRATIVE MODEL REVERSED ITEM BIAS: AN INTEGRATIVE MODEL Bert Weijters, Hans Baumgartner, and Niels Schillewaert - Accepted for publication in Psychological Methods - Bert Weijters, Vlerick Business School and Ghent University,

More information

Multiple Act criterion:

Multiple Act criterion: Common Features of Trait Theories Generality and Stability of Traits: Trait theorists all use consistencies in an individual s behavior and explain why persons respond in different ways to the same stimulus

More information

The Nature and Structure of Correlations Among Big Five Ratings: The Halo-Alpha-Beta Model

The Nature and Structure of Correlations Among Big Five Ratings: The Halo-Alpha-Beta Model See discussions, stats, and author profiles for this publication at: http://www.researchgate.net/publication/40455341 The Nature and Structure of Correlations Among Big Five Ratings: The Halo-Alpha-Beta

More information

The Problem of Measurement Model Misspecification in Behavioral and Organizational Research and Some Recommended Solutions

The Problem of Measurement Model Misspecification in Behavioral and Organizational Research and Some Recommended Solutions Journal of Applied Psychology Copyright 2005 by the American Psychological Association 2005, Vol. 90, No. 4, 710 730 0021-9010/05/$12.00 DOI: 10.1037/0021-9010.90.4.710 The Problem of Measurement Model

More information

Examining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology*

Examining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology* Examining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology* Timothy Teo & Chwee Beng Lee Nanyang Technology University Singapore This

More information

Running head: CFA OF TDI AND STICSA 1. p Factor or Negative Emotionality? Joint CFA of Internalizing Symptomology

Running head: CFA OF TDI AND STICSA 1. p Factor or Negative Emotionality? Joint CFA of Internalizing Symptomology Running head: CFA OF TDI AND STICSA 1 p Factor or Negative Emotionality? Joint CFA of Internalizing Symptomology Caspi et al. (2014) reported that CFA results supported a general psychopathology factor,

More information

The Factor Structure and Factorial Invariance for the Decisional Balance Scale for Adolescent Smoking

The Factor Structure and Factorial Invariance for the Decisional Balance Scale for Adolescent Smoking Int. J. Behav. Med. (2009) 16:158 163 DOI 10.1007/s12529-008-9021-5 The Factor Structure and Factorial Invariance for the Decisional Balance Scale for Adolescent Smoking Boliang Guo & Paul Aveyard & Antony

More information

Assessing e-banking Adopters: an Invariance Approach

Assessing e-banking Adopters: an Invariance Approach Assessing e-banking Adopters: an Invariance Approach Vincent S. Lai 1), Honglei Li 2) 1) The Chinese University of Hong Kong (vslai@cuhk.edu.hk) 2) The Chinese University of Hong Kong (honglei@baf.msmail.cuhk.edu.hk)

More information

The Youth Experience Survey 2.0: Instrument Revisions and Validity Testing* David M. Hansen 1 University of Illinois, Urbana-Champaign

The Youth Experience Survey 2.0: Instrument Revisions and Validity Testing* David M. Hansen 1 University of Illinois, Urbana-Champaign The Youth Experience Survey 2.0: Instrument Revisions and Validity Testing* David M. Hansen 1 University of Illinois, Urbana-Champaign Reed Larson 2 University of Illinois, Urbana-Champaign February 28,

More information

The Multidimensionality of Revised Developmental Work Personality Scale

The Multidimensionality of Revised Developmental Work Personality Scale The Multidimensionality of Revised Developmental Work Personality Scale Work personality has been found to be vital in developing the foundation for effective vocational and career behavior (Bolton, 1992;

More information

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses Item Response Theory Steven P. Reise University of California, U.S.A. Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction,

More information

Methodological Issues in Measuring the Development of Character

Methodological Issues in Measuring the Development of Character Methodological Issues in Measuring the Development of Character Noel A. Card Department of Human Development and Family Studies College of Liberal Arts and Sciences Supported by a grant from the John Templeton

More information

Measurement Equivalence of Ordinal Items: A Comparison of Factor. Analytic, Item Response Theory, and Latent Class Approaches.

Measurement Equivalence of Ordinal Items: A Comparison of Factor. Analytic, Item Response Theory, and Latent Class Approaches. Measurement Equivalence of Ordinal Items: A Comparison of Factor Analytic, Item Response Theory, and Latent Class Approaches Miloš Kankaraš *, Jeroen K. Vermunt* and Guy Moors* Abstract Three distinctive

More information

Analysis of single gene effects 1. Quantitative analysis of single gene effects. Gregory Carey, Barbara J. Bowers, Jeanne M.

Analysis of single gene effects 1. Quantitative analysis of single gene effects. Gregory Carey, Barbara J. Bowers, Jeanne M. Analysis of single gene effects 1 Quantitative analysis of single gene effects Gregory Carey, Barbara J. Bowers, Jeanne M. Wehner From the Department of Psychology (GC, JMW) and Institute for Behavioral

More information

Simple Linear Regression the model, estimation and testing

Simple Linear Regression the model, estimation and testing Simple Linear Regression the model, estimation and testing Lecture No. 05 Example 1 A production manager has compared the dexterity test scores of five assembly-line employees with their hourly productivity.

More information

Running head: CFA OF STICSA 1. Model-Based Factor Reliability and Replicability of the STICSA

Running head: CFA OF STICSA 1. Model-Based Factor Reliability and Replicability of the STICSA Running head: CFA OF STICSA 1 Model-Based Factor Reliability and Replicability of the STICSA The State-Trait Inventory of Cognitive and Somatic Anxiety (STICSA; Ree et al., 2008) is a new measure of anxiety

More information

Factor structure and measurement invariance of a 10-item decisional balance scale:

Factor structure and measurement invariance of a 10-item decisional balance scale: Decisional Balance Measurement Invariance - 1 Factor structure and measurement invariance of a 10-item decisional balance scale: Longitudinal and subgroup examination within an adult diabetic sample. Michael

More information

Latent Trait Standardization of the Benzodiazepine Dependence. Self-Report Questionnaire using the Rasch Scaling Model

Latent Trait Standardization of the Benzodiazepine Dependence. Self-Report Questionnaire using the Rasch Scaling Model Chapter 7 Latent Trait Standardization of the Benzodiazepine Dependence Self-Report Questionnaire using the Rasch Scaling Model C.C. Kan 1, A.H.G.S. van der Ven 2, M.H.M. Breteler 3 and F.G. Zitman 1 1

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information

A methodological perspective on the analysis of clinical and personality questionnaires Smits, Iris Anna Marije

A methodological perspective on the analysis of clinical and personality questionnaires Smits, Iris Anna Marije University of Groningen A methodological perspective on the analysis of clinical and personality questionnaires Smits, Iris Anna Mare IMPORTANT NOTE: You are advised to consult the publisher's version

More information

ON-LINE TECHNICAL APPENDIX

ON-LINE TECHNICAL APPENDIX ON-LINE TECHNICAL APPENDIX Not another safety culture survey : Using the Canadian Patient Safety Climate Survey (Can-PSCS) to measure provider perceptions of PSC across health settings Authors: Ginsburg,

More information

Assessing the item response theory with covariate (IRT-C) procedure for ascertaining. differential item functioning. Louis Tay

Assessing the item response theory with covariate (IRT-C) procedure for ascertaining. differential item functioning. Louis Tay ASSESSING DIF WITH IRT-C 1 Running head: ASSESSING DIF WITH IRT-C Assessing the item response theory with covariate (IRT-C) procedure for ascertaining differential item functioning Louis Tay University

More information

International Conference on Humanities and Social Science (HSS 2016)

International Conference on Humanities and Social Science (HSS 2016) International Conference on Humanities and Social Science (HSS 2016) The Chinese Version of WOrk-reLated Flow Inventory (WOLF): An Examination of Reliability and Validity Yi-yu CHEN1, a, Xiao-tong YU2,

More information

The CSGU: A Measure of Controllability, Stability, Globality, and Universality Attributions

The CSGU: A Measure of Controllability, Stability, Globality, and Universality Attributions Journal of Sport & Exercise Psychology, 2008, 30, 611-641 2008 Human Kinetics, Inc. The CSGU: A Measure of Controllability, Stability, Globality, and Universality Attributions Pete Coffee and Tim Rees

More information

TLQ Reliability, Validity and Norms

TLQ Reliability, Validity and Norms MSP Research Note TLQ Reliability, Validity and Norms Introduction This research note describes the reliability and validity of the TLQ. Evidence for the reliability and validity of is presented against

More information

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 6-2016 ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION

More information

GMAC. Scaling Item Difficulty Estimates from Nonequivalent Groups

GMAC. Scaling Item Difficulty Estimates from Nonequivalent Groups GMAC Scaling Item Difficulty Estimates from Nonequivalent Groups Fanmin Guo, Lawrence Rudner, and Eileen Talento-Miller GMAC Research Reports RR-09-03 April 3, 2009 Abstract By placing item statistics

More information

The Modification of Dichotomous and Polytomous Item Response Theory to Structural Equation Modeling Analysis

The Modification of Dichotomous and Polytomous Item Response Theory to Structural Equation Modeling Analysis Canadian Social Science Vol. 8, No. 5, 2012, pp. 71-78 DOI:10.3968/j.css.1923669720120805.1148 ISSN 1712-8056[Print] ISSN 1923-6697[Online] www.cscanada.net www.cscanada.org The Modification of Dichotomous

More information

CONFIRMATORY ANALYSIS OF EXPLORATIVELY OBTAINED FACTOR STRUCTURES

CONFIRMATORY ANALYSIS OF EXPLORATIVELY OBTAINED FACTOR STRUCTURES EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT VAN PROOIJEN AND VAN DER KLOOT CONFIRMATORY ANALYSIS OF EXPLORATIVELY OBTAINED FACTOR STRUCTURES JAN-WILLEM VAN PROOIJEN AND WILLEM A. VAN DER KLOOT Leiden University

More information

An Empirical Study on Causal Relationships between Perceived Enjoyment and Perceived Ease of Use

An Empirical Study on Causal Relationships between Perceived Enjoyment and Perceived Ease of Use An Empirical Study on Causal Relationships between Perceived Enjoyment and Perceived Ease of Use Heshan Sun Syracuse University hesun@syr.edu Ping Zhang Syracuse University pzhang@syr.edu ABSTRACT Causality

More information

Aesthetic Response to Color Combinations: Preference, Harmony, and Similarity. Supplementary Material. Karen B. Schloss and Stephen E.

Aesthetic Response to Color Combinations: Preference, Harmony, and Similarity. Supplementary Material. Karen B. Schloss and Stephen E. Aesthetic Response to Color Combinations: Preference, Harmony, and Similarity Supplementary Material Karen B. Schloss and Stephen E. Palmer University of California, Berkeley Effects of Cut on Pair Preference,

More information

Psychometric Details of the 20-Item UFFM-I Conscientiousness Scale

Psychometric Details of the 20-Item UFFM-I Conscientiousness Scale Psychometric Details of the 20-Item UFFM-I Conscientiousness Scale Documentation Prepared By: Nathan T. Carter & Rachel L. Williamson Applied Psychometric Laboratory at The University of Georgia Last Updated:

More information

VALIDATION OF TWO BODY IMAGE MEASURES FOR MEN AND WOMEN. Shayna A. Rusticus Anita M. Hubley University of British Columbia, Vancouver, BC, Canada

VALIDATION OF TWO BODY IMAGE MEASURES FOR MEN AND WOMEN. Shayna A. Rusticus Anita M. Hubley University of British Columbia, Vancouver, BC, Canada The University of British Columbia VALIDATION OF TWO BODY IMAGE MEASURES FOR MEN AND WOMEN Shayna A. Rusticus Anita M. Hubley University of British Columbia, Vancouver, BC, Canada Presented at the Annual

More information

Anumber of studies have shown that ignorance regarding fundamental measurement

Anumber of studies have shown that ignorance regarding fundamental measurement 10.1177/0013164406288165 Educational Graham / Congeneric and Psychological Reliability Measurement Congeneric and (Essentially) Tau-Equivalent Estimates of Score Reliability What They Are and How to Use

More information

Understanding University Students Implicit Theories of Willpower for Strenuous Mental Activities

Understanding University Students Implicit Theories of Willpower for Strenuous Mental Activities Understanding University Students Implicit Theories of Willpower for Strenuous Mental Activities Success in college is largely dependent on students ability to regulate themselves independently (Duckworth

More information

The Association Design and a Continuous Phenotype

The Association Design and a Continuous Phenotype PSYC 5102: Association Design & Continuous Phenotypes (4/4/07) 1 The Association Design and a Continuous Phenotype The purpose of this note is to demonstrate how to perform a population-based association

More information

Extraversion. The Extraversion factor reliability is 0.90 and the trait scale reliabilities range from 0.70 to 0.81.

Extraversion. The Extraversion factor reliability is 0.90 and the trait scale reliabilities range from 0.70 to 0.81. MSP RESEARCH NOTE B5PQ Reliability and Validity This research note describes the reliability and validity of the B5PQ. Evidence for the reliability and validity of is presented against some of the key

More information

Using contextual analysis to investigate the nature of spatial memory

Using contextual analysis to investigate the nature of spatial memory Psychon Bull Rev (2014) 21:721 727 DOI 10.3758/s13423-013-0523-z BRIEF REPORT Using contextual analysis to investigate the nature of spatial memory Karen L. Siedlecki & Timothy A. Salthouse Published online:

More information

Investigating the robustness of the nonparametric Levene test with more than two groups

Investigating the robustness of the nonparametric Levene test with more than two groups Psicológica (2014), 35, 361-383. Investigating the robustness of the nonparametric Levene test with more than two groups David W. Nordstokke * and S. Mitchell Colp University of Calgary, Canada Testing

More information

Linking Assessments: Concept and History

Linking Assessments: Concept and History Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

Connectedness DEOCS 4.1 Construct Validity Summary

Connectedness DEOCS 4.1 Construct Validity Summary Connectedness DEOCS 4.1 Construct Validity Summary DEFENSE EQUAL OPPORTUNITY MANAGEMENT INSTITUTE DIRECTORATE OF RESEARCH DEVELOPMENT AND STRATEGIC INITIATIVES Directed by Dr. Daniel P. McDonald, Executive

More information

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Thakur Karkee Measurement Incorporated Dong-In Kim CTB/McGraw-Hill Kevin Fatica CTB/McGraw-Hill

More information

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT

More information

The Influence of Psychological Empowerment on Innovative Work Behavior among Academia in Malaysian Research Universities

The Influence of Psychological Empowerment on Innovative Work Behavior among Academia in Malaysian Research Universities DOI: 10.7763/IPEDR. 2014. V 78. 21 The Influence of Psychological Empowerment on Innovative Work Behavior among Academia in Malaysian Research Universities Azra Ayue Abdul Rahman 1, Siti Aisyah Panatik

More information

Personality Traits Effects on Job Satisfaction: The Role of Goal Commitment

Personality Traits Effects on Job Satisfaction: The Role of Goal Commitment Marshall University Marshall Digital Scholar Management Faculty Research Management, Marketing and MIS Fall 11-14-2009 Personality Traits Effects on Job Satisfaction: The Role of Goal Commitment Wai Kwan

More information

A Hierarchical Comparison on Influence Paths from Cognitive & Emotional Trust to Proactive Behavior Between China and Japan

A Hierarchical Comparison on Influence Paths from Cognitive & Emotional Trust to Proactive Behavior Between China and Japan A Hierarchical Comparison on Influence Paths from Cognitive & Emotional Trust to Proactive Behavior Between China and Japan Pei Liu School of Management and Economics, North China Zhen Li Data Science

More information

Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking

Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking Jee Seon Kim University of Wisconsin, Madison Paper presented at 2006 NCME Annual Meeting San Francisco, CA Correspondence

More information

Self-Oriented and Socially Prescribed Perfectionism in the Eating Disorder Inventory Perfectionism Subscale

Self-Oriented and Socially Prescribed Perfectionism in the Eating Disorder Inventory Perfectionism Subscale Self-Oriented and Socially Prescribed Perfectionism in the Eating Disorder Inventory Perfectionism Subscale Simon B. Sherry, 1 Paul L. Hewitt, 1 * Avi Besser, 2 Brandy J. McGee, 1 and Gordon L. Flett 3

More information

Isabel Castillo, Inés Tomás, and Isabel Balaguer University of Valencia, Valencia, Spain

Isabel Castillo, Inés Tomás, and Isabel Balaguer University of Valencia, Valencia, Spain International Journal of Testing, : 21 32, 20 0 Copyright C Taylor & Francis Group, LLC ISSN: 1530-5058 print / 1532-7574 online DOI: 10.1080/15305050903352107 The Task and Ego Orientation in Sport Questionnaire:

More information

Psychometric Validation of the Four Factor Situational Temptations for Smoking Inventory in Adult Smokers

Psychometric Validation of the Four Factor Situational Temptations for Smoking Inventory in Adult Smokers University of Rhode Island DigitalCommons@URI Open Access Master's Theses 2013 Psychometric Validation of the Four Factor Situational Temptations for Smoking Inventory in Adult Smokers Hui-Qing Yin University

More information

Evaluating Factor Structures of Measures in Group Research: Looking Between and Within

Evaluating Factor Structures of Measures in Group Research: Looking Between and Within Group Dynamics: Theory, Research, and Practice 2016 American Psychological Association 2016, Vol. 20, No. 3, 165 180 1089-2699/16/$12.00 http://dx.doi.org/10.1037/gdn0000043 Evaluating Factor Structures

More information