COMPARING THE DOMINANCE APPROACH TO THE IDEAL-POINT APPROACH IN THE MEASUREMENT AND PREDICTABILITY OF PERSONALITY. Alison A. Broadfoot.


A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
August 2008
Committee:
Michael Zickar, Advisor
James Albert, Graduate Faculty Representative
Scott Highhouse
John Tisak

ABSTRACT
Michael Zickar, Advisor
This study investigated how using different measurement models affects the ordering of respondents on personality measures and how model choice affects the criterion-related validity of the measure. Of particular interest are Generalized Graded Unfolding Models (GGUMs), which do not assume that Item Response Functions (IRFs) increase monotonically but instead require them to form a single peak. It was hypothesized that these fairly new measurement models would estimate respondents' personalities more accurately than standard Item Response Theory (IRT) models such as the Generalized Partial Credit Model (GPCM), because the GGUM was assumed to offer greater flexibility in modeling the response process. In addition, this study conducted impact analyses to assess the amount of rank-order change that occurred at the upper end of scores on the personality measure. Criterion-related validities changed little from one measurement model to another, but the impact analyses revealed substantial changes at the upper end of the score distribution depending on the measurement model used. In a personnel selection context, this would result in the selection of different applicants under a top-down selection strategy. Beyond linear relationships between personality and criteria, this study also investigated possible non-linear relationships. More non-linear relations were observed with the GGUM than with the GPCM.

Finally, a simulation was conducted to compare the accuracy of latent trait estimates from the GGUM and the GPCM. Results showed that the characteristics of the items within a scale helped determine whether the GGUM or the GPCM produced more accurate thetas, and more accurate criterion-related validities from those thetas. These findings suggest that researchers and practitioners should be aware of the characteristics of the items on their scales and use that knowledge to select the most appropriate measurement model. In addition, unfolding models may not estimate thetas accurately in all situations, particularly when the items in the scale do not exhibit meaningful unfolding. In these situations, using unfolding models could jeopardize fairness and test efficacy.

This work is dedicated to my loving family. Thank you for your support in all of my life endeavors.

ACKNOWLEDGMENTS
I would like to thank my advisor, Michael Zickar, for his guidance throughout the dissertation process and throughout my time at BGSU. I truly feel lucky to have had such a wonderful advisor. In addition, I would like to thank my dissertation committee: James Albert, Scott Highhouse, and John Tisak. It was a joy to have you all on my committee, as each of you contributed in a unique way to my learning and development and provided excellent suggestions and questions that made my study stronger. I would also like to thank Nathan Carter for his assistance with part of my simulation. Your help saved me much time and frustration! Finally, I would like to thank my close friends at Bowling Green who helped make my years here enjoyable and successful; thank you, Gabriel De La Rosa, May Colatat, Maya Yankelevich, and Anna Zarubin. It's been fun!

TABLE OF CONTENTS
CHAPTER I: INTRODUCTION
  Personality Scale Development and Measurement
    The Dominance Approach
    The Ideal-Point Approach
  Unfolding Latent-Probabilistic Models
    Generalized Graded Unfolding Model
    The GGUM Formula
    Item Parameters
    Applications of Probability-Based Unfolding Models
  Dominance Latent-Probabilistic Models
    The GPCM Formula
  Personality's Relation to Outcomes
CHAPTER II: STUDY
  The Effect of Measurement Models on the Criterion-Related Validity for Personality
  Impact Analyses
  Non-Linear Relationships
CHAPTER III: STUDY 1 METHODS
  Participants
  Measures
  Procedures
  Analysis
    Unidimensionality
    Item and Person Parameter Estimation
    Testing Hypothesis
    Testing Hypothesis
    Testing Hypothesis
    Testing Hypothesis
CHAPTER IV: STUDY 1 RESULTS
  Assessing Unidimensionality for the Big-Five Personality Scales
  Item Parameters
  Evaluating Hypothesis
    Assessing Model Fit for the Agreeableness Scale
    Assessing Model Fit for the Conscientiousness Scale
    Assessing Model Fit for the Emotional Stability Scale
    Assessing Model Fit for the Extroversion Scale
    Assessing Model Fit for the Intellectance Scale
    Model Fit Conclusion
  Evaluating Hypothesis
  Evaluating Hypothesis 3
  Evaluating Hypotheses 4a
    Non-Linear Relations with the GGUM
    Non-Linear Relations with the GPCM
  Evaluating Hypotheses 4b
CHAPTER V: STUDY 1 DISCUSSION
CHAPTER VI: STUDY
  The Effect of Measurement Models on the Criterion-Related Validity for Personality
  Theta Estimation Accuracy
CHAPTER VII: STUDY 2 METHODS
  Procedures
  Analysis
CHAPTER VIII: STUDY 2 RESULTS
  Evaluating Hypothesis
    Evaluating the Simulated Agreeableness Scale
    Evaluating the Simulated Conscientiousness Scale
  Evaluating Hypothesis
  Evaluating Hypothesis
  Evaluating Hypothesis
CHAPTER IX: STUDY 2 DISCUSSION
CHAPTER X: DISCUSSION
  Generalizing to Classical Test Theory (CTT)
  Item Content and Unfolding
  Model Fit
  Impact Analyses
  Criterion-Related Validity
  Non-Linear Relations
  Limitations
  Conclusion
REFERENCES
APPENDIX A: SURVEY
APPENDIX B: PERSONALITY ITEMS
APPENDIX C: GGUM OPTION RESPONSE FUNCTIONS (ORFs) FOR THE EMPIRICAL CONSCIENTIOUSNESS ITEMS
APPENDIX D: GPCM OPTION RESPONSE FUNCTIONS (ORFs) FOR THE EMPIRICAL CONSCIENTIOUSNESS ITEMS

LIST OF FIGURES/TABLES
Table 1. GPCM and GGUM Item Parameters for the Agreeableness Personality Scale
Table 2. GPCM and GGUM Item Parameters for the Conscientiousness Personality Scale
Table 3. GPCM and GGUM Item Parameters for the Emotional Stability Personality Scale
Table 4. GPCM and GGUM Item Parameters for the Extroversion Personality Scale
Table 5. GPCM and GGUM Item Parameters for the Intellectance Personality Scale
Table 6. Items with Meaningful Unfolding and their GGUM Discrimination and Location Item Parameters
Table 7. Chi-Square to Degree of Freedom Ratios for the GPCM and GGUM for the Agreeableness Personality Scale
Table 8. Chi-Square to Degree of Freedom Ratios for the GPCM and GGUM for the Conscientiousness Personality Scale
Table 9. Chi-Square to Degree of Freedom Ratios for the GPCM and GGUM for the Emotional Stability Personality Scale
Table 10. Chi-Square to Degree of Freedom Ratios for the GPCM and GGUM for the Extroversion Personality Scale
Table 11. Chi-Square to Degree of Freedom Ratios for the GPCM and GGUM for the Intellectance Personality Scale
Table 12. Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Agreeableness Personality Scale
Table 13. Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Conscientiousness Personality Scale
Table 14. Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Emotional Stability Personality Scale
Table 15. Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Extroversion Personality Scale
Table 16. Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Intellectance Personality Scale
Table 17. Correlations between GGUM and GPCM Thetas Using Top, Bottom, and Full Respondent Distributions for the Big-Five Personality Scales
Table 18. Step-Wise Power Polynomial Regressions for GGUM Agreeableness Thetas and GPCM Agreeableness Thetas as Predictors
Table 19. Step-Wise Power Polynomial Regressions for GGUM Conscientiousness Thetas and GPCM Conscientiousness Thetas as Predictors
Table 20. Step-Wise Power Polynomial Regressions for GGUM Emotional Stability Thetas and GPCM Emotional Stability Thetas as Predictors
Table 21. Step-Wise Power Polynomial Regressions for GGUM Extroversion Thetas and GPCM Extroversion Thetas as Predictors
Table 22. Step-Wise Power Polynomial Regressions for GGUM Intellectance Thetas and GPCM Intellectance Thetas as Predictors
Table 23. Absolute Differences in Correlation Scores between Correlations based on True Thetas and True Dependent Variable Scores and Correlations based on Estimated Thetas (GGUM and GPCM) and True Dependent Variable Scores, Summed up Across 10 Rounds for each Condition
Table 24. Correlations among True Dependent Variable Scores, True Thetas, Estimated GPCM Thetas, and Estimated GGUM Thetas when Correlation between True Dependent Variable Scores and True Thetas Equals 0.00 for 10 Simulation Rounds
Table 25. Correlations among True Dependent Variable Scores, True Thetas, Estimated GPCM Thetas, and Estimated GGUM Thetas when Correlation between True Dependent Variable Scores and True Thetas Equals 0.15 for 10 Simulation Rounds
Table 26. Correlations among True Dependent Variable Scores, True Thetas, Estimated GPCM Thetas, and Estimated GGUM Thetas when Correlation between True Dependent Variable Scores and True Thetas Equals 0.38 for 10 Simulation Rounds
Table 27. Correlations between GGUM and GPCM Thetas Using Top, Bottom, and Full Respondent Distributions for the Simulated Agreeableness and Conscientiousness Personality Scales for the 10 Simulation Rounds across each Correlation Condition
Table 28. The Difference (D) between True Theta and GPCM Theta Correlations from True Theta and GGUM Theta Correlations, Summed up Across the 10 Simulation Rounds within each Condition
Table 29. Correlations among True Theta Scores, Estimated GPCM Thetas, and Estimated GGUM Thetas, located 1 Standard Deviation or More Above the Mean, when the True Correlation Equals 0.00 for 10 Simulation Rounds
Table 30. Correlations among True Theta Scores, Estimated GPCM Thetas, and Estimated GGUM Thetas, located 1 Standard Deviation or More Above the Mean, when the True Correlation Equals 0.15 for 10 Simulation Rounds
Table 31. Correlations among True Theta Scores, Estimated GPCM Thetas, and Estimated GGUM Thetas, located 1 Standard Deviation or More Above the Mean, when the True Correlation Equals 0.38 for 10 Simulation Rounds
Table 32. The Difference in Correlations between True Thetas and GPCM Thetas Located One Standard Deviation or More above the Mean from True Thetas and GGUM Thetas Located 1 Standard Deviation or More above the Mean, Summed up Across the 10 Simulation Rounds within each Condition
Figure 1. Item Response Function for Dominance Approach 2-Parameter Logistic Model: Likelihood of Endorsing Most Positive Response Option
Figure 2. Item Response Function for Ideal-Point Approach Generalized Graded Unfolding Model: Likelihood of Endorsing Least Positive to Most Positive Response Options Based on Theta
Figure 3. Category Probability Function for an Item Exhibiting Unfolding, Modeled by the Generalized Graded Unfolding Model, Showing the Probability of Endorsing Response Options of an Item Based on Theta
Figure 4. Category Probability Function for an Item Exhibiting Unfolding, Modeled by the Generalized Graded Unfolding Model, Showing the Probability of Endorsing Response Options of an Item Based on Theta
Figure 5. An Item Characteristic Curve for the Generalized Partial Credit Model, Showing the Probability of Endorsing Response Options of an Item Based on Theta (Ability)
Figure 6. An Item Characteristic Curve for the Generalized Partial Credit Model, Showing the Probability of Endorsing Response Options of an Item Based on Theta (Ability)
Figure 7. GGUM Item Characteristic Curves for Two Items on the Agreeableness Scale
Figure 8. GGUM Item Characteristic Curves for Two Items on the Conscientiousness Scale
Figure 9. GGUM Item Characteristic Curves for Two Items on the Emotional Stability Scale
Figure 10. GGUM Item Characteristic Curves for Two Items on the Extroversion Scale
Figure 11. GGUM Item Characteristic Curves for Two Items on the Intellectance Scale
Figure 12. Cubic Relation between GGUM Conscientiousness Thetas and Predicted High School GPA
Figure 13. Cubic Relation between GGUM Conscientiousness Thetas and Predicted College GPA
Figure 14. Cubic Relation between GGUM Intellectance Thetas and Predicted High School GPA
Figure 15. Quadratic Relation between GGUM Intellectance Thetas and Predicted Study Skills SJT Scores
Figure 16. Quadratic Relation between GPCM Intellectance Thetas and Predicted Study Skills SJT Scores

CHAPTER I: INTRODUCTION
This study investigated how using different measurement models affects the ordering of respondents on personality measures and how model choice affects the criterion-related validity of the measure (both linear and non-linear criterion-related validity). To estimate respondents' scores on a particular scale or test, such as a personality scale measuring conscientiousness, some type of measurement model must be used. The measurement model could be something simple, such as adding up a respondent's individual item scores to create a total score. This common method assumes that each item deserves the same weight in measuring the construct of interest and that each item is equally difficult to endorse. However, these assumptions are not always tenable. Item Response Theory (IRT) offers a way to estimate respondents' scores without assuming that every item relates to the construct equally or has the same difficulty. Instead, IRT measurement models estimate item parameters that describe how difficult it is to endorse each item and how strongly each item relates to the construct of interest. The more strongly an item relates to the construct, the more weight its responses receive in each respondent's score. In IRT, a respondent's scale score has been termed theta, latent trait, or true score. These terms are used interchangeably throughout this document to refer to respondents' standing on a particular personality trait as estimated with an IRT model. For this study, a relatively new IRT measurement model is of interest: the Generalized Graded Unfolding Model (GGUM). This model does not assume monotonically increasing Item Response Functions (IRFs; see Figure 1) but instead requires the response functions to form a single peak (i.e., they bend back down after going up), a property called unfolding (see Figure 2).
These IRFs are described in greater detail in the subsequent section. It is hypothesized that this new measurement model will estimate respondents' personalities (also called latent traits or thetas) more accurately than a dominance IRT measurement model such as the Generalized Partial Credit Model (GPCM), because the GGUM may allow greater flexibility in modeling the response process. In personality measurement, unfolding likely arises from the nature of the items. For example, an item measuring extroversion might be worded "I sometimes like to go to parties." A person who is extremely extroverted might not endorse this item because he always likes to go to parties. Alternatively, an extremely introverted person might not endorse this item because he never likes to go to parties. The GGUM can model these kinds of item interpretations, whereas dominance IRT models, which restrict the relationship between theta and the probability of item endorsement to be monotonically increasing, cannot model them accurately. Instead, dominance IRT models may give respondents in the middle range higher theta estimates than respondents who are truly higher on the attribute of interest. Recently, researchers have begun to compare the GGUM and other unfolding IRT models with standard dominance IRT measurement models. Stark, Chernyshenko, Drasgow, and Williams (2006) investigated how well different measurement models, both unfolding and dominance, fit response data from several existing personality scales. They found that unfolding IRT models fit the data best. This finding is notable because these existing scales were constructed to promote a dominance interpretation of the items, yet the dominance measurement models fit the data less well than the unfolding model.
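The contrast between the two kinds of response functions can be illustrated with a short sketch. The Python below is an illustration only: it compares a two-parameter logistic (2PL) dominance IRF with a hypothetical single-peaked curve (a Gaussian-shaped function of the distance between theta and the item location, not the GGUM itself) for a neutral item such as "I sometimes like to go to parties." The item location of 0 and the discrimination of 1.5 are assumed values chosen for illustration.

```python
import math

def dominance_2pl(theta, a, b):
    """2PL item response function: P(endorse) rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def single_peaked(theta, delta, width=1.0):
    """Hypothetical single-peaked curve (NOT the GGUM): endorsement is most
    likely when theta equals the item location delta and falls off on both sides."""
    return math.exp(-((theta - delta) / width) ** 2)

thetas = [-3, -2, -1, 0, 1, 2, 3]
# Assumed parameters for a neutral item such as "I sometimes like to go to parties".
dom = [dominance_2pl(t, a=1.5, b=0.0) for t in thetas]
ipt = [single_peaked(t, delta=0.0) for t in thetas]

for t, p_dom, p_ipt in zip(thetas, dom, ipt):
    print(f"theta={t:+d}  2PL={p_dom:.3f}  single-peaked={p_ipt:.3f}")
```

Under the 2PL the most extroverted respondents are always the most likely to endorse the item, whereas under the single-peaked curve respondents at both extremes are equally unlikely to endorse it, which is the pattern described above.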

Building on Stark et al. (2006), Chernyshenko, Stark, Drasgow, and Roberts (2007) created new personality scales using different approaches to scale development (i.e., creating dominance items and unfolding items). They found similar criterion-related validities for the dominance-based and unfolding-based scales when both were related to the same criteria (study and health behaviors). They suggested that this finding is promising because the unfolding approach to scale development allows the use of more neutral items, which enlarges the pool of possible items and may allow better estimation of respondents whose latent traits lie in the middle range of the attribute continuum. A few points about these studies by Stark, Chernyshenko, and colleagues are worth noting. Their 2006 study found that an unfolding model fit the data best, which suggests that the unfolding model can estimate respondents' thetas more accurately than the dominance models. If so, scores estimated from the unfolding measurement model should yield more accurate criterion-related validities than scores from a dominance model, because the unfolding scores would contain less error. However, their 2007 study found similar criterion-related validities for scales created from the two measurement approaches. This finding may have been due to their dichotomization of the response data, which may have discarded psychometric information about respondents and made the criterion-related validities more similar than they would otherwise have been. In addition, although their 2007 study examined criterion-related validities, it might have been more informative to conduct impact analyses investigating whether rank-order changes were greater among respondents at the upper end of the theta distribution.
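An impact analysis of this kind can be sketched as a comparison of the respondents who would be selected top-down under each measurement model. The Python below is a minimal sketch under stated assumptions: the respondent IDs and theta values are invented for illustration, and the overlap index (the proportion of the top k under one model who are also in the top k under the other) is one simple choice, not necessarily the index used in this study.

```python
def top_k(scores, k):
    """IDs of the k highest-scoring respondents (top-down selection).
    Ties keep the dictionary's original order, because Python's sort is stable."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def selection_overlap(scores_a, scores_b, k):
    """Proportion of the top-k under model A who are also in the top-k under model B."""
    return len(top_k(scores_a, k) & top_k(scores_b, k)) / k

# Invented theta estimates for five respondents under each model (illustration only).
ggum_thetas = {"r1": 1.9, "r2": 1.4, "r3": 1.2, "r4": 0.3, "r5": -0.5}
gpcm_thetas = {"r1": 1.6, "r2": 0.9, "r3": 1.5, "r4": 1.1, "r5": -0.4}

overlap = selection_overlap(ggum_thetas, gpcm_thetas, k=3)
changed = sorted(top_k(ggum_thetas, 3) ^ top_k(gpcm_thetas, 3))
print(f"Top-3 overlap: {overlap:.2f}")  # prints: Top-3 overlap: 0.67
print("Selected under only one model:", changed)  # ['r2', 'r4']
```

Here the two models agree on the full-sample ordering for most respondents, yet one of the three top-down selections differs, which is the kind of upper-end change an impact analysis is meant to expose.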
Substantial changes may have occurred at the upper end of the distribution, because this is where unfolding is most likely to occur. If so, there would be important implications for personnel selection, where applicants are typically chosen starting with those whose test scores are at the upper end of the distribution.
Following from the above argument, this study hypothesized that if the GGUM provides better estimates of respondents' traits, it should produce more accurate criterion-related validities than the GPCM. In addition, this study conducted impact analyses to assess the amount of rank-order change that occurs at the upper end of scores on the personality measure, which is where applicants would be chosen in top-down selection. Criterion-related validity might not change much from one measurement model to another, as Chernyshenko, Stark, Drasgow, and Roberts (2007) observed, while an impact analysis might still reveal substantial changes at the upper end of the score distribution depending on the measurement model used. If the GGUM provides more accurate estimates of respondents' latent traits (thetas), orders respondents properly, and produces more accurate criterion-related validities, especially at the upper end of the distribution, it would be an important tool for personnel selection specialists and researchers. The goal of this paper is to help assess the worth of this relatively new tool in the context of using personality tests to aid in employee selection. Because this paper is specifically concerned with the measurement and predictability of personality, the introduction first discusses two approaches to personality scale development and measurement, the dominance approach and the ideal-point approach, and their relation to certain latent-probabilistic IRT models. Within this discussion, applications of and studies using the new unfolding latent-probabilistic models are described. Finally, a brief summary of what has been found regarding personality's relation to criteria (both job and academic) is provided.
Personality Scale Development and Measurement
Personality can be defined as general tendencies and predispositions in the way an individual interacts with and responds to his or her environment (Paunonen, Haddock, Forsterling, & Keinonen, 2003). Personality is thought to be multidimensional (i.e., there are many personality traits); it is an individual difference characteristic, such that there are meaningful differences between people on personality traits; and it is somewhat enduring and stable (Major, Turner, & Fletcher, 2006). Typically, personality is measured via self-report: respondents are given a number of statements and asked to indicate whether these statements describe them (Benson & Campbell, 2007). To make these indications, respondents use rating scales that are either dichotomous (e.g., agree/disagree) or polytomous (e.g., strongly agree/agree/neutral/disagree/strongly disagree).
The Dominance Approach. The dominance approach is usually used when developing predictor measures such as attitudinal and personality instruments (Petty & Cacioppo, 1981; Roberts, Laughlin, & Wedell, 1999). This approach was first conceptualized by Likert (1932) and later named by Coombs (1964). It assumes that respondents endorse items when their standing on a trait is above the standing of the item on the same trait, and that they do not endorse items when their standing on the trait is below the standing of the item (Coombs, 1964). Item-total correlations, discriminant analysis, item-deleted alpha coefficients, principal component analysis, and factor analysis are common approaches for selecting items that will perform well under the dominance approach (Roberts et al., 1999).

Therefore, items are likely to be selected if they correlate highly with other items and load highly on the same factor as the majority of the items. This ensures that items are chosen only if they express a relatively extreme level of the trait of interest (either negative or positive). More neutral items will not be selected, because they will not correlate well with the other items and will likely not load on the same factor as the other, more extreme, items (van Schuur & Kiers, 1994). From a general IRT perspective, the dominance approach produces the typical monotonically increasing ogive curve observed in standard Item Response Functions (see Figure 1). The IRF in Figure 1 shows the probability of endorsing an item at different levels of ability (theta). The probability ranges from 0 to 1, where 0 indicates no chance of endorsing the item and 1 indicates certainty of endorsing the item in a given direction (i.e., positively or negatively). According to the IRF in Figure 1, a person with a low level of ability is much less likely to endorse this item than someone with a high level of ability. Thus, the ogive shows that as a person's standing on a latent trait increases, her likelihood of endorsing an item increases, which is a property of the dominance approach to measurement.
The Ideal-Point Approach. The dominance approach to scale development and measurement can be contrasted with the ideal-point approach. In 1928, Thurstone wrote a paper entitled "Attitudes Can Be Measured," describing his method of scale development and measurement.
Thurstone's approach is considered an ideal-point conceptualization, a term Coombs (1964) coined to describe the processes involved. It assumes that an individual will endorse an item to the degree that the item reflects her own standing on the attribute of interest and will not endorse an item to the degree that it does not. Disagreement with an item can therefore occur for two reasons: from above the item and from below the item. When disagreeing from above an item, the individual possesses more of the attribute than the item represents; when disagreeing from below an item, the individual possesses less of the attribute than the item represents. Thurstone's approach was mainly intended for the scale development and measurement of attitudes and of what he called temperaments (1927), a term similar to personality. The ideal-point approach to scale development attempts to identify items that cover the full range of the attitudinal construct or personality trait of interest. In terms of measurement, this approach assumes that respondents are more likely to endorse items near their own level on the attitude or personality continuum and will tend not to endorse items farther away, in either direction, from their true level. Therefore, more neutral items must be included in the scale so that respondents who fall within this neutral zone can obtain accurate estimates of their standing on the attribute of interest. For example, in the measurement of extroversion, the more neutral item "I sometimes like to go to parties" could be used. One implication is that tools used to evaluate dominance measures, such as factor analysis, would not produce meaningful results. For example, Roberts and Laughlin (1996) found that when principal components analysis has been applied to scales derived from ideal-point methods, a two-factor solution with component loadings forming a simplex pattern is usually found. Typically, researchers seek evidence of unidimensionality (i.e., one dominant factor) when developing a scale intended to measure one attitudinal construct or personality trait (van Schuur & Kiers, 1994).
Therefore, finding that there are two dominant factors would suggest that the measure is multidimensional when in fact this may not necessarily be so.

Thurstone's (1927, 1928) approach to scale development was quite labor intensive. First, he stated that a group of experts should write statements that reflect the attribute of interest; ideally, these experts vary in their own level of the trait or attribute. Once many statements have been written, he recommended using around 200 to 300 individuals as judges, who would be asked to sort the statements into 11 piles ranging from extremely high on the attribute to extremely low. Importantly, these judges were not to indicate their own standing on the continuum but rather the standing of each item on the continuum of interest. Researchers then calculated the percentage of judges who placed each statement in each pile, and each statement was assigned a location approximately at the 50 percent mark of its category placements. Specifically, an item is located where 50 percent of judges consider the item to contain more of the attribute (i.e., they place it in a pile above its determined location) and 50 percent consider it to contain less of the attribute (i.e., they place it in a pile below its determined location). From this, a diagram can be created showing monotonically increasing, ogive-shaped functions that look similar to dominance IRFs from IRT. In this situation, however, the y-axis represents the cumulative proportion of judges' placements of an item into the 11 piles, and the x-axis represents the 11 equally spaced piles, which are assumed to lie on a unidimensional continuum of the attribute of interest. Once the scale has been developed, respondents can complete the measure. To determine where respondents stand on the attribute, the researcher looks at which items each respondent endorsed. Items are arranged in order from representing a low amount of the attribute to a high amount, respondents are placed on the same measurement continuum as the items, and each respondent is located around the locations of the items he or she endorsed.
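The 50 percent rule and the respondent-placement step described above can be sketched as follows. The Python below computes a statement's scale value as the interpolated median of the judges' pile placements (treating pile k as the interval from k - 0.5 to k + 0.5) and then places a respondent at the median scale value of the statements he or she endorsed. The interpolation convention, the median-based scoring rule, and all pile counts are assumptions for illustration, offered as one plausible reading of the procedure rather than as Thurstone's exact computations.

```python
from statistics import median

def thurstone_scale_value(pile_counts):
    """Interpolated median of judges' placements across ordered piles 1..n.

    Pile k is treated as the interval [k - 0.5, k + 0.5]; the scale value is the
    point at which the cumulative proportion of placements reaches 0.50.
    """
    total = sum(pile_counts)
    if total == 0:
        raise ValueError("no judge placements supplied")
    half = total / 2
    below = 0  # placements in piles below the current pile
    for pile, count in enumerate(pile_counts, start=1):
        if count > 0 and below + count >= half:
            return (pile - 0.5) + (half - below) / count
        below += count
    raise RuntimeError("cumulative proportion never reached 0.50")

def respondent_location(endorsed, scale_values):
    """Locate a respondent at the median scale value of the statements endorsed."""
    if not endorsed:
        raise ValueError("respondent endorsed no statements")
    return median(scale_values[s] for s in endorsed)

# Hypothetical placements by 200 judges into 11 piles (low -> high) for three statements.
placements = {
    "low":     [10, 30, 60, 60, 30, 10, 0, 0, 0, 0, 0],
    "neutral": [0, 0, 0, 10, 30, 60, 60, 30, 10, 0, 0],
    "high":    [0, 0, 0, 0, 0, 10, 30, 60, 60, 30, 10],
}
scale_values = {s: thurstone_scale_value(c) for s, c in placements.items()}
print(scale_values)  # {'low': 3.5, 'neutral': 6.5, 'high': 8.5}
print(respondent_location(["neutral", "high"], scale_values))  # 7.5
```

Because statements and respondents end up on the same 11-point continuum, a respondent who endorses the neutral and high statements is placed between them.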

For example, suppose a measure of extroversion contains six items ordered by the amount of extroversion they express (items A, B, C, D, E, and F). Respondents will tend to endorse items that are located close to each other on the continuum and close to their own location on the continuum. If respondent C's true location on the extroversion continuum is the same as that of item C, respondent C will likely strongly endorse item C and will likely also endorse items B and D. Respondent C is less likely to endorse items farther from her location on the continuum, such as items A and F. A few points about this approach are worth noting. First, as stated above, Thurstone's method used neutrally phrased statements as well as more extreme statements for scale development; the use of more neutral statements precludes the use of item-total correlations and factor analysis in his approach. Second, Thurstone's approach requires locating each item on the continuum of interest, whereas Likert's approach does not; Likert's approach is concerned only with the rank order of respondents, not the rank order of items. Finally, Thurstone's approach places item and person parameters on the same scale, which is a characteristic of latent-probabilistic models such as IRT. Continuing the example above, with the Thurstone approach, respondents can be given a location on the latent extroversion trait scale (e.g., respondent C), and items can be given a location on the same scale (e.g., item C, where respondent C and item C share the same location). Items can be rank ordered by location, just as respondents can, and this ordering gives a person's response to an item more meaning.
So if a person affirms an item located high on the latent trait of interest, this likely means that the person is very high on the trait. With the Likert approach, items and respondents are not placed on the same scale and the rank order of items is not emphasized, so less meaning can be extracted from responses to items. Scale development using Thurstone's approach requires many participants and can be very time consuming. This likely explains why Likert's dominance approach became so popular and remains the prevailing approach today (Roberts et al., 1999). Fortunately, recent developments in psychometrics have made Thurstone's ideal-point method a realistic alternative. Specifically, researchers have begun to use the IRT framework to create latent-probabilistic unfolding models. These models can identify the location of items on the latent trait continuum, where some items may have ideal-point characteristics, without the need for hundreds of judges. In addition, these models can estimate both item parameters, including the location of each item on the continuum, and person parameters (Roberts, Donoghue, & Laughlin, 2000). With these models, the respondents themselves provide the information needed to locate the items.
Unfolding Latent-Probabilistic Models
Unfolding latent-probabilistic models allow the IRF to bend back down after it has gone up (see Figure 2). This is called unfolding. Although items modeled with unfolding models will usually exhibit unfolding, there may be instances in which this does not occur within a meaningful theta range. When an item's location is very extreme and no respondents are located above it, the item's response function will be approximately monotonic (Roberts et al., 2000; Roberts et al., 1999). Because of this, unfolding models may be conceptualized as more general models under which the dominance approach falls.

A variety of parametric and nonparametric¹ unfolding latent-probabilistic models have been developed. This study will focus on the parametric Generalized Graded Unfolding Model (GGUM; Roberts et al., 2000), as it allows for polytomous data and incorporates an extension of the two-parameter logistic model (2-PLM). Specifically, it is an unfolding extension of Muraki's (1992) generalized partial credit model (GPCM). To the author's knowledge, this is the first such model that allows for this degree of flexibility in modeling the data.

Generalized Graded Unfolding Model. The GGUM was recently developed by Roberts et al. (2000) and, as stated above, is a direct extension of Muraki's (1992) GPCM. The GPCM is a polytomous, monotonically increasing IRT model (see Figure 1), whereas the GGUM allows for unfolding (i.e., a single peak in the IRF; see Figure 2). Following the ideal-point approach, the GGUM was developed such that respondents will endorse an item to the extent that it is located close to their own latent trait (theta) on the same continuum, and will likely disagree with items located further away from them on the continuum. This means that a person can disagree with an item for two reasons: (1) he disagrees from above the item (i.e., the item is too far below where he stands on the trait for him to agree with it) or (2) he disagrees from below the item (i.e., the item is too far above where he stands on the trait for him to agree with it) (Roberts et al., 2000). Again, an item measuring extroversion might say, "I sometimes like to go to parties." A person who is extremely extroverted might not endorse this item because he always likes to go to parties; this person is disagreeing with the item from above. Alternatively, an extremely introverted person might not endorse this item because she never likes to go to parties; this person is disagreeing with the item from below. The GGUM can handle both types of disagreement.

¹ For example, among parametric models: the Hyperbolic Cosine Model (HCM) for dichotomous data, extending the Rasch (1-PLM) IRT model (Andrich & Luo, 1993), and the Graded Unfolding Model (GUM) for polytomous data, extending the Rasch (1-PLM) IRT model (Roberts & Laughlin, 1996; Luo, 2001). Among nonparametric models: a model for dichotomous data (Post & Snijders, 1993).

The GGUM Formula. The formula for the GGUM can be written as follows:

$$P(Z_i = z \mid \theta_j) = \frac{\exp\left\{\alpha_i\left[z(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}\right]\right\} + \exp\left\{\alpha_i\left[(M - z)(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}\right]\right\}}{\sum_{w=0}^{C}\left[\exp\left\{\alpha_i\left[w(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}\right]\right\} + \exp\left\{\alpha_i\left[(M - w)(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}\right]\right\}\right]}$$

where $Z_i$ is a response to the $i$th item; $z = 0, 1, 2, \ldots, C$, with $z = 0$ corresponding to the strongest level of disagreement and $z = C$ to the strongest level of agreement; $\theta_j$ is the location of the $j$th individual on the latent continuum; $\delta_i$ is the location of the $i$th item on the latent continuum; $\alpha_i$ is the discrimination of the $i$th item; $\tau_{ik}$ is the $k$th subjective category threshold parameter associated with the $i$th item (with $\tau_{i0} = 0$); $C$ is the number of response categories minus 1; and $M = 2C + 1$.

Item Parameters. In the GGUM, each item has one discrimination parameter (alpha; α), which describes how well the item discriminates among respondents who possess different levels of the trait of interest. Looking at the Category Probability Functions (CPFs) for two items modeled with the GGUM in Figures 3 and 4, one can see that Figure 3 shows an item with greater discrimination than the item in Figure 4, because item discrimination relates to the steepness of the curves in the CPFs. In general, the greater the discrimination, the more accurately the item can locate respondents on the continuum. Each item also has an item location parameter (delta; δ), which describes where the item is located on the latent trait (theta; θ) continuum. A high positive location parameter estimate would suggest that the item represents a strong endorsement of the latent trait, whereas a high negative location parameter estimate would suggest that the item represents strong disagreement with the latent trait of interest. The item in Figure 3 had a delta value of 2.36, on a z-score scale, so it is relatively difficult to positively endorse: only those who possess much of the latent trait would likely agree with it. The item in Figure 4 is slightly less difficult than the item in Figure 3.

In addition to an item location, each item has k − 1 threshold parameters (taus; τ), where k equals the number of observable response options for the item. These threshold parameters come from the Subjective Response Category (SRC) Probability Functions (PFs). The SRC threshold parameters describe the location on the latent continuum where a respondent is equally likely to endorse one subjective response option versus the next. The SRC PFs reflect how individuals interpret an unfolding item. For example, with five response options, a person can strongly disagree, disagree, be neutral, agree, or strongly agree from either above or below the location of the item. This creates 10 subjective response options and therefore nine thresholds between successive subjective options; because the GGUM constrains these thresholds to be symmetric about the item location, only k − 1 = 4 of them are free parameters. This information is then used to create an item's Category Probability Function (CPF), which reflects the probability of each observable response given individuals' thetas. The taus therefore cannot be interpreted as thresholds between observable response categories. As can be seen in Figures 3 and 4, respondents located close to the item are more likely to strongly endorse it, and respondents located further away from the item are less likely to do so. This is a characteristic of the ideal-point approach to measurement. The GGUM is a parametric model: it specifies a particular functional form for the response function, and its estimation makes an assumption regarding the distribution of individuals' latent traits.
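To make the GGUM formula described above concrete, the following is a minimal Python sketch, assuming NumPy is available. The function names and the item's parameter values are hypothetical illustrations, not estimates from this study or output from GGUM2004. The sketch computes each observable category probability by adding the two subjective-category terms in the numerator (with τ_i0 fixed at 0), and for this hypothetical item the expected item score rises and then falls as θ moves past δ.

```python
import numpy as np


def ggum_category_probs(theta, alpha, delta, tau):
    """Return P(Z_i = z | theta_j) for z = 0..C under the GGUM.

    alpha, delta: item discrimination and location.
    tau: free thresholds tau_i1..tau_iC (tau_i0 is fixed at 0).
    """
    tau = np.concatenate(([0.0], np.asarray(tau, dtype=float)))
    C = len(tau) - 1                  # number of response categories minus 1
    M = 2 * C + 1                     # highest subjective category index
    z = np.arange(C + 1)              # observable categories 0..C
    cum_tau = np.cumsum(tau)          # sum_{k=0}^{z} tau_ik for each z
    dist = theta - delta
    # Each observable category z pools two subjective categories (responding
    # from below or from above the item), hence the two exponential terms,
    # combined here on the log scale.
    log_num = np.logaddexp(alpha * (z * dist - cum_tau),
                           alpha * ((M - z) * dist - cum_tau))
    log_num -= log_num.max()          # rescale before exponentiating, for stability
    num = np.exp(log_num)
    return num / num.sum()


def ggum_expected_score(theta, alpha, delta, tau):
    """Expected observable score, sum_z z * P(Z_i = z | theta)."""
    p = ggum_category_probs(theta, alpha, delta, tau)
    return float(np.dot(np.arange(len(p)), p))


# Hypothetical five-option item: alpha = 1.5, delta = 0.5, four free thresholds.
item = dict(alpha=1.5, delta=0.5, tau=[-1.2, -0.9, -0.6, -0.3])
for t in (-2.5, -1.0, 0.5, 2.0, 3.5):
    print(f"theta={t:+.1f}  expected score={ggum_expected_score(t, **item):.2f}")
```

Working on the log scale with np.logaddexp is a design choice to avoid overflow when θ is far from δ; it does not change the probabilities. Because the shared factor cancels in the normalization, the probabilities are symmetric around δ, which is how the model represents disagreement from above and from below.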
Usually a normal distribution is assumed; however, other distributions can be specified. Parametric models can be contrasted with nonparametric models, which do not specify an a priori distribution of individuals' latent traits but instead allow the data to determine the nature of the latent trait distribution. The advantage of parametric over nonparametric models is that parametric models, when correctly specified, allow for item and person invariance, whereas nonparametric models do not (Roberts et al., 2000). With item invariance, approximately the same item parameter estimates, within sampling error, should be obtained for each item regardless of the sample used. With person invariance, estimates of a person's latent trait do not depend on the specific items used to arrive at that estimate: other items measuring the same trait could be used, and the person should be found at the same location on the continuum, within sampling error. These two characteristics allow for further applications of latent-probabilistic models, such as item banking and computer adaptive testing (Roberts et al., 2000).

Applications of Probability-Based Unfolding Models. Probability-based unfolding models are relatively new; however, some researchers have used them to answer substantive research questions and, more obviously, for scale development and measurement. With regard to scale development, Roberts et al. (2000) used the GGUM to develop a measure of attitudes toward abortion. Rost and Luo (1997) used an unfolding model based on the Rasch IRT model to measure adolescent centrism, which reflects adolescents' attitudes toward the world of adults. Stark, Chernyshenko, Drasgow, and Williams (2006) specifically investigated the efficacy of unfolding models for the measurement of personality by comparing the model fit of different measurement models: the dominance parametric two-parameter logistic model for dichotomous data; Levine's nonparametric maximum likelihood formula scoring models for dichotomous data, with both ideal-point (unfolding) and dominance constraints; and the parametric GGUM for dichotomous data. Stark et al. (2006) found that the unfolding models fit the personality data best.

McCloy, Heggestad, and Reeve (2005) conducted a simulation that applied the ideal-point approach to personality measurement. Their study used a multidimensional forced-choice (MFC) format, which is thought to limit applicants' ability to fake the personality measure, a serious concern in personnel selection. In the simulation, respondents were given five statements, one reflecting each of the Big-Five personality factors, and were asked to indicate the first and second statements that best described them. The simulation showed that accurate estimates of respondents' thetas (latent traits) were achieved. This research demonstrates the usefulness of the unfolding approach to measurement, as multidimensional scales can be used, and suggests that faking might thereby be curbed on personality measures. A study by Noel (1999) used the Graded Unfolding Model (GUM; in which response option thresholds are held constant across items and the discrimination parameter equals 1 for all items) to model the stages of change that occur in smokers who are attempting to quit. When these results were compared with longitudinal studies, the probability of endorsing a particular item was found to relate to the smoker's stage of quitting. For example, items that described dramatic relief (e.g., "I react emotionally to warnings about smoking cigarettes") were mostly endorsed by smokers in the precontemplation stage, which is considered the first stage of smoking cessation. However, once smokers moved on to later stages, their likelihood of endorsing these types of items diminished and their likelihood of endorsing other types of items increased.

Andrich and Styles (1998) used unfolding models to investigate a question that is still being asked in the research community: what is the relationship between attitudes and behavior? They suggested that attitudes and behavior can be conceptualized as lying on the same continuum, with behaviors more often at the upper end and attitudes more likely at the lower end. Earlier research had attempted to answer this question by conducting factor analyses of attitudes and behaviors and by computing correlations. This earlier research concluded that attitudes and behaviors were distinct, as separate factors emerged for the two and attitudes and behaviors did not correlate highly. However, it could be that the tools these earlier researchers used were inappropriate for the question. As stated above, factor analysis is a tool of the dominance approach to scale development. Andrich and Styles suggested that an ideal-point approach would be better suited to the question, as it allows for a full range of items describing the attribute of interest, including items in the neutral zone. Looking at attitudes and behaviors toward the environment, Andrich and Styles found evidence that the two lie on the same continuum. They suggest that those who are more likely to endorse items assessing behavior toward the environment have higher latent traits than those who are more likely to endorse attitudinal items promoting environmentalism. A recent study by Chernyshenko, Stark, Drasgow, and Roberts (2007) used different approaches to scale development (i.e., dominance and ideal-point) in creating measures of personality. They found that the personality measure constructed using the ideal-point approach yielded criterion-related validities similar to those of its dominance approach counterpart.
In their study, they dichotomized their data for both the GGUM, which was used for the ideal-point measurement, and the 2-PLM, which was used for the dominance measurement. Dichotomizing the data may have resulted in obtaining less information from it, which may have affected the criterion-related validities. In addition, it may have been insightful if they had conducted an impact analysis, looking at changes in the rank ordering of respondents' thetas at the upper end of the theta distribution. Substantial changes could have occurred at the upper end of the distribution, where unfolding is most likely to occur, compared to the rest of the distribution. If so, there would be important implications for personnel selection, where applicants are typically chosen starting at the upper end of the score distribution.

Dominance Latent-Probabilistic Models
As opposed to probability-based unfolding models like the GGUM, which follow the ideal-point approach to measurement, most IRT models follow the dominance approach (Chernyshenko et al., 2007). These IRT models assume monotonically increasing IRFs, such that as a person's standing on the latent trait increases, her likelihood of endorsing a positively phrased item increases (see Figure 1). In addition to the GGUM, this study will also look at a traditional, dominance-based IRT model to help determine which type of model (dominance or ideal-point) is better at estimating respondents' true latent traits and which produces latent trait estimates that relate best to criteria. Specifically, Muraki's (1992) GPCM will be used, which generalizes the dichotomous 2-PLM to polytomous data. Because the GGUM is based on the GPCM but allows for unfolding, the GPCM is a natural comparator.

The GPCM formula. The formula for the GPCM can be written as follows:

$$P_{jh}(\theta) = \frac{\exp\left[\sum_{v=1}^{h} Z_{jv}(\theta)\right]}{\sum_{c=1}^{k}\exp\left[\sum_{v=1}^{c} Z_{jv}(\theta)\right]}$$

and

$$Z_{jv}(\theta) = D a_j(\theta - b_{jv}) = D a_j(\theta - b_j + d_v)$$

where $P_{jh}(\theta)$ is the probability of responding in category $h$ ($h = 1, \ldots, k$) of item $j$, $k$ is the number of response categories, D = 1.7 is a scaling constant that puts the latent trait (θ) scale on approximately the same metric as the normal ogive model, $a_j$ is the slope or discrimination parameter, $b_{jv}$ is an item-category parameter, $b_j$ is an item location parameter, and $d_v$ is a category parameter. The $v = 1$ term appears in every numerator and denominator term and therefore cancels. In the GPCM, each item has a discrimination parameter (a) and a location parameter ($b_j$), and these can differ across items. Looking at the Item Category Characteristic Functions (ICCs) for the GPCM in Figures 5 and 6, the item in Figure 5 has a much higher discrimination parameter than the item in Figure 6, which is reflected in much steeper curves for the item in Figure 5. In addition, there are k − 1 item-category parameters ($b_{jv}$) for each item, which can differ within and across items. The category parameters ($d_v$) are defined by $d_v = b_j - b_{jv}$, so the k − 1 thresholds $b_{jv} = b_j - d_v$ are derived from the location parameter and the category parameters. Looking at Figures 5 and 6, one can see that these thresholds equal the points at which one curve crosses the next successive curve. These crossing points mark where on the ability distribution a person is equally likely to endorse the two successive response options that cross at that point. Finally, each respondent receives a person parameter (theta; θ) describing his location on the latent trait. ICCs characterize the likelihood that a respondent with a certain theta, or ability, will endorse a specific response option for a particular item. With the GPCM, the higher a person's theta, the more likely he will strongly agree with a positively phrased item. As can be seen in Figure 5, a person with an ability parameter equal to 3, on a standard normal (z) metric, has almost a 100 percent chance of endorsing strongly agree for this item. A person with an ability parameter equal to −1, for the same item, is most likely to endorse agree, but also has a good chance of endorsing neutral and a very slight chance of endorsing disagree.

Personality's Relation to Outcomes
Currently, the ability of personality measures to predict various criteria is generally accepted in the research community, but this has not always been the case, especially with regard to the prediction of employee performance (Benson & Campbell, 2007). After reviewing the then-current research on the criterion-related validities of personality measures, Guion and Gottier (1965) suggested that personality is not a useful tool in personnel selection, having found correlations that did not differ much from zero across a wide range of studies. Others echoed similar positions (Locke & Hulin, 1962; Reilly & Chao, 1982; Schmitt, Gooding, Noe, & Kirsch, 1984). The use of personality in personnel selection did not regain its stronghold until the 1990s (Hogan, 2005), when new research showed acceptable predictive validities for personality scales predicting job performance once artifacts such as restriction of range and unreliability in the criterion were accounted for. For example, Tett, Jackson, and Rothstein (1991) found in their meta-analysis that corrected mean validities for the Big-Five personality factors ranged from .33 for agreeableness to .16 for extraversion.
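The corrected validities cited in this section adjust observed correlations for statistical artifacts such as restriction of range and unreliability in the criterion. As a rough illustration only, the two textbook corrections can be sketched in Python as follows. The function names and input values are hypothetical, and meta-analyses such as Tett et al. (1991) typically apply these corrections through artifact distributions and their own ordering of steps, which this sketch does not reproduce.

```python
import math


def correct_for_criterion_unreliability(r_xy, r_yy):
    """Classical disattenuation for unreliability in the criterion only."""
    return r_xy / math.sqrt(r_yy)


def correct_for_range_restriction(r, u):
    """Thorndike's Case II correction for direct range restriction on the predictor.

    u is the ratio of the predictor's unrestricted SD to its restricted (sample) SD.
    """
    return r * u / math.sqrt(1.0 - r ** 2 + (r * u) ** 2)


# Hypothetical values: observed r = .20, criterion reliability = .64, SD ratio u = 1.2.
r_obs = 0.20
print(round(correct_for_criterion_unreliability(r_obs, 0.64), 3))
print(round(correct_for_range_restriction(r_obs, 1.2), 3))
```

Both corrections raise the observed correlation, which is why corrected validities are larger than the raw correlations reported in primary studies.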
Not surprisingly, when factors are combined to create a measure (compound variables) and artifacts have been accounted for, personality has been found to do even better in predicting job performance. For example, a meta-analysis by Ones, Viswesvaran, and Schmidt (1993) found that integrity, which is thought to measure conscientiousness, agreeableness, and emotional stability (in that order of dominance in the measure), predicts overall job performance with a corrected correlation of [...]². Similarly, Ones and Viswesvaran (2001) found that customer service orientation, which is thought to measure agreeableness, emotional stability, and conscientiousness (in that order of dominance in the measure), predicts overall job performance with a corrected correlation of [...]³.

In addition to job performance, personality has been observed to relate to other criteria. Specifically, within the educational setting, personality has been found to relate to academic performance (GPA). A meta-analysis by O'Connor and Paunonen (2007) found that conscientiousness is consistently positively related to academic performance (r = .24) in postsecondary education. Although less consistently, they also found that extroversion was sometimes negatively related to academic performance and that openness to experience was sometimes positively related. However, they note that there may be moderating variables for these two relationships and that the magnitudes of the correlations were small. Finally, agreeableness and neuroticism did not relate to academic performance in their meta-analysis. Consistent with O'Connor and Paunonen's (2007) study, many researchers have observed a significant positive relationship between conscientiousness and academic performance (Wintre & Sugar, 2000; Dollinger & Orf, 1991; Wolfe & Johnson, 1995). Evidence of a relationship between openness to experience and academic performance has also been observed (e.g., Dollinger & Orf, 1991; Blickle, 1996). However, other research has failed to find this relationship (e.g., Wintre & Sugar, 2000; Wolfe & Johnson, 1995), again suggesting that moderator variables may play a role. In addition to GPA, personality has also been found to relate to student effort. Specifically, Blickle (1996) found that conscientiousness related positively to learning effort, which makes sense, as conscientiousness is thought to relate to motivation, as would student effort.

² Corrected for unreliability in the criterion and restriction of range in the sample, although this latter correction was small.
³ Corrected for unreliability in the criterion only.

Now that an overview of the measurement and predictability of personality has been provided, this manuscript will next present a series of hypotheses that flow from this overview. To answer these hypotheses, two studies were conducted: Study 1 involved an empirical investigation and Study 2 comprised a simulation. Both studies were necessary to answer all of the research questions presented in this paper, and some of the research questions were addressed in both studies when possible.

CHAPTER II: STUDY 1
The Effect of Measurement Models on the Criterion-Related Validity for Personality
As stated above, dominance IRT models can be thought of as a special case of unfolding probabilistic models, as the latter can produce practically monotonically increasing IRFs. This is because an IRF from an unfolding model could, for example, increase monotonically up to four standard deviations into the population before unfolding occurs, making it unlikely that the unfolding would affect the scoring of respondents; however, unfolding can also occur within a meaningful theta range for unfolding models. Dominance IRT models, by contrast, assume the data fit only monotonically increasing IRFs (Roberts et al., 1999), which may limit their ability to model different types of items. Because of this, the GGUM appears to give respondents the freedom to interpret an item from either an ideal-point or a dominance perspective, as the GGUM should appropriately model the item according to respondents' interpretation of it. The GPCM will model only the dominance interpretation. So if respondents interpret an item from an ideal-point perspective, that item is more likely to be inappropriately modeled by the GPCM, resulting in greater error in the estimation of thetas. Therefore, Study 1 assumes that the GGUM will estimate respondents' true scores on the latent trait more accurately than the GPCM. Related to this, research comparing the model-data fit of unfolding IRT models with that of dominance IRT models has found that the unfolding models better fit personality data (Stark et al., 2006; Chernyshenko et al., 2007). Thus, it was hypothesized that the GGUM would fit the personality data in this study better than the GPCM.
Hypothesis 1: The GGUM will have better model-data fit when compared to the GPCM.
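The contrast drawn above can be illustrated with a minimal Python sketch of the GPCM, assuming NumPy and the Parscale-style parameters defined earlier ($a_j$, $b_j$, $d_v$, and D = 1.7). The function names and the item's parameter values are hypothetical, not Parscale output. Under this model, adjacent category curves cross at the thresholds $b_{jv} = b_j - d_v$, and the expected item score can only increase with θ, which is why the GPCM cannot represent disagreement from above.

```python
import numpy as np

D = 1.7  # scaling constant from the GPCM formula


def gpcm_category_probs(theta, a, b, d):
    """Category probabilities for one item under the GPCM.

    a, b: slope and location; d: category parameters d_2..d_k.
    The v = 1 term is shared by every category and cancels, so it is omitted.
    """
    b_cat = b - np.asarray(d, dtype=float)        # thresholds b_jv = b_j - d_v
    steps = D * a * (theta - b_cat)               # Z_jv(theta) for v = 2..k
    log_num = np.concatenate(([0.0], np.cumsum(steps)))  # running sums per category
    log_num -= log_num.max()                      # stability before exponentiating
    num = np.exp(log_num)
    return num / num.sum()


def gpcm_expected_score(theta, a, b, d):
    """Expected score with categories scored 0..k-1."""
    p = gpcm_category_probs(theta, a, b, d)
    return float(np.dot(np.arange(len(p)), p))


# Hypothetical five-option item: a = 1.2, b = 0, thresholds at -1.5, -0.5, 0.5, 1.5.
item = dict(a=1.2, b=0.0, d=[1.5, 0.5, -0.5, -1.5])
grid = np.linspace(-4, 4, 81)
scores = np.array([gpcm_expected_score(t, **item) for t in grid])
print("expected score strictly increasing:", bool(np.all(np.diff(scores) > 0)))
```

The monotonic expected score holds for any positive slope, because the derivative of the expected score with respect to θ is D·a times the variance of the item score.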

Because the GGUM should do better at estimating respondents' standing on the latent trait, meaning that there is less error in its theta estimates, it should produce scores that better predict or relate to the criteria the trait should relate to, compared to the GPCM. For this study, a student sample was used; thus, criteria included GPA, ACT scores, class attendance, time spent studying, and study skills.
Hypothesis 2: The GGUM will produce person parameters that will better relate with these criteria compared to the GPCM (a linear relation).
Impact Analyses. However, based on previous research (Chernyshenko et al., 2007), the criterion-related validities are unlikely to differ much between measurement models. This is because only respondents at the extreme positive end of the theta distribution are likely to change rank order from one measurement model to the next, and this change in rank order will probably occur only for some items (those that allow an ideal-point interpretation). But it is those individuals at the upper end of the latent distribution who are of most interest, as the upper end of the distribution is where personnel decisions are likely to be made (Zickar, Rosse, Levin, & Hulin, 1996; Mueller-Hanson, Heggestad, & Thornton, 2003). Therefore, impact analyses will also be conducted at the upper end of the distribution to evaluate differences in rank order across models. It is hypothesized that the impact analyses will find greater differences in rank order across measurement models at the upper end of the distribution of the latent trait than at the lower end.
Hypothesis 3: Greater differences in rank orders across measurement models will be found at the upper end of the distribution compared to the lower end of the distribution.
Non-Linear Relationships

Researchers have made another suggestion for why correlations between personality measures and criteria may not be as high as desired: the relationship between personality measures and criteria may not be linear (Benson & Campbell, 2007; Murphy, 1996; LaHuis, Martin, & Avis, 2005; Robie & Ryan, 1999). Most research assessing the relationship between personality variables and outcomes assumes a linear relationship, as correlations and linear regressions are most often used to investigate these relations. There is therefore a common assumption that the more of a positive personality trait an individual possesses, the more likely she is to engage in certain behaviors or hold certain attitudes. However, this may not be the case. Consider, for example, extroversion. It has been found to predict performance in sales jobs such that as one's standing on extroversion increases, one's likelihood of performing well on the job increases (Barrick & Mount, 1991, 1993; Hurley, 1998), a linear relationship that is assumed to hold all the way up to the most extreme levels of extroversion. However, at some point having more extroversion may no longer benefit job performance: the extremely extroverted salesperson may spend too much time talking with customers, losing precious sales opportunities. Surprisingly, there has been little research on this proposed phenomenon, and what little research has been conducted has found mixed results. For conscientiousness, two models of non-linearity with job performance have been suggested. Specifically, Murphy (1996) suggested two forms of the general quadratic model, which allows for one curve or bend in the regression line: the asymptotic and the inverted-U-shaped functions.
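One common way to examine the general quadratic model is a hierarchical regression that adds a squared trait term to a linear model and inspects the increase in R². The Python sketch below, assuming NumPy, illustrates this on simulated data; the function names and the data-generating values are hypothetical, and the sketch is not presented as the analysis used later in this study. A negative squared-term coefficient indicates a concave curve; roughly, a vertex inside the observed trait range resembles an inverted U, whereas a vertex at or beyond the upper end resembles an asymptotic (diminishing-returns) function.

```python
import numpy as np


def r_squared(X, y):
    """R^2 and coefficients from an ordinary least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2), beta


def quadratic_test(theta, y):
    """Compare y = b0 + b1*theta with y = b0 + b1*theta + b2*theta^2.

    Returns (delta_r2, b2, vertex), where vertex = -b1 / (2*b2) is the theta at
    which the fitted curve bends back (meaningful only when b2 is not near 0).
    """
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    ones = np.ones_like(theta)
    r2_lin, _ = r_squared(np.column_stack([ones, theta]), y)
    r2_quad, beta = r_squared(np.column_stack([ones, theta, theta ** 2]), y)
    b1, b2 = beta[1], beta[2]
    return r2_quad - r2_lin, b2, -b1 / (2.0 * b2)


# Simulated (hypothetical) data with an inverted U that peaks at theta = 0.375.
rng = np.random.default_rng(0)
theta = rng.normal(size=500)
y = 1.0 + 0.3 * theta - 0.4 * theta ** 2 + rng.normal(scale=0.5, size=500)
delta_r2, b2, vertex = quadratic_test(theta, y)
print(f"delta R^2 = {delta_r2:.3f}, b2 = {b2:.3f}, vertex = {vertex:.2f}")
```

Centering θ before squaring is often recommended to reduce collinearity between the linear and squared terms; it changes b1 but leaves b2 and the increase in R² unchanged.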
The asymptotic function might occur when a certain amount of conscientiousness is needed to perform well on the job, after which there is little additional benefit from having more conscientiousness for effective job performance (LaHuis et al., 2005). An inverted-U-shaped function could be found when having a conscientiousness score at either end of the scale results in lower performance than having a score in the middle range of the scale. LaHuis et al. (2005) investigated the possible quadratic relationship between conscientiousness and job performance and observed a nonlinear, asymptotic relationship with job performance for clerical workers. They used two independent samples of clerical workers: one sample responded to a situational judgment test (SJT) and a biodata measure, both of which were thought to measure conscientiousness, and the second responded to a more traditional measure of conscientiousness. Similar asymptotic nonlinear relationships were observed in both samples. Robie and Ryan (1999) suggested that an inverted-U-shaped function might be found in their study, which looked at the relationship between conscientiousness and supervisory ratings of job performance. However, contrary to their hypothesis, they did not find evidence of nonlinear relationships in their series of studies, which assessed both concurrent and predictive relationships; they found only robust linear effects between conscientiousness and job performance across a variety of occupations and industries. In addition to the five-factor model of personality, there has been some work to develop personality measures for the assessment of leadership derailment. These instruments are said to measure the dark side of personality (i.e., more is worse), whereas the Big-Five are considered to measure the bright side of personality (i.e., more is better). The Hogan Development Survey (HDS; Hogan & Hogan, 1997) is one such measure of the dark side of personality; it is composed of 11 leadership traits that can lead to managerial ineptitude (e.g., excitability, argumentativeness, cautiousness, arrogance, and mischief).
Hogan and Hogan (1997) suggest that individuals who score high on these dimensions may derail from the leadership track unless they monitor themselves carefully. In addition to the HDS, the Global Personality Inventory (GPI) also measures personality characteristics that can derail leaders (epredix, 2001). Benson and Campbell (2007) used the HDS and GPI to evaluate the relationship between job performance and these personality derailers. They hypothesized and found an inverted-U relationship between the dark-side traits and leader job performance, such that having too much or too little of these traits tends to result in lower leader job performance, whereas having a moderate amount of cautiousness and excitability, for example, tends to result in higher job performance. Going beyond the organizational setting, Cucina and Vasilopoulos (2005) assessed the criterion-related validity of the Big-Five personality factors in academics. Specifically, they investigated the relationship between college students' academic performance (GPA) and the Big-Five to determine whether quadratic relationships exist. They found that both conscientiousness and openness to experience related to academic performance quadratically (with conscientiousness also having a significant linear relationship). Individuals with conscientiousness scores at the extremes were more likely to have lower grades than individuals with scores in the middle of the range. For openness to experience, however, a U-shaped relationship was observed, with those scoring in the middle range of openness having the lowest GPAs. Although it has typically been assumed that measures of personality relate to criteria in a linear fashion, this may not always be the case.
Given the dearth of research on this subject, it is difficult to conclude whether the few studies that found a quadratic relationship reflect random sampling error or evidence of a common, but little known, phenomenon. This study will attempt to shed more light on this issue by specifically assessing whether measures of personality relate to criteria in a linear or non-linear way. Therefore, in addition to the linear relations hypothesized earlier, this study also hypothesizes that personality and criteria will relate in a non-linear fashion. Beyond this, it is hypothesized that the GGUM will produce stronger non-linear relations than the GPCM, as the GGUM is believed to estimate respondents' true standing on the personality traits more accurately, leaving less error to affect the relationship between personality and outcomes.
Hypothesis 4a: Respondents' standing on the personality latent traits and criteria will relate non-linearly when using both the GGUM and the GPCM to estimate respondents' thetas.
Hypothesis 4b: The relation will be stronger when the GGUM is used, as opposed to the GPCM, to estimate respondents' standings on the personality latent traits.
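Hypotheses 3 and 4b both compare results across the two sets of theta estimates. For the rank-order comparison in Hypothesis 3, one simple impact index is the share of respondents selected top-down under one model who would also be selected under the other. A minimal Python sketch, assuming NumPy; the function and variable names and the toy theta values are hypothetical, and the impact statistics reported in this study may be defined differently.

```python
import numpy as np


def selection_overlap(theta_a, theta_b, proportion, end="upper"):
    """Share of respondents selected under model A who are also selected under model B.

    Selection takes the top (end="upper") or bottom (end="lower") `proportion`
    of respondents on each model's theta estimates; ties follow argsort order.
    """
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    if end not in ("upper", "lower"):
        raise ValueError("end must be 'upper' or 'lower'")
    sign = -1.0 if end == "upper" else 1.0      # negate to sort descending for the upper end
    k = max(1, int(round(proportion * len(theta_a))))
    chosen_a = set(np.argsort(sign * theta_a)[:k].tolist())
    chosen_b = set(np.argsort(sign * theta_b)[:k].tolist())
    return len(chosen_a & chosen_b) / k


# Hypothetical thetas for the same 10 respondents under two models; only the
# highest scorer under the first model moves, dropping below two others.
theta_gpcm = np.arange(10.0)
theta_ggum = theta_gpcm.copy()
theta_ggum[9] = 6.5
print(selection_overlap(theta_gpcm, theta_ggum, 0.2, end="upper"))
print(selection_overlap(theta_gpcm, theta_ggum, 0.2, end="lower"))
```

In this toy case the overlap is lower at the upper end than at the lower end, which is the pattern Hypothesis 3 predicts for the real theta estimates.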

CHAPTER III: STUDY 1 METHODS

Participants

Four hundred thirty-nine undergraduate students at Bowling Green State University participated in this study. Forty percent of the participants were male, and seventy-six percent were in their first year of undergraduate study. Assessing workers within a specific occupation or industry would have been preferable, as it would have provided direct implications for personnel selection; however, because of sample size constraints and the inability to obtain such a dataset, students were used. Although this can be considered a limitation of this study, there is no obvious reason why the personality distributions of these students would differ much from applicant distributions. Given that much of this study concerns the measurement of personality, using a student sample should not limit the generalizability of the findings to a working population.

Measures

To assess the Big-Five personality traits (i.e., extroversion, agreeableness, conscientiousness, emotional stability, and intellectance), the 50 Big-Five Factor Markers from the International Personality Item Pool (IPIP; Goldberg, 1990) were used (see Appendix A for the complete measure and full survey and Appendix B for a list of the personality items). Each trait was measured by 10 items. The internal consistency reliabilities were α = .87 for extroversion, α = .81 for agreeableness, α = .80 for conscientiousness, α = .85 for emotional stability, and α = .80 for intellectance. Criteria were selected that have been found, or are thought, to relate to the Big-Five personality traits: high school GPA, college GPA, ACT scores, hours spent studying each week (reverse scored for analyses), and frequency of attending class. In addition, a nine-item study skills Situational Judgment Test (SJT) was given. For each item, students were given a situation and four behavioral response options and were asked to select the option that best reflects what they would do in that situation. The internal consistency reliability for the SJT was α = .53. Finally, basic demographic information was obtained (see Appendix A for the complete measure).

Procedures

Undergraduate students completed the paper-and-pencil survey in Appendix A in a large classroom setting, typically right after their introduction to psychology lecture had ended for the day. The survey presented the personality items first, followed by basic demographic questions and the criteria described above; the study skills SJT was given last. Participants were verbally asked to take their time when responding and to respond honestly.

Analysis

Unidimensionality. First, unidimensionality of the five personality trait scales was assessed, as unidimensionality is an assumption of the GPCM. Reckase (1979) observed stable theta and item calibrations when the first factor of the test (from either principal components analysis or principal factor analysis) accounts for at least twenty percent of the total variance; therefore, this criterion was used to assess unidimensionality for the GPCM. These factor-analytic methods are not appropriate for judging whether the data suit the GGUM (Roberts et al., 1999; van Schuur & Kiers, 1994). Nonetheless, assessing unidimensionality is important for determining the appropriateness of the GPCM, and therefore these analyses were conducted.

Item and Person Parameter Estimation. Student responses to the personality items were used to estimate item parameters and students' standing on the latent traits (thetas). This was done using two measurement models, the GGUM and the GPCM, so each student has two theta estimates for each trait (10 thetas in total per student). Parameter estimates for the GGUM were obtained with the software program GGUM2004, Version 1.00 (Roberts, 2004); for the GPCM, Parscale 4.1 (Muraki & Bock, 2003) was used.

Testing Hypothesis 1. To test hypothesis 1, and to ensure that the two latent probabilistic models fit the data well and are thus useful for obtaining reliable and valid person parameters, model-data fit was assessed for both models across the five personality scales. Stark's (2001) MODFIT (for web distribution) computer program was used to compute chi-square to degrees-of-freedom ratios for singles, pairs, and triples of items within each scale, where each chi-square sums, over response categories, the squared difference between the observed and expected frequencies divided by the expected frequency. Drasgow, Levine, Tsien, Williams, and Mead (1995) found that chi-square to degrees-of-freedom ratios under three (for singles, pairs, and triples of items) indicate good model fit. This criterion was used to investigate fit for the two latent probabilistic measurement models. In addition, to test hypothesis 1 explicitly, these ratios were compared across models to determine which model better fit the data. It was hypothesized that the GGUM would fit the data better because it is a more flexible model.

Testing Hypothesis 2. The person parameters (thetas) obtained from both the GGUM and the GPCM, together with student responses to the criteria on the survey, were then used to test hypotheses 2 through 4. Hypothesis 2 predicts that the GGUM will produce person parameters that relate more strongly to the criteria than those from the GPCM. To evaluate this, the correlations between each criterion and the GGUM thetas were compared with the corresponding correlations using the GPCM thetas, with the magnitude of the correlations as the basis of comparison. A pair of correlations was compared only if at least one of the two was significant, that is, significantly different from zero. To support hypothesis 2, the significant correlations should be greater when the GGUM thetas are used to predict the criteria than when the GPCM thetas are used. This was determined using two approaches. The first approach involved taking the difference in magnitudes between the significant GGUM and GPCM correlations for each personality scale. The second approach involved conducting Hotelling-Williams tests, for each criterion, to determine whether the correlations based on the GGUM theta estimates differ significantly from those based on the GPCM theta estimates. The Hotelling-Williams test is the most appropriate test here because it accounts for two dependencies between the correlations being compared: both are computed on the same sample, and both involve the same variable (the criterion).

Testing Hypothesis 3. Hypothesis 3 states that differences in rank order across measurement models will be greater at the upper end of the distribution than at the lower end. To evaluate this, respondents with thetas one standard deviation or more above the mean on the personality trait were selected to represent the upper end of the distribution, and respondents with thetas one standard deviation or more below the mean formed the lower end. These cutoffs were chosen somewhat arbitrarily, but they do represent the upper and lower ends of the distributions, respectively. Correlations between the GGUM and GPCM thetas for the top and bottom of the distribution were then compared to evaluate hypothesis 3.

Testing Hypothesis 4. Hypothesis 4 has two parts: 4a states that personality and criteria will relate non-linearly for both the GGUM and the GPCM, and 4b states that the GGUM will produce stronger non-linear relations than the GPCM. To evaluate hypothesis 4a, stepwise power polynomial regressions were run with the theta estimates (one set for each measurement model) predicting each criterion, for each personality trait. At the first step, the criterion was regressed onto the linear term; at the second step, onto the linear and quadratic terms; and at the third step, onto the linear, quadratic, and cubic terms. This was done for every criterion with the thetas for each personality trait, estimated from both the GGUM and the GPCM. To evaluate hypothesis 4b, whenever significant non-linear relations were found, the R² values of the two models were compared at the step where significance was observed; under hypothesis 4b, the GGUM should yield larger effect sizes. Hypothesis 4b would also be supported if more non-linear relations were observed for the GGUM than for the GPCM.
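The stepwise power polynomial regression described above can be illustrated with a small sketch. It is written in plain Python (standard library only, not the statistical software used in the study) and fits ordinary least squares models with the linear term, then the linear and quadratic terms, then the linear, quadratic, and cubic terms, reporting R², ΔR², and an F statistic for the R² change at each step. The function names are illustrative, and judging each step by an F-change test is an assumption about how step significance would typically be evaluated.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(theta, y, degree):
    """R^2 of an OLS fit of y on an intercept and theta, theta^2, ..., theta^degree."""
    X = [[t ** p for p in range(degree + 1)] for t in theta]
    k = degree + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
                 for row, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def stepwise_polynomial(theta, y):
    """(R^2, delta R^2, F-change) for step 1 (linear), 2 (+quadratic), 3 (+cubic)."""
    n = len(y)
    results, prev = [], 0.0
    for step in (1, 2, 3):
        r2 = r_squared(theta, y, step)
        df_resid = n - step - 1
        # Exactly one term is added per step, so F-change has (1, df_resid) df.
        if r2 >= 1.0:
            f_change = math.inf
        else:
            f_change = (r2 - prev) / ((1.0 - r2) / df_resid)
        results.append((r2, r2 - prev, f_change))
        prev = r2
    return results
```

In use, `theta` would hold one model's estimates for a trait and `y` one criterion. Running the function once with GGUM thetas and once with GPCM thetas reproduces the comparison described above.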

CHAPTER IV: STUDY 1 RESULTS

Assessing Unidimensionality for the Big-Five Personality Scales. Following Reckase's (1979) recommendation that the first factor of a measure should account for at least 20% of the measure's total variance, five tests of unidimensionality were conducted, one for each personality scale. All five scales were sufficiently unidimensional to meet the assumption of the GPCM. Using principal components analysis, the first component explained 36% of the total variance for the conscientiousness scale, 39% for the agreeableness scale, 47% for the extroversion scale, 42% for the emotional stability scale, and 37% for the intellectance scale. Therefore, it was appropriate to use the GPCM. Item and person parameters (thetas) were next estimated using the GPCM and the GGUM.

Item Parameters. The GPCM and GGUM item parameters are presented in Tables 1 through 5 for the five personality scales. The GGUM Item Characteristic Curves (ICCs) for two items within each scale are presented in Figures 7 through 11. These figures show that some items unfold within the -4 to +4 theta range, whereas other items appear to increase monotonically up to or past +4. For this study, a somewhat arbitrary criterion of delta < 2 was established to identify meaningful unfolding (with negatively phrased items reverse scored; see Appendix B). Items with a delta parameter below 2 unfold within the theta range where respondents are likely to be located. By contrast, if an item has a delta parameter of +4, for example, it is highly unlikely that any individual would be located beyond that point, so for all practical purposes the item has a monotonically increasing ICC.
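To make the delta criterion concrete, the sketch below computes GGUM category probabilities and the expected item score at a given theta. It assumes the standard GGUM parameterization: discrimination α, location δ, thresholds τ₁ through τ_C (with τ₀ = 0), C + 1 observable categories, and M = 2C + 1. It is an illustration, not GGUM2004's code. The expected score is symmetric around δ and, for typical parameter values, single-peaked there. An item with δ below 2 therefore begins to turn down inside the range where respondents are plausibly located, whereas an item with δ near +4 looks monotonic over that range.

```python
import math


def ggum_probs(theta, alpha, delta, taus):
    """Category probabilities P(Z = z | theta), z = 0..C, under the GGUM.

    taus holds tau_1..tau_C (tau_0 = 0 is implicit); M = 2C + 1.
    """
    C = len(taus)
    M = 2 * C + 1
    cum_tau = [0.0]  # cum_tau[z] = tau_0 + tau_1 + ... + tau_z
    for t in taus:
        cum_tau.append(cum_tau[-1] + t)
    d = theta - delta
    # Each observable category z pools two subjective responses: z and M - z.
    exps = [(alpha * (z * d - cum_tau[z]), alpha * ((M - z) * d - cum_tau[z]))
            for z in range(C + 1)]
    top = max(max(pair) for pair in exps)  # subtract the max to avoid overflow
    numer = [math.exp(u - top) + math.exp(v - top) for u, v in exps]
    total = sum(numer)
    return [x / total for x in numer]


def ggum_expected_score(theta, alpha, delta, taus):
    """Expected observable category (0..C) at theta."""
    return sum(z * p for z, p in enumerate(ggum_probs(theta, alpha, delta, taus)))


def meaningful_unfolding(delta, cutoff=2.0):
    """The study's working rule: an item unfolds meaningfully if delta < 2."""
    return delta < cutoff
```

For example, with a single threshold τ₁ = -1, α = 1, and δ = 1.5, the expected score is about .73 at θ = 1.5 and falls to about .05 at θ = 5.5. This illustrates the single-peaked response function that distinguishes the GGUM from dominance models such as the GPCM.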

Based on the criterion of delta < 2, unfolding items were identified; these items and their content are listed in Table 6. The conscientiousness scale had by far the most unfolding items, with 7 of its 10 items having delta parameters below 2. An example of a conscientiousness item that showed meaningful unfolding is "I follow a schedule." The agreeableness and emotional stability scales demonstrated the least unfolding, with only one item from each scale exhibiting meaningful unfolding. For the agreeableness scale, that item is "I insult people."

Evaluating Hypothesis 1. Hypothesis 1 states that the GGUM will fit the personality data better than the GPCM. Model fit was assessed for both the GGUM and the GPCM for the five personality scales by computing chi-square to degrees-of-freedom ratios for the two latent probabilistic models. As stated earlier, the more ratios under three, the better the model fits the data.

Assessing Model Fit for the Agreeableness Scale. For the GPCM, 9 of 10 singles were under 3 (Item 9, "I feel others' emotions," had a ratio of 15.52); 35 of 45 doubles were under 3; and 95 of 120 triples were under 3. The mean for the singles was 1.58, the mean for the doubles was 2.31, and the mean for the triples was For the GGUM, all 10 singles were under 3, 44 of 45 doubles were under 3, and 119 of 120 triples were under 3 (see Table 7). The mean for singles was .04, for doubles was 1.14, and for triples was Although neither model had all singles, doubles, and triples under 3, all of the means were under 3, suggesting that both models fit the agreeableness data reasonably well. Comparing the two models, the GGUM appears to fit the agreeableness scale better than the GPCM, providing support for hypothesis 1.
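The fit index used in these comparisons can be sketched as follows. This is a simplified illustration, not MODFIT's implementation. It computes a chi-square to degrees-of-freedom ratio from observed and expected cell frequencies, approximates the expected frequencies by summing each respondent's model-implied category probabilities, and takes df as the number of cells minus one. Both simplifications are assumptions. MODFIT's exact computations, such as how expected frequencies are obtained or any sample-size adjustment, are not reproduced here.

```python
def expected_counts(prob_rows):
    """Approximate expected frequency per category by summing each respondent's
    model-implied category probabilities (one row per respondent)."""
    return [sum(col) for col in zip(*prob_rows)]


def chi_square_df_ratio(observed, expected):
    """Chi-square / df for one item's (or item pair's) cell frequencies.

    chi-square = sum over cells of (observed - expected)^2 / expected;
    df is taken as (number of cells - 1), a simplifying assumption.
    """
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_sq / (len(observed) - 1)


def summarize(ratios, cutoff=3.0):
    """Count ratios under the cutoff and report their mean, as in the fit tables."""
    under = sum(1 for r in ratios if r < cutoff)
    return under, len(ratios), sum(ratios) / len(ratios)
```

`summarize` returns the two figures reported for each scale above: how many singles, doubles, or triples fall under 3, and the mean ratio.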

Assessing Model Fit for the Conscientiousness Scale. For both the GPCM and the GGUM, all 10 singles, all 45 doubles, and all 120 triples were under 3 (see Table 8). For the GPCM, the mean for singles was .00, for doubles was .92, and for triples was For the GGUM, the mean for singles was .09, for doubles was 1.05, and for triples was Both models appear to fit the conscientiousness data very well. When comparing the fit of the models, the GPCM actually fit better than the GGUM, as it had lower means for the singles, doubles, and triples. Thus, hypothesis 1 was not supported for the conscientiousness scale.

Assessing Model Fit for the Emotional Stability Scale. For the GPCM, all 10 singles were under 3, 42 of 45 doubles were under 3, and 116 of 120 triples were under 3. The mean for the singles was .03, for doubles was 1.18, and for triples was For the GGUM, all 10 singles were under 3, 42 of 45 doubles were under 3, and 119 of 120 triples were under 3 (see Table 9). The mean for the singles was .02, for doubles was 1.19, and for triples was Thus, although neither model had all singles, doubles, and triples under 3, all of the means were under 3, suggesting that both models fit the emotional stability data reasonably well. Comparing the two models, the GGUM appears to fit the data slightly better than the GPCM, providing support for hypothesis 1.

Assessing Model Fit for the Extroversion Scale. For the GPCM, 9 of 10 singles were under 3 (Item 1, "I am the life of the party," had a ratio of 6.25), 44 of 45 doubles were under 3, and all 120 triples were under 3. The mean for the singles was .66, the mean for the doubles was 1.38, and the mean for the triples was For the GGUM, all singles, doubles, and triples were under 3 (see Table 10). The mean for the singles was .05, for the doubles was 1.07, and for the triples was The GGUM appears to fit the data very well, and although the GPCM did not have all singles, doubles, and triples under 3, its means were under 3, suggesting good fit. However, the GGUM seems to fit the extroversion data better than the GPCM, providing support for hypothesis 1.

Assessing Model Fit for the Intellectance Scale. For the GPCM, all singles were under 3, 41 of 45 doubles were under 3, and 96 of 120 triples were under 3. The mean for the singles was .027, the mean for the doubles was 1.64, and the mean for the triples was For the GGUM, all 10 singles were under 3, 41 of 45 doubles were under 3, and 96 of 120 triples were under 3 (see Table 11). The mean for singles was .045, for doubles was 1.60, and for triples was Although neither model had all singles, doubles, and triples under 3, all of the means were under 3, suggesting that both models fit the intellectance data reasonably well. Comparing the two models, the GGUM may fit the intellectance scale slightly better than the GPCM, providing support for hypothesis 1.

Model Fit Conclusion. Both the GGUM and the GPCM appear to fit each personality scale reasonably well. Exceptional fit was observed with both models for the conscientiousness scale and with the GGUM for the extroversion scale. These findings indicate that the models fit the data well and that the scales appear to be unidimensional. When comparing models, the GGUM appeared to fit the data better than the GPCM for all of the personality scales except the conscientiousness scale. These findings provide partial support for hypothesis 1.

Evaluating Hypothesis 2. Hypothesis 2 postulates that the GGUM will produce person parameters that correlate more strongly with the criteria than those from the GPCM. To evaluate this hypothesis, the magnitudes of these correlations were compared across measurement models. To support hypothesis 2, significant correlations should be greater when the GGUM thetas are used to predict the criteria than when the GPCM thetas are used. This was determined using two approaches. The first approach involved taking the difference (D) in magnitudes between the significant GGUM and GPCM correlations for each personality scale. The second approach involved conducting Hotelling-Williams tests, for each criterion, to determine whether the correlations based on the GGUM theta estimates differ significantly from those based on the GPCM theta estimates.

For the agreeableness scale, the difference was D = .02, in favor of the GPCM (see Table 12). In fact, for all four criteria with significant correlations with agreeableness scores, the correlations were stronger with the GPCM thetas than with the GGUM thetas. However, none of the correlation pairs differed significantly according to the Hotelling-Williams test. For the conscientiousness scale, the difference was D = .16, also in favor of the GPCM (see Table 13). Again, surprisingly, for all five criteria with significant correlations, the correlations were stronger for the GPCM thetas than for the GGUM thetas. The Hotelling-Williams test revealed statistically significant differences for two of the correlation pairs. The correlations between high school GPA and conscientiousness estimated with the GGUM versus the GPCM were significantly different (r = .21 versus r = .25, respectively; t(438) = 2.26, p < .05). In addition, the correlation between college GPA and the GGUM conscientiousness thetas (r = .17) differed significantly from the correlation between college GPA and the GPCM conscientiousness thetas (r = .22), t(438) = 2.29, p < .05. For both comparisons, the correlations based on the GPCM theta estimates were stronger. No significant correlations were found for the emotional stability scale; therefore, neither analysis was conducted for this scale (see Table 14).
For the extroversion scale, there was only one significant correlation, and it was stronger when using the GGUM thetas than when using the GPCM thetas (D = .02, in favor of the GGUM; see Table 15). However, the difference was not statistically significant according to the Hotelling-Williams test. Finally, for the intellectance scale, the difference in magnitudes for significant correlations with criteria was D = .01, in favor of the GGUM (see Table 16). This difference appears to be driven by the stronger correlation of the GGUM thetas with ACT scores. However, none of these differences were statistically significant. Based on these findings, hypothesis 2 was not supported: the GGUM thetas did not relate to the criteria more strongly than the GPCM thetas.

Evaluating Hypothesis 3. Hypothesis 3 states that differences in rank order across measurement models will be greater at the upper end of the distribution than at the lower end. To evaluate this, correlations between the GGUM and GPCM thetas at the top of the distribution (one standard deviation or more above the mean) were compared with correlations at the bottom of the theta distributions (one standard deviation or more below the mean) for all five personality scales. For every personality scale, the correlation at the top of the theta distribution was much lower than both the correlation at the bottom of the distribution and the correlation for the full distribution (see Table 17). For the agreeableness scale, the correlation between the GGUM and GPCM thetas was .84 for the top of the theta distribution and .999 for the bottom. For the emotional stability scale, the correlations were .89 and .998, respectively. For the extroversion scale, the correlations were .40 and .999. For the intellectance scale, the correlations were .78 and .998, respectively. Finally, the difference was particularly striking for the conscientiousness scale, which had a negative correlation (-.55) between thetas at the upper end of the distribution, compared with .99 at the lower end. These findings support hypothesis 3.

Evaluating Hypothesis 4a. Hypothesis 4a proposes that respondents' standing on the personality latent traits and the criteria will relate non-linearly when both the GGUM and the GPCM are used to estimate respondents' thetas. To evaluate hypothesis 4a, stepwise power polynomial regressions were run with the theta estimates (from each measurement model, in separate analyses) predicting each criterion, for each personality trait. Statistical significance was evaluated at each regression step (results from all analyses are reported in Tables 18-22).

Non-Linear Relations with the GGUM. When the GGUM thetas were used to investigate non-linear relations with student criteria, four significant non-linear relations emerged. Conscientiousness thetas related cubically (and linearly) to high school GPA (ΔR² from step 2 to step 3 = .03; at step 3, linear B = .38, p = .00; cubic B = -.24, p = .00; see Table 19 & Figure 12) and to college GPA (ΔR² from step 2 to step 3 = .02; at step 3, linear B = .30, p = .00; cubic B = -.17, p = .02; see Table 19 & Figure 13). Both relationships have a face-down S-shape: those with extremely low conscientiousness scores are predicted to have some of the highest GPAs, and those with extremely high conscientiousness scores are predicted to have some of the lowest GPAs (Figures 12 & 13). In the middle range of thetas, however, there appears to be an inverted-u relationship, with predicted GPA peaking at about one standard deviation above the mean on theta. Intellectance thetas related cubically to high school GPA (ΔR² from step 2 to step 3 = .01; at step 3, cubic B = .19, p = .03; see Table 22 & Figure 14) and quadratically to scores on the study skills SJT (ΔR² from step 1 to step 2 = .02; at step 2, quadratic B = -.14, p = .01; see Table 22 & Figure 15). The cubic relationship has a logistic shape (Figure 14). There appears to be a positive relationship between high school GPA and intellectance at both extremes of intellectance (high and low); at the mid ranges of intellectance, however, the relationship approaches zero. An inverted-u shape is observed for the quadratic relationship with study skills SJT scores (Figure 15): those with thetas around the mean are predicted to have the highest study skills scores.

Non-Linear Relations with the GPCM. When the GPCM thetas were used to investigate non-linear relations with student criteria, one significant non-linear relation emerged. Intellectance thetas related quadratically to scores on the study skills SJT (ΔR² from step 1 to step 2 = .02; at step 2, quadratic B = -.13, p = .01; see Table 22 & Figure 16). An inverted-u shape was again observed for this relationship (Figure 16), and again, those with thetas around the mean are predicted to have the highest study skills scores. Taken together, these findings indicate that hypothesis 4a was partially supported, as some non-linear relations were found.

Evaluating Hypothesis 4b. Hypothesis 4b states that non-linear relations will be stronger when the GGUM, as opposed to the GPCM, is used to estimate respondents' standings on the personality latent traits. To evaluate this hypothesis, significant non-linear relations were compared on their effect sizes (R² values). The only significant non-linear relation observed for both the GGUM and the GPCM thetas was the quadratic relation between intellectance thetas and study skills SJT scores. The R² for both of these models equaled .02, so theta estimation made no difference to the effect size for this relationship. However, the GGUM thetas produced three other non-linear relations between personality and criteria that the GPCM thetas did not. This finding provides some support for hypothesis 4b.
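The Hotelling-Williams comparisons reported for hypothesis 2 test whether two dependent correlations that share one variable are equal. A common textbook form is Williams' t, sketched below with j as the criterion, k as the GGUM thetas, and h as the GPCM thetas. |R| is the determinant of the 3 × 3 correlation matrix, and in this form the statistic has n − 3 degrees of freedom. The function name is illustrative; this is not the software used in the study.

```python
import math


def williams_t(r_jk, r_jh, r_kh, n):
    """Williams' t for H0: rho_jk = rho_jh (two correlations sharing variable j).

    j = criterion, k = GGUM thetas, h = GPCM thetas, n = sample size.
    t = (r_jk - r_jh) * sqrt((n - 1)(1 + r_kh) /
        (2 ((n - 1)/(n - 3)) |R| + r_bar^2 (1 - r_kh)^3)),   df = n - 3
    """
    # |R|: determinant of the 3 x 3 correlation matrix of j, k, h
    det_r = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det_r + r_bar**2 * (1 - r_kh) ** 3
    if denom <= 0:
        raise ValueError("correlations are not mutually consistent")
    t = (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh) / denom)
    return t, n - 3
```

The third correlation, between the two sets of thetas, carries the dependency that an independent-samples comparison would ignore. When the two models' thetas correlate very highly, even small differences in validity can reach significance. |t| is then compared with the two-tailed critical value of the t distribution.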

CHAPTER V: STUDY 1 DISCUSSION

Study 1 provided partial support for hypothesis 1. As hypothesized, the GGUM better fit the agreeableness, emotional stability, extroversion, and intellectance data. However, counter to hypothesis 1, the GPCM better fit the conscientiousness data, although both measurement models showed exceptional fit for the conscientiousness data.

The empirical study did not support hypothesis 2. For some traits, the GGUM produced thetas that correlated more highly with criteria than the GPCM thetas; for other traits, the opposite was observed. However, this test is not definitive in determining whether one measurement model is superior to the other in producing thetas that correlate better with criteria. Study 1 assumed that a stronger correlation indicates a better or more accurate correlation, but this is unlikely to be the case: the true correlation could be lower than an observed correlation. Because the empirical study (Study 1) could not determine which correlations were more accurate, a simulation was conducted in Study 2 to assess accuracy. It should also be noted that, as expected, the criterion-related validities did not differ much between the two models. The largest difference was for the conscientiousness scale (D = .16). The correlations between the GGUM and GPCM thetas for the full distribution (Table 17) make this result understandable: the conscientiousness scale had the lowest correlation between the two sets of thetas (r = .903) compared with the other scales (r = .996 for agreeableness, r = .995 for emotional stability, r = .976 for extroversion, and r = .991 for intellectance).
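The tail comparison summarized in Table 17 can be sketched as follows. The sample is restricted to respondents at least one standard deviation above (or below) the mean, and the two sets of theta estimates are correlated within each subset and in the full sample. The text does not say which model's thetas define the cutoffs, so this sketch assumes they are cut on one reference set (`theta_ref`). The function names are illustrative.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tail_correlations(theta_ref, theta_other, k_sd=1.0):
    """Correlate two sets of theta estimates in the upper tail, the lower tail,
    and the full sample. Tails are cut on theta_ref (an assumption):
    >= mean + k_sd * SD and <= mean - k_sd * SD."""
    m = statistics.fmean(theta_ref)
    s = statistics.stdev(theta_ref)
    pairs = list(zip(theta_ref, theta_other))
    upper = [p for p in pairs if p[0] >= m + k_sd * s]
    lower = [p for p in pairs if p[0] <= m - k_sd * s]
    r_upper = pearson([a for a, _ in upper], [b for _, b in upper])
    r_lower = pearson([a for a, _ in lower], [b for _, b in lower])
    return r_upper, r_lower, pearson(theta_ref, theta_other)
```

Passing GGUM thetas as `theta_ref` and GPCM thetas as `theta_other` (or the reverse) gives the three kinds of values compared above: upper tail, lower tail, and full distribution.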

Hypothesis 3, which involved impact analyses, was fully supported in Study 1. Thetas estimated by the GGUM and thetas estimated by the GPCM correlated less at the upper end of the theta distribution than in the rest of the distribution. The results for the conscientiousness scale were particularly striking, with a negative correlation observed at the upper end of the distribution. These findings suggest that care should be taken when selecting a measurement model for estimating thetas that will be used in high-stakes testing situations, such as personnel selection. Typically, applicants are selected top-down from the upper end of the distribution; thus, different applicants would be selected depending on which measurement model was used. Unfortunately, the empirical study could not determine which measurement model better rank ordered respondents. This question is addressed by the simulation in Study 2.

Hypothesis 4a was partially supported. Significant non-linear relations between thetas and criteria were observed for both measurement models: one for the GPCM thetas and four for the GGUM thetas. Hypothesis 4b was also partially supported, as the GGUM thetas produced more significant non-linear relations.

The findings from Study 1 share some similarities with past research. Cucina and Vasilopoulos (2005) found quadratic relations between conscientiousness and academic performance and between openness to experience (similar to intellectance) and academic performance. They observed an inverted-u-shaped relationship for conscientiousness and a U-shaped relationship for openness. In this study, both conscientiousness and intellectance were also found to relate non-linearly to GPA (academic performance), but only when using the GGUM thetas, and the relations were cubic rather than quadratic. Within the densely populated ranges of thetas, however, there does appear to be an inverted-u-shaped relationship between conscientiousness and GPA. Similarly, within the mid-to-upper ranges of intellectance thetas, there does appear to be a U-shaped relationship between intellectance and GPA. It may be that Cucina and Vasilopoulos's (2005) sample did not span the entire range of conscientiousness or intellectance, so that they found only quadratic, as opposed to cubic, relations. Finally, for both measurement models, a quadratic relationship with a clear inverted-u shape was observed between intellectance and scores on the study skills SJT.

It should be noted that the amount of criterion variance explained by these significant non-linear relationships was not very high. The highest R² was .08, for the cubic relationship between GGUM conscientiousness thetas and high school GPA, meaning that 8% of the variance in high school GPA was accounted for by the conscientiousness thetas in that model. This was by far the largest R²; the others ranged from .02 to .05.

In this study, the GGUM uncovered more non-linear relationships than the GPCM. The next question is whether these observed non-linear relationships reflect true relationships between the personality traits and the criteria described above; the simulation in Study 2 will help to answer this question as well. The results from Study 1 help clarify how different measurement models affect relationships between personality traits and criteria. Using different measurement models does appear to affect these relationships; the next question is which measurement model produces more accurate thetas and more accurate relationships with criteria. To investigate accuracy, a second study was conducted using a simulation.

CHAPTER VI: STUDY 2

The Effect of Measurement Models on the Criterion-Related Validity for Personality

A Monte Carlo simulation was used to investigate hypothesis 2 (from Study 1) more precisely. Study 1 assumed that if the thetas from one measurement model relate more strongly to criteria than those from the other model, the former model produced more accurate thetas than the latter. However, this is unlikely to be the case. Because the true correlations were not known, the lower observed correlations with criteria could be the more accurate ones. The empirical study could investigate only correlation strength, not correlation accuracy. Thus, the simulation was used to investigate hypothesis 2 further, evaluating correlation accuracy rather than the magnitude of the correlation.

Hypothesis 2: The GGUM will produce person parameters that correlate more accurately with criteria than those from the GPCM (a linear relation).

The simulation was also used to investigate hypothesis 3 from the empirical study.

Hypothesis 3: Greater differences in rank orders across measurement models will be found at the upper end of the distribution than at the lower end.

Theta Estimation Accuracy

In the empirical investigation, another assumption was made regarding error in the estimation of thetas under different measurement models: it was assumed that the GGUM would estimate thetas with less error than the GPCM, because the GGUM allows greater flexibility in the interpretation of, and thus responses to, personality items. A simulation can directly assess this assumption. Therefore, it was hypothesized that the GGUM would estimate respondents' true scores on the latent trait more accurately than the GPCM.

Hypothesis 5: The GGUM will estimate respondents' true scores on the latent trait more accurately than the GPCM, and thus the thetas estimated from the GGUM will correlate more highly with respondents' true thetas than the thetas estimated from the GPCM.

Because the GGUM was hypothesized to estimate respondents' thetas more accurately, it was further hypothesized that the rank orders at the upper end of the distributions would be more accurate when using the GGUM than when using the GPCM.

Hypothesis 6: The GGUM will estimate respondents' true latent traits at the upper end of the distribution more accurately than the GPCM.

Therefore, another set of impact analyses was conducted, this time comparing rank orders based on estimated thetas with rank orders based on true thetas at the upper end of the distribution.
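One way to quantify the rank-order accuracy in hypothesis 6 is to ask how many of the simulees with the highest true thetas would also be selected, top-down, from the estimated thetas. The sketch below computes that overlap. The metric, the function names, and the selection size `k` are illustrative assumptions, not the procedure reported in this study.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores (top-down selection; ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(order[:k])


def selection_agreement(true_theta, est_theta, k):
    """Share of the true top-k simulees who would also be selected from the estimates."""
    chosen_true = top_k_indices(true_theta, k)
    chosen_est = top_k_indices(est_theta, k)
    return len(chosen_true & chosen_est) / k


def compare_models(true_theta, ggum_theta, gpcm_theta, k):
    """Selection agreement with the true top-k for each model's estimates."""
    return {"GGUM": selection_agreement(true_theta, ggum_theta, k),
            "GPCM": selection_agreement(true_theta, gpcm_theta, k)}
```

Because a simulation supplies the true thetas, this comparison can say which model's estimates would select the "right" applicants. The empirical study could not answer that question.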

CHAPTER VII: STUDY 2 METHODS

Procedures

A simulation was conducted to investigate hypotheses 2, 3, 5, and 6, focusing on comparing the GPCM to the GGUM. True scores of simulated respondents on personality and outcome criteria were generated using SPSS, with the personality and criterion distributions constrained to correlate. Different levels of relation between personality and the dependent variable were simulated based on the correlations found in the empirical study: r = .00, r = .15, and r = .38. A correlation of r = .00 was chosen because many of the observed criterion-related validities in the empirical study were non-significant and thus are considered no different from zero. A correlation of r = .15 was chosen because many of the significant observed criterion-related validities in the empirical study were around .15. For example, agreeableness GGUM thetas correlated .19 with high school GPA and .21 with college GPA, and similar correlations were found with the GPCM thetas. The conscientiousness GGUM thetas correlated .21 with high school GPA, .18 with college GPA, and .13 with hours spent studying; again, similar correlations were observed for the conscientiousness GPCM thetas. Finally, a correlation of r = .38 was chosen to represent the largest criterion-related validity observed in the empirical study, the correlation between intellectance GGUM thetas and ACT scores.

The items used in the simulation reflect the items found on two of the five personality scales: conscientiousness and agreeableness. The conscientiousness item parameters were chosen because these items exhibited a great deal of unfolding compared with the other scales. Using the somewhat arbitrary criterion of delta < 2 (delta being the GGUM item location parameter) to signify meaningful unfolding (i.e., unfolding that occurs in a theta range where respondents would actually be located), the conscientiousness scale exhibited meaningful unfolding for 7 of its 10 items, whereas the agreeableness scale had only 1 of 10 items exhibit meaningful unfolding. Like the agreeableness scale, the emotional stability scale had only 1 item that displayed meaningful unfolding; the intellectance scale had 2 items and the extroversion scale had 3 items with meaningful unfolding. Therefore, the parameters from the agreeableness scale were chosen to represent scales that displayed minimal unfolding.

To review, the simulation compared the GGUM to the GPCM using three correlations between true thetas and true criterion (dependent variable) scores (r = .00, r = .15, r = .38) and item parameters from two of the Big-Five personality scales (conscientiousness and agreeableness), giving a 2 (scales) × 3 (correlations) × 2 (measurement models) design. Ten simulation rounds were conducted within each of the 12 conditions.

The simulation produced randomly generated true personality scores (true thetas) for 1,000 simulees, which were correlated (r = .00, r = .15, or r = .38) with randomly generated criterion scores for the same 1,000 simulees. Next, simulees' responses to the two personality scales were generated. Because the person and item parameters are known in a simulation, these values were entered into the GGUM and GPCM formulas to generate each simulee's response to each item. Both the GGUM and the GPCM are probabilistic models that describe the probability of choosing one response option versus the next, so this probabilistic nature was incorporated into the generation of responses. Finally, the arbitrary criterion of delta < 2 (for the GGUM items) was used to determine whether or not each item exhibited meaningful unfolding.
If an item on either the conscientiousness or the agreeableness scale exhibited meaningful unfolding, the GGUM and the item parameters estimated by the GGUM for that item were used to generate simulees' responses. If the item did not exhibit meaningful unfolding, the GPCM and the item parameters estimated by the GPCM for that dominance item were used to generate responses. Thus, for the conscientiousness scale, responses to seven items were generated with the GGUM and responses to the other three items with the GPCM. For the agreeableness scale, responses to one item were generated with the GGUM and responses to the remaining nine items with the GPCM. Once responses to both scales were generated, they were used to obtain new theta estimates under both the GGUM and the GPCM. GGUM parameters were estimated with GGUM2004, Version 1.00 (Roberts, 2004), and GPCM parameters with Parscale 4.1 (Muraki & Bock, 2003). Therefore, in each simulation round and at each level of correlation with the criterion, each of the 1,000 simulees had a true theta, an estimated conscientiousness GGUM theta, an estimated conscientiousness GPCM theta, an estimated agreeableness GGUM theta, an estimated agreeableness GPCM theta, and a true criterion (dependent variable) score, which correlated with the true theta at r = .00, r = .15, or r = .38.

Analysis

As stated above, the simulation was used to test hypotheses 2, 3, 5, and 6. Hypothesis 2 postulated that the GGUM would produce person parameters that correlate more accurately with the criterion than those produced by the GPCM. To evaluate this hypothesis, the accuracy of these correlations was compared across measurement models. To support hypothesis 2, the correlations must be more accurate (i.e., better reflect the true correlation between the true theta and the true dependent variable) when the GGUM thetas are used to predict the criterion than when the GPCM thetas are used. Accuracy was quantified as the absolute difference (D) between the true correlation (r = .00, r = .15, or r = .38) and each observed correlation (for both the GGUM and the GPCM thetas). These differences were then summed across the 10 rounds within each of the 12 conditions. Support for hypothesis 2 requires the GGUM D scores to be closer to zero than the GPCM D scores.

Hypothesis 3 stated that rank orders would differ more across measurement models at the upper end of the distribution than at the lower end. This hypothesis was evaluated in a fashion similar to the evaluation of hypothesis 3 in the empirical study. Specifically, correlations between the GGUM and GPCM thetas computed for simulees at the top of the theta distribution (one standard deviation or more above the mean) were compared to the corresponding correlations at the bottom of the distribution (one standard deviation or more below the mean). This was done for the two simulated personality scales, for each of the three criterion-correlation conditions (r = .00, r = .15, r = .38), and for the ten rounds within each of the six resulting conditions.

Hypothesis 5 stated that the GGUM would estimate simulees' true scores on the latent trait more accurately than the GPCM. To test this hypothesis, simulees' true thetas were correlated with their GPCM theta estimates and, separately, with their GGUM theta estimates. Support for hypothesis 5 requires the latter correlation to be positive and greater than the former. Difference (D) scores were used to evaluate this hypothesis: the GGUM correlation with the true thetas was subtracted from the GPCM correlation with the true thetas, and these differences were summed across the ten rounds within each of the six conditions. A negative D score supports hypothesis 5.

Hypothesis 6 extended hypothesis 5, stating that the GGUM would estimate simulees' true latent trait levels at the upper end of the distribution more accurately than the GPCM. To evaluate this, the same correlations were compared, but restricted to simulees whose theta parameters were one standard deviation or more above the mean. Difference (D) scores were again calculated, and hypothesis 6 would be supported by a negative D score.
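The generation and evaluation steps described in this chapter can be sketched in Python. This is a minimal, standard-library-only sketch under stated assumptions, not the procedure actually run (the study generated data in SPSS and estimated parameters with GGUM2004 and Parscale). The GGUM and GPCM category probabilities follow the commonly published forms of the two models; the GPCM is written without the 1.7 scaling constant, so its parameters may not be on Parscale's metric. All function names, the example item parameters, and the use of the sample mean and standard deviation to define the one-standard-deviation tails are assumptions for illustration. The text does not say which theta defines the tails for hypothesis 3, so the sketch leaves it as an argument.

```python
import math
import random

# ---- Step 1: generate true scores and item responses (hypothetical parameters) ----

def correlated_pair(rho, rng):
    """Draw one simulee's (true theta, true criterion) from a standard bivariate
    normal distribution with correlation rho."""
    theta = rng.gauss(0.0, 1.0)
    criterion = rho * theta + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return theta, criterion


def ggum_probs(theta, alpha, delta, taus):
    """GGUM probabilities for C + 1 observable categories.
    taus = [tau_1, ..., tau_C]; tau_0 = 0 is implied and M = 2C + 1."""
    C = len(taus)
    M = 2 * C + 1
    cum_tau, numerators = 0.0, []
    for z in range(C + 1):
        if z > 0:
            cum_tau += taus[z - 1]
        # Observable category z pools subjective response z (respondent below
        # the item) and subjective response M - z (respondent above the item).
        numerators.append(math.exp(alpha * (z * (theta - delta) - cum_tau))
                          + math.exp(alpha * ((M - z) * (theta - delta) - cum_tau)))
    total = sum(numerators)
    return [n / total for n in numerators]


def gpcm_probs(theta, a, steps):
    """GPCM probabilities for len(steps) + 1 categories; steps = [b_1, ..., b_m].
    Written without the 1.7 scaling constant."""
    cumulative = [0.0]
    for b in steps:
        cumulative.append(cumulative[-1] + a * (theta - b))
    exps = [math.exp(c) for c in cumulative]
    total = sum(exps)
    return [e / total for e in exps]


def sample_category(probs, rng):
    """Draw a response category at random with the given probabilities."""
    u, running = rng.random(), 0.0
    for category, p in enumerate(probs):
        running += p
        if u < running:
            return category
    return len(probs) - 1  # guard against floating-point shortfall


# ---- Step 2: evaluation statistics ----

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def d_validity(observed_rs, true_r):
    """Hypothesis 2: absolute deviations from the true validity, summed over
    rounds. Values closer to zero mean more accurate validities."""
    return sum(abs(r - true_r) for r in observed_rs)


def d_recovery(r_gpcm, r_ggum):
    """Hypotheses 5 and 6: (GPCM - GGUM) correlations with true theta, summed
    over rounds. Negative values favor the GGUM."""
    return sum(g - u for g, u in zip(r_gpcm, r_ggum))


def tail_correlation(select_on, x, y, upper=True):
    """Correlate x and y among cases at least 1 SD above (upper=True) or below
    the sample mean of select_on."""
    n = len(select_on)
    mean = sum(select_on) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in select_on) / n)
    keep = [i for i, s in enumerate(select_on)
            if (s >= mean + sd if upper else s <= mean - sd)]
    return pearson([x[i] for i in keep], [y[i] for i in keep])
```

For example, `correlated_pair(0.38, rng)` draws one simulee's true theta and criterion score, and `sample_category(ggum_probs(theta, 1.2, 0.5, [-1.5, -1.0, -0.6, -0.3]), rng)` draws that simulee's response to a hypothetical five-category unfolding item. Following the procedure above, the GGUM would be used only for items meeting the delta < 2 criterion and the GPCM for the rest. For hypothesis 3, `x` and `y` would be the GGUM and GPCM thetas; for hypothesis 6, the true thetas and one model's estimates.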

CHAPTER VIII: STUDY 2 RESULTS

Evaluating Hypothesis 2. Hypothesis 2 postulated that the GGUM would produce person parameters that correlate more accurately with the criterion than those produced by the GPCM. To evaluate this hypothesis, the accuracy of these correlations was compared across measurement models. Support for hypothesis 2 requires the correlations to be more accurate (i.e., to better reflect the true correlation between the true theta and the true dependent variable) when the GGUM thetas are used to predict the criterion than when the GPCM thetas are used. Accuracy was measured as the absolute difference (D) between the true correlation (r = .00, r = .15, or r = .38) and the observed correlation (for both the GGUM and the GPCM thetas). This was done for both the simulated agreeableness and conscientiousness scales at all three true correlations, with D summed across the ten replications.

Evaluating the Simulated Agreeableness Scale. Table 23 shows the absolute differences between the true and observed correlations, summed across the 10 rounds, for each true correlation (r = .00, r = .15, r = .38) and for both the simulated agreeableness and conscientiousness scales. For the agreeableness scale, the GPCM appeared to produce thetas that correlated with the criterion more accurately than the GGUM thetas did (see Tables 24, 25, and 26 for these correlations). When the true correlation was .00, D = .07 for both the GPCM and the GGUM; when it was .15, D = .15 for the GPCM and D = .17 for the GGUM; and when it was .38, D = .28 for the GPCM and D = .30 for the GGUM. A larger D indicates a less accurate correlation between the estimated thetas and simulees' true dependent variable (criterion) scores. Thus, for the agreeableness scale, the GGUM thetas correlated less accurately with the criterion than the GPCM thetas did. This finding does not support hypothesis 2.

Evaluating the Simulated Conscientiousness Scale. Table 23 also shows the D scores for the conscientiousness scale. For this scale, the GGUM produced thetas that correlated with the criterion more accurately than the GPCM thetas did (see Tables 24, 25, and 26 for these correlations). When the true correlation was .00, D = .16 for the GPCM and D = .14 for the GGUM; when it was .15, D = .22 for the GPCM and D = .16 for the GGUM; and when it was .38, D = .43 for the GPCM and D = .35 for the GGUM. Thus, for the conscientiousness scale, the GGUM thetas correlated more accurately with the criterion than the GPCM thetas, supporting hypothesis 2. Overall, these results provide partial support for hypothesis 2: the conscientiousness results support it, whereas the agreeableness results do not.

Evaluating Hypothesis 3. Hypothesis 3 stated that rank orders of thetas would differ more across measurement models at the upper end of the distribution than at the lower end. To evaluate this, correlations between GGUM and GPCM thetas at the top of the theta distribution (one standard deviation or more above the mean) were compared to those at the bottom (one standard deviation or more below the mean) for the two simulated personality scales, for the three criterion-correlation conditions (r = .00, r = .15, r = .38), and for ten rounds within each condition.

Table 27 shows that hypothesis 3 was supported in every round and condition, providing strong support for hypothesis 3. As in the empirical study, some negative correlations between the GGUM and GPCM thetas were observed for the conscientiousness scale.

Evaluating Hypothesis 5. Hypothesis 5 stated that the GGUM would estimate simulees' true scores on the latent trait more accurately than the GPCM. To test this hypothesis, simulees' true thetas were correlated with their GPCM theta estimates and with their GGUM theta estimates; Tables 24, 25, and 26 contain these correlations. Support for hypothesis 5 requires the latter correlation to be positive and greater than the former. The GGUM correlations with the true thetas were subtracted from the GPCM correlations with the true thetas, and these differences (D) were summed across the ten simulation rounds within each condition (see Table 28). A positive D indicates that the GPCM thetas correlated better with the true thetas; a negative D indicates that the GGUM thetas did. Table 28 shows a clear pattern: for the simulated agreeableness scale, the GPCM thetas correlated better with the true thetas, whereas for the simulated conscientiousness scale, the GGUM thetas correlated better with the true thetas. These findings provide partial support for hypothesis 5.

Evaluating Hypothesis 6. Hypothesis 6 extended hypothesis 5, stating that the GGUM would estimate simulees' true latent trait levels at the upper end of the distribution more accurately than the GPCM. To evaluate this, correlations were again compared, this time restricted to simulees whose theta parameters were one standard deviation or more above the mean; Tables 29, 30, and 31 show these correlations. Hypothesis 6 would be supported if the correlation between simulees' true thetas and their GGUM theta estimates were higher than the correlation between their true thetas and their GPCM theta estimates. Difference (D) scores were again used: the true-GGUM correlation was subtracted from the true-GPCM correlation, and these differences were summed across the 10 simulation rounds within each condition (see Table 32). As before, a positive D indicates that the GPCM thetas correlated better with the true thetas at the upper end of the theta distribution, and a negative D indicates that the GGUM thetas did. Table 32 shows that for the simulated conscientiousness scale, the GGUM thetas correlated more accurately with the true thetas. For the simulated agreeableness scale, the differences were smaller in magnitude but favored the GPCM thetas. These results provide partial support for hypothesis 6.

CHAPTER IX: STUDY 2 DISCUSSION

In Study 2, hypothesis 2 was partially supported. The simulated conscientiousness GGUM thetas related more accurately to the criterion than the GPCM thetas did. The opposite was found for the simulated agreeableness scale, for which the GPCM thetas related more accurately to the criterion. As stated earlier, the item parameters of these simulated personality scales were based on the observed item parameters from Study 1; thus, the conscientiousness scale exhibited a great deal of unfolding in its items (7 of 10) compared to the agreeableness scale (1 of 10 items). It appears that for scales that exhibit a great deal of unfolding, an unfolding IRT model such as the GGUM should be used to estimate thetas, because these thetas will relate more accurately to criteria. If a scale does not exhibit much unfolding, however, a dominance IRT model such as the GPCM appears preferable for the same reason: the unfolding model cannot recover thetas as accurately when most of the items within the scale are dominance-type items. Hypothesis 3, which involved impact analyses, was fully supported: the GGUM and GPCM thetas correlated less strongly with each other at the upper end of the theta distribution than at the lower end. To determine which measurement model estimates thetas more accurately, hypotheses 5 and 6 were evaluated. Hypothesis 5 was partially supported. For the simulated conscientiousness scale, the GGUM thetas correlated better with true thetas; for the simulated agreeableness scale, the GPCM thetas correlated better with true thetas. Therefore, once again, when a scale exhibited much unfolding (as the conscientiousness scale did), the GGUM produced more accurate thetas.
However, when a scale did not exhibit much unfolding (as with the agreeableness scale), the GPCM was more accurate.

Finally, hypothesis 6, which involved another set of impact analyses and extended hypothesis 5, also received partial support. Among simulees at the upper end of the theta distribution, the GGUM thetas correlated better with true thetas for the simulated conscientiousness scale, whereas the GPCM thetas correlated more accurately with true thetas for the simulated agreeableness scale. Thus, once again, the GGUM was more accurate for the conscientiousness scale, which had many unfolding items, and the GPCM was more accurate for the agreeableness scale, which had only one unfolding item.
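The impact analyses summarize agreement between models with correlations. In a top-down selection setting, a more direct summary is the share of applicants selected under one model who would also be selected under the other. The sketch below computes that share for a hypothetical selection ratio; the function names and the five theta values are invented for illustration and are neither data nor procedures from this study.

```python
def top_k(scores, k):
    """Indices of the k highest scores, as in top-down selection.
    Ties are broken by list position (an arbitrary choice)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(order[:k])


def selection_agreement(scores_a, scores_b, k):
    """Proportion of the k applicants selected under model A who are also
    selected under model B."""
    return len(top_k(scores_a, k) & top_k(scores_b, k)) / k


# Invented thetas for five applicants scored under two measurement models.
ggum_thetas = [2.1, 1.8, 1.5, 0.2, -0.4]
gpcm_thetas = [1.9, 0.1, 1.7, 1.6, -0.2]
print(selection_agreement(ggum_thetas, gpcm_thetas, k=2))  # prints 0.5
```

With these made-up values, only one of the two applicants selected under the first set of thetas is also selected under the second, even though the two sets of thetas are positively correlated overall (r of about .42). Disagreement of this kind at the top of the distribution is what the impact analyses are meant to detect.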

CHAPTER X: DISCUSSION

A central assumption of this paper was that the GGUM would allow greater flexibility in how personality items are interpreted and responded to. As such, the GGUM would produce accurate scores for respondents regardless of whether they interpreted items in a dominance or an ideal-point way. If this assumption held, researchers could confidently use the GGUM for personality scales whether or not any of the items actually exhibited unfolding. Unfortunately, this assumption did not hold. Instead, the characteristics of the items within a scale must be considered to determine which measurement model is more appropriate. In this study, the conscientiousness scale had 7 of 10 items exhibit meaningful unfolding, and for this scale the GGUM produced more accurate thetas, which in turn related more accurately to the criterion in the simulation. Contrary to expectations, however, for the agreeableness scale, which had only 1 of 10 items exhibit meaningful unfolding, the GPCM produced more accurate thetas, which in turn related more accurately to the criterion in the simulation. Notably, both scales (in fact, all 5 scales) were developed using dominance approaches to scale development (i.e., using factor analyses, item-total correlations, alphas, etc., to select the best-performing items for the scale). Thus, even though the conscientiousness scale was created using dominance approaches, it exhibited much unfolding. Researchers and practitioners are likely unaware of the possibility of unfolding in their measures. In addition, the items and response options of these personality scales resemble those of other personality scales: the items ask respondents to indicate the extent to which they agree with statements measuring the Big Five personality traits, and the scales were developed using dominance approaches.
Taken together, these observations suggest that researchers and practitioners must be aware of how their personality scales behave at the item level, even if the scales were developed with a dominance approach. As mentioned above, the GPCM produced more accurate thetas and criterion-related validities for the agreeableness scale. Thus, the GGUM may not, in fact, accommodate both dominance and ideal-point interpretations of items, as was thought. Perhaps problems arise in using the GGUM when items that are dominant in nature have more neutral locations. The IRFs of such items would be forced to bend down within a range where many respondents are located, producing inaccurate GGUM theta estimates. This could be what occurred for the agreeableness scale; however, I could not find evidence for this possibility.

Generalizing to Classical Test Theory (CTT)

Most IRT measurement models and classical test theory (CTT) models follow the dominance approach to measurement. They assume that respondents interpret items in a dominance fashion: respondents who possess much of the attribute agree with the item, and respondents who possess little of it disagree. This paper compared one dominance IRT model to one ideal-point IRT model. It did not consider CTT measurement, which typically involves summing or averaging item scores to create scale scores. Nonetheless, findings for CTT approaches are likely to resemble the findings observed for the GPCM, because both take a dominance approach to measurement. CTT approaches would, however, likely estimate trait scores with greater error than the GPCM (lower error being one of the advantages of IRT models, provided their assumptions hold). Thus, the findings reported here should roughly generalize to CTT measurement approaches, in that CTT results should resemble the GPCM results.

Many researchers and practitioners use CTT measurement approaches when scoring respondents, perhaps because of a lack of training in IRT approaches. Based on the findings from this study, this could be more unfortunate than previously believed. The results suggest that, at times, an IRT model (here, the GGUM) is the best choice. In fact, they suggest that failing to use the GGUM to estimate respondents' scores in some situations would produce unfair rank orders and could harm the efficacy of the measure, an undesirable outcome when important decisions, such as personnel selection decisions, are based on those scores.

Item Content and Unfolding

Although all of the items used in this study were developed with a dominance approach, one might expect the items identified as unfolding to differ somehow in content from the dominance items. Comparing the content of the unfolding items to that of the dominance items, however, I cannot find any meaningful distinctions (see Appendix B). In fact, for many of the unfolding items it is hard to see how someone could disagree with them from above. Take, for example, the conscientiousness item "I am always prepared." It is difficult to imagine a person so extremely conscientious that they would disagree with this item because they are more than always prepared. Looking through all of the items, two stand out as possible candidates for unfolding, but neither was flagged as unfolding. The first was emotional stability item 2, "I am relaxed most of the time" (delta = 3.4), and the second was conscientiousness item 6, "I often forget to put things back in their proper place" (delta = 2.1). The conscientiousness item just missed the unfolding cutoff of delta < 2, so one could argue that it should have been considered an unfolding item. However, the emotional stability item had a high delta parameter, suggesting that it was fairly difficult to endorse; perhaps there simply were not many extremely relaxed students in the sample.

An important point regarding item content is that this study's hypotheses were formulated before the content of the items in the personality scales was known. It was assumed that the personality scales would include some items whose content lent itself to unfolding, such as more moderate items (e.g., statements qualified with "sometimes"). However, only two items across all five scales had content for which unfolding would make sense. Interestingly, neither of these items was flagged for unfolding, whereas many other items were. It seems that something other than item content is driving the unfolding in this study. Compared to the other scales, the conscientiousness scale exhibited much more unfolding. One possibility is that extremely conscientious people are more reserved in their use of the response options and are therefore less likely to endorse the most extreme options, "Very Accurate" or "Very Inaccurate," whereas slightly less conscientious people use those extreme options more liberally. Perhaps extremely conscientious people are more aware of the occasions when they are not always prepared or when they leave their belongings around, or these events are more salient to them, making them less likely than their slightly less conscientious counterparts to endorse "Very Accurate" or "Very Inaccurate," respectively. This would explain why the conscientiousness items exhibit so much more unfolding than the items on the other scales.
It should be noted that, if this explanation held, the unfolding observed in these items would still be real, and the GGUM, or another unfolding model, would still be the more appropriate measurement model for estimating respondents' scale scores.
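The delta < 2 criterion and the discussion of items such as "I am relaxed most of the time" (delta = 3.4) turn on where an item's GGUM response function folds. In the GGUM the expected item score is symmetric about delta and, for typical parameter values, peaks there, so an item located near the middle of the trait range bends down where many respondents are located, while an item with delta = 3.4 keeps rising across the range where most respondents fall. The minimal sketch below, using the standard GGUM category probabilities, illustrates this for two hypothetical items; the discrimination (alpha = 1.0) and threshold (tau) values are assumptions, not estimates from this study.

```python
import math


def ggum_probs(theta, alpha, delta, taus):
    """GGUM category probabilities for C + 1 observable categories.
    taus = [tau_1, ..., tau_C]; tau_0 = 0 is implied and M = 2C + 1."""
    C = len(taus)
    M = 2 * C + 1
    cum_tau, numerators = 0.0, []
    for z in range(C + 1):
        if z > 0:
            cum_tau += taus[z - 1]
        numerators.append(math.exp(alpha * (z * (theta - delta) - cum_tau))
                          + math.exp(alpha * ((M - z) * (theta - delta) - cum_tau)))
    total = sum(numerators)
    return [n / total for n in numerators]


def expected_score(theta, alpha, delta, taus):
    """Expected response category (0 = strongest disagreement) at a given theta."""
    return sum(z * p for z, p in enumerate(ggum_probs(theta, alpha, delta, taus)))


# Hypothetical thresholds for a five-category (Very Inaccurate .. Very Accurate) item.
TAUS = [-1.5, -1.0, -0.6, -0.3]
THETAS = [-3.0, 0.0, 2.5, 3.0]

for delta in (0.0, 3.4):  # a centrally located item vs. an item located at 3.4
    scores = [round(expected_score(t, 1.0, delta, TAUS), 2) for t in THETAS]
    print(f"delta = {delta}: expected scores at theta {THETAS} -> {scores}")
```

For the item at delta = 0, the expected score at theta = 2.5 falls well below its value at theta = 0 (a respondent disagreeing "from above"). For the item at delta = 3.4, the expected scores printed for theta = -3, 0, 2.5, and 3 increase steadily, so within that range the item behaves like a dominance item.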

Model Fit

A peculiar finding in the empirical study was that the GGUM fit the data better for the agreeableness, emotional stability, extroversion, and intellectance scales, whereas the GPCM fit the data better for the conscientiousness scale. Fit was assessed with chi-square to degrees-of-freedom ratios, the approach Stark, Chernyshenko, Drasgow, and Williams (2006) used to evaluate model fit in their study. They found that an unfolding model fit personality data better than dominance IRT models; like this study, they used a personality scale developed with dominance approaches. This study's model-fit results are puzzling, given that in the simulation the GGUM produced more accurate thetas for the conscientiousness scale, the GPCM produced more accurate thetas for the agreeableness scale, and more items unfolded on the conscientiousness scale than on the agreeableness scale. Notably, both the GGUM and the GPCM demonstrated excellent model-data fit for the conscientiousness scale. Yet in the simulation, large differences were observed for this scale: the GGUM produced much more accurate theta estimates and criterion-related validities and performed much better in the impact analyses. Clearly, more research is needed to understand how model fit relates to the simulation findings.

Impact Analyses

Of particular interest were the outcomes of the impact analyses. At the upper end of the distribution, rank orders clearly differ depending on the measurement model used. These differences are particularly large for the conscientiousness scale in both the empirical study and the simulation. Using the GPCM to estimate thetas for the conscientiousness scale would produce inaccurate rank orders at the upper end of the distribution. The same is true of the agreeableness scale and the GGUM, but to a lesser degree. Again, these findings have serious implications for high-stakes testing situations such as personnel selection. Typically in selection, applicants are chosen in a top-down manner: those with the highest scores on the attribute of interest are chosen first, and so on. Inaccurate rank orders at the top of the distribution would therefore result in an unfair and less effective selection system.

Criterion-Related Validity

Many significant relationships were observed between personality traits and criteria, especially for the agreeableness, conscientiousness, and intellectance traits. Supporting previous research (e.g., O'Connor & Paunonen, 2007), conscientiousness was found to relate positively and significantly to GPA (both high school and college). In addition, intellectance related positively and significantly to college GPA. Intellectance is similar to openness to experience, which O'Connor and Paunonen (2007) suggested in their meta-analysis may be positively related to academic success, although moderators could affect this relationship. O'Connor and Paunonen (2007) also suggested that extroversion may be negatively related to GPA; however, this study observed near-zero correlations between extroversion and both high school and college GPA. Here, too, they raised the possibility that moderators affect the relationship. Clearly, more research is needed to determine the conditions under which extroversion and intellectance correlate significantly with GPA. In addition, Blickle (1996) found that conscientiousness related positively to learning effort. In this study, aside from GPA, many of the criteria contain an effort or motivation component, such as time spent studying, class attendance, and the study skill SJT.
For all of these variables, conscientiousness related positively and significantly, mirroring Blickle's (1996) findings. A few other significant correlations were found; however, these should be interpreted with care, because the sheer number of correlations estimated makes capitalization on chance likely. Based on the simulation, the GPCM correlations may more accurately reflect the true correlations for the agreeableness scale, whereas the GGUM correlations, which were consistently lower than the GPCM correlations, may more accurately reflect the true correlations for the conscientiousness scale. It is unclear which model is more accurate for the intellectance scale, but its correlations are so similar across models that the choice does not seem to matter much here, although it does matter when considering non-linear relations.

Non-Linear Relations

Non-linear relations were observed for the conscientiousness and intellectance GGUM thetas. Previous research has also observed non-linear relations of conscientiousness and intellectance with GPA (Cucina & Vasilopoulos, 2005). However, most of these relations were not observed with the corresponding GPCM thetas. The simulation showed that the GGUM produced more accurate conscientiousness thetas than the GPCM, suggesting that the non-linear relations observed for conscientiousness may be genuine. Whereas the conscientiousness items exhibited much unfolding, only two intellectance items showed meaningful unfolding; therefore, it is difficult to generalize the simulation results to the non-linear findings for the intellectance scale. Although the simulation suggests that the GGUM was more appropriate for the conscientiousness scale, the linear relations obtained with the GPCM might produce similar outcomes in a top-down selection situation. For example, if respondents were selected one at a time in order of predicted college GPA, from highest to lowest, the GGUM (with its non-linear relations with GPA) and the GPCM (with its linear relations with GPA) might yield similar results. Interestingly, although the GGUM performed better in the simulation for the conscientiousness scale, the GPCM actually fit the data better in the empirical study. One possible explanation, then, is that the non-linear relations compensate for model misfit when relating to other variables. However, although non-linear relations have been reported in the literature, as mentioned above, those relations are typically quadratic, not cubic, whereas most of the non-linear relations that emerged in this study for the conscientiousness scale were cubic. Perhaps the cubic term compensates for model misfit. More research is clearly needed to investigate this issue.

Limitations

This study used undergraduate students and criteria, such as GPA, that make sense for an undergraduate population. A working sample could not be obtained, which is unfortunate because many of the implications of the studies in this paper are framed in a personnel selection context; findings from a working sample would have generalized more readily to that context than findings from a student sample. However, I can think of no reason why personality trait distributions would differ between student and working populations, or why the two populations would interpret personality items differently. Therefore, the only real concern is the type of criteria used. In personnel selection, practitioners are interested in predicting job performance (and other organizational outcomes).
This study was unable to evaluate job performance, because that criterion was unavailable for the student population; instead, it used GPA as an indicator of academic performance. However, aside from the difference in criteria, many of the findings from this study apply directly to a working population. For example, the finding that the characteristics of the items in a scale help determine which measurement model (unfolding or monotonically increasing) produces more accurate scores has direct uses in personnel selection: practitioners want the measurement model whose theta estimates contain less error. In addition, the finding that items developed with a dominance approach can nonetheless unfold is important for selection, because the choice of model can produce substantial differences in the rank order of applicants at the upper end of the distribution; in this case, the thetas produced by the GGUM were the more accurate ones. Again, fairness and testing efficacy are important issues in personnel selection.

The simulation in Study 2 is a necessary step toward understanding when unfolding versus monotonically increasing IRT models should be used. However, it considered only two personality scales with very specific item characteristics, and more research is needed to understand when one model is more appropriate than another. In addition, this study compared only two measurement models, one unfolding (the GGUM) and one monotonically increasing (the GPCM). As stated earlier, many other measurement models are available, and more research is needed to understand how these other IRT models (and even classical test theory approaches to measurement) fit within the findings presented here.

Conclusion

In conclusion, this study demonstrated that researchers and practitioners must be aware of how items behave in their scales and should use this knowledge to guide their choice of the most appropriate measurement model. Researchers have suggested that unfolding models can be thought of as more general measurement models than dominance models, able to model accurately data derived from either an ideal-point or a dominance response process, whereas dominance models can accurately model only data derived from a dominance process. However, this study observed cases in which unfolding models could not model data as accurately as dominance models, and this appears to happen when items are interpreted in a dominance fashion. In addition, although the scales used in this study were developed using dominance approaches to scale development, not all of their items fit well with monotonically increasing (dominance) measurement models such as the GPCM. One of the personality scales had many items that showed meaningful unfolding, and for this scale the unfolding IRT model (GGUM) produced more accurate theta estimates than the dominance IRT model (GPCM). Therefore, researchers and practitioners cannot assume that every item of a scale developed with a dominance approach will work best in a dominance measurement model.

Taken together, these findings have important implications for personnel selection, where applicants are typically selected top-down: the applicant with the highest standing on the trait of interest is chosen first, then the next highest, and so on until all positions are filled. In these high-stakes, top-down selection situations, choosing the wrong measurement model to estimate applicants' trait levels could result in an unfair and ineffective selection system. Using the most appropriate measurement model, by contrast, would result in selecting the candidates who are most likely to perform best on the job. This is a primary goal of personnel selection, and the findings from this study should be used to help accomplish it.
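The rank-order concern can be illustrated with a minimal sketch. Assuming two sets of theta estimates for the same applicants (the numbers below are hypothetical, not results from this study), the functions fill k positions top-down under each model and report the share of selected applicants that the two models have in common. The function names and the tie-breaking rule are illustrative choices, not part of the original analyses.

```python
def top_down_selection(scores, k):
    """Indices of the k highest scores, filling positions top-down.

    Ties are broken by the lower applicant index (an arbitrary assumption).
    """
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(order[:k])


def selection_overlap(scores_a, scores_b, k):
    """Share of the k selected applicants who are chosen under both models."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same applicants")
    chosen_a = top_down_selection(scores_a, k)
    chosen_b = top_down_selection(scores_b, k)
    return len(chosen_a & chosen_b) / k


# Hypothetical theta estimates for six applicants under two models.
ggum_theta = [1.9, 1.4, 1.2, 0.3, -0.2, -1.1]
gpcm_theta = [1.5, 1.7, 0.9, 1.1, -0.1, -1.0]
print(selection_overlap(ggum_theta, gpcm_theta, k=3))
```

With these hypothetical scores, only two of the three applicants chosen under one model are also chosen under the other, so one hiring decision changes even though both score sets order most applicants similarly.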

REFERENCES

Ackerman, P. L., & Heggestad, E. D. (1997). Intelligence, personality, and interests: Evidence for overlapping traits. Psychological Bulletin, 121,
Andrich, D., & Luo, G. (1993). A hyperbolic cosine latent trait model for unfolding dichotomous single-stimulus responses. Applied Psychological Measurement, 17,
Andrich, D., & Styles, I. M. (1998). The structural relationship between attitude and behaviour statements from the unfolding perspective. Psychological Methods, 3(4),
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44,
Barrick, M. R., & Mount, M. K. (1993). Autonomy as a moderator of the relationships between the Big Five personality dimensions and job performance. Journal of Applied Psychology, 78,
Barrick, M. R., & Mount, M. K. (2005). Yes, personality matters: Moving on to more important matters. Human Performance, 18,
Benson, M. J., & Campbell, J. P. (2007). To be or not to be linear: An expanded representation of personality and its relationship to leadership performance. International Journal of Selection and Assessment, 15(2),
Blickle, G. (1996). Personality traits, learning strategies, and performance. European Journal of Personality, 10,
Bobko, P. (1995). Testing correlations for statistical significance. In P. Bobko (Ed.), Correlation and regression: Principles and applications for industrial/organizational psychology and management. New York, NY: McGraw-Hill.
Borgatta, E. E. (1964). The structure of personality characteristics. Behavioral Science, 12, 8-17.

Campbell, C. H., Ford, P., Rumsey, M. G., Pulakos, E. D., Borman, W. C., Felker, D. B., De Vera, M. V., & Riegelhaupt, B. J. (1990). Development of multiple job performance measures in a representative sample of jobs. Personnel Psychology, 43(2),
Chernyshenko, O. S., Stark, S., Drasgow, F., & Roberts, B. W. (2007). Constructing personality scales under the assumptions of an ideal point response process: Toward increasing the flexibility of personality measures. Psychological Assessment, 19(1),
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Cucina, J. M., & Vasilopoulos, N. L. (2005). Nonlinear personality-performance relationships and the spurious moderating effects of traitedness. Journal of Personality, 73,
Dollinger, S. J., & Orf, L. A. (1991). Personality and performance in "personality": Conscientiousness and openness. Journal of Research in Personality, 25,
Drasgow, F., Levine, M. V., Tsien, S., Williams, B. A., & Mead, A. D. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19,
epredix. (2001). Technical manual for the Global Personality Inventory. Minneapolis, MN: epredix.
Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample. Metron, 1,
Goldberg, L. R. (1990). An alternative "description of personality": The Big-Five factor structure. Journal of Personality and Social Psychology, 59,
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48,

Guion, R. M., & Gottier, R. F. (1965). Validity of personality measures in personnel selection. Personnel Psychology, 18,
Hakel, M. D. (1974). Normative personality factors recovered from ratings of personality descriptors: The beholder's eye. Personnel Psychology, 27,
Hogan, R. (2005). In defense of personality measurement: New wine for old whiners. Human Performance, 18,
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage Publications.
Hurley, R. F. (1998). Customer service behavior in retail settings: A study of the effect of service provider personality. Journal of the Academy of Marketing Science, 26(2),
LaHuis, D. M., Martin, N. R., & Avis, J. M. (2005). Investigating nonlinear conscientiousness-job performance relations for clerical employees. Human Performance, 18,
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140,
Locke, E. A., & Hulin, C. L. (1962). A review and evaluation of the validity studies of activity vector analysis. Personnel Psychology, 15,
Luo, G. (2001). A class of probabilistic unfolding models for polytomous responses. Journal of Mathematical Psychology, 45,
Major, D. A., Turner, J. E., & Fletcher, T. D. (2006). Linking proactive personality and the Big Five to motivation to learn and development activity. Journal of Applied Psychology, 91(4),

McCloy, R. A., Heggestad, E. D., & Reeve, C. L. (2005). A silk purse from the sow's ear: Retrieving normative information from multidimensional forced-choice items. Organizational Research Methods, 8(2),
McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results: The relationship between predictor and criterion domains. Personnel Psychology, 43(2),
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007). Reconsidering the use of personality tests in personnel selection contexts. Personnel Psychology, 60,
Mount, M. K., & Barrick, M. R. (1995). The Big Five personality dimensions: Implications for research and practice in human resource management. Research in Personnel and Human Resources Management, 13,
Mueller-Hanson, R., Heggestad, E. D., & Thornton, G. C. (2003). Faking and selection: Considering the use of personality from select-in and select-out perspectives. Journal of Applied Psychology, 88(2),
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16,
Muraki, E., & Bock, R. D. (2003). PARSCALE for Windows (Version 4.1) [Computer software]. Chicago: Scientific Software International.
Murphy, K. R. (1996). Individual differences and behavior in organizations: Much more than g. In K. R. Murphy (Ed.), Individual differences and behavior in organizations. San Francisco: Jossey-Bass, pp

Noel, Y. (1999). Recovering unimodal latent patterns of change by unfolding analysis: Applications to smoking cessation. Psychological Methods, 4(2),
Norman, W. T. (1963). Toward an adequate taxonomy of personality attributes: Replicated factor structure in peer nomination personality ratings. Journal of Abnormal and Social Psychology, 66,
O'Connor, M. C., & Paunonen, S. V. (2007). Big Five personality predictors of post-secondary academic performance. Personality and Individual Differences, 43,
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment in organizational settings. Personnel Psychology, 60,
Ones, D. S., & Viswesvaran, C. (2001). Personality at work: Criterion-focused occupational personality scales used in personnel selection. In B. W. Roberts & R. Hogan (Eds.), Personality psychology in the workplace (pp ). Washington, DC: American Psychological Association.
Ones, D. S., Viswesvaran, C., & Dilchert, S. (2005). Personality at work: Raising awareness and correcting misconceptions. Human Performance, 18,
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology, 78,
Paunonen, S. V., Haddock, G., Forsterling, F., & Keinonen, M. (2003). Broad versus narrow personality measures and the prediction of behaviour across cultures. European Journal of Personality, 17,

Peterson, N. G., Hough, L. M., Dunnette, M. D., Rosse, R. L., Houston, J. S., Toquam, J. L., & Wing, H. (1990). Project A: Specification of the predictor domain and development of new selection/classification tests. Personnel Psychology, 43(2),
Petty, R. E., & Cacioppo, J. T. (1981). Attitudes and persuasion: Classic and contemporary approaches. Dubuque, IA: William C. Brown.
Post, W. J., & Snijders, T. A. B. (1993). Nonparametric unfolding models for dichotomous data. Methodika, 7,
Reckase, M. D. (1979). Unifactor latent trait models applied to multifactor tests: Results and implications. Journal of Educational Statistics, 4,
Reilly, R. R., & Chao, G. T. (1982). Validity and fairness of some alternative employee selection procedures. Personnel Psychology, 35,
Roberts, J. S. (2004). GGUM2004 technical reference manual. Version
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24(1),
Roberts, J. S., & Laughlin, J. E. (1996). A unidimensional item response model for unfolding responses from a graded disagree-agree response scale. Applied Psychological Measurement, 20,
Roberts, J. S., Laughlin, J. E., & Wedell, D. H. (1999). Validity issues in the Likert and Thurstone approaches to attitude measurement. Educational and Psychological Measurement, 59,

Robie, C., & Ryan, A. M. (1999). Effects of nonlinearity and heteroscedasticity on the validity of conscientiousness in predicting overall job performance. International Journal of Selection and Assessment, 7,
Rost, J., & Luo, G. (1997). An application of a Rasch-based unfolding model to a questionnaire on adolescent centrism. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences. Münster, Germany: Waxmann.
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analyses of validity studies. Personnel Psychology, 37,
Spriegel, W. R., & Dale, A. G. (1953). Trends in personnel selection and induction. Personnel, 30,
Stark, S. (2001). MODFIT: A computer program for model-data fit. Unpublished manuscript, University of Illinois at Urbana-Champaign.
Stark, S., Chernyshenko, O. S., Drasgow, F., & Williams, B. A. (2006). Examining assumptions about item responding in personality assessment: Should ideal point methods be considered for scale development and scoring? Journal of Applied Psychology, 91(1),
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87,
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60,
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44,

Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34,
Thurstone, L. L. (1928). Attitudes can be measured. American Journal of Sociology, 33,
van Schuur, W. H., & Kiers, H. A. L. (1994). Why factor analysis often is the incorrect model for analyzing bipolar concepts, and what model to use instead. Applied Psychological Measurement, 18,
Wintre, M. G., & Sugar, L. A. (2000). Relationships with parents, personality, and the university transition. Journal of College Student Development, 41(2),
Wolfe, R. N., & Johnson, S. D. (1995). Personality as a predictor of college performance. Educational and Psychological Measurement, 55(2),
Zickar, M. J., Rosse, J. G., Levin, R. A., & Hulin, C. L. (1996). Modeling the effects of faking on personality tests. Paper presented at the 11th annual meeting of the Society for Industrial and Organizational Psychology, San Diego, CA.

Table 1
GPCM and GGUM Item Parameters for the Agreeableness Personality Scale
GPCM Agreeableness
Item #   Discrimination   Threshold1   Threshold2   Threshold3   Threshold
GGUM Agreeableness
Item #   Discrimination   Location   Tau1   Tau2   Tau3   Tau

Table 2
GPCM and GGUM Item Parameters for the Conscientiousness Personality Scale
GPCM Conscientiousness
Item #   Discrimination   Threshold1   Threshold2   Threshold3   Threshold
GGUM Conscientiousness
Item #   Discrimination   Location   Tau1   Tau2   Tau3   Tau

Table 3
GPCM and GGUM Item Parameters for the Emotional Stability Personality Scale
GPCM Emotional Stability
Item #   Discrimination   Threshold1   Threshold2   Threshold3   Threshold
GGUM Emotional Stability
Item #   Discrimination   Location   Tau1   Tau2   Tau3   Tau

Table 4
GPCM and GGUM Item Parameters for the Extroversion Personality Scale
GPCM Extroversion
Item #   Discrimination   Threshold1   Threshold2   Threshold3   Threshold
GGUM Extroversion
Item #   Discrimination   Location   Tau1   Tau2   Tau3   Tau

Table 5
GPCM and GGUM Item Parameters for the Intellectance Personality Scale
GPCM Intellectance
Item #   Discrimination   Threshold1   Threshold2   Threshold3   Threshold
GGUM Intellectance
Item #   Discrimination   Location   Tau1   Tau2   Tau3   Tau
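The GPCM parameters in Tables 1 through 5 (a discrimination and a set of thresholds per item) define category response probabilities. The sketch below assumes the thresholds act as step parameters b_v in a(θ - b_v), with categories scored 0 to m; PARSCALE may parameterize the model differently (for example, as a location minus category parameters, with a scaling constant), so the optional D argument and the item values are assumptions for illustration only.

```python
import math


def gpcm_probs(theta, a, thresholds, D=1.0):
    """Category probabilities under the generalized partial credit model (Muraki, 1992).

    theta      -- respondent trait level
    a          -- item discrimination
    thresholds -- step parameters b_1..b_m (m thresholds give m + 1 categories)
    D          -- optional logistic scaling constant; 1.0 leaves the metric unscaled
    """
    # Cumulative sums of D * a * (theta - b_v); the empty sum for category 0 is 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + D * a * (theta - b))
    top = max(logits)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected item score, scoring categories 0, 1, ..., m."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical item: discrimination 1.2 and four thresholds (five response options).
item = {"a": 1.2, "thresholds": [-2.0, -0.8, 0.3, 1.5]}
for theta in (-3.0, -1.0, 1.0, 3.0):
    print(theta, round(expected_score(gpcm_probs(theta, **item)), 3))
```

Because the expected score's slope is proportional to a times the conditional variance of the item score, it rises monotonically with theta whenever a is positive; this is the dominance assumption that the GGUM relaxes.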

Table 6
Items with Meaningful Unfolding and Their GGUM Discrimination and Location Item Parameters*
Scale                 Item #   Item Content                       Discrimination   Location
Agreeableness         3        I insult people
Conscientiousness     1        I am always prepared
                               I leave my belongings around
                               I pay attention to details
                               I like order
                               I shirk my duties
                               I follow a schedule
                               I am exacting in my work
Emotional Stability   7        I change my mood a lot
Extroversion          3        I feel comfortable around people
                               I start conversations
                               I have little to say
Intellectance         5        I have excellent ideas
                               I am full of ideas
* Meaningful unfolding was determined by GGUM item location (delta) parameters less than or equal to 2.
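The location criterion in Table 6 rests on the shape of the GGUM response function: an item's expected score peaks at θ = δ and falls off symmetrically on either side, so the downturn is visible in the data only when δ lies inside the range where respondents are located. The sketch below follows the GGUM of Roberts, Donoghue, and Laughlin (2000) with τ_0 fixed at 0; the item values are hypothetical, and the function names are illustrative rather than taken from GGUM2004.

```python
import math


def ggum_probs(theta, alpha, delta, taus):
    """Observable-category probabilities under the GGUM.

    alpha -- item discrimination
    delta -- item location
    taus  -- thresholds tau_1..tau_C (tau_0 is fixed at 0), giving C + 1 categories
    """
    C = len(taus)
    M = 2 * C + 1  # number of subjective response categories minus 1
    cum_tau = [0.0]  # running sums tau_0 + ... + tau_z
    for t in taus:
        cum_tau.append(cum_tau[-1] + t)
    d = theta - delta
    # Observable category z pools subjective category z (respondent below the item)
    # and subjective category M - z (respondent above the item).
    pairs = [(alpha * (z * d - cum_tau[z]), alpha * ((M - z) * d - cum_tau[z]))
             for z in range(C + 1)]
    top = max(max(p) for p in pairs)  # stabilize the exponentials
    weights = [math.exp(x - top) + math.exp(y - top) for x, y in pairs]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected item score, scoring categories 0, 1, ..., C."""
    return sum(z * p for z, p in enumerate(probs))


# Hypothetical item with location 0.5 and five response options.
item = {"alpha": 1.5, "delta": 0.5, "taus": [-1.5, -1.0, -0.5, -0.2]}
for theta in (-2.5, -0.5, 0.5, 1.5, 3.5):
    print(theta, round(expected_score(ggum_probs(theta, **item)), 3))
```

The printed expected scores rise toward θ = 0.5 and fall beyond it, with equal values at equal distances on either side of δ, which is the single-peaked (ideal-point) pattern that a monotonically increasing model cannot represent.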

Table 7
Chi-Square to Degrees of Freedom Ratios for the GPCM and GGUM for the Agreeableness Personality Scale
Agreeableness GPCM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets
Agreeableness GGUM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets

Table 8
Chi-Square to Degrees of Freedom Ratios for the GPCM and GGUM for the Conscientiousness Personality Scale
Conscientiousness GPCM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets
Conscientiousness GGUM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets

Table 9
Chi-Square to Degrees of Freedom Ratios for the GPCM and GGUM for the Emotional Stability Personality Scale
Emotional Stability GPCM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets
Emotional Stability GGUM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets

Table 10
Chi-Square to Degrees of Freedom Ratios for the GPCM and GGUM for the Extroversion Personality Scale
Extroversion GPCM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets
Extroversion GGUM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets

Table 11
Chi-Square to Degrees of Freedom Ratios for the GPCM and GGUM for the Intellectance Personality Scale
Intellectance GPCM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets
Intellectance GGUM: Frequency Table of Chi-Square/df Ratios
            <1   1<2   2<3   3<4   4<5   5<7   >7   Mean   SD
Singlets
Doublets
Triplets
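Tables 7 through 11 tabulate chi-square/df fit ratios into bins. The sketch below shows only that tabulation step for one row (singlets, doublets, or triplets); computing the chi-square statistics themselves, for example with a program such as MODFIT (Stark, 2001), is not shown. Two choices are assumptions rather than documented conventions: a ratio falling exactly on a bin edge is counted in the higher bin, and the SD is the sample standard deviation. The input values are hypothetical.

```python
from statistics import mean, stdev

# Bin labels as they appear in Tables 7-11. Assumption: each upper bound is exclusive,
# so a ratio of exactly 2.0 is counted in "2<3" and a ratio of exactly 7.0 in ">7".
BINS = [("<1", 1.0), ("1<2", 2.0), ("2<3", 3.0), ("3<4", 4.0),
        ("4<5", 5.0), ("5<7", 7.0), (">7", float("inf"))]


def ratio_frequency_row(chi_square_df_pairs):
    """Frequency row of chi-square/df ratios plus their mean and SD.

    chi_square_df_pairs -- (chi-square, degrees of freedom) for each single item,
                           item pair, or item triple.
    """
    ratios = [chi2 / df for chi2, df in chi_square_df_pairs]
    row = {label: 0 for label, _ in BINS}
    for r in ratios:
        for label, upper in BINS:
            if r < upper:
                row[label] += 1
                break
    row["Mean"] = mean(ratios)
    row["SD"] = stdev(ratios) if len(ratios) > 1 else 0.0  # sample SD (assumption)
    return row


# Hypothetical singlet fit statistics for five items.
singlets = [(2.0, 4), (4.0, 4), (10.0, 4), (28.0, 4), (36.0, 4)]
print(ratio_frequency_row(singlets))
```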

Table 12
Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Agreeableness Personality Scale
Agreeableness    GGUM Theta   GPCM Theta
ACT *
H.S. GPA         0.189**      0.193**
College GPA      0.207**      0.214**
Hours Study
Attend Class
SJT              0.097*       0.103*
Difference+      D = .021, in favor of the GPCM
*Correlation significant at the α = .05 level. **Correlation significant at the α = .01 level.
+The difference in the magnitude of significant GGUM and GPCM theta correlations.

Table 13
Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Conscientiousness Personality Scale
Conscientiousness   GGUM Theta   GPCM Theta
ACT
H.S. GPA            0.207**      0.253**
College GPA         0.177**      0.224**
Hours Study         0.126*       0.157**
Attend Class        0.252**      0.284**
SJT                 0.316**      0.321**
Difference+         D = .161, in favor of the GPCM
*Correlation significant at the α = .05 level. **Correlation significant at the α = .01 level.
+The difference in the magnitude of significant GGUM and GPCM theta correlations.

Table 14
Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Emotional Stability Personality Scale
Emotional Stability   GGUM Theta   GPCM Theta
ACT
H.S. GPA
College GPA
Hours Study
Attend Class
SJT
Difference+           None. No significant correlations.
+The difference in the magnitude of significant GGUM and GPCM theta correlations.

Table 15
Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Extroversion Personality Scale
Extroversion     GGUM Theta   GPCM Theta
ACT
H.S. GPA
College GPA
Hours Study
Attend Class *
SJT
Difference+      D = .017, in favor of the GGUM
*Correlation significant at the α = .05 level.
+The difference in the magnitude of significant GGUM and GPCM theta correlations.

Table 16
Correlations between GGUM Thetas, GPCM Thetas, and Student Criteria for the Intellectance Personality Scale
Intellectance    GGUM Theta   GPCM Theta
ACT              0.378**      0.368**
H.S. GPA
College GPA      0.134*       0.137**
Hours Study
Attend Class     **           **
SJT
Difference+      D = .006, in favor of the GGUM
*Correlation significant at the α = .05 level. **Correlation significant at the α = .01 level.
+The difference in the magnitude of significant GGUM and GPCM theta correlations.
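The Difference rows in Tables 12 through 16 can be reproduced from the reported correlations. Reading D as the signed sum of (GPCM r minus GGUM r) over the criteria with significant correlations recovers the D = .161 reported in Table 13; a signed rather than absolute sum also fits Table 16, where the ACT and College GPA differences point in opposite directions. This reading of the footnote is an inference, and the function name is hypothetical.

```python
def correlation_difference(rows):
    """Signed sum of (GPCM r - GGUM r) over criteria with significant correlations.

    rows -- mapping of criterion -> (GGUM r, GPCM r).
    Positive values favor the GPCM; negative values favor the GGUM.
    """
    return sum(gpcm - ggum for ggum, gpcm in rows.values())


# Significant correlations from Table 13 (Conscientiousness); ACT was not significant.
table_13 = {
    "H.S. GPA": (0.207, 0.253),
    "College GPA": (0.177, 0.224),
    "Hours Study": (0.126, 0.157),
    "Attend Class": (0.252, 0.284),
    "SJT": (0.316, 0.321),
}
D = correlation_difference(table_13)
print(f"D = {D:.3f}")  # D = 0.161, the value reported in Table 13
```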

Table 17
Correlations between GGUM and GPCM Thetas Using Top, Bottom, and Full Respondent Distributions for the Big-Five Personality Scales
Personality Scale     Top*   Bottom**   Full***
Agreeableness
Conscientiousness
Emotional Stability
Extroversion
Intellectance
* Correlations are based on estimated GGUM thetas and estimated GPCM thetas at the upper end of the distribution (1 standard deviation or more above the GGUM mean).
** Correlations are based on estimated GGUM thetas and estimated GPCM thetas at the lower end of the distribution (1 standard deviation or more below the GGUM mean).
*** Correlations are based on estimated GGUM thetas and estimated GPCM thetas using the full distribution.
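The Top, Bottom, and Full columns of Table 17 can be computed as sketched below, with the tails defined on the GGUM thetas as the table notes specify. Using the sample SD is an assumption, and the data are simulated stand-ins (both "models" recover the same true thetas with independent error), not the study's estimates.

```python
import numpy as np


def top_bottom_full(ggum_theta, gpcm_theta):
    """Correlate two sets of theta estimates in the tails and in the full sample.

    Tails are defined on the GGUM thetas: at least 1 SD above (Top) or
    below (Bottom) the GGUM mean.
    """
    ggum = np.asarray(ggum_theta, dtype=float)
    gpcm = np.asarray(gpcm_theta, dtype=float)
    center, sd = ggum.mean(), ggum.std(ddof=1)  # sample SD (assumption)
    masks = {
        "Top": ggum >= center + sd,
        "Bottom": ggum <= center - sd,
        "Full": np.ones(ggum.shape, dtype=bool),
    }
    return {name: float(np.corrcoef(ggum[m], gpcm[m])[0, 1]) for name, m in masks.items()}


# Simulated stand-in data: two estimates of the same true thetas with independent error.
rng = np.random.default_rng(2008)
true_theta = rng.standard_normal(5000)
ggum_est = true_theta + rng.normal(0.0, 0.3, 5000)
gpcm_est = true_theta + rng.normal(0.0, 0.3, 5000)
print(top_bottom_full(ggum_est, gpcm_est))
```

Because each tail restricts the range of scores, tail correlations are typically lower than the full-sample correlation even when both models measure the same trait; the Top column is the one that bears most directly on top-down selection.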

Table 18
Step-Wise Power Polynomial Regressions for GGUM Agreeableness Thetas and GPCM Agreeableness Thetas as Predictors
GGUM Agreeableness DV = Time Studying DV = ACT Score DV = HS GPA DV = College GPA Step Beta R2 ΔR2 DV = Attend Class DV = Study SJT Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR **.04.04** *.01 Agreeableness **.21** * **.00.04** Agreeableness **.21** Squared-Term **.00.04** Agreeableness **.21** Squared-Term Cubed-Term
Step Beta R2 ΔR2 GPCM Agreeableness DV = Time Studying DV = ACT Score DV = HS GPA DV = College GPA DV = Attend Class DV = Study SJT Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR *.01.04**.04.05** *.01 Agreeableness.06.10*.19**.21** * **.00.05** Agreeableness **.22** Squared-Term **.00.05** Agreeableness **.20* Squared-Term Cubed-Term
* p < .05; ** p < .01

Table 19
Step-Wise Power Polynomial Regressions for GGUM Conscientiousness Thetas and GPCM Conscientiousness Thetas as Predictors
GGUM Conscientiousness DV = Time Studying DV = ACT Score DV = HS GPA DV = College GPA DV = Attend Class DV = Study SJT Step Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 1.02* **.04.03**.03.02**.02.10**.10 Conscientiousness.13* **.18** -.14**.32** 2.02* **.01.03**.00.02*.00.11**.01 Conscientiousness.13** **.18** -.14**.32** Squared-Term * **.03.05**.02.02*.00.11**.00 Conscientiousness.12*.04.38**.30** -.18**.37** Squared-Term Cubed-Term ** -.17*
Step Beta R2 ΔR2 GPCM Conscientiousness DV = Time Studying DV = ACT Score DV = HS GPA DV = College GPA DV = Attend Class DV = Study SJT Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 1.03** **.06.05**.05.03**.03.10**.10 Conscientiousness.16** **.22** -.16**.32** 2.03** **.01.06**.01.03**.00.10**.00 Conscientiousness.16** **.23** -.16**.32** Squared-Term ** **.00.06**.00.03*.00.11**.01 Conscientiousness **.22** -.17*.34** Squared-Term Cubed-Term
* p < .05; ** p < .01

Table 20
Step-Wise Power Polynomial Regressions for GGUM Emotional Stability Thetas and GPCM Emotional Stability Thetas as Predictors
GGUM Emotional Stability and GPCM Emotional Stability: Beta, R2, and ΔR2 for each DV
DVs: Time Studying, ACT Score, HS GPA, College GPA, Attend Class, Study SJT
Step 1: Emotional Stability
Step 2: Emotional Stability, Squared-Term
Step 3: Emotional Stability, Squared-Term, Cubed-Term
* p < .05; ** p < .01

Table 21
Step-Wise Power Polynomial Regressions for GGUM Extroversion Thetas and GPCM Extroversion Thetas as Predictors
GGUM Extroversion and GPCM Extroversion: Beta, R2, and ΔR2 for each DV
DVs: Time Studying, ACT Score, HS GPA, College GPA, Attend Class, Study SJT
Step 1: Extroversion
Step 2: Extroversion, Squared-Term
Step 3: Extroversion, Squared-Term, Cubed-Term
* p < .05; ** p < .01

Table 22
Step-Wise Power Polynomial Regressions for GGUM Intellectance Thetas and GPCM Intellectance Thetas as Predictors
GGUM Intellectance DV = Time Studying DV = ACT Score DV = HS GPA DV = College GPA Step Beta R2 ΔR2 DV = Attend Class DV = Study SJT Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR ** **.02.01* Intellectance.01.38**.09.13**.10* ** * *.02 Intellectance.02.35**.09.14** Squared-Term ** **.00.02* Intellectance.03.33** Squared-Term Cubed-Term *
Step Beta R2 ΔR2 GPCM Intellectance DV = Time Studying DV = ACT Score DV = HS GPA DV = College GPA DV = Attend Class DV = Study SJT Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR2 Beta R2 ΔR ** **.02.01* Intellectance.00.37**.08.14**.11* ** * *.02 Intellectance.01.36**.09.15** Squared-Term ** ** * Intellectance **.02.21* Squared-Term Cubed-Term
* p < .05; ** p < .01
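Tables 18 through 22 enter theta, then its square, then its cube, and report R² and ΔR² at each step. A minimal sketch of that hierarchy using ordinary least squares follows; it reports R² and ΔR² only (no standardized betas or F-change tests). Standardizing theta before forming the powers is a choice made here to reduce collinearity among the terms, and it does not change R², because a cubic in a linearly rescaled theta spans the same predictions. Function names and the toy data are hypothetical.

```python
import numpy as np


def r_squared(predictors, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)


def power_polynomial_steps(theta, y):
    """Steps 1-3: theta, then + theta^2, then + theta^3; returns (step, R^2, Delta R^2)."""
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (theta - theta.mean()) / theta.std(ddof=1)  # standardize before forming powers
    results, previous = [], 0.0
    for step in (1, 2, 3):
        powers = np.column_stack([z ** p for p in range(1, step + 1)])
        r2 = r_squared(powers, y)
        results.append((step, r2, r2 - previous))
        previous = r2
    return results


# Toy check: a purely quadratic criterion gains nothing at step 1 and everything at step 2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for step, r2, delta in power_polynomial_steps(x, x ** 2):
    print(step, round(r2, 3), round(delta, 3))
```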

Table 23
Absolute Differences between Correlations Based on True Thetas and True Dependent Variable Scores and Correlations Based on Estimated Thetas (GGUM and GPCM) and True Dependent Variable Scores, Summed across 10 Rounds for Each Condition
                     Agreeableness   Conscientiousness
Correlation = 0.00
  GPCM Theta         D =             D =
  GGUM Theta         D =             D =
Correlation = 0.15
  GPCM Theta         D =             D =
  GGUM Theta         D =             D =
Correlation = 0.38
  GPCM Theta         D =             D =
  GGUM Theta         D =             D = 0.347
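The quantity in Table 23 can be sketched as follows: for each round, the correlation between true thetas and the true criterion is compared with the correlation between estimated thetas and the same criterion, and the absolute differences are summed over rounds. Generating the criterion as ρθ + √(1 - ρ²)ε, the sample size of 1,000, and the "estimate" (true theta plus normal error, standing in for actual GGUM or GPCM estimates) are all assumptions for illustration, not the study's simulation procedure.

```python
import numpy as np


def simulate_criterion(true_theta, rho, rng):
    """Criterion with population correlation rho with true_theta (assumed standard normal)."""
    noise = rng.standard_normal(len(true_theta))
    return rho * true_theta + np.sqrt(1.0 - rho ** 2) * noise


def summed_abs_difference(rounds):
    """Sum over rounds of |r(true theta, DV) - r(estimated theta, DV)|.

    rounds -- iterable of (true_theta, estimated_theta, dv) arrays, one tuple per round.
    """
    total = 0.0
    for true_theta, est_theta, dv in rounds:
        r_true = np.corrcoef(true_theta, dv)[0, 1]
        r_est = np.corrcoef(est_theta, dv)[0, 1]
        total += abs(r_true - r_est)
    return float(total)


rng = np.random.default_rng(1)
n_respondents, n_rounds = 1000, 10  # hypothetical simulation sizes
for rho in (0.00, 0.15, 0.38):
    rounds = []
    for _ in range(n_rounds):
        theta = rng.standard_normal(n_respondents)
        dv = simulate_criterion(theta, rho, rng)
        # Hypothetical stand-in for a model's theta estimates: truth plus error.
        estimate = theta + rng.normal(0.0, 0.4, n_respondents)
        rounds.append((theta, estimate, dv))
    print(rho, round(summed_abs_difference(rounds), 3))
```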

Table 24
Correlations among True Dependent Variable Scores, True Thetas, Estimated GPCM Thetas, and Estimated GGUM Thetas when the Correlation between True Dependent Variable Scores and True Thetas Equals 0.00 for 10 Simulation Rounds
Correlation = 0.00
Agreeableness and Conscientiousness, rounds 1-5: correlations among True Theta, True DV, GPCM Theta, and GGUM Theta

Table 24 Continued
Correlation = 0.00
Agreeableness and Conscientiousness, rounds 6-10: correlations among True Theta, True DV, GPCM Theta, and GGUM Theta

Table 25
Correlations among True Dependent Variable Scores, True Thetas, Estimated GPCM Thetas, and Estimated GGUM Thetas when the Correlation between True Dependent Variable Scores and True Thetas Equals 0.15 for 10 Simulation Rounds
Correlation = 0.15
Agreeableness and Conscientiousness, rounds 1-5: correlations among True Theta, True DV, GPCM Theta, and GGUM Theta

Table 25 Continued
Correlation = 0.15
Agreeableness and Conscientiousness, rounds 6-10: correlations among True Theta, True DV, GPCM Theta, and GGUM Theta

Table 26
Correlations among True Dependent Variable Scores, True Thetas, Estimated GPCM Thetas, and Estimated GGUM Thetas when the Correlation between True Dependent Variable Scores and True Thetas Equals 0.38 for 10 Simulation Rounds
Correlation = 0.38
Agreeableness and Conscientiousness, rounds 1-5: correlations among True Theta, True DV, GPCM Theta, and GGUM Theta

Table 26 Continued
Correlation = 0.38
Agreeableness and Conscientiousness, rounds 6-10: correlations among True Theta, True DV, GPCM Theta, and GGUM Theta

Table 27
Correlations between GGUM and GPCM Thetas Using Top, Bottom, and Full Respondent Distributions for the Simulated Agreeableness and Conscientiousness Personality Scales for the 10 Simulation Rounds across Each Correlation Condition
                     Agreeableness              Conscientiousness
                     Top*   Bottom**   Full***   Top*   Bottom**   Full***
Correlation = 0.00
Correlation = 0.15

Table 27 Continued
                     Agreeableness              Conscientiousness
                     Top*   Bottom**   Full***   Top*   Bottom**   Full***
Correlation = 0.38
* Correlations are based on estimated GGUM thetas and estimated GPCM thetas at the upper end of the distribution (1 standard deviation or more above the mean).
** Correlations are based on estimated GGUM thetas and estimated GPCM thetas at the lower end of the distribution (1 standard deviation or more below the mean).
*** Correlations are based on estimated GGUM thetas and estimated GPCM thetas using the full distribution.

Table 28
The Difference (D) between True Theta-GPCM Theta Correlations and True Theta-GGUM Theta Correlations, Summed across the 10 Simulation Rounds within Each Condition
                     Agreeableness   Conscientiousness
Correlation = 0.00
  GPCM - GGUM*       D =             D =
Correlation = 0.15
  GPCM - GGUM*       D =             D =
Correlation = 0.38
  GPCM - GGUM*       D =             D =
* The difference (D) scores represent the correlation between true thetas and GPCM thetas minus the correlation between true thetas and GGUM thetas, summed across the 10 simulation rounds within each condition. A positive D indicates that the GPCM thetas correlated better with the true thetas; a negative D indicates that the GGUM thetas correlated better with the true thetas.

Table 29
Correlations among True Theta Scores, Estimated GPCM Thetas, and Estimated GGUM Thetas Located 1 Standard Deviation or More above the Mean, when the True Correlation Equals 0.00 for 10 Simulation Rounds
Correlation = 0.00
Agreeableness and Conscientiousness, rounds 1-10: correlations of True Theta* with GPCM Theta and GGUM Theta
*All correlations use only true thetas located 1 standard deviation or more above the mean.

Table 30
Correlations among True Theta Scores, Estimated GPCM Thetas, and Estimated GGUM Thetas Located 1 Standard Deviation or More above the Mean, when the True Correlation Equals 0.15 for 10 Simulation Rounds
Correlation = 0.15
Agreeableness and Conscientiousness, rounds 1-10: correlations of True Theta* with GPCM Theta and GGUM Theta
*All correlations use only true thetas located 1 standard deviation or more above the mean.

Table 31
Correlations among True Theta Scores, Estimated GPCM Thetas, and Estimated GGUM Thetas, Located 1 Standard Deviation or More Above the Mean, when the True Correlation Equals 0.38, for 10 Simulation Rounds

Correlation = 0.38    Agreeableness: True Theta*     Conscientiousness: True Theta*
Round                 GPCM Theta    GGUM Theta       GPCM Theta    GGUM Theta
1
2
3
4
5
6
7
8
9
10

*All correlations use only cases whose true thetas are located 1 standard deviation or more above the mean.

Table 32
The Difference (D) between the Correlations of True Thetas with GPCM Thetas and the Correlations of True Thetas with GGUM Thetas, for Thetas Located 1 Standard Deviation or More Above the Mean, Summed Across the 10 Simulation Rounds within Each Condition

Correlation = 0.00    Agreeableness    Conscientiousness
GPCM - GGUM*          D =              D =

Correlation = 0.15    Agreeableness    Conscientiousness
GPCM - GGUM*          D =              D =

Correlation = 0.38    Agreeableness    Conscientiousness
GPCM - GGUM*          D =              D =

*Each D score is the correlation between true thetas and GPCM thetas minus the correlation between true thetas and GGUM thetas, both computed for cases located 1 standard deviation or more above the mean and summed across the 10 simulation rounds within each condition. A positive D indicates that, at 1 standard deviation or more above the mean, the GPCM thetas correlated more strongly with the true thetas; a negative D indicates that the GGUM thetas did.
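The restricted-range D of Tables 29 through 32 differs from Table 28 only in that each round is first trimmed to its upper tail. A minimal sketch, assuming the cutoff is the sample mean of the true thetas plus one sample standard deviation (if the true thetas were drawn from a standard normal, a fixed cutoff of 1.0 would be the obvious alternative; the transcription does not say which was used). The helper names `upper_tail` and `restricted_d` and the toy data are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two retained cases")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_tail(true_theta: Sequence[float], *estimates: Sequence[float],
               k: float = 1.0) -> List[List[float]]:
    """Keep only cases whose true theta is >= mean + k * SD of the true thetas.

    Returns the filtered true thetas followed by each filtered estimate list.
    """
    cutoff = statistics.mean(true_theta) + k * statistics.stdev(true_theta)
    keep = [i for i, t in enumerate(true_theta) if t >= cutoff]
    return [[seq[i] for i in keep] for seq in (true_theta, *estimates)]


def restricted_d(rounds, k: float = 1.0) -> float:
    """D computed on each round's upper tail, summed across rounds."""
    total = 0.0
    for t, g, u in rounds:
        t_hi, g_hi, u_hi = upper_tail(t, g, u, k=k)
        total += pearson_r(t_hi, g_hi) - pearson_r(t_hi, u_hi)
    return total


# Hypothetical round: ten simulees at 0 plus a tail at 2..5. The cutoff
# (mean 1.0 + sample SD of about 1.75) retains only the simulees at 3, 4, 5.
true_theta = [0.0] * 10 + [2.0, 3.0, 4.0, 5.0]
gpcm_theta = list(true_theta)
ggum_theta = [0.0] * 10 + [2.0, 3.0, 5.0, 4.0]
print(restricted_d([(true_theta, gpcm_theta, ggum_theta)]))  # 0.5
```

Because the tail is small, a single swap among three retained simulees moves the GGUM correlation from 1.0 to 0.5; restriction of range makes these upper-tail correlations far more sensitive to rank-order changes than the full-sample values.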

Figure 1
Item Response Function for the Dominance Approach (2-Parameter Logistic Model): Likelihood of Endorsing the Most Positive Response Option
[Plot: Test Item 0001]
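Figure 1 depicts a dominance item: under the 2-parameter logistic (2PL) model, P(theta) = 1 / (1 + exp(-a(theta - b))), so the probability of endorsing the most positive option rises monotonically with theta and equals 0.5 at theta = b. A minimal sketch; the optional `scale` argument (1.702 is the usual normal-ogive approximation) and the item parameters in the usage loop are illustrative assumptions, not values from Figure 1.

```python
import math


def irf_2pl(theta: float, a: float, b: float, scale: float = 1.0) -> float:
    """2PL probability of endorsing the most positive response option.

    P(theta) = 1 / (1 + exp(-scale * a * (theta - b))). scale=1.0 is the
    logistic metric; scale=1.702 approximates the normal-ogive metric.
    """
    z = scale * a * (theta - b)
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Illustrative (hypothetical) item: a = 1.2, b = 0.5.
for theta in (-3, -1, 0.5, 2, 3):
    print(theta, round(irf_2pl(theta, a=1.2, b=0.5), 3))
```

The printed probabilities only increase as theta increases; this monotonicity is exactly the assumption the ideal-point models relax.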

Figure 2
Item Response Function for the Ideal-Point Approach (Generalized Graded Unfolding Model): Likelihood of Endorsing the Least Positive to Most Positive Response Options Based on Theta
[Plot: Test Item 0002]

Figure 3
Category Probability Function for an Item Exhibiting Unfolding, Modeled by the Generalized Graded Unfolding Model, Showing the Probability of Endorsing Response Options of an Item Based on Theta

Figure 4
Category Probability Function for an Item Exhibiting Unfolding, Modeled by the Generalized Graded Unfolding Model, Showing the Probability of Endorsing Response Options of an Item Based on Theta
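The category probability functions in Figures 2 through 4 come from the GGUM. A minimal sketch of its standard published form (Roberts, Donoghue, & Laughlin, 2000): each observed category z is the sum of two subjective responses, z and M - z with M = 2C + 1, so endorsement peaks near the item location delta instead of rising indefinitely. The parameter values below are hypothetical and are not the items plotted in these figures.

```python
import math
from typing import List, Sequence


def ggum_probs(theta: float, alpha: float, delta: float,
               taus: Sequence[float]) -> List[float]:
    """Category probabilities P(Z = z | theta), z = 0..C, under the GGUM.

    alpha: discrimination; delta: item location; taus: thresholds
    tau_1..tau_C (tau_0 is fixed at 0). C = len(taus), M = 2C + 1.
    """
    C = len(taus)
    M = 2 * C + 1
    cum_tau = [0.0]  # cum_tau[z] = tau_0 + tau_1 + ... + tau_z
    for t in taus:
        cum_tau.append(cum_tau[-1] + t)
    d = theta - delta
    # Each observed category z pools two subjective responses:
    # one below the item location (z) and one above it (M - z).
    exps = [(alpha * (z * d - cum_tau[z]), alpha * ((M - z) * d - cum_tau[z]))
            for z in range(C + 1)]
    # Subtract the largest exponent before exponentiating (ratios unchanged).
    top = max(max(pair) for pair in exps)
    numer = [math.exp(e1 - top) + math.exp(e2 - top) for e1, e2 in exps]
    total = sum(numer)
    return [n / total for n in numer]


def ggum_expected(theta: float, alpha: float, delta: float,
                  taus: Sequence[float]) -> float:
    """Expected observed response; single-peaked and symmetric about delta."""
    return sum(z * p for z, p in enumerate(ggum_probs(theta, alpha, delta, taus)))


# Illustrative (hypothetical) four-category item.
item = dict(alpha=1.5, delta=0.0, taus=[-2.0, -1.5, -1.0])
for theta in (-3, -1.5, 0, 1.5, 3):
    print(theta, round(ggum_expected(theta, **item), 2))
```

Respondents far above or far below delta both tend toward disagreement, which produces the single-peaked shape that the dominance curve in Figure 1 cannot represent.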

Figure 5
An Item Characteristic Curve for the Generalized Partial Credit Model, Showing the Probability of Endorsing Response Options of an Item Based on Theta (Ability)
a = 2.051, b_j = , b_jv = 1.755, 0.530, ,

Figure 6
An Item Characteristic Curve for the Generalized Partial Credit Model, Showing the Probability of Endorsing Response Options of an Item Based on Theta (Ability)
a = 0.408, b_j = , b_jv = 3.788, , ,
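The curves in Figures 5 and 6 follow the GPCM (Muraki, 1992), in which P(X = k | theta) is proportional to exp of the sum, over v = 1..k, of a(theta - b_v), and adjacent categories k - 1 and k are equally likely at theta = b_k. A minimal sketch; the steps are treated directly as b_1..b_m, which is an assumption, because the transcription does not make clear whether the captions' b_jv values are the steps themselves or thresholds to be combined with b_j. The item parameters in the usage loop are hypothetical.

```python
import math
from typing import List, Sequence


def gpcm_probs(theta: float, a: float, steps: Sequence[float],
               scale: float = 1.0) -> List[float]:
    """Category probabilities P(X = k | theta), k = 0..m, under the GPCM.

    steps holds the step parameters b_1..b_m; category 0 has an empty sum.
    """
    logits = [0.0]  # logits[k] = sum over v = 1..k of scale * a * (theta - b_v)
    for b in steps:
        logits.append(logits[-1] + scale * a * (theta - b))
    top = max(logits)  # subtract the maximum to stabilize the exponentials
    w = [math.exp(x - top) for x in logits]
    total = sum(w)
    return [x / total for x in w]


def gpcm_expected(theta: float, a: float, steps: Sequence[float]) -> float:
    """Expected item score; increases monotonically in theta when a > 0."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))


# Illustrative (hypothetical) four-category item.
for theta in (-3, -1, 0, 1, 3):
    print(theta, round(gpcm_expected(theta, 1.5, [-1.0, 0.0, 1.0]), 2))
```

Unlike the GGUM sketch, the expected score here never turns downward: the highest category always dominates at sufficiently high theta, which is what makes the GPCM a dominance model.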


More information

Moderators of the Tailored Adaptive Personality Assessment System Validity

Moderators of the Tailored Adaptive Personality Assessment System Validity Technical Report 1357 Moderators of the Tailored Adaptive Personality Assessment System Validity Stephen Stark, Oleksandr S. Chernyshenko, Christopher D. Nye, Fritz Drasgow Drasgow Consulting Group Leonard

More information

The validity of polytomous items in the Rasch model The role of statistical evidence of the threshold order

The validity of polytomous items in the Rasch model The role of statistical evidence of the threshold order Psychological Test and Assessment Modeling, Volume 57, 2015 (3), 377-395 The validity of polytomous items in the Rasch model The role of statistical evidence of the threshold order Thomas Salzberger 1

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century

Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century International Journal of Scientific Research in Education, SEPTEMBER 2018, Vol. 11(3B), 627-635. Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century

More information

PTHP 7101 Research 1 Chapter Assignments

PTHP 7101 Research 1 Chapter Assignments PTHP 7101 Research 1 Chapter Assignments INSTRUCTIONS: Go over the questions/pointers pertaining to the chapters and turn in a hard copy of your answers at the beginning of class (on the day that it is

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

A study of association between demographic factor income and emotional intelligence

A study of association between demographic factor income and emotional intelligence EUROPEAN ACADEMIC RESEARCH Vol. V, Issue 1/ April 2017 ISSN 2286-4822 www.euacademic.org Impact Factor: 3.4546 (UIF) DRJI Value: 5.9 (B+) A study of association between demographic factor income and emotional

More information

HARRISON ASSESSMENTS DEBRIEF GUIDE 1. OVERVIEW OF HARRISON ASSESSMENT

HARRISON ASSESSMENTS DEBRIEF GUIDE 1. OVERVIEW OF HARRISON ASSESSMENT HARRISON ASSESSMENTS HARRISON ASSESSMENTS DEBRIEF GUIDE 1. OVERVIEW OF HARRISON ASSESSMENT Have you put aside an hour and do you have a hard copy of your report? Get a quick take on their initial reactions

More information

Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model

Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model Delia North Temesgen Zewotir Michael Murray Abstract In South Africa, the Department of Education allocates

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Running Head: ADVERSE IMPACT. Significance Tests and Confidence Intervals for the Adverse Impact Ratio. Scott B. Morris

Running Head: ADVERSE IMPACT. Significance Tests and Confidence Intervals for the Adverse Impact Ratio. Scott B. Morris Running Head: ADVERSE IMPACT Significance Tests and Confidence Intervals for the Adverse Impact Ratio Scott B. Morris Illinois Institute of Technology Russell Lobsenz Federal Bureau of Investigation Adverse

More information

Hyperbolic Cosine Latent Trait Models

Hyperbolic Cosine Latent Trait Models Hyperbolic Cosine Latent Trait Models for Unfolding Direct Responses and Pairwise Preferences David Andrich Murdoch University The hyperbolic cosine unfolding model for direct responses of persons to individual

More information

Performance of Median and Least Squares Regression for Slightly Skewed Data

Performance of Median and Least Squares Regression for Slightly Skewed Data World Academy of Science, Engineering and Technology 9 Performance of Median and Least Squares Regression for Slightly Skewed Data Carolina Bancayrin - Baguio Abstract This paper presents the concept of

More information

(entry, )

(entry, ) http://www.eolss.net (entry, 6.27.3.4) Reprint of: THE CONSTRUCTION AND USE OF PSYCHOLOGICAL TESTS AND MEASURES Bruno D. Zumbo, Michaela N. Gelin, & Anita M. Hubley The University of British Columbia,

More information

MEASUREMENT, SCALING AND SAMPLING. Variables

MEASUREMENT, SCALING AND SAMPLING. Variables MEASUREMENT, SCALING AND SAMPLING Variables Variables can be explained in different ways: Variable simply denotes a characteristic, item, or the dimensions of the concept that increases or decreases over

More information

By Hui Bian Office for Faculty Excellence

By Hui Bian Office for Faculty Excellence By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 1001 Joyner Library, room 1006 Office hours: 8:00am-5:00pm, Monday-Friday 2 Educational tests and regular surveys

More information

Thriving in College: The Role of Spirituality. Laurie A. Schreiner, Ph.D. Azusa Pacific University

Thriving in College: The Role of Spirituality. Laurie A. Schreiner, Ph.D. Azusa Pacific University Thriving in College: The Role of Spirituality Laurie A. Schreiner, Ph.D. Azusa Pacific University WHAT DESCRIBES COLLEGE STUDENTS ON EACH END OF THIS CONTINUUM? What are they FEELING, DOING, and THINKING?

More information

Extraversion. The Extraversion factor reliability is 0.90 and the trait scale reliabilities range from 0.70 to 0.81.

Extraversion. The Extraversion factor reliability is 0.90 and the trait scale reliabilities range from 0.70 to 0.81. MSP RESEARCH NOTE B5PQ Reliability and Validity This research note describes the reliability and validity of the B5PQ. Evidence for the reliability and validity of is presented against some of the key

More information

Published by European Centre for Research Training and Development UK (

Published by European Centre for Research Training and Development UK ( DETERMINATION OF DIFFERENTIAL ITEM FUNCTIONING BY GENDER IN THE NATIONAL BUSINESS AND TECHNICAL EXAMINATIONS BOARD (NABTEB) 2015 MATHEMATICS MULTIPLE CHOICE EXAMINATION Kingsley Osamede, OMOROGIUWA (Ph.

More information

THE MANTEL-HAENSZEL METHOD FOR DETECTING DIFFERENTIAL ITEM FUNCTIONING IN DICHOTOMOUSLY SCORED ITEMS: A MULTILEVEL APPROACH

THE MANTEL-HAENSZEL METHOD FOR DETECTING DIFFERENTIAL ITEM FUNCTIONING IN DICHOTOMOUSLY SCORED ITEMS: A MULTILEVEL APPROACH THE MANTEL-HAENSZEL METHOD FOR DETECTING DIFFERENTIAL ITEM FUNCTIONING IN DICHOTOMOUSLY SCORED ITEMS: A MULTILEVEL APPROACH By JANN MARIE WISE MACINNES A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF

More information