Psychometric Issues in the Measurement of Emotional Intelligence


Kimberly A. Barchard, University of Nevada, Las Vegas
James A. Russell, Boston College

Reference: Barchard, Kimberly A., & Russell, James A. (in press). Psychometric Issues in the Measurement of Emotional Intelligence. In Glenn Geher (Ed.), The Measurement of Emotional Intelligence (tentative title). Hauppauge, NY: Nova Science Publishers.

Contact Information: Kim Barchard, Department of Psychology, University of Nevada, Las Vegas, 4505 S. Maryland Parkway, P.O. Box , Las Vegas, NV, , USA.

DO NOT QUOTE. SUBJECT TO COPY-EDITING.

Psychometric Issues in the Measurement of Emotional Intelligence

Everyone knows that a test of Emotional Intelligence (EI) has to be shown to be reliable and valid before it can be considered a good test. This chapter discusses, in a non-technical manner, reliability, validity, and other aspects of the psychometric evaluation of a test. Our theme is that the traditional approach to these issues has been oversimplified, and that much progress in the field of EI can be achieved through a better understanding of these issues and their application.

Test manuals and journal articles sometimes claim that a particular test has been shown to be reliable and valid. Such claims strike us as at best oversimplified. Whether a test is a good test depends on how it is used, the assumptions being made about the resulting scores, and a host of other factors. The answer therefore requires many different types of evidence. There is no simple checklist, such as reliability and validity, that can apply to all tests or all uses of one test.

The user of a test -- whether in research or in an applied setting -- necessarily makes many assumptions about that test. For example, the test user might have to assume that the test can be given at different times and in different locations, that scores on the test capture something about the test-takers that remains stable over time, that the test predicts how the test-takers will respond to an emotionally demanding situation, and that the test captures what they (rather than the test designer) mean by the term Emotional Intelligence. The process of evaluating a test in a particular context is part of the scientific enterprise in which we require empirical evidence for the assumptions being made. A variety of terms have been created to discuss test quality and the suitability of tests for particular purposes: internal consistency, construct validity, test bias, and so on. Not all of these types of evidence will be relevant to each testing situation, and sometimes the evidence that is needed will have no clearly defined psychometric label. In most cases, many different types of evidence will be needed. In addition, different test makers and test users often make somewhat different assumptions about EI even when discussing the same test. If so, the types of evidence needed will vary. Thus, there is no such thing as the reliability of a test and there is no such thing as the validity of a test. Whatever model or measure of EI is being used, many different types of evidence will be needed.

Reliability

Scores on a test are reliable to the extent that they are free from unwanted random variation (noise). More precisely, a test is reliable if scores are consistent across different measurement occasions. For each test, there is no one reliability coefficient. Rather, different types of reliability can be distinguished, and they need not coincide. The most frequently encountered forms include test-retest, inter-rater, and internal consistency reliability. Other types could also be defined, based upon other factors that could vary from one test-taker to another: for example, location, administrator, or method of administration (paper or computer-based). The general concepts discussed here can be straightforwardly generalized to these other factors.

Test-retest Reliability: Allowing Time to Vary Between Respondents

Test-retest reliability is almost always of interest to users of EI tests, for two reasons.

First, most researchers assume that a person's EI is stable over long periods of time; any test of EI must capture that stability. Second, in most applications, the test of EI will be administered to different test-takers at different times, and yet the test-user will ignore these differences when interpreting test scores.

To assess test-retest reliability, the test is administered twice and the correlation between test scores from the two administrations is calculated. Classical Test Score Theory shows that this correlation is equal to the proportion of total score variance that is due to factors that are consistent from one time to another (Allen & Yen, 1979). For example, a two-week test-retest reliability coefficient of .80 tells us that 80% of the variance remains constant over a two-week period. Depending upon the use to which the measure is being put, different lengths of time may be needed between the two testing sessions. Test-takers might take the test at widely different times, perhaps months or years apart. For example, if data for a research study are collected over a period of six months, if a measure is used to hire or promote people whenever openings arise even several years after the test is taken, or if the test is given whenever a patient arrives for therapy, then the equivalence of test scores across months or years is of interest.

Evidence of test-retest reliability of EI tests is unfortunately sparse (see Table 1), and what evidence there is often comes from too short a time period. The EQi is a self-report measure with 15 subscales designed to assess non-cognitive abilities associated with success in life; its test manual (BarOn, 1997) reports test-retest reliability coefficients for one-month and four-month follow-ups for 11 of the 15 subscales. Schutte et al. (1997) created a 33-item self-report questionnaire to assess the wide range of cognitive abilities and personality characteristics originally suggested in Salovey and Mayer's (1990) article. They reported two-week test-retest coefficients for total scores on their measure. The Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT; Mayer, Salovey, & Caruso, 2002) is a maximum-performance test with eight subscales designed to measure cognitive abilities related to emotions. Brackett and Mayer (2003) reported that total scores on the MSCEIT had a test-retest coefficient of .86 over a three-week period. In each case where test-retest reliability has been examined, it appears to be adequate; however, it has not been examined for all scales and subscales, and it has not been examined over a wide enough range of time periods.

Inter-rater Reliability: Allowing Raters to Vary Between Respondents

Inter-rater reliability is the consistency of scores given by different raters. Two different raters score the same protocols, and the scores from the two raters are correlated. Classical Test Score Theory shows that this correlation is equal to the proportion of total score variance that is due to factors that are consistent from one rater to another (Allen & Yen, 1979). Most measures of EI use objective-scoring methods, with clearly specified rules for scoring each item and for obtaining total scores from the item scores. In some cases, the scoring is even done by computer. In these situations, inter-rater reliability is likely to be near perfect. Some measures of EI, however, use subjective-scoring methods, and it is here that inter-rater reliability must be established. For example, the Levels of Emotional Awareness Scale (LEAS; Lane, Quinlan, Schwartz, Walker, & Zeitlin, 1990) consists of open-ended questions (the participants describe how they would feel in each of 20 different situations). The scoring manual specifies rules for calculating test scores based on the number and specificity of the emotion words used. Such scoring rules involve subjective judgment, and thus inter-rater reliability needs to be established.
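In practice, both test-retest and inter-rater reliability coefficients are ordinary Pearson correlations between two sets of scores. The sketch below, which uses invented rater scores purely for illustration, shows the calculation for two raters; the same computation applied to two administrations of a test would yield a test-retest coefficient.

```python
# A minimal sketch of estimating inter-rater reliability as a Pearson correlation.
# The scores are fabricated for illustration only.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical total scores assigned to the same ten protocols by two raters
rater_a = np.array([42, 55, 38, 61, 47, 53, 44, 58, 50, 36])
rater_b = np.array([40, 57, 35, 63, 45, 55, 46, 56, 52, 38])

r, p = pearsonr(rater_a, rater_b)
print(f"Inter-rater reliability (Pearson r) = {r:.2f}")
# Under Classical Test Score Theory, r estimates the proportion of total score
# variance that is consistent from one rater to another.
```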

The inter-rater reliability coefficient for experienced scorers has been estimated as .97 in one study (Lane, Kivley, DuBois, Shamasundara, & Schwartz, 1995) and as .99 in another (Ciarrochi, Caputi, & Mayer, 2003). Thus, with adequate training, it appears that inter-rater reliability for the LEAS is very good.

Internal Consistency: Allowing Items to Vary Between Respondents

A third type of reliability is internal consistency, which is usually estimated using Coefficient Alpha. Coefficient Alpha equals the correlation that would be obtained between the existing k-item test and a hypothetical alternative k-item test designed to measure the same construct. Internal consistency tells us the proportion of total score variance that is due to factors that are consistent from one set of items to another. Internal consistency is perhaps the easiest of the reliability coefficients to calculate, because it can be derived from a single administration of a test; the only requirement is that the test consists of more than one item. Perhaps because of this simplicity of computation, EI test manuals and articles usually include internal consistency estimates (see Table 1). When researchers or test users refer to test scores as measuring some underlying construct (such as EI), establishing the internal consistency of test scores reassures the test-user that total test scores do not vary greatly as a function of the particular test items that were used. Slightly different test items could have been used to measure the same construct, and very similar scores would have resulted. Therefore, internal consistency reliability should be calculated and reported for all measures of EI, and should typically be high.

When a test consists of multiple subscales, however, Coefficient Alpha is not an appropriate method of calculating the influence of item variation on total test scores. Coefficient Alpha is based on a one-factor repeated measures analysis of variance (ANOVA) design. This model is appropriate whenever all the items on a scale were designed to measure one construct. However, a test made up of multiple subscales better corresponds to a two-factor ANOVA design. The first factor is subscale (which might be considered fixed or random) and the second factor is items nested within each subscale (which would be considered random). This is a nested design. The random selection of items onto subscales will influence both subscale scores and total test scores. To assess this influence for subscale scores, Coefficient Alpha can be used. To assess this influence on total test scores, a nested ANOVA model needs to be used. The interested reader can refer to Shavelson and Webb (1991) and Gillmore (1983) for more information.
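For readers who want to see the arithmetic, the following is a minimal sketch of Coefficient Alpha for a single scale, using fabricated item responses. It applies only to a unidimensional scale; as noted above, the total score of a multi-subscale test calls for a nested (or stratified) approach instead.

```python
# A minimal sketch of Coefficient Alpha from one administration of a k-item scale.
# Rows are respondents, columns are items; responses are invented for illustration.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient Alpha for a respondents-by-items matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

responses = np.array([
    [4, 5, 4, 3],
    [2, 3, 3, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])
print(f"Coefficient Alpha = {cronbach_alpha(responses):.2f}")
```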

Generalizability Theory: Multiple Sources of Variance

Each of the three types of reliability mentioned so far considers only a single source of random variation at a time. This kind of reliability coefficient therefore tells us how well scores would generalize when that one factor varies -- and all other factors are held constant. In many measurement and testing situations, however, more than one factor varies. A study on EI might occur in which data are collected in different sessions over a period of months, in various classrooms, administered by different research assistants, and scored by different raters. In such cases, it is not enough to know one reliability coefficient: the test-user needs to know all of the relevant reliability coefficients -- inter-room reliability, inter-administrator reliability, and so on. More importantly, what the test-user really needs to know is the reliability of scores when all these factors vary simultaneously. This is not a question that can be answered with even a large set of traditional reliability coefficients. But it is precisely the problem for which Generalizability Theory (Cronbach, Gleser, & Rajaratnam, 1963) was designed. To our knowledge, Generalizability Theory has not been used in the domain of EI, but because of its importance both conceptually and methodologically, we would like to introduce readers to it. For the interested reader, Brennan (1992) provides a good source of information on how to calculate Generalizability Coefficients, and Crocker and Algina (1986) provide a solid foundation in the concepts involved.

Generalizability Theory allows the researcher to estimate the reliability of test scores when multiple sources of random error are all acting on the scores at once. For example, if data are collected at different times and locations and are scored by different raters, the basic question is what proportion of the variance reflects real differences between the respondents and what proportion is due to all the extraneous variations (time, location, and rater in this example). Generalizability Theory expands our understanding of reliability by underscoring that our ability to generalize test scores depends not on the test alone (reliability is not simply a property of the test) but also on how the test is used.

Generalizability Coefficients are calculated from variance components estimated from a repeated measures analysis of variance. A special study, called a Generalizability study or G-study, is conducted to estimate the variance components. In the ideal G-study, all plausible sources of random error are treated as factors and are varied in a fully-crossed repeated measures design. So, for example, each participant might complete multiple items in multiple locations at multiple times, and each response might be scored by multiple raters. The variance component associated with each of these factors is then calculated. Results from this G-study can then be used to estimate the generalizability of scores whenever the test is put to use in research or an applied setting (e.g., deciding whom to hire or what treatment to use for a particular client). The generalizability of those particular test scores can be estimated by a combination of the variance components obtained earlier in the G-study. The G-study can thus tell the test user if generalizability will be sufficiently high in a particular use or if additional steps need to be taken to increase it. For example, if the variance component due to location is negligible, then the test user can use whatever locations are convenient without worry of introducing too much random error.

When the G-study indicates that generalizability may be compromised, steps can be taken. There are two different ways to increase the generalizability of the test scores. First, the test-user can make each score dependent on more information (more items, more raters, more repetitions of the test, etc.), especially for those factors that introduce the most random error. In our example, location can safely be ignored, but perhaps time has a large influence on the scores; in this case, the test could be repeated and the final score could be the average across repetitions. Second, the test-user could hold important factors constant for all respondents. For example, if time is an important source of random error, then all respondents can complete the test at the same time. This second solution ensures that the scores of the particular respondents can be compared to one another without worry of undue influence of temporal variation. That worry remains, however, if the scores from these respondents are compared to those from another group who completed the test at another time (e.g., respondents in another study or application).
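To make the variance-component logic concrete, here is a minimal sketch of the simplest possible G-study: persons crossed with a single facet (raters), using fabricated scores. Real G-studies would include several facets (items, occasions, locations) and would ordinarily be run with dedicated software, so this is illustrative only.

```python
# One-facet G-study sketch: persons (rows) crossed with raters (columns).
# Scores are invented for illustration.
import numpy as np

scores = np.array([
    [42, 39, 45],
    [55, 51, 58],
    [37, 36, 42],
    [62, 58, 61],
    [46, 47, 50],
], dtype=float)

n_p, n_r = scores.shape
grand = scores.mean()
person_means = scores.mean(axis=1)
rater_means = scores.mean(axis=0)

# Mean squares from the persons-by-raters ANOVA (one score per cell)
ss_person = n_r * ((person_means - grand) ** 2).sum()
ss_rater = n_p * ((rater_means - grand) ** 2).sum()
ss_resid = ((scores - grand) ** 2).sum() - ss_person - ss_rater
ms_person = ss_person / (n_p - 1)
ms_rater = ss_rater / (n_r - 1)
ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

# Variance components (negative estimates are conventionally truncated at zero)
var_resid = ms_resid
var_person = max(0.0, (ms_person - ms_resid) / n_r)
var_rater = max(0.0, (ms_rater - ms_resid) / n_p)

# Generalizability coefficient for relative decisions when k raters are averaged
k = 2
g_coef = var_person / (var_person + var_resid / k)
print(f"person = {var_person:.2f}, rater = {var_rater:.2f}, residual = {var_resid:.2f}")
print(f"G coefficient with k = {k} raters: {g_coef:.2f}")
```

Increasing k in the last step illustrates the first strategy described above: basing each score on more raters (or items, or occasions) raises the generalizability of the resulting scores.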

Summary

The first step in determining if a test is any good is establishing that scores on the test are not unduly influenced by extraneous factors, such as the time and location of taking the test. Familiar and easy-to-use reliability coefficients (test-retest, inter-rater, and internal consistency) should be provided by test developers. But these are not different estimates of the same thing (the reliability of the test). Instead, all three and more are typically needed. Indeed, test users need to examine and sometimes supplement these figures. Reliability of a test depends upon the context of its use -- whether times, locations, and raters will be standardized across subjects or allowed to vary, for example. In the future, test developers would do well to consider taking into account multiple sources of random noise simultaneously. Generalizability Theory provides the conceptualization and tools for this next step.

Validity

A test is valid if it measures what it was supposed to measure or if it predicts what it was supposed to predict. Validity therefore necessarily involves specifying the construct underlying the test. Over the last decade, two different conceptions of EI have emerged (Petrides & Furnham, 2001), and establishing validity will depend on which of these conceptions is involved. The first approach defines EI as a cognitive ability related to emotions and is referred to as Ability EI. The second defines EI as a cluster of personality characteristics or a set of non-cognitive abilities related to life success, and is referred to as Trait EI. The MSCEIT (Mayer et al., 2002) Faces subscale is a good example of a measure of Ability EI: respondents are asked to rate the extent to which each of several emotions is present in given faces. Tett's Flexible Planning subscale (Tett, Wang, & Fox, in prep) is a good example of a measure of Trait EI: respondents complete a set of 12 self-report Likert-type items to indicate the extent to which they prefer to base major life decisions on emotions rather than logic. The MSCEIT (Mayer et al., 2002) and MEIS (Mayer, Caruso, & Salovey, 1999) were both designed as measures of Ability EI, whereas the EQi (BarOn, 1997), Schutte's measure (Schutte et al., 1997), and Tett's SEI (Tett et al., in prep) appear to measure Trait EI.

The evidence required to demonstrate the validity of a test of EI will depend upon which conception of EI is presupposed. Whatever conception of EI is being used, many different pieces of evidence will usually be required to demonstrate the validity of the inferences we wish to make. Such evidence can usually be classified as falling into one of the following three approaches to validity: content validity, criterion-related validity, and construct validity.

Content Validity

A test is considered to be content valid if its items are a representative sample from the content domain that the test is intended to measure. To assess content validity, the researcher must first clearly specify the content, boundaries, and structure of the content domain, creating subdomains and specifying their relative importance. Experts in the content domain then classify each item as relevant to the domain or not, and, if relevant, into a subdomain. In this way, the researcher verifies that each item belongs in the domain and that the proportion of items from each subdomain mirrors its relative importance. A content validity approach is especially convincing in an educational setting, for example, where scores on a final examination in a course are considered valid to the extent that the exam representatively samples the content of the course. If a math class covered 45 different math skills, then the exam should sample them in proportion to their importance. Some EI researchers have used a content validity justification for their items or subscales, by indicating that the items or subscales were created to measure every aspect of some specific model of EI. For example, both the Schutte et al. (1997) measure and Tett's Survey of Emotional Intelligence (SEI; Tett et al., in prep) were designed to include items or subscales covering all aspects of the Salovey and Mayer (1990) model. Similarly, the MSCEIT (Mayer et al., 2002) was designed to include scales for all aspects of the later Mayer and Salovey (1997) model.

This work could be taken a step further by asking EI experts to verify that the items indeed sample the relevant models' subdomains in the right proportions.

Criterion-related Validity

In applied settings, such as psychiatric evaluations and treatments, criminal hearings, and hiring and promotion decisions, the most important validity evidence is likely to be the correlation between the test and whatever criterion variable is taken to represent success or failure in that endeavor. More generally, when a test is used to make a particular decision, the test user is likely most concerned with what has been called criterion-related validity: the correlation of the test with the criterion of interest. Criterion variables are those behaviors or outcomes that are intrinsically interesting to test users: recidivism, psychiatric diagnoses, suicide, job success, or theft, for example. In rare circumstances, the criterion variable will be another psychological test. This typically occurs when a shortened form or screening test is being designed, and the test-user wants to know if the short form will produce scores that are similar to the full-length test.

Measures of EI are used in various settings for a wide variety of purposes. For example, BarOn (1997) states that the EQi can be used in corporate settings as a screening tool to select personnel (p. 9), in educational settings to identify students who are unable to cope with scholastic demands (p. 9), in clinical settings to establish therapeutic goals (p. 10) and evaluate the successfulness of the therapy (p. 10), and in medical settings to evaluate a person's ability to deal with the pressures of being seriously ill (p. 10). These suggestions all raise the question of criterion-related validity, which would need to be assessed separately for each of these purposes.

Two types of criterion-related validity are distinguished. In concurrent validity, the criterion and test occur simultaneously or close in time. For example, a short form of a test might be given simultaneously with a longer form to determine if the short form is an adequate substitute for the longer form. Or a test of EI might be given to a graduating class of clinicians to determine if EI is related to success in their studies (GPA). In predictive validity, the test precedes the criterion. For example, scores from an EI test given before admission to graduate school in clinical psychology could be correlated with success upon graduation. Here, the measurement preceded the criterion by several years.

In many situations, test users are interested in predictive validity, but researchers find it too expensive and time-consuming to collect data on the predictor first and then wait months or years before measuring the criterion. In these cases, concurrent validity studies are often conducted instead. For example, we could examine the relationship between EI and graduate school success for a sample of people who have already been in graduate school for several years. Unfortunately, a concurrent validity study may not provide the predictive validity information desired, for three reasons. First, the research participants may be quite different: for example, hopeful graduate school applicants versus stressed graduate students. Second, a person's true level of EI might change over time. Such change may be especially likely after training in psychology, as in our example. Third, the relationship between EI and the criterion might change over time, especially among participants who are at critical stages of emotional development. For these three reasons, concurrent validity coefficients are likely to be higher than (and in any case, different from) predictive validity coefficients. As is the case in other areas of psychological measurement, concurrent validity studies of EI measures are more common than predictive validity studies.

For example, BarOn (1997) cites four studies as providing support for the criterion-related validity of the EQi. Three of the studies used concurrent measurement; only one was predictive. Because many test users assume that EI scores predict future behavior, more predictive studies are needed for all EI tests.

Incremental Criterion-Related Validity

One additional complication in the estimation of criterion-related validity exists: in many testing situations, the criterion is already being predicted from one or more predictors. In these situations, test users want to know if the addition of an EI measure will improve their predictions, rather than being interested in the predictive validity of the EI measure by itself. In this case, incremental criterion-related validity is called for. It can be assessed using a two-step multiple regression analysis. In the first step, the original predictors are used to predict the criterion. In the second step, the new predictor (EI, in our example) is added to the regression equation, and the researcher determines if the change in the R-squared value (the proportion of variance accounted for) is statistically significant. Alternatively, the partial correlation between the new predictor and the criterion can be calculated when the old predictors have been partialled out.

In practical contexts, test users really should be more interested in incremental criterion-related validity than they have been. This is because they will likely use a measure of EI to supplement rather than replace their existing test battery. After all, a graduate school would likely include the EI measure along with the already-used Graduate Record Exams, Grade Point Averages, and letters of recommendation: they would not discard these other predictors; they would supplement them. Similarly, to predict managerial success, a company might supplement the information gained from applicants' resumes, interviews, and previous experiences with a measure of EI.

We know of only four studies that have examined the incremental criterion-related validity of EI. Brackett and Mayer (2003) found that each of three measures of EI was able to contribute to the prediction of one of their six criterion variables, after personality and verbal intelligence were held constant (significant partial correlations ranging from -.16 to -.20), although the three measures of EI each demonstrated incremental criterion-related validity for a different criterion. Petrides and Furnham (2003) showed that EI predicted sensitivity to mood induction procedures when the Big Five personality traits had already been taken into account. van der Zee, Thijs, and Schakel (2002) found that EI predicted both academic success and social success when traditional indicators of intelligence and personality had already been taken into account. In contrast, Barchard (2003) found that none of 31 measures of EI was able to improve the prediction of academic success when personality and cognitive abilities had already been taken into account. In summary, there is at least some evidence for the incremental criterion-related validity of measures of EI, but not all measures provided incremental criterion-related validity in all contexts. Future research needs to expand the focus on incremental criterion-related validity.
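As a concrete illustration of the two-step procedure, here is a minimal sketch using fabricated data. The predictor names (GPA, GRE, an EI total score) are hypothetical stand-ins for an existing battery and a new EI measure.

```python
# Two-step (hierarchical) regression for incremental criterion-related validity.
# All data are simulated; coefficients and sample size are arbitrary.
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)
n = 200
gpa = rng.normal(3.0, 0.4, n)    # existing predictor 1 (hypothetical)
gre = rng.normal(310, 8, n)      # existing predictor 2 (hypothetical)
ei = rng.normal(100, 15, n)      # new predictor: EI total score (hypothetical)
success = 0.5 * gpa + 0.02 * gre + 0.01 * ei + rng.normal(0, 0.5, n)

def r_squared(y, predictors):
    """R-squared from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_step1 = r_squared(success, [gpa, gre])        # Step 1: original battery
r2_step2 = r_squared(success, [gpa, gre, ei])    # Step 2: battery plus EI

# F test for the change in R-squared: 1 predictor added, 3 predictors in the full model
f_change = (r2_step2 - r2_step1) / ((1 - r2_step2) / (n - 3 - 1))
p_value = f_dist.sf(f_change, 1, n - 3 - 1)
print(f"R-squared change = {r2_step2 - r2_step1:.3f}, "
      f"F(1, {n - 4}) = {f_change:.2f}, p = {p_value:.4f}")
```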

Construct Validity

Construct validity is the broadest approach to validity; indeed, it subsumes the two previous types. A measure is considered to have high construct validity if it has theoretically predicted empirical relationships with other measures. To establish construct validity, therefore, the researcher first needs to embed the construct, here EI, in a theory. Doing so requires defining the construct being measured and explicitly stating the relationships between this construct and other constructs. The next step is for the researcher to measure each of the constructs specified and determine their empirical relationships. Because construct validation presupposes a theory, the process of construct validation is complex and iterative. In practice, theory building, test development, and empirical exploration go on simultaneously and evolve continuously. The construct validity approach is illustrated by Mayer et al.'s (1999) article, Emotional Intelligence Meets Traditional Standards for an Intelligence. They specified four empirical criteria that any construct must satisfy in order to be a type of intelligence, and then collected evidence that showed that their measure, the Multi-factor Emotional Intelligence Scale, satisfied these four criteria.

If an empirical relationship is not as predicted by the theory, then this could mean (a) the measure of EI is not valid, (b) the measure used for another construct in the study is not a valid measure of that construct, or (c) the theory about the relationship between these constructs is incorrect. As an example, Barchard and Hakstian (in press) argued that if a test is designed as a measure of a new cognitive ability, it should have moderate positive correlations with other cognitive abilities and small correlations with personality variables. They found this pattern for maximum-performance tests of EI: these tests had moderate correlations with standard tests of cognitive abilities and low correlations with personality dimensions. Self-report measures of EI, in contrast, did not show this pattern: they had non-significant correlations with cognitive abilities and moderate to large correlations with personality dimensions. Because there was long-standing empirical evidence of moderate positive relationships among measures of different types of intelligence (Cattell, 1971; Thurstone, 1947), and adequate evidence for the construct validity of the cognitive and personality measures they used, Barchard and Hakstian concluded that self-report measures of EI do not assess a cognitive ability. Readers who question the validity of the cognitive or personality measures they used, or their theory of the relationships between and among measures of intelligence and personality, however, need not reach the same conclusion.

Efforts to validate a test may be complicated by the fact that the same label is sometimes used for different conceptions. Recall the distinction between Ability EI and Trait EI. The types of construct validity evidence that are relevant to one of these may be irrelevant or even opposite to the evidence needed for the other. For example, when EI is defined as a cognitive ability (Mayer & Salovey, 1997; Mayer, Salovey, & Caruso, 2000, 2002), Barchard and Hakstian's study suggests that performance measures but not self-report measures are valid. In contrast, when EI is defined as a non-cognitive ability (Bar-On, 1997), Barchard and Hakstian's study suggests that the self-report measures but not the performance measures are valid. In other words, just the reverse pattern of correlations might be needed for construct validation. The dispute here is not resolvable through empirical study, but through agreement on conceptualization and labeling. In short, it is meaningless to compare the validity of measures of EI that are based on different conceptions. The two most widely used measures of EI -- the MSCEIT (Mayer et al., 2002) and the EQ-i (BarOn, 1997) -- are based on different conceptualizations. They could both be valid and yet unrelated to one another.

Summary

Most types of validity evidence can be classified as falling into one of three approaches: content, criterion-related, and construct validity. The types of validity evidence that are required will depend upon the way EI has been conceptualized by the test designer or test user and how the measure will be used.

In each measurement situation, some types of validity evidence will be more compelling and convincing than others, but many types are usually needed. We know of no case in which a single piece of evidence could be considered definitive by itself in validating a test.

Validity Viewed from the Perspective of Generalizability Theory

Earlier, we introduced Generalizability Theory as a way of thinking about reliability. Here we argue that Generalizability Theory also provides a way of thinking about validity and, even more importantly, about the relation of reliability to validity. The reader has likely noted some blurring between reliability and validity; for example, the correlation of two tests of EI could be called convergent validity by one researcher, but alternate-form reliability by another. The reader has likely also heard that the validity of a measure cannot exceed its reliability. Generalizability Theory allows us to understand such notions in a particularly clear way.

Let us imagine that we have two sets of scores and are interested in the relation between them. We can think of this as the problem of generalizing from one set of scores to the other. These two sets of scores differ from each other in some ways and are the same as each other in other ways. The scores might come from the same or from different items, or raters, or times, and might be designed to measure the same or different constructs. For example, in test-retest reliability, the two sets of scores differ in the time the test is administered but are the same in that both derive from the same test. In predictive validity, the two sets of scores differ in both time and the constructs measured. The general principle is this: the more ways in which the two sets of scores differ, the lower is our ability to generalize from the first set to the second -- the lower is their correlation. Conversely, the more we make the two sets of scores similar, the greater is our ability to generalize from the first to the second -- the greater is the correlation.

Consider an example. Imagine that the first set of scores, Set 1, consists of responses to the MSCEIT, and that the second set of scores, Set 2, consists of an index of job performance (such as number of widgets sold) measured one year later. These two sets of scores differ in two principal ways. First, they differ in time: Set 2 is one year later than Set 1. Second, they differ in the constructs measured: EI versus job performance. The correlation between Set 1 and Set 2 is called a predictive validity coefficient. Now let us change Set 2 to make it more similar to Set 1. For example, we can measure Set 2 at the same time as Set 1. By reducing the number of ways that Set 1 and Set 2 differ, we increase the correlation between the two sets: we increase our ability to generalize from one set of scores to the other. This correlation would be called a concurrent validity coefficient, and (in the population) would be higher than the predictive validity coefficient. Alternatively, we can change Set 2 in a different way. We can continue to measure Set 2 one year after Set 1, but change what we are measuring for Set 2: we can use the MSCEIT. Now, the two sets of scores differ only in terms of time. We call this correlation a one-year test-retest reliability coefficient. Because Set 2 and Set 1 differ in only one way (time), the correlation will again be higher than it was in the first situation: test-retest reliability will be higher than predictive validity. Finally, let us consider one other way in which Set 2 could be changed. We could use an alternate form of the MSCEIT and measure it at the same time as Set 1. The correlation between Set 1 and Set 2 would be called alternate-form reliability, and would, once again, be higher than the predictive validity coefficient in the first situation.

To summarize, when we calculate the correlation between two sets of scores, whether we call the correlation a reliability coefficient or a validity coefficient depends upon the ways in which the two sets of scores are the same and the ways they differ. For most reliability coefficients, the two sets are similar in all ways but one. For validity, the two differ in more ways.

Generalizability Theory thus subsumes both reliability and validity, with no clear border between the two. When it is said that the validity of a test cannot exceed its reliability, what is meant, therefore, is that we can generalize from one set of scores to another better if we standardize additional aspects of the measurement process. When we recognize that Generalizability Theory covers both reliability and validity, we see that there are two principal ways to increase a validity coefficient. The first method is to increase the amount of information that goes into each set of scores. For example, measurement of school performance is improved by replacing a grade in a single course with GPA in the first year, and is improved even more if we use GPA over a four-year period. The second method is to increase the similarity between the testing situations (time, location, administrator, etc.) for the predictor and criterion.

Scoring Keys and Validity

Suppose you want a test of the ability to read emotional facial expressions. A set of facial expressions is presented, and the test-taker is asked to specify the emotional meaning of each. The question arises of how to determine which answers are correct -- how to create the scoring key. There are three methods of developing scoring keys: expert opinion, criterion-keying, and consensus scoring.

The most common method, in both the area of EI and other areas of psychology, is expert opinion: the test designers specify in advance how items will be scored. This method was used by BarOn (1997), Schutte et al. (1997), and Lane et al. (1990), for example. In our example of a facial expression test, a set of experts would be asked to specify the emotion in each photo. Of course, this method presupposes that the experts know the correct answers. The existence of scientific progress shows that expert opinion can change.

The second method of developing a scoring key is criterion-keying. Items are selected based upon their ability to distinguish between criterion groups. This method was used to select items for the Minnesota Multiphasic Personality Inventory (MMPI): patients diagnosed with a particular psychological disorder were compared to a normal control group, and items that distinguished the former from the latter were scored for that diagnosis. Criterion-keying has not been used to create the scoring key for any of the established tests of EI, perhaps because there are no obvious criterion groups to use. The criterion group would need to be high on EI, but equivalent in as many other ways as possible to the normal control group.

A third method of creating a scoring key has recently been added to the psychometrician's toolbox: consensus scoring. This method uses the norm group to determine how to score each item on the test. In our example of facial expressions, the most common response to a face could be taken to be correct. This method was used to develop scoring keys for the MEIS (Mayer et al., 1999) and its successor, the MSCEIT (Mayer et al., 2002). On those tests, rather than scoring each response dichotomously as right or wrong, the score for a particular response is equal to the proportion of the norm group who gave that response for the item. This method has the advantage of providing a graded measure of similarity to the consensus. Consensus scoring is most convincing when the test itself purports to capture knowledge of majority views (which in turn could vary with cultural or social group). Thus, strictly speaking, we might want to ask the test-takers to specify how most people would label the facial expression in the photo. In practice, expert scoring and consensus scoring have yielded similar results. This finding could mean that the majority of people are experts in emotion, or, alternatively, that experts so far have not progressed past the majority opinion.
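As a concrete illustration, here is a minimal sketch of consensus scoring for a single facial-expression item. The norm-group responses are invented; a real norm group would contain thousands of respondents.

```python
# Consensus (proportion) scoring for one item, in the spirit of the MEIS/MSCEIT
# approach described above. Data are fabricated for illustration.
from collections import Counter

# Responses of a hypothetical norm group to one facial-expression item
norm_group_responses = ["happy"] * 62 + ["surprised"] * 25 + ["afraid"] * 13

counts = Counter(norm_group_responses)
n = len(norm_group_responses)
scoring_key = {response: count / n for response, count in counts.items()}

# A test-taker's score on the item is the proportion of the norm group who gave
# the same response, rather than a dichotomous right/wrong score.
print(scoring_key["happy"])            # 0.62
print(scoring_key["afraid"])           # 0.13
print(scoring_key.get("angry", 0.0))   # a response the norm group never gave scores 0
```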

Whatever method is used to develop the scoring key, the usefulness of the resulting scoring key can be determined empirically by examination of the content, construct, and criterion-related validity of the resulting items and scores.

Test Bias

Yet another consideration when asking if a test is any good is whether the test scores are biased. Indeed, when the bias stems from race, creed, color, sex, or national origin, it violates Title VII of the Civil Rights Act of 1964 (Pub. L. No , 42 U.S.C. 2000e et seq.) and needs to be examined and corrected. On many psychological measures, differences can be found between the average scores of men and women or between people from different ethnic groups. Even when these differences are statistically significant, such differences in and of themselves do not constitute evidence of test bias. If differences in test scores reflect true differences on the underlying construct or predict true differences on the criterion of interest, then the test is unbiased.

Bias can exist in two different situations. Bias in measurement occurs when the relationship between test scores and the underlying construct is different for people who belong to different groups: observed differences in test scores do not reflect differences on the underlying construct. Bias in prediction occurs when the relationship between test scores and some criterion measure is different for people who belong to different groups: group differences in test scores do not reflect differences on the criterion variable. Bias in measurement and bias in prediction can both be examined in detail using the Cleary model of test bias (Cleary, 1968; Gulliksen & Wilks, 1950; Humphreys, 1952). A multiple-regression analysis is conducted with the test of interest as the predictor (or independent) variable. The criterion (or dependent) variable is either a well-established measure of the construct (bias in measurement) or the criterion variable of interest (bias in prediction). In each of these cases, if the relationship between the predictor and criterion is the same for the groups being compared, then the predictor is unbiased with respect to that group difference.

There are two ways that regression equations can differ from each other, and so there are two types of bias. If the slopes of the regression equations are different for different groups, this is referred to as slope bias. Because the slope of the regression line is also equal to the validity coefficient, slope bias is also referred to as differential validity. This type of bias occurs when the validity coefficient for one group is significantly higher than the validity coefficient for another group. An example of slope bias is given in Figure 1. Here, the test is more valid for women than for men. If a single regression line (the one in the middle) were used for both men and women, then criterion scores would be under-predicted for men who scored low on the predictor and would be over-predicted for men who scored high. The opposite would occur for women: criterion scores would be over-predicted for women who scored low and would be under-predicted for women who scored high.

If the intercepts of the regression equations are different, this is referred to as intercept bias. An example of this type of bias is given in Figure 2. Here, the intercept for women is higher than the intercept for men. Men and women with identical predictor scores obtain different scores, on average, on the criterion measure.

If a single regression line were used for both groups (or a single cut-off score were used for both groups), this line (or cut-off score) would over-predict scores for one group and under-predict scores for the other group. In the example shown in Figure 2, the criterion scores for women would be under-predicted while the criterion scores for men would be over-predicted. This will also reduce the overall validity coefficient.

Testing for slope and intercept bias is quite easy. Two regression equations are used. The first regression equation contains only one predictor, the measure of interest. The second regression equation contains two more predictors: the possible bias variable and an interaction term that is calculated as the product of the measure of interest and the possible bias variable. If the beta-weight for the possible bias variable is statistically significant, this means that the intercepts are different for the different groups. If the interaction term is significant, this means that the slopes are different and that the possible bias variable does indeed moderate the relationship between the predictor and criterion. This method of testing for slope and intercept bias can be used with both categorical variables, like sex and ethnicity (for which it was designed), and continuous variables, such as irrelevant cognitive abilities, personality traits, and response styles. If a continuous variable is used, rather than saying that there are group differences, we would say that the slope and intercept depend upon this third variable.

At one point, some researchers thought that they had found evidence of slope bias in some specific settings (Katzell & Dyer, 1977; Schmitt, Mellon, & Bylenga, 1978), but most testing specialists now agree that this type of test bias does not occur often and that many apparent instances of slope bias were due to inappropriate statistical analyses (Anastasi & Urbina, 1997; Murphy & Davidshofer, 2001). In contrast, there has been relatively consistent evidence of intercept bias with respect to both ethnicity (e.g., Arroyo, 1996; Cleary, Humphreys, Kendrick, & Wesman, 1975; Hartigan & Wigdor, 1989; Gael, Grant, & Richie, 1975; Kranzler, Miller, & Jordon, 1999; Linn, 1982) and sex (Share & Silva, 2003; Kranzler et al., 1999; Zeidner, 1987). Linn and Werts (1971) and Reilly (1973) demonstrated mathematically that intercept bias will occur whenever one or more relevant predictors have been omitted from the regression equations. For example, if both emotion management and criminology background are relevant to predicting success as a police negotiator, and two groups differ on both these predictors, then intercept bias will occur if either predictor is considered on its own. Prediction intercept bias may be more common than measurement intercept bias, because most real-world criteria are associated with multiple predictors, but most tests are designed to measure only one construct. Thus, a test that has no measurement bias can still show prediction bias if important predictor variables have been left out. On the other hand, bias in prediction could itself be caused by bias in measurement. Barchard and Biesecker (2003) examined 16 measures of EI (12 self-report and 4 maximum performance) for possible sex bias in the prediction of 15 measures of Relationship Success. For some measures of Relationship Success, most measures of EI evidenced intercept bias: compared to men with equal scores on measures of EI, women obtained higher scores on the Relationship Success measures. This prediction bias could be caused by measurement bias on the measures of EI or Relationship Success or both, or it could be caused by the omission of other important predictors of Relationship Success. Therefore, when predicting real-world behaviors, all of the key predictors should be used, and bias in prediction should be assessed for the linear combination of all the predictors, not for each individual predictor considered separately.
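To make the two-equation procedure concrete, here is a minimal sketch using fabricated data and the statsmodels library. The group coding and effect sizes are invented; in the simulated data only the intercepts differ, and the predictor is centered before the interaction term is formed, a common practice that reduces collinearity between the group term and the interaction.

```python
# Cleary-model check for slope and intercept bias via moderated regression.
# All data are simulated; `group` is a 0/1 code (e.g., men/women).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
ei_score = rng.normal(100, 15, n)                    # predictor: EI test score
group = rng.integers(0, 2, n)                        # possible bias variable
criterion = 0.04 * ei_score + 0.5 * group + rng.normal(0, 1, n)  # built-in intercept bias

ei_centered = ei_score - ei_score.mean()             # center before forming the product

# Equation 1: the measure of interest only
eq1 = sm.OLS(criterion, sm.add_constant(ei_centered)).fit()

# Equation 2: add the possible bias variable and the interaction term
X2 = sm.add_constant(np.column_stack([ei_centered, group, ei_centered * group]))
eq2 = sm.OLS(criterion, X2).fit()

# eq2.params[2] / eq2.pvalues[2]: group term (significant -> intercept bias)
# eq2.params[3] / eq2.pvalues[3]: interaction term (significant -> slope bias)
print("coefficients:", np.round(eq2.params, 3))
print("p-values:    ", np.round(eq2.pvalues, 4))
```

A significant group coefficient together with a non-significant interaction would correspond to intercept bias without slope bias, the pattern the chapter describes as the more commonly observed one.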

Physical abilities, cognitive abilities, personality traits, and response styles could all cause measurement bias in tests of EI. One obvious example of a physical ability that could cause test bias in the area of EI is visual acuity. For example, some people may have difficulty seeing the facial expressions on the Missing Cartoons test (O'Sullivan & Guilford, 1976), and scores on the test may be influenced by their lack of visual acuity. If so, this test may suffer from both slope bias (the test may be less valid for people with poor vision) and intercept bias (test scores may be artificially deflated for people with poor vision).

An obvious example of a cognitive ability that could cause test bias in the area of EI is familiarity with the language the test is written in. Most tests of EI involve reading and writing. If the test is administered in English, then poor ability to read and write in English may result in artificially low scores for people who do not speak and read English as their first language. Barchard (2001a) found that people who were less familiar with English obtained lower scores on measures of Verbal Ability and written measures of EI, but received average scores on nonverbal measures of both Intelligence and EI.

Personality characteristics could also cause measurement bias. For example, Barchard (2001b) found that people who are more emotionally expressive obtained higher scores on the LEAS (Lane et al., 1990) and that sex differences on the LEAS were associated with, and could be accounted for by, sex differences in Emotional Expressivity. Barchard is currently collecting data to investigate whether these results were caused by intercept bias on the LEAS and whether changes to the instructions can reduce or eliminate the sex differences.

Finally, a socially desirable response style could cause measurement bias. On self-report questionnaires, it is sometimes clear which response is most desirable for at least some items. Respondents who are influenced by the social desirability of items will then receive overly high scores. Fortunately, measurement bias caused by socially desirable responding has received some attention from EI researchers. One test, the EQi (BarOn, 1997), has a Positive Impression scale designed to assess socially desirable response style. Another test, the Tett SEI (Tett et al., in prep), was designed to minimize the influence of socially desirable response bias on test scores, but new data are needed to assess the effectiveness of this reduction. Only one study has specifically examined measures of EI for bias due to socially desirable responding: Sjorberg (2001) conducted a factor analysis of a test battery for selection in business and business education, and found that a second-order factor related to EI was less subject to self-presentation bias than the Big Five scales.

Test bias occurs whenever test scores are systematically influenced by factors that are irrelevant to the construct of interest, or when relevant variables have not been included in the prediction equation. Many tests may be influenced by Verbal Ability or socially desirable responding, and so many tests may suffer from some degree of measurement bias. As well, batteries often exclude important predictors or use sub-optimal linear combinations of predictors, and therefore prediction bias may also be common. Test bias may therefore be very common. If test bias is associated with race, creed, color, sex, or national origin, we need to take steps to reduce this bias.

Conclusion

When you are handed a test of EI (or any other psychological construct), you should ask, "Is this test any good?" Unfortunately, this simple question has no simple answer. Traditionally, the answer has been summarized as reliability, validity, and freedom from bias. However, this answer is too simple: reliability, validity, and freedom from bias are not simply checklists of required information. Instead, many different types of evidence are needed. The evidence required depends on the way EI has been defined in this particular test, the purpose of using the test, and the context of its use. There is no such thing as the reliability of a test. Reliability of a test depends upon the context of its use.


More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

2 Types of psychological tests and their validity, precision and standards

2 Types of psychological tests and their validity, precision and standards 2 Types of psychological tests and their validity, precision and standards Tests are usually classified in objective or projective, according to Pasquali (2008). In case of projective tests, a person is

More information

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education.

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education. The Reliability of PLATO Running Head: THE RELIABILTY OF PLATO Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO M. Ken Cor Stanford University School of Education April,

More information

Validity. Ch. 5: Validity. Griggs v. Duke Power - 2. Griggs v. Duke Power (1971)

Validity. Ch. 5: Validity. Griggs v. Duke Power - 2. Griggs v. Duke Power (1971) Ch. 5: Validity Validity History Griggs v. Duke Power Ricci vs. DeStefano Defining Validity Aspects of Validity Face Validity Content Validity Criterion Validity Construct Validity Reliability vs. Validity

More information

Multiple Act criterion:

Multiple Act criterion: Common Features of Trait Theories Generality and Stability of Traits: Trait theorists all use consistencies in an individual s behavior and explain why persons respond in different ways to the same stimulus

More information

Competitive Edge, Inc. presents TRAIN-THE-TRAINER Conducted by Judy Suiter March 5-6, 2019 Hampton Inn, Peachtree City, Georgia

Competitive Edge, Inc. presents TRAIN-THE-TRAINER Conducted by Judy Suiter March 5-6, 2019 Hampton Inn, Peachtree City, Georgia Competitive Edge, Inc. presents TRAIN-THE-TRAINER Conducted by Judy Suiter March 5-6, 2019 Hampton Inn, Peachtree City, Georgia Where do I go to get EQ-i 2.0 and EQ360 Certified? We want to make you the

More information

Endogeneity is a fancy word for a simple problem. So fancy, in fact, that the Microsoft Word spell-checker does not recognize it.

Endogeneity is a fancy word for a simple problem. So fancy, in fact, that the Microsoft Word spell-checker does not recognize it. Jesper B Sørensen August 2012 Endogeneity is a fancy word for a simple problem. So fancy, in fact, that the Microsoft Word spell-checker does not recognize it. Technically, in a statistical model you have

More information

Associate Prof. Dr Anne Yee. Dr Mahmoud Danaee

Associate Prof. Dr Anne Yee. Dr Mahmoud Danaee Associate Prof. Dr Anne Yee Dr Mahmoud Danaee 1 2 What does this resemble? Rorschach test At the end of the test, the tester says you need therapy or you can't work for this company 3 Psychological Testing

More information

ABSTRACT. Field of Research: Academic achievement, Emotional intelligence, Gifted students.

ABSTRACT. Field of Research: Academic achievement, Emotional intelligence, Gifted students. 217- Proceeding of the Global Summit on Education (GSE2013) EMOTIONAL INTELLIGENCE AS PREDICTOR OF ACADEMIC ACHIEVEMENT AMONG GIFTED STUDENTS Ghasem Mohammadyari Department of educational science, Payame

More information

COMPUTING READER AGREEMENT FOR THE GRE

COMPUTING READER AGREEMENT FOR THE GRE RM-00-8 R E S E A R C H M E M O R A N D U M COMPUTING READER AGREEMENT FOR THE GRE WRITING ASSESSMENT Donald E. Powers Princeton, New Jersey 08541 October 2000 Computing Reader Agreement for the GRE Writing

More information

Psychology, 2010, 1: doi: /psych Published Online August 2010 (

Psychology, 2010, 1: doi: /psych Published Online August 2010 ( Psychology, 2010, 1: 194-198 doi:10.4236/psych.2010.13026 Published Online August 2010 (http://www.scirp.org/journal/psych) Using Generalizability Theory to Evaluate the Applicability of a Serial Bayes

More information

The happy personality: Mediational role of trait emotional intelligence

The happy personality: Mediational role of trait emotional intelligence Personality and Individual Differences 42 (2007) 1633 1639 www.elsevier.com/locate/paid Short Communication The happy personality: Mediational role of trait emotional intelligence Tomas Chamorro-Premuzic

More information

Doctoral Dissertation Boot Camp Quantitative Methods Kamiar Kouzekanani, PhD January 27, The Scientific Method of Problem Solving

Doctoral Dissertation Boot Camp Quantitative Methods Kamiar Kouzekanani, PhD January 27, The Scientific Method of Problem Solving Doctoral Dissertation Boot Camp Quantitative Methods Kamiar Kouzekanani, PhD January 27, 2018 The Scientific Method of Problem Solving The conceptual phase Reviewing the literature, stating the problem,

More information

EMOTIONAL INTELLIGENCE A GATEWAY TO SUCCESS FOR MANAGEMENT STUDENTS

EMOTIONAL INTELLIGENCE A GATEWAY TO SUCCESS FOR MANAGEMENT STUDENTS EMOTIONAL INTELLIGENCE A GATEWAY TO SUCCESS FOR MANAGEMENT STUDENTS Dr.G.Kalaiamuthan Assistant Professor in Commerce, Faculty of Management, Dept. of Management Studies, SCSVMV University, Enathur, Kanchipuram

More information

Measuring and Assessing Study Quality

Measuring and Assessing Study Quality Measuring and Assessing Study Quality Jeff Valentine, PhD Co-Chair, Campbell Collaboration Training Group & Associate Professor, College of Education and Human Development, University of Louisville Why

More information

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement

More information

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

3 CONCEPTUAL FOUNDATIONS OF STATISTICS 3 CONCEPTUAL FOUNDATIONS OF STATISTICS In this chapter, we examine the conceptual foundations of statistics. The goal is to give you an appreciation and conceptual understanding of some basic statistical

More information

QUESTIONING THE MENTAL HEALTH EXPERT S CUSTODY REPORT

QUESTIONING THE MENTAL HEALTH EXPERT S CUSTODY REPORT QUESTIONING THE MENTAL HEALTH EXPERT S CUSTODY REPORT by IRA DANIEL TURKAT, PH.D. Venice, Florida from AMERICAN JOURNAL OF FAMILY LAW, Vol 7, 175-179 (1993) There are few activities in which a mental health

More information

Overview of the Logic and Language of Psychology Research

Overview of the Logic and Language of Psychology Research CHAPTER W1 Overview of the Logic and Language of Psychology Research Chapter Outline The Traditionally Ideal Research Approach Equivalence of Participants in Experimental and Control Groups Equivalence

More information

Saville Consulting Wave Professional Styles Handbook

Saville Consulting Wave Professional Styles Handbook Saville Consulting Wave Professional Styles Handbook PART 4: TECHNICAL Chapter 19: Reliability This manual has been generated electronically. Saville Consulting do not guarantee that it has not been changed

More information

APS Interest Group for Coaching Psychologists (QLD)

APS Interest Group for Coaching Psychologists (QLD) APS Interest Group for Coaching Psychologists (QLD) Enhancing Emotional Intelligence via Coaching: Evidence from the field Presented by Dr Benjamin Palmer Director of Research & Development, Genos Proudly

More information

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604 Measurement and Descriptive Statistics Katie Rommel-Esham Education 604 Frequency Distributions Frequency table # grad courses taken f 3 or fewer 5 4-6 3 7-9 2 10 or more 4 Pictorial Representations Frequency

More information

How Do We Gather Evidence of Validity Based on a Test s Relationships With External Criteria?

How Do We Gather Evidence of Validity Based on a Test s Relationships With External Criteria? CHAPTER 8 How Do We Gather Evidence of Validity Based on a Test s Relationships With External Criteria? CHAPTER 8: HOW DO WE GATHER EVIDENCE OF VALIDITY BASED ON A TEST S RELATIONSHIPS WITH EXTERNAL CRITERIA?

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data TECHNICAL REPORT Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data CONTENTS Executive Summary...1 Introduction...2 Overview of Data Analysis Concepts...2

More information

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA Data Analysis: Describing Data CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA In the analysis process, the researcher tries to evaluate the data collected both from written documents and from other sources such

More information

Everything DiSC Manual

Everything DiSC Manual Everything DiSC Manual PRODUCTIVE CONFLICT ADDENDUM The most recently published version of the Everything DiSC Manual includes a new section, found in Chapter 6, The Everything DiSC Applications, for Everything

More information

Reliability and Validity checks S-005

Reliability and Validity checks S-005 Reliability and Validity checks S-005 Checking on reliability of the data we collect Compare over time (test-retest) Item analysis Internal consistency Inter-rater agreement Compare over time Test-Retest

More information

The complete Insight Technical Manual includes a comprehensive section on validity. INSIGHT Inventory 99.72% % % Mean

The complete Insight Technical Manual includes a comprehensive section on validity. INSIGHT Inventory 99.72% % % Mean Technical Manual INSIGHT Inventory 99.72% Percentage of cases 95.44 % 68.26 % -3 SD -2 SD -1 SD +1 SD +2 SD +3 SD Mean Percentage Distribution of Cases in a Normal Curve IV. TEST DEVELOPMENT Historical

More information

Three Subfactors of the Empathic Personality Kimberly A. Barchard, University of Nevada, Las Vegas

Three Subfactors of the Empathic Personality Kimberly A. Barchard, University of Nevada, Las Vegas 1 Three Subfactors of the Empathic Personality Kimberly A. Barchard, University of Nevada, Las Vegas Reference: Barchard, K.A. (2002, May). Three subfactors of the empathic personality. Poster presented

More information

CHAPTER 2. RESEARCH METHODS AND PERSONALITY ASSESSMENT (64 items)

CHAPTER 2. RESEARCH METHODS AND PERSONALITY ASSESSMENT (64 items) CHAPTER 2. RESEARCH METHODS AND PERSONALITY ASSESSMENT (64 items) 1. Darwin s point of view about empirical research can be accurately summarized as... a. Any observation is better than no observation

More information

Assessment Information Brief: REVELIAN EMOTIONAL INTELLIGENCE ASSESSMENT (MSCEIT)

Assessment Information Brief: REVELIAN EMOTIONAL INTELLIGENCE ASSESSMENT (MSCEIT) Assessment Information Brief: REVELIAN EMOTIONAL INTELLIGENCE ASSESSMENT (MSCEIT) Prepared by: Revelian Psychology Team E: psych@revelian.com P: (AU) or +61 7 3552 www.revelian.com 1 www.revelian.com 2

More information

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus

More information

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA PharmaSUG 2014 - Paper SP08 Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA ABSTRACT Randomized clinical trials serve as the

More information

Test Validity. What is validity? Types of validity IOP 301-T. Content validity. Content-description Criterion-description Construct-identification

Test Validity. What is validity? Types of validity IOP 301-T. Content validity. Content-description Criterion-description Construct-identification What is? IOP 301-T Test Validity It is the accuracy of the measure in reflecting the concept it is supposed to measure. In simple English, the of a test concerns what the test measures and how well it

More information

Assignment 4: True or Quasi-Experiment

Assignment 4: True or Quasi-Experiment Assignment 4: True or Quasi-Experiment Objectives: After completing this assignment, you will be able to Evaluate when you must use an experiment to answer a research question Develop statistical hypotheses

More information

Examining the Psychometric Properties of The McQuaig Occupational Test

Examining the Psychometric Properties of The McQuaig Occupational Test Examining the Psychometric Properties of The McQuaig Occupational Test Prepared for: The McQuaig Institute of Executive Development Ltd., Toronto, Canada Prepared by: Henryk Krajewski, Ph.D., Senior Consultant,

More information

Linking Assessments: Concept and History

Linking Assessments: Concept and History Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.

More information

Carrying out an Empirical Project

Carrying out an Empirical Project Carrying out an Empirical Project Empirical Analysis & Style Hint Special program: Pre-training 1 Carrying out an Empirical Project 1. Posing a Question 2. Literature Review 3. Data Collection 4. Econometric

More information

THE RELATIONSHIP BETWEEN EMOTIONAL INTELLIGENCE AND STRESS MANAGEMENT

THE RELATIONSHIP BETWEEN EMOTIONAL INTELLIGENCE AND STRESS MANAGEMENT THE RELATIONSHIP BETWEEN EMOTIONAL INTELLIGENCE AND STRESS MANAGEMENT Ms S Ramesar Prof P Koortzen Dr R M Oosthuizen Department of Industrial and Organisational Psychology University of South Africa th

More information

International Research Journal of Engineering and Technology (IRJET) e-issn: Volume: 03 Issue: 06 June p-issn:

International Research Journal of Engineering and Technology (IRJET) e-issn: Volume: 03 Issue: 06 June p-issn: INSPIRING LEADERSHIP THROUGH EMOTIONAL INTELLIGENCE Syed Mansoor Pasha Asst.professor Anurag Group of Institutions E-mail: Syd.mansoor@gmail.com Abstract: In today s rapidly changing environment effective

More information

SEMINAR ON SERVICE MARKETING

SEMINAR ON SERVICE MARKETING SEMINAR ON SERVICE MARKETING Tracy Mary - Nancy LOGO John O. Summers Indiana University Guidelines for Conducting Research and Publishing in Marketing: From Conceptualization through the Review Process

More information

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize

More information

Psychological testing

Psychological testing What is a psychological test Psychological testing An evaluative device or procedure in which a sample of an examinee s behavior in a specified domain is obtained and subsequently evaluated and scored

More information

Emotional Intelligence and Leadership

Emotional Intelligence and Leadership The Mayer Salovey Caruso Notes Emotional Intelligence Test (MSCEIT) 2 The Mayer Salovey Caruso Emotional Intelligence Test (MSCEIT) 2 The MSCEIT 2 measures four related abilities. 3 Perceiving Facilitating

More information

Validity and reliability of measurements

Validity and reliability of measurements Validity and reliability of measurements 2 3 Request: Intention to treat Intention to treat and per protocol dealing with cross-overs (ref Hulley 2013) For example: Patients who did not take/get the medication

More information

Chapter 3. Psychometric Properties

Chapter 3. Psychometric Properties Chapter 3 Psychometric Properties Reliability The reliability of an assessment tool like the DECA-C is defined as, the consistency of scores obtained by the same person when reexamined with the same test

More information

Critical Thinking Assessment at MCC. How are we doing?

Critical Thinking Assessment at MCC. How are we doing? Critical Thinking Assessment at MCC How are we doing? Prepared by Maura McCool, M.S. Office of Research, Evaluation and Assessment Metropolitan Community Colleges Fall 2003 1 General Education Assessment

More information

DRAFT (Final) Concept Paper On choosing appropriate estimands and defining sensitivity analyses in confirmatory clinical trials

DRAFT (Final) Concept Paper On choosing appropriate estimands and defining sensitivity analyses in confirmatory clinical trials DRAFT (Final) Concept Paper On choosing appropriate estimands and defining sensitivity analyses in confirmatory clinical trials EFSPI Comments Page General Priority (H/M/L) Comment The concept to develop

More information

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

More information

The IE-ACCME test. The IE-ACCME test. MetaEmotional Intelligence: Antonella D'Amico. Meta-Emotional Intelligence in adolescents

The IE-ACCME test. The IE-ACCME test. MetaEmotional Intelligence: Antonella D'Amico. Meta-Emotional Intelligence in adolescents Measuring and empowering Meta-Emotional Intelligence in adolescents Measuring : Antonella D'Amico Models and assessment tools of EI differ greatly: ABILITY MODELS TRAIT OR MIXED MODELS Mayer and Salovey

More information

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Today s Class: Features of longitudinal data Features of longitudinal models What can MLM do for you? What to expect in this

More information

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Today s Class: Features of longitudinal data Features of longitudinal models What can MLM do for you? What to expect in this

More information

A study of association between demographic factor income and emotional intelligence

A study of association between demographic factor income and emotional intelligence EUROPEAN ACADEMIC RESEARCH Vol. V, Issue 1/ April 2017 ISSN 2286-4822 www.euacademic.org Impact Factor: 3.4546 (UIF) DRJI Value: 5.9 (B+) A study of association between demographic factor income and emotional

More information

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity Measurement & Variables - Initial step is to conceptualize and clarify the concepts embedded in a hypothesis or research question with

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Personality Down Under: Perspectives from Australia

Personality Down Under: Perspectives from Australia Personality Down Under: Perspectives from Australia Edited by Simon Boag Macquarie University, Sydney, NSW, Australia Chapter 10 Does Emotional Intelligence predict real-world performance? John Reid Department

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

EXPERIMENTAL RESEARCH DESIGNS

EXPERIMENTAL RESEARCH DESIGNS ARTHUR PSYC 204 (EXPERIMENTAL PSYCHOLOGY) 14A LECTURE NOTES [02/28/14] EXPERIMENTAL RESEARCH DESIGNS PAGE 1 Topic #5 EXPERIMENTAL RESEARCH DESIGNS As a strict technical definition, an experiment is a study

More information

Validation of Scales

Validation of Scales Validation of Scales ἀγεωμέτρητος μηδεὶς εἰσίτω (Let none enter without a knowledge of mathematics) D R D I N E S H R A M O O Introduction Validity and validation are crucial to understanding psychological

More information

Measurement is the process of observing and recording the observations. Two important issues:

Measurement is the process of observing and recording the observations. Two important issues: Farzad Eskandanian Measurement is the process of observing and recording the observations. Two important issues: 1. Understanding the fundamental ideas: Levels of measurement: nominal, ordinal, interval

More information

1. Evaluate the methodological quality of a study with the COSMIN checklist

1. Evaluate the methodological quality of a study with the COSMIN checklist Answers 1. Evaluate the methodological quality of a study with the COSMIN checklist We follow the four steps as presented in Table 9.2. Step 1: The following measurement properties are evaluated in the

More information

Variables in Research. What We Will Cover in This Section. What Does Variable Mean? Any object or event that can take on more than one form or value.

Variables in Research. What We Will Cover in This Section. What Does Variable Mean? Any object or event that can take on more than one form or value. Variables in Research 1/1/2003 P365 Variables in Research 1 What We Will Cover in This Section Nature of variables. Measuring variables. Reliability. Validity. Measurement Modes. Issues. 1/1/2003 P365

More information

Reliability AND Validity. Fact checking your instrument

Reliability AND Validity. Fact checking your instrument Reliability AND Validity Fact checking your instrument General Principles Clearly Identify the Construct of Interest Use Multiple Items Use One or More Reverse Scored Items Use a Consistent Response Format

More information

PLANNING THE RESEARCH PROJECT

PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm page 1 Part I PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm

More information

The Youth Experience Survey 2.0: Instrument Revisions and Validity Testing* David M. Hansen 1 University of Illinois, Urbana-Champaign

The Youth Experience Survey 2.0: Instrument Revisions and Validity Testing* David M. Hansen 1 University of Illinois, Urbana-Champaign The Youth Experience Survey 2.0: Instrument Revisions and Validity Testing* David M. Hansen 1 University of Illinois, Urbana-Champaign Reed Larson 2 University of Illinois, Urbana-Champaign February 28,

More information

Emotional Intelligence. A Literature Review

Emotional Intelligence. A Literature Review Emotional Intelligence A Literature Review Scott Jensen Carolynn Kohn Stacy Rilea Roseann Hannon Gary Howells University of the Pacific Department of Psychology July 15, 2007 Table of Contents - - - -

More information

Handout 5: Establishing the Validity of a Survey Instrument

Handout 5: Establishing the Validity of a Survey Instrument In this handout, we will discuss different types of and methods for establishing validity. Recall that this concept was defined in Handout 3 as follows. Definition Validity This is the extent to which

More information

11-3. Learning Objectives

11-3. Learning Objectives 11-1 Measurement Learning Objectives 11-3 Understand... The distinction between measuring objects, properties, and indicants of properties. The similarities and differences between the four scale types

More information

Chapter 4: Defining and Measuring Variables

Chapter 4: Defining and Measuring Variables Chapter 4: Defining and Measuring Variables A. LEARNING OUTCOMES. After studying this chapter students should be able to: Distinguish between qualitative and quantitative, discrete and continuous, and

More information

DATA is derived either through. Self-Report Observation Measurement

DATA is derived either through. Self-Report Observation Measurement Data Management DATA is derived either through Self-Report Observation Measurement QUESTION ANSWER DATA DATA may be from Structured or Unstructured questions? Quantitative or Qualitative? Numerical or

More information

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN)

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN) UNIT 4 OTHER DESIGNS (CORRELATIONAL DESIGN AND COMPARATIVE DESIGN) Quasi Experimental Design Structure 4.0 Introduction 4.1 Objectives 4.2 Definition of Correlational Research Design 4.3 Types of Correlational

More information

Validity and reliability of measurements

Validity and reliability of measurements Validity and reliability of measurements 2 Validity and reliability of measurements 4 5 Components in a dataset Why bother (examples from research) What is reliability? What is validity? How should I treat

More information

Thinking Like a Researcher

Thinking Like a Researcher 3-1 Thinking Like a Researcher 3-3 Learning Objectives Understand... The terminology used by professional researchers employing scientific thinking. What you need to formulate a solid research hypothesis.

More information