Psychological testing

Harvey Jasper Cannon
5 years ago

1 Psychological testing Lecture 12 Mikołaj Winiewski, PhD

2 Test Construction Strategies: content validation; empirical criterion; factor analysis; mixed approach (all of the above).

3 Content Validation: Define all aspects of the construct and create test items, derived from theory or based on the purpose of the test. The content of each item is of primary importance. Consult experts about the construct using qualitative methods. Employ experts as judges to assess each potential item using quantitative measures. Perform psychometric analyses of the items.

4 Empirical Keying: Create test items to measure one or more traits, derived from theory or based on the purpose of the test. The content of each item is not of primary importance. Administer the test items to a criterion group and a control group. Select the items that best distinguish between these two groups.
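The selection step in empirical keying can be sketched in a few lines. This is only an illustration under stated assumptions: the 0/1 response data are invented, the function names are hypothetical, and the 0.5 cutoff on the difference in endorsement rates is an arbitrary choice, not a rule from the lecture.

```python
# Hypothetical 0/1 responses: each row is one person, each column one item.
criterion_group = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
control_group = [
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]


def endorsement_rates(group):
    """Proportion of people in the group who endorsed each item."""
    n_people = len(group)
    n_items = len(group[0])
    return [sum(person[i] for person in group) / n_people for i in range(n_items)]


def select_items(criterion, control, min_difference=0.5):
    """Keep item indices whose endorsement rates differ by at least
    min_difference between the two groups (cutoff is an assumption)."""
    crit = endorsement_rates(criterion)
    ctrl = endorsement_rates(control)
    return [
        i for i, (a, b) in enumerate(zip(crit, ctrl))
        if abs(a - b) >= min_difference
    ]


selected = select_items(criterion_group, control_group)
print(selected)
```

Note that item content plays no role in the selection: an item is kept purely because the two groups answer it differently.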

5 Factor Analysis: Create test items to measure one or more traits derived from theory. The content of each item is of primary importance. Administer the test items to an appropriate sample drawn from the population of interest: large (depending on the technique and the initial number of items) and, if possible, representative. Employ factor analysis (a family of correlational techniques used to determine the underlying structure of the data).

6 Mixed Approach: Employ a mixed strategy. For example: define all aspects of the construct and create test items; employ experts as judges to assess each potential item, or use experts to create the item pool; in parallel, employ factor analysis and administer the test items to a criterion group and a control group.

7 Adaptation: Producing instruments (by adjusting existing ones) that measure the target constructs adequately in the target cultures, using a set of procedures and techniques to create an equivalent tool. Purpose / application is the central issue in adaptations.

8 Main Applications of Translations/Adaptations. Comparative studies (diagnosis & research): focus on comparison of constructs or mean scores across cultures; strategy: maximizing comparability. Studies in the target culture (diagnosis & research): focus on validity in the new context; strategy: maximizing local suitability.

9 Considerations, cultural equivalence: psychological theory / dimensions; psychological concepts / terms; behavioral indicators; procedures.

10 Considerations, test equivalence: face equivalence (superficial); psychometric (statistics, validity and reliability); psychological (functional); translation; construction.

11 Adoption / translation: Not only language! Literal/close translation: "What is the name of the queen of England?" Problem: the item is more difficult for American children than for English children. Adaptation: "What is the name of the president of the USA?" Problem: the queen and the president are not equally well known in their respective countries.

12 Equivalence: words (linguistic); meanings (psychological).

13 Linguistic Equivalence (broader than similarity of words): Linguistic equivalence refers to similarity of the linguistic features of a text. Examples of relevant linguistic features are lexical similarity and grammatical accuracy. In general: emphasis on formal-textual characteristics (cf. automatic translations).

14 Psychological Equivalence: Psychological equivalence refers to similarity of (psychological) meaning and scores. Similarity in a broad sense. Textual: e.g., connotation of words, implied context of the text, comprehensibility. Metrical: score comparability.

15 Relationship between Two Perspectives: Three possible relations between linguistic and psychological features, depending on the overlap: a. complete (translatable); b. partial (poorly translatable); c. none (essentially non-translatable).

16 Cultural Adaptation: Options / Strategies. Adoption / transcription (close literal translation): advantage: maintains metric equivalence; disadvantage: adequacy is (too) readily assumed and should be demonstrated. Adaptation (translation, travesty, paraphrase): advantage: more flexible, more tailored to the context; disadvantage: fewer statistical techniques available to compare scores across cultures. Assembly (re-assembly; composing a new instrument): advantage: very flexible; disadvantage: almost no comparability maintained.

17 Adoption / transcription: Literal translation of all items. Focus: extreme translation fidelity. Assumption: universality of constructs and behaviors. Pros: metric equivalence; possibility of straightforward comparisons. Cons: language and psychometric problems.

18 Adaptation: translation. Faithful translation of the original pool of items, with possible changes. Focus: translation fidelity. Assumption: universality of constructs and behaviors, but not of language. Pros: better psychometric properties; better construct and ecological validity. Cons: fewer comparison options; still some language and psychometric problems.

19 Adaptation: travesty. Free translation of the original pool of items, keeping the meaning and changing the language to fit linguistic and psychological needs. Focus: psychological meaning. Assumption: universality of constructs, but not of language, with possible cultural differences in behaviors. Pros: better cultural adjustment; less metric equivalence, but still fairly good; better psychometric properties. Cons: few comparison options; major differences between versions of the test.

20 Adaptation: paraphrase. Creating a new tool using the original items as inspiration rather than as a base. Focus: psychological meaning. Assumption: universality of constructs, but not of behaviors and language. Pros: good cultural adjustment; good psychometric properties; cultural equivalence. Cons: no metric equivalence; major differences between versions of the test.

21 Assembly (re-assembly): Composing a new instrument using the original theoretical model and development strategy. Focus: adaptation of the tool and the theory. Assumption: no cultural universality of behaviors and language, and possible differences in constructs. Pros: best cultural adjustment. Cons: no metric equivalence; two different tools.

22 Item Analysis

23 Purpose of Item Analysis: Evaluates the quality of each item. Rationale: the quality of the items determines the quality of the test (i.e., reliability & validity). May suggest ways of improving the measurement of a test. Can help with understanding why certain tests predict some criteria but not others.

24 Item Analysis: When analyzing the test items, we have several questions about the performance of each item. Some of these questions include: Are the items congruent with the test objectives? Are the items valid? Do they measure what they're supposed to measure? Are the items reliable? Do they measure consistently? How long does it take an examinee to complete each item? Which items are most difficult to answer correctly? Which items are easy? Are there any poorly performing items that need to be discarded?

25 Types of Item Analyses for CTT: Three major types: 1. Assess the quality of the distractors. 2. Assess the difficulty of the items. 3. Assess how well an item differentiates between high and low performers.
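The third type of analysis is not spelled out on the slides. One common way to quantify it is the upper-lower discrimination index, D = p(upper group) - p(lower group). The sketch below assumes that approach; the data, the function name, and the choice to split the sample into halves are all illustrative assumptions.

```python
def discrimination_index(item_scores, total_scores, fraction=0.5):
    """Upper-lower discrimination index for one dichotomous (0/1) item.

    People are ranked by total test score; the index is the proportion
    answering the item correctly in the top `fraction` of the sample minus
    the same proportion in the bottom `fraction`.
    """
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    k = max(1, int(len(order) * fraction))
    lower = order[:k]   # lowest total scores
    upper = order[-k:]  # highest total scores
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: six examinees, their total scores and two items.
totals = [9, 8, 7, 3, 2, 1]
item_a = [1, 1, 1, 0, 0, 0]  # only high scorers answer correctly
item_b = [1, 0, 1, 0, 1, 0]  # weakly related to the total score

print(discrimination_index(item_a, totals))
print(discrimination_index(item_b, totals))
```

Values near 1 mean the item separates high from low performers well; values near 0 (or negative) mean it does not.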

26 Distractor Analysis: 1) Question. A. Multiple-choice (distractor); B. Multiple-choice (correct answer); C. Multiple-choice (distractor); D. Multiple-choice (distractor).

27 Distractor Analysis: First question of item analysis: How many people choose each response? If there is only one best response, then all other response options are distractors. Example (N = 35): Which method has the best internal consistency?
   Response               # choosing
   a) projective test      1
   b) peer ratings         1
   c) forced choice       21
   d) differences n.s.    12

28 Distractor Analysis: A perfect test item would have 2 characteristics: 1. Everyone who knows the answer gets the item right. 2. People who do not know the answer have responses equally distributed across the wrong answers. It is not desirable to have one of the distractors chosen more often than the correct answer. This result indicates a potential problem with the question: the distractor may be too similar to the correct answer, and/or something in either the stem or the alternatives may be misleading.

29 Distractor Analysis (cont'd): Calculate the number of people expected to choose each of the distractors. If responses are random, the expected number is the same for each wrong response (Figure 10-1). Number of persons expected to choose each distractor = (N answering incorrectly) / (number of distractors) = 14 / 3 = 4.7
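The expected-count calculation can be reproduced from the response counts in the slide-27 example. This is a minimal sketch: the function name is hypothetical, and option c) is taken as the keyed answer because it is the only option that leaves 14 incorrect responses out of 35.

```python
# Response counts from the example item (N = 35).
counts = {"a": 1, "b": 1, "c": 21, "d": 12}
correct = "c"  # assumption: c) is keyed, since 35 - 21 = 14 incorrect


def expected_per_distractor(counts, correct):
    """Expected number of people per distractor if wrong answers were random."""
    distractors = [option for option in counts if option != correct]
    n_incorrect = sum(counts[option] for option in distractors)
    return n_incorrect / len(distractors)


expected = expected_per_distractor(counts, correct)

# Ratio of observed to expected choices for each distractor.
ratios = {
    option: counts[option] / expected
    for option in counts
    if option != correct
}

print(round(expected, 1))  # 14 / 3, rounded to one decimal
for option, ratio in ratios.items():
    print(option, counts[option], round(ratio, 2))
```

Here distractor d) is chosen far more often than expected, while a) and b) are chosen far less often, which is the pattern the following slide asks us to interpret.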

30 Distractor Analysis (cont'd): When the number of persons choosing a distractor significantly exceeds the number expected, there are 2 possibilities: 1. The choice may reflect partial knowledge. 2. The item is a poorly worded trick question. An unpopular distractor may lower item and test difficulty because it is easily eliminated; an extremely popular distractor is likely to lower the reliability and validity of the test.

31 Item Difficulty: Percentage of test takers who respond correctly. What if p = .00? What if p = 1.00?

32 Item Difficulty: An item with a p value of .0 or 1.0 does not contribute to measuring individual differences and thus is certain to be useless. When comparing 2 test scores, we are interested in who had the higher score or in the differences in scores. Items with a p value of .5 have the most variation, so seek items in this range and remove those with extreme values. The p value can also be examined to determine the proportion answering in a particular way for items that don't have a correct answer.
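Why p = .5 gives the most variation, and why p = 0 or p = 1 gives none, follows from the variance of a dichotomously scored (0/1) item. This is a standard result added for illustration; it is not worked out on the slides.

```latex
\[
\sigma^2_{\text{item}} = p\,(1 - p), \qquad
\frac{d}{dp}\,\bigl[p\,(1 - p)\bigr] = 1 - 2p = 0
\;\Longrightarrow\; p = .5, \qquad
\sigma^2_{\max} = .5 \times .5 = .25
\]
\[
p = 0 \ \text{or}\ p = 1 \;\Longrightarrow\; \sigma^2_{\text{item}} = 0
\]
```

With zero variance every test taker gets the same score on the item, so it cannot distinguish between people.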

33 Item Difficulty (cont.): What is the best p value? The optimal p value is .50: maximum discrimination between good and poor performers. Should we only choose items of .50? When shouldn't we?

34 Item Difficulty (cont.): Should we only choose items of .50? Not necessarily... When wanting to screen the very top group of applicants (e.g., admission to university or medical school), cutoffs may be much higher. Other institutions want a minimum level (e.g., a minimum reading level), so cutoffs may be much lower.

35 Item Difficulty (cont'd): General Rules of Item Difficulty
   p low (< .20)            difficult test item
   p moderate (.20 to .80)  moderately difficult item
   p high (> .80)           easy item
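These rules of thumb translate directly into a small helper. This is a sketch with hypothetical function names; it assumes dichotomous 0/1 scoring and treats the moderate band as everything from .20 to .80 inclusive, which is inferred from the low and high cutoffs.

```python
def item_difficulty(responses):
    """p value: proportion of test takers answering the item correctly (0/1 data)."""
    return sum(responses) / len(responses)


def difficulty_label(p):
    """Classify an item using the rule-of-thumb cutoffs of .20 and .80."""
    if p < 0.20:
        return "difficult"
    if p > 0.80:
        return "easy"
    return "moderately difficult"  # assumed band: .20 to .80 inclusive


# The forced-choice example item: 21 of 35 test takers answered correctly.
responses = [1] * 21 + [0] * 14
p = item_difficulty(responses)
print(p, difficulty_label(p))
```

An item that nobody or everybody answers correctly falls into the extreme bands and, as noted earlier, carries no information about individual differences.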
