Psychological testing Lecture 11 Mikołaj Winiewski, PhD Marcin Zajenkowski, PhD
Strategies for test development and test item considerations The procedures involved in item generation, item selection, and item analysis are critical to test authors and test developers. Test users also need to be familiar with these procedures (to understand the nature of the tasks and to evaluate the instruments they select). Different perspectives: Test author -> theory and design Test user -> test selection, observing test takers' responses
Test development (what for?) operationalization of a new theory -> sample behaviors from a newly defined test domain meet the needs of a special group of test takers (not covered by existing tests) improve the accuracy of test scores for their intended purpose Tests also need to be revised
Defining the test universe, target group, and purpose Test development First Steps Developing a test plan Composing the test items Writing the administration instructions
Continued Steps of Test Construction Constructing Scales Piloting the Test Standardizing the Test Collecting Norms Validation & Reliability Studies Manual Writing Test Revision
Test Universe working definition of the construct locate studies that explain the construct locate current measures of the construct
Target group a list of characteristics of persons who will take the test characteristics that will affect how test takers will respond to the test questions reading level, disabilities, honesty, language, etc. >>the specific characteristics will depend on the construct<<
Purpose what the test will measure (and what for) how scores will be used to compare test takers (normative approach) to indicate achievement (criterion approach) will scores be used to test a theory or to provide information about an individual?
Developing a Test Plan a definition of the construct the content to be measured (test domain) the format for the questions how the test will be administered how the test will be scored
Defining the Construct Define the construct after reviewing literature about the construct and any available measures Operationalize in terms of observable and measurable behaviours Provides boundaries for the test domain (what should and shouldn't be included) Specify the approximate number of items needed
Choosing the Test Format test format refers to the type of items the test will contain two important aspects of test format: stimulus and mechanism for response (e.g., multiple choice, true-false, reaction, etc.) objective or subjective test format test items -> stimuli presented to the test taker form depends on decisions made in the test plan (purpose, target, administration, scoring) multiple options (see lectures no. 1 and 12)
Writing Good Items the basis of the whole test construction process multiple pitfalls (gazillions of tests with poor items) requires originality, creativity, knowledge of the test domain, and practice not all items perform as expected -- they may be too easy or too difficult, or may be misinterpreted, etc. rule of thumb: start with at least twice as many items as you expect to use multiple strategies
Administration and scoring Instructions Precise!! specify the testing environment to decrease variation or error in test scores should address: group or individual administration requirements for location (e.g., quiet, amount of space, etc.) required equipment time limits or approximate completion time script for administrator and answers to questions test takers may ask
Scoring Methods Cumulative model: most common assumes that the more a test taker responds in a particular fashion the more he/she has of the attribute being measured (e.g., more correct answers, or endorses higher numbers on a Likert scale) correct responses or responses on Likert scale are summed yields interval data that can be interpreted with reference to norms
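The cumulative model can be sketched in a few lines of Python. This is a minimal illustration, not any particular test's scoring procedure; the item responses are invented.

```python
# Minimal sketch of cumulative (summative) scoring for a Likert-type
# scale scored 1-5. The responses below are hypothetical.

def cumulative_score(responses):
    """Sum the item responses; a higher total indicates more of the
    attribute being measured."""
    return sum(responses)

respondent = [4, 5, 3, 4, 2]        # five Likert items, each 1-5
total = cumulative_score(respondent)
print(total)                        # 18, interpreted against norms
```

The summed total yields interval-level data that can then be compared with the norm tables, as the slide notes.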
Scoring Methods Categorical model: place test takers in a group e.g., a particular pattern of responses may suggest diagnosis of a certain psychological disorder typically yields nominal data because it places test takers in categories
Scoring Methods Ipsative model: test takers' scores are not compared to those of other test takers but rather the scores on various scales are compared WITHIN the test taker (which scores are high & low) e.g., a test taker may complete a measure of interpersonal problems of various types and the test administrator may want to determine which of the types the test taker feels is most problematic for him or her Cumulative model may be combined with categorical or ipsative model
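The ipsative comparison above can be illustrated as follows. This is a hypothetical sketch; the scale names and scores are invented, and in practice each scale score would itself come from cumulative scoring of its items.

```python
# Sketch of ipsative interpretation: rank scale scores WITHIN one
# test taker instead of comparing them to norms. Scale names invented.

scores = {"dominance": 14, "avoidance": 22, "intrusiveness": 9}

# Order the scales from most to least endorsed for this individual.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)      # ['avoidance', 'dominance', 'intrusiveness']
most_problematic = ranked[0]
```

Note that the absolute values carry no normative meaning here; only the ordering within the person is interpreted.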
Response Bias In preparing an item review, each question can be evaluated from two perspectives: Is the item fair? Is the item biased? Tests are subject to error and one form comes from the test takers
Response Sets/Styles Are patterns of responding that result in misleading information and limit the accuracy and usefulness of the test scores Reasons for misleading information 1. Information requested is too personal 2. Distort their responses 3. Answer items carelessly 4. May feel coerced into completing the test
Response Style People always agree (acquiescence) or disagree (criticalness) with statements without attending to the actual content Usually occurs when items are ambiguous Solution: use both positively- and negatively-keyed items
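The reverse-keying solution can be sketched as follows. This is a generic illustration, assuming a 1-5 Likert scale; the item keys are hypothetical.

```python
# Sketch: reverse-score negatively-keyed items before summing, so pure
# acquiescence (agreeing with everything) does not inflate the total.
# Assumes items are scored 1-5; the keys below are hypothetical.

def score(responses, keys, scale_min=1, scale_max=5):
    """keys[i] is +1 for a positively keyed item, -1 for a negatively
    keyed item (reverse-scored as min + max - response)."""
    total = 0
    for r, k in zip(responses, keys):
        total += r if k == 1 else (scale_min + scale_max - r)
    return total

keys = [1, -1, 1, -1]           # half the items are negatively keyed
acquiescer = [5, 5, 5, 5]       # agrees with every statement
print(score(acquiescer, keys))  # 12, well below the maximum of 20
```

With balanced keying, an indiscriminate "agree" pattern lands near the scale midpoint rather than the ceiling, which is exactly why mixed keying dampens acquiescence.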
Social Desirability Some test takers choose socially acceptable answers or present themselves in a favorable light People often do not attend to the accurate answer (trait being measured) but to the social acceptability of the statement This represents unwanted variance; it might also be a problem of the item itself
Faking Faking -- some test takers may respond in a particular way to cause a desired outcome may fake good (e.g., in employment settings) to create a favorable impression may fake bad (e.g., in clinical or forensic settings) as a cry for help or to appear mentally disturbed may use some subtle questions that are difficult to fake because they aren't clearly face valid
Faking Bad People try to look worse than they really are Common problem in clinical settings Reasons: Cry for help Want to plead insanity in court Want to avoid draft into military Want to show psychological damage Most people who fake bad overdo it
Handling Impression Management Use positive and negative impression scales (endorsed by 10% of the population) Use lie scales to flag those who score high (e.g., "I get angry sometimes") Use inconsistency scales (e.g., two different responses to two similar questions) Use multiple assessment methods (other than self-report)
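An inconsistency scale of the kind mentioned above can be sketched as follows. The item pairings and the flagging cutoff are invented for illustration; real inventories define these empirically.

```python
# Sketch of an inconsistency scale: pairs of near-identical items should
# receive similar answers, so large within-pair differences suggest
# careless or inconsistent responding. Pairs and cutoff are hypothetical.

def inconsistency_index(responses, pairs):
    """Sum of absolute response differences across matched item pairs."""
    return sum(abs(responses[i] - responses[j]) for i, j in pairs)

pairs = [(0, 4), (1, 5), (2, 6)]        # item 0 restates item 4, etc.
careful  = [4, 2, 5, 3, 4, 2, 5]        # consistent across pairs
careless = [4, 2, 5, 3, 1, 5, 2]        # contradicts the paired items

print(inconsistency_index(careful, pairs))   # 0
print(inconsistency_index(careless, pairs))  # 9 -> flag the protocol
```

A protocol scoring above the cutoff would be set aside for review rather than interpreted, consistent with the slide's point that such scales flag, not diagnose.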
Random Responding Random responding may occur when test takers are unwilling or unable to respond accurately. likely to occur when the test taker lacks the skills (e.g., reading), does not want to be evaluated, or lacks attention to the task try to detect it by embedding a scale that tends to yield clear results from the vast majority, such that a different result suggests the test taker wasn't cooperating
Random Responding Strategies for detection: Duplicate items: "I love my mother." / "I hate my mother." Infrequency scales: "I've never had hair on my head." / "I have not seen a car in 10 years."
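An infrequency-scale check like the one described can be sketched as follows. The item positions and the cutoff are hypothetical; real instruments set them from normative data.

```python
# Sketch of an infrequency-scale check: the scale consists of items
# almost nobody truthfully endorses ("I have not seen a car in 10
# years"). Endorsing several suggests random or uncooperative
# responding. Item positions and cutoff below are hypothetical.

INFREQUENCY_ITEMS = [3, 7, 11]   # positions of infrequency items

def random_responding_suspected(answers, cutoff=2):
    """answers maps item position -> 'true'/'false' endorsement."""
    endorsed = sum(1 for i in INFREQUENCY_ITEMS
                   if answers.get(i) == "true")
    return endorsed >= cutoff

answers = {3: "true", 7: "true", 11: "false"}
print(random_responding_suspected(answers))  # True -> flag protocol
```

Because almost all cooperative test takers endorse none or one of these items, crossing the cutoff is evidence of the non-cooperation the slide describes.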
Random Responding May occur for several reasons: People are not motivated to participate Reading or language difficulties Do not understand instructions / item content Too confused or disturbed to respond appropriately