Psychological testing

Psychological testing
Lecture 11
Mikołaj Winiewski, PhD; Marcin Zajenkowski, PhD

Strategies for Test Development and Test Item Considerations
- The procedures involved in item generation, item selection, and item analysis are critical to test authors and test developers.
- Test users also need to be familiar with these procedures, in order to understand the nature of the tasks and to evaluate the instruments they select.
- Different perspectives:
  - Test author -> theory and design
  - Test user -> test selection; observing test takers' responses

Test Development: What For?
- Operationalizing a new theory -> sampling behaviors from a newly defined test domain
- Meeting the needs of a special group of test takers not covered by existing tests
- Improving the accuracy of test scores for their intended purpose
- Tests also need to be revised

Test Development: First Steps
- Defining the test universe, target group, and purpose
- Developing a test plan
- Composing the test items
- Writing the administration instructions

Continued Steps of Test Construction
- Constructing scales
- Piloting the test
- Standardizing the test
- Collecting norms
- Validation & reliability studies
- Manual writing
- Test revision

Test Universe
- Develop a working definition of the construct
- Locate studies that explain the construct
- Locate current measures of the construct

Target Group
- A list of characteristics of the persons who will take the test
- Characteristics that will affect how test takers respond to the test questions: reading level, disabilities, honesty, language, etc.
- The specific characteristics will depend on the construct

Purpose
- What the test will measure (and what for)
- How scores will be used:
  - to compare test takers (normative approach)
  - to indicate achievement (criterion approach)
- Will scores be used to test a theory or to provide information about an individual?

Developing a Test Plan
- A definition of the construct
- The content to be measured (test domain)
- The format for the questions
- How the test will be administered
- How the test will be scored

Defining the Construct
- Define the construct after reviewing the literature about it and any available measures
- Operationalize it in terms of observable and measurable behaviours
- This provides boundaries for the test domain (what should and shouldn't be included)
- Specify the approximate number of items needed

Choosing the Test Format
- Test format refers to the type of items the test will contain
- Two important aspects of test format:
  - the stimulus and the mechanism for response (e.g., multiple choice, true-false, reaction, etc.)
  - objective vs. subjective test format
- Test items -> stimuli presented to the test taker
- The form depends on decisions made in the test plan (purpose, target group, administration, scoring)
- Multiple options exist (see lectures 1 and 12)

Writing Good Items
- The basis of the whole test construction process
- Many pitfalls (gazillions of tests contain poor items)
- Requires originality, creativity, knowledge of the test domain, and practice
- Not all items perform as expected: some may be too easy or too difficult, some may be misinterpreted, etc.
- Rule of thumb: start with at least twice as many items as you expect to use
- Multiple writing strategies exist

Administration and Scoring Instructions
- Must be precise!
- Specify the testing environment to decrease variation or error in test scores
- Should address:
  - group or individual administration
  - requirements for the location (e.g., quiet, amount of space, etc.)
  - required equipment
  - time limits or approximate completion time
  - a script for the administrator, with answers to questions test takers may ask

Scoring Methods: Cumulative Model
- The most common model
- Assumes that the more a test taker responds in a particular fashion, the more he/she has of the attribute being measured (e.g., more correct answers, or endorsing higher numbers on a Likert scale)
- Correct responses, or responses on the Likert scale, are summed
- Yields interval data that can be interpreted with reference to norms
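A minimal sketch of cumulative scoring: Likert responses are summed and the raw score is located against norms via a z-score. The item responses, norm mean, and norm SD below are hypothetical examples, not values from any real instrument.

```python
def cumulative_score(responses):
    """Sum Likert responses (e.g., 1 = strongly disagree ... 5 = strongly agree)."""
    return sum(responses)

def z_score(raw, norm_mean, norm_sd):
    """Locate a raw score within a (hypothetical) norm group."""
    return (raw - norm_mean) / norm_sd

answers = [4, 5, 3, 4, 2]       # one test taker, five items, 1-5 scale
raw = cumulative_score(answers)  # 18
print(z_score(raw, norm_mean=15.0, norm_sd=3.0))  # 1.0 -> one SD above the norm mean
```

The same summing logic covers ability tests: replace Likert values with 0/1 correct-answer indicators.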

Scoring Methods: Categorical Model
- Places test takers in a group
- e.g., a particular pattern of responses may suggest a diagnosis of a certain psychological disorder
- Typically yields nominal data, because it places test takers in categories
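Categorical scoring can be sketched as a mapping from a response pattern to a nominal label. The cutoff and the labels below are invented for illustration; they are not a real diagnostic rule.

```python
def categorize(symptom_endorsements, cutoff=6):
    """Assign a nominal group label from a count of endorsed symptom items.

    The cutoff of 6 and the labels are hypothetical examples.
    """
    return "elevated" if sum(symptom_endorsements) >= cutoff else "typical"

pattern = [1, 1, 0, 1, 1, 1, 1, 0]   # 1 = item endorsed
print(categorize(pattern))  # elevated (6 endorsements reach the cutoff)
```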

Scoring Methods: Ipsative Model
- Test takers' scores are not compared to those of other test takers; instead, scores on the various scales are compared WITHIN the test taker (which scores are high and which are low)
- e.g., a test taker may complete a measure of interpersonal problems of various types, and the test administrator may want to determine which type the test taker feels is most problematic for him or her
- The cumulative model may be combined with the categorical or ipsative model
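An ipsative comparison can be sketched as ranking scales within one person rather than against a norm group. The scale names and scores below are hypothetical.

```python
def ipsative_profile(scale_scores):
    """Order scales from most to least elevated for a single test taker."""
    return sorted(scale_scores, key=scale_scores.get, reverse=True)

person = {"dominance": 12, "avoidance": 18, "intrusiveness": 9}
print(ipsative_profile(person))  # ['avoidance', 'dominance', 'intrusiveness']
```

Note that only the ordering is meaningful here; the same profile could come from uniformly high or uniformly low raw scores.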

Response Bias
- In preparing an item review, each question can be evaluated from two perspectives: Is the item fair? Is the item biased?
- Tests are subject to error, and one source of error is the test takers themselves

Response Sets/Styles
- Patterns of responding that result in misleading information and limit the accuracy and usefulness of test scores
- Reasons for misleading information:
  1. The information requested is too personal
  2. Test takers distort their responses
  3. Test takers answer items carelessly
  4. Test takers may feel coerced into completing the test

Response Style
- Some people always agree (acquiescence) or always disagree (criticalness) with statements without attending to the actual content
- This usually happens when items are ambiguous
- Solution: use both positively- and negatively-keyed items
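Mixing positively- and negatively-keyed items only works if the negatively keyed items are reverse-scored before summing. A minimal sketch on a 5-point Likert scale; which items are negatively keyed is a hypothetical example.

```python
SCALE_MAX = 5  # 5-point Likert scale

def reverse_key(response, scale_max=SCALE_MAX):
    """Flip a response so disagreeing with a negative item counts toward the trait."""
    return scale_max + 1 - response

responses = [5, 1, 4, 2]        # raw answers to four items
negatively_keyed = {1, 3}       # hypothetical indices of reverse-worded items
scored = [reverse_key(r) if i in negatively_keyed else r
          for i, r in enumerate(responses)]
print(sum(scored))  # 5 + 5 + 4 + 4 = 18
```

An acquiescent responder who answers 5 to everything now scores near the scale midpoint instead of the maximum, which is the point of balanced keying.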

Social Desirability
- Some test takers choose socially acceptable answers or present themselves in a favorable light
- People often attend not to the accurate answer (the trait being measured) but to the social acceptability of the statement
- This represents unwanted variance; it may also be a problem with the item itself

Faking
- Some test takers may respond in a particular way to cause a desired outcome
- They may fake good (e.g., in employment settings) to create a favorable impression
- They may fake bad (e.g., in clinical or forensic settings) as a cry for help or to appear mentally disturbed
- Tests may use subtle questions that are difficult to fake because they aren't clearly face valid

Faking Bad
- People try to look worse than they really are
- A common problem in clinical settings
- Reasons:
  - a cry for help
  - wanting to plead insanity in court
  - wanting to avoid being drafted into the military
  - wanting to show psychological damage
- Most people who fake bad overdo it

Handling Impression Management
- Use positive and negative impression scales (items endorsed by only about 10% of the population)
- Use lie scales to flag those who score high (e.g., "I get angry sometimes")
- Use inconsistency scales (e.g., two different responses to two similar questions)
- Use multiple assessment methods (other than self-report)
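An inconsistency scale can be sketched as a sum of discrepancies across pairs of near-duplicate items: similar items should receive similar answers. The item numbers, pairings, and flagging threshold below are hypothetical.

```python
def inconsistency_score(responses, pairs):
    """Sum absolute answer differences across pairs of near-duplicate items."""
    return sum(abs(responses[a] - responses[b]) for a, b in pairs)

responses = {1: 5, 7: 1, 3: 4, 9: 4}   # item number -> 1-5 Likert answer
similar_pairs = [(1, 7), (3, 9)]        # hypothetical near-duplicate item pairs
score = inconsistency_score(responses, similar_pairs)
print(score)       # |5-1| + |4-4| = 4
print(score >= 4)  # True -> flag the protocol for review
```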

Random Responding
- May occur when test takers are unwilling or unable to respond accurately
- Likely when the test taker lacks the necessary skills (e.g., reading), does not want to be evaluated, or does not attend to the task
- Detection: embed a scale that yields clear results from the vast majority of test takers, so that a different result suggests the test taker wasn't cooperating

Random Responding: Strategies for Detection
- Duplicate items: "I love my mother." / "I hate my mother."
- Infrequency scales: "I've never had hair on my head." / "I have not seen a car in 10 years."
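An infrequency scale can be sketched as a count of endorsements of statements almost nobody truthfully endorses; a random responder will endorse some of them by chance. The item numbers and cutoff below are hypothetical.

```python
def infrequency_count(responses, infrequency_items):
    """Count how many rarely-true items the test taker endorsed."""
    return sum(1 for item in infrequency_items if responses.get(item))

responses = {4: True, 12: False, 20: True}   # item number -> endorsed?
infrequency_items = [4, 12, 20]               # hypothetical rarely-true items
count = infrequency_count(responses, infrequency_items)
print(count)       # 2
print(count >= 2)  # True -> possible random responding
```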

Random Responding: Possible Reasons
- The person is not motivated to participate
- Reading or language difficulties
- The person does not understand the instructions or item content
- The person is too confused or disturbed to respond appropriately