Psychological testing
Lecture 2
Marcin Zajenkowski, PhD
Mikołaj Winiewski, PhD

What is a psychological test

- "An evaluative device or procedure in which a sample of an examinee's behavior in a specified domain is obtained and subsequently evaluated and scored using a standardized process" (AERA et al., 1999)
- "Test: objective and standardized measure of a sample of behaviour" (Anastasi, 1988)
- "A psychological test is a systematic procedure for obtaining samples of behaviors, relevant to cognitive or affective functioning, and for scoring and evaluating those samples according to standards" (Urbina, 2004)

Defining psychological test

Defining element | Explanation | Rationale
Systematic procedures | Planning, uniformity and attention to detail | Objective and fair
Sample of behaviour (relevant, representative) | Representation of behaviors of interest | Efficiency and validity
Scoring and evaluating | Some sort of preestablished system for calculation | Identification of test results
Evaluation of test results according to standards | Applying a common / universal measure | Interpretation and meaning of test results

Some vocabulary issues

- The term TEST (strictly speaking) refers to those procedures in which the outcome is evaluated based on correctness or quality.
- Instruments whose responses are neither evaluated nor scored as right-wrong or pass-fail are called INVENTORIES, QUESTIONNAIRES, SURVEYS, CHECKLISTS.
- However, we are going to use the term TEST (or psychological test) to refer to all instruments regardless of type.
Some vocabulary issues

The term SCALE might refer to:
- A whole test made up of several parts (Wechsler Adult Intelligence Scale or Stanford-Binet Intelligence Scale)
- A subtest, or set of items within a test, that measures a specific characteristic (Extraversion Scale of the NEO-PI-R)
- An array of subtests that share some common characteristic (verbal scale in Wechsler intelligence tests)
- A separate instrument made up of items designed to evaluate a single characteristic (Internal-External Locus of Control Scale, Rotter, 1966)
- The numerical system used to rate or to report a value on some measured dimension (1 strongly disagree to 5 strongly agree)

NOTICE: in psychometrics, scale has a more specific meaning. It refers to a group of items measuring a single variable.

BATTERY: a group of tests or subtests that are administered to one person at one time.

STANDARDIZATION (disambiguation): authors use the term standardization in two different meanings:
1. uniformity of test procedure
2. standards for evaluating test results
(Urbina, 2004)

- Concept developed at the beginning of the 20th century
- Tests measure (explore and investigate) a range of psychological variables; their most basic and typical use is as tools in making decisions about people

Personnel selection

- ~200 B.C.E.: the Chinese civil service examination system (Bowman, 1989)
  - proficiency in music, archery, horsemanship, etc.
  - written exams in law, agriculture, geography, etc.
- Influence on civil service examination systems around the world: Vietnam (~1000 C.E.), Korea (~800 C.E.), Britain (1850s), U.S. Civil Service Examination (1860s)
Education

- 13th century in Europe: development of universities and the need to ascertain that students have acquired the knowledge
- University degrees: oral examinations
- Later: secondary education
- Written exams (connected with the development of paper production)

Clinical Psychology

- Differentiating the normal from the abnormal
- Psychiatry, beginning of the 19th century: assessing the level of cognitive functioning of patients with various kinds of disorders such as mental retardation or brain damage
- 1890s: Emil Kraepelin tries to classify mental disorders according to their causes, symptoms, and courses
  - Concept of comparing sane and insane individuals on the basis of characteristics (e.g. distractibility, memory capacity)
  - Pioneer in the use of free association techniques

Experimental Psychology

- 1879: Wundt creates the first psychological lab in Leipzig, Germany
- Emphasis on accuracy of measurement and on standardized conditions
- Francis Galton (1822-1911): statistical methods

Some major steps

- 1905: Binet-Simon scale
  - First intelligence test
  - 30 tests or tasks varied in content and difficulty, assessing judgment and reasoning ability
- 1917-1918: Army Alpha / Beta
  - Team of psychologists set up by Robert Yerkes (one of them was Lewis M. Terman)
- 1910: Thorndike's handwriting scale
- 1926: Scholastic Aptitude Test (SAT)
Some major steps

- 1939: Wechsler, Wechsler-Bellevue Intelligence Scale (1997: WAIS-III)
- 1943: Hathaway and McKinley, Minnesota Multiphasic Personality Inventory (1989: MMPI-2)
- 1949: Cattell, 16 PF (Personality Factors)
- 1950-1990: Eysenck's inventories
- 1990-2000: Big 5 (Costa & McCrae)

Psychological tests

Thinking about psychological tests:
- Test as a tool
- Test as a product

Psychological tests

Type of gathered information (a very basic distinction):
- Performance
- Self description
- Observation

Urbina, 2004 (p. 7)
Performance

- The test subject has to complete some sort of assignment (usually several test items).
- We provide the test taker / examinee with a task and we assess performance.
- Scoring of the test covers the whole spectrum of performance (from none to maximal performance).
- We usually assess the final performance (product).
- Tests of power: maximal performance
- Tests of speed: speed of performance
- Tests with limited time: maximal performance within a certain time
- Quality? We have to quantify it!
- BUT THEN NOTICE: we call it maximal performance, but we might be interested in a minimal level or a set cut-off point for our purpose.

Self description

- The test taker provides us with a description of his/her behavior.
- Inventories, questionnaires, surveys, checklists, biographic forms, etc.
- We provide the participant with a set of questions or points of interest (open ended) and we record and assess:
  - Behaviors (or sets of behaviors)
  - Emotions
  - Thoughts
  - Preferences
  - Facts
- NOTICE: individuals have access to a wealth of information about themselves that is inaccessible to anyone else.
- Latent variables:
  - Traits (e.g. personality)
  - States (e.g. mood, emotions)
  - Attitudes (e.g. toward minority groups)

Observation

- We observe the subject in a specific or nonspecific situation or situations.
- We record and assess typical behavior.
- Observation scales
- Rating scales
- Performance assessments

Aspects of tests

- Area of assessment: cognition, personality, clinical, values, etc.
- Commercial vs. non-commercial:
  - form of publication (manual and test materials or journal article)
  - copyright and legal status
  - resources and institution
- Test function: diagnosis, predictions, selection, placement, screening, etc.
- Maximal vs. typical performance
- Administration criteria: group vs. individual; speed vs. power; security; invasiveness
- Score interpretation:
  - norm vs. criterion reference
  - normative vs. ipsative
  - psychometric vs. qualitative
- Medium
- Item structure:
  - objective vs. subjective
  - verbal vs. performance
- Age range
- Setting

Qualifications

- Different companies use different qualification levels.
- Qualification levels differ but are mostly based on the 1953 / 1954 APA outline (3 levels).
- Some techniques require special / additional qualifications (MMPI, Cattell).

Standardization

"Standardization has to do with uniformity of procedure in all important aspects of the administration, scoring and interpretation of tests." (Urbina, 2004)

What does it mean?

Procedure

Standard procedure: what is it, and which aspects of testing does it concern? (based on Diagnoza Intelektu by Anna Matczak)
- External conditions: place, light, group vs. individual testing, time (hour of the day)
- Instruction
- Test material
- Time limits (unlimited vs. limited)
- Test and test taker (age, education, etc.)
- Test user and test taker (attitude toward the test taker, behavior of the test user)
- Sequence of tests used
Two aspects of standardization

- Basic assumption (common to both): the procedure addresses the same processes which drive performance -> the obtained score
- External -> the effect is the outcome of the individual's ability, with control ensuring that all individuals work in the same conditions.
- Internal -> the effect is the outcome of the individual's ability when working in conditions that are optimal for that individual's abilities (different for each individual).

Standardized test as a tool for science (research) vs. practice (diagnosis)?

AIM of the study
- Diagnosis: we are interested in the individual
  - All aspects should be standardized, including the dependent variable
- Experiment: we are interested in our dependent variable
  - All aspects should be standardized, but we frequently manipulate the dependent measure