
Examiner's Manual
Chapter 6: Fairness

A test is considered fair when it is free from test bias. Test bias generally refers to elements of a test that are construct irrelevant and that systematically cause different levels or patterns of performance by members of various groups. The UNIT was conceptualized, designed, and developed to optimize fairness for individuals varying in age, sex, race, ethnicity, language, and nationality. Moreover, the UNIT was designed to provide a fair assessment of intelligence for individuals with hearing impairments, language disabilities, and color-vision deficiencies. Demonstrating the UNIT's fairness required examining the internal characteristics of the test as well as the test's relationship to relevant external variables. Situational characteristics that may contribute to test bias were also systematically evaluated and minimized. The extent to which the UNIT is free from bias rests on theoretical, statistical, and practical aspects of its development and application. This chapter describes the multiple approaches and procedures undertaken during UNIT development to ensure fairness and to minimize bias.

Test Theory and Fairness
The UNIT was formulated with an underlying model of fairness positing five core concepts: (a) a language-free intelligence test is less susceptible to bias than a language-loaded test; (b) an intelligence test with multiple indexes of ability is fairer than one that assesses a single dimension of ability; (c) an intelligence test that minimizes the impact of previously acquired knowledge (crystallized intelligence) in the assessment of cognitive ability is fairer than one that does not; (d) an intelligence test that places minimal emphasis on speeded performance is fairer than one with greater emphasis on speed; and (e) an intelligence test with varied response modes is motivating and thereby less biased than a test with a unidimensional response mode. Each of these concepts has a rich theoretical and applied background, which is summarized in the following sections.

First, the UNIT was formulated to reduce the bias problems inherent in all language-dependent tests. Individuals who are not native speakers of English may be disadvantaged by English-language procedures (e.g., Duran, 1989; Geisinger, 1992; Oakland & Parmelee, 1985). Individuals who are speech and language impaired or hearing impaired may be disadvantaged by testing procedures in any language (e.g., Braden, 1994). The UNIT requires no receptive or expressive language on the part of the examinee. The UNIT's use of universal gestures with cross-cultural applicability, in conjunction with completely nonverbal administration, reduces bias related to language. Moreover, for separate samples of Hispanic examinees and of individuals who are bilingually educated or enrolled in ESL classes, performance on the UNIT scales differs minimally from that of a corresponding White, English-speaking sample. Standardized effect-size differences (Cohen's d) were no greater than 0.20 and 0.40, respectively, in these studies. Differences of this magnitude are considered small (Cohen, 1987).

Second, the UNIT includes indexes of short-term memory that may be considered less culturally determined than many other indexes of intelligence. Jensen (1968, 1974, 1980) noted that differences in performance attributable to race are smaller on tasks involving rote learning and short-term memory than on tasks involving abstract conceptual ability. From these findings, Jensen proposed two classes of abilities that are differentially biased. One level involves the registration and consolidation of stimulus inputs and the formation of simple associations, with little transformation from the input to the response output.
The other level, by contrast, involves self-initiated, conscious elaboration and transformation of input before an overt response is generated. The crucial distinction between the two classes of abilities is the complexity of the transformations and mental manipulations required to perform cognitive tasks (Jensen, 1980). In this model, tasks requiring few transformations and manipulations are considered less culturally influenced than tasks requiring many. The UNIT's memory subtests, which are complex but require minimal mental transformations and manipulations, are assumed to enhance its fairness. This assumption was supported by bias studies (see Tables 6.4 and 6.7): mean-score differences on the Memory Quotient between U.S. Hispanic and non-Hispanic examinees, and between African American and White examinees, were consistently smaller than those on the Reasoning Quotient.

Third, the UNIT was designed to assess abilities, particularly fluid reasoning and memory, that are less influenced by educational opportunity and are therefore fairer than tasks reliant on previously acquired knowledge. Educational achievement tends to be more strongly associated with knowledge of the dominant culture than with fluid reasoning ability (e.g., Horn & Noll, 1997). Accordingly, tests such as the UNIT that minimize reliance on previously acquired knowledge may be fairer measures of ability, freer from the consequences of varying educational achievement and unequal educational opportunities.

Fourth, the UNIT places comparatively little emphasis on time as part of the examinee's response and ultimate score. Timed testing has been implicated as a potential source of test bias, although its biasing effects tend to be small (Jensen, 1980). Reduced reliance on timed performance has been recognized as important (e.g., Wechsler, …), principally because intelligent behavior emphasizes accuracy over speed. Purely speeded tasks consist of items so easy that nearly all examinees would obtain a perfect score if given enough time; such tasks, however, place individuals with certain exceptionalities (e.g., motor impairments) or cultural backgrounds at an unfair disadvantage (Knapp, 1960).

Finally, the UNIT was developed to engage and interest examinees through varied test stimuli and varied response modes. If a test is engaging enough to minimize the rapport problems with minority and special populations that are sometimes reported in standardized testing, it may reduce situational test bias.

Internal Test Characteristics
The internal characteristics of a test include the content, statistical properties, and structure of its items, subtests, and composite scales. The fairness of these internal characteristics can be evaluated by multiple methods, including expert reviews of item, subtest, and scale content; statistical analyses of item, subtest, and scale fairness; comparative analyses of group mean performance by sex, race, ethnicity, and language; and comparison of factor structure across groups.

Test Content and Procedures
The UNIT was designed to be as fair as possible with respect to situational variables, including instructional sets, task composition, and response modes.
Situational test-session variables may be a source of test bias if the directions for test administration or the rapport between examiner and examinee systematically interact with the examinee's group membership (Braden, …). Commonly cited examples include White examiners who insufficiently engage African American or Hispanic American children (Sattler, 1988) and hearing examiners who inadequately motivate hearing-impaired children (Braden, 1994). Other situational sources of bias in the UNIT were minimized through nonverbal instructional sets, highly varied and engaging items and procedures, and unbiased response modes. The UNIT test instructions were developed to be equally intelligible to all individuals, independent of their group membership.

In other words, all individuals taking the UNIT should have an equal understanding of what is being asked of them. To the extent that a test itself can interest and engage individuals of varying groups, the testing situation as a source of bias can be minimized. The UNIT incorporates a varied array of tasks that change frequently and are thereby interesting, engaging, and motivating for individuals of diverse backgrounds and group memberships. Varied response modes are also intended to encourage the application of varied problem-solving strategies. UNIT tasks include paper-and-pencil constructional responses, fine motor manipulation and movement of response chips, and simple pointing responses. This diversity of responses may reduce sources of test-situation bias compared with a test that has a unidimensional response mode (e.g., pointing to the correct response only).

Expert Review of Item and Subtest Content
At every stage of development, the content of UNIT items and procedures was reviewed for potential bias. Reviews were initially conducted by the authors and the test development staff. During two additional phases of development (i.e., the item pilot/tryout studies and national standardization), psychologist consultants representing diverse cultural, ethnic, and racial backgrounds also reviewed the content of items and procedures for bias. These bias consultants represented the perspectives of female and male respondents, African Americans, Asian Americans, Hispanic Americans, Native Americans, and deaf and hearing-impaired individuals. During the item pilot/tryout studies, six individuals with postgraduate degrees were asked to review individual items for bias. The reviewers included African American, Native American, Hispanic, and White individuals.
During test standardization, a panel of reviewers was asked to identify any test items, artwork, and manipulable materials that might be offensive or that contained bias related to culture, race, age, sex, disability, or any other characteristic. Items, artwork, and manipulables were modified or eliminated when they were consensually perceived (i.e., by more than one reviewer) as containing bias. Additionally, an optometrist with expertise in color-vision deficiencies reviewed the stimulus materials and presented them to individuals with the most common types of color-vision deficiency. These individuals were able to discriminate accurately the colors presented in the stimuli. On the basis of this review, the UNIT was judged to be fair for examinees with color-vision deficiencies.

Analysis of Item Fairness
A number of statistical analyses of differential item functioning were conducted to ensure that items functioned similarly across sex, race, ethnicity, and language groups. Among the indexes examined were item characteristic curves, item performance by ability, and correlations between item performance and demographic variables. Items were studied individually within separate groups (Hispanic, African American, male, and female respondents, and individuals with limited English proficiency) to assess the adequacy of fit to expected item properties.

In item response theory, the acceptability of an item is based on its characteristic curve, that is, the relationship between individual ability and the probability of passing the item, which reflects item difficulty. Item characteristic curves are models based on two assumptions: a more able person should always have a greater probability of success on any item than a less able person, and any person should always be more likely to succeed on an easier item than on a more difficult one (Wright & Stone, 1979). Item-fit statistics from the entire standardization sample were used as one criterion for identifying items to be discarded or modified. As previously reported, every retained item in the UNIT demonstrated reasonable fit to the expected model. This criterion was extended to determine whether consistent differences in fit occurred in any demographic or special population: if an item fits poorly for members of a specific demographic group but adequately for the entire standardization sample, its properties may differ for that group. Only one item in one group (Item 17 in Analogic Reasoning, for Hispanics) showed significantly poor fit statistics. Careful review of this item by the bias reviewers and analyses by the Mantel-Haenszel method did not support evidence of bias, so the item was retained.

Performance across groups on UNIT items was also evaluated with the Mantel-Haenszel statistic for differential item functioning, which assesses the similarity of item functioning across sex, race, ethnicity, and language groups.
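
The Mantel-Haenszel statistic can be illustrated with a minimal Python sketch. It assumes dichotomously scored items and examinees stratified by total score, with a 2 x 2 table (group by pass/fail) at each score level; the `Stratum` class, the function names, and the counts in the example are hypothetical and are not the procedure or data used for the UNIT. The sketch computes the common odds ratio alpha_MH, its conventional ETS delta-scale transformation (MH D-DIF = -2.35 ln alpha_MH), and the continuity-corrected Mantel-Haenszel chi-square with one degree of freedom, then applies the flagging rule this chapter states (p < .001 and an absolute delta greater than 1.5).

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Stratum:
    """Hypothetical 2 x 2 table for one total-score level."""
    ref_pass: int  # A: reference group, item passed
    ref_fail: int  # B: reference group, item failed
    foc_pass: int  # C: focal group, item passed
    foc_fail: int  # D: focal group, item failed


def mantel_haenszel(strata: Iterable[Stratum]) -> Tuple[float, float, float, float]:
    """Return (alpha_MH, MH D-DIF, MH chi-square, p-value) pooled over strata."""
    num = den = 0.0                 # numerator/denominator of the common odds ratio
    sum_a = sum_ea = sum_var = 0.0  # observed A, expected A, variance of A
    for s in strata:
        a, b, c, d = s.ref_pass, s.ref_fail, s.foc_pass, s.foc_fail
        n = a + b + c + d
        if n < 2:
            continue  # too few examinees at this score level to contribute
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d
        m_pass, m_fail = a + c, b + d
        sum_a += a
        sum_ea += n_ref * m_pass / n
        sum_var += n_ref * n_foc * m_pass * m_fail / (n * n * (n - 1))
    alpha = num / den
    delta = -2.35 * math.log(alpha)  # ETS delta metric; negative favors the reference group
    chi2 = max(0.0, abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var  # continuity-corrected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail p for chi-square with 1 df
    return alpha, delta, chi2, p


def flag_dif(delta: float, p: float, alpha_level: float = 0.001, cut: float = 1.5) -> bool:
    """Rule stated in this chapter: significant chi-square and |delta| > 1.5."""
    return p < alpha_level and abs(delta) > cut


# Illustrative (made-up) counts for two score levels.
alpha, delta, chi2, p = mantel_haenszel([Stratum(45, 5, 25, 25), Stratum(35, 15, 15, 35)])
print(f"alpha={alpha:.2f} delta={delta:.2f} chi2={chi2:.2f} p={p:.2g} flagged={flag_dif(delta, p)}")
```

With this sign convention, a negative MH D-DIF indicates an item that is relatively harder for the focal group than for the reference group at the same total score, and a positive value indicates an item that favors the focal group. The full ETS A/B/C scheme (Dorans & Holland, 1993) adds further conditions that are not coded here.
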
The Mantel-Haenszel procedure detects whether groups of comparable ability (i.e., members matched on total score) perform comparably on individual items. Following Educational Testing Service classification rules (Dorans & Holland, 1993), an item was considered to exhibit differential item functioning if the chi-square was statistically significant (p < .001) and the absolute delta value exceeded 1.5. Items were analyzed for the following pairs of groups: male and female respondents, African Americans and Whites, Asian Americans and Whites, Native Americans and Whites, Hispanic and non-Hispanic individuals, and hearing-impaired and non-hearing-impaired individuals. These analyses flagged only two items, each for a single pair of groups (Asian Americans and Whites for one item, and Native Americans and Whites for the other). Because these differences were small, occurred near the extremes of difficulty, and favored the minority group, both items were retained.

UNIT items were also screened for bias by showing that performance was not related to sex, race, or ethnicity at statistically significant or meaningful levels when ability level was controlled. These analyses used partial correlations between success on individual items and sex, race, and ethnicity, with overall subtest performance controlled. With the total subtest score held constant, group membership never accounted for much more than 2.25% of the performance variance on an item (equivalent to a partial correlation of about .15). Accordingly, item performance on the UNIT is not meaningfully related to sex, race, or ethnicity.

Analyses of Subtest and Scale Fairness
Three approaches were taken to demonstrate subtest and scale fairness: (a) demonstration of similar measurement precision (reliability) for groups defined by sex, race, and ethnicity; (b) demonstration of similar internal (factor) structure for those groups; and (c) comparison of actual levels of performance between groups with socioeconomic status and demographic characteristics controlled.

Reliability
The UNIT was designed to have comparable score reliability and precision for all groups. Without verification of comparable measurement accuracy, however, one cannot assume that the reliability coefficients of a test's scores for a normative population are the same as the reliability coefficients of that test's scores for diverse groups. Accordingly, reliability coefficients were calculated separately for female and male examinees, African Americans, and Hispanics. Subtest reliability was estimated with the split-half method, with coefficients corrected by the Spearman-Brown formula. All scale reliability coefficients were calculated with the formula for the reliability of linear combinations (Nunnally, 1978). Table 6.1 presents the split-half reliability coefficients with Spearman-Brown corrections for these groups. Because cases were sampled from a broad age range, several ages within each group were effectively examined; therefore, reliability coefficients corrected with Gulliksen's (1987) formula for restriction or expansion of range are also reported.
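
The three reliability computations named in this section can be sketched in a few lines of Python. The sketch assumes (a) the usual Spearman-Brown formula for stepping up a half-test correlation, (b) Gulliksen's correction in its standard form, which assumes that the error variance is the same in the sample and in the group of interest, and (c) Nunnally's formula applied to an unweighted sum of subtest scores. The function names are hypothetical, and treating UNIT scales as unit-weighted sums is an assumption made here for illustration, not something this section states.

```python
def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Predicted reliability of a test k times as long; k = 2 steps up a split-half r."""
    return k * r_half / (1 + (k - 1) * r_half)


def gulliksen_adjust(r: float, sd_sample: float, sd_target: float) -> float:
    """Reliability in a group with SD sd_target, assuming the error variance
    sd_sample**2 * (1 - r) observed in the sample is unchanged."""
    return 1 - (sd_sample ** 2 / sd_target ** 2) * (1 - r)


def composite_reliability(sds, reliabilities, corr) -> float:
    """Reliability of an unweighted sum of components:
    1 - (sum of component error variances) / (variance of the sum).
    corr is the full correlation matrix with 1s on the diagonal."""
    k = len(sds)
    var_total = sum(corr[i][j] * sds[i] * sds[j] for i in range(k) for j in range(k))
    var_error = sum(sd ** 2 * (1 - r) for sd, r in zip(sds, reliabilities))
    return 1 - var_error / var_total


# Illustrative values only (not UNIT data).
print(round(spearman_brown(0.80), 3))            # split-half .80 stepped up to full length
print(round(gulliksen_adjust(0.90, 12, 15), 3))  # sample SD 12 re-expressed at SD 15
print(round(composite_reliability([3, 3], [0.80, 0.80], [[1, 0.5], [0.5, 1]]), 3))
```

For example, a half-test correlation of .80 steps up to about .89, and a reliability of .90 observed in a sample with an SD of 12 corresponds to about .94 in a population with an SD of 15; a narrower target SD lowers the adjusted coefficient instead.
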
Results indicate that the UNIT subtests and scales are consistently reliable across sex, race, and ethnicity and that all reliabilities meet Bracken's (1987) standards.

Construct Validity
Just as measurement precision must be comparable across groups, so must construct validity. The UNIT's construct validity across sex, racial, and ethnic groups was examined through a series of confirmatory factor analyses conducted separately for female and male examinees, African Americans, and Hispanics from the standardization sample. The methods described in Chapter 5 were used: all six UNIT subtests were analyzed, and three plausible, potentially falsifiable models (a one-factor model, a two-factor memory-reasoning model, and a two-factor symbolic-nonsymbolic model) were compared according to multiple fit statistics. The results, presented in Table 6.2, support the interpretation of a single general intelligence factor as well as the primary and secondary scales. All three models showed fully adequate fit to the data, although the two-factor memory-reasoning model fit slightly better. This investigation provides initial evidence supporting the construct validity of the UNIT across sex, racial, and ethnic groups.

Group Comparison Studies
Some test users consider mean score differences between groups an index of fairness. The underlying assumption is that groups should show equal ability and that, if they do not, the test is biased against the group (or groups) obtaining the lower score(s). Others do not consider mean differences de facto evidence of bias. For example, Jensen (1980) called the belief that unequal mean scores automatically indicate bias the "egalitarian fallacy," which is "the gratuitous assumption that all human populations are essentially identical or equal in whatever trait or ability the tests purport to measure" (p. 370). He noted that there is no a priori reason to expect groups that differ by sex, race, or ethnicity to obtain the same mean score on intelligence tests (or on a host of other human variables, such as height, weight, muscle mass, and gregariousness). In fact, Suzuki and Valencia (1997) concluded that racial/ethnic IQ differences are among the most thoroughly documented findings in psychology. They reported the following mean IQs as generally accepted by experts in the field: Whites, 100; Native Americans, 90; African Americans, 85; Hispanics, somewhere between Whites and African Americans; and Asians, somewhere above 100. Suzuki and Valencia addressed the complexities of understanding racial/ethnic differences and concluded that focusing primarily on group differences can lead to misconceptions, particularly because within-group differences exceed between-group differences.

Clearly, the use of mean-score differences as an index of fairness is controversial. Because many test users are interested in examining such differences, however, data for a number of group comparisons are presented. Examiners are advised to interpret mean-score differences with caution and with the understanding that many influences interact to produce them. Mean differences in intellectual ability can best be determined by comparing groups that differ only on that variable, with other variables, such as socioeconomic status and demographic characteristics, held constant.

Performance on the UNIT subtests and scales was compared for members of diverse groups and demographically matched participants (matched on age, sex, and parent education level) drawn from the entire pool of examinees, including those not part of the final normative sample. All examinees receiving special education services were excluded from these studies except where noted (i.e., individuals with hearing impairments and individuals with limited English proficiency). The following groups were compared: male and female examinees; African Americans and Whites; Asian Americans/Pacific Islanders and Whites; Hispanics and non-Hispanics; Native Americans and Whites; bilingual and ESL individuals and White non-Hispanics; individuals with and without hearing impairments; and Ecuadorian examinees and English-speaking examinees living in the United States. Group mean scores, standard deviations, score differences, and effect sizes are presented for each comparison. Cohen's d was again selected as the index of effect size.

Female and Male Examinees
Performance by separate samples of 1,159 female and 1,159 male examinees from the UNIT standardization sample was compared.
The samples were matched on age (M = 10.6 years, SD = 3.6 years for both groups), race (for both groups: White, 78%; African American, 12%; and Other, 10%), ethnicity (Hispanic, 6%, and non-Hispanic, 94%, for female examinees; Hispanic, 8%, and non-Hispanic, 92%, for male examinees), and parent education level (for both groups: <HS, 14%; HS, 30%; Some College, 23%; and ≥4 Years College, 33%). The performance results for both groups are presented in Table 6.3. As the data in Table 6.3 show, mean FSIQs for male and female examinees are essentially identical across the Abbreviated, Standard, and Extended batteries; in all cases, the mean differences are less than one-half point. Small differences are apparent at the scale level. Mean Reasoning and Nonsymbolic quotients are slightly higher for male examinees than for female examinees; conversely, mean Memory and Symbolic quotients are slightly higher for female examinees. In all cases, however, the differences between quotients are negligible, ranging in magnitude from 1.29 to …. These differences are consistent with some of the literature recently reviewed by Halpern (1997), which suggests that female respondents score higher on tasks requiring certain language skills (e.g., phonological awareness, semantic understanding from long-term memory, and comprehension of complex information), whereas male respondents score higher on tasks requiring transformations in visual-spatial working memory and on tasks requiring spatial-temporal reasoning and fluid reasoning.

African Americans Study
A sample of 352 African American examinees in regular educational settings (i.e., not receiving special educational services) was selected from the standardization sample. The mean age of the 174 female and 178 male examinees was 10.0 years (SD = 3.5). The sample consisted of 4 Hispanics and 348 non-Hispanics. By parent education level, the sample had the following composition: <HS, 27%; HS, 33%; Some College, 25%; and ≥4 Years College, 15%. Members of this sample were matched on a case-by-case basis by age, sex, ethnicity, and parent education level to a sample of White examinees also drawn from the standardization sample. The performance results for both groups are presented in Table 6.4.
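
The manual does not specify the algorithm used for case-by-case matching, so the following Python sketch is only one plausible reading: each focal-group examinee is paired, without replacement, with the unused comparison examinee who has the same sex, ethnicity, and parent education category and the closest age within a tolerance. The `Examinee` fields, the greedy strategy, and the `max_age_gap` default of 0.5 years are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Examinee:
    case_id: str
    age: float      # years
    sex: str
    ethnicity: str
    parent_ed: str  # parent education category, e.g. "<HS", "HS"


def match_case_by_case(focal: List[Examinee], pool: List[Examinee],
                       max_age_gap: float = 0.5) -> Dict[str, str]:
    """Greedy one-to-one matching without replacement.

    Each focal examinee is paired with the unused pool examinee who has the
    same sex, ethnicity, and parent education and the closest age, provided
    the age gap is at most max_age_gap years. Focal cases with no eligible
    partner are left unmatched.
    """
    used = set()
    matches: Dict[str, str] = {}
    for person in focal:
        key = (person.sex, person.ethnicity, person.parent_ed)
        best: Optional[Tuple[float, Examinee]] = None
        for cand in pool:
            if cand.case_id in used or (cand.sex, cand.ethnicity, cand.parent_ed) != key:
                continue
            gap = abs(cand.age - person.age)
            if gap <= max_age_gap and (best is None or gap < best[0]):
                best = (gap, cand)
        if best is not None:
            used.add(best[1].case_id)
            matches[person.case_id] = best[1].case_id
    return matches


# Made-up examinees for illustration.
pool = [Examinee("p1", 10.4, "F", "non-Hispanic", "HS"),
        Examinee("p2", 10.1, "F", "non-Hispanic", "HS")]
print(match_case_by_case([Examinee("f1", 10.0, "F", "non-Hispanic", "HS")], pool))  # {'f1': 'p2'}
```

A greedy pass like this depends on the order in which focal cases are processed; an optimal-matching routine would instead minimize the total age discrepancy across all pairs.
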

The mean-score differences between the African American sample and the comparison sample on global scores across the three UNIT batteries ranged from 7.13 (Symbolic Quotient, Standard Battery) to 9.77 (FSIQ, Extended Battery). The mean FSIQ differences were 7.63, 8.63, and 9.77 for the Abbreviated, Standard, and Extended batteries, respectively. Notably, differences on the Standard Battery are smaller than those on the Extended Battery. Although all differences favor White examinees, the differences between global scores are smaller than the 15 points often reported in the literature (see Jensen, 1980; Suzuki & Valencia, 1997); in fact, the UNIT mean-score differences are roughly one-third to one-half smaller than those typically reported. This reduction in racial differences is likely due to a variety of influences. One possible explanation comes from the literature on the relationship between socioeconomic status and intellectual functioning: Suzuki and Valencia (1997) noted that when socioeconomic status is controlled, the observed White-minority difference remains but is reduced. In addition, the UNIT tasks were developed to reduce influences (e.g., emphasis on speed, culturally loaded language content) assumed to depress the performance of minority examinees.

Asian Americans/Pacific Islanders Study
A sample of 49 Asian Americans/Pacific Islanders in regular educational settings (i.e., not receiving special educational services) was selected from the standardization sample. The mean age of the 25 female and 24 male examinees in this sample was 10.4 years (SD = 3.3). All members of the sample were non-Hispanic. Parent education level was very high: <HS, 2.3%; HS, 4.5%; Some College, 20.5%; and ≥4 Years College, 72.7%. Members of this sample were matched on a case-by-case basis by age, sex, and parent education level to a sample of White examinees also drawn from the standardization sample. The performance results for both groups are presented in Table 6.5.

Consistent with the conclusions of Suzuki and Valencia (1997), Asian Americans/Pacific Islanders consistently scored higher on the UNIT scales than did the comparison sample. Their mean FSIQs exceeded those of the comparison sample by 7.31 (Abbreviated Battery), 9.41 (Standard Battery), and … (Extended Battery) points. These differences are slightly greater than those typically found on language-loaded intelligence tests but are consistent with the differences found on visual-spatial tests (Stanley, Feng, & Zhu, 1989; Vernon, 1982).

Native Americans Study
The UNIT was administered to 18 female and 16 male Native Americans aged 6 to 16 years (M = 13.6, SD = 3.1). Parent education level was primarily at the high school level and above: <HS, 9%; HS, 30%; Some College, 24%; and ≥4 Years College, 36%. The Native American sample was matched on age, sex, and parent education level to a sample of White examinees also drawn from the standardization sample. Performance results for the two groups are presented in Table 6.6.
The mean-score differences between the Native American sample and the matched comparison sample ranged from 3.26 (FSIQ, Abbreviated Battery) to 7.35 (Reasoning Quotient, Standard Battery). The mean FSIQ differences for the Abbreviated, Standard, and Extended batteries were 3.26, 6.50, and 6.24, respectively. All of these differences favored the matched comparison sample but are smaller than the differences (i.e., 10 points) often cited in the literature. The UNIT tasks were developed to reduce minority-nonminority performance differences that stem from reliance on culturally laden language, emphasis on speeded performance, and similar factors. Also, the socioeconomic status of the Native American sample was equated with that of the comparison group. Notably, on both the Standard and Extended batteries, the Symbolic Quotient difference was larger than the Nonsymbolic Quotient difference. This pattern was expected in view of the greater reliance on symbolic mediation in the symbolic component of the UNIT.

Hispanics Study
A sample of 194 Hispanics in regular educational settings (i.e., not receiving special educational services) was selected from the standardization sample. The mean age of the 92 female and 102 male examinees in this sample was 10.4 years (SD = 3.5). Parent education level was primarily at the high school level and below: <HS, 38%; HS, 31%; Some College, 17%; and ≥4 Years College, 14%. Members of this sample were matched on a case-by-case basis by age, sex, and parent education level to a sample of non-Hispanic examinees also drawn from the standardization sample. The performance results for both groups are presented in Table 6.7.

The mean-score differences between Hispanic and non-Hispanic examinees on all global scores across the three batteries were very small, ranging from 0.14 to …. The mean FSIQ differences were 2.00, 2.13, and 1.43 for the Abbreviated, Standard, and Extended batteries, respectively. Of the 11 differences, 9 favored the non-Hispanic examinees. Overall, the differences were very small and are smaller than the performance differences between Hispanic and non-Hispanic examinees reported in the literature. A variety of influences may operate simultaneously to reduce the UNIT differences (e.g., equating socioeconomic status and reducing the influences of language and speed). Notably, the differences were larger for the Symbolic Quotient than for the Nonsymbolic Quotient on both the Standard and Extended batteries. This pattern would be expected given the heavier reliance on symbolic mediation in the symbolic component of the UNIT.

Bilingual and ESL Examinees Study
The UNIT was administered to a sample of 78 examinees who are native Spanish speakers and whose English proficiency was either limited (bilingual) or high (ESL) according to formal assessments with instruments such as the Language Assessment Scales (DeAvila & Duncan, 1987). Half (50%) of the members of this sample were born outside the United States, and members had lived in the United States for an average of 4.13 years (SD = 2.4). The mean age of the 39 female and 39 male examinees in this sample was 11.0 years (SD = 2.1). Parent education level was primarily at the high school level and below: <HS, 53%; HS, 37%; and Some College, 10%. Members of this sample were matched on a case-by-case basis by age, sex, and parent education level to a sample of White, non-Hispanic examinees drawn from the standardization sample. The performance results for both groups are presented in Table 6.8.

The mean-score differences between the native Spanish speakers and the matched comparison group ranged from 0.54 (Nonsymbolic Quotient, Extended Battery) to 6.04 (Symbolic Quotient, Extended Battery). The mean FSIQ differences were 2.82, 3.56, and 3.73 for the Abbreviated, Standard, and Extended batteries, respectively. Although 10 of the 11 differences favored the matched comparison group, these differences are relatively minor. Notably, this Spanish-speaking sample obtained dramatically lower scores on the Spanish-language Batería-R (Woodcock & Muñoz-Sandoval, 1996) than on the nonverbal UNIT (see Chapter 5). The UNIT tasks were developed to reduce the effects of culture (e.g., by de-emphasizing language and speed); in addition, the Spanish-speaking sample was matched on relevant variables, such as age, sex, and parent education level. The influence of reduced language effects can be inferred from the pattern of UNIT scores: relative to the Nonsymbolic Quotient difference, the Symbolic Quotient difference is very large on both the Standard and Extended batteries. Although no language is required to administer or respond to the UNIT, some symbolic mediation is necessary to complete UNIT subtests, and the symbolic component (Symbolic Quotient) requires more symbolic mediation than does the nonsymbolic component (Nonsymbolic Quotient). As Greenfield (1997) noted, the cultural effects of language can be minimized but never eliminated from tests.

The small magnitude of these differences is important because state departments of education frequently recommend language-reduced tests for evaluating the cognitive abilities of ESL students. For example, the Tennessee State Department of Education's (1993) Student Evaluation Manual states, "Tests that do not require verbal comprehension of directions or verbal expression of responses are generally considered to be preferable for use with LEP students" (p. …). The acronym LEP ("limited English proficient") appears in the language of the Bilingual Education Act, reauthorized in 1988 (Public Law …). A student is considered limited English proficient if he or she has sufficient difficulty reading, writing, or understanding the English language because he or she is (a) a student who was born outside the United States or whose native language is not English, (b) a student who comes from an environment where a language other than English is dominant, or (c) a student who is an American Indian or Alaskan native and comes from an environment where a language other than English has had a significant impact on his or her level of English language proficiency.

Deaf and Hearing-Impaired Examinees Study
The term "hearing-impaired" usually includes individuals who are hard of hearing or deaf. "Hard of hearing" describes individuals with mild to moderate hearing losses who retain sufficient residual hearing for communication through spoken language. "Deaf" is the term preferred by deaf people to refer to individuals with severe hearing losses who use sign language as their primary means of communication. For this study, a sample of 106 deaf and hearing-impaired individuals receiving special services was selected. Deaf examinees constituted the majority of this sample; only a few participants had moderate hearing losses. All members of this sample attended a school for deaf or hearing-impaired students, which required one or more of the following criteria for enrollment:
- Inability to communicate effectively due to hearing impairments
- Delayed language development due to hearing impairments
- Inability to perform academically at a level commensurate with the expected level because of hearing problems

The mean age of the 60 female and 46 male examinees in this sample was 10.7 years (SD = 3.3).
The sample had the following racial/ethnic composition: 85 White, 15 African American, and 6 Other; and 7 Hispanic and 99 non-Hispanic. Parent education level was primarily at the high school level (< HS, 18%; HS, 45%; Some College, 17%; and 4 Years College, 20%). Members of this sample were matched according to age, sex, race, ethnicity, and parent education level on a case-by-case basis to a sample of non-hearing-impaired examinees drawn from the standardization sample. The performance results of the two groups are presented in Table 6.9. The mean UNIT score differences between the deaf and hearing-impaired sample and the demographically matched comparison group ranged from 3.59 (Abbreviated Battery) to 8.01 (FSIQ, Extended Battery). The mean FSIQ differences were 3.59, 6.20, and 8.01, respectively, for the Abbreviated, Standard, and Extended batteries. All differences favored the non-hearing-impaired examinees. In general, these differences are about one-third of a standard deviation and are considerably smaller than would be expected on language-loaded tests.
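The Spanish-speaking and the deaf and hearing-impaired studies above both pair each special-population examinee with a standardization examinee on a case-by-case basis. The manual does not describe the matching algorithm, so the following is only a minimal sketch of exact one-to-one matching without replacement. It assumes age is recorded in whole-year bands and that sex and parent education level are categorical codes; the `Examinee` record, its fields, and `match_case_by_case` are hypothetical names. Matching on race and ethnicity as well, as in the deaf and hearing-impaired study, would only require passing a different `key` function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Examinee:
    """Hypothetical record; field names are illustrative, not taken from the manual."""
    examinee_id: str
    age_years: int   # assumption: age stored as a whole-year band
    sex: str
    parent_ed: str   # e.g. "<HS", "HS", "Some College", "4 Years College"


def matching_key(person: Examinee) -> tuple:
    """Demographic variables that must agree exactly for two examinees to be paired."""
    return (person.age_years, person.sex, person.parent_ed)


def match_case_by_case(targets, pool, key=matching_key):
    """Exact 1:1 matching without replacement.

    Each target examinee is paired with a not-yet-used pool examinee whose key
    is identical. Returns (pairs, unmatched_targets).
    """
    available = defaultdict(list)
    for person in pool:
        available[key(person)].append(person)

    pairs, unmatched = [], []
    for target in targets:
        candidates = available.get(key(target))  # .get() does not create empty entries
        if candidates:
            # pop() removes the candidate so it cannot be reused for another target
            pairs.append((target, candidates.pop()))
        else:
            unmatched.append(target)
    return pairs, unmatched


targets = [Examinee("T1", 9, "F", "HS"), Examinee("T2", 10, "M", "<HS")]
pool = [
    Examinee("P1", 9, "F", "HS"),
    Examinee("P2", 9, "F", "HS"),
    Examinee("P3", 11, "M", "<HS"),
]
pairs, unmatched = match_case_by_case(targets, pool)
print([(t.examinee_id, m.examinee_id) for t, m in pairs])  # [('T1', 'P2')]
print([t.examinee_id for t in unmatched])                   # ['T2']
```

When several pool examinees qualify, this sketch simply takes the last one listed; an actual study might instead draw among eligible candidates at random, and a target with no exact match is reported rather than paired with a near match.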

Ecuadorian Examinees Study
The performance of 30 Ecuadorian examinees was used to evaluate the use of the UNIT with individuals living in countries other than the United States. The average age of the 15 female and 15 male examinees in this sample was 9.4 years (SD = 0.9). By parent education level, the sample had the following composition: < HS, 7%; HS, 40%; Some College, 20%; and 4 Years College, 33%. Members of this sample were matched according to age, sex, and parent education level to individuals from the United States drawn from the standardization sample. The U.S. comparison group was slightly younger (mean age 9.3 years, SD = 0.7), but the two groups were otherwise identical on the demographic variables. The performance results of the two groups are presented in Table. The mean UNIT score differences between the Ecuadorian sample and the demographically matched comparison group ranged from 0.67 (Nonsymbolic Quotient, Standard Battery) to (Symbolic Quotient, Standard Battery). The mean FSIQ differences were 2.73, 5.33, and 3.26, respectively, for the Abbreviated, Standard, and Extended batteries. Of the 11 differences, 9 favored the matched comparison group. Notably, the two differences in favor of the Ecuadorian examinees were on the Nonsymbolic Quotient from the Standard and Extended batteries. In contrast, the largest differences, and 9.23, occurred on the Symbolic Quotient from the Standard and Extended batteries, respectively. This pattern of scores is consistent with expectations: the symbolic portion of the UNIT requires more symbolic mediation and is assumed to be more language-loaded, so it is the component that should be more sensitive to cultural differences. Most of the score differences between the Ecuadorian examinees and the matched comparison group were small, and the differences that did occur were driven mainly by the large Symbolic Quotient differences.
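The deaf and hearing-impaired study summarizes its differences as about one-third of a standard deviation. Converting raw quotient differences into standard deviation units puts the three comparison studies on a common scale. The sketch below divides the mean FSIQ differences reported in this chapter by 15, assuming that UNIT quotients are standard scores with a standard deviation of 15. It uses that normative SD rather than the samples' own SDs, which are not given in the text, so the results are approximate.

```python
# Mean FSIQ differences between each sample and its matched comparison group,
# as reported in this chapter (magnitudes only).
REPORTED_FSIQ_DIFFERENCES = {
    "Native Spanish speakers": {"Abbreviated": 2.82, "Standard": 3.56, "Extended": 3.73},
    "Deaf and hearing-impaired": {"Abbreviated": 3.59, "Standard": 6.20, "Extended": 8.01},
    "Ecuadorian": {"Abbreviated": 2.73, "Standard": 5.33, "Extended": 3.26},
}

QUOTIENT_SD = 15.0  # assumption: UNIT quotients are scaled to a standard deviation of 15


def in_sd_units(points: float, sd: float = QUOTIENT_SD) -> float:
    """Convert a difference in quotient points into standard deviation units."""
    return points / sd


for study, by_battery in REPORTED_FSIQ_DIFFERENCES.items():
    for battery, points in by_battery.items():
        print(f"{study:27} {battery:12} {points:5.2f} points = {in_sd_units(points):.2f} SD")
```

For the deaf and hearing-impaired sample this gives roughly 0.24, 0.41, and 0.53 SD for the three batteries (an average near 0.40), the range the chapter summarizes as about one-third of a standard deviation.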

Relationship to External Variables: Prediction of Achievement
Because intelligence tests like the UNIT were originally intended to predict academic success, studies were conducted to determine whether the UNIT predicts achievement fairly and equally, independent of race or sex. A widely accepted measure of academic achievement, the WJ-R (Woodcock & Johnson, 1989/1990), was used to determine whether diverse groups of examinees with similar UNIT FSIQs would obtain equal or nearly equal average achievement scores. The performance of African American and White examinees (sample sizes were 130, 126, and 112, respectively, for the Abbreviated, Standard, and Extended batteries) and of male and female examinees (sample sizes were 135, 131, and 112, respectively, for the Abbreviated, Standard, and Extended batteries) was compared. For each comparison, the score on the WJ-R Skills cluster was predicted from the UNIT Abbreviated, Standard, or Extended FSIQ, race or sex, and the interaction of FSIQ with race or sex. The regression slopes relating FSIQ to achievement were then compared across groups. Results indicated that race and sex did not contribute significantly (p > .05) to the prediction of academic achievement. As expected, FSIQs were significantly related (p < .05) to academic achievement. Accordingly, for these groups, the UNIT predicted achievement fairly and equally, independent of race or sex.

Summary of Fairness Studies
As stated in Chapter 2, a major goal for the development of the UNIT was to ensure fairness through the use of multiple methods and analyses. This chapter has described these methods, including the implications of the UNIT's theoretical underpinnings for fairness and the use of situational variables to optimize fairness.
Several avenues of evaluating the UNIT's fairness were taken: expert bias reviews of procedures and item content; extensive analyses of differential item functioning; analyses of the comparability of the UNIT's measurement precision (reliability) and factor structure (validity) across sex, race, and ethnicity; and numerous group comparison studies. The comparison studies indicate that some group differences in intelligence do occur, at least as assessed by the UNIT, whereas for other groups no differences occur. Moreover, when group differences on the UNIT do occur, they are smaller than the differences typically reported in the literature for many language-loaded tests. The UNIT tasks were designed to reduce the influence of culture, and that goal appears to have been met.

Other noncognitive influences may also produce mean differences, and it is impossible to remove all of them. For example, groups were equated on socioeconomic status, with parent education level serving as the index of socioeconomic status. The literature suggests that parent occupational status and schooling attainment may act differently as predictors of children's intelligence in White and minority groups (Valencia & Rankin, 1986; White, 1982). Equating on additional or different measures of socioeconomic status might reduce the effect of culture even further. Finally, not all factors that affect group differences are apparent. As the knowledge base in psychological assessment increases, so will the ability of test developers to construct better measures of intelligence. The UNIT is a step in that direction.
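The prediction-of-achievement analysis described earlier in this chapter regresses WJ-R Skills scores on FSIQ, group membership (race or sex), and their interaction. The sketch below shows one common way to run such a check: a nested-model F test of whether a group intercept term and a group-by-FSIQ slope term together improve prediction beyond FSIQ alone. It is an illustration under stated assumptions, not the manual's procedure: the manual does not say whether the group terms were tested jointly or separately, the function names are hypothetical, the data are simulated rather than taken from the UNIT studies, and the code requires NumPy.

```python
import numpy as np


def residual_sum_of_squares(X: np.ndarray, y: np.ndarray) -> float:
    """Fit y on the columns of X by ordinary least squares and return the RSS."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return float(residuals @ residuals)


def group_prediction_f_test(fsiq, group, achievement):
    """Nested-model F test for group differences in the FSIQ-achievement regression.

    Reduced model: achievement ~ FSIQ (one regression line for everyone).
    Full model:    achievement ~ FSIQ + group + FSIQ x group
                   (separate intercepts and slopes for the two groups).
    `group` must be coded 0/1. Returns (F, numerator df, denominator df).
    """
    x = np.asarray(fsiq, dtype=float)
    g = np.asarray(group, dtype=float)
    y = np.asarray(achievement, dtype=float)
    ones = np.ones_like(x)

    reduced = np.column_stack([ones, x])
    full = np.column_stack([ones, x, g, x * g])

    rss_reduced = residual_sum_of_squares(reduced, y)
    rss_full = residual_sum_of_squares(full, y)
    df_num = full.shape[1] - reduced.shape[1]  # the 2 group-related terms
    df_den = y.size - full.shape[1]
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den


# Simulated data only: 130 examinees split into two groups coded 0/1.
rng = np.random.default_rng(0)
n = 130
fsiq = rng.normal(100, 15, n)
group = rng.integers(0, 2, n)
same_line = 20 + 0.8 * fsiq + rng.normal(0, 10, n)  # one regression line for both groups
shifted = same_line + 15 * group                     # group 1 scores 15 points higher at every FSIQ

print(group_prediction_f_test(fsiq, group, same_line))
print(group_prediction_f_test(fsiq, group, shifted))
```

The returned F is referred to an F distribution with (2, n - 4) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, df_num, df_den)` gives the p value. A small, nonsignificant F corresponds to the chapter's finding that race and sex added nothing to prediction, whereas the shifted simulation produces a large F because the two groups need different regression lines.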


More information

Psych 1Chapter 2 Overview

Psych 1Chapter 2 Overview Psych 1Chapter 2 Overview After studying this chapter, you should be able to answer the following questions: 1) What are five characteristics of an ideal scientist? 2) What are the defining elements of

More information

The Relationship Between Clinical Diagnosis and Length of Treatment. Beth Simpson-Cullor. Senior Field Research Project. Social Work Department

The Relationship Between Clinical Diagnosis and Length of Treatment. Beth Simpson-Cullor. Senior Field Research Project. Social Work Department 1 The Relationship Between Clinical Diagnosis and Length of Treatment Beth Simpson-Cullor Senior Field Research Project Social Work Department University of Tennessee at Chattanooga 2 Abstract Clinicians

More information

Chapter 2--Norms and Basic Statistics for Testing

Chapter 2--Norms and Basic Statistics for Testing Chapter 2--Norms and Basic Statistics for Testing Student: 1. Statistical procedures that summarize and describe a series of observations are called A. inferential statistics. B. descriptive statistics.

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Chapter 11. Experimental Design: One-Way Independent Samples Design

Chapter 11. Experimental Design: One-Way Independent Samples Design 11-1 Chapter 11. Experimental Design: One-Way Independent Samples Design Advantages and Limitations Comparing Two Groups Comparing t Test to ANOVA Independent Samples t Test Independent Samples ANOVA Comparing

More information

MindmetriQ. Technical Fact Sheet. v1.0 MindmetriQ

MindmetriQ. Technical Fact Sheet. v1.0 MindmetriQ 2019 MindmetriQ Technical Fact Sheet v1.0 MindmetriQ 1 Introduction This technical fact sheet is intended to act as a summary of the research described further in the full MindmetriQ Technical Manual.

More information

English 10 Writing Assessment Results and Analysis

English 10 Writing Assessment Results and Analysis Academic Assessment English 10 Writing Assessment Results and Analysis OVERVIEW This study is part of a multi-year effort undertaken by the Department of English to develop sustainable outcomes assessment

More information

Lecture 4: Research Approaches

Lecture 4: Research Approaches Lecture 4: Research Approaches Lecture Objectives Theories in research Research design approaches ú Experimental vs. non-experimental ú Cross-sectional and longitudinal ú Descriptive approaches How to

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION Iweka Fidelis (Ph.D) Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt,

More information

Linking Assessments: Concept and History

Linking Assessments: Concept and History Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.

More information

Unit Three: Behavior and Cognition. Marshall High School Mr. Cline Psychology Unit Three AE

Unit Three: Behavior and Cognition. Marshall High School Mr. Cline Psychology Unit Three AE Unit Three: Behavior and Cognition Marshall High School Mr. Cline Psychology Unit Three AE In 1994, two American scholars published a best-selling, controversial book called The Bell Curve. * Intelligence

More information

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement

More information

Allina Health Neighborhood Health Connection

Allina Health Neighborhood Health Connection Allina Health Neighborhood Health Connection Findings from the 2016 Neighborhood Health Connection Grant Program Evaluation Survey M A Y 2 0 1 7 Prepared by: Nick Stuber 451 Lexington Parkway North Saint

More information

Reaction Time, Movement Time, and Intelligence

Reaction Time, Movement Time, and Intelligence INTELLIGENCE 3, (1979) 121-126 Reaction Time, Movement Time, and Intelligence ARTHUR R. JENSEN ELLA MUNRO University of California, Berkeley Speed of information processing is measured in terms of reaction

More information

CASE HISTORY (ADULT) Date form completed:

CASE HISTORY (ADULT) Date form completed: Mailing Address: TCU Box 297450 Fort Worth, TX 76129 MILLER SPEECH AND HEARING CLINIC TEXAS CHRISTIAN UNIVERSITY Street Address: 3305 W. Cantey Fort Worth, TX 76129 CASE HISTORY (ADULT) Date form completed:

More information

National Child Measurement Programme Changes in children s body mass index between 2006/07 and 2010/11

National Child Measurement Programme Changes in children s body mass index between 2006/07 and 2010/11 National Child Measurement Programme Changes in children s body mass index between 2006/07 and 2010/11 Delivered by NOO on behalf of the Public Health Observatories in England Published: March 2012 NOO

More information

Rajeev Raizada: Statement of research interests

Rajeev Raizada: Statement of research interests Rajeev Raizada: Statement of research interests Overall goal: explore how the structure of neural representations gives rise to behavioural abilities and disabilities There tends to be a split in the field

More information

A PARENT S GUIDE TO DEAF AND HARD OF HEARING EARLY INTERVENTION RECOMMENDATIONS

A PARENT S GUIDE TO DEAF AND HARD OF HEARING EARLY INTERVENTION RECOMMENDATIONS A PARENT S GUIDE TO DEAF AND HARD OF HEARING EARLY INTERVENTION RECOMMENDATIONS 2017 Developed by the Early Hearing Detection & Intervention Parent to Parent Committee A PARENT S GUIDE TO DEAF AND HARD

More information

Chapter 1 Chapter 1. Chapter 1 Chapter 1. Chapter 1 Chapter 1. Chapter 1 Chapter 1. Chapter 1 Chapter 1

Chapter 1 Chapter 1. Chapter 1 Chapter 1. Chapter 1 Chapter 1. Chapter 1 Chapter 1. Chapter 1 Chapter 1 psychology theory pure research applied research introspection structuralism functionalism behaviorism reinforcement Gestalt psychology a formulation of relationships underlying observed events the science

More information

CSE 258 Lecture 1.5. Web Mining and Recommender Systems. Supervised learning Regression

CSE 258 Lecture 1.5. Web Mining and Recommender Systems. Supervised learning Regression CSE 258 Lecture 1.5 Web Mining and Recommender Systems Supervised learning Regression What is supervised learning? Supervised learning is the process of trying to infer from labeled data the underlying

More information

Observation and Assessment. Narratives

Observation and Assessment. Narratives Observation and Assessment Session #4 Thursday March 02 rd, 2017 Narratives To understand a child we have to watch him at play, study him in his different moods; we cannot project upon him our own prejudices,

More information

A Message from Leiter-3 Author, Dr. Gale Roid: June 2014

A Message from Leiter-3 Author, Dr. Gale Roid: June 2014 A Message from Author, Dr. Gale Roid: June 2014 The development and standardization of a widely-used cognitive test requires several years and some very complex statistical and psychometric analyses. The

More information

IMPORTANT: Upcoming Test

IMPORTANT: Upcoming Test IMPORTANT: Upcoming Test one week from today Thursday January 29 in class, NatSci 1, at 12:00-1:50 worth 10% of course grade 40 multiple choice questions Test Yourself questions give you some idea of what

More information

Testing and Individual Differences UNIT 11

Testing and Individual Differences UNIT 11 Testing and Individual Differences UNIT 11 What is Intelligence? Understanding Shakespeare? Being able to solve mathematical equations? Development of a second or third language? Understanding how to interact

More information

Cross-Cultural Psychology Psy 420

Cross-Cultural Psychology Psy 420 Cross-Cultural Psychology Psy 420 Chapter 5 Culture and Cognition 1 Culture & Physiological Processes Old Model: physiology Psychology New Model: physiology Psychology Experience & learning alters brain

More information

Supplementary Online Content 2

Supplementary Online Content 2 Supplementary Online Content 2 Bieleninik Ł, Geretsegger M, Mössler K, et al; TIME-A Study Team. Effects of improvisational music therapy vs enhanced standard care on symptom severity among children with

More information

Chapter 2 Interactions Between Socioeconomic Status and Components of Variation in Cognitive Ability

Chapter 2 Interactions Between Socioeconomic Status and Components of Variation in Cognitive Ability Chapter 2 Interactions Between Socioeconomic Status and Components of Variation in Cognitive Ability Eric Turkheimer and Erin E. Horn In 3, our lab published a paper demonstrating that the heritability

More information

Everyday Problem Solving and Instrumental Activities of Daily Living: Support for Domain Specificity

Everyday Problem Solving and Instrumental Activities of Daily Living: Support for Domain Specificity Behav. Sci. 2013, 3, 170 191; doi:10.3390/bs3010170 Article OPEN ACCESS behavioral sciences ISSN 2076-328X www.mdpi.com/journal/behavsci Everyday Problem Solving and Instrumental Activities of Daily Living:

More information