Chapter 6 Topic 6B Test Bias and Other Controversies. The Question of Test Bias

Test bias is an objective, empirical question, not a matter of personal judgment. It is a technical concept amenable to impartial analysis. In contrast, test fairness reflects social values and philosophies of test use, particularly when test use extends to selection for privilege or employment.

The Test Bias Controversy

The test bias controversy has its origins in the observed differences in average IQ among various racial and ethnic groups (for example, the average scores of African Americans on standardized IQ tests). Group differences alone do not establish bias, however; the proof of test bias must rest on the other criteria listed below.

Criteria of Test Bias and Test Fairness

Test bias refers to objective statistical indices that examine the patterning of test scores for relevant subpopulations. In general, a test is considered biased if it is differentially valid in different subpopulations. In contrast to the narrow concept of test bias, test fairness is a broad concept that recognizes the importance of social values in test usage. Ultimately, test fairness is based on social conceptions such as one's image of a just society.

The Technical Meaning of Test Bias: A Definition

Bias is present when a test score has meanings or implications for a relevant, definable subgroup of test takers that are different from the meanings or implications for the remainder of test takers.

Bias in Content Validity

Bias in content validity is probably the most common criticism raised by those who denounce the use of standardized tests with minorities.

Content bias: An item or subscale of a test is considered to be biased in content when it is demonstrated to be relatively more difficult for members of one group than for another when general ability level is held constant.

Attempts to prove that expert-nominated items are culturally biased have not yielded the conclusive evidence that critics expect. Expert judges cannot reliably identify culturally biased test items.
In general, with respect to well-known standardized tests of ability and aptitude, research has not supported the popular belief that the specific content of test items is a source of cultural bias against minorities.

Bias in Predictive or Criterion-Related Validity

In general, an unbiased test will predict future performance equally well for persons from different subpopulations.

Criterion-related or predictive validity bias: A test is considered biased with respect to predictive validity if the inference drawn from the test score is not made with the smallest feasible random error, or if there is constant error in an inference or prediction as a function of membership in a particular group.

According to this definition, for a test to be unbiased, the results for all relevant subpopulations must cluster equally well around a single regression line, Y = bX + a, where X is the test score and Y is the predicted criterion. Higher values of b indicate a steeper slope and more accurate prediction. The value of a gives the intercept on the vertical axis. If the separate regression lines for the subpopulations are not even parallel, the test possesses a high degree of bias in criterion-related validity.

Bias in Construct Validity

Bias in construct validity: Bias exists in regard to construct validity when a test is shown to measure different hypothetical traits for one group than for another.

If a test is unbiased, comparisons across relevant subpopulations should reveal a high degree of similarity for 1) the factorial structure of the test and 2) the rank order of item difficulties within the test.

An essential criterion of nonbias is that the factor structure of test scores should remain invariant across relevant subpopulations. When test items or subscales of prominent ability and aptitude tests are factor analyzed separately in White and minority samples, the same factors emerge in the relevant subpopulations.

A second criterion of nonbias in construct validity is that the rank order of item difficulties within a test should be highly similar for relevant subpopulations. What is essential is that the items most difficult for one subgroup should also be the most difficult for the other relevant subgroups.

Reprise on Test Bias

In general, ability and aptitude tests fare quite well by the criteria of factor analysis, regression equations, intergroup comparisons of the difficulty levels for biased versus unbiased items, and rank ordering of item difficulties. There is no domain of ability or aptitude testing in which there has been cumulative evidence suggesting test bias.

Social Values and Test Fairness

Three ethical positions can be distinguished.
Unqualified Individualism

The ethical stance of unqualified individualism dictates that, without exception, the best-qualified candidates should be selected for employment, admission, or other privilege.

Quotas

The ethical stance of quotas acknowledges that many bureaucracies and educational institutions owe their very existence to the city or state in which they function. Fair share quotas are based initially upon population percentages.

Qualified Individualism

Qualified individualism means not using race, sex, and so on as a predictor, even if it were in fact scientifically valid to do so. The practical impact of qualified individualism is therefore midway between quotas and unqualified individualism.

Reprise on Test Fairness

None of the three philosophies can be called correct; at one time or another, each of the ethical stances has been championed.

Genetic and Environmental Determinants of Intelligence

Genetic Contributions to Intelligence

The genetic contribution to human characteristics such as intelligence is usually measured in terms of a heritability index that can vary from 0.0 to 1.0. The heritability index is an estimate of how much of the total variance in a given trait is due to genetic factors. It is important to stress that heritability is a population statistic that cannot be extended to explain an individual score. Furthermore, heritability for a given trait is not constant.

Environmental Effects: Impoverishment and Enrichment

The Skeels (1966) study indicates that the difference between a severely depriving early environment and a more normal one might account for perhaps 15 to 20 IQ points. In another study, findings showed that growing up in the poverty, segregation, and turmoil of the inner city imposes hardships that lead to a decline in IQ scores from the age of 6 to 11. According to the cumulative deficit hypothesis, a consistent downward trend in IQ is a result of the cumulative effects of environmental disadvantages in factors related to mental development.

So what happens when African American children are adopted into a more economically and educationally advantaged environment? One adoption study indicates that when the early environment is optimal, IQ can be boosted by perhaps 20 points.

Teratogenic Effects on Intelligence and Development

Some substances known as teratogens cross the placental barrier and cause physical deformities in the fetus.
Heavy drinking by pregnant women places their offspring at very high risk for fetal alcohol syndrome (FAS); intelligence is markedly lower in children with FAS. With lower levels of drinking, a more muted manifestation of the syndrome known as fetal alcohol effect may arise: the child has a normal physical appearance but exhibits impaired attentional capacities and is slower to respond in a reaction time paradigm.

Effects of Environmental Toxins on Intelligence

The most common environmental toxin is lead; others include mercury, manganese, arsenic, and thallium. High doses of lead are irrefutably linked to cerebral palsy, seizure disorders, blindness, mental retardation, or even death.

Can a level of absorption that is insufficient to cause obvious medical symptoms nonetheless produce a decrement in intellectual abilities? Research findings have been contradictory, but most likely lead exposure has harmful effects on the nervous system.

Origins and Trends in Racial IQ Differences

Early Studies of African-American and White IQ Differences

A discrepancy favouring Whites of about one standard deviation (15 points) has been historically reported. The existence of racial differences in IQ has been reported with such consistency that it is no longer the focus of serious dispute.

The Genetic Hypothesis for Racial Differences in IQ

The hypothesis came about when Jensen dismissed the average child concept and the social deprivation hypothesis in his article "How Much Can We Boost IQ and Scholastic Achievement?" The authors of The Bell Curve suggested that the IQ gap between Blacks and Whites had changed little during the twentieth century and argued that test bias cannot explain the race differences.

Tenability of the Genetic Hypothesis

Is the genetic hypothesis for IQ differences tenable? Three lines of criticism suggest that the answer is no. One is that the genetic hypothesis is based on the questionable assumption that evidence of IQ heritability within groups can be used to infer heritability between racial groups. Another criticism is that careful analysis of environmental factors provides sufficient explanation of race differences in IQ. A third criticism is that race as a biological entity is simply nonexistent; that is, there are no biological races.

Recent Trends in Race Differences in IQ

A recent analysis supports a significant narrowing of the racial IQ gap.

Age Changes in Intelligence

Early Cross-Sectional Research

Results indicated a rapid growth in general intelligence in childhood through age 15 or 20, followed by a slow decline to age 65.
Overlooked, however, was the influence of the methodology on the findings: cross-sectional designs confound age effects with educational disparities or other age group differences.

Sequential Studies of Intelligence

To control for age group differences, many researchers prefer a longitudinal design in which the same subjects are retested one or more times over periods of many years. There are potential pitfalls, however: 1) time of measurement, 2) selective attrition, 3) practice effects, and 4) regression to the mean. The most efficient research method for studying age changes in ability is a cross-sequential design that combines cross-sectional and longitudinal methodologies.

The Seattle Longitudinal Study (the most comprehensive cross-sequential study ever conducted) reached these conclusions: 1) some abilities decreased after the age of 50, some beginning after the age of 35; 2) those born and tested most recently performed better than those born and tested at an earlier time; 3) the longitudinal comparisons showed a tendency for mean scores either to rise slightly or to remain constant until approximately age 60 or 70. In sum, the vast majority of us show no meaningful decline in the skills measured by the Primary Mental Abilities Test until we are well into our seventies.

Age and the Fluid/Crystallized Distinction

Research shows a significant age-related decrement in fluid intelligence (reasoning and spatial thinking), whereas crystallized abilities such as comprehension and vocabulary do not decrease with age.

Generational Changes in IQ Scores

Flynn charted the comparison data from successive editions of the Stanford-Binet and the Wechsler tests from 1932 to 1981 and found that, with only one exception, each edition established a higher standard than its predecessor. This apparent rise in IQ over generations is known as the Flynn effect. Possible explanations include better nutrition, improved prenatal care, greater educational access, and increased environmental complexity. Several recent studies indicate that the Flynn effect may have abated or even reversed at the beginning of the twenty-first century.
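
Illustration: why cross-sectional designs can exaggerate age decline

The cohort confound described under Age Changes in Intelligence, together with the generational gains just discussed, can be made concrete with a small toy model. The sketch below is only an illustration under stated assumptions: the gain per birth year, the age at which true decline begins, the rate of decline, and the function names are all hypothetical values chosen for clarity, not figures from the Seattle Longitudinal Study or from Flynn's data.

```python
# Toy model of the cohort confound. Every constant below is a
# hypothetical, illustrative assumption, not an empirical estimate.
COHORT_GAIN_PER_YEAR = 0.3   # assumed generational gain per birth year
DECLINE_START_AGE = 60       # assumed age at which true decline begins
DECLINE_PER_YEAR = 0.5       # assumed true loss per year after that age


def expected_score(birth_year, age):
    """Mean score of a birth cohort at a given age under the toy model."""
    cohort_effect = COHORT_GAIN_PER_YEAR * (birth_year - 1900)
    aging_effect = -DECLINE_PER_YEAR * max(0, age - DECLINE_START_AGE)
    return 100 + cohort_effect + aging_effect


def cross_sectional(test_year, ages):
    """Different cohorts, all tested in the same calendar year."""
    return {age: expected_score(test_year - age, age) for age in ages}


def longitudinal(birth_year, ages):
    """A single cohort, retested as it grows older."""
    return {age: expected_score(birth_year, age) for age in ages}


ages = [20, 30, 40, 50, 60, 70]
cross = cross_sectional(1980, ages)   # everyone tested in 1980
longi = longitudinal(1910, ages)      # the 1910 cohort followed over time

for age in ages:
    print(f"age {age}: cross-sectional {cross[age]:.1f}, "
          f"longitudinal {longi[age]:.1f}")
```

In this toy model the cross-sectional scores fall steadily from age 20 onward, even though no individual declines before age 60; the apparent early decline comes entirely from the older groups having been born earlier. The longitudinal scores for the single cohort stay flat until 60 and drop only afterward, which mirrors the pattern the notes attribute to the longitudinal comparisons.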