Developmental Range of Reflective Judgment: The Effect of Contextual Support and Practice on Developmental Stage


Developmental Psychology 1993, Vol. 29, No. 5, 893-906 Copyright 1993 by the American Psychological Association, Inc.

Karen Strohm Kitchener, Cindy L. Lynch, Kurt W. Fischer, and Phillip K. Wood

In this study of K. W. Fischer's (1980) skill theory and the development of reflective judgment (K. S. Kitchener & P. M. King, 1981), 156 students, 14 to 28 years old, were tested. Two thirds responded to the Reflective Judgment Interview (RJI) and the Prototypic Reflective Judgment Interview (PRJI) twice, with the 2 administrations approximately 2 weeks apart. The remaining one third were tested at 2-week intervals only on the RJI. The PRJI was designed to provide support for optimal level reflective judgment responses, whereas the RJI measured functional level. Ss scored significantly higher on the PRJI than they did on the RJI at both testings, and there was a significant age effect on both measures. Age differences on the 2 measures could not be statistically accounted for by a measure of verbal ability. The PRJI data also provided evidence for spurts in development between ages 18 and 20 and between ages 23 and

In reaction to earlier claims of Inhelder and Piaget (1958) that formal operations are the pinnacle of intellectual development, the field of adult cognitive development has focused on showing that formal operations are an inadequate account of the cognitive abilities of adults (Basseches, 1984; Kitchener, 1983; Kuhn, 1989), that the tasks used to investigate formal operations have little relevance to the problems adults face (Wood, 1983), and that few adults actually score as formal operational on Piagetian tasks (King, 1986; Niemark, 1979). As a result of these apparent deficiencies, several neo-Piagetian models have been postulated (Basseches, 1984; Fischer, 1980; Kitchener & King, 1981; Richards & Commons, 1984).
However, empirical testing of these models remains in its infancy, and only a few attempts have been made to evaluate relations between models (Benack & Basseches, 1989; Commons et al., 1989; King, Kitchener, Wood, & Davison, 1989). Similarly, although environmental variables have been found to play an important role in the exhibition of cognitive skills in children (Flavell, 1985), little has been done to evaluate the role of environmental variables in adolescent and adult performance on the tasks designed to measure development according to these neo-Piagetian models. What has been done suggests that such variables cannot be ignored (Fischer & Kenny, 1986; Irwin & Sheese, 1989; Rest, 1973).

Karen Strohm Kitchener and Cindy L. Lynch, School of Education, University of Denver; Kurt W. Fischer, Graduate School of Education, Harvard University; Phillip K. Wood, Department of Psychology, University of Missouri. Earlier versions of this article were presented at the meetings of the American Educational Research Association in March 1989 and the American Psychological Association in August It is based in part on a dissertation titled The Impact of a High Support Condition on the Exhibition of Reflective Judgment (University of Denver, Denver, Colorado, 1989) by Cindy L. Lynch. We wish to express our gratitude to Cathy Kasala, Patricia M. King, Marcia Middel, and Christina Whitmire for their help in carrying out the study, to the Spencer Foundation for its financial support, and to the reviewers of this article for their insightful comments and suggestions. Correspondence concerning this article should be addressed to Karen Strohm Kitchener, School of Education, University of Denver, 2450 South Vine Street, Denver, Colorado
The primary purpose of this study was to test the predictions of one model, skill theory (Fischer, 1980), for another, reflective judgment (Kitchener & King, 1981), particularly in terms of the influence of different contextual conditions on the stage or level of performance (see Table 1). Fischer (1980; Fischer, Bullock, Rotenberg, & Raya, 1993) has argued that seven general skill levels emerge between age 2 and age 30; however, he also argued that no skill exists independent of the environment, and thus, no individual can be said to be at a single developmental level. Rather, an individual's competence will vary depending on the conditions under which it is assessed. The degree to which environmental factors (such as memory prompts, practice, and the nature of the task) support high-level performance is a primary determinant of observed variation in the range of scores (Lamborn & Fischer, 1988). One of the most powerful factors, called contextual support, involves the prompting of a skill. With such support the individual shows a relatively high developmental level, but without the support performance drops, even after a short period of time. Fischer (1980) called performance without contextual support functional level, and he noted that it shows substantial variation across domains. It marks the low end of the developmental range. Fischer (1980) contrasted functional level to optimal level, the highest level that the individual can consistently produce under conditions that provide high levels of contextual support and opportunities for practice (Lamborn & Fischer, 1988). Even with contextual support and other environmental factors supporting high performance, individuals will show an upper limit, their optimal level, beyond which their performance will not go when they are engaged in independent problem solving. That is, they will fail at all tasks that are more complex than their optimal level.
Fischer and his colleagues (Fischer & Elmendorf, 1986; Fischer, Hand, Watson, Van Parys, & Tucker, 1984; Lamborn & Fischer, 1988) have identified this developmental range in several domains in studies with children.

Table 1
Skill Levels and Stages of Reflective Judgment

Skill level | Reflective judgment stage
Level Rp1: Single representation | Stage 1: Knowing is limited to single concrete instances
Level Rp2: Representational mapping; concrete representations are coordinated with each other | Stage 2: Two categories for knowing: right answers are contrasted with wrong answers
Level Rp3: Representational system; several aspects of two concrete representations are coordinated | Stage 3: Knowledge is uncertain in some areas and certain in others
Level Rp4 = A1: Systems of representational systems, which are single abstractions | Stage 4: Concept that knowledge is unknown in several specific cases leads to abstract generalization that knowledge is uncertain
Level A2: Abstract mapping; abstractions are coordinated with each other | Stage 5: Knowledge is uncertain and must be understood within a context; thus it can be justified by arguments within those contexts
Level A3: Abstract system; several aspects of two abstractions are coordinated | Stage 6: Knowledge is uncertain but constructed by comparing and coordinating evidence and opinion on different sides of an issue
Level A4: System of abstract systems, which are single principles | Stage 7: Knowledge develops probabilistically through a process of inquiry that is generalizable across domains

Note. Rp = representation. From "A Skill Approach to the Development of Reflective Thinking" (pp ) by K. S. Kitchener and K. W. Fischer, 1990, in D. Kuhn (Ed.), Contributions to Human Development: Vol. 21. Developmental Perspectives on Teaching and Learning Thinking Skills (pp ), S. Karger AG, Basel, Switzerland. Copyright 1990 by S. Karger. Reprinted by permission.

Reflective judgment is reasoning about the basis for knowing in relation to ill-structured problem solving (Wood, 1983).
As in skill theory, the reflective judgment model postulates that seven levels, called stages, emerge between childhood and adulthood (Kitchener & King, 1981; see Table 1). In childhood, individuals view knowing as a concrete state based on direct experience. By adolescence, individuals develop an abstract understanding of knowing, suggesting that it is uncertain and dependent on a person's viewpoint. In the later stages, which are typically exhibited only by educated adults, individuals are able to integrate several abstract concepts of knowing, which allows them to move beyond the focus on uncertainty to consider using evidence and the process of inquiry to justify conclusions. These stages parallel those identified by Fischer (1980). Although this relationship has been articulated in more detail elsewhere (Kitchener & Fischer, 1990), the following examples serve as illustrations. In Stage 1 of the reflective judgment model, knowing is limited to single concrete instances, for example, "I know the cereal is in the box." This appears comparable with what Fischer describes as a single representation (Level Rp1). Similarly, in reflective judgment Stage 4, knowledge is understood as an abstract concept for the first time, although there appears to be an inability to relate an abstract concept of knowledge to other abstract concepts, for example, a concept of justification for beliefs. This is what Fischer labels Level Rp4 = A1, where single abstractions emerge. It is not until the next level in Fischer's model, Level A2, that individuals can relate abstract concepts to each other. A similar development appears in reflective judgment Stage 5. For example, the ability to justify knowledge within a context is dependent on the ability to relate the abstract concepts of knowledge and justification to each other.
In contrast to skill theory, research on the reflective judgment model has generally ignored the role of environment and has focused on documenting the existence of developmental changes (Kitchener, 1986; Kitchener & King, 1990) and the sequential appearance of those changes (Davison, King, Kitchener, & Parker, 1980; Kitchener, King, Wood, & Davison, 1989). Reflective judgment has been assessed with the Reflective Judgment Interview (RJI), which asks individuals to form a judgment about several ill-structured problems and to talk about epistemological assumptions underlying the judgment. It provides little contextual support for reflective judgment. As a result, we hypothesized in this study that under conditions of contextual support and practice, reflective judgment scores would be higher than with the RJI. The form of development for reflective judgment was a second focus of the study. Several scholars have recently documented that growth and development of specific behaviors and brain functions do not routinely fit the standard developmental models of monotonic increase but often show substantial variability, including sharp spurts, plateaus, and drops (Thatcher, 1991; van der Maas & Molenaar, 1992; van Geert, 1991). Measures tied to specific behaviors and domains often show this

variability, whereas composite measures combining many different developmental levels, behaviors, and domains typically show smooth growth functions. In other words, the complexities of growth curves for specific behaviors tend to be averaged out when measures are combined. Fischer and his colleagues (Fischer, Pipp, & Bullock, 1984) have suggested that the shapes of developmental functions for specific behaviors and domains vary systematically under different assessment conditions. In particular, optimal and functional levels show different growth functions: Optimal level shows stagelike discontinuities in development at predictable ages, whereas functional level tends to show gradual, continuous change or less predictable growth. They suggested that at each age when a new developmental level is emerging, a person shows spurts in optimal performance in a wide range of domains. Specifically, spurts occur for each of the skill levels in Table 1 when a new level emerges. If a person does not reach full performance in a skill when an initial spurt occurs, then there is likely to be a second spurt in skills at that level when the next developmental level emerges, with only slow growth in between. For both individuals and cultural groups, these spurts are distributed over a relatively narrow age range. These optimal growth functions are hypothesized to be most evident in areas in which individuals have had many opportunities for practice or regular instruction, such as arithmetic during the school years. Previous research has found support for spurts at 14 to 16 years in understanding arithmetic concepts (Fischer & Kenny, 1986), in perceived conflicts in adolescents' own personalities (Harter & Monsour, 1992), in understanding interpersonal interactions involving honesty and kindness (Fischer & Lamborn, 1989), and in understanding logical conditionality (O'Brien & Overton, 1982).
There is also evidence for spurts at some younger age periods, for example, at about 2 years (Corrigan, 1983) and at about 11 years (Moshman & Franks, 1986). For older ages, no studies have definitively tested for optimal level discontinuities, although several do suggest such changes at and years (Fischer, Hand, & Russell, 1984). The rationale for why spurts may occur at these ages has been elaborated elsewhere (Fischer, Pipp, & Bullock, 1984; van der Maas & Molenaar, 1992; van Geert, 1991).

In the present research, a new method for assessing reflective judgment was devised, called the Prototypic Reflective Judgment Interview (PRJI). This method was designed to provide strong contextual support for arguments at each stage of reflective judgment. In addition, participants were given the opportunity to practice the arguments, with some instructional guides. Because spurts have been found in certain age ranges in prior research, the following specific ages were hypothesized for discontinuities in performance under conditions of contextual support and practice: The ability to construct abstract mappings should emerge at about age 14 to 15, the ability to use abstract systems should emerge at about age 19 to 20, and the ability to construct single principles or systems of abstract systems should emerge at about age 24 to 25 (Fischer, Hand, & Russell, 1984). These skills correspond to reflective judgment Stages 5, 6, and 7, respectively (see Table 1; Kitchener & Fischer, 1990).

On the other hand, we hypothesized that functional level, as assessed by the traditional RJI, would show mostly slow, gradual increase, with some irregularity. In general, with a functional level assessment, the proportion of individuals who exhibit a skill at a given level will first rise above zero at the age when the optimal level spurt occurs and then increase slowly. Of course, many factors can influence the exact shape of the functional growth curve.
For example, when people have a strong motivation or interest in a domain, they often spontaneously function near their optimal level. Also, when they are given an opportunity to use a number of skills in the domain, a few of the skills may reflect their optimal level, even though their modal performance is not at optimum. In such cases, they may demonstrate a spurt under functional conditions because they function near optimum without contextual support. The RJI provides an opportunity to test this hypothesis about detecting optimal level under functional conditions, because it is a lengthy, open-ended interview in which the participants present many arguments. To see if optimal level performance could be detected in the RJI, we scored it not only in the usual way (mean stage of argument) but also with a new method designed to detect the highest stage that participants could use with some regularity. The question we asked was whether this score matched that of the condition specifically designed to assess optimal level (PRJI).

Thus, we designed the study to investigate the hypothesized differences in growth functions between optimal and functional levels of reflective judgment. We predicted that under conditions of contextual support and practice (PRJI), reflective judgment growth functions would show discontinuous spurts at about ages 14-15, 19-20, and 24-25. On the other hand, under more spontaneous conditions (RJI), growth was predicted to appear mostly slow and continuous. These two conditions of the study were predicted to produce a clear contrast in the two growth functions, although it was not expected that all persons in the samples would show the differences. Because the support-and-practice conditions for the study were brief interventions, we expected that the conditions probably would not be sufficient to produce optimal performance in all participants.

A final area of investigation involved potential differences between male and female subjects.
In many studies using the RJI, gender effects have not been significant (King, Kitchener, Davison, Parker, & Wood, 1983; Kitchener & King, 1981; McKinney, 1985; Welfel, 1982; Welfel & Davison, 1986). However, one study has found an effect favoring female subjects (Schmidt, 1985), whereas others have found an effect favoring male subjects (Lawson, 1980; Strange & King, 1981). Furthermore, Belenky, Clinchy, Goldberger, and Tarule (1986) have argued that women's epistemological development is unique, with different emphases than men's development. Therefore, we believed it important to ask: Are there different developmental patterns for men and women, particularly under conditions that might enhance performance?

Method

Subjects

A total of 156 subjects from ages 14 to 28 years were interviewed. Twelve individuals (6 female and 6 male subjects) at each age from 14 to

24, twelve 25- or 26-year-olds, and twelve 27- or 28-year-olds were tested. The subjects were volunteers from local middle and high schools and undergraduate and graduate student volunteers from a private university. To control for the assumed strong academic abilities of the graduate students, we drew the middle and high school students who participated in the study from the upper third of their class based on standardized achievement tests. The undergraduate students were from the upper half of the University of Denver student population based on their college entrance examination scores. During the course of the research, each participant responded to the Wechsler Intelligence Scale for Children-Revised (WISC-R) or the Wechsler Adult Intelligence Scale-Revised (WAIS-R) Vocabulary subscale to provide a standard measure of verbal ability for each participant. We used these data to determine if differences in verbal ability (a part of academic competence) could account for differences in RJI and PRJI scores.

Measures

Reflective Judgment Interview (RJI). Reflective judgment functional level was assessed through the RJI. In the standard RJI, people respond to a standard set of seven questions about four ill-structured problems from the social and physical sciences. For this study, data from only two of the four problems were used: one about the safety of chemical additives to foods and one about the building of the pyramids. These two problems were chosen because in previous research they had strong interrater reliability and agreement as well as high interproblem correlations. The rationale for the interview (Kitchener, 1986) and its psychometric properties (Mines, 1982) have been reviewed in detail elsewhere. In general, interrater reliability has been moderate to high. Test-retest reliability on four small homogeneous samples over a 3-month period has ranged from .71 to .83 (Sakalys, 1982).
Coefficient alpha, a measure of internal consistency, has ranged from .62 (Welfel, 1982) to .96 (Kitchener & King, 1981).

Interviews were transcribed verbatim, and responses to each problem were scored independently by two trained and certified raters. Raters gave stage scores to responses to each of the seven interview questions based on the rating rules developed by Kitchener and King (1985) for the RJI. Based on the scores assigned by the raters to the responses, scores were summarized into a three-digit code indicating each rater's evaluation of the stages that appeared in the responses. If only one stage was identified by a rater in the responses for that problem, that stage was used for all three digits (e.g., 5-5-5). If two stages were used in the rating, the one that was used most frequently was listed as two of the digits and the third digit indicated some evidence of another stage in the transcript (e.g., 5-5-4). For the two raters to be considered in agreement, the sums of their three-digit scores had to be within 2 points of each other. For example, scores of and would be in agreement (16 is only 2 points higher than 14); scores of and would not be in agreement (16 is 3 points higher than 13). The raters were in agreement on 78% of the first-round ratings, and interrater reliability was .78. In cases in which the raters were not in agreement on the first round of ratings, both raters rerated that transcript without knowing what score the other had assigned. If the raters still were not in adequate agreement after the second round of ratings, they conferred and reached a consensus rating. For purposes of group comparisons, we derived a mean score by summing the three digits for both raters and deriving a mean for each problem. (See Kitchener et al., 1989, for a more complete discussion of scoring.) The mean for the two problems was then averaged to arrive at an overall RJI score.
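The agreement rule and the mean-score computation described above can be illustrated with a short sketch. This is not the authors' scoring software; the function names are hypothetical, the example codes are invented for illustration, and the sketch assumes that the per-problem mean is the average of all six rater digits.

```python
from statistics import mean

def in_agreement(code_a, code_b, tolerance=2):
    """Two raters agree when the sums of their three-digit codes
    differ by no more than `tolerance` points (2 in the text)."""
    return abs(sum(code_a) - sum(code_b)) <= tolerance

def problem_mean(code_a, code_b):
    """Mean stage for one problem.
    Assumption: the six digits from both raters are averaged."""
    return mean(code_a + code_b)

def overall_rji(problem_means):
    """Overall RJI score: average of the per-problem means."""
    return mean(problem_means)

# Hypothetical three-digit codes (not taken from the study's data).
rater_1 = (5, 5, 4)   # sum = 14
rater_2 = (5, 5, 5)   # sum = 15
agree = in_agreement(rater_1, rater_2)      # sums differ by 1 point
chemicals = problem_mean(rater_1, rater_2)  # 29 / 6
```

A pair of codes whose sums are 16 and 13 fails the check, which matches the 3-point example given in the text.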
The RJI mean score was used as a measure of functional level because it represented how participants responded in a spontaneous testing condition. To compare the highest stage used on the RJI to the highest stage used with practice and support, we derived a second score, RJI highest stage. Based on the three-digit code assigned by each rater described earlier, each subject had a total of six digits for each problem (3 digits × 2 raters). Of these six digits, the highest number that was represented at least twice was operationally defined as the RJI highest stage score. For example, if the digits were and 3-3-4, or if the digits were and 4-4-3, the RJI highest stage score would be 4.

Prototypic Reflective Judgment Interview (PRJI). The PRJI was devised to be a high support condition. The general format was to present respondents with a problem-specific (pyramids and chemicals) prototypic summary statement of each reflective judgment stage from Stages 2 to 7, one at a time. Each prototypic stage summary statement addressed three major rating components of the reflective judgment model in a stage-appropriate way: (a) the extent to which one can know for sure the viewpoint is correct; (b) the basis for the point of view; and (c) an explanation of why people have different points of view about the issue. For example, the prototypic summary statement for Stage 5 from the chemicals problem was as follows:

I am on the side that chemicals in foods cause cancer, but we can never know without a doubt. There is evidence on both sides of the issue. On the one hand there is evidence relating certain chemicals to cancer, and on the other hand there is evidence that certain chemicals in foods prevent things like food poisoning. People look at the evidence differently because of their own perspective, so what they conclude is relative to their perspective.
The prototypic statements were reviewed by two experts on reflective judgment scoring who judged the statements to be adequate representations of each stage and qualitatively different. The mean reading level of the statements was seventh grade. For each prototypic statement, a participant read the statement silently while the researcher read it aloud. Then the participant was asked a series of questions to clarify his or her understanding of that particular statement. These questions focused attention on the three major rating components identified in the previous paragraph. Finally, the participant was asked to summarize the prototypic statement in his or her own words. These summaries were the basis for the person's score on the PRJI. Support was provided through the interview process itself. Participants were asked to explain good prototypic statements at each level of the model rather than to articulate responses independently, which was the task in the RJI. Furthermore, as a memory prompt, the interviewer provided a verbal introduction that informed the participants of the concepts about which they would be questioned. These concepts again were based on the three rating components, for example, whether the statement asserted that knowledge can be gained with certainty. Each of the stage prototypic statements also included a concrete example of the issue in question. The statements were presented to the respondents in developmental stage order to support their understanding of the increasing complexity of the statements. The questions that were asked prior to the final summary for each problem directed the participants' attention to key aspects of the statement that were used in scoring the summary response. Finally, as a further memory support, the interviewer provided the respondents with a list of questions to consider as they were making their summary. 
This list was the same for each stage statement of both problems and paralleled the questions asked before the summary. The questions were the following:

1. What does the person believe about this problem?
2. To what extent can this person say he/she knows for sure that his/her viewpoint is correct? or To what extent does the person believe that we can ever know for sure about this issue?
3. On what does the person base his/her point of view?
4. How does the person explain why people believe differently?

These questions were nearly identical to four of the seven questions on the RJI. For example, Question 2 on the PRJI parallels these interview questions: "Can you ever say you know your viewpoint for sure?" or "Can we ever know for sure about this issue?"

From audiotapes, trained raters scored the participants' summary responses in the PRJI for each stage. Rating rules for scoring the PRJI were drawn from the RJI rating rules and were checked by an expert not associated with the project for accuracy and consistency with the RJI rating rules. Limitations in time and funds made it impossible to transcribe the PRJI for rating. Responses were scored as a hit for a summary that was an accurate, stage-appropriate reflection of the statement and as a miss for an interpretation that was inaccurate. This format is similar to one used by Rest (1973) in moral judgment research. Thirty-nine percent of the interviews were rated by more than one rater, resulting in 85% agreement. Disagreements in ratings were resolved through discussion between the raters, which led to a consensus rating.

There were several indications that the PRJI is a valid tool for assessing reflective judgment. The scores of over 99% of the respondents scaled perfectly on each of the problems. The two whose scores did not scale perfectly missed at Stages 5 and 7 but hit at Stage 6. These data suggest that participants could understand all the stage prototypic statements lower than their highest stage "hit." The correlations with the RJI mean scores were moderately high, ranging from .71 for Time 1 to .74 for Time 2 (p < .001). The split-half reliability for the PRJI was .85 both at the first and second testings. Because the reliability was high, an overall PRJI score for each participant was derived by averaging his or her highest stage hit on the two problems of the PRJI.

WISC-R and WAIS-R.
The Vocabulary subscales of the WISC-R and WAIS-R were used to obtain a standard measure of verbal ability. The 14- and 15-year-old participants responded to the WISC-R word list, and the remaining subjects responded to the WAIS-R word list.

Procedure

Over the course of a year, five interviewers who were trained in both RJI and PRJI procedures conducted the interviews. One hundred and four students, 8 from each age group, participated in 2-hr interviews. During the interview, they were administered the two RJI problems (the low support measure) and then the PRJI (the high support measure). This constituted the experimental condition. For the remaining 52 students (4 from each age group), only the 1-hr RJI was given. This was a test-retest condition designed to assess the effects of practice without the intervention provided by the PRJI. All participants were interviewed twice by the same interviewer, with approximately 2 weeks between the interviews. Because of a recording error, data were missing on the RJI at Time 1 for one 21-year-old female respondent in the experimental condition. All other data sets were complete.

At the end of the Time 1 PRJI, participants in the experimental condition were given two examples of good prototypic stage statements: one at the stage that the interviewer observed to be the highest stage clearly summarized in the first session and one at the next highest stage. These statements did not make direct reference to the content of the problems (i.e., chemical additives or pyramids) on which participants were being tested. Instead, they generally described the underlying concepts for the stages involved. The idea was to have the participants review the concepts and, during the second testing, to evaluate their ability to use these concepts in discussing issues involving the building of the pyramids and chemical additives to foods. An example for Stage 5 follows:

Some people believe that knowledge must be understood within a context. They claim that we can know only our own viewpoint about issues. In other words, beliefs are always based on interpretations of evidence and data from a particular point of view. As a result, beliefs can be justified only from within a particular perspective. Knowledge is based on subjective evaluations of evidence. Evidence and arguments are always interpreted through the person's particular perspective. Differences in points of view result from different interpretations. Differences in interpretation result from real differences in how people see the issue.

In addition, participants were provided the same four questions used as memory prompts during the interview. They were asked to read the statements between interviews and to think about the questions and the statements. Before the second testing, interviewers asked participants whether they had done the homework; 87% of the participants indicated they had at least read the material before the second testing. The information given to the participants in the experimental condition following the Time 1 PRJI, together with the administration of the RJI and the PRJI in the first session, provided the practice condition before the second testing. The test-retest participants were given neither written statements nor questions and were given no instructions other than to return for the second interview.

Results

Comparison of No Support and High Support Conditions

In this study the PRJI was designed as a high support condition, and we predicted that (a) PRJI scores would be higher than RJI mean scores and (b) the PRJI scores at the second testing following practice would be higher than the PRJI scores at Time 1 and higher than the RJI mean scores at Time 2. In addition, the question remained whether the RJI, when scored for the highest stage, would operate like the high support measure. We used two general linear model procedures to evaluate the Age × Measure × Time of Testing differences in the experimental group.
For purposes of this analysis, to make the cell sizes associated with older groups more similar, we combined the data from 25- and 26-year-olds as well as the data from the 27- and 28-year-olds. Because the average RJI scores share some logical dependencies with highest stage RJI scores, two analyses of variance (ANOVAs) were conducted, the first examining differences between PRJI and mean RJI performance and the second comparing PRJI with highest stage RJI performance.

In the first analysis comparing PRJI and RJI mean scores, the overall model for this 13 × 2 × 2 analysis was significant, F(51, 363) = 9.14, p < .001. The proportion of variability in performance accounted for by this model was high and comparable with previous studies of age-related change in reflective judgment (R² = .56). As expected, the age effect for this model was significant, F(12, 363) = 30.93, p < .001. Based on a Waller-Duncan follow-up test, the means of the age groups can be generally divided into three groups consisting of individuals 14-18, 19-22, and 23-28 years, respectively. In other words, the means of all of those participants 18 years and younger were significantly lower than for those 19 years and older. The means of those 22 and younger were significantly lower than for those over age 23, although the means of the 22- and 23-year-olds were not significantly different from each other. A significant time effect was found, F(1, 363) = 13.12, p < .001. Finally, there was a significant measure effect, F(1, 363) = 59.99, p < .001, with PRJI scores higher than RJI mean scores. Means of the Time 1 and Time 2 RJI and PRJI scores for these three groups are given in the first and fifth columns of Table 2.

Parallel analyses to the ones just described comparing PRJI scores with RJI highest stage scores revealed the same significant overall model, F(51, 363) = 8.45, p < .001. The time and age effects were significant, F(1, 363) = 11.10, p < .001 and F(12, 363) = 32.70, p < .001, respectively, but in contrast to the analysis comparing RJI mean scores to PRJI scores, the difference between RJI highest stage scores and PRJI scores was not even marginally significant, F(1, 363) = 0.02, p > .8. The RJI highest stage scores are reported in the third column of Table 2. Follow-up Waller-Duncan tests on the means for this design yielded identical groups to those described earlier, except that the average of the 14-year-old group was found to be significantly lower than the average for the 18-year-olds. Because of the small size of the 14-year-old group relative to the others and for convenience of presentation, we decided to use the three age categories defined earlier when describing age trends in the data.

In summary, from the results of these two analyses, there was an overall time-of-testing effect on all measures. Furthermore, in the high support (PRJI) condition, respondents scored higher than they did in the low support (RJI) condition. The RJI highest stage score appears to provide a reasonable average estimate of PRJI score. These results, combined with the developmental spurt analyses outlined later, help to determine the degree to which RJI highest stage score can be used as a substitute for PRJI score. Finally, as predicted, scores on the PRJI at Time 2 were higher than the mean scores on the RJI at either testing or the PRJI at Time 1.
Given these two main effects (time and test effects), the skill theory predictions regarding the patterns of means across groups were confirmed: Individuals in the experimental high support group scored higher after receiving the support intervention, and scores on the measure designed to assess optimal level were higher than RJI mean score, a measure of functional level.

Additional examination of the descriptive statistics presented in Table 2 reveals other patterns worth mentioning. First, the patterns of means for Time 1 and Time 2 for the three groups suggest a pattern of interaction, as does the time-of-testing effect on the RJI highest stage scores. Specifically, the difference between RJI Time 1 means and PRJI Time 2 means for Group 1 is 0.61, and the difference for Group 3 is 1.02, suggesting a larger difference between scores for the older group. Within the context of reflective judgment research, mean differences of a half stage have been described as practically significant amounts of change. It is puzzling, given the traditionally high reliability of the instrument and the differential magnitude of time effects across groups, that we failed to find statistically significant interactions involving some combination of time, group, or measures.

In addition, heteroscedasticity in performance across the cells of the study also is evident. For example, the variance in performance on mean RJI scores, highest RJI scores, and PRJI measures for the older group at Time 1 is much larger than that of the younger group at Time 1 (i.e., Group 3 Time 1 PRJI variance = 1.14, compared with Group 1's variance of .38). In general, skill theory predicts increasing variance with age, especially for individuals who are not provided with opportunities to practice a skill or the contextual support for functioning at optimal levels. In addition, with opportunities to practice, it predicts that the variability in scores associated with older ages should decrease.
This pattern can be observed in these data, as the variance in the PRJI scores drops at Time 2. It also should be noted that although there was no Measure × Time of Testing × Age interaction, the difference between the RJI highest stage score at Times 1 and 2 for Group 1 was .03 and for Group 3, it was .50. This suggests that there was no time-of-testing effect on the RJI highest stage score for the youngest group, although there was for the other age groups and measures.

Wood and Games (1991) have discussed how patterns of statistically nonsignificant interactions can occur in ANOVA in the presence of differential variability. In summary, their work reports that although analysis of variance is a statistically robust technique (meaning that reported significant differences using ANOVA are likely to reflect "true" differences), heteroscedasticity affects the power of tests of interactions in analysis of variance so that the researcher is likely to fail to uncover the presence of all interactions in the phenomenon of interest. Others have made similar observations (Wahlsten, 1990). Thus, the failure to find an Age × Measure interaction or an Age × Time × Measure interaction may be due to the lack of power of the ANOVA in light of the heteroscedasticity across cells.

In addition to these predictions based on skill theory, there may be other reasons for the observed patterns of heteroscedasticity in the data. Some heteroscedasticity in Table 2 may be caused by interaction effects involving gender (discussed later). It is unlikely, though, that gender interactions explain all of the observed heteroscedasticity.

Table 2
Mean Scores and Standard Deviations for the Reflective Judgment Interview (RJI), the Reflective Judgment Highest Stage Hit, and the Prototypic Reflective Judgment Interview (PRJI) by Time of Testing
Columns: RJI (M, SD); RJI highest stage (M, SD); PRJI (M, SD).
Rows: Group 1 (ages 14-18), Time 1 and Time 2; Group 2 (ages 19-22), Time 1 and Time 2; Group 3 (ages 23-28), Time 1 and Time 2.
Note. Sample sizes are, for Group 1, n = 40; for Group 2, n = 31 at Time 1 for RJI scores and n = 32 for remainder of scores; and for Group 3, n = 32.

Effects of Retesting on RJI Scores

Because the increase in scores between testings might have resulted from a repeated testing effect, a control test-retest condition was included in this study for the RJI. Test-retest reliability based on the combined RJI mean score for this group was r = .87. A general linear model was constructed with age, testing condition, and time of testing as independent variables and RJI mean scores as the dependent variable. The overall F for the model was significant, F(51, 259) = 6.60, p < .001, for the RJI mean score. Analyses revealed a significant testing condition effect, F(1, 259) = 5.77, p = .017, with respondents in the experimental condition scoring higher than those in the control group (RJI mean = 4.21 and 4.41 for the control and experimental groups, respectively). In addition, an age main effect was found, F(12, 259) = 21.84, p < .001; however, there was no main effect for time of testing. Follow-up Waller-Duncan tests on this age effect revealed the same three distinct young, middle, and older age groups described earlier. Similar analyses using RJI highest stage score yielded the same pattern of results.
(See Footnote 1.) In other words, RJI scores were stable across testings for both the control and experimental groups.

It should be noted that this finding differs from the results of the initial comparison of the RJI and PRJI in which a significant time-of-testing effect was found for the experimental group. There may be several reasons for the different outcomes of the initial analyses and the test-retest analysis. First, it may be that the PRJI had a strong time-of-testing effect and the RJI in the experimental condition had a weaker one; thus, in the initial analysis the overall time-of-testing effect was significant. In the test-retest analysis, by contrast, it may be that there was little or no time-of-testing effect for the RJI control group, and the small time-of-testing effect in the experimental condition was not strong enough to create a significant main effect. The fact that no test-retest effect has been found on the RJI in the past (Sakalys, 1982) supports this interpretation. However, if this were the case, one would expect the interaction effect to be significant. Although it was not, it did approach statistical significance, F(1, 259) = 3.26, p = .07.

In addition, as with the initial comparison of no support and high support conditions, a pattern of heteroscedasticity was found for the test-retest analysis. An inspection of the means and standard deviations of the experimental and control groups revealed large differences in standard deviations from cell to cell, with standard deviations being larger in the experimental condition. This suggests that some individuals in the experimental group may have benefited from practice while others did not. Again, the failure to find a significant interaction may have been affected by the differential variance (Wahlsten, 1990; Wood & Games, 1991).
Taken together, the results suggest that there was a small, inconsistent effect of the experimental condition on Time 2 RJI scores that could not be accounted for by test-retest effects alone.

Verbal Ability and Age Effects

Because the older groups were drawn from undergraduate and graduate programs and even though scholastic aptitude was roughly controlled in sample selection, age effects could be caused by having more verbally able students in the two older groups. The WISC-R/WAIS-R Vocabulary subtest scaled score was therefore used as an additional control for this possibility. Mean scaled scores were computed for Groups 1, 2, and 3 over both conditions (n = 60, n = 48, and n = 48, respectively). The correlation between RJI mean scores and the WISC-R/WAIS-R Vocabulary scaled scores at the first testing was .33, p < .001.

To determine whether differences in verbal ability as measured by the Vocabulary scaled score accounted for time of testing, age, or condition effects in the RJI mean scores, we ran an Age × Time × Condition analysis of covariance (ANCOVA). The overall model was significant, F(52, 258) = 7.81, p < .001. The covariate WISC-R/WAIS-R score, condition, and age effects all were significant, F(1, 258) = 71.78, p < .001; F(1, 258) = 6.21, p = .013; and F(12, 258) = 25.40, p < .001, respectively. Again, the time-of-testing effect was not significant. These analyses suggest that the age differences in RJI mean scores could not be statistically attributed to higher verbal ability in the successively older groups, nor could verbal ability account for the higher reflective judgment scores of those in the experimental groups versus the test-retest control group. We also used ANCOVAs to evaluate the effects of the WISC-R/WAIS-R scores on the RJI highest stage score. Again, the main effects paralleled those for the RJI mean and are reported here.
Turning again to the mean PRJI scores of the experimental group, the correlation between the scaled WISC-R/WAIS-R Vocabulary scores and the PRJI mean at the first testing was .34 (p < .001). A Time × Age ANCOVA was calculated to evaluate whether verbal ability could account for age and practice effects on the PRJI; it could not. The overall ANCOVA model was significant, F(26, 181) = 9.18, p < .001, as were the age, F(12, 181) = 15.36, p < .001, and time effects, F(1, 181) = 9.25, p < .001. The covariate also was significant, F(1, 181) = 39.13, p < .001.

In summary, the results of the ANCOVAs provide additional support for the conclusions drawn with the initial analyses. Specifically, the age effects on all three measures could not be accounted for statistically by verbal ability scores, suggesting that the initial participant selection criteria designed to control for the high scholastic ability of the older participants were effective.

Footnote 1. These results, as well as others that are not reported here because of space limitations, are available on request from Karen Strohm Kitchener.
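
The degrees of freedom reported for the general linear models in this Results section can be cross-checked against the design with a little arithmetic. The sketch below is illustrative only and is not the authors' analysis code. It assumes that every score (participant by measure by time) is counted as one observation, that each model is a full factorial over its cells plus any covariate, and that the single missing RJI Time 1 score is dropped. The function names `glm_df` and `f_from_r2` are hypothetical helpers introduced here.

```python
def glm_df(n_cells, n_observations, n_covariates=0):
    """Model and error df for a full-factorial fixed-effects model.

    Assumes one df per cell (minus the intercept) plus one per covariate,
    and that every score counts as a separate observation.
    """
    model_df = (n_cells - 1) + n_covariates
    error_df = (n_observations - 1) - model_df
    return model_df, error_df


def f_from_r2(r2, model_df, error_df):
    """Overall F implied by a model R-squared and its degrees of freedom."""
    return (r2 / model_df) / ((1.0 - r2) / error_df)


# Experimental group: 13 age groups x 2 measures x 2 times = 52 cells;
# 104 participants x 2 measures x 2 times, minus 1 missing RJI score.
experimental = glm_df(13 * 2 * 2, 104 * 2 * 2 - 1)

# Test-retest comparison on the RJI: 13 ages x 2 conditions x 2 times;
# 156 participants x 2 times, minus the same missing score.
retest = glm_df(13 * 2 * 2, 156 * 2 - 1)

# The same model with the Vocabulary covariate added (ANCOVA).
retest_ancova = glm_df(13 * 2 * 2, 156 * 2 - 1, n_covariates=1)

# PRJI Time x Age ANCOVA: 13 ages x 2 times; 104 participants x 2 times.
prji_ancova = glm_df(13 * 2, 104 * 2, n_covariates=1)

print(experimental, retest, retest_ancova, prji_ancova)
print(round(f_from_r2(0.56, *experimental), 2))
```

Under these assumptions the helper reproduces the reported degrees of freedom, (51, 363), (51, 259), (52, 258), and (26, 181), and the F implied by the rounded R² of .56 is close to the reported overall F of 9.14.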

Developmental Spurts

Skill theory makes specific predictions about the timing of developmental spurts and predicts that spurts will be more apparent after practice on measures such as the PRJI that are designed to provide contextual support than on measures such as the RJI (Fischer, Hand, & Russell, 1984; Fischer, Pipp, & Bullock, 1984). Although the initial Age × Time × Measure ANOVAs provided initial support for the predicted developmental spurts, to further test this prediction, we graphed RJI scores for the experimental group at Time 1 against those on the PRJI at Time 2 (see Figure 1) and ran one-way ANOVAs on each set of scores. As expected, for both the RJI at Time 1 and the PRJI at Time 2, the age effects were significant, F(12, 90) = 8.01, p < .01, and F(12, 91) = 9.58, p < .01, respectively.

To statistically test for developmental spurts, we used a Waller-Duncan K-ratio t test to compare all means for adjacent years and 2-year intervals. There were significant increases in RJI scores for only one 2-year interval (between ages 21 and 23). The PRJI condition at Time 2 showed a significant 1-year increase between ages 18 and 19 and significant 2-year increases between ages 17 and 19, 18 and 20, and 23 and 25/26. These results support the hypothesis of developmental spurts on the PRJI for Time 2 at about 18 to 20 years and 23 to 25 years but not the earlier spurt at ages 14 to 15.

Although there was a significant increase on the RJI between ages 21 and 23, there were no significant differences between the scores of those aged 20 and 23 or those aged 19 and 23. Thus, the apparent spurt may result from the particularly low scores of the 21-year-olds in this sample (see Figure 1). In fact, when a similar analysis was conducted on the RJI means of the control group, the 21-year-olds in the control group scored higher than the experimental group (4.5 vs.
4.1), and there were no significant differences between the scores of the 19-, 20-, or 21-year-olds and the 23-year-olds. Thus, the differences on the RJI at Time 1 for the experimental group could have resulted from a sampling error.

When a similar analysis was run on the RJI highest stage score at Time 2, the age effect was significant, F(12, 91) = 13.49, p < .01, and there were significant 1-year increases between ages 14 and 15, 18 and 19, and 22 and 23. These analyses are consistent with the earlier ANOVAs comparing the PRJI and RJI mean scores and the PRJI mean and the RJI highest stage scores. However, these analyses also suggest that the RJI scores were contributing less to the age differences in the initial ANOVAs than were the PRJI and RJI highest stage scores.

Figure 1. Mean Reflective Judgment Interview (RJI) Time 1 and mean Prototypic Reflective Judgment Interview (PRJI) Time 2 scores, plotted by age of subjects.

Taken together, the results suggest that under conditions of support and practice (PRJI) and with the RJI highest stage scores, developmental discontinuities can be identified at about age 19 to 20 and in the mid-20s. Support for the earlier spurt between ages 14 and 15 was only found in the RJI highest stage score data. The data also are consistent with the claim that spurts will be more apparent under conditions that allow participants to approach their optimal level.

Van Geert (1991) and Fischer and Kenny (1986) argued that discontinuities in developmental functions should be most evident for single, relatively homogeneous categories of behavior rather than composite measures of behaviors. The one-way ANOVAs and Figure 1 reflect a composite growth function, in which scores for each of the reflective judgment stages are combined into a single score. We can analyze the development of reflective judgment into its components by examining growth functions for each reflective judgment stage; each stage constitutes a single, relatively homogeneous category of behavior. Therefore, to visualize the individual stages that contributed to the apparent spurts identified in the one-way ANOVAs, we graphed by stage the PRJI Time 2 scores and the RJI Time 1 scores. For these figures, scores for the two PRJI problems were combined, so that each person could obtain a score of 0, 1, or 2 tasks passed at a stage. For each stage, the skill theory prediction is that a spurt will occur initially when the stage first emerges (about age 14 to 15 for Stage 5, about age 19 for Stage 6, and about age 24 to 25 for Stage 7) and that additional spurts will occur at later specified ages until performance is near ceiling.
For example, success with Stage 5 tasks is predicted to spurt initially at 14 to 15 years and then again when abstract systems and single principles are hypothesized to emerge, until ceiling has been reached. Analyses of the percentage of tasks passed at each age for each stage on the PRJI at Time 2 supported the hypothesized spurts, as well as the change on the RJI at 21 to 23 years. As illustrations, the data for Stages 5 and 6 for PRJI Time 2 and RJI mean Time 1 are graphed in Figures 2 and 3.

For Stage 5, no PRJI tasks were passed at age 14, but at age 15, 37.5% of the tasks were passed (see Figure 2). In other words, as skill theory predicted, the initial appearance of abstract mapping skills (reflective judgment Stage 5) occurred between ages 14 and 15. As noted earlier, skill theory also predicts that because of limited opportunities for practice, not all individuals will reach their optimal level between ages 14 and 15. It also predicts there should be another spurt in performance at about age 19 when abstract systems are hypothesized to emerge. In fact, the percentage of Stage 5 tasks passed went from 37.5% at age 18 to 75% at age 19 and 87.5% at age 20. On the RJI, performance on Stage 5 tasks remained relatively low until age 19, when performance jumped to almost 40% of Stage 5 tasks passed.

Performance on Stage 6 PRJI tasks, abstract systems, remained low until age 19, when 31% of the tasks were passed (see Figure 3). By age 21, this number had increased to 56%. A secondary spurt appeared to begin at age 24, with performance increasing to 100% at age 25/26. On the RJI, performance on Stage 6 tasks remained at zero or nearly zero until age 23, when the percentage increased to 37.5%. Although not graphed here, the percentage of Stage 7 tasks, systems of abstract systems, remained low (12.5%) until age 24 on the PRJI. This percentage increased to 62.5% by age 25/26.

Taken together, the results for the individual stages support the predictions of three spurts at about 14-15 years, 19-20 years, and the mid-20s for the PRJI Time 2 data.
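
The PRJI percentages quoted above can be read as whole-number task counts. The sketch below is a reader's aid rather than part of the original analysis. It assumes that each age group contributed 8 participants with 2 PRJI problems each (16 tasks per stage), as the sample description and figure captions suggest; `tasks_passed` is a hypothetical helper name.

```python
# Assumption: 8 participants per age group x 2 PRJI problems = 16 tasks per stage.
TASKS_PER_AGE_GROUP = 8 * 2


def tasks_passed(percent, total=TASKS_PER_AGE_GROUP):
    """Nearest whole number of tasks corresponding to a reported percentage."""
    return round(percent / 100 * total)


# PRJI Time 2 percentages quoted in the text, keyed by (stage, age).
reported = {
    ("Stage 5", "15"): 37.5,
    ("Stage 5", "19"): 75.0,
    ("Stage 5", "20"): 87.5,
    ("Stage 6", "19"): 31.0,
    ("Stage 6", "21"): 56.0,
    ("Stage 7", "24"): 12.5,
    ("Stage 7", "25/26"): 62.5,
}

for (stage, age), percent in reported.items():
    count = tasks_passed(percent)
    exact = 100 * count / TASKS_PER_AGE_GROUP
    print(f"{stage}, age {age}: {count}/{TASKS_PER_AGE_GROUP} tasks ({exact:.2f}%)")
```

Under this assumption, the rounded values of 31% and 56% correspond to 5 and 9 of 16 tasks (31.25% and 56.25%), so each one-task change moves the percentage by 6.25 points. That granularity is worth keeping in mind when reading the stage-by-stage graphs.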
These results also indicate the components that contributed to the composite developmental function illustrated in Figure 1: The spurt in Figure 1 at 18 to 20 years on the PRJI was produced by spurts in both Stage 5 and Stage 6 responses at that age. The spurt at 23 to 25 years was produced by spurts in both Stage 6 and Stage 7 responses at that age. The composite developmental function of the PRJI scores did not identify the spurt between ages 14 and 15, but the stage-specific analysis provided some evidence that a spurt may be occurring at this age.

Gender Effects

To evaluate gender effects, we compared RJI mean scores for age, condition, time of testing, and gender using a general linear models procedure. The overall model was significant. In addition, the condition and age effects were significant, paralleling those found in earlier analyses, and are not reported here. The overall gender effect was significant, F(1, 207) = 10.20, p < .01, favoring male subjects (M = 4.46) over female subjects (M = 4.23), as was the Age × Condition × Gender interaction, F(12, 207) = 1.90, p = .035. To communicate the magnitude of this effect, we show in Table 3 the means for male subjects in the three age groups that were reported in Table 2. Results for the analyses based on RJI highest stage score yielded generally the same pattern of results, although the Age × Condition × Gender interaction failed to achieve statistical significance. These analyses are not reported here because of space limitations.

Using the same procedure, we also compared PRJI scores for age, gender, and time of testing. The overall model was significant, F(51, 156) = 4.57, p < .001, as were the age and time effects, which paralleled the results of earlier analyses and are not reported here. The gender main effect was not significant, although the Age × Gender interaction was, F(12, 156) = 2.38, p < .01.
The fifth column of Table 3 shows the pattern of PRJI differences for male and female subjects in the three age categories. As can be seen from this table, the scores of the youngest female and male subjects were virtually identical, whereas female subjects scored higher than male subjects in the middle group and lower than male subjects in the older group.

Discussion

The primary purpose of this study was to test the predictions of skill theory for the reflective judgment model. Fischer (1980) argued that his model describes the form of cognitive development across domains; thus, the general form and processes he describes ought to generalize to domain-specific development. Although this hypothesis has been tested in several domains with children (Fischer & Elmendorf, 1986; Fischer, Pipp, & Bullock, 1984; Lamborn & Fischer, 1988), in this study an initial test of this hypothesis was made with adolescents and adults on the reflective judgment model.

Figure 2. Percentage of tasks passed at reflective judgment Stage 5 under two testing conditions: high support with practice (PRJI, Time 2) and low support (RJI, Time 1), plotted by age of subjects. (PRJI = Prototypic Reflective Judgment Interview; RJI = Reflective Judgment Interview; n = 8; N = 104.)

Specifically, Fischer, Hand, and Russell (1984) suggested that if tasks could be identified that measure each level of Fischer's model independently, then several developmental predictions could be evaluated. First, if the steps in the model are sequential, then they predicted that performance ought to fit a Guttman scale. In other words, although there might be a ceiling on performance, individuals ought to be able to accomplish all of the tasks below the highest level at which they were successful. In this study, the PRJI was designed to assess each step in the reflective judgment model between Stages 2 and 7, separately, and in fact, the scores of over 99% of the participants in the study scaled perfectly on both the chemicals and pyramids problems, with older subjects scoring at successively higher stages. These data added additional support for the sequentiality of the reflective judgment stages (Davison et al., 1980; Kitchener et al., 1989).

Furthermore, Fischer, Hand, and Russell (1984) argued that when using a measure that assesses each level or stage separately, tests can be made for developmental spurts. They predicted that if individuals are provided with contextual support for high level performance and practice, their performance will move toward their optimal level, but they will fail to pass tasks beyond their optimal level.
In other words, performance will be higher when individuals are provided with contextual support and practice, as they were on the PRJI, and developmental spurts should be apparent at predicted ages. Although these authors predicted that spurts would be most apparent with a separate task designed to measure each level independently, as noted in the introduction, spurts also may appear in measures designed to test functional level.

Their predictions were supported by several of the findings of this study. First, as was hypothesized, participants' reflective judgment scores were higher on the PRJI after practice and were higher in the high support condition (PRJI) than on the traditional measure of reflective judgment (RJI). These findings support the skill theory assertion that the assessment context affects the level of competence that individuals exhibit (Fischer & Kenny, 1986). Thus, the same person exhibits different levels of performance under different assessment conditions and these differences are predictable ones, even in a delimited domain such as reflective judgment. Although similar findings have been documented in the cognitive developmental domain with children, this pattern had not been documented with models of adult cognitive development prior to this research effort.

The effects of practice on RJI mean scores were less consistent. Because the RJI mean score is a measure of functional level, it provides no contextual support for performance; thus, the effects of practice may be less consistently observed. However, the RJI highest stage score may provide an estimate of the upper end of a person's developmental range, because there were no overall significant differences between the RJI highest stage score and the PRJI score. The RJI is an extended interview in which individuals are asked to exhibit a number of skills. Although the majority of responses fall below optimal level, these RJI highest stage data suggest that a few of these skills reflect optimal level even without contextual support. In other words, the RJI score itself may provide an indication of both an individual's functional and optimal levels. Kitchener and Fischer (1990) have suggested that educational interventions may be most useful when focused on enhancing the skills between where persons are typically operating and their optimal level. If that is the case, the RJI mean and highest stage scores may be a good indicator of this range.

Figure 3. Percentage of tasks passed at reflective judgment Stage 6 under two testing conditions: high support with practice (PRJI, Time 2) and low support (RJI, Time 1), plotted by age of subjects. (PRJI = Prototypic Reflective Judgment Interview; RJI = Reflective Judgment Interview; n = 8; N = 104.)

In addition, there was initial evidence for developmental spurts based on several analyses. Based on the overall ANOVA of RJI and PRJI scores, there were significant differences between the scores of those 18 years and younger and those 19 years and older and between those 22 years and younger and those 24 years and older. The analysis of RJI highest stage and PRJI scores provided similar results except that there also were significant differences between the 14-year-olds and those who were older.
In fact, these data provide support for the prediction that there would be spurts in developmental growth at about age 19 (Fischer, Hand, & Russell, 1984). The difference in scores between 22-year-olds and those 24 and older was slightly earlier than the spurt between those of ages 24 and 25 that Fischer and his colleagues predicted. However, evidence for this spurt can be drawn from the initial ANOVAs, the one-way ANOVAs of the PRJI and the RJI highest stage score, and the graphs of individual stages. Evidence for the spurt between ages 14 and 15 was less strong, although it was apparent in the initial analyses of PRJI and RJI highest stage scores and in the graphs of individual stages.

Fischer, Hand, and Russell's (1984) predictions were, however, more specific than the age-related differences noted earlier. They predicted that abstract mappings would first be apparent at age 14-15, abstract systems would appear at about age 19, and single principles or systems of abstract systems would be initially observed at about age 24 to 25 and that these would be most apparent in conditions favorable to testing for optimal level. The second PRJI testing was designed to test this hypothesis. In fact, based on the graphs of the PRJI data by stage at the second testing, these predictions received some support. No individuals at age 14 were able to pass the reflective judgment Stage 5 tasks (Fischer's abstract mapping level), but by age 15 the percentage was 37.5% (see Figure 2). Although the data for the level of abstract systems (reflective judgment Stage 6) were not quite as clear-cut, in general, performance was low through age 18 and then climbed rapidly over the next 3 years to 56% (see Figure 3). The data on the development of single principles were similar (reflective judgment Stage 7). No more than 12.5% of Stage 7 tasks were passed by any age group before age 25/26; however, 62.5% of the Stage 7 tasks were passed by this age group.

Table 3
Reflective Judgment Scores: Means, Highest Stage, and Prototypic by Gender and Condition
Columns: RJI (M, SD); RJI highest stage (M, SD); PRJI (M, SD).
Rows: for each of Group 1 (ages 14-18), Group 2 (ages 19-22), and Group 3 (ages 23-28), male and female subjects in the experimental condition and in the control condition.
Note. RJI = Reflective Judgment Interview; PRJI = Prototypic Reflective Judgment Interview. Dashes indicate that subjects in the control condition did not participate in the PRJI.

These graphs also more clearly reveal the apparent age-related ceilings on both RJI and PRJI performance. In fact, the amount of practice and instruction provided in this study was minimal: Participants completed the PRJI during the first testing session and then considered some examples of more adequate reasoning between testing sessions. Fischer and Kenny (1986) have argued that performing consistently at optimal level requires sustained effort and practice. Therefore, there is no guarantee that the full limits of the participants' competence were tested in this study (Kliegl & Baltes, 1987). On the other hand, the upper limits on performance found on the PRJI were theoretically consistent ones.
Based on skill theory, the prediction would be that with more practice, more individuals within an age group would reach their optimal level but optimal level itself would not be higher. If this prediction is supported through future research, it would suggest that there are important developmental changes that do not take place until the mid-20s (Basseches, 1984; Richards & Commons, 1984). This finding has important implications for the teaching and learning of reflective thinking and judgment skills.

It also should be noted that some evidence for developmental spurts occurred in the RJI data even at the first testing, although these changes lagged behind those that occurred in the PRJI data. For example, virtually no Stage 6 reasoning was apparent before age 22, but by age 23 this had increased to 37.5%.

Although Fischer, Pipp, and Bullock (1984), van der Maas and Molenaar (1992), and van Geert (1991) have argued that cross-sectional data may be used to discern developmental discontinuities, this approach is not without its problems. There may be several alternative explanations for the appearance of the spurts in the ANOVAs and figures, particularly those found in the PRJI data. First, there may be unequal intervals between the stages of the reflective judgment model or the skill theory levels. However, all prior reflective judgment studies have found no discontinuities or gaps in performance, and no evidence from longitudinal studies indicates such departures from an interval scale. Cross-sectional and longitudinal studies of sequentiality have revealed no patterns of stage skipping or reversal (Davison et al., 1980; Kitchener et al., 1989). Furthermore, it would be highly coincidental if scaling problems led to the particular age group differences in PRJI scores that were predicted by Fischer's model.

In addition, in this study, age and educational level were at least partially confounded.
Although individuals were sampled by age and not by age and educational level, in general, most of the 18-year-olds had started college whereas most of the 17-year-olds had not. Similarly, those participants age 24 and older were all in graduate school, whereas before age 22, participants had only an undergraduate education. Thus, the apparent developmental spurts in the data may be due to some extent to the increased demands of a changing educational environment. Because Fischer's model suggests that practice increases the chances that individuals will be operating at their optimal level, if this occurred, it would not be inconsistent with the model. To further test Fischer's predictions that these spurts mark the upper developmental level more universally, one would need to control for age and educational level and to study the appearance of spurts in cross-cultural settings.

One additional finding is worth discussion. As noted in the introduction to this article, prior studies using the RJI have found mixed results regarding gender effects. The Gender × Age × Condition interactions for the RJI in this study provide some insight into these discrepant conclusions. In this study, when data from both the control and experimental groups at Time 1 were averaged, gender differences between men and women were virtually zero for the youngest group (.04 for Group 1) but increased to approximately one third of a stage for the oldest group (.34 for Group 3). One needs to examine prior studies to evaluate whether a similar age-related pattern of gender differences may account for the apparent discrepancies in the findings of other studies.

On the other hand, although there were no overall gender differences on the PRJI, there was a Gender × Age interaction. In this case, the scores of male subjects in the youngest group were virtually identical to the female subjects'. In the middle group, the female subjects scored higher, and in the oldest group, the male subjects outscored the female subjects. It may be that, in the middle group, the contextual support provided by the PRJI itself as well as the practice compensated for the lower scores female subjects exhibited on the RJI.
On the other hand, Fischer (1980) has argued that performance at the highest levels of any model requires sustained practice; thus, it may be that the practice associated with the PRJI was not sufficient to compensate for the lower scores that female subjects exhibited on the RJI, and they therefore exhibited lower scores on the PRJI as well.

In general, this study provided additional support for the predictions of skill theory, particularly as it relates to the development of reflective judgment, and for the reflective judgment model itself. The form of the developmental function for reflective judgment is different depending on how it is assessed, although the sequentiality of the stages remains constant. People show higher levels of competence in reflective judgment after practice and even higher levels when there is contextual support during the assessment. Furthermore, at least with a sample that is in an educational setting, there is some support for the claim that under optimal testing conditions, there are age-related spurts in performance associated with the emergence of a new skill level. Whether these spurts can be accounted for by the effects of education alone or by other unknown cohort effects remains to be tested. On the other hand, at least in these data it appears that ordinary functional growth is more gradual and lags behind optimal performance for several years.

References

Basseches, M. A. (1984). Dialectical thinking as a metasystemic form of cognitive organization. In M. L. Commons, F. A. Richards, & C. Armon (Eds.), Beyond formal operations (pp ). New York: Praeger.
Belenky, M., Clinchy, B., Goldberger, N., & Tarule, J. (1986). Women's ways of knowing: The development of self, voice and mind. New York: Basic Books.
Benack, S., & Basseches, M. A. (1989). Dialectical thinking and relativistic epistemology: Their relation in adult development. In M. L. Commons, J. D. Sinnott, F. A. Richards, & C. Armon (Eds.), Adult development (Vol. 1, pp ). New York: Praeger.
Commons, M. L., Armon, C., Richards, F. A., Schrader, D. E., Farrell, E. W., Tappan, M. B., & Bauer, N. F. (1989). A multi-domain study of adult development. In M. L. Commons, J. D. Sinnott, F. A. Richards, & C. Armon (Eds.), Adult development (Vol. 1, pp ). New York: Praeger.
Corrigan, R. (1983). The development of representational skills. In K. W. Fischer (Ed.), Levels and transitions in children's development: New directions for child development (No. 21, pp ). San Francisco: Jossey-Bass.
Davison, M. L., King, P. M., Kitchener, K. S., & Parker, C. A. (1980). The stage sequence concept in cognitive social development. Developmental Psychology, 16,
Fischer, K. W. (1980). A theory of cognitive development: The control and construction of hierarchies of skills. Psychological Review, 87,
Fischer, K. W., Bullock, D., Rotenberg, E. J., & Raya, P. (1993). The dynamics of competence: How context contributes directly to skill. In R. Wozniak & K. Fischer (Eds.), Development in context: Acting and thinking in specific environments (JPS Series on Knowledge and Development, Vol. 1, pp ). Hillsdale, NJ: Erlbaum.
Fischer, K. W., & Elmendorf, D. (1986). Becoming a different person: Transformations in personality and social behavior. In M. Perlmutter (Ed.), Minnesota symposium on child psychology (Vol. 18, pp ). Hillsdale, NJ: Erlbaum.
Fischer, K. W., Hand, H. H., & Russell, S. L. (1984). The development of abstractions in adolescence and adulthood. In M. L. Commons, F. A. Richards, & C. Armon (Eds.), Beyond formal operations: Late adolescent and adult cognitive development (pp ). New York: Praeger.
Fischer, K. W., Hand, H. H., Watson, M. W., Van Parys, M., & Tucker, J. (1984). Putting the child into socialization: The development of social categories in preschool children. In L. Katz (Ed.), Current topics in early childhood education (Vol. 5, pp ). Norwood, NJ: Ablex.
Fischer, K. W., & Kenny, S. L. (1986). Environmental conditions for discontinuities in the development of abstractions. In R. A. Mines & K. S. Kitchener (Eds.), Adult cognitive development: Methods and models (pp ). New York: Praeger.
Fischer, K. W., & Lamborn, S. (1989). Mechanisms of variation in developmental levels: Cognitive and emotional transitions during adolescence. In A. de Ribaupierre (Ed.), Transition mechanisms in child development (pp ). Cambridge, England: Cambridge University Press.
Fischer, K. W., Pipp, S. L., & Bullock, D. (1984). Detecting discontinuities in development: Method and measurement. In R. Harmon & R. Emde (Eds.), Continuities and discontinuities in development (pp ). New York: Plenum Press.
Flavell, J. H. (1985). Cognitive development. Englewood Cliffs, NJ: Prentice Hall.
Harter, S., & Monsour, A. (1992). Developmental analysis of conflict caused by opposing attributes in the adolescent self-portrait. Developmental Psychology, 28,
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking. London: Routledge & Kegan Paul.
Irwin, R. R., & Sheese, R. L. (1989). Problems in the proposal for a "stage" of dialectical thinking. In M. L. Commons, J. D. Sinnott, F. A. Richards, & C. Armon (Eds.), Adult development (Vol. 1, pp ). New York: Praeger.

King, P. M. (1986). Formal reasoning in adults: A review and critique. In R. A. Mines & K. S. Kitchener (Eds.), Adult cognitive development: Methods and models (pp. 1-21). New York: Praeger.
King, P. M., Kitchener, K. S., Davison, M. L., Parker, C., & Wood, P. K. (1983). The justification of beliefs in young adults: A longitudinal study. Human Development, 26,
King, P. M., Kitchener, K. S., Wood, P. K., & Davison, M. L. (1989). Relationship across developmental domains: A longitudinal study of intellectual, moral and ego development. In M. L. Commons, J. D. Sinnott, F. A. Richards, & C. Armon (Eds.), Adult development (Vol. 1, pp ). New York: Praeger.
Kitchener, K. S. (1983). Cognition, metacognition and epistemic cognition: A three-level model of cognitive processing. Human Development, 4,
Kitchener, K. S. (1986). The reflective judgment model: Characteristics, evidence, and measurement. In R. A. Mines & K. S. Kitchener (Eds.), Adult cognitive development: Methods and models (pp ). New York: Praeger.
Kitchener, K. S., & Fischer, K. W. (1990). A skill approach to the development of reflective thinking. In D. Kuhn (Ed.), Contributions to human development: Vol. 21. Developmental perspectives on teaching and learning thinking skills (pp ). New York: S. Karger.
Kitchener, K. S., & King, P. M. (1981). Reflective judgment: Concepts of justification and their relationship to age and education. Journal of Applied Developmental Psychology, 2,
Kitchener, K. S., & King, P. M. (1985). Reflective judgment scoring rules. (Available from K. S. Kitchener, School of Education, University of Denver, 2450 S. Vine Street, Denver, Colorado 80208)
Kitchener, K. S., & King, P. M. (1990). The reflective judgment model: Ten years of research. In M. Commons, C. Armon, L. Kohlberg, F. Richards, T. Gratzer, & J. Sinnott (Eds.), Adult development: Vol. 2. Models and methods in the study of adolescent and adult thought (pp ). New York: Praeger.
Kitchener, K. S., King, P. M., Wood, P. K., & Davison, M. L. (1989). Sequentiality and consistency in the development of reflective judgment: A six-year longitudinal study. Journal of Applied Developmental Psychology, 10,
Kliegl, R., & Baltes, P. B. (1987). Theory guided analysis of mechanisms of development and aging through testing-the-limit and research on expertise. In C. Schooler & D. W. Schaie (Eds.), Cognitive functioning and social structure over the life course (pp ). Norwood, NJ: Ablex.
Kuhn, D. (1989). Children and adults as intuitive scientists. Psychological Review, 96,
Lamborn, S. D., & Fischer, K. W. (1988). Optimal and functional levels in cognitive development: The individual's developmental range. Newsletter of the International Society for the Study of Behavioral Development, 2(Serial No 14), 1-4.
Lawson, J. M. (1980). The relationship between graduate education and the development of reflective judgment: A function of age or educational experience (Doctoral dissertation, University of Minnesota). Dissertation Abstracts International, 41, 4655A.
McKinney, M. (1985). Reflective judgment: An aspect of adolescent cognitive development (Doctoral dissertation, University of Denver). Dissertation Abstracts International, 47, 402B.
Mines, R. A. (1982). Student development assessment techniques. In G. R. Hanson (Ed.), Measuring student development (pp ). San Francisco: Jossey-Bass.
Moshman, D., & Franks, B. A. (1986). Development of the concept of inferential validity. Child Development, 57,
Niemark, E. D. (1979). Current status of formal operations research. Human Development, 22,
O'Brien, D. P., & Overton, W. F. (1982). Conditional reasoning and the competence-performance issue: A developmental analysis of a training task. Journal of Experimental Child Psychology, 34,
Rest, J. R. (1973). The hierarchical nature of moral judgment: A study of patterns of comprehension and preference of moral stages. Journal of Personality, 41,
Richards, F. A., & Commons, M. L. (1984). Systemic, metasystemic and cross-paradigmatic reasoning: A case for stages of reasoning beyond formal operations. In M. L. Commons, F. A. Richards, & C. Armon (Eds.), Beyond formal operations: Late adolescent and adult cognitive development (pp ). New York: Praeger.
Sakalys, J. A. (1982). Effects of a research methods course on nursing students' research attitudes and cognitive development (Doctoral dissertation, University of Denver). Dissertation Abstracts International, 43, 2254A.
Schmidt, J. A. (1985). Older and wiser? A longitudinal study of the impact of college on intellectual development. Journal of College Student Personnel, 26,
Strange, C., & King, P. M. (1981). Intellectual development and its relationship to maturation during the college years. Journal of Applied Developmental Psychology, 2,
Thatcher, R. W. (1991). Maturation of the human frontal lobes: Physiological evidence for staging. Developmental Neuropsychology, 7,
van der Maas, H., & Molenaar, P. (1992). A catastrophe theoretical approach to stagewise cognitive development. Psychological Review, 99,
van Geert, P. (1991). A dynamic systems model of cognitive and language growth. Psychological Review, 98,
Wahlsten, D. (1990). The insensitivity of analysis of variance to heredity-environment interactions. Behavioral and Brain Sciences, 13,
Welfel, E. R. (1982). How students make judgments: Do educational level and academic major make a difference? Journal of College Student Personnel, 23,
Welfel, E. R., & Davison, M. L. (1986). The development of reflective judgment in the college years: A four year longitudinal study. Journal of College Student Personnel, 27,
Wood, P. K. (1983). Inquiring systems and problem structure: Implications for cognitive development. Human Development, 26,
Wood, P. K., & Games, P. (1991). Rationale, detection, and implications of interactions between independent variables and unmeasured variables in linear models. Multivariate Behavioral Research, 25,

Received July 24, 1991
Revision received September 16, 1992
Accepted March 17, 1993
