Journal of Reading Behavior, 1972-73, Vol. 5, No. 4, Fall

ANALYSIS OF THE CHUNKED READING TEST AND READING COMPREHENSION 1

Ronald P. Carver and Charles A. Darby, Jr.*

Abstract

The newly developed Chunked Reading Test was further analyzed by correlating the scores on this test with three other standardized reading tests: the Davis Reading Test, the Nelson-Denny Reading Test, and the Tinker Speed of Reading Test. A rational analysis of the scores on all of the tests suggested that each score could be designated as measuring one of the following three types of variables: efficiency (E), accuracy (A), and rate (R). The tests were administered to 41 college students and the intercorrelations were factor analyzed. Two factors fit the data, and they were readily interpretable as an A factor and an R factor. A single factor forced upon the data was readily interpretable as an E factor. The results suggested that: (a) apparent differences among the variables measured by standardized reading tests for adults are more superficial than real, (b) all of the scores on these tests can be interpreted as valid indicators of individual differences in the efficiency, accuracy, and rate at which thoughts are understood while reading, and (c) empirical support exists for the purported theoretical relationship, E = AR.

Recently, a new type of test item, termed "chunked," has shown promise as an indicator of information stored during reading (Carver, 1970a). A chunk is a group of words, usually longer than a single word and shorter than a sentence. Each chunk forms a meaningful and practical unit of connected discourse. A standardized test using the chunked type of item has been developed (Carver and Darby, 1971). In developing the test, items were selected and revised on the basis of whether they discriminated between those individuals who had previously read the passages accompanying the test and those who had not.
An evaluation of the test indicated that it was highly successful in discriminating between readers and nonreaders of passages. This result suggested that the test can be considered valid as a measure of information stored during reading.

1 This research was supported by a General Research Support Grant, National Institutes of Health, and by the American Institutes for Research. Following the conduct of this research, the Chunked Reading Test has been published by the American Institutes for Research as the Carver-Darby Chunked Reading Test, 1970, and it is now being distributed by Revrac Publications, Silver Spring, Maryland 20910.

* Drs. Ronald P. Carver and Charles A. Darby, Jr. are on the staff of the American Institutes for Research, Washington, D. C.

The previous research on this test has focused primarily upon the measurement of changes within individuals as a result of reading rather than upon differences between individuals and their interrelationships. Previous results have suggested, however, that the test correlates highly with traditional reading comprehension tests (Carver & Darby, 1971). The following research was designed to investigate in more detail the extent to which individual differences indicated by the Chunked Reading Test are similar to or different from those indicated by traditional standardized reading tests.

After the data were collected, it became evident that a factor analysis would provide results relevant to one aspect of a recently formulated theory of reading (Carver, 1972a). It has been theorized that the efficiency of understanding thoughts during reading (E) is a product of the accuracy connected with the understanding process (A) and the rate at which the thoughts are being input (R), i.e.,

E = AR.  (1)

E is the number of thoughts correctly understood per unit of time. A is the number of thoughts correctly understood per the number of thoughts input, or covered. And R is the number of thoughts input per unit of time. Since it appears to be almost impossible to discriminate between understanding during reading and information stored during reading, both conceptually (Carver, 1971a) and empirically (Carver, 1973), it seemed desirable to analyze and interpret the Chunked test results with respect to Equation 1. And it seemed rationally sound to analyze traditional reading tests in terms of E, A, and R as well.

Method

Subjects. Forty-one volunteer college students, male and female, were paid for their participation in the study.

Procedure. The Ss were administered five tests during a three-hour period.
The tests, in their order of administration, were: Chunked Reading Test, Form A; Davis Reading Test, Form 1-B; Chunked Reading Test, Form B; Nelson-Denny Reading Test, Form B, Rate and Comprehension; and Tinker Speed of Reading Test. A 10-minute break was provided after the Davis Reading Test. To facilitate motivation, Ss were informed prior to the session that their tests would be graded during the testing period and that they would receive their scores with individual interpretation at the end of the period.

Test Variables. The Chunked Reading Test, Form A and Form B, each consists of five passages and five tests with 100 test items in total. The chunked test items for each passage were developed by (a) dividing each reading passage into 100 chunks, i.e., groups of one to five meaningfully related words, (b) retyping the passage into two columns, one chunk per column line, 50 chunks per column, (c) randomly deleting one chunk from each set of five, (d) writing 20 new chunks to replace the deleted ones (the new inserted chunks changed the meaning from the original passage), and (e)

revising the inserted chunks on the basis of whether they discriminated between readers and non-readers of the passage. The test requires each S to: (a) read a passage, (b) turn to the test on the following page, and (c) complete the 20 items by identifying the chunks which have changed the meaning of the original passage. This cycle is repeated for succeeding passages and tests until the 25-minute time limit is reached. Reference back to a previously read passage is not permitted.

There are three scores on the test: Efficiency, Rate, and Accuracy. The Efficiency score is the total number of chunked items answered correctly during the time limit. The other two scores are: (a) Rate, the number of the last item attempted, and (b) Accuracy, the percent of items answered correctly, i.e., the Efficiency score divided by the Rate score, multiplied by 100. The Efficiency score would seem to be an indicant of E, in Equation 1, since the efficiency of storing information on the Chunked test would seem to be highly related to the efficiency of understanding the thoughts contained in the prose material; efficiency in both cases being the number of units (correct chunks or correct thoughts) per unit of time. Similarly, the Rate score would seem to be an indicant of R, in Equation 1, since the number of the last item attempted should be an indicant of the rate at which the thoughts were input. And the Accuracy score would seem to be an indicant of A, in Equation 1, since the percent of items attempted that are correct should be an indicant of the accuracy with which the thoughts input are being understood. A complete description of the development and evaluation of the test has been reported elsewhere (Carver and Darby, 1971). Presented below are an example reading passage and four example test items:

EXAMPLE PASSAGE

Voter apathy is almost a cliche in discussions of American politics.
Yet, only a cursory look at voting and registration restrictions shows that many would-be voters do not cast ballots because they are prevented from doing so. In twelve states, moving across a county line renders the voter ineligible for six months. In Mississippi the situation is even more severe, since you must be...

EXAMPLE ITEMS

1. (A) Voter apathy
   (B) is almost a cliche
   (C) in discussions
   (D) of American politics.
   (E) A recent poll directed

2. (A) at voting
   (B) and registration restrictions
   (C) shows that
   (D) many would-be voters
   (E) seldom protest or demonstrate

3. (A) because they are prevented
   (B) from doing so.
   (C) In twelve states,
   (D) moving across a county line
   (E) is sufficient, unless you are

4. (A) ineligible for six months.
   (B) In Mississippi
   (C) the situation
   (D) is the reverse,
   (E) since you must be

The Davis Reading Test, a 40-minute timed multiple-choice question test, includes two scores: (a) Level of Comprehension, the number correct (corrected for guessing) on the first 40 test questions, and (b) Speed of Comprehension, the total number correct (corrected for guessing) on the entire 80-question test. The Level of Comprehension score should be an indicant of A, in Equation 1, since all individuals are expected to attempt the first 40 items on which this score is based. That is, the Level of Comprehension score should be an indicant of the accuracy of understanding thoughts which accompanied the rate at which the individual worked on the entire test, even though the accuracy estimate is not based upon the entire test. The Speed of Comprehension score should be an indicant of E, in Equation 1, since it represents the number correct per a fixed amount of time. From the Nelson-Denny Reading Test, the two following variables were analyzed: (a) Comprehension, the total number of items answered correctly out of a total of 36, and (b) Reading Rate, the number of words read during the first minute of the test as reported by the examinee. The Comprehension score should be an indicant of E since it also represents the number correct per a fixed amount of time. The Rate score should be an indicant of R, in Equation 1. Both the Davis Reading Test and the Nelson-Denny Reading Test use the same type of format, i.e., short reading passages are presented with questions adjacent to the passages so that the examinee may refer back to the passage while answering the questions if he desires. The Tinker Speed of Reading Test 2 has only one score, the number of items attempted during a 5-minute time limit.
An item consists of a short sentence containing information which is used in the following sentence to detect and cross out the single word which obviously does not belong. This variable has been designated as a speed of reading variable and would seem, therefore, to be an indicant of R, in Equation 1. However, the items are so easy that A, in Equation 1, probably would not vary (i.e., A = 1.00 or 100%), therefore making the score an indicant of E as well as R (i.e., E = 1.00 R).

2 The Tinker Speed of Reading Test, published by the University of Minnesota Press, recently has been revised so as to be appropriate for elementary school students as well as adults. It is now being published by Revrac Publications as the Basic Reading Rate Scale.
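The scoring arithmetic described above ties the three Chunked-test scores directly to Equation 1. A minimal sketch, using hypothetical raw scores chosen only for illustration (not data from the study):

```python
# Minimal sketch of the Chunked Reading Test scoring arithmetic.
# The raw scores below are hypothetical, chosen only for illustration.
efficiency = 60   # items answered correctly within the time limit (indicant of E)
rate = 80         # number of the last item attempted (indicant of R)

# Accuracy is the Efficiency score divided by the Rate score, times 100.
accuracy = efficiency / rate * 100   # percent correct among attempted items (indicant of A)

# By construction the three scores satisfy Equation 1, E = AR,
# once Accuracy is expressed as a proportion rather than a percent.
assert efficiency == (accuracy / 100) * rate
print(accuracy)  # 75.0
```

Because Accuracy is defined as Efficiency divided by Rate, the relation E = AR holds exactly for these scores; the empirical question the study addresses is whether the three quantities behave as distinct factors across different tests.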

On the surface, the Tinker test would appear to have a great deal in common with the Chunked test, since both involve the recognition of incorrectly inserted words. However, there are fundamental differences between these two tests. The Chunked test requires an individual to read a passage and then to take a test on the passage, the test consisting of incorrectly inserted words; whereas, on the Tinker test, the reading material and the test items are one and the same. On the Tinker test, the detection of the incorrectly inserted words does not require that an original passage be read first; whereas, on the Chunked test, the items were designed so that it is practically impossible to detect the incorrectly inserted words without first having read and stored the information contained in the original passage. Finally, the Chunked test has been designed in a manner which allows the E, A, and R type variables to vary within individuals upon two administrations of the test and between individuals upon one administration of the test; whereas, the Tinker test has been designed so that the A variable approaches zero variance both within and between individuals.

Results

Reliability of the Results. Table 1 contains the intercorrelations among the eleven variables measured by the five tests, together with means and standard deviations. Some of the variables in Table 1 are the same as were investigated in an earlier study (Carver & Darby, 1971), thus allowing the reliability of the results to be evaluated. In the previous study, Efficiency scores on Forms A and B of the Chunked Reading Test correlated .71 and .61 with the Nelson-Denny Comprehension variable; these correlations were for two different groups of Ss. In the present study, the two correlations were .71 and .63 for the same group.
Although the mean Efficiency scores on Form A, 60.7, and Form B, 57.9, for the present group were higher than those reported for the two groups in the earlier study, 53.6 and 49.7, the differences between the scores on the two forms were essentially equal in the two studies, 2.8 and 3.9. Thus, it appears that these differences are attributable to reliable alternate-form differences, and not to group differences in the earlier study or to motivation or test-order interaction differences in the present study. Furthermore, in both studies there appears to be a general tendency for Form B to correlate lower with the other variables than Form A. Since the Form B Efficiency score also had a lower mean and a lower standard deviation in both studies, it appears that the lower variance of Form B may be attenuating its correlations through a slight restriction of range.

Correlational Analysis. From Table 1, it may be noted that the Efficiency variable on the Chunked test correlated highly with the comprehension variables on the Davis and Nelson-Denny tests. Efficiency on Form A and Form B correlated .77 and .69 with Davis Speed of Comprehension and .71 and .63 with Nelson-Denny Comprehension. These correlations are almost as high as the correlation of Form A Efficiency with Form B Efficiency, .81.
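The attenuation noted above can be illustrated with the standard correction for direct restriction of range (Thorndike's Case 2 formula). This is a general psychometric technique, not a computation reported in the study, and the numbers below are hypothetical:

```python
import math

def correct_for_range_restriction(r: float, sd_unrestricted: float, sd_restricted: float) -> float:
    """Thorndike Case 2 correction for direct restriction of range.

    r: correlation observed in the restricted sample
    sd_unrestricted, sd_restricted: standard deviations of the selection
    variable in the unrestricted and restricted groups
    """
    u = sd_unrestricted / sd_restricted
    return (u * r) / math.sqrt(1.0 + r * r * (u * u - 1.0))

# Hypothetical values: an observed r of .63 in a sample whose standard
# deviation is 20% smaller than the unrestricted (normative) one.
r_corrected = correct_for_range_restriction(0.63, sd_unrestricted=1.2, sd_restricted=1.0)
print(f"{r_corrected:.2f}")  # 0.70
```

The corrected value always exceeds the observed one when the sample's variance is smaller than the population's, which is the direction of bias the authors describe for Form B.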

Table 1
Intercorrelations Among the Eleven Variables Measured by Five Reading Tests

Variable                        1    2    3    4    5    6    7    8    9   10   11   Mean  S.D.
Chunked Reading Test, Form A
 1. Efficiency                  --                                                    60.7  16.4
 2. Rate                       .92   --                                               71.3  13.8
 3. Accuracy                   .73  .48   --                                          83.9   9.8
Chunked Reading Test, Form B
 4. Efficiency                 .81  .82  .46   --                                     57.9  14.3
 5. Rate                       .66  .75  .25  .90   --                                71.4  14.7
 6. Accuracy                   .64  .48  .69  .55  .21   --                           80.4   7.9
Davis Reading Test
 7. Level of Comprehension     .64  .48  .67  .49  .31  .58   --                      24.0   6.5
 8. Speed of Comprehension     .77  .69  .64  .69  .59  .52  .82   --                 42.7  13.8
Nelson-Denny Reading Test
 9. Comprehension              .71  .66  .56  .63  .49  .54  .73  .79   --            26.0   5.6
10. Reading Rate               .34  .42  .09  .43  .42  .17  .11  .26  .27   --        301   124
Tinker Speed of Reading Test
11. Speed                      .66  .64  .48  .61  .56  .35  .54  .75  .64  .51  --   45.0  12.5

These correlations are also almost as high as the correlation between the comprehension variables on these two traditional types of reading comprehension tests, .79. The correlations between the Accuracy scores on the Chunked test and the Davis Level of Comprehension scores were .67 and .58 for Form A and Form B, respectively. These correlations are almost as high as the correlation between the Accuracy scores on Form A and Form B, .69. The Form A and Form B Rate scores were among the highest correlates of the Rate score on the Nelson-Denny test, .42 and .42, and were also among the highest correlates of the Tinker test, .64 and .56. These latter correlations were almost as high as the correlation between the Rate scores on Form A and Form B, .75.

All of the correlations discussed above have been attenuated by a restriction of range in ability. Table 2 compares the means and variances of the present sample (N = 41) with the corresponding means and variances of the normative groups. The mean of the present sample was above the normative average, and the variance was less, on both tests.

Table 2
Data Comparing the Present Sample with Normative Samples for the Davis and Nelson-Denny Tests

                                     Present Sample   Normative Sample*
Davis Reading Test,
Speed of Comprehension
  Mean                                     43                32
  Variance                                190               220
Nelson-Denny Reading Test,
Comprehension
  Mean                                     26                19
  Variance                                 32                40

* These data are averages of a set of relevant normative groups given in the respective test manuals.

Factor Analysis. The correlations among the two forms of the Chunked test and the other three reading tests were factor analyzed. The results of this analysis are presented in Table 3. The factor analysis was a principal components solution with an oblique rotation of the primary factors. The decision to terminate the extraction of factors was based upon the number of eigenvalues greater than one, and the two factors which resulted accounted for 75.0% of the variance.
The oblique analysis was chosen because previous data suggest that individual differences in A and R are not orthogonal to each other but are correlated positively (e.g., Stroud, 1942). That is, those individuals who are the more accurate in understanding

thoughts tend also to understand them at higher rates. The correlation between the two oblique factors in Table 3 was .56.

Table 3
Principal Component Factor Analysis of the Reading Test Variables With an Oblique Rotation of the Primary Factors

                                  Type of   Single Factor   Factor I   Factor II
Variable                           Score       Fit (E)         (A)        (R)
Chunked Reading Test, Form A
  Efficiency                         E         *[.93]          .58        .48
  Accuracy                           A           .71         *[.96]      -.17
  Rate                               R          [.87]          .26       *.73
Chunked Reading Test, Form B
  Efficiency                         E         *[.87]          .21        .78
  Accuracy                           A           .67         *[ ‡ ]      -.11
  Rate                               R           .73          -.13      *[.97]
Davis Reading Test
  Level of Comprehension             A           .76         *[.93]      -.09
  Speed of Comprehension             E         *[.89]          .69        .32
Nelson-Denny Reading Test
  Comprehension                      E         *[.83]          .67        .27
  Rate                               R           .43          -.35      *[.86]
Tinker Speed of Reading Test       E & R        *.79           .29       *.61

Note: Brackets circumscribe all loadings greater than .80. An asterisk accompanies the loading on the factor which each test variable was supposed to measure, as designated by the type of score (E, A, or R). Thus, the relative degree to which the theoretically designated type of score matches the empirical results can be evaluated by noting the number of coincidences between asterisks and brackets for each variable.
‡ This loading is illegible in the source; the text indicates it exceeded .80.

The loadings greater than .80 in Table 3 have been bracketed so as to indicate the variables which help define each factor. Also included in Table 3 is a column which designates the type of score, i.e., E, A, or R, supposedly being measured by each test variable.
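The extraction rule reported for this analysis (retain principal components whose eigenvalues exceed one, then rotate) can be sketched on a toy correlation matrix. This is an illustration of the eigenvalue criterion only, assuming NumPy; the matrix is hypothetical, not Table 1, and no rotation step is shown:

```python
import numpy as np

# Toy 3x3 correlation matrix (hypothetical values, not data from the study).
R = np.array([
    [1.00, 0.80, 0.30],
    [0.80, 1.00, 0.25],
    [0.30, 0.25, 1.00],
])

# The eigenvalues of a correlation matrix sum to the number of variables,
# so each eigenvalue is the variance (in variable-units) a component accounts for.
eigenvalues = np.linalg.eigvalsh(R)[::-1]       # sorted largest first
n_factors = int(np.sum(eigenvalues > 1.0))      # eigenvalue-greater-than-one rule
proportion = eigenvalues[:n_factors].sum() / eigenvalues.sum()

print(n_factors)            # 1
print(f"{proportion:.0%}")  # proportion of variance the retained component accounts for
```

With the strong .80 correlation dominating this toy matrix, a single component clears the eigenvalue-one cut-off; in the study's data, two components did, and those two were then rotated obliquely.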

Notice that Factor I has been designated as an A factor because the three loadings greater than .80 were all accuracy type variables and no variable other than an A type variable loaded this high. Factor II has been designated an R factor because its two loadings greater than .80 were both R type variables. Two of the four R type variables failed to load at the .80 level, but both were relatively close to this arbitrary cut-off (.73 and .61). The factor analysis has partitioned the data into its primary components, A and R. Since the product of A and R is theorized to be a single variable, E, it seemed desirable to check on whether one factor fit to these data would, in fact, be an E type variable. The single factor fit is also presented in Table 3. The results are in accordance with the relationship presented in Equation 1. Four of the five variables designated as being of the E type loaded greater than .80, and the remaining variable was only .01 below the arbitrarily selected cut-off, i.e., .79. Only one variable which was not designated as an E variable loaded higher than .80.

Test Analysis. Given that E, A, and R more parsimoniously describe the eight purportedly different variables measured by the four reading tests, the results for each test will now be analyzed in terms of E, A, and R. The three Chunked test scores (Efficiency, Accuracy, and Rate) loaded highest on the corresponding E, A, and R factors with only one minor exception out of the six instances: the Form A Rate score loaded somewhat higher on the E factor, .87, than it did on the R factor, .73. The Davis test appears to measure quite precisely the variables designated by the a priori rational analysis of the nature of the scores. That is, the Level of Comprehension score is an A type variable since it loaded .93 on the A factor. The Speed of Comprehension score is an E type variable since it loaded .89 on the E factor.
The Nelson-Denny Test scores also appeared to measure the variables designated by the a priori rational analysis. The Comprehension score is an E type variable since it loads .83 on the E factor. The Rate score is an R type variable since it loads .86 on the R factor. The Tinker Test was designated as an E type variable as well as an R type variable since A was held constant between individuals. It loaded highest on the E factor, .79, and second highest on the R factor, .61.

Discussion

Reliability of Results. The data were collected from only 41 Ss, and the primary results involving the factor analysis were not replicated (see Armstrong & Soelberg, 1968). Thus, a certain degree of caution should be exercised when generalizing from the results. However, there was a near perfect replication of the correlational results of an earlier study involving the Chunked test and the Nelson-Denny test. This result suggests that the probability of getting drastically different results using a large randomly

sampled population is too small to justify the collection of additional replication data. Furthermore, none of the correlations were outside the bounds that might be expected from previous research, and the correlation between the A and R factors, .56, is about what might be expected from the reported average of obtained correlations between reading rate and comprehension, .40 (Stroud, 1942).

Factor Analysis. The pattern of loadings of the variables on the factors was so consistent that the naming of the factors was self-evident. Nine of the twelve variables designated as E, A, or R were perfectly consistent with the E, A, and R factors, and two of the three inconsistencies arose only because the E and R loadings for the Tinker Test did not quite reach the arbitrarily selected cut-off of .80. The results can be interpreted as providing a conceptual framework for succinctly integrating what is being measured by existing reading tests. These tests may appear on the surface to be different because: (a) different names are given to the test scores, (b) different techniques have been used for test development, (c) different procedures are used for scoring, and (d) different passage and item formats are employed. Yet, the present results serve to explain the superficial differences among existing standardized reading tests, since most of the variance can be parsimoniously accounted for in terms of the three theoretical variables E, A, and R. It seems rationally sound to conceptualize reading comprehension in terms of efficiency, accuracy, and rate. There appear to be no data inconsistent with the theoretical framework presented in Equation 1, and the factor analytic results of this research tend to support it.
Since the scores on the tests used in the present research also represent the major types of scores measured by most standardized reading tests for mature readers, it seems reasonable to suggest that: (a) the efficiency, accuracy, and rate of understanding thoughts are the primary variables measured on all such tests, and (b) the single most important variable measured by such tests is an efficiency type variable which is a synthesis of an accuracy variable and a rate variable.

Theoretical Implications. Reading measurement has suffered from the lack of a comprehensive theory of reading. Farr (1971) seems to have been correct when he contended that standardized reading tests are being developed as if there were a well-known theoretical construct called reading comprehension. The theoretical relationship presented in Equation 1 represents an attempt to make more explicit the old and vague concept that the most important aspect of reading is the mere understanding of sentences, or "thought-getting" (see Farr, 1971). It is a continual nemesis to reading researchers to have to admit that we are really not sure what we are talking about when we speak and write about reading "comprehension." The present research can be interpreted as
providing a partial solution to this problem, a solution which has empirical support. Reading comprehension can be defined as a thought communication process which involves two primary components: the rate at which the thoughts are received and the accuracy with which the thoughts are understood or stored. The end product of these two components is the efficiency with which the thoughts are communicated.

This definition of reading comprehension allows certain ambiguities to be illuminated and avoided. For example, it is ambiguous for certain speed reading advocates to claim that individuals can be taught to triple their reading rate with no loss in comprehension (see Carver, 1972b). If comprehension means efficiency, it may be reasonable to make this claim, because a threefold increase in rate accompanied by a reduction of accuracy to one-third its original level would result in no overall loss in efficiency. Yet, speed reading advocates often suggest that there will be an increase in quantity with no loss in quality, i.e., they suggest that rate can be tripled with no reduction in accuracy. There is no theoretical rationale for how this can be or is accomplished, and the data suggest that it cannot be accomplished (see Liddle, 1965).

Previous use of the term comprehension as an umbrella for both efficiency and accuracy, but excluding rate, has created a great deal of needless ambiguity. If comprehension is used as a term to cover all three variables (efficiency, accuracy, and rate), users of the term can be forced to specify which of the three variables is pertinent to any particular discussion. In the past, reading researchers have responded to claims of fantastically high reading rates with the question: What about comprehension? No longer would this question be valid, because its ambiguity would be evident. A more valid question would be: What about the accuracy of comprehension?

Two Major Dimensions of Test Validity.
The validity of the tests used in this research can be evaluated in terms of how well they measure E, A, and R. However, before discussing these validity results, it will be helpful to present the concept of edumetric validity. Edumetric is a term which has been used to refer to those measurement concerns which focus upon the progressive within-individual gains of direct relevance to education (Carver, 1972c). Edumetric can be contrasted with psychometric, a term used to refer to those measurement concerns which have tended to focus upon the stable between-individual differences of direct relevance to psychology. The psychometric approach to testing had its historical beginnings in trait theory, which focuses upon those characteristics of individuals which change very little with time, conditions, or treatment. The edumetric approach must focus upon change, gain, or improvement, however, because education is a dynamic treatment process which strives to change individuals. It is important to emphasize the differences between these two approaches to measurement because tests which are highly valid from a psychometric standpoint may be highly invalid from an edumetric standpoint.

Psychometric Validity. In this section, the four reading tests will be
discussed with respect to their validity as indicators of stable individual differences in E, A, and R.

The Chunked test does not appear to contribute anything unique to the measurement of individual differences in reading aptitude. Conversely stated, individual differences on the Chunked test appear to have a great deal in common with those on traditional reading tests, thus suggesting construct validity for this attribute of the Chunked test. However, the Chunked test is the only test of the four studied which provides a valid indicator of individual differences in all three of the variables, E, A, and R.

The Davis test purports to be reflecting eight different reading skills (Davis, 1968), yet the factor analysis suggests that the two scores which it provides are reflecting E and A type variables. It appears, therefore, to be reasonable to interpret the Davis Level of Comprehension Score and Speed of Comprehension Score as valid indicants of individual differences in the accuracy and efficiency of comprehension. The Davis test provides no indicant of rate of comprehension, in spite of the fact that one of the test scores is named "Speed of Comprehension." It should be noted that the Davis test appears to provide highly valid measures of individual differences in both E (.89 loading) and A (.93 loading). These two loadings are each lower than their counterparts on Form A of the Chunked Test but are both higher than their counterparts on Form B. This result not only reflects on the relative validity of the Chunked and the Davis tests, but it also lessens the likelihood that the research results were artifactually influenced by the disproportionate representation of the Chunked test variables in the analysis.

The Nelson-Denny test purports to measure Comprehension and Rate, and it seems to validly indicate individual differences in both E (.83 loading) and R (.86 loading).
It should be noted that the Nelson-Denny provides no indicator of accuracy of comprehension.

The Tinker test measures what is termed "Speed of Reading." It does seem to be moderately valid as an indicant of individual differences in R (.61 loading). However, the score on this test seems to be more valid as an indicant of E (.79 loading). This test provides no indicant of the accuracy of comprehension, since it uses a task and a level of material difficulty which tend to eliminate individual differences in accuracy.

Edumetric Validity. None of the results of this study bear directly upon the edumetric validity of the tests. However, it is important to discuss the edumetric validity of these reading tests so that the psychometric results will not be misinterpreted through overgeneralization. Most researchers are aware of the inherent danger involved in generalizing the correlational results of between-individual difference studies to within-individual change situations (see Carver, 1971b). In between-individual studies, there appears to be a positive relationship between R and A. That is, most research has found that those individuals who are the fastest readers (high R on reading
tests) tend also to be the most accurate (high A on reading tests). What often goes unnoticed is that this relationship between R and A is likely to be negative in the within-individual change situations common to reading improvement training (see Carver, 1971b). That is, those individuals who increase their R tend to emerge with a decreased A, and those individuals who decrease their R tend to emerge with an increased A. Given the fact that individual difference research will tend to produce a relationship between R and A which is exactly opposite to the relationship found in within-individual change research, it is extremely important to keep these two research situations separate when evaluating the validity of reading test scores.

To further illuminate the importance of separating psychometric validity from edumetric validity, the Reading Rate score on the Nelson-Denny will be analyzed in detail. The results of the individual difference relationships presented in Table 3 suggest that this score is valid as an indicator of R in individual difference situations. But, to use this score as an indicator of R in within-individual change situations implies that it is indeed valid as an indicator of change or gain. Yet, there appears to be no research which establishes its validity in this situation. In fact, it seems likely that this score will be highly invalid in situations involving reading improvement training. The score is based upon a one-minute reading sample and includes no check on the accuracy of the reading which accompanied the self-report of rate. An individual may increase his R, i.e., the Rate score may increase from pretest to posttest, with a corresponding decrease in A and E, but there is no indication of what happened to A and E as R was increasing. It appears to be relatively simple to get individuals to skim material and thus increase R in training situations.
The crucial question is whether R can be increased with no decrease in A or E. It may seem that, since the Nelson-Denny provides an indicant of E, it is not highly important that A be measured. Yet, it is relatively easy to show high increases in R on the Nelson-Denny without a serious adverse effect upon E when the individuals accelerate their rate initially for the Rate score and then decrease R considerably during that part of the test concerned with E.

Hopefully, the preceding discussion has demonstrated that it is highly dangerous to generalize about the edumetric validity of a test from data which are only relevant to its psychometric validity. In this regard, the efficiency score on the Chunked test has been shown to be valid from an edumetric standpoint, i.e., more valid than the Nelson-Denny (Carver & Darby, 1971). It is also important to note that the manual for the Nelson-Denny does not even claim to produce scores which are valid from an edumetric standpoint. Yet, the Nelson-Denny is probably the most popular test used to evaluate the within-individual changes in reading ability which supposedly accrue to students enrolled in reading improvement courses, e.g., speed reading courses. If the manual for a test does not even claim to be valid from an edumetric standpoint, and if there are no data to support its
validity from an edumetric standpoint, it would seem to be reasonable to question the edumetric validity of the test. In general, all of the tests investigated appear to validly measure variables which differentiate among mature individuals with respect to relatively stable differences in reading ability. Thus, the present data support the recommendations made earlier (Carver, 1970b) regarding the use of all of these tests for the purpose of measuring reading aptitude. Yet, it would be dangerous to generalize the present psychometric-type results to situations wherein it is desirable to measure knowledge gained, amount comprehended, or reading improvement, i.e., edumetric-type situations (see Carver, 1970c).

Practical Implications. It is hoped that Equation 1 and the results of this study will contribute to a clearer conceptualization of reading comprehension and clearer interpretations of test score results. In the past, it has been easy to confuse efficiency of comprehension with accuracy of comprehension. As mentioned earlier, the Nelson-Denny presents a single Comprehension score which is in fact an efficiency type of score. Yet, the Personal Record sheet published for the Nelson-Denny explicitly invites examinees to interpret the Comprehension score as an "accuracy" score. The Davis test is conceptually confusing since it treats the efficiency variable (Speed of Comprehension) and the accuracy variable (Level of Comprehension) as though they were two independent and equally important dimensions which completely describe all of the important variables involved. The E = AR formulation shows that it would be most logical to report either the E results or both the A and R results. Or, as in the case of the Chunked test, an indication of all three variables could be given. The Tinker Test is also conceptually confusing in its emphasis upon speed when it seems to be more valid as an indicant of E.
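The reporting logic of the E = AR formulation can be sketched in a few lines of arithmetic (the numbers are invented for illustration and do not come from the study):

```python
# Equation 1: efficiency (E) is the product of accuracy (A) and rate (R).
# All numbers below are hypothetical.

def efficiency(accuracy, rate):
    """E = A x R: accuracy as a proportion, rate in words per minute."""
    return accuracy * rate

# A reader at 300 wpm who understands 90% of the thoughts presented:
base = efficiency(0.90, 300)

# The speed-reading claim: triple the rate. If accuracy falls to one-third
# of its original level, E is unchanged -- "no loss in comprehension" is
# true of efficiency but conceals the loss in accuracy.
tripled = efficiency(0.30, 900)

print(round(base), round(tripled))  # the two efficiencies are equal
```

Reporting only E, as a single Comprehension score effectively does, cannot distinguish these two readers; reporting A and R separately, or all three variables, makes the trade-off visible.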
It would seem to be desirable for those who use reading tests in applied settings to evaluate them from both the psychometric and the edumetric standpoints. As mentioned previously, it is dangerous to assume that a test which has been shown to be valid from a psychometric standpoint will continue to be valid in educational settings which require edumetric validity.

References

ARMSTRONG, J. S., & SOELBERG, P. On the interpretation of factor analysis. Psychological Bulletin, 1968, 70, 361-364.
CARVER, R. P. Brief report: on the danger involved in the use of tests which measure factors. Multivariate Behavioral Research, 1968, 5, 509-512.
CARVER, R. P. Analysis of chunked test items as measures of reading and listening comprehension. Journal of Educational Measurement, 1970, 7, 141-150. (a)
CARVER, R. P. What is reading comprehension and how should it be measured? In G. B. Schick (Ed.), Nineteenth Yearbook of the National Reading Conference. Milwaukee: National Reading Conference, 1970, 19, 99-106. (b)
CARVER, R. P. Special problems in measuring change with psychometric devices. Proceedings of the A.I.R. seminar on evaluative research: strategies and methods.
Pittsburgh: American Institutes for Research, 1970, 48-66. (c)
CARVER, R. P. A computer model of reading and its implications for measurement and research. Reading Research Quarterly, 1971, 6, 449-471. (a)
CARVER, R. P. Sense and nonsense in speed reading. Silver Spring, Md.: Revrac Publications, 1971. (b)
CARVER, R. P. Toward a comprehensive theory of reading. Unpublished manuscript, 1972. (a)
CARVER, R. P. Speed readers don't read; they skim. Psychology Today, 1972, August, 22-30. (b)
CARVER, R. P. Reading tests in 1970 versus 1980: psychometric versus edumetric tests. The Reading Teacher, 1972, 26, 299-302. (c)
CARVER, R. P. Understanding, information-processing, and learning from prose materials. Journal of Educational Psychology, 1973, 64, 76-84.
CARVER, R. P., & DARBY, C. A., Jr. Development and evaluation of a test of information storage during reading. Journal of Educational Measurement, 1971, 8(1), 33-44.
DAVIS, F. B. Research in comprehension in reading. Reading Research Quarterly, 1968, 3, 499-545.
FARR, R. Measuring reading comprehension: an historical perspective. Twentieth Yearbook of the National Reading Conference. Milwaukee: National Reading Conference, 1971, 187-197.
LIDDLE, W. An investigation of the Wood Reading Dynamics Method. Ann Arbor, Michigan: University Microfilms, 1965, No. 66-5559.
STROUD, J. B. A critical note on reading. Psychological Bulletin, 1942, 39, 173-178.