ANALYSIS OF THE CHUNKED READING TEST AND READING COMPREHENSION¹
Ronald P. Carver and Charles A. Darby, Jr.*


Journal of Reading Behavior, Vol. 5, No. 4, Fall

Abstract

The newly developed Chunked Reading Test was further analyzed by correlating the scores on this test with scores on three other standardized reading tests: the Davis Reading Test, the Nelson-Denny Reading Test, and the Tinker Speed of Reading Test. A rational analysis of the scores on all of the tests suggested that each score could be designated as measuring one of the following three types of variables: efficiency (E), accuracy (A), and rate (R). The tests were administered to 41 college students and the intercorrelations were factor analyzed. Two factors fit the data, and they were readily interpretable as an A factor and an R factor. A single factor fit forced upon the data was readily interpretable as an E factor. The results suggested that: (a) apparent differences among the variables measured by standardized reading tests for adults are more superficial than real, (b) all of the scores on these tests can be interpreted as being valid indicators of individual differences in the efficiency, accuracy, and rate at which thoughts are understood while reading, and (c) empirical support exists for the purported theoretical relationship, E = AR.

Recently, a new type of test item, termed "chunked," has shown promise as an indicator of information stored during reading (Carver, 1970a). A chunk is a group of words, usually longer than a word and usually shorter than a sentence in length. Each chunk forms a meaningful and practical unit of connected discourse. A standardized test, using the chunked type of item, has been developed (Carver and Darby, 1971). In developing the test, items were selected and revised on the basis of whether they discriminated between those individuals who had previously read the passages accompanying the test and those who had not.
An evaluation of the test indicated that it was highly successful in discriminating between readers and nonreaders of the passages. This result suggested that the test can be considered valid as a measure of information stored during reading.

¹ This research was supported by a General Research Support Grant, National Institutes of Health, and by the American Institutes for Research. Following the conduct of this research, the Chunked Reading Test was published by the American Institutes for Research as the Carver-Darby Chunked Reading Test, 1970, and it is now being distributed by Revrac Publications, Silver Spring, Maryland.

* Drs. Ronald P. Carver and Charles A. Darby, Jr. are on the staff of the American Institutes for Research, Washington, D.C.

The previous research on this test has focused primarily upon the measurement of changes within individuals as a result of reading rather than upon differences between individuals and their interrelationships. Previous results have suggested, however, that the test correlates highly with traditional reading comprehension tests (Carver & Darby, 1971). The following research was designed to investigate in more detail the extent to which individual differences indicated by the Chunked Reading Test are similar to or different from those indicated by traditional standardized reading tests.

After the data were collected, it became evident that a factor analysis would provide results relevant to one aspect of a recently formulated theory of reading (Carver, 1972a). It has been theorized that the efficiency of understanding thoughts during reading (E) is a product of the accuracy connected with the understanding process (A) and the rate at which the thoughts are being input (R), i.e.,

E = AR. (1)

E is the number of thoughts correctly understood per unit of time. A is the number of thoughts correctly understood per the number of thoughts input, or covered. R is the number of thoughts input per unit of time. Since it appears to be almost impossible to discriminate between understanding during reading and information stored during reading, both conceptually (Carver, 1971a) and empirically (Carver, 1973), it seemed desirable to analyze and interpret the Chunked test results with respect to Equation 1. It also seemed rationally sound to analyze traditional reading tests in terms of E, A, and R.

Method

Subjects. Forty-one volunteer college students, male and female, were paid for their participation in the study.

Procedure. The Ss were administered five tests during a three-hour period.
The tests, in their order of administration, were: Chunked Reading Test, Form A; Davis Reading Test, Form 1-B; Chunked Reading Test, Form B; Nelson-Denny Reading Test, Form B, Rate and Comprehension; and Tinker Speed of Reading Test. A 10-minute break was provided after the Davis Reading Test. To facilitate motivation, Ss were informed prior to the session that their tests would be graded during the testing period and that they would receive their scores with individual interpretation at the end of the period.

Test Variables. The Chunked Reading Test, Form A and Form B, each consists of five passages and five tests, with 100 test items in total. The chunked test items for each passage were developed by (a) dividing each reading passage into 100 chunks, i.e., groups of one to five meaningfully related words, (b) retyping the passage into two columns, one chunk per column line, 50 chunks per column, (c) randomly deleting one chunk from each set of five, (d) writing 20 new chunks to replace the deleted ones (the new inserted chunks changed the meaning from the original passage), and (e) revising the inserted chunks on the basis of whether they discriminated between readers and non-readers of the passage.

The test requires each S to: (a) read a passage, (b) turn to the test on the following page, and (c) complete the 20 items by identifying the chunks which have changed the meaning of the original passage. This cycle is repeated for succeeding passages and tests until the 25-minute time limit is reached. Reference back to a previously read passage is not permitted.

There are three scores on the test: Efficiency, Rate, and Accuracy. The Efficiency score is represented by the total number of chunked items answered correctly during the time limit. The other two scores are: (a) Rate, the number of the last item attempted, and (b) Accuracy, the percent of items answered correctly, i.e., the Efficiency score divided by the Rate score, multiplied by 100.

The Efficiency score would seem to be an indicant of E, in Equation 1, since the efficiency of storing information on the Chunked test would seem to be highly related to the efficiency of understanding the thoughts contained in the prose material; efficiency in both cases is the number of units (correct chunks or correct thoughts) per unit of time. Similarly, the Rate score would seem to be an indicant of R, in Equation 1, since the number of the last item attempted should be an indicant of the rate at which the thoughts were input. And the Accuracy score would seem to be an indicant of A, in Equation 1, since the percent of items attempted that are correct should be an indicant of the accuracy with which the thoughts input are being understood. A complete description of the development and evaluation of the test has been reported elsewhere (Carver and Darby, 1971). Presented below are an example reading passage and four example test items:

EXAMPLE PASSAGE

Voter apathy is almost a cliche in discussions of American politics.
Yet, only a cursory look at voting and registration restrictions shows that many would-be voters do not cast ballots because they are prevented from doing so. In twelve states, moving across a county line renders the voter ineligible for six months. In Mississippi the situation is even more severe, since you must be...

EXAMPLE ITEMS

1. (A) Voter apathy
   (B) is almost a cliche
   (C) in discussions
   (D) of American politics.
   (E) A recent poll directed

2. (A) at voting
   (B) and registration restrictions
   (C) shows that
   (D) many would-be voters
   (E) seldom protest or demonstrate

3. (A) because they are prevented
   (B) from doing so.
   (C) In twelve states,
   (D) moving across a county line
   (E) is sufficient, unless you are

4. (A) ineligible for six months.
   (B) In Mississippi
   (C) the situation
   (D) is the reverse,
   (E) since you must be

The Davis Reading Test, a 40-minute timed multiple-choice test, includes two scores: (a) Level of Comprehension, the number correct (corrected for guessing) on the first 40 test questions, and (b) Speed of Comprehension, the total number correct (corrected for guessing) on the full 80-question test. The Level of Comprehension score should be an indicant of A, in Equation 1, since all individuals are expected to attempt the first 40 items on which this score is based. That is, the Level of Comprehension score should be an indicant of the accuracy of understanding thoughts which accompanied the rate at which the individual worked on the entire test, even though the accuracy estimate is not based upon the entire test. The Speed of Comprehension score should be an indicant of E, in Equation 1, since it represents the number correct per fixed amount of time.

From the Nelson-Denny Reading Test, the following two variables were analyzed: (a) Comprehension, the total number of items answered correctly out of a total of 36, and (b) Reading Rate, the number of words read during the first minute of the test as reported by the examinee. The Comprehension score should be an indicant of E since it also represents the number correct per fixed amount of time. The Rate score should be an indicant of R, in Equation 1.

Both the Davis Reading Test and the Nelson-Denny Reading Test use the same type of format, i.e., short reading passages are presented with questions adjacent to the passages so that the examinee may refer back to the passage while answering the questions if he desires.

The Tinker Speed of Reading Test² has only one score, the number of items attempted during a 5-minute time limit.
An item consists of a short sentence containing information which is used in the following sentence to detect and cross out the single word which obviously does not belong. This variable has been designated as a speed of reading variable and would seem, therefore, to be an indicant of R, in Equation 1. However, the items are so easy that A, in Equation 1, probably would not vary (i.e., A = 1.00 or 100%), therefore making the score an indicant of E as well as R (i.e., E = 1.00 R).

² The Tinker Speed of Reading Test, published by the University of Minnesota Press, recently has been revised so as to be appropriate for elementary school students as well as adults. It is now being published by Revrac Publications as the Basic Reading Rate Scale.
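The scoring rules given for the Chunked Reading Test, and their correspondence to Equation 1, can be sketched in a few lines of Python. This is only an illustrative sketch: the names `ChunkedScores` and `score_chunked_test`, the encoding of responses as True, False, or None, and the example examinee are assumptions made here and are not part of the published test materials.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChunkedScores:
    efficiency: int  # items answered correctly within the time limit (indicant of E)
    rate: int        # number of the last item attempted (indicant of R)
    accuracy: float  # Efficiency / Rate * 100 (indicant of A)


def score_chunked_test(responses: List[Optional[bool]]) -> ChunkedScores:
    """Score one administration under a hypothetical response encoding.

    responses[i] is True if item i+1 was answered correctly, False if it was
    answered incorrectly, and None if it was never attempted.
    """
    attempted = [n for n, r in enumerate(responses, start=1) if r is not None]
    rate = max(attempted) if attempted else 0
    efficiency = sum(1 for r in responses if r is True)
    accuracy = 100.0 * efficiency / rate if rate else 0.0
    return ChunkedScores(efficiency, rate, accuracy)


# Hypothetical examinee: reaches item 60 of 100 and answers 45 items correctly.
scores = score_chunked_test([True] * 45 + [False] * 15 + [None] * 40)

# Equation 1, with A expressed as a proportion rather than a percent: E = A * R.
reconstructed_efficiency = (scores.accuracy / 100.0) * scores.rate
```

Because Accuracy is defined as Efficiency divided by Rate, the three Chunked scores satisfy E = AR by construction. For a test such as the Tinker, on which A is assumed to stay at 1.00, the same identity reduces to E = R.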

On the surface, the Tinker test would appear to have a great deal in common with the Chunked test, since both involve the recognition of incorrectly inserted words. However, there are fundamental differences between these two tests. The Chunked test requires an individual to read a passage and then to take a test on the passage, the test consisting of incorrectly inserted words; on the Tinker test, the reading material and the test items are one and the same. On the Tinker test, the detection of the incorrectly inserted words does not require that an original passage be read first; on the Chunked test, the items were designed so that it is practically impossible to detect the incorrectly inserted words without first having read and stored the information contained in the original passage. Finally, the Chunked test has been designed in a manner which allows the E, A, and R type variables to vary within individuals across two administrations of the test and between individuals on one administration of the test; the Tinker test has been designed so that the A variable approaches zero variance both within and between individuals.

Results

Reliability of the Results. Table 1 contains the intercorrelations among the eleven variables measured by the five tests, together with means and standard deviations. Some of the variables in Table 1 are the same as were investigated in an earlier study (Carver & Darby, 1971), thus allowing the reliability of the results to be evaluated. In the previous study, Efficiency scores on Forms A and B of the Chunked Reading Test correlated .71 and .61 with the Nelson-Denny Comprehension variable; these correlations were for two different groups of Ss. In the present study, the two correlations were .71 and .63 for the same group.

Although the mean Efficiency scores on Form A, 60.7, and Form B, 57.9, for the present group were higher than those reported for the two groups in the earlier study, 53.6 and 49.7, the differences between the scores on the two forms were essentially equal in the two studies, 2.8 in the present study and 3.9 in the earlier one. Thus, it appears that these differences are attributable to reliable alternate-form differences, and not attributable to group differences in the earlier study or to motivation or test order interaction differences in the present study. Furthermore, in both studies there appears to be a general tendency for Form B to correlate lower with the other variables than Form A. Since the Form B Efficiency score also had a lower mean and lower standard deviation in both studies, it appears that the lower variance of Form B may be attenuating the correlations due to a slight restriction of range.

Correlational Analysis. From Table 1, it may be noted that the Efficiency variable on the Chunked test correlated highly with comprehension variables on the Davis and Nelson-Denny tests. Efficiency on Form A and Form B correlated .77 and .69 with Davis Speed of Comprehension and .71 and .63 with Nelson-Denny Comprehension. These correlations are almost as high as the correlation of Form A Efficiency with Form B Efficiency, .81.

Table 1
Intercorrelations Among the Eleven Variables Measured by Five Reading Tests

Columns: Chunked A (Eff., Rate, Acc.); Chunked B (Eff., Rate, Acc.); Davis (Level, Speed); Nelson-Denny (Comp., Rate); Tinker (Speed); Mean; S.D.

Rows (Test Variables):
Chunked Reading Test, Form A: Efficiency; Rate; Accuracy
Chunked Reading Test, Form B: Efficiency; Rate; Accuracy
Davis Reading Test: Level of Comprehension; Speed of Comprehension
Nelson-Denny Reading Test: Comprehension; Reading Rate
Tinker Speed of Reading Test

[Table entries not preserved in this transcription.]

These correlations are also almost as high as the correlation between the comprehension variables on these two traditional types of reading comprehension tests, .79. The correlations between the Accuracy scores on the Chunked test and the Davis Level of Comprehension scores were .67 and .58 for Form A and Form B, respectively. These correlations are almost as high as the correlation between Accuracy scores on Form A and Form B, .69. The Form A and Form B Rate scores were among the highest correlates of the Rate score on the Nelson-Denny test, .42 and .42, and were also among the highest correlates of the Tinker test, .64 and .56. These latter correlations were almost as high as the correlation between the Rate scores on Form A and Form B, .75.

All of the correlations discussed above have been attenuated due to a restriction of range in ability. Table 2 compares the means and variances of the present sample (N = 41) with the corresponding means and variances of the normative groups. The mean of the present sample was above the normative average and the variance was less on both tests.

Table 2
Data Comparing Present Sample with Normative Sample for the Davis and Nelson-Denny Tests

Columns: Test; Present Sample; Normative Sample*

Rows:
Davis Reading Test, Speed of Comprehension: Mean; Variance
Nelson-Denny Reading Test, Comprehension: Mean; Variance

* These data are averages of a set of relevant normative groups given in the respective test manuals.

[Table entries not preserved in this transcription.]

Factor Analysis. The correlations among the two forms of the Chunked test and the other three reading tests were factor analyzed. The results of this analysis are presented in Table 3. The factor analysis was a principal components solution with an oblique rotation of the primary factors. The decision to terminate the extraction of factors was based upon the number of eigenvalues greater than one, and the two factors which resulted accounted for 75.0% of the variance.
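The stopping rule just described (retain as many components as there are eigenvalues greater than one) can be illustrated with a short sketch. Everything here is an assumption for illustration: the 4 x 4 correlation matrix is invented and is not the data of Table 1, the function names are hypothetical, the eigenvalues are found with a plain cyclic Jacobi routine chosen only to keep the example dependency-free, and the oblique rotation is not shown.

```python
import math
from typing import List, Tuple


def symmetric_eigenvalues(matrix: List[List[float]], sweeps: int = 50) -> List[float]:
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) element.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def eigenvalue_greater_than_one(corr: List[List[float]]) -> Tuple[List[float], int, float]:
    """Return all eigenvalues, the number retained, and the variance they account for."""
    eigenvalues = symmetric_eigenvalues(corr)
    retained = [v for v in eigenvalues if v > 1.0]
    # The eigenvalues of a correlation matrix sum to the number of variables.
    proportion = sum(retained) / len(corr)
    return eigenvalues, len(retained), proportion


# Hypothetical correlations: two accuracy-like and two rate-like variables.
example = [
    [1.0, 0.8, 0.3, 0.3],
    [0.8, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.7],
    [0.3, 0.3, 0.7, 1.0],
]
eigenvalues, n_factors, variance_explained = eigenvalue_greater_than_one(example)
```

With this invented matrix, two eigenvalues exceed one, so two components would be retained before any rotation is applied.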
The oblique analysis was chosen because previous data suggest that individual differences in A and R are not orthogonal to each other but are correlated positively (e.g., Stroud, 1942). That is, those individuals who are the more accurate in understanding thoughts tend also to understand them at higher rates. The correlation between the two oblique factors in Table 3 was .56.

Table 3
Principal Component Factor Analysis of the Reading Test Variables With an Oblique Rotation of the Primary Factors

Columns: Variables; Type of Score; Single Factor Fit (E); Oblique Analysis, Factor I (A); Oblique Analysis, Factor II (R)

Variables (Type of Score):
Chunked Reading Test, Form A
  Efficiency Score (E)
  Accuracy Score (A)
  Rate Score (R)
Chunked Reading Test, Form B
  Efficiency Score (E)
  Accuracy Score (A)
  Rate Score (R)
Davis Reading Test
  Level of Comprehension Score (A)
  Speed of Comprehension Score (E)
Nelson-Denny Reading Test
  Comprehension Score (E)
  Rate Score (R)
Tinker Speed of Reading Test (E & R)

[Most loadings not preserved in this transcription. Legible entries: .89* (Davis Speed of Comprehension, Single Factor Fit); .58 (Factor I, row unclear); .86* (Nelson-Denny Rate, Factor II); .61* (Tinker, Factor II).]

Note: A rectangular box circumscribes all loadings greater than .80. An asterisk accompanies the loading on the factor which each test variable was supposed to measure as designated by the type of score, E, A, or R. Thus, the relative degree to which the theoretically designated type of score matches the empirical results can be evaluated by noting the number of coincidences between the asterisks and rectangular boxes for each variable.

The loadings greater than .80 in Table 3 have been enclosed in a rectangular box so as to indicate the variables which help define each factor. Also included in Table 3 is a column which designates the type of score, i.e., E, R, or A, supposedly being measured by each test variable.

Notice that Factor I has been designated as an A factor because the three loadings greater than .80 were all accuracy type variables and no variable other than an A type variable loaded this high. Factor II has been designated an R factor because its two loadings greater than .80 were both R type variables. Two of the four R type variables failed to load at the .80 level, but both were relatively close to this arbitrary cut-off (.73 and .61).

The factor analysis has partitioned the data into its primary components, A and R. Since the product of A and R is theorized to be a single variable, E, it seemed desirable to check on whether a one-factor fit to these data would, in fact, be an E type variable. The single factor fit is also presented in Table 3. The results are in accordance with the relationship presented in Equation 1. Four of the five variables designated as being of the E type loaded greater than .80 and the remaining variable was only .01 below the arbitrarily selected cut-off, i.e., .79. Only one variable which was not designated as an E variable loaded higher than .80.

Test Analysis. Given that E, A, and R more parsimoniously describe the eight purportedly different variables measured by the four reading tests, the results for each test will now be analyzed in terms of E, A, and R. The three Chunked test scores (Efficiency, Accuracy, and Rate) loaded highest on the corresponding E, A, and R factors with only one minor exception out of the six instances: the Form A Rate score loaded somewhat higher on the E factor, .87, than it did on the R factor, .73. The Davis test appears to measure quite precisely the variables designated by the a priori rational analysis of the nature of the scores. That is, the Level of Comprehension score is an A type variable since it loaded .93 on the A factor. The Speed of Comprehension score is an E type variable since it loaded .89 on the E factor.
The Nelson-Denny Test scores also appeared to measure the variables designated by the a priori rational analysis. The Comprehension score is an E type variable since it loaded .83 on the E factor. The Rate score is an R type variable since it loaded .86 on the R factor. The Tinker Test was designated as an E type variable as well as an R type variable since A was held constant between individuals. It loaded highest on the E factor, .79, and second highest on the R factor, .61.

Discussion

Reliability of Results. The data were collected from only 41 Ss and the primary results involving the factor analysis were not replicated (see Armstrong & Soelberg, 1968). Thus, a certain degree of caution should be exercised when generalizing from the results. However, there was a near perfect replication of the correlational results of an earlier study involving the Chunked test and the Nelson-Denny test. This result suggests that the probability of getting drastically different results using a large randomly sampled population is too small to justify the collection of additional replication data. Furthermore, none of the correlations were outside the bounds that might be expected from previous research, and the correlation between the A and R factors, .56, is about what might be expected from the reported average of obtained correlations between reading rate and comprehension, .40 (Stroud, 1942).

Factor Analysis. The pattern of loadings of the variables on the factors was so consistent that the naming of the factors was self-evident. Nine of the twelve variables designated as E, A, or R were perfectly consistent with the E, A, and R factors, and two of the three inconsistencies were only inconsistent because the E and R loadings for the Tinker Test did not quite reach the arbitrarily selected cut-off of .80.

The results can be interpreted as providing a conceptual framework for succinctly integrating what is being measured by existing reading tests. These tests may appear on the surface to be different because: (a) different names are given to the test scores, (b) different techniques have been used for test development, (c) different procedures are used for scoring, and (d) different passage and item formats are employed. Yet, the present results serve to explain the superficial differences among existing standardized reading tests since most of the variance can be parsimoniously accounted for in terms of the three theoretical variables E, A, and R. It seems rationally sound to conceptualize reading comprehension in terms of efficiency, accuracy, and rate. There appear to be no data which would be inconsistent with the theoretical framework presented in Equation 1, and the factor analytic results of this research tend to support it.
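The comparison between designated score types and the .80 cut-off can be sketched as a simple tally. The list below contains only the loadings quoted in the prose of the Results section (the five remaining Chunked test loadings are not quoted there), and the function and variable names are hypothetical.

```python
from typing import List, Tuple

CUTOFF = 0.80  # the arbitrarily selected cut-off used in the text

# (variable, designated type, loading on the designated factor), as quoted in the prose.
reported: List[Tuple[str, str, float]] = [
    ("Chunked Form A Rate", "R", 0.73),
    ("Davis Level of Comprehension", "A", 0.93),
    ("Davis Speed of Comprehension", "E", 0.89),
    ("Nelson-Denny Comprehension", "E", 0.83),
    ("Nelson-Denny Rate", "R", 0.86),
    ("Tinker Speed of Reading", "E", 0.79),
    ("Tinker Speed of Reading", "R", 0.61),
]


def split_by_cutoff(rows, cutoff=CUTOFF):
    """Separate designations whose loading exceeds the cut-off from those that do not."""
    meets = [(name, kind) for name, kind, loading in rows if loading > cutoff]
    misses = [(name, kind) for name, kind, loading in rows if loading <= cutoff]
    return meets, misses


meets, misses = split_by_cutoff(reported)
```

Among the loadings quoted, four designations exceed the cut-off and three do not: the Form A Rate score on R and the Tinker score on both E and R. These appear to correspond to the three inconsistencies mentioned above.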
Since the scores on the tests used in the present research also represent the major types of scores measured by most standardized reading tests for mature readers, it seems reasonable to suggest that: (a) the efficiency, accuracy, and rate of understanding thoughts are the primary variables measured on all such tests, and (b) the single most important variable measured by such tests is an efficiency type variable which is a synthesis of an accuracy and a rate variable.

Theoretical Implications. Reading measurement has suffered from the lack of a comprehensive theory of reading. Farr (1971) seems to have been correct when he contended that standardized reading tests are being developed as if there were a well known theoretical construct called reading comprehension. The theoretical relationship presented in Equation 1 represents an attempt to make more explicit the old and vague concept that the most important aspect of reading is the mere understanding of sentences, or "thought-getting" (see Farr, 1971). It is a continual nemesis to reading researchers to have to admit that we are really not sure what we are talking about when we speak and write about reading "comprehension." The present research can be interpreted as providing a partial solution to this problem, a solution which has empirical support. Reading comprehension can be defined as a thought communication process which involves two primary components: the rate at which the thoughts are received and the accuracy with which the thoughts are understood or stored. The end product of these two components is the efficiency with which the thoughts are communicated.

This definition of reading comprehension allows certain ambiguities to be illuminated and avoided. For example, it is ambiguous for certain speed reading advocates to claim that individuals can be taught to triple their reading rate with no loss in comprehension (see Carver, 1972b). If comprehension means efficiency, it may be reasonable to make this claim because a threefold increase in rate accompanied by a one-third reduction in accuracy would result in no overall loss in efficiency. Yet, speed reading advocates often suggest that there will be an increase in quantity with no loss in quality, i.e., they suggest that rate can be tripled with no reduction in accuracy. There is no theoretical rationale for how this can be or is accomplished, and the data suggest that it cannot be accomplished (see Liddle, 1965).

Previous use of the term comprehension as an umbrella for both efficiency and accuracy, but excluding rate, has presented a great deal of needless ambiguity. If comprehension is used as a term to cover all three variables (efficiency, accuracy, and rate), users of the term can be forced to specify which of the three variables is pertinent to any particular discussion. In the past, reading researchers have responded to claims of fantastically high reading rates with the question: What about comprehension? No longer would this question be valid because its ambiguity would be evident. A more valid question would be: What about the accuracy of comprehension?

Two Major Dimensions of Test Validity.
The validity of the tests used in this research can be evaluated in terms of how well they measure E, A, and R. However, before discussing these validity results, it will be helpful to present the concept of edumetric validity. Edumetric is a term which has been used to refer to those measurement concerns which focus upon the progressive within-individual gains of direct relevance to education (Carver, 1972c). Edumetric can be contrasted with psychometric, a term used to refer to those measurement concerns which have tended to focus upon the stable between-individual differences of direct relevance to psychology. The psychometric approach to testing had its historical beginnings in trait theory which focuses upon those characteristics of individuals which change very little with time, conditions, or treatment. The edumetric approach must focus upon change, gain, or improvement, however, because education is a dynamic treatment process which strives to change individuals. It is important to emphasize the differences between these two approaches to measurement because tests which are highly valid from a psychometric standpoint may be highly invalid from an edumetric standpoint.

Psychometric Validity. In this section, the four reading tests will be discussed with respect to their validity as indicators of stable individual differences in E, A, and R. The Chunked test does not appear to contribute anything unique to the measurement of individual differences in reading aptitude. Conversely stated, individual differences on the Chunked test appear to have a great deal in common with those on traditional reading tests, thus suggesting construct validity for this attribute of the Chunked test. However, the Chunked test is the only test of the four studied which provides a valid indicator of individual differences in all three of the variables, E, A, and R.

The Davis test purports to be reflecting eight different reading skills (Davis, 1968), yet the factor analysis suggests that the two scores which it provides are reflecting E and A type variables. It appears, therefore, to be reasonable to interpret the Davis Level of Comprehension Score and Speed of Comprehension Score as valid indicants of the individual differences in the accuracy and efficiency of comprehension. The Davis test provides no indicant of rate of comprehension, in spite of the fact that one of the test scores is named "Speed of Comprehension." It should be noted that the Davis test appears to provide highly valid measures of individual differences of both E (.89 loading) and A (.93 loading). These two loadings are each lower than their counterparts on Form A of the Chunked Test but are both higher than their counterparts on Form B. This result not only reflects on the relative validity of the Chunked and the Davis tests, but it also lessens the likelihood that the research results were artifactually influenced by the disproportionate representation of the Chunked test variables in the analysis.

The Nelson-Denny test purports to measure Comprehension and Rate, and it seems to validly indicate individual differences in both E (.83 loading) and R (.86 loading).
It should be noted that the Nelson-Denny provides no indicator of accuracy of comprehension.

The Tinker test measures what is termed "Speed of Reading." It does seem to be moderately valid as an indicator of individual differences in R (.61 loading). However, the score on this test seems to be more valid as an indicant of E (.79 loading). This test provides no indicant of the accuracy of comprehension since it uses a task and a level of material difficulty which tend to eliminate individual differences in accuracy.

Edumetric Validity. None of the results of this study bear directly upon the edumetric validity of the tests. However, it is important to discuss the edumetric validity of these reading tests so that the psychometric results will not be misinterpreted through overgeneralization. There is an inherent danger in generalizing the correlational results of between-individual difference studies to within-individual change situations. Between individuals, there appears to be a positive relationship between R and A, of which most researchers are aware (see Carver, 1971b). That is, most research has found that those individuals who are the fastest readers (high R on reading tests) tend also to be the most accurate (high A on reading tests). What often goes unnoticed is that this relationship between R and A is likely to be negative in the within-individual change situations common to reading improvement training (see Carver, 1971b). That is, those individuals who increase their R tend to emerge with a decreased A, and those individuals who decrease their R tend to emerge with an increased A. Given the fact that individual difference research will tend to produce a relationship between R and A which is exactly opposite to the relationship found in within-individual change research, it is extremely important to keep these two research situations separate when evaluating the validity of reading test scores.

To further illuminate the importance of separating psychometric validity from edumetric validity, the Reading Rate score on the Nelson-Denny will be analyzed in detail. The results of the individual difference relationships presented in Table 3 suggest that this score is valid as an indicator of R in individual difference situations. But to use this score as an indicator of R in within-individual change situations implies that it is indeed valid as an indicator of change or gain. Yet, there appears to be no research which establishes its validity in this situation. In fact, it seems likely that this score will be highly invalid in situations involving reading improvement training. The score is based upon a one-minute reading sample and includes no check on the accuracy of the reading which accompanied the self-report of rate. An individual may increase his R, i.e., the Rate score may increase from pretest to posttest, with a corresponding decrease in A and E, but there is no indication of what happened to A and E as R was increasing. It appears to be relatively simple to get individuals to skim material and thus increase R in training situations.
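As a worked illustration of Equation 1 in a pretest-posttest setting, suppose that training triples the Rate score. The subscripts and the factor of three are illustrative assumptions, borrowed from the speed reading claim discussed earlier, and are not data from this study.

```latex
\begin{align*}
E_{\text{pre}} &= A_{\text{pre}} R_{\text{pre}}, \qquad R_{\text{post}} = 3 R_{\text{pre}} \\
\text{if } A_{\text{post}} = A_{\text{pre}}: \quad
  E_{\text{post}} &= A_{\text{pre}} \cdot 3 R_{\text{pre}} = 3 E_{\text{pre}} \\
\text{if } A_{\text{post}} = \tfrac{1}{3} A_{\text{pre}}: \quad
  E_{\text{post}} &= \tfrac{1}{3} A_{\text{pre}} \cdot 3 R_{\text{pre}} = E_{\text{pre}}
\end{align*}
```

A Rate score taken alone cannot distinguish between these two cases, since both show the same gain in R.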
The crucial question is whether R can be increased with no decrease in A or E. It may seem that since the Nelson-Denny provides an indicant of E, it is not highly important that A be measured. Yet, it is relatively easy to show high increases in R on the Nelson-Denny without a serious adverse effect upon E when the individuals accelerate their rate initially for the Rate score and then decrease R considerably during that part of the test concerned with E.

Hopefully, the preceding discussion has demonstrated that it is highly dangerous to generalize about the edumetric validity of a test from data which are only relevant to its psychometric validity. In this regard, the efficiency score on the Chunked test has been shown to be valid from an edumetric standpoint, i.e., more valid than the Nelson-Denny (Carver & Darby, 1971). It is also important to note that the manual for the Nelson-Denny does not even claim to produce scores which are valid from an edumetric standpoint. Yet, the Nelson-Denny is probably the most popular test used to evaluate the within-individual changes in reading ability which supposedly accrue to students enrolled in reading improvement courses, e.g., speed reading courses. If the manual for a test does not even claim that the test is valid from an edumetric standpoint, and if there are no data to support its validity from an edumetric standpoint, it would seem to be reasonable to question the edumetric validity of the test.

In general, all of the tests investigated appear to validly measure variables which differentiate among mature individuals with respect to relatively stable differences in reading ability. Thus, the present data support the recommendations made earlier (Carver, 1970b) regarding the use of all of these tests for the purpose of measuring reading aptitude. Yet, it would be dangerous to generalize the present psychometric-type results to situations wherein it is desirable to measure knowledge gained, amount comprehended, or reading improvement, i.e., edumetric-type situations (see Carver, 1970c).

Practical Implications. It is hoped that Equation 1 and the results of this study will contribute to a clearer conceptualization of reading comprehension and clearer interpretations of test score results. In the past, it has been easy to confuse efficiency of comprehension with accuracy of comprehension. As mentioned earlier, the Nelson-Denny presents a single Comprehension score which is in fact an efficiency type of score. Yet, the Personal Record sheet published for the Nelson-Denny explicitly invites examinees to interpret the Comprehension score as an "accuracy" score. The Davis test is conceptually confusing since it treats the efficiency variable (Speed of Comprehension) and the accuracy variable (Level of Comprehension) as though they were two independent and equally important dimensions which completely describe all of the important variables involved. The E = AR formulation shows that it would be most logical to report either the E results or both the A and R results. Or, as in the case of the Chunked test, an indication of all three variables could be given. The Tinker test is also conceptually confusing in its accent upon speed when it seems to be more valid as an indicant of E.
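As a rough worked example of why the E = AR formulation argues for reporting either E or both A and R, consider a single hypothetical reader measured before and after training. The numbers, the units (A as a proportion, R in words per minute), and the function name `efficiency` are assumptions made only for this sketch; they are not scores from any of the tests discussed.

```python
import math


def efficiency(accuracy, rate):
    """Return E under the E = AR formulation."""
    return accuracy * rate


# Hypothetical pretest and posttest scores for one reader.
pre = {"A": 0.80, "R": 250.0}
post = {"A": 0.40, "R": 500.0}  # the reader has learned to skim

e_pre = efficiency(pre["A"], pre["R"])
e_post = efficiency(post["A"], post["R"])
rate_gain = post["R"] / pre["R"]

print(f"rate gain (post R / pre R): {rate_gain:.1f}x")
print(f"E before training: {e_pre:.1f}")
print(f"E after training:  {e_post:.1f}")
print("E unchanged:", math.isclose(e_pre, e_post))
```

In this invented case a report of R alone would show the rate doubling, while E shows no gain at all because A fell by half; the rate score and the efficiency score tell different stories unless A is also known.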
It would seem to be desirable for those who use reading tests in applied settings to evaluate them from both the psychometric and the edumetric standpoint. As mentioned previously, it is dangerous to assume that a test which has been shown to be valid from a psychometric standpoint will continue to be valid in educational settings which require edumetric validity.

References

ARMSTRONG, J. S., & SOELBERG, P. On the interpretation of factor analysis. Psychological Bulletin, 1968, 70,
CARVER, R. P. Brief report: on the danger involved in the use of tests which measure factors. Multivariate Behavioral Research, 1968, 5,
CARVER, R. P. Analysis of chunked test items as measures of reading and listening comprehension. Journal of Educational Measurement, 1970, 7, (a)
CARVER, R. P. What is reading comprehension and how should it be measured? In G. B. Schick (Ed.) Nineteenth Yearbook of the National Reading Conference. Milwaukee: National Reading Conference, 1970, 19, (b)
CARVER, R. P. Special problems in measuring change with psychometric devices. Proceedings of the A.I.R. seminar on evaluative research-strategies and methods. Pittsburgh: American Institutes for Research, 1970, (c)
CARVER, R. P. A computer model of reading and its implications for measurement and research. Reading Research Quarterly, 1971, 6, (a)
CARVER, R. P. Sense and nonsense in speed reading. Silver Spring, Md.: Revrac Publications, (b)
CARVER, R. P. Toward a comprehensive theory of reading, 1972, unpublished manuscript, (a)
CARVER, R. P. Speed readers don't read; they skim. Psychology Today, 1972, August, (b)
CARVER, R. P. Reading tests in 1970 versus 1980: psychometric versus edumetric tests. The Reading Teacher, 1972, 26, (c)
CARVER, R. P. Understanding, information-processing, and learning from prose materials. Journal of Educational Psychology, 1973, 64,
CARVER, R. P., & DARBY, CHARLES A., Jr. Development and evaluation of a test of information storage during reading. Journal of Educational Measurement, 1971, 8(1),
DAVIS, F. B. Research in comprehension in reading. Reading Research Quarterly, 1968, 3,
FARR, R. Measuring reading comprehension: an historical perspective. Twentieth Yearbook of the National Reading Conference. Milwaukee: National Reading Conference, 1971,
LIDDLE, W. An investigation of the Wood Reading Dynamics Method. Ann Arbor, Michigan: University Microfilms, 1965, No
STROUD, J. B. A critical note on reading. Psychological Bulletin, 1942, 39,


More information

Gender-Based Differential Item Performance in English Usage Items

Gender-Based Differential Item Performance in English Usage Items A C T Research Report Series 89-6 Gender-Based Differential Item Performance in English Usage Items Catherine J. Welch Allen E. Doolittle August 1989 For additional copies write: ACT Research Report Series

More information

VERDIN MANUSCRIPT REVIEW HISTORY REVISION NOTES FROM AUTHORS (ROUND 2)

VERDIN MANUSCRIPT REVIEW HISTORY REVISION NOTES FROM AUTHORS (ROUND 2) 1 VERDIN MANUSCRIPT REVIEW HISTORY REVISION NOTES FROM AUTHORS (ROUND 2) Thank you for providing us with the opportunity to revise our paper. We have revised the manuscript according to the editors and

More information

2 Critical thinking guidelines

2 Critical thinking guidelines What makes psychological research scientific? Precision How psychologists do research? Skepticism Reliance on empirical evidence Willingness to make risky predictions Openness Precision Begin with a Theory

More information

Item Writing Guide for the National Board for Certification of Hospice and Palliative Nurses

Item Writing Guide for the National Board for Certification of Hospice and Palliative Nurses Item Writing Guide for the National Board for Certification of Hospice and Palliative Nurses Presented by Applied Measurement Professionals, Inc. Copyright 2011 by Applied Measurement Professionals, Inc.

More information

The complete Insight Technical Manual includes a comprehensive section on validity. INSIGHT Inventory 99.72% % % Mean

The complete Insight Technical Manual includes a comprehensive section on validity. INSIGHT Inventory 99.72% % % Mean Technical Manual INSIGHT Inventory 99.72% Percentage of cases 95.44 % 68.26 % -3 SD -2 SD -1 SD +1 SD +2 SD +3 SD Mean Percentage Distribution of Cases in a Normal Curve IV. TEST DEVELOPMENT Historical

More information

Word Association Type and the Temporal Stacking of Responses

Word Association Type and the Temporal Stacking of Responses JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR 9, 207-211 (1970) Word Association Type and the Temporal Stacking of Responses JOHN C. MASTERS University of Minnesota, Minneapolis, Minnesota 55455 GARY

More information

A brief history of the Fail Safe Number in Applied Research. Moritz Heene. University of Graz, Austria

A brief history of the Fail Safe Number in Applied Research. Moritz Heene. University of Graz, Austria History of the Fail Safe Number 1 A brief history of the Fail Safe Number in Applied Research Moritz Heene University of Graz, Austria History of the Fail Safe Number 2 Introduction Rosenthal s (1979)

More information

DON M. PALLAIS, CPA 14 Dahlgren Road Richmond, Virginia Telephone: (804) Fax: (804)

DON M. PALLAIS, CPA 14 Dahlgren Road Richmond, Virginia Telephone: (804) Fax: (804) DON M. PALLAIS, CPA 14 Dahlgren Road Richmond, Virginia 23233 Telephone: (804) 784-0884 Fax: (804) 784-0885 Office of the Secretary PCAOB 1666 K Street, NW Washington, DC 20006-2083 Gentlemen: November

More information

FSA Training Papers Grade 7 Exemplars. Rationales

FSA Training Papers Grade 7 Exemplars. Rationales FSA Training Papers Grade 7 Exemplars Rationales Rationales for Grade 7 Exemplars Reading Grade 7 Reading Exemplar #1: Score 3 Comprehension of the passages and task clearly evident Generally purposeful

More information

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research 2012 CCPRC Meeting Methodology Presession Workshop October 23, 2012, 2:00-5:00 p.m. Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy

More information

FMEA AND RPN NUMBERS. Failure Mode Severity Occurrence Detection RPN A B

FMEA AND RPN NUMBERS. Failure Mode Severity Occurrence Detection RPN A B FMEA AND RPN NUMBERS An important part of risk is to remember that risk is a vector: one aspect of risk is the severity of the effect of the event and the other aspect is the probability or frequency of

More information

Social inclusion as recognition? My purpose here is not to advocate for recognition paradigm as a superior way of defining/defending SI.

Social inclusion as recognition? My purpose here is not to advocate for recognition paradigm as a superior way of defining/defending SI. Social inclusion as recognition? My purpose here is not to advocate for recognition paradigm as a superior way of defining/defending SI. My purpose is just reflective: given tradition of thought and conceptual

More information

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Ball State University

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Ball State University PEER REVIEW HISTORY BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (see an example) and are provided with free text boxes to

More information

Audio: In this lecture we are going to address psychology as a science. Slide #2

Audio: In this lecture we are going to address psychology as a science. Slide #2 Psychology 312: Lecture 2 Psychology as a Science Slide #1 Psychology As A Science In this lecture we are going to address psychology as a science. Slide #2 Outline Psychology is an empirical science.

More information

Generalization and Theory-Building in Software Engineering Research

Generalization and Theory-Building in Software Engineering Research Generalization and Theory-Building in Software Engineering Research Magne Jørgensen, Dag Sjøberg Simula Research Laboratory {magne.jorgensen, dagsj}@simula.no Abstract The main purpose of this paper is

More information

Checklist of Key Considerations for Development of Program Logic Models [author name removed for anonymity during review] April 2018

Checklist of Key Considerations for Development of Program Logic Models [author name removed for anonymity during review] April 2018 Checklist of Key Considerations for Development of Program Logic Models [author name removed for anonymity during review] April 2018 A logic model is a graphic representation of a program that depicts

More information

Recognizing Ambiguity

Recognizing Ambiguity Recognizing Ambiguity How Lack of Information Scares Us Mark Clements Columbia University I. Abstract In this paper, I will examine two different approaches to an experimental decision problem posed by

More information

Shiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p )

Shiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p ) Rasch Measurementt iin Language Educattiion Partt 2:: Measurementt Scalles and Invariiance by James Sick, Ed.D. (J. F. Oberlin University, Tokyo) Part 1 of this series presented an overview of Rasch measurement

More information

Information Structure for Geometric Analogies: A Test Theory Approach

Information Structure for Geometric Analogies: A Test Theory Approach Information Structure for Geometric Analogies: A Test Theory Approach Susan E. Whitely and Lisa M. Schneider University of Kansas Although geometric analogies are popular items for measuring intelligence,

More information

By Gordon Welty Wright State University Dayton, OH 45435

By Gordon Welty Wright State University Dayton, OH 45435 "Evaluation Research and Research Designs," in Peter Taylor and Doris Cowley (eds) Readings in Curriculum Evaluation, Dubuque, Iowa: Wm. C. Brown (1972), pp. 81-83. By Gordon Welty Wright State University

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

DRAFT (Final) Concept Paper On choosing appropriate estimands and defining sensitivity analyses in confirmatory clinical trials

DRAFT (Final) Concept Paper On choosing appropriate estimands and defining sensitivity analyses in confirmatory clinical trials DRAFT (Final) Concept Paper On choosing appropriate estimands and defining sensitivity analyses in confirmatory clinical trials EFSPI Comments Page General Priority (H/M/L) Comment The concept to develop

More information

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that

More information

How is ethics like logistic regression? Ethics decisions, like statistical inferences, are informative only if they re not too easy or too hard 1

How is ethics like logistic regression? Ethics decisions, like statistical inferences, are informative only if they re not too easy or too hard 1 How is ethics like logistic regression? Ethics decisions, like statistical inferences, are informative only if they re not too easy or too hard 1 Andrew Gelman and David Madigan 2 16 Jan 2015 Consider

More information