CHARACTERISTICS OF EXAMINEES WHO LEAVE QUESTIONS UNANSWERED ON THE GRE GENERAL TEST UNDER RIGHTS-ONLY SCORING


Jerilee Grandy

GRE Board Professional Report No P
ETS Research Report
November 1987

This report presents the findings of a research project funded by and carried out under the auspices of the Graduate Record Examinations Board.

Educational Testing Service, Princeton, N.J.

Copyright by Educational Testing Service. All rights reserved.

Contents

Abstract
Background
Purpose
Part 1: Analysis of GRE File
    Method
    Results
        Distributions of items not answered
        Correlations of blank items between test sections
        Distributions by demographic group
        Comparisons between "nonguessers" and others
        Effects of not guessing on scaled score
        Regression analyses
Part 2: Questionnaire Survey
    Purpose
    Sampling
    Questionnaire design and pretest
    Survey administration
    Results
        Response rate
        Characteristics of sample
        Questionnaire responses
        Consistency of responses
    Discussion of questionnaire findings
Summary and Conclusions
References
Tables
Appendices

Abstract

The purpose of the study was to determine why some examinees taking the GRE General Test leave items blank when they have been told that they will not be penalized for guessing and have been instructed to guess on questions they cannot answer. Data from the October 1984 test administration were analyzed to establish whether nonguessers were different in any way from those who completed, or nearly completed, each test section. Nonguessers were defined as examinees who left large numbers of items blank, whether by omitting items or by not finishing a test section; the analyses in this study did not separate these two ways of leaving items blank. A sample of nonguessers was also surveyed by mail to determine whether they understood the instructions and why they omitted items.

The tendency to leave items blank was most evident among the following groups of examinees:

    women
    non-Whites, particularly Blacks
    resident aliens or foreign non-citizens
    members of families with less than average formal education
    older examinees
    examinees who were out of school or already in graduate school
    examinees who had taken the GRE previously, under the old instructions
    examinees planning to study humanities or social sciences
    examinees with lower than average undergraduate grades
    examinees with lower than average GRE scores, even when the scores were corrected for not guessing

In regression analyses, GRE scores provided the best estimate of the number of items left unanswered. For the verbal section, ethnicity was the second best predictor; age, sex, and GPA made small but significant contributions to the equation. For the quantitative section, age was the second best predictor after test score, with sex and ethnicity each contributing a small amount. For the analytical section as well, test score was the best predictor and age the second best, with ethnicity and GPA receiving small but significant weights.
Survey results suggested that many nonguessers did not fully understand guessing strategies and may have been confused by the new instructions because they conflicted with what the examinees had learned earlier. Their questionnaire responses were often inconsistent, and many claimed to have read and heard instructions not to guess.

Background

The psychometric literature contains numerous articles on the pros and cons of formula scoring versus rights-only scoring of multiple-choice tests. Under formula scoring, examinees are assessed a penalty for guessing: the score is computed by subtracting one fourth (for five-alternative items) of the number of wrong answers from the number right. Under rights-only scoring, examinees are awarded points for the number of correct answers, with no penalty for wrong answers. Some of the pros and cons of each scoring method have been summarized and discussed by Angoff (1979) and by Angoff and Schrader (1984). Psychometric arguments in favor of formula scoring have been advanced by Lord (1963, 1974, 1975) and by Lord and Novick (1968, p. 308). Their studies suggest that formula-scored tests are more valid, more reliable, and more efficient than rights-scored tests. By examining item-response curves, Lord (1980) has suggested that difficult tests are especially unreliable for low-scoring students because they guess more often than they should, and that for them partial information is likely to be misinformation.

Angoff and Schrader (1984) conducted a study to evaluate two competing hypotheses. The differential effects hypothesis states that some students, when tested under formula directions, omit items about which they have useful partial knowledge; these students would presumably benefit under rights-only directions. The invariance hypothesis states that examinees would score no better than chance expectation on those items that they would omit under formula scoring but would answer under rights-only scoring. Results of the study tended to support the invariance hypothesis, and thus to support formula scoring. The results also suggested that formula scoring has the effect of compensating for differences in guessing strategies.
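For concreteness, the two scoring rules contrasted above can be written out as a short sketch. The function names and item counts here are illustrative, not part of the original report or of ETS's scoring software.

```python
# Illustrative comparison of the two scoring rules for a test of
# five-alternative multiple-choice items (hypothetical numbers).

def formula_score(num_right, num_wrong, alternatives=5):
    """Formula scoring: subtract a fraction of the wrong answers
    (1/(k-1) for k-alternative items) as a guessing penalty."""
    return num_right - num_wrong / (alternatives - 1)

def rights_only_score(num_right):
    """Rights-only scoring: count correct answers, no penalty for wrong ones."""
    return num_right

# An examinee with 40 right and 20 wrong on five-choice items:
print(formula_score(40, 20))   # 40 - 20/4 = 35.0
print(rights_only_score(40))   # 40
```

Under formula scoring, an examinee guessing at random on k-alternative items gains nothing in expectation (1/k of a point per item, offset by the penalty on the other (k-1)/k), which is why the instructions under the two rules differ.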
Aside from the psychometric arguments favoring formula scoring, many people view pure random guessing as a test-taking strategy to be discouraged because it focuses on the student's interest in improving his or her test score without sufficient regard to the educational outcome being assessed. There have also been a number of arguments advanced in favor of rights-only scoring. Instructions to the examinee are more direct and easier to follow. Many examinees fail to understand the logic of the guessing strategy described for formula scoring. Some investigators have argued that personality characteristics play a greater role in formula-scored examinations than in rights-only examinations because the test-taker is faced with a number of decisions concerning the advisability of, or the relative risks involved in, following a hunch. Each item about which the examinee is uncertain demands a decision-making process. Many such decisions introduce irrelevant sources of error variance associated with the examinee's willingness to take risks and his or her level of confidence.

Finally, Glass and Wiley (1964) have claimed that rights-only scored tests are more reliable than formula-scored tests, thus contradicting Lord's assertion.

The Research Committee of the GRE Board considered the various competing arguments for each type of scoring and, in 1980, decided to move from a formula-scoring mode to a rights-only mode for the General Test. One of their major considerations was the effect of instructions on student behavior in the test-taking situation. Members of the committee also felt that uninformed guessing would cause problems for any item-based equating scheme, and that the psychometric advantages favoring either formula scoring or rights-only scoring are small. The committee was also persuaded that students should always guess because guessing is to their advantage (GRE Board minutes, April 1980). Since October 1982, the GRE Information Bulletin, in its discussion of test-taking strategy, has instructed examinees as follows:

    On the General (Aptitude) Test, your scores will be determined by the number of best answers you select from the choices given. No penalty is assessed for wrong answers. Therefore, even if you are uncertain about a question, it is better that you guess at the answer rather than not respond at all. You risk nothing by guessing. Failure to respond, however, eliminates the possibility of raising your score by selecting the correct answer.

The explanation is also given on the back cover of the test booklet, and it is read aloud to the examinees, along with other instructions, once at the start of the exam. In spite of having this information, some examinees still omit items. Prior to the present study, we estimated that between 7,600 and 20,000 examinees a year--i.e., between 4 and 11 percent of the total examinee population--penalize themselves at least 10 scaled score points, and possibly 40 points or more, on the verbal section by not guessing.
The estimates were higher for the quantitative section and higher still for the analytical section. Questions therefore have arisen as to whether failure to guess under rights-only scoring actually has a serious effect on examinees' scores, and, if so, whether instructions could be changed to encourage examinees to guess, and whether steps should be taken to compensate in some way for not guessing.

Purpose

The purpose of this study was to answer three questions:

1. Do examinees who tend to leave items unanswered differ in any respects from those who complete all items?
2. How seriously are examinees' scores actually affected by not guessing?
3. What reasons do examinees give for not guessing on the questions they leave blank?

To answer these questions, we analyzed data from the October 1984 test administration and conducted a mail survey of a sample of examinees who failed to answer more than 30 items altogether. In Part 1 of the study, we addressed the first two questions, basing our conclusions on data from the entire population that took the test in October. Part 2 describes the survey and how the findings provide some answers to the third question.

Part 1: Analysis of GRE File

Method

To answer questions regarding the effects of item omission on test scores and the relation of item omission to examinee characteristics, we analyzed data from the population that took the GRE General Test in October 1984, the test administration with the largest number of examinees that year. The data base we used contained information from the background questionnaire, converted (scaled) test scores, and the number of items not answered.

While it might have been preferable to analyze separately the number of items omitted and the number not reached, this would have required that all tests (five different forms) be specially rescored. We considered the following arguments before deciding not to rescore the tests. First, effects on the test scores would be the same: since omitted items and not-reached items are both scored zero, they did not have to be separated to study the effects of leaving items unanswered. Second, although we did not know this for certain, we thought it likely that examinees would use the same general strategy for items they could not answer as for items they did not reach. Even if they did not use the same strategy, we could obtain the information we needed from the combination of both kinds of blank items. If we could estimate the number of items left blank from other examinee characteristics (e.g., age, race), we would know that there is a relationship between those characteristics and the tendency to omit items and/or to leave items blank that cannot be finished in the time allowed. In either case, decisions regarding the next step to take were likely to be the same. With these considerations in mind, we decided to conduct the analyses in Part 1 on all items not answered, namely, the sum of those omitted and those not reached.

Analyses of the total population consisted of the following: 1. frequency distributions of numbers of items not answered, in the total test and by test section, 2.
correlations among the number of unanswered items in the verbal, quantitative, and analytical sections, and 3. frequency distributions of numbers of items not answered, by demographic groups.

We then defined "nonguessers" as the subpopulation who failed to answer more than 30 items. The decision to use 30 items was not based on any statistical consideration. Rather, we reasoned that it was a sufficiently large number to exclude examinees who might have inadvertently missed a few questions, and it was small enough to exclude those who gave

up after attempting the first few questions. There was no a priori reason to believe that the choice of 30 items would yield results substantively different from those obtained using 25, 35, or some other number in the same general range. Analyses consisted of comparisons between nonguessers and others on background characteristics and ability measures.

Finally, we estimated the magnitude of the effects of item nonresponse on test scores. These analyses required that the reported scaled scores be "corrected" for not guessing. To make this correction on the verbal and analytical sections (which have five response alternatives), we estimated that one in five unanswered items would have been correct if the examinee had guessed completely at random. Therefore, for every five items not answered, the examinee's score was raised one raw score point. For the quantitative section, half of the items have five alternatives and half have four. Under the assumption that the unanswered items were split evenly over the two item types, we credited one raw point per five unanswered items for the first half and one raw point per four for the second.

Next we needed to convert the raw score points gained into scaled score points to be added. In the October administration, more than one test form was used, and each had a slightly different conversion. Because the conversions were not radically different from one form to another, for the purposes of this study we averaged the conversions and used the same equation for everyone, regardless of the form he or she took. For the verbal section, 1 raw score point was set equal to 9 scaled score points. For the quantitative section the conversion was 1 to 13, and for the analytical section we used 1 to 15.

To illustrate how the corrections were done, suppose an examinee had a verbal scaled score of 480 and had left 15 items unanswered.
If he had guessed on those 15 items, his score would have been raised 3 raw score points (one for each five items left blank). Those 3 points would then be converted to 27 scaled score points, because the raw-score-to-scaled-score conversion for the verbal section was 1 to 9. Finally, the 27 points would be added to the 480 to produce a "corrected" score of 507. The following simplified equations were used to perform all corrections:

    V' = V + 1.8bv
    Q' = Q + 2.9bq
    A' = A + 3.0ba

where V', Q', and A' are the corrected scores; V, Q, and A are the rights-only scores; and bv, bq, and ba are the numbers of verbal, quantitative, and analytical items left blank. After correcting the scaled scores for not guessing, we examined the effects of not guessing on the scaled scores of nonguessers. In the last set of analyses, we used stepwise regressions to explore

the extent to which we could estimate the number of items examinees left unanswered from their corrected scaled scores and certain background variables.

Results

The total number of examinees who took all three sections of the GRE General Test in October 1984 was 55,656. Of these, only 0.5 percent failed to specify their sex, while 19.2 percent omitted the question on citizenship, and 27.5 percent omitted ethnic identity. It is important to note, however, that the background questionnaire instructed examinees (in very small print) to skip the question on ethnicity unless they were U.S. citizens or resident aliens. In fact, it told them that they could skip all of the questions if they had taken the test within the past year and the answers to all questions had not changed. The fact that many examinees omitted certain items in the background questionnaire and, furthermore, did not always follow the branching instructions created difficulties in the analyses of test item omission. We will discuss these in greater detail in the section on analysis by demographic group.

Distributions of items not answered. Table 1 shows the distribution of the total number of items not answered over the entire test. Nearly 44 percent of the population answered every item. One-tenth of the examinees, however, left 12 or more items blank, and 4.7 percent--nearly 3,000 examinees--failed to answer more than 20 items. This indicates that a sizable number of test-takers either did not understand the instructions to guess or chose not to follow them. As we shall see later, the decision to leave items blank can have a considerable effect on the resulting test score. If we examine the frequency distributions separately for each of the test sections, we see that the greatest number of blank items occurred on the analytical section and the fewest on the verbal section. From Table 2 we see that three-fourths of the population answered all of the verbal questions.
One percent, however, left 18 or more items unanswered. On the quantitative section (Table 3), 72 percent answered every question. But on the analytical section (Table 4), only 61 percent answered all items. Two percent left more than 15 questions blank, and 10 percent failed to answer more than 6.

Correlations of blank items between test sections. It is reasonable to hypothesize that a person's tendency to guess or not to guess would apply to all test sections. We would expect, therefore, some correlation between the numbers of items not answered across sections. Table 5 shows that they are correlated, the highest correlation being 0.62 between the quantitative and analytical sections. The correlations obtained here suggest that the individual examinee is using a consistent test-taking strategy, with regard to guessing, for all test sections.
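The cross-section correlations reported in Table 5 are ordinary Pearson correlations among the three counts of unanswered items. A minimal sketch of that computation, using made-up counts for six hypothetical examinees rather than the actual data file:

```python
# Pearson correlations among the numbers of items left unanswered on the
# verbal, quantitative, and analytical sections (hypothetical data).
import numpy as np

blanks = np.array([
    #  V   Q   A
    [  0,  0,  0],
    [  2,  1,  3],
    [  5,  8, 10],
    [  0,  2,  4],
    [ 12, 15, 20],
    [  1,  0,  2],
])

# rowvar=False treats each column (test section) as one variable.
r = np.corrcoef(blanks, rowvar=False)   # 3x3 correlation matrix
print(r[1, 2])  # correlation between quantitative and analytical blanks
```

With the real October 1984 data this off-diagonal entry is the 0.62 the report cites; the toy data here are only meant to show the shape of the calculation.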

Distributions by demographic group. In the total population, just slightly more than half of the examinees (51.8 percent) were women. From Table 6 we see that females tended to leave more items blank than did males, regardless of the test section. This pattern was most prominent in the quantitative section, where the women not answering 26 or more items outnumbered the men 2 to 1. Patterns of nonresponse among the various ethnic groups and among those of different citizenship status were also quite evident. Analyses of nonresponse to ethnicity and to citizenship were confounded, however, by the fact that noncitizens did not generally answer the ethnicity question. To avoid confounding ethnicity and citizenship, therefore, the distribution of items left unanswered by ethnicity shown in Table 7 is based only on those who indicated that they were U.S. citizens. Nearly all ethnic minorities left a disproportionately high number of items blank on all three sections. Ethnic differences were greatest on the verbal section and least on the quantitative. Blacks were by far the most likely to leave large numbers of items unanswered. Among those leaving more than 25 items unanswered on the verbal section, over one-fifth were Black, while only 4 percent of all U.S. citizens taking the test were Black. Foreign noncitizens and resident aliens left more items unanswered than did U.S. citizens, especially on the verbal and analytical sections (Table 8). On the quantitative section, however, only resident aliens left a disproportionately large number of items unanswered. It is not surprising that U.S. citizens tended to answer more items, particularly in the verbal section. The real difference is probably English language proficiency rather than test-taking strategy. Table 9 confirms this expectation. On the verbal section, in particular, the number of examinees leaving items blank was quite large.
Only 11.5 percent of the population indicated that English was not their primary language, yet over 30 percent of those examinees omitting more than 25 items were primarily non-English speaking.

Comparisons between "nonguessers" and others. From these analyses, it is evident that examinees who choose not to guess at items they cannot answer have, on the average, different characteristics from those who do guess. To investigate their characteristics further, we defined a group we called "nonguessers": the examinees who left more than 30 items blank in the entire test. Based on the 30-item cutoff, there were 890 nonguessers in the population. The purpose of the analysis was to see how the nonguessers differed, on the average, from the rest of the population. We subdivided the nonguessers and the remaining examinees each into six subgroups: White males, White females, Black males, Black females, "other" males, and "other" females. Table 10 shows the numbers of examinees in each of these categories among the nonguessers. The smallest cell, which contained only four examinees, was that of "other" males; this group was therefore not included in the subgroup comparisons. The largest group was White females. In fact, it is very clear from this table that the number of women among the nonguessers was more than three times the number of men.
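The two derived quantities used throughout the comparisons that follow, the nonguesser flag (more than 30 items blank across all three sections) and the scores corrected for random guessing, could be computed along these lines. This is a sketch with our own function names, not the original analysis code.

```python
# Derived variables used in the comparisons: the "nonguesser" flag
# (more than 30 blanks over all three sections) and the corrected scores
# V' = V + 1.8*bv, Q' = Q + 2.9*bq, A' = A + 3.0*ba from the report.

def is_nonguesser(blanks_v, blanks_q, blanks_a, cutoff=30):
    """True if the examinee left more than `cutoff` items blank overall."""
    return blanks_v + blanks_q + blanks_a > cutoff

def corrected_scores(v, q, a, blanks_v, blanks_q, blanks_a):
    """Scaled scores adjusted as if the examinee had guessed at random
    on every unanswered item (coefficients averaged across forms)."""
    return v + 1.8 * blanks_v, q + 2.9 * blanks_q, a + 3.0 * blanks_a

# Worked example from the text: verbal 480 with 15 blanks -> 507.
print(is_nonguesser(15, 10, 10))                      # True (35 blanks)
print(corrected_scores(480, 500, 500, 15, 0, 0)[0])   # 507.0
```

The quantitative and analytical scores passed in above are arbitrary placeholders; only the verbal values reproduce the report's worked example.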

Among the 890 nonguessers, 80 either omitted the item on citizenship or indicated that they were not citizens. For the remainder of the analyses comparing guessers with nonguessers, we used data only from U.S. citizens. Table 11 confirms that, among U.S. citizens, there were disproportionately high numbers of minorities and females among the nonguessers. Nearly 12 percent of the nonguessing population were Black females, while less than 3 percent of the remaining population was composed of Black females. Examining other items in the background questionnaire, we found that fewer than one-third of the nonguessers were currently enrolled as college seniors, whereas 44 percent of the remainder of the examinees were seniors (Table 12). More of the nonguessers were either in graduate school already or were out of school with bachelor's or master's degrees. Breaking this down by sex and ethnic subgroup, we examined the percentages of test-takers who were college seniors. In Table 13 we see that among White males, 30 percent of the nonguessers were seniors, while 49 percent of the rest of the examinees were seniors. A similar relationship held for White females--29 percent of the nonguessers and 41 percent of the remaining test-takers were seniors. Blacks, both males and females, showed little difference between nonguessers and the remaining examinees. Among "other" females, there was some difference in the same direction as for Whites, but not as large. Consistent with these findings is an apparent relationship between guessing behavior and age. Table 14 shows that nonguessers tended, on the average, to be older. Among White males, they were two and one-half years older than other examinees. For White females, the difference was 3.6 years. The differences were not quite as great for the other groups, but they did exist.
If older examinees were more likely to remember and to follow the old test instructions, we would also expect more of them to have taken an old form of the GRE prior to 1982, when the guessing instructions changed. The background questionnaire asked whether they had taken the GRE within the past year or prior to that date (prior to October 1983). Assuming that at least some of those who took it prior to 1983 actually took it prior to 1982, we would expect to find a slightly higher percentage of nonguessers in this category. In fact, among White males, over 15 percent of the nonguessers reported taking the GRE prior to October 1983, while only 9 percent of the remainder of the examinees took the test before that date (Table 15). A similar pattern held for Whites of both sexes and for "other" females. Comparing groups on English language proficiency, we found some differences among the sex and ethnic groups (Table 16). Among Black females, nonguessers were about the same as the rest of the population--almost everyone (98 percent) claimed that English was their primary language. The greatest difference was among "other" females, where only 53 percent of the nonguessers reported that English was their primary language, compared with 68 percent of the rest of the population of "other" females. For Whites, regardless of sex, there was little difference between the nonguessers and others, probably because a very high percentage were native English

speakers.

One additional background variable we analyzed was intended major field. Because of the small numbers planning to enter some fields, we did not analyze the sex and ethnic groups separately. Furthermore, we studied only four large major field areas: humanities, social sciences, biological sciences, and physical sciences. Table 17 shows that the greatest difference in major field preference was in the physical sciences. Among nonguessers, only 12 percent planned to major in a physical science, while 22 percent of the rest of the population chose this area. A disproportionately large number of nonguessers selected humanities and social sciences.

Finally, we compared the academic achievement of nonguessers with that of other examinees by analyzing grade point average (GPA) and GRE scores. The grade distributions in Table 18 indicate that nonguessers generally had lower grades during their last two years of college. Among the nonguessers, only 11.5 percent reported having an "A" average, while 16.8 percent of the remaining population were "A" students. Likewise, 26.7 percent of the nonguessers and 35.3 percent of the other examinees claimed that they had an "A-" average. Below this level--from "B" down to "C-"--we found a greater proportion of nonguessers. Consistent with lower grades among nonguessers were lower GRE scores. We compared the scores as they were reported under rights-only scoring, and, in addition, we "corrected" them for not guessing as described earlier. Table 19 shows the mean verbal scores both ways. Looking at the rights-only score averages first, we see that among White males, nonguessers scored on the average more than 100 points lower than the remaining test-takers. The differences in the converted scores of nonguessers and other examinees were somewhat less for the other subgroups; among Black males, nonguessers scored an average of 35 points lower than other examinees.
The standard deviations under rights-only scoring ranged from 102 (Black females) to 132 ("other" females) for the population excluding nonguessers. For nonguessers, they ranged from 80 to 108. In standard deviation units, therefore, some of the differences between nonguessers and others were considerable. Among "other" females, the difference in rights-only score averages between nonguessers and other examinees was two-thirds of a standard deviation. Using the formulas derived earlier, we computed the mean verbal scores of each group as if they had guessed at random on the unanswered items. The differences between nonguessers and the rest of the population were reduced by this correction. The corrected difference for White males was 81 points (compared with 108 when uncorrected); for Black males the difference was reduced to 9 points (from 37). We still see that there were differences in verbal scores between nonguessers and the rest of the examinees. The difference was almost negligible for Black males but quite large for White males. It is worth mentioning that the standard deviations of the corrected scores are slightly smaller, but only by about 2 points. Correcting for not guessing truly

reduced the differences between nonguessers and other examinees, but the differences were still real, especially for Whites and "others." Quantitative score averages showed a similar pattern but with even greater differences between nonguessers and others (Table 20). Using rights-only scores, we see that among White males, nonguessers scored an average of 132 points lower than other White males. Among Black females, the difference was only 36 points. When we correct for not guessing using the formulas derived earlier, the differences are reduced to 102 for White males and 8 for Black females. The standard deviation of the corrected scores of White males was 111 for nonguessers and 123 for the remaining examinees. We therefore still see differences of nearly a standard deviation between the quantitative scores of nonguessers and those of the White male population. Finally, when we looked at the analytical score differences, we found the same pattern (Table 21). The difference was again greatest for White males, with nonguessers scoring an average of 161 points lower than other White males. Black females showed a difference of 64 points. The differences between nonguessers and the rest of the population were even larger than those found in the verbal or quantitative sections, and the standard deviations were about the same. For White males, the standard deviation of the scores for nonguessers was only 89; for the rest of the White male population, it was 121. Correcting the scores for not guessing, we found that White males who chose not to guess still scored an average of 113 points lower than other White males. The difference for Black females was 21 points. The standard deviations for White males were 57 and 120 for nonguessers and other examinees, respectively. Again, we found a difference on the order of a standard deviation between nonguessers and the rest of the population of White males.
Differences for the other groups were less, but they were still sizable, especially for non-Blacks. What we conclude from these analyses is not surprising: even if we guess for the examinee who has left items blank, those who leave large numbers of items unanswered will still obtain lower than average scores.

Effects of not guessing on scaled score. What is perhaps more important to discuss are the possible effects of guessing for the examinee who leaves items blank. On the average, correcting for not guessing would raise the mean score a point or two. But it is not the mean that concerns us; it is the effect on the scores of those who omit many items that may be considerable. If a single item on the verbal section affects the converted score 9 points, an examinee does not have to leave out many items before his or her score is drastically affected, perhaps enough to influence acceptance to graduate school. Among White males in our group of nonguessers, the average verbal score would have been raised from 420 to

A single item on the quantitative section affected a converted score even more. Among the White males defined as nonguessers, the average quantitative score would have been raised from 466 to 498. The average Black male nonguesser, whose mean score was only 378, would at least have gone over the 400 mark to 410. The greatest differences between the converted scores and the corrected converted scores occurred on the analytical section. For nonguessers, the difference would have been around 50 points (Table 21). The average score for White males who did not guess was 414; if their scores had been corrected for not guessing, that average would have been 467. For Black females, the average would have been raised from 343 to 393. While the average scores obtained by nonguessers still may not be high enough for admission to a selective university, not all nonguessers should be thought of as fitting the average of the group we defined. Considering that the standard deviation for our group was around 100, some nonguessers clearly have scores in a higher range, and it is possible that, for them, guessing or not guessing could affect their admission to graduate school.

Regression analyses. The descriptive analyses showed a number of variables to be related to item omission. These variables were undoubtedly confounded: we expect citizenship and ethnic identity to be related to English language proficiency, and age to be related to prior test experience. To explore which variables might be most strongly related to item omission, we conducted several regression analyses, selecting as independent variables relevant background data and scaled test scores corrected for not guessing. The first analysis was performed on all examinees who were U.S. citizens. It consisted of a stepwise regression in which the number of items left unanswered was regressed onto scaled score, age, ethnic category, sex, GPA during the last two years of college, and number of years out of school.
The scaled score was corrected for not guessing as described earlier. The reason for using the corrected score was that we would expect it to provide a better estimate of the examinee's true score on the test section. Ethnic category was coded simply 0 or 1, representing White or non-White. Sex was coded 0 for male and 1 for female. The analysis was performed only on U.S. citizens because noncitizens were instructed not to answer the ethnicity item, and ethnicity was an important variable to investigate.

Results of this regression analysis are shown in Table 22. For each of the three test sections, the score explained the greatest proportion of the variance in the number of items unanswered. For the verbal section, ethnic category was stepped in second; age was third, sex was fourth, and GPA was last. Number of years out of school had no significant weight, possibly because it was highly correlated with age. For the quantitative and analytical sections, age had a greater weight than did ethnic category. Based on these results we would conclude that test score is the best predictor of guessing behavior -- low-scoring examinees leave more items blank than do high-scoring examinees, regardless of race, sex, age, or achievement in college. Quite clearly, this is not a very profound discovery. Those who omit items must necessarily obtain lower scores than the bulk of the examinee population simply because the population as a whole includes those who know the answers. What we would really like to know is whether, if we matched omitters with random guessers on the number of items they could not answer, we would find a significant difference in some other characteristic. Unfortunately, we cannot do that with the available information.

What Table 22 does tell us, however, is that if we hold test score constant, four other variables still have a small but significant relationship with item omission. For the verbal section, non-Whites leave more items blank than do Whites, as do older examinees. For the quantitative and analytical sections, age seems to be the primary predictor of item omission. In fact, on the analytical section, age is weighted nearly as heavily as test score, and ethnic category is relatively unimportant.

A question that naturally arises from this analysis is whether these relationships hold over the full range of abilities. Among low-scoring examinees, item omission may be related to different variables than it is among high-scoring examinees. To investigate this possibility, we divided examinees into five analysis groups (quintiles) according to their scores. For each group, the regression was recomputed using the same independent variables (excluding test score) to estimate the number of items left unanswered.

Results for the verbal section are shown in Table 23. The first result worth noting is the pattern of means and percentages for the five groups. The lowest quintile was 26 percent non-White, compared with 11 percent of the second quintile; the highest quintile was only 7 percent non-White. Age was also somewhat related to test-score level: the oldest examinees, on the average, were in the lowest quintile, while the highest three quintiles contained examinees of the same average age. GPA for the last two years of college was consistent with test-score group, and the percentage who were female was highest in the lower quintiles.
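The quintile procedure just described can be sketched in a few lines. All data below are synthetic placeholders, not the report's values; the sketch only shows the mechanics of splitting on score and refitting the regression within each score group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the report's variables (illustrative only).
n = 1000
score = rng.normal(500, 100, n)                 # corrected scaled score
age = rng.normal(25, 5, n)
nonwhite = rng.integers(0, 2, n).astype(float)  # 0 = White, 1 = non-White
female = rng.integers(0, 2, n).astype(float)    # 0 = male, 1 = female
gpa = rng.normal(3.0, 0.5, n)
omits = np.clip(40 - 0.05 * score + rng.normal(0, 5, n), 0, None)

# Divide examinees into score quintiles (0 = lowest, 4 = highest) ...
edges = np.quantile(score, [0.2, 0.4, 0.6, 0.8])
quintile = np.digitize(score, edges)

# ... then regress omits on the remaining predictors within each quintile.
betas = {}
for q in range(5):
    m = quintile == q
    X = np.column_stack([np.ones(m.sum()), age[m], nonwhite[m],
                         female[m], gpa[m]])
    betas[q], *_ = np.linalg.lstsq(X, omits[m], rcond=None)
```

Comparing the fitted weights across the five groups mirrors the quintile-by-quintile comparisons reported in Tables 23 through 25.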
The multiple correlations show that these four variables predicted item omission best in the lowest scoring group, where ethnic category carried the greatest regression weight. Second was age, third was GPA, and last was sex. These weights changed in relative importance, however, for the other groups. For the second quintile, age and sex were about equally important, ethnic category was third, and GPA did not enter the equation at all. In the middle quintile, the multiple correlation was only 0.06 between number of omits and all four predictors, none of which carried a great weight. In the top two quintiles, ethnic category dropped out entirely, even though minorities constituted at least 7 percent of the examinees in each of these high-scoring groups. It appears from this analysis that item omission is best explained among those with low scores, and that, among low-scoring examinees, item omission is disproportionately high among non-Whites, regardless of their age, sex, or college grades. In addition, age, sex, and GPA are each related to item omission, though not quite so heavily as ethnic identity.

The same analyses were conducted for the quantitative test section; results are shown in Table 24. Here we see that the ethnic composition of the lowest quintile was 23 percent non-White, compared with about 9 or 10 percent at all other score levels. Those in the lower scoring groups were also older, on the average. There was a considerable difference in the sex ratios across score groups, with the lowest quintile being 76 percent female and the highest quintile only 33 percent female. GPA was consistent with test-score level. An examination of the multiple correlations shows again that the number of items omitted can be best explained among the lowest scoring examinees. From the standardized regression weights we see, however, that age was the most heavily weighted variable. In the lowest quintile, age was nearly matched by minority status, and in the highest quintile, age was matched about equally with sex as a predictor of item omission.

On the analytical section (Table 25), we see that age carried a somewhat heavier weight than it did for the verbal or quantitative section. While ethnic category was clearly related to test score, as evidenced by the percentages of non-Whites in each quintile, the regression weights were consistently lower for ethnicity than for age. While GPA, sex, and ethnic category each contributed to the prediction, age provided the best single estimate of the number of items unanswered on the analytical test section, and this was especially evident among the lowest scoring examinees.

Part 2: Questionnaire Survey

For the second phase of the study, a sample of nonguessers, as defined in Part 1, was surveyed by mail to determine as many reasons as possible why they did not guess on items they could not answer. The questionnaire attempted to assess their understanding of scoring methods that penalize guessing, as well as to probe their memory of the instructions they were given when they took the test. The purpose of the survey was not to draw inferences about populations. A much more extensive study would have been needed if we had intended to compare the races or sexes on test-taking strategies regarding guessing or to relate those strategies to other background variables. This study simply enumerated some of the reasons that examinees gave for not guessing when they were instructed to do so, and it provided a groundwork for discussion.

Sampling

One possible sampling strategy would have been to select the 200 examinees who left the most blank items. This strategy was rejected because those examinees might have been similar and unique in some way. Perhaps they were all ill and left the test center early; because they were such a small proportion of the examinee population, this was entirely possible. Therefore, we decided to sample across all nonguessers, as we defined them in Part 1, namely, those who left more than 30 items blank in the whole test. In the event that the reasons for not guessing were different by race or sex, we selected a nearly equal number from each race-by-sex group. Because there were only 4 "other" males in the population of nonguessers, we eliminated this group. Of the remaining five groups, we would have sampled 40 examinees per group except that there were only 35 Black males in the population. Thus, we included all 35 Black males and increased the other group sizes so the total would be 200.

As stated earlier, our purpose was not to provide a sample from which to draw inferences about populations, and therefore these groups should not be regarded as sampling strata. The decision to include approximately equal numbers from each sex/ethnic subgroup was based on the need to maximize the number of reasons examinees would give for not guessing, in the event that Black females, for example, had different reasons than did White males. We drew the sample by choosing every fourth White male, every tenth White female, every Black male, every second Black female, and every third "other" female.
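The systematic draw described above amounts to taking every k-th name from each group's ordered list. A minimal sketch follows; the pool of 168 White males is hypothetical, and only the skip intervals come from the text.

```python
def systematic_sample(population, k):
    """Take every k-th member of an ordered population list,
    starting with the first."""
    return population[::k]

# Skip intervals from the report: every 4th White male, every 10th White
# female, every Black male, every 2nd Black female, every 3rd "other" female.
skips = {"White male": 4, "White female": 10, "Black male": 1,
         "Black female": 2, "other female": 3}

# Hypothetical pool of 168 White-male nonguessers -> every 4th yields 42.
pool = list(range(168))
print(len(systematic_sample(pool, skips["White male"])))  # 42
```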

The final sample contained the following numbers:

42 White males
42 White females
35 Black males
42 Black females
41 "other" females

Questionnaire Design and Pretest

The first draft of the questionnaire contained questions regarding
- test preparation
- the availability of instructions for guessing or not guessing
- how many items they omitted and did not finish
- what strategy they used with regard to items they could not answer
- what they did when only one minute remained at the end of the test period
- their beliefs about guessing and penalties for guessing
- whether they had previously taken the General Test or a Subject Test.

We pretested this draft of the questionnaire on 10 local examinees who had taken the test in October. All had left some items unanswered and were therefore the most appropriate population to use for the pretest. Each of the 10 was given the questionnaire in person by a research assistant and then interviewed after completing it. Each examinee was paid $5 for participating. Based on the questionnaire responses and interviews, we carefully reviewed and revised the questionnaire. Appendix A shows the final form of the questionnaire.

Survey Administration

The survey was administered according to the following steps:

1. The questionnaire was mailed on December 4, 1984, just seven weeks after the examinees had taken the test. It was sent with a cover letter (Appendix B), a postage-paid return envelope, and a check for $5. A mailing label was placed on the front of the questionnaire so we could identify the returns.

2. Eighteen days later, a postcard reminder (Appendix C) was sent to the 118 examinees who had not responded.

3. Two weeks later, another copy of the questionnaire was mailed to the remaining 81 non-respondents. With it was a followup letter (Appendix D) and a return envelope. No check was sent.

4. Six weeks later, all returned questionnaires were analyzed.

Because of Christmas vacation and exam weeks, many of the questionnaires did not reach the examinees until they returned to their dormitories or until they received them from home. Undoubtedly, some were addressed to permanent home addresses and some were sent to college addresses, so a considerable number of questionnaires had to be forwarded. Nevertheless, the response rate was quite favorable.

Results

Response rate. -- Of the 202 questionnaires mailed, 3 were returned by the Post Office as undeliverable. One examinee returned the check and did not complete the questionnaire. Of the 199 questionnaires that reached their destinations, 166 were completed and returned. The response rate was 82 percent of those mailed. The response rates for the five groups were as follows:

White males 76%
White females 90%
Black males 83%
Black females 86%
"Other" females 76%

Characteristics of the sample. -- For purposes of analysis, data from the questionnaires were matched with GRE scores and background information. The background characteristics of the survey respondents were quite similar to those of the entire subpopulation of nonguessers. The distribution of respondents was approximately equal across groups; Table 26a shows the exact distribution. Black males were the smallest group (17.5 percent), and White females were the largest (22.9 percent).

Of the 166 respondents, 163 answered the question on citizenship (Table 26b). Of these, 86.5 percent claimed to be U.S. citizens. Twenty-two examinees were either resident aliens or foreign noncitizens.
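The response-rate figures above reduce to simple arithmetic; this sketch just makes the bookkeeping explicit, using the counts quoted in the text.

```python
mailed = 202
undeliverable = 3                  # returned by the Post Office
reached = mailed - undeliverable   # 199 questionnaires reached examinees
completed = 166

# The report quotes the rate relative to all questionnaires mailed.
rate_of_mailed = 100 * completed / mailed
print(reached, round(rate_of_mailed))  # 199 82
```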
Similar to the citizenship distribution was the distribution of responses to the question on English language proficiency (Table 26c). Of the 161 examinees who answered this question, 18 (11.2 percent) reported that English was not their primary language.

In addition to these descriptive characteristics, we analyzed examinees' responses to the GRE background questionnaire items on parents' education (Table 26d). Among the parents of the 158 examinees who answered these questions, we found that 26 percent of the fathers and 25 percent of the mothers had not graduated from high school. This was a startling finding, considering that only about 15 to 17 percent of the GRE population as a whole have parents with so little formal education. On the other hand, it is at least partially explained by the overrepresentation of minorities in our sample.

In terms of academic ability indicators (Tables 27a-b), 44 percent of our sample reported having a B average for the last two years of college. Fewer than one third had averages of A- or A. The average GRE scores were 400, 420, and 391 for the verbal, quantitative, and analytical sections, respectively.

Questionnaire responses. -- Table 28 shows that 61 percent indicated that they had either worked through the publication entitled Practicing to Take the GRE or had read other commercially available test-preparation books. Ninety percent of the sample said that they had read the descriptive booklet that accompanied the test registration materials (Table 28, item 2). Seventy-five percent indicated that it provided them with useful guidance about how to take the test. Of the respondents who answered the question on the best strategy when faced with an item they could not answer, only 31 percent indicated that they should "pick an answer at random even if you had no idea of what the answer might be." The greatest proportion (46 percent) marked "pick an answer at random only if you could eliminate two or more of the choices."

In item 3 (Table 29), the questionnaire asked whether the test-center supervisor gave them guidance on how to take the test. Only half indicated that supervisors had. Because so many examinees believed no guidance was given, we looked at the test-center codes and locations of the centers where these examinees took their tests. If large numbers of examinees had used the same centers, we would suspect that the supervisors had omitted some of the instructions. When we listed the test centers, however, we found no pattern.
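This center-by-center check amounts to a frequency count over test-center codes; a minimal sketch follows. The center codes below are hypothetical placeholders, not codes from the study.

```python
from collections import Counter

def repeated_centers(center_codes):
    """Return the test-center codes reported by more than one examinee,
    with their counts; an empty result suggests no center-level pattern."""
    return {code: n for code, n in Counter(center_codes).items() if n > 1}

# Hypothetical codes for examinees reporting no supervisor guidance.
codes = ["1021", "4437", "0098", "1021", "5210"]
print(repeated_centers(codes))  # {'1021': 2}
```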
In fact, rarely were two examinees from the same center. They were from centers all over the country, so it appeared most unlikely that the supervisors were at fault.

Of those examinees who acknowledged that the supervisors gave instructions, only 42 percent marked the "correct" strategy. Just as many indicated that they were told to guess only if they could eliminate two or more choices. Again, we compiled a list of the test centers where these examinees took their exams to see if the same centers appeared more than once. If so, that would suggest that the supervisors were actually not reading the correct instructions or were supplementing them with their own interpretations. We found little or no repetition of test-center codes and no evidence that any supervisor explained the strategy incorrectly.

Continuing with questionnaire item 5, we computed the distributions of numbers of items left blank. This question was asked primarily to ascertain whether the examinees were even aware that they had omitted test items. If many had claimed to answer all items, we would have had to reexamine their answer sheets to see whether the problem lay in gridding or scanning. Most of the respondents did remember having omitted items. Table 30 shows the distribution of responses. Because we could not

separate the number of items omitted from the number not answered, we did not attempt to verify the survey estimates. However, the numbers covered a reasonable range of values, and they confirmed that examinees knew they had left questions blank.

Item 6 asked about examinees' behavior during the final minute of the test session (Table 31). First we asked whether the supervisors notified them when there was one minute left. Only half indicated that the supervisor did. Once again, we listed the test-center codes corresponding to those examinees who indicated that the supervisors did not notify them, and again we found no center to be implicated, suggesting that supervisor behavior was not the issue. We also asked what the examinees did after the one-minute mark. The greatest proportion (40 percent) continued to work on items they could answer. Only about 16 percent indicated that they marked items at random. Because this was a multiple-response item, many examinees marked that they did more than one thing. Indeed, it is reasonable to continue working on items you are sure you can answer before marking the rest at random; but clearly, not many ever marked at random.

Items 7 and 8 attempted to assess the examinees' understanding and beliefs regarding guessing, first in relation to the GRE General Test and then in relation to standardized tests in general. Table 32 shows the distributions of responses to each statement. We assumed that those who chose not to respond to a statement simply did not know how to answer; thus the percentages are based on the number answering "yes" out of the entire population. If they had clearly understood that they were to guess at random on the General Test, everyone should have answered "yes" to the first statement, namely, "It is to my advantage to answer every question even if I must choose an answer at random." Seventy-two percent marked "yes" to this statement.

The next statement said, "Points will be subtracted from my score for incorrect answers." Only 24 percent marked "yes" to this statement, and 11 percent left it blank -- apparently not sure. Even so, it seems that most examinees knew they would not be penalized for guessing. The third statement read, "It is likely that choosing at random will improve my score to some extent." Only 63 percent marked "yes," and 13 percent left it blank. About the same percentage understood that the next statement was not correct, namely, "It is likely that choosing at random will reduce my score to some extent." Only 27 percent said "yes," but 12 percent left it blank. The final statement was, "Choosing at random is a useful strategy only when I have some knowledge of a question and can eliminate one or more of the answer choices." Here 64 percent of the sample answered "yes."

In item 8 the examinees were asked to respond to the same set of statements, but with respect to most multiple-choice tests. Even here, 65 percent thought they should answer every question even if they must guess


Validity. Ch. 5: Validity. Griggs v. Duke Power - 2. Griggs v. Duke Power (1971) Ch. 5: Validity Validity History Griggs v. Duke Power Ricci vs. DeStefano Defining Validity Aspects of Validity Face Validity Content Validity Criterion Validity Construct Validity Reliability vs. Validity

More information

Sheila Barron Statistics Outreach Center 2/8/2011

Sheila Barron Statistics Outreach Center 2/8/2011 Sheila Barron Statistics Outreach Center 2/8/2011 What is Power? When conducting a research study using a statistical hypothesis test, power is the probability of getting statistical significance when

More information

Chapter 7: Descriptive Statistics

Chapter 7: Descriptive Statistics Chapter Overview Chapter 7 provides an introduction to basic strategies for describing groups statistically. Statistical concepts around normal distributions are discussed. The statistical procedures of

More information

Chapter 02 Developing and Evaluating Theories of Behavior

Chapter 02 Developing and Evaluating Theories of Behavior Chapter 02 Developing and Evaluating Theories of Behavior Multiple Choice Questions 1. A theory is a(n): A. plausible or scientifically acceptable, well-substantiated explanation of some aspect of the

More information

Further Properties of the Priority Rule

Further Properties of the Priority Rule Further Properties of the Priority Rule Michael Strevens Draft of July 2003 Abstract In Strevens (2003), I showed that science s priority system for distributing credit promotes an allocation of labor

More information

Implicit Information in Directionality of Verbal Probability Expressions

Implicit Information in Directionality of Verbal Probability Expressions Implicit Information in Directionality of Verbal Probability Expressions Hidehito Honda (hito@ky.hum.titech.ac.jp) Kimihiko Yamagishi (kimihiko@ky.hum.titech.ac.jp) Graduate School of Decision Science

More information

National Cancer Patient Experience Survey Results. University Hospitals of Leicester NHS Trust. Published July 2016

National Cancer Patient Experience Survey Results. University Hospitals of Leicester NHS Trust. Published July 2016 National Cancer Patient Experience Survey 2015 Results University Hospitals of Leicester NHS Trust Published July 2016 Revised 17th August 2016 The National Cancer Patient Experience Survey is undertaken

More information

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about population. Populations are group of interest

More information

National Cancer Patient Experience Survey Results. East Kent Hospitals University NHS Foundation Trust. Published July 2016

National Cancer Patient Experience Survey Results. East Kent Hospitals University NHS Foundation Trust. Published July 2016 National Cancer Patient Experience Survey 2015 Results East Kent Hospitals University NHS Foundation Trust Published July 2016 Revised 17th August 2016 The National Cancer Patient Experience Survey is

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

Chapter 5: Research Language. Published Examples of Research Concepts

Chapter 5: Research Language. Published Examples of Research Concepts Chapter 5: Research Language Published Examples of Research Concepts Contents Constructs, Types of Variables, Types of Hypotheses Note Taking and Learning References Constructs, Types of Variables, Types

More information

Item Writing Guide for the National Board for Certification of Hospice and Palliative Nurses

Item Writing Guide for the National Board for Certification of Hospice and Palliative Nurses Item Writing Guide for the National Board for Certification of Hospice and Palliative Nurses Presented by Applied Measurement Professionals, Inc. Copyright 2011 by Applied Measurement Professionals, Inc.

More information

Practitioner s Guide To Stratified Random Sampling: Part 1

Practitioner s Guide To Stratified Random Sampling: Part 1 Practitioner s Guide To Stratified Random Sampling: Part 1 By Brian Kriegler November 30, 2018, 3:53 PM EST This is the first of two articles on stratified random sampling. In the first article, I discuss

More information

IAASB Main Agenda (February 2007) Page Agenda Item PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED)

IAASB Main Agenda (February 2007) Page Agenda Item PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED) IAASB Main Agenda (February 2007) Page 2007 423 Agenda Item 6-A PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED) AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS Paragraph Introduction Scope

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Best on the Left or on the Right in a Likert Scale

Best on the Left or on the Right in a Likert Scale Best on the Left or on the Right in a Likert Scale Overview In an informal poll of 150 educated research professionals attending the 2009 Sawtooth Conference, 100% of those who voted raised their hands

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Running Head: TRUST INACCURATE INFORMANTS 1. In the Absence of Conflicting Testimony Young Children Trust Inaccurate Informants

Running Head: TRUST INACCURATE INFORMANTS 1. In the Absence of Conflicting Testimony Young Children Trust Inaccurate Informants Running Head: TRUST INACCURATE INFORMANTS 1 In the Absence of Conflicting Testimony Young Children Trust Inaccurate Informants Kimberly E. Vanderbilt, Gail D. Heyman, and David Liu University of California,

More information

Chapter 12. The One- Sample

Chapter 12. The One- Sample Chapter 12 The One- Sample z-test Objective We are going to learn to make decisions about a population parameter based on sample information. Lesson 12.1. Testing a Two- Tailed Hypothesis Example 1: Let's

More information

The Relationship between Fraternity Recruitment Experiences, Perceptions of Fraternity Life, and Self-Esteem

The Relationship between Fraternity Recruitment Experiences, Perceptions of Fraternity Life, and Self-Esteem Butler University Digital Commons @ Butler University Undergraduate Honors Thesis Collection Undergraduate Scholarship 2016 The Relationship between Fraternity Recruitment Experiences, Perceptions of Fraternity

More information

About this consent form. Why is this research study being done? Partners HealthCare System Research Consent Form

About this consent form. Why is this research study being done? Partners HealthCare System Research Consent Form Protocol Title: Gene Sequence Variants in Fibroid Biology Principal Investigator: Cynthia C. Morton, Ph.D. Site Principal Investigator: Cynthia C. Morton, Ph.D. Description of About this consent form Please

More information

Conversation Tactics Checklist (Hallam, R S, Ashton, P, Sherbourne, K, Gailey, L, & Corney, R. 2007).

Conversation Tactics Checklist (Hallam, R S, Ashton, P, Sherbourne, K, Gailey, L, & Corney, R. 2007). Conversation Tactics Checklist (Hallam, R S, Ashton, P, Sherbourne, K, Gailey, L, & Corney, R. 2007). This 54-item self-report questionnaire was devised to assess how people behave when it becomes difficult

More information

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

Addendum: Multiple Regression Analysis (DRAFT 8/2/07) Addendum: Multiple Regression Analysis (DRAFT 8/2/07) When conducting a rapid ethnographic assessment, program staff may: Want to assess the relative degree to which a number of possible predictive variables

More information

A Cross-validation of easycbm Mathematics Cut Scores in. Oregon: Technical Report # Daniel Anderson. Julie Alonzo.

A Cross-validation of easycbm Mathematics Cut Scores in. Oregon: Technical Report # Daniel Anderson. Julie Alonzo. Technical Report # 1104 A Cross-validation of easycbm Mathematics Cut Scores in Oregon: 2009-2010 Daniel Anderson Julie Alonzo Gerald Tindal University of Oregon Published by Behavioral Research and Teaching

More information

Running head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick.

Running head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick. Running head: INDIVIDUAL DIFFERENCES 1 Why to treat subjects as fixed effects James S. Adelman University of Warwick Zachary Estes Bocconi University Corresponding Author: James S. Adelman Department of

More information

HEDIS CAHPS 4.0H Member Survey

HEDIS CAHPS 4.0H Member Survey Helping you turn insight into action HEDIS - CAHPS 4.0H Member Survey Adult - HMO prepared for ANTHEM BLUE CROSS BLUE SHIELD - INDIA June 800.989.5150 dssresearch.com Table of Contents Background and Objectives

More information

In this chapter we discuss validity issues for quantitative research and for qualitative research.

In this chapter we discuss validity issues for quantitative research and for qualitative research. Chapter 8 Validity of Research Results (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.) In this chapter we discuss validity issues for

More information

Rise to the Challenge or Not Give a Damn: Differential Performance in High vs. Low Stakes Tests

Rise to the Challenge or Not Give a Damn: Differential Performance in High vs. Low Stakes Tests Rise to the Challenge or Not Give a Damn: Differential Performance in High vs. Low Stakes Tests Yigal Attali Educational Testing Service Rosedale Rd. MS 16 R Princeton, NJ 08541 USA Voice: 609 734 1747

More information

Never P alone: The value of estimates and confidence intervals

Never P alone: The value of estimates and confidence intervals Never P alone: The value of estimates and confidence Tom Lang Tom Lang Communications and Training International, Kirkland, WA, USA Correspondence to: Tom Lang 10003 NE 115th Lane Kirkland, WA 98933 USA

More information

Chapter 1 Introduction to I/O Psychology

Chapter 1 Introduction to I/O Psychology Chapter 1 Introduction to I/O Psychology 1. I/O Psychology is a branch of psychology that in the workplace. a. treats psychological disorders b. applies the principles of psychology c. provides therapy

More information

1 The conceptual underpinnings of statistical power

1 The conceptual underpinnings of statistical power 1 The conceptual underpinnings of statistical power The importance of statistical power As currently practiced in the social and health sciences, inferential statistics rest solidly upon two pillars: statistical

More information

Introduction. 1.1 Facets of Measurement

Introduction. 1.1 Facets of Measurement 1 Introduction This chapter introduces the basic idea of many-facet Rasch measurement. Three examples of assessment procedures taken from the field of language testing illustrate its context of application.

More information

Student Performance Q&A:

Student Performance Q&A: Student Performance Q&A: 2009 AP Statistics Free-Response Questions The following comments on the 2009 free-response questions for AP Statistics were written by the Chief Reader, Christine Franklin of

More information

Risk Aversion in Games of Chance

Risk Aversion in Games of Chance Risk Aversion in Games of Chance Imagine the following scenario: Someone asks you to play a game and you are given $5,000 to begin. A ball is drawn from a bin containing 39 balls each numbered 1-39 and

More information

APPLICATION FELLOWSHIP IN IMPLANT DENTISTRY PROGRAM

APPLICATION FELLOWSHIP IN IMPLANT DENTISTRY PROGRAM : Application Date Month Day Year University of Rochester University of Rochester Medical Center Eastman Institute for Oral Health 625 Elmwood Avenue Rochester, New York 14620-2989 USA (585) 275-8315 Paste

More information

About this consent form

About this consent form Protocol Title: Development of the smoking cessation app Smiling instead of Smoking Principal Investigator: Bettina B. Hoeppner, Ph.D. Site Principal Investigator: n/a Description of Subject Population:

More information

EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS

EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS DePaul University INTRODUCTION TO ITEM ANALYSIS: EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS Ivan Hernandez, PhD OVERVIEW What is Item Analysis? Overview Benefits of Item Analysis Applications Main

More information

Basic Concepts in Research and DATA Analysis

Basic Concepts in Research and DATA Analysis Basic Concepts in Research and DATA Analysis 1 Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...2 The Research Question...3 The Hypothesis...3 Defining the

More information

Humane League Labs. Sabine Doebel, Susan Gabriel, and The Humane League

Humane League Labs. Sabine Doebel, Susan Gabriel, and The Humane League Report: Does Encouraging The Public To Eat Vegan, Eat Vegetarian, Eat Less Meat, or Cut Out Or Cut Back On Meat And Other Animal Products Lead To The Most Diet Change? Sabine Doebel, Susan Gabriel, and

More information

Job Choice and Post Decision Dissonance1

Job Choice and Post Decision Dissonance1 ORGANIZATIONAL BEHAVIOR AND HUMAN PERFORMANCE 13, 133-145 (1975) Job Choice and Post Decision Dissonance1 EDWARD E. LAWLER III University of Michigan WALTER J. KULECK, JR. University of Michigan JOHN GRANT

More information

Examining the impact of moving to on-screen marking on concurrent validity. Tom Benton

Examining the impact of moving to on-screen marking on concurrent validity. Tom Benton Examining the impact of moving to on-screen marking on concurrent validity Tom Benton Cambridge Assessment Research Report 11 th March 215 Author contact details: Tom Benton ARD Research Division Cambridge

More information

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Dr. Kelly Bradley Final Exam Summer {2 points} Name {2 points} Name You MUST work alone no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. This exam is being scored out of 00 points.

More information

ORIGINS AND DISCUSSION OF EMERGENETICS RESEARCH

ORIGINS AND DISCUSSION OF EMERGENETICS RESEARCH ORIGINS AND DISCUSSION OF EMERGENETICS RESEARCH The following document provides background information on the research and development of the Emergenetics Profile instrument. Emergenetics Defined 1. Emergenetics

More information

National Cancer Patient Experience Survey Results. Milton Keynes University Hospital NHS Foundation Trust. Published July 2016

National Cancer Patient Experience Survey Results. Milton Keynes University Hospital NHS Foundation Trust. Published July 2016 National Cancer Patient Experience Survey 2015 Results Milton Keynes University Hospital NHS Foundation Trust Published July 2016 The National Cancer Patient Experience Survey is undertaken by Quality

More information

GRE R E S E A R C H. Cognitive Patterns of Gender Differences on Mathematics Admissions Tests. Ann Gallagher Jutta Levin Cara Cahalan.

GRE R E S E A R C H. Cognitive Patterns of Gender Differences on Mathematics Admissions Tests. Ann Gallagher Jutta Levin Cara Cahalan. GRE R E S E A R C H Cognitive Patterns of Gender Differences on Mathematics Admissions Tests Ann Gallagher Jutta Levin Cara Cahalan September 2002 GRE Board Professional Report No. 96-17P ETS Research

More information

FUNCTIONAL CONSISTENCY IN THE FACE OF TOPOGRAPHICAL CHANGE IN ARTICULATED THOUGHTS Kennon Kashima

FUNCTIONAL CONSISTENCY IN THE FACE OF TOPOGRAPHICAL CHANGE IN ARTICULATED THOUGHTS Kennon Kashima Journal of Rational-Emotive & Cognitive-Behavior Therapy Volume 7, Number 3, Fall 1989 FUNCTIONAL CONSISTENCY IN THE FACE OF TOPOGRAPHICAL CHANGE IN ARTICULATED THOUGHTS Kennon Kashima Goddard College

More information

GCSE EXAMINERS' REPORTS

GCSE EXAMINERS' REPORTS GCSE EXAMINERS' REPORTS SOCIOLOGY SUMMER 2016 Grade boundary information for this subject is available on the WJEC public website at: https://www.wjecservices.co.uk/marktoums/default.aspx?l=en Online Results

More information

Biserial Weights: A New Approach

Biserial Weights: A New Approach Biserial Weights: A New Approach to Test Item Option Weighting John G. Claudy American Institutes for Research Option weighting is an alternative to increasing test length as a means of improving the reliability

More information

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES Sawtooth Software RESEARCH PAPER SERIES The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? Dick Wittink, Yale University Joel Huber, Duke University Peter Zandan,

More information

Confidence Intervals On Subsets May Be Misleading

Confidence Intervals On Subsets May Be Misleading Journal of Modern Applied Statistical Methods Volume 3 Issue 2 Article 2 11-1-2004 Confidence Intervals On Subsets May Be Misleading Juliet Popper Shaffer University of California, Berkeley, shaffer@stat.berkeley.edu

More information

Audio: In this lecture we are going to address psychology as a science. Slide #2

Audio: In this lecture we are going to address psychology as a science. Slide #2 Psychology 312: Lecture 2 Psychology as a Science Slide #1 Psychology As A Science In this lecture we are going to address psychology as a science. Slide #2 Outline Psychology is an empirical science.

More information

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do. Midterm STAT-UB.0003 Regression and Forecasting Models The exam is closed book and notes, with the following exception: you are allowed to bring one letter-sized page of notes into the exam (front and

More information

The Lens Model and Linear Models of Judgment

The Lens Model and Linear Models of Judgment John Miyamoto Email: jmiyamot@uw.edu October 3, 2017 File = D:\P466\hnd02-1.p466.a17.docm 1 http://faculty.washington.edu/jmiyamot/p466/p466-set.htm Psych 466: Judgment and Decision Making Autumn 2017

More information

State of Connecticut Department of Education Division of Teaching and Learning Programs and Services Bureau of Special Education

State of Connecticut Department of Education Division of Teaching and Learning Programs and Services Bureau of Special Education State of Connecticut Department of Education Division of Teaching and Learning Programs and Services Bureau of Special Education Introduction Steps to Protect a Child s Right to Special Education: Procedural

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

DON M. PALLAIS, CPA 14 Dahlgren Road Richmond, Virginia Telephone: (804) Fax: (804)

DON M. PALLAIS, CPA 14 Dahlgren Road Richmond, Virginia Telephone: (804) Fax: (804) DON M. PALLAIS, CPA 14 Dahlgren Road Richmond, Virginia 23233 Telephone: (804) 784-0884 Fax: (804) 784-0885 Office of the Secretary PCAOB 1666 K Street, NW Washington, DC 20006-2083 Gentlemen: November

More information

2 Critical thinking guidelines

2 Critical thinking guidelines What makes psychological research scientific? Precision How psychologists do research? Skepticism Reliance on empirical evidence Willingness to make risky predictions Openness Precision Begin with a Theory

More information