CHARACTERISTICS OF EXAMINEES WHO LEAVE QUESTIONS UNANSWERED ON THE GRE GENERAL TEST UNDER RIGHTS-ONLY SCORING


Jerilee Grandy

GRE Board Professional Report No P
ETS Research Report
November 1987

This report presents the findings of a research project funded by and carried out under the auspices of the Graduate Record Examinations Board.

Educational Testing Service, Princeton, N.J.

Copyright by Educational Testing Service. All rights reserved.

Contents

Abstract
Background
Purpose
Part 1: Analysis of GRE File
    Method
    Results
        Distributions of items not answered
        Correlations of blank items between test sections
        Distributions by demographic group
        Comparisons between "nonguessers" and others
        Effects of not guessing on scaled score
        Regression analyses
Part 2: Questionnaire Survey
    Purpose
    Sampling
    Questionnaire design and pretest
    Survey administration
    Results
        Response rate
        Characteristics of sample
        Questionnaire responses
        Consistency of responses
    Discussion of questionnaire findings
Summary and Conclusions
References
Tables
Appendices

Abstract

The purpose of the study was to determine why some examinees taking the GRE General Test leave items blank when they have been told that they will not be penalized for guessing and have been instructed to guess on questions they cannot answer. Data from the October 1984 test administration were analyzed to establish whether nonguessers were different in any way from those who completed, or nearly completed, each test section. Nonguessers were defined as examinees who left large numbers of items blank, whether by omitting items or by not finishing a test section; the analyses in this study did not separate these two ways of leaving items blank. A sample of nonguessers was also surveyed by mail to determine whether they understood the instructions and why they omitted items.

The tendency to leave items blank was most evident among the following groups of examinees:

    women
    non-Whites, particularly Blacks
    resident aliens or foreign non-citizens
    members of families with less than average formal education
    older examinees
    examinees who were out of school or already in graduate school
    examinees who had taken the GRE previously, under the old instructions
    examinees planning to study humanities or social sciences
    examinees with lower than average undergraduate grades
    examinees with lower than average GRE scores, even when the scores were corrected for not guessing

In regression analyses, GRE scores provided the best estimate of the number of items left unanswered. For the verbal section, ethnicity was the second best predictor; age, sex, and GPA made small but significant contributions to the equation. For the quantitative section, age was the second best predictor after test score, with sex and ethnicity each contributing a small amount. For the analytical section as well, test score was the best predictor and age the second best, with ethnicity and GPA receiving small but significant weights.
Survey results suggested that many nonguessers did not fully understand guessing strategies and may have been confused by the new instructions because they conflicted with what the examinees had learned earlier. Their questionnaire responses were often inconsistent, and many claimed to have read and heard instructions not to guess.

Background

The psychometric literature contains numerous articles on the pros and cons of formula scoring versus rights-only scoring of multiple-choice tests. Under formula scoring, examinees are assessed a penalty for guessing: the score is computed by subtracting one fourth (for five-alternative items) of the number of wrong answers from the number right. Under rights-only scoring, examinees are awarded points for the number of correct answers, with no penalty for wrong answers. Some of the pros and cons of each scoring method have been summarized and discussed by Angoff (1979) and by Angoff and Schrader (1984). Psychometric arguments in favor of formula scoring have been advanced by Lord (1963, 1974, 1975) and by Lord and Novick (1968, p. 308). Their studies suggest that formula-scored tests are more valid, more reliable, and more efficient than rights-scored tests. By examining item-response curves, Lord (1980) has suggested that difficult tests are especially unreliable for low-scoring students because they guess more often than they should, and that for them partial information is likely to be misinformation.

Angoff and Schrader (1984) conducted a study to evaluate two competing hypotheses. The differential effects hypothesis states that some students, when tested under formula directions, omit items about which they have useful partial knowledge; these students would presumably benefit under rights-only directions. The invariance hypothesis states that examinees would score no better than chance expectation on those items that they would omit under formula scoring but would answer under rights-only scoring. Results of the study tended to support the invariance hypothesis, and thus to support formula scoring. The results also suggested that formula scoring has the effect of compensating for differences in guessing strategies.
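For concreteness, the two scoring rules contrasted above can be written out as a short sketch. The function names and item counts here are illustrative, not part of the original report or of ETS's scoring software.

```python
# Illustrative comparison of the two scoring rules for a test of
# five-alternative multiple-choice items (hypothetical numbers).

def formula_score(num_right, num_wrong, alternatives=5):
    """Formula scoring: subtract a fraction of the wrong answers
    (1/(k-1) for k-alternative items) as a guessing penalty."""
    return num_right - num_wrong / (alternatives - 1)

def rights_only_score(num_right):
    """Rights-only scoring: count correct answers, no penalty for wrong ones."""
    return num_right

# An examinee with 40 right and 20 wrong on five-choice items:
print(formula_score(40, 20))   # 40 - 20/4 = 35.0
print(rights_only_score(40))   # 40
```

Under formula scoring, an examinee guessing at random on k-alternative items gains nothing in expectation (1/k of a point per item, offset by the penalty on the other (k-1)/k), which is why the instructions under the two rules differ.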
Aside from the psychometric arguments favoring formula scoring, many people view pure random guessing as a test-taking strategy to be discouraged because it focuses on the student's interest in improving his or her test score without sufficient regard to the educational outcome being assessed. There have also been a number of arguments advanced in favor of rights-only scoring. Instructions to the examinee are more direct and easier to follow. Many examinees fail to understand the logic of the guessing strategy described for formula scoring. Some investigators have argued that personality characteristics play a greater role in formula-scored examinations than in rights-only examinations because the test-taker is faced with a number of decisions concerning the advisability of, or the relative risks involved in, following a hunch. Each item about which the examinee is uncertain demands a decision-making process. Many such decisions introduce irrelevant sources of error variance associated with the examinee's willingness to take risks and his or her level of confidence.

Finally, Glass and Wiley (1964) have claimed that rights-only scored tests are more reliable than formula-scored tests, thus contradicting Lord's assertion.

The Research Committee of the GRE Board considered the various competing arguments for each type of scoring and, in 1980, decided to move from a formula-scoring mode to a rights-only mode for the General Test. One of their major considerations was the effect of instructions on student behavior in the test-taking situation. Members of the committee also felt that uninformed guessing would cause problems for any item-based equating scheme, and that the psychometric advantages favoring either formula scoring or rights-only scoring are small. The committee was also persuaded that students should always guess because guessing is to their advantage (GRE Board minutes, April 1980). Since October 1982, the GRE Information Bulletin, in its discussion of test-taking strategy, has instructed examinees as follows:

    On the General (Aptitude) Test, your scores will be determined by the number of best answers you select from the choices given. No penalty is assessed for wrong answers. Therefore, even if you are uncertain about a question, it is better that you guess at the answer rather than not respond at all. You risk nothing by guessing. Failure to respond, however, eliminates the possibility of raising your score by selecting the correct answer.

The explanation is also given on the back cover of the test booklet, and it is read aloud to the examinees, along with other instructions, once at the start of the exam. In spite of having this information, some examinees still omit items. Prior to the present study, we estimated that between 7,600 and 20,000 examinees a year--i.e., between 4 and 11 percent of the total examinee population--penalize themselves at least 10 scaled score points, and possibly 40 points or more, on the verbal section by not guessing.
The estimates were higher for the quantitative section and higher still for the analytical section. Questions therefore have arisen as to whether failure to guess under rights-only scoring actually has a serious effect on examinees' scores, and, if so, whether instructions could be changed to encourage examinees to guess, and whether steps should be taken to compensate in some way for not guessing.

Purpose

The purpose of this study was to answer three questions:

1. Do examinees who tend to leave items unanswered differ in any respects from those who complete all items?
2. How seriously are examinees' scores actually affected by not guessing?
3. What reasons do examinees give for not guessing on the questions they leave blank?

To answer these questions, we analyzed data from the October 1984 test administration and conducted a mail survey of a sample of examinees who failed to answer more than 30 items altogether. In Part 1 of the study, we addressed the first two questions, basing our conclusions on data from the entire population that took the test in October. Part 2 describes the survey and how the findings provide some answers to the third question.

Part 1: Analysis of GRE File

Method

To answer questions regarding the effects of item omission on test scores and the relation of item omission to examinee characteristics, we analyzed data from the population that took the GRE General Test in October 1984, the test administration with the largest number of examinees that year. The data base we used contained information from the background questionnaire, converted (scaled) test scores, and the number of items not answered.

While it might have been preferable to analyze separately the number of items omitted and the number not reached, this would have required that all tests (five different forms) be specially rescored. We considered the following arguments before deciding not to rescore the tests. First, effects on the test scores would be the same: since omitted items and not-reached items are both scored zero, they did not have to be separated to study the effects of leaving items unanswered. Second, although we did not know this for certain, we thought it likely that examinees would use the same general strategy for items they could not answer as for items they did not reach. Even if they did not use the same strategy, we could obtain the information we needed from the combination of both kinds of blank items. If we could estimate the number of items left blank from other examinee characteristics (e.g., age, race), we would know that there is a relationship between those characteristics and the tendency to omit items and/or to leave items blank that cannot be finished in the time allowed. In either case, decisions regarding the next step to take were likely to be the same. With these considerations in mind, we decided to conduct the analyses in Part 1 on all items not answered, namely, the sum of those omitted and those not reached.

Analyses of the total population consisted of the following: 1. frequency distributions of numbers of items not answered, in the total test and by test section, 2.
correlations among the number of unanswered items in the verbal, quantitative, and analytical sections, and 3. frequency distributions of numbers of items not answered, by demographic groups.

We then defined "nonguessers" as the subpopulation who failed to answer more than 30 items. The decision to use 30 items was not based on any statistical consideration. Rather, we reasoned that it was a sufficiently large number to exclude examinees who might have inadvertently missed a few questions, and it was small enough to exclude those who gave

up after attempting the first few questions. There was no a priori reason to believe that the choice of 30 items would yield results substantively different from those obtained using 25, 35, or some other number in the same general range. Analyses consisted of comparisons between nonguessers and others on background characteristics and ability measures.

Finally, we estimated the magnitude of the effects of item nonresponse on test scores. These analyses required that the reported scaled scores be "corrected" for not guessing. To make this correction on the verbal and analytical sections (which have five response alternatives), we estimated that one in five unanswered items would have been correct if the examinee had guessed completely at random. Therefore, for every five items not answered, the examinee's score was raised one raw score point. For the quantitative section, half of the items have five alternatives and half have four. Under the assumption that the unanswered items were split evenly over the two item types, we credited one raw point per five unanswered items for the first half and one raw point per four for the second.

Next we needed to convert the raw score points gained into scaled score points to be added. In the October administration, more than one test form was used, and each had a slightly different conversion. Because the conversions were not radically different from one form to another, for the purposes of this study we averaged the conversions and used the same equation for everyone, regardless of the form he or she took. For the verbal section, 1 raw score point was set equal to 9 scaled score points. For the quantitative section the conversion was 1 to 13, and for the analytical section we used 1 to 15.

To illustrate how the corrections were done, suppose an examinee had a verbal scaled score of 480 and had left 15 items unanswered.
If he had guessed on those 15 items, his score would have been raised 3 raw score points (one for each five items left blank). Those 3 points would then be converted to 27 scaled score points, because the raw-score-to-scaled-score conversion for the verbal section was 1 to 9. Finally, the 27 points would be added to the 480 to produce a "corrected" score of 507. The following simplified equations were used to perform all corrections:

    V' = V + 1.8bv
    Q' = Q + 2.9bq
    A' = A + 3.0ba

where V', Q', and A' are the corrected scores; V, Q, and A are the rights-only scores; and bv, bq, and ba are the numbers of verbal, quantitative, and analytical items left blank. After correcting the scaled scores for not guessing, we examined the effects of not guessing on the scaled scores of nonguessers. In the last set of analyses, we used stepwise regressions to explore

the extent to which we could estimate the number of items examinees left unanswered from their corrected scaled scores and certain background variables.

Results

The total number of examinees who took all three sections of the GRE General Test in October 1984 was 55,656. Of these, only 0.5 percent failed to specify their sex, while 19.2 percent omitted the question on citizenship, and 27.5 percent omitted ethnic identity. It is important to note, however, that the background questionnaire instructed examinees (in very small print) to skip the question on ethnicity unless they were U.S. citizens or resident aliens. In fact, it told them that they could skip all of the questions if they had taken the test within the past year and the answers to all questions had not changed. The fact that many examinees omitted certain items in the background questionnaire and, furthermore, did not always follow the branching instructions created difficulties in the analyses of test item omission. We will discuss these in greater detail in the section on analysis by demographic group.

Distributions of items not answered. Table 1 shows the distribution of the total number of items not answered over the entire test. Nearly 44 percent of the population answered every item. One-tenth of the examinees, however, left 12 or more items blank, and 4.7 percent--nearly 3,000 examinees--failed to answer more than 20 items. This indicates that a sizable number of test-takers either did not understand the instructions to guess or chose not to follow them. As we shall see later, the decision to leave items blank can have a considerable effect on the resulting test score. If we examine the frequency distributions separately for each of the test sections, we see that the greatest number of blank items occurred on the analytical section and the fewest on the verbal section. From Table 2 we see that three-fourths of the population answered all of the verbal questions.
One percent, however, left 18 or more items unanswered. On the quantitative section (Table 3), 72 percent answered every question. But on the analytical section (Table 4), only 61 percent answered all items. Two percent left more than 15 questions blank, and 10 percent failed to answer more than 6.

Correlations of blank items between test sections. It is reasonable to hypothesize that a person's tendency to guess or not to guess would apply to all test sections. We would expect, therefore, some correlation between the numbers of items not answered across sections. Table 5 shows that they are correlated, the highest correlation being 0.62 between the quantitative and analytical sections. The correlations obtained here suggest that the individual examinee is using a consistent test-taking strategy, with regard to guessing, for all test sections.
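The cross-section correlations reported in Table 5 are ordinary Pearson correlations among the three counts of unanswered items. A minimal sketch of that computation, using made-up counts for six hypothetical examinees rather than the actual data file:

```python
# Pearson correlations among the numbers of items left unanswered on the
# verbal, quantitative, and analytical sections (hypothetical data).
import numpy as np

blanks = np.array([
    #  V   Q   A
    [  0,  0,  0],
    [  2,  1,  3],
    [  5,  8, 10],
    [  0,  2,  4],
    [ 12, 15, 20],
    [  1,  0,  2],
])

# rowvar=False treats each column (test section) as one variable.
r = np.corrcoef(blanks, rowvar=False)   # 3x3 correlation matrix
print(r[1, 2])  # correlation between quantitative and analytical blanks
```

With the real October 1984 data this off-diagonal entry is the 0.62 the report cites; the toy data here are only meant to show the shape of the calculation.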

Distributions by demographic group. In the total population, just slightly more than half of the examinees (51.8 percent) were women. From Table 6 we see that females tended to leave more items blank than did males, regardless of the test section. This pattern was most prominent in the quantitative section, where the women not answering 26 or more items outnumbered the men 2 to 1. Patterns of nonresponse among the various ethnic groups and among those of different citizenship status were also quite evident. Analyses of nonresponse to ethnicity and to citizenship were confounded, however, by the fact that noncitizens did not generally answer the ethnicity question. To avoid confounding ethnicity and citizenship, therefore, the distribution of items left unanswered by ethnicity shown in Table 7 is based only on those who indicated that they were U.S. citizens. Nearly all ethnic minorities left a disproportionately high number of items blank on all three sections. Ethnic differences were greatest on the verbal section and least on the quantitative. Blacks were by far the most likely to leave large numbers of items unanswered. Among those leaving more than 25 items unanswered on the verbal section, over one-fifth were Black, while only 4 percent of all U.S. citizens taking the test were Black. Foreign noncitizens and resident aliens left more items unanswered than did U.S. citizens, especially on the verbal and analytical sections (Table 8). On the quantitative section, however, only resident aliens left a disproportionately large number of items unanswered. It is not surprising that U.S. citizens tended to answer more items, particularly in the verbal section. The real difference is probably English language proficiency rather than test-taking strategy. Table 9 confirms this expectation. On the verbal section, in particular, the number of examinees leaving items blank was quite large.
Only 11.5 percent of the population indicated that English was not their primary language, yet over 30 percent of those examinees omitting more than 25 items were primarily non-English speaking.

Comparisons between "nonguessers" and others. From these analyses, it is evident that examinees who choose not to guess at items they cannot answer have, on the average, different characteristics from those who do guess. To investigate their characteristics further, we defined a group we called "nonguessers": the examinees who left more than 30 items blank in the entire test. Based on the 30-item cutoff, there were 890 nonguessers in the population. The purpose of the analysis was to see how the nonguessers differed, on the average, from the rest of the population. We subdivided the nonguessers and the remaining examinees each into six subgroups: White males, White females, Black males, Black females, "other" males, and "other" females. Table 10 shows the numbers of examinees in each of these categories among the nonguessers. The smallest cell, which contained only four examinees, was that of "other" males; this group was therefore not included in the subgroup comparisons. The largest group was White females. In fact, it is very clear from this table that the number of women among the nonguessers was more than three times the number of men.
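The two derived quantities used throughout the comparisons that follow, the nonguesser flag (more than 30 items blank across all three sections) and the scores corrected for random guessing, could be computed along these lines. This is a sketch with our own function names, not the original analysis code.

```python
# Derived variables used in the comparisons: the "nonguesser" flag
# (more than 30 blanks over all three sections) and the corrected scores
# V' = V + 1.8*bv, Q' = Q + 2.9*bq, A' = A + 3.0*ba from the report.

def is_nonguesser(blanks_v, blanks_q, blanks_a, cutoff=30):
    """True if the examinee left more than `cutoff` items blank overall."""
    return blanks_v + blanks_q + blanks_a > cutoff

def corrected_scores(v, q, a, blanks_v, blanks_q, blanks_a):
    """Scaled scores adjusted as if the examinee had guessed at random
    on every unanswered item (coefficients averaged across forms)."""
    return v + 1.8 * blanks_v, q + 2.9 * blanks_q, a + 3.0 * blanks_a

# Worked example from the text: verbal 480 with 15 blanks -> 507.
print(is_nonguesser(15, 10, 10))                      # True (35 blanks)
print(corrected_scores(480, 500, 500, 15, 0, 0)[0])   # 507.0
```

The quantitative and analytical scores passed in above are arbitrary placeholders; only the verbal values reproduce the report's worked example.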

Among the 890 nonguessers, 80 either omitted the item on citizenship or indicated that they were not citizens. For the remainder of the analyses comparing guessers with nonguessers, we used data only from U.S. citizens. Table 11 confirms that, among U.S. citizens, there were disproportionately high numbers of minorities and females among the nonguessers. Nearly 12 percent of the nonguessing population were Black females, while less than 3 percent of the remaining population was composed of Black females. Examining other items in the background questionnaire, we found that fewer than one-third of the nonguessers were currently enrolled as college seniors, whereas 44 percent of the remainder of the examinees were seniors (Table 12). More of the nonguessers were either in graduate school already or were out of school with bachelor's or master's degrees. Breaking this down by sex and ethnic subgroup, we examined the percentages of test-takers who were college seniors. In Table 13 we see that among White males, 30 percent of the nonguessers were seniors, while 49 percent of the rest of the examinees were seniors. A similar relationship held for White females--29 percent of the nonguessers and 41 percent of the remaining test-takers were seniors. Blacks, both males and females, showed little difference between nonguessers and the remaining examinees. Among "other" females, there was some difference in the same direction as for Whites, but not as large. Consistent with these findings is an apparent relationship between guessing behavior and age. Table 14 shows that nonguessers tended, on the average, to be older. Among White males, they were two and one-half years older than other examinees. For White females, the difference was 3.6 years. The differences were not quite as great for the other groups, but they did exist.
If older examinees were more likely to remember and to follow the old test instructions, we would also expect more of them to have taken an old form of the GRE prior to 1982, when the guessing instructions changed. The background questionnaire asked whether they had taken the GRE within the past year or prior to that date (prior to October 1983). Assuming that at least some of those who took it prior to 1983 actually took it prior to 1982, we would expect to find a slightly higher percentage of nonguessers in this category. In fact, among White males, over 15 percent of the nonguessers reported taking the GRE prior to October 1983, while only 9 percent of the remainder of the examinees took the test before that date (Table 15). A similar pattern held for Whites of both sexes and for "other" females. Comparing groups on English language proficiency, we found some differences among the sex and ethnic groups (Table 16). Among Black females, nonguessers were about the same as the rest of the population--almost everyone (98 percent) claimed that English was their primary language. The greatest difference was among "other" females, where only 53 percent of the nonguessers reported that English was their primary language, compared with 68 percent of the rest of the population of "other" females. For Whites, regardless of sex, there was little difference between the nonguessers and others, probably because a very high percentage were native English

speakers.

One additional background variable we analyzed was intended major field. Because of the small numbers planning to enter some fields, we did not analyze the sex and ethnic groups separately. Furthermore, we studied only four large major field areas: humanities, social sciences, biological sciences, and physical sciences. Table 17 shows that the greatest difference in major field preference was in the physical sciences. Among nonguessers, only 12 percent planned to major in a physical science, while 22 percent of the rest of the population chose this area. A disproportionately large number of nonguessers selected humanities and social sciences.

Finally, we compared the academic achievement of nonguessers with that of other examinees by analyzing grade point average (GPA) and GRE scores. The grade distributions in Table 18 indicate that nonguessers generally had lower grades during their last two years of college. Among the nonguessers, only 11.5 percent reported having an "A" average, while 16.8 percent of the remaining population were "A" students. Likewise, 26.7 percent of the nonguessers and 35.3 percent of the other examinees claimed that they had an "A-" average. Below this level--from "B" down to "C-"--we found a greater proportion of nonguessers. Consistent with lower grades among nonguessers were lower GRE scores. We compared the scores as they were reported under rights-only scoring, and, in addition, we "corrected" them for not guessing as described earlier. Table 19 shows the mean verbal scores both ways. Looking at the rights-only score averages first, we see that among White males, nonguessers scored on the average more than 100 points lower than the remaining test-takers. The differences in the converted scores of nonguessers and other examinees were somewhat less for the other subgroups; among Black males, nonguessers scored an average of 35 points lower than other examinees.
The standard deviations under rights-only scoring ranged from 102 (Black females) to 132 ("other" females) for the population excluding nonguessers. For nonguessers, they ranged from 80 to 108. In standard deviation units, therefore, some of the differences between nonguessers and others were considerable. Among "other" females, the difference in rights-only score averages between nonguessers and other examinees was two-thirds of a standard deviation. Using the formulas derived earlier, we computed the mean verbal scores of each group as if they had guessed at random on the unanswered items. The differences between nonguessers and the rest of the population were reduced by this correction. The corrected difference for White males was 81 points (compared with 108 when uncorrected); for Black males the difference was reduced to 9 points (from 37). We still see that there were differences in verbal scores between nonguessers and the rest of the examinees. The difference was almost negligible for Black males but quite large for White males. It is worth mentioning that the standard deviations of the corrected scores are slightly smaller, but only by about 2 points. Correcting for not guessing truly

reduced the differences between nonguessers and other examinees, but the differences were still real, especially for Whites and "others." Quantitative score averages showed a similar pattern but with even greater differences between nonguessers and others (Table 20). Using rights-only scores, we see that among White males, nonguessers scored an average of 132 points lower than other White males. Among Black females, the difference was only 36 points. When we correct for not guessing using the formulas derived earlier, the differences are reduced to 102 for White males and 8 for Black females. The standard deviation of the corrected scores of White males was 111 for nonguessers and 123 for the remaining examinees. We therefore still see differences of nearly a standard deviation between the quantitative scores of nonguessers and those of the White male population. Finally, when we looked at the analytical score differences, we found the same pattern (Table 21). The difference was again greatest for White males, with nonguessers scoring an average of 161 points lower than other White males. Black females showed a difference of 64 points. The differences between nonguessers and the rest of the population were even larger than those found in the verbal or quantitative sections, and the standard deviations were about the same. For White males, the standard deviation of the scores for nonguessers was only 89; for the rest of the White male population, it was 121. Correcting the scores for not guessing, we found that White males who chose not to guess still scored an average of 113 points lower than other White males. The difference for Black females was 21 points. The standard deviations for White males were 57 and 120 for nonguessers and other examinees, respectively. Again, we found a difference on the order of a standard deviation between nonguessers and the rest of the population of White males.
Differences for the other groups were less, but they were still sizable, especially for non-Blacks. What we conclude from these analyses is not surprising: even if we guess for the examinee who has left items blank, those who leave large numbers of items unanswered will still obtain lower than average scores.

Effects of not guessing on scaled score. What is perhaps more important to discuss are the possible effects of guessing for the examinee who leaves items blank. On the average, correcting for not guessing would raise the mean score a point or two. But it is not the mean that concerns us; it is the effect on the scores of those who omit many items that may be considerable. If a single item on the verbal section affects the converted score 9 points, an examinee does not have to leave out many items before his or her score is drastically affected, perhaps enough to influence acceptance to graduate school. Among White males in our group of nonguessers, the average verbal score would have been raised from 420 to

A single item on the quantitative section affected a converted score even more. Among the White males defined as nonguessers, the average quantitative score would have been raised from 466 to 498. The average Black male nonguesser, whose mean score was only 378, would at least have gone over the 400 mark to 410. The greatest differences between the converted scores and the corrected converted scores occurred on the analytical section. For nonguessers, the difference would have been around 50 points (Table 21). The average score for White males who did not guess was 414; if their scores had been corrected for not guessing, that average would have been 467. For Black females, the average would have been raised from 343 to 393. While the average scores obtained by nonguessers still may not be high enough for admission to a selective university, not all nonguessers should be thought of as fitting the average of the group we defined. Considering that the standard deviation for our group was around 100, some nonguessers clearly have scores in a higher range, and it is possible that, for them, guessing or not guessing could affect their admission to graduate school.

Regression analyses. The descriptive analyses showed a number of variables to be related to item omission. These variables were undoubtedly confounded: we expect citizenship and ethnic identity to be related to English language proficiency, and age to be related to prior test experience. To explore which variables might be most strongly related to item omission, we conducted several regression analyses, selecting as independent variables relevant background data and scaled test scores corrected for not guessing. The first analysis was performed on all examinees who were U.S. citizens. It consisted of a stepwise regression in which the number of items left unanswered was regressed onto scaled score, age, ethnic category, sex, GPA during the last two years of college, and number of years out of school.
The scaled score was corrected for not guessing as described earlier. The reason for using the corrected score was that we would expect it to provide a better estimate of the examinee's true score on the test section. Ethnic category was coded simply 0 or 1, representing White or non-White. Sex was coded 0 for male and 1 for female. The analysis was performed only on U.S. citizens because noncitizens were instructed not to answer the ethnicity item, and ethnicity was an important variable to investigate.

Results of this regression analysis are shown in Table 22. For each of the three test sections, the score explained the greatest proportion of the variance in the number of items unanswered. For the verbal section, ethnic category was stepped in second; age was third, sex was fourth, and GPA was last. Number of years out of school had no significant weight, possibly because it was highly correlated with age. For the quantitative and analytical sections, age had a greater weight than did ethnic category. Based on these results we would conclude that test score is the best predictor of guessing behavior -- low-scoring examinees leave more items blank than do high-scoring examinees, regardless of race, sex, age, or achievement in college. Quite clearly, this is not a very profound discovery. Those who omit items must necessarily obtain lower scores than the bulk of the examinee population simply because the population as a whole includes those who know the answers. What we would really like to know is whether, if we matched omitters with random guessers on the number of items they could not answer, we would find a significant difference in some other characteristic. Unfortunately, we cannot do that with the available information.

What Table 22 does tell us, however, is that if we hold test score constant, four other variables still have a small but significant relationship with item omission. For the verbal section, non-Whites leave more items blank than do Whites, as do older examinees. For the quantitative and analytical sections, age seems to be the primary predictor of item omission. In fact, on the analytical section, age is weighted nearly as heavily as test score, and ethnic category is relatively unimportant.

A question that naturally arises from this analysis is whether these relationships hold over the full range of abilities. Among low-scoring examinees, item omission may be related to different variables than it is among high-scoring examinees. To investigate this possibility, we divided examinees into five analysis groups (quintiles) according to their scores. For each group, the regression was recomputed using the same independent variables (excluding test score) to estimate the number of items left unanswered.

Results for the verbal section are shown in Table 23. The first result worth noting is the pattern of means and percentages for the five groups. The lowest quintile was 26 percent non-White, compared with 11 percent of the second quintile; the highest quintile was only 7 percent non-White. Age was also somewhat related to test-score level: the oldest examinees, on the average, were in the lowest quintile, while the highest three quintiles contained examinees of the same average age. GPA for the last two years of college was consistent with test-score group, and the percentage who were female was highest in the lower quintiles.
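The quintile procedure just described can be sketched in a few lines. All data below are synthetic placeholders, not the report's values; the sketch only shows the mechanics of splitting on score and refitting the regression within each score group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the report's variables (illustrative only).
n = 1000
score = rng.normal(500, 100, n)                 # corrected scaled score
age = rng.normal(25, 5, n)
nonwhite = rng.integers(0, 2, n).astype(float)  # 0 = White, 1 = non-White
female = rng.integers(0, 2, n).astype(float)    # 0 = male, 1 = female
gpa = rng.normal(3.0, 0.5, n)
omits = np.clip(40 - 0.05 * score + rng.normal(0, 5, n), 0, None)

# Divide examinees into score quintiles (0 = lowest, 4 = highest) ...
edges = np.quantile(score, [0.2, 0.4, 0.6, 0.8])
quintile = np.digitize(score, edges)

# ... then regress omits on the remaining predictors within each quintile.
betas = {}
for q in range(5):
    m = quintile == q
    X = np.column_stack([np.ones(m.sum()), age[m], nonwhite[m],
                         female[m], gpa[m]])
    betas[q], *_ = np.linalg.lstsq(X, omits[m], rcond=None)
```

Comparing the fitted weights across the five groups mirrors the quintile-by-quintile comparisons reported in Tables 23 through 25.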
The multiple correlations show that these four variables predicted item omission best in the lowest scoring group, where ethnic category carried the greatest regression weight. Second was age, third was GPA, and last was sex. These weights changed in relative importance, however, for the other groups. For the second quintile, age and sex were about equally important, ethnic category was third, and GPA did not enter the equation at all. In the middle quintile, the multiple correlation was only 0.06 between number of omits and all four predictors, none of which carried a great weight. In the top two quintiles, ethnic category dropped out entirely, even though minorities constituted at least 7 percent of the examinees in each of these high-scoring groups. It appears from this analysis that item omission is best explained among those with low scores, and that, among low-scoring examinees, item omission is disproportionately high among non-Whites, regardless of their age, sex, or college grades. In addition, age, sex, and GPA are each related to item omission, though not quite so heavily as ethnic identity.

The same analyses were conducted for the quantitative test section; results are shown in Table 24. Here we see that the ethnic composition of the lowest quintile was 23 percent non-White, compared with about 9 or 10 percent at all other score levels. Those in the lower scoring groups were also older, on the average. There was a considerable difference in the sex ratios across score groups, with the lowest quintile being 76 percent female and the highest quintile only 33 percent female. GPA was consistent with test-score level. An examination of the multiple correlations shows again that the number of items omitted can be best explained among the lowest scoring examinees. From the standardized regression weights we see, however, that age was the most heavily weighted variable. In the lowest quintile, age was nearly matched by minority status, and in the highest quintile, age was matched about equally with sex as a predictor of item omission.

On the analytical section (Table 25), we see that age carried a somewhat heavier weight than it did for the verbal or quantitative section. While ethnic category was clearly related to test score, as evidenced by the percentages of non-Whites in each quintile, the regression weights were consistently lower for ethnicity than for age. While GPA, sex, and ethnic category each contributed to the prediction, age provided the best single estimate of the number of items unanswered on the analytical test section, and this was especially evident among the lowest scoring examinees.

Part 2: Questionnaire Survey

For the second phase of the study, a sample of nonguessers, as defined in Part 1, was surveyed by mail to determine as many reasons as possible why they did not guess on items they could not answer. The questionnaire attempted to assess their understanding of scoring methods that penalize guessing, as well as to probe their memory of the instructions they were given when they took the test. The purpose of the survey was not to draw inferences about populations. A much more extensive study would have been needed if we had intended to compare the races or sexes on test-taking strategies regarding guessing or to relate those strategies to other background variables. This study simply enumerated some of the reasons that examinees gave for not guessing when they were instructed to do so, and it provided a groundwork for discussion.

Sampling

One possible sampling strategy would have been to select the 200 examinees who left the most blank items. This strategy was rejected because those examinees might have been similar and unique in some way. Perhaps they were all ill and left the test center early; because they were such a small proportion of the examinee population, this was entirely possible. Therefore, we decided to sample across all nonguessers, as we defined them in Part 1, namely, those who left more than 30 items blank in the whole test. In the event that the reasons for not guessing were different by race or sex, we selected a nearly equal number from each race-by-sex group. Because there were only 4 "other" males in the population of nonguessers, we eliminated this group. Of the remaining five groups, we would have sampled 40 examinees per group except that there were only 35 Black males in the population. Thus, we included all 35 Black males and increased the other group sizes so the total would be 200.

As stated earlier, our purpose was not to provide a sample from which to draw inferences about populations, and therefore these groups should not be regarded as sampling strata. The decision to include approximately equal numbers from each sex/ethnic subgroup was based on the need to maximize the number of reasons examinees would give for not guessing, in the event that Black females, for example, had different reasons than did White males. We drew the sample by choosing every fourth White male, every tenth White female, every Black male, every second Black female, and every third "other" female.
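The systematic draw described above amounts to taking every k-th name from each group's ordered list. A minimal sketch follows; the pool of 168 White males is hypothetical, and only the skip intervals come from the text.

```python
def systematic_sample(population, k):
    """Take every k-th member of an ordered population list,
    starting with the first."""
    return population[::k]

# Skip intervals from the report: every 4th White male, every 10th White
# female, every Black male, every 2nd Black female, every 3rd "other" female.
skips = {"White male": 4, "White female": 10, "Black male": 1,
         "Black female": 2, "other female": 3}

# Hypothetical pool of 168 White-male nonguessers -> every 4th yields 42.
pool = list(range(168))
print(len(systematic_sample(pool, skips["White male"])))  # 42
```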

The final sample contained the following numbers:

42 White males
42 White females
35 Black males
42 Black females
41 "other" females

Questionnaire Design and Pretest

The first draft of the questionnaire contained questions regarding
- test preparation
- the availability of instructions for guessing or not guessing
- how many items they omitted and did not finish
- what strategy they used with regard to items they could not answer
- what they did when only one minute remained at the end of the test period
- their beliefs about guessing and penalties for guessing
- whether they had previously taken the General Test or a Subject Test.

We pretested this draft of the questionnaire on 10 local examinees who had taken the test in October. All had left some items unanswered and were therefore the most appropriate population to use for the pretest. Each of the 10 was given the questionnaire in person by a research assistant and then interviewed after completing it. Each examinee was paid $5 for participating. Based on the questionnaire responses and interviews, we carefully reviewed and revised the questionnaire. Appendix A shows the final form of the questionnaire.

Survey Administration

The survey was administered according to the following steps:

1. The questionnaire was mailed on December 4, 1984, just seven weeks after the examinees had taken the test. It was sent with a cover letter (Appendix B), a postage-paid return envelope, and a check for $5. A mailing label was placed on the front of the questionnaire so we could identify the returns.

2. Eighteen days later, a postcard reminder (Appendix C) was sent to the 118 examinees who had not responded.

3. Two weeks later, another copy of the questionnaire was mailed to the remaining 81 non-respondents. With it was a followup letter (Appendix D) and a return envelope. No check was sent.

4. Six weeks later, all returned questionnaires were analyzed.

Because of Christmas vacation and exam weeks, many of the questionnaires did not reach the examinees until they returned to their dormitories or until they received them from home. Undoubtedly, some were addressed to permanent home addresses and some were sent to college addresses, so a considerable number of questionnaires had to be forwarded. Nevertheless, the response rate was quite favorable.

Results

Response rate. -- Of the 202 questionnaires mailed, 3 were returned by the Post Office as undeliverable. One examinee returned the check and did not complete the questionnaire. Of the 199 questionnaires that reached their destinations, 166 were completed and returned. The response rate was 82 percent of those mailed. The response rates for the five groups were as follows:

White males 76%
White females 90%
Black males 83%
Black females 86%
"Other" females 76%

Characteristics of the sample. -- For purposes of analysis, data from the questionnaires were matched with GRE scores and background information. The background characteristics of the survey respondents were quite similar to those of the entire subpopulation of nonguessers. The distribution of respondents was approximately equal across groups; Table 26a shows the exact distribution. Black males were the smallest group (17.5 percent), and White females were the largest (22.9 percent).

Of the 166 respondents, 163 answered the question on citizenship (Table 26b). Of these, 86.5 percent claimed to be U.S. citizens. Twenty-two examinees were either resident aliens or foreign noncitizens.
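The response-rate figures above reduce to simple arithmetic; this sketch just makes the bookkeeping explicit, using the counts quoted in the text.

```python
mailed = 202
undeliverable = 3                  # returned by the Post Office
reached = mailed - undeliverable   # 199 questionnaires reached examinees
completed = 166

# The report quotes the rate relative to all questionnaires mailed.
rate_of_mailed = 100 * completed / mailed
print(reached, round(rate_of_mailed))  # 199 82
```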
Similar to the citizenship distribution was the distribution of responses to the question on English language proficiency (Table 26c). Of the 161 examinees who answered this question, 18 (11.2 percent) reported that English was not their primary language.

In addition to these descriptive characteristics, we analyzed examinees' responses to the GRE background questionnaire items on parents' education (Table 26d). Among the parents of the 158 examinees who answered these questions, we found that 26 percent of the fathers and 25 percent of the mothers had not graduated from high school. This was a startling finding, considering that only about 15 to 17 percent of the GRE population as a whole have parents with so little formal education. On the other hand, it is at least partially explained by the overrepresentation of minorities in our sample.

In terms of academic ability indicators (Tables 27a-b), 44 percent of our sample reported having a B average for the last two years of college. Fewer than one third had averages of A- or A. The average GRE scores were 400, 420, and 391 for the verbal, quantitative, and analytical sections, respectively.

Questionnaire responses. -- Table 28 shows that 61 percent indicated that they had either worked through the publication entitled Practicing to Take the GRE or had read other commercially available test-preparation books. Ninety percent of the sample said that they had read the descriptive booklet that accompanied the test registration materials (Table 28, item 2). Seventy-five percent indicated that it provided them with useful guidance about how to take the test. Of the respondents who answered the question on the best strategy when faced with an item they could not answer, only 31 percent indicated that they should "pick an answer at random even if you had no idea of what the answer might be." The greatest proportion (46 percent) marked "pick an answer at random only if you could eliminate two or more of the choices."

In item 3 (Table 29), the questionnaire asked whether the test-center supervisor gave them guidance on how to take the test. Only half indicated that supervisors had. Because so many examinees believed no guidance was given, we looked at the test-center codes and locations of the centers where these examinees took their tests. If large numbers of examinees had used the same centers, we would suspect that the supervisors had omitted some of the instructions. When we listed the test centers, however, we found no pattern.
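This center-by-center check amounts to a frequency count over test-center codes; a minimal sketch follows. The center codes below are hypothetical placeholders, not codes from the study.

```python
from collections import Counter

def repeated_centers(center_codes):
    """Return the test-center codes reported by more than one examinee,
    with their counts; an empty result suggests no center-level pattern."""
    return {code: n for code, n in Counter(center_codes).items() if n > 1}

# Hypothetical codes for examinees reporting no supervisor guidance.
codes = ["1021", "4437", "0098", "1021", "5210"]
print(repeated_centers(codes))  # {'1021': 2}
```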
In fact, rarely were two examinees from the same center. They were from centers all over the country, so it appeared most unlikely that the supervisors were at fault.

Of those examinees who acknowledged that the supervisors gave instructions, only 42 percent marked the "correct" strategy. Just as many indicated that they were told to guess only if they could eliminate two or more choices. Again, we compiled a list of the test centers where these examinees took their exams to see if the same centers appeared more than once. If so, that would suggest that the supervisors were actually not reading the correct instructions or were supplementing them with their own interpretations. We found little or no repetition of test-center codes and no evidence that any supervisor explained the strategy incorrectly.

Continuing with questionnaire item 5, we computed the distributions of numbers of items left blank. This question was asked primarily to ascertain whether the examinees were even aware that they had omitted test items. If many had claimed to answer all items, we would have had to reexamine their answer sheets to see whether the problem lay in gridding or scanning. Most of the respondents did remember having omitted items. Table 30 shows the distribution of responses. Because we could not

separate the number of items omitted from the number not answered, we did not attempt to verify the survey estimates. However, the numbers covered a reasonable range of values, and they confirmed that examinees knew they had left questions blank.

Item 6 asked about examinees' behavior during the final minute of the test session (Table 31). First we asked whether the supervisors notified them when there was one minute left. Only half indicated that the supervisor did. Once again, we listed the test-center codes corresponding to those examinees who indicated that the supervisors did not notify them, and again we found no center to be implicated, suggesting that supervisor behavior was not the issue. We also asked what the examinees did after the one-minute mark. The greatest proportion (40 percent) continued to work on items they could answer. Only about 16 percent indicated that they marked items at random. Because this was a multiple-response item, many examinees marked that they did more than one thing. Indeed, it is reasonable to continue working on items you are sure you can answer before marking the rest at random; but clearly, not many ever marked at random.

Items 7 and 8 attempted to assess the examinees' understanding and beliefs regarding guessing, first in relation to the GRE General Test and then in relation to standardized tests in general. Table 32 shows the distributions of responses to each statement. We assumed that those who chose not to respond to a statement simply did not know how to answer; thus the percentages are based on the number answering "yes" out of the entire population. If they had clearly understood that they were to guess at random on the General Test, everyone should have answered "yes" to the first statement, namely, "It is to my advantage to answer every question even if I must choose an answer at random." Seventy-two percent marked "yes" to this statement.

The next statement said, "Points will be subtracted from my score for incorrect answers." Only 24 percent marked "yes" to this statement, and 11 percent left it blank -- apparently not sure. Even so, it seems that most examinees knew they would not be penalized for guessing. The third statement read, "It is likely that choosing at random will improve my score to some extent." Only 63 percent marked "yes," and 13 percent left it blank. About the same percentage understood that the next statement was not correct, namely, "It is likely that choosing at random will reduce my score to some extent." Only 27 percent said "yes," but 12 percent left it blank. The final statement was, "Choosing at random is a useful strategy only when I have some knowledge of a question and can eliminate one or more of the answer choices." Here 64 percent of the sample answered "yes."

In item 8 the examinees were asked to respond to the same set of statements, but with respect to most multiple-choice tests. Even here, 65 percent thought they should answer every question even if they must guess


Validity. Ch. 5: Validity. Griggs v. Duke Power - 2. Griggs v. Duke Power (1971) Ch. 5: Validity Validity History Griggs v. Duke Power Ricci vs. DeStefano Defining Validity Aspects of Validity Face Validity Content Validity Criterion Validity Construct Validity Reliability vs. Validity

More information

Sheila Barron Statistics Outreach Center 2/8/2011

Sheila Barron Statistics Outreach Center 2/8/2011 Sheila Barron Statistics Outreach Center 2/8/2011 What is Power? When conducting a research study using a statistical hypothesis test, power is the probability of getting statistical significance when

More information

Chapter 7: Descriptive Statistics

Chapter 7: Descriptive Statistics Chapter Overview Chapter 7 provides an introduction to basic strategies for describing groups statistically. Statistical concepts around normal distributions are discussed. The statistical procedures of

More information

Chapter 02 Developing and Evaluating Theories of Behavior

Chapter 02 Developing and Evaluating Theories of Behavior Chapter 02 Developing and Evaluating Theories of Behavior Multiple Choice Questions 1. A theory is a(n): A. plausible or scientifically acceptable, well-substantiated explanation of some aspect of the

More information

Further Properties of the Priority Rule

Further Properties of the Priority Rule Further Properties of the Priority Rule Michael Strevens Draft of July 2003 Abstract In Strevens (2003), I showed that science s priority system for distributing credit promotes an allocation of labor

More information

Implicit Information in Directionality of Verbal Probability Expressions

Implicit Information in Directionality of Verbal Probability Expressions Implicit Information in Directionality of Verbal Probability Expressions Hidehito Honda (hito@ky.hum.titech.ac.jp) Kimihiko Yamagishi (kimihiko@ky.hum.titech.ac.jp) Graduate School of Decision Science

More information

National Cancer Patient Experience Survey Results. University Hospitals of Leicester NHS Trust. Published July 2016

National Cancer Patient Experience Survey Results. University Hospitals of Leicester NHS Trust. Published July 2016 National Cancer Patient Experience Survey 2015 Results University Hospitals of Leicester NHS Trust Published July 2016 Revised 17th August 2016 The National Cancer Patient Experience Survey is undertaken

More information

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about population. Populations are group of interest

More information

National Cancer Patient Experience Survey Results. East Kent Hospitals University NHS Foundation Trust. Published July 2016

National Cancer Patient Experience Survey Results. East Kent Hospitals University NHS Foundation Trust. Published July 2016 National Cancer Patient Experience Survey 2015 Results East Kent Hospitals University NHS Foundation Trust Published July 2016 Revised 17th August 2016 The National Cancer Patient Experience Survey is

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

Chapter 5: Research Language. Published Examples of Research Concepts

Chapter 5: Research Language. Published Examples of Research Concepts Chapter 5: Research Language Published Examples of Research Concepts Contents Constructs, Types of Variables, Types of Hypotheses Note Taking and Learning References Constructs, Types of Variables, Types

More information

Item Writing Guide for the National Board for Certification of Hospice and Palliative Nurses

Item Writing Guide for the National Board for Certification of Hospice and Palliative Nurses Item Writing Guide for the National Board for Certification of Hospice and Palliative Nurses Presented by Applied Measurement Professionals, Inc. Copyright 2011 by Applied Measurement Professionals, Inc.

More information

Practitioner s Guide To Stratified Random Sampling: Part 1

Practitioner s Guide To Stratified Random Sampling: Part 1 Practitioner s Guide To Stratified Random Sampling: Part 1 By Brian Kriegler November 30, 2018, 3:53 PM EST This is the first of two articles on stratified random sampling. In the first article, I discuss

More information

IAASB Main Agenda (February 2007) Page Agenda Item PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED)

IAASB Main Agenda (February 2007) Page Agenda Item PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED) IAASB Main Agenda (February 2007) Page 2007 423 Agenda Item 6-A PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED) AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS Paragraph Introduction Scope

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Best on the Left or on the Right in a Likert Scale

Best on the Left or on the Right in a Likert Scale Best on the Left or on the Right in a Likert Scale Overview In an informal poll of 150 educated research professionals attending the 2009 Sawtooth Conference, 100% of those who voted raised their hands

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Running Head: TRUST INACCURATE INFORMANTS 1. In the Absence of Conflicting Testimony Young Children Trust Inaccurate Informants

Running Head: TRUST INACCURATE INFORMANTS 1. In the Absence of Conflicting Testimony Young Children Trust Inaccurate Informants Running Head: TRUST INACCURATE INFORMANTS 1 In the Absence of Conflicting Testimony Young Children Trust Inaccurate Informants Kimberly E. Vanderbilt, Gail D. Heyman, and David Liu University of California,

More information

Chapter 12. The One- Sample

Chapter 12. The One- Sample Chapter 12 The One- Sample z-test Objective We are going to learn to make decisions about a population parameter based on sample information. Lesson 12.1. Testing a Two- Tailed Hypothesis Example 1: Let's

More information

The Relationship between Fraternity Recruitment Experiences, Perceptions of Fraternity Life, and Self-Esteem

The Relationship between Fraternity Recruitment Experiences, Perceptions of Fraternity Life, and Self-Esteem Butler University Digital Commons @ Butler University Undergraduate Honors Thesis Collection Undergraduate Scholarship 2016 The Relationship between Fraternity Recruitment Experiences, Perceptions of Fraternity

More information

About this consent form. Why is this research study being done? Partners HealthCare System Research Consent Form

About this consent form. Why is this research study being done? Partners HealthCare System Research Consent Form Protocol Title: Gene Sequence Variants in Fibroid Biology Principal Investigator: Cynthia C. Morton, Ph.D. Site Principal Investigator: Cynthia C. Morton, Ph.D. Description of About this consent form Please

More information

Conversation Tactics Checklist (Hallam, R S, Ashton, P, Sherbourne, K, Gailey, L, & Corney, R. 2007).

Conversation Tactics Checklist (Hallam, R S, Ashton, P, Sherbourne, K, Gailey, L, & Corney, R. 2007). Conversation Tactics Checklist (Hallam, R S, Ashton, P, Sherbourne, K, Gailey, L, & Corney, R. 2007). This 54-item self-report questionnaire was devised to assess how people behave when it becomes difficult

More information

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

Addendum: Multiple Regression Analysis (DRAFT 8/2/07) Addendum: Multiple Regression Analysis (DRAFT 8/2/07) When conducting a rapid ethnographic assessment, program staff may: Want to assess the relative degree to which a number of possible predictive variables

More information

A Cross-validation of easycbm Mathematics Cut Scores in. Oregon: Technical Report # Daniel Anderson. Julie Alonzo.

A Cross-validation of easycbm Mathematics Cut Scores in. Oregon: Technical Report # Daniel Anderson. Julie Alonzo. Technical Report # 1104 A Cross-validation of easycbm Mathematics Cut Scores in Oregon: 2009-2010 Daniel Anderson Julie Alonzo Gerald Tindal University of Oregon Published by Behavioral Research and Teaching

More information

Running head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick.

Running head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick. Running head: INDIVIDUAL DIFFERENCES 1 Why to treat subjects as fixed effects James S. Adelman University of Warwick Zachary Estes Bocconi University Corresponding Author: James S. Adelman Department of

More information

HEDIS CAHPS 4.0H Member Survey

HEDIS CAHPS 4.0H Member Survey Helping you turn insight into action HEDIS - CAHPS 4.0H Member Survey Adult - HMO prepared for ANTHEM BLUE CROSS BLUE SHIELD - INDIA June 800.989.5150 dssresearch.com Table of Contents Background and Objectives

More information

In this chapter we discuss validity issues for quantitative research and for qualitative research.

In this chapter we discuss validity issues for quantitative research and for qualitative research. Chapter 8 Validity of Research Results (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.) In this chapter we discuss validity issues for

More information

Rise to the Challenge or Not Give a Damn: Differential Performance in High vs. Low Stakes Tests

Rise to the Challenge or Not Give a Damn: Differential Performance in High vs. Low Stakes Tests Rise to the Challenge or Not Give a Damn: Differential Performance in High vs. Low Stakes Tests Yigal Attali Educational Testing Service Rosedale Rd. MS 16 R Princeton, NJ 08541 USA Voice: 609 734 1747

More information

Never P alone: The value of estimates and confidence intervals

Never P alone: The value of estimates and confidence intervals Never P alone: The value of estimates and confidence Tom Lang Tom Lang Communications and Training International, Kirkland, WA, USA Correspondence to: Tom Lang 10003 NE 115th Lane Kirkland, WA 98933 USA

More information

Chapter 1 Introduction to I/O Psychology

Chapter 1 Introduction to I/O Psychology Chapter 1 Introduction to I/O Psychology 1. I/O Psychology is a branch of psychology that in the workplace. a. treats psychological disorders b. applies the principles of psychology c. provides therapy

More information

1 The conceptual underpinnings of statistical power

1 The conceptual underpinnings of statistical power 1 The conceptual underpinnings of statistical power The importance of statistical power As currently practiced in the social and health sciences, inferential statistics rest solidly upon two pillars: statistical

More information

Introduction. 1.1 Facets of Measurement

Introduction. 1.1 Facets of Measurement 1 Introduction This chapter introduces the basic idea of many-facet Rasch measurement. Three examples of assessment procedures taken from the field of language testing illustrate its context of application.

More information

Student Performance Q&A:

Student Performance Q&A: Student Performance Q&A: 2009 AP Statistics Free-Response Questions The following comments on the 2009 free-response questions for AP Statistics were written by the Chief Reader, Christine Franklin of

More information

Risk Aversion in Games of Chance

Risk Aversion in Games of Chance Risk Aversion in Games of Chance Imagine the following scenario: Someone asks you to play a game and you are given $5,000 to begin. A ball is drawn from a bin containing 39 balls each numbered 1-39 and

More information

APPLICATION FELLOWSHIP IN IMPLANT DENTISTRY PROGRAM

APPLICATION FELLOWSHIP IN IMPLANT DENTISTRY PROGRAM : Application Date Month Day Year University of Rochester University of Rochester Medical Center Eastman Institute for Oral Health 625 Elmwood Avenue Rochester, New York 14620-2989 USA (585) 275-8315 Paste

More information

About this consent form

About this consent form Protocol Title: Development of the smoking cessation app Smiling instead of Smoking Principal Investigator: Bettina B. Hoeppner, Ph.D. Site Principal Investigator: n/a Description of Subject Population:

More information

EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS

EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS DePaul University INTRODUCTION TO ITEM ANALYSIS: EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS Ivan Hernandez, PhD OVERVIEW What is Item Analysis? Overview Benefits of Item Analysis Applications Main

More information

Basic Concepts in Research and DATA Analysis

Basic Concepts in Research and DATA Analysis Basic Concepts in Research and DATA Analysis 1 Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...2 The Research Question...3 The Hypothesis...3 Defining the

More information

Humane League Labs. Sabine Doebel, Susan Gabriel, and The Humane League

Humane League Labs. Sabine Doebel, Susan Gabriel, and The Humane League Report: Does Encouraging The Public To Eat Vegan, Eat Vegetarian, Eat Less Meat, or Cut Out Or Cut Back On Meat And Other Animal Products Lead To The Most Diet Change? Sabine Doebel, Susan Gabriel, and

More information

Job Choice and Post Decision Dissonance1

Job Choice and Post Decision Dissonance1 ORGANIZATIONAL BEHAVIOR AND HUMAN PERFORMANCE 13, 133-145 (1975) Job Choice and Post Decision Dissonance1 EDWARD E. LAWLER III University of Michigan WALTER J. KULECK, JR. University of Michigan JOHN GRANT

More information

Examining the impact of moving to on-screen marking on concurrent validity. Tom Benton

Examining the impact of moving to on-screen marking on concurrent validity. Tom Benton Examining the impact of moving to on-screen marking on concurrent validity Tom Benton Cambridge Assessment Research Report 11 th March 215 Author contact details: Tom Benton ARD Research Division Cambridge

More information

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Dr. Kelly Bradley Final Exam Summer {2 points} Name {2 points} Name You MUST work alone no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. This exam is being scored out of 00 points.

More information

ORIGINS AND DISCUSSION OF EMERGENETICS RESEARCH

ORIGINS AND DISCUSSION OF EMERGENETICS RESEARCH ORIGINS AND DISCUSSION OF EMERGENETICS RESEARCH The following document provides background information on the research and development of the Emergenetics Profile instrument. Emergenetics Defined 1. Emergenetics

More information

National Cancer Patient Experience Survey Results. Milton Keynes University Hospital NHS Foundation Trust. Published July 2016

National Cancer Patient Experience Survey Results. Milton Keynes University Hospital NHS Foundation Trust. Published July 2016 National Cancer Patient Experience Survey 2015 Results Milton Keynes University Hospital NHS Foundation Trust Published July 2016 The National Cancer Patient Experience Survey is undertaken by Quality

More information

GRE R E S E A R C H. Cognitive Patterns of Gender Differences on Mathematics Admissions Tests. Ann Gallagher Jutta Levin Cara Cahalan.

GRE R E S E A R C H. Cognitive Patterns of Gender Differences on Mathematics Admissions Tests. Ann Gallagher Jutta Levin Cara Cahalan. GRE R E S E A R C H Cognitive Patterns of Gender Differences on Mathematics Admissions Tests Ann Gallagher Jutta Levin Cara Cahalan September 2002 GRE Board Professional Report No. 96-17P ETS Research

More information

FUNCTIONAL CONSISTENCY IN THE FACE OF TOPOGRAPHICAL CHANGE IN ARTICULATED THOUGHTS Kennon Kashima

FUNCTIONAL CONSISTENCY IN THE FACE OF TOPOGRAPHICAL CHANGE IN ARTICULATED THOUGHTS Kennon Kashima Journal of Rational-Emotive & Cognitive-Behavior Therapy Volume 7, Number 3, Fall 1989 FUNCTIONAL CONSISTENCY IN THE FACE OF TOPOGRAPHICAL CHANGE IN ARTICULATED THOUGHTS Kennon Kashima Goddard College

More information

GCSE EXAMINERS' REPORTS

GCSE EXAMINERS' REPORTS GCSE EXAMINERS' REPORTS SOCIOLOGY SUMMER 2016 Grade boundary information for this subject is available on the WJEC public website at: https://www.wjecservices.co.uk/marktoums/default.aspx?l=en Online Results

More information

Biserial Weights: A New Approach

Biserial Weights: A New Approach Biserial Weights: A New Approach to Test Item Option Weighting John G. Claudy American Institutes for Research Option weighting is an alternative to increasing test length as a means of improving the reliability

More information

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES Sawtooth Software RESEARCH PAPER SERIES The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? Dick Wittink, Yale University Joel Huber, Duke University Peter Zandan,

More information

Confidence Intervals On Subsets May Be Misleading

Confidence Intervals On Subsets May Be Misleading Journal of Modern Applied Statistical Methods Volume 3 Issue 2 Article 2 11-1-2004 Confidence Intervals On Subsets May Be Misleading Juliet Popper Shaffer University of California, Berkeley, shaffer@stat.berkeley.edu

More information

Audio: In this lecture we are going to address psychology as a science. Slide #2

Audio: In this lecture we are going to address psychology as a science. Slide #2 Psychology 312: Lecture 2 Psychology as a Science Slide #1 Psychology As A Science In this lecture we are going to address psychology as a science. Slide #2 Outline Psychology is an empirical science.

More information

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do. Midterm STAT-UB.0003 Regression and Forecasting Models The exam is closed book and notes, with the following exception: you are allowed to bring one letter-sized page of notes into the exam (front and

More information

The Lens Model and Linear Models of Judgment

The Lens Model and Linear Models of Judgment John Miyamoto Email: jmiyamot@uw.edu October 3, 2017 File = D:\P466\hnd02-1.p466.a17.docm 1 http://faculty.washington.edu/jmiyamot/p466/p466-set.htm Psych 466: Judgment and Decision Making Autumn 2017

More information

State of Connecticut Department of Education Division of Teaching and Learning Programs and Services Bureau of Special Education

State of Connecticut Department of Education Division of Teaching and Learning Programs and Services Bureau of Special Education State of Connecticut Department of Education Division of Teaching and Learning Programs and Services Bureau of Special Education Introduction Steps to Protect a Child s Right to Special Education: Procedural

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

DON M. PALLAIS, CPA 14 Dahlgren Road Richmond, Virginia Telephone: (804) Fax: (804)

DON M. PALLAIS, CPA 14 Dahlgren Road Richmond, Virginia Telephone: (804) Fax: (804) DON M. PALLAIS, CPA 14 Dahlgren Road Richmond, Virginia 23233 Telephone: (804) 784-0884 Fax: (804) 784-0885 Office of the Secretary PCAOB 1666 K Street, NW Washington, DC 20006-2083 Gentlemen: November

More information

2 Critical thinking guidelines

2 Critical thinking guidelines What makes psychological research scientific? Precision How psychologists do research? Skepticism Reliance on empirical evidence Willingness to make risky predictions Openness Precision Begin with a Theory

More information