
Are the Least Frequently Chosen Distractors the Least Attractive?: The Case of a Four-Option Picture-Description Listening Test

Hideki IIMURA
Prefectural University of Kumamoto

Abstract
This study reports on a replication of Iimura's (2014) study on the attractiveness of distractors in multiple-choice listening tests. Sixty-eight Japanese university students were assessed on their correct responses in a picture-description task of an MC listening test (15 questions, four options each). During the listening test, the participants were asked to judge each of the four options as correct or incorrect and to report the degree of confidence they had in their judgment. On the basis of the confidence level and the correctness of the response, confidence and attractiveness scores were generated. To assess how listening ability affected test-takers' confidence and the distractors' attractiveness, three groups were formed on the basis of the scores obtained on the listening test. The results of the replication confirmed those of the original study, suggesting that (a) the least frequently chosen distractors were not always the least attractive, (b) upper-level listeners were less attracted to distractors, and (c) upper-level listeners had greater confidence when responding with the correct answers. This article concludes that the conventional item analyses (i.e., response frequency and discriminatory power) are insufficient for evaluating the effectiveness of distractors, and that a new kind of survey, in which test-takers can evaluate each distractor independently, should be incorporated into future MC test development.

1. Introduction
A typical multiple-choice (MC) item consists of three or four options. One, often known as the key, is correct, while the other options, the distractors, are incorrect answers. Downing (2006)

explained that the MC format is favored in large-scale achievement tests because it has significant validity advantages. Beyond its efficiency for automated scoring, the MC format, if properly constructed, can assess a wide range of content and measure high-level cognitive skills. From a listening assessment perspective, the MC format can tap into various levels of processing the input (e.g., the phonological, word, sentence, pragmatic, or discourse level; Field, 2012). However, the MC format has often been criticized for its limitations, such as inducing random guessing or creating a negative washback effect (Hughes, 2003). From a listening assessment perspective, it imposes heavy cognitive demands because of dual-task interference between audition (listening to the text) and vision (reading the questions and options) (Pashler & Johnson, 1998). Furthermore, valid MC items, especially effective distractors, are very difficult to construct. In addition, creating a sufficient number of plausible distractors is extremely difficult (Douglas, 2010). In fact, in his review of MC-related articles, Rodriguez (2005) found that only one or two distractors among four or five options functioned as intended. Evidently, then, one vital factor in successful MC tests is creating functional distractors.

As many theorists have contended, the functionality of a distractor has conventionally been evaluated in two ways: (a) discriminability and (b) response frequency (Fulcher, 2010; Haladyna, 2004; Haladyna & Downing, 1993; Henning, 1987). Discriminability measures the extent to which a distractor can distinguish high-ability from low-ability test-takers. Response frequency measures the number of test-takers who choose each distractor. In other words, dysfunctional distractors have no discriminatory power and prove unattractive to test-takers. In test development, distractors that succeed in attracting a certain number of test-takers are judged as functional and therefore remain in the test, whereas distractors that attract few test-takers are judged as unattractive and become targets for modification.

Previous studies have indicated that reducing the number of MC options has little effect on MC test results. For example, reducing the number of options from four to three, by eliminating the least frequently chosen option, did not affect item difficulty or item discrimination (Shizuka, Takeuchi,

Yashima, & Yoshizawa, 2006). Similar results were found when comparing five-, four-, and three-option listening tests (Lee & Winke, 2013). The three-option MC test differed in item difficulty from the four- and five-option tests, but the three formats did not differ in item discrimination. Although the previous studies employed different option-deletion methods, namely item analysis (Shizuka et al., 2006) and evaluators' judgments (Lee & Winke, 2013), both studies depended on the same assumption that the least frequently selected distractors cannot attract the attention of test-takers.

Considering the nature of the MC test-taking process, further consideration of the relationship between response frequency and the attractiveness of distractors is necessary. It seems reasonable to assume that, when taking an MC test, test-takers make three or four judgments per item, depending on the number of options, before they finally choose one. That is, given that test-takers can select only one option, some unselected distractors may nevertheless have functioned as plausible ones. Thus, we can propose that unselected distractors are not uniformly unattractive to test-takers. Therefore, we should evaluate each distractor's attractiveness separately from its response frequency.

2. The Original Study
To evaluate distractor attractiveness, Iimura (2014) developed a questionnaire in which the participants were asked to judge each of the options as correct or incorrect and to indicate the degree of confidence in their judgment. He surveyed 75 Japanese university students with the questionnaire to explore distractor attractiveness from the following perspectives:
1. Compare keys with distractors with regard to test-takers' confidence levels;
2. Compare the attractiveness of distractors in terms of test-takers' confidence levels and response frequency (p. 21).
With regard to the first perspective, the original study found that differences in English proficiency affected test-takers' confidence in choosing the correct and incorrect answers. The results showed a significant difference between test-takers' proficiency levels in the degree of confidence in both keys and distractors, indicating that proficient listeners were more

likely to choose the correct answer with more confidence and that less proficient listeners were more likely to be distracted by the distractors. With regard to the second perspective, the study revealed that response frequency was not always consistent with the attractiveness of distractors. In more than half of the items, the least frequently chosen distractor was not the least attractive. Based on the survey results, he concluded that conventional item analysis based on response frequency was not sufficient for evaluating the effectiveness of distractors.

3. The Present Study
3.1 Aims
Although Iimura (2014) revealed the insufficiency of conventional distractor analysis, further investigation is needed to verify his findings because his study examined only a three-option question-response task. Given that task types can affect test performance (Bachman & Palmer, 2010), it is necessary to examine tasks other than question-response. Moreover, the number of options should be taken into account. Compared to a three-option format, a four-option format has been more widely used in major English listening tests, such as the Test of English for International Communication (TOEIC; except Part 2 of the listening section), the Test of English as a Foreign Language Institutional Testing Program (TOEFL ITP), and EIKEN. Therefore, it is worth examining the attractiveness of distractors in a four-option picture-description listening task.

The aim of this replication study is to verify the abovementioned findings in Iimura (2014). As in the original study, this study examined the functionality of distractors from the two following research viewpoints: (a) compare keys with distractors with regard to test-takers' confidence levels, and (b) compare the attractiveness of distractors in terms of test-takers' confidence levels and response frequency. This study closely followed the research procedures adopted in the original study. As will be elaborated in the following sections, the relevant factors, such as the participants, the questionnaire, and

the scoring, were identical or closely related to those in the original study, while the task (picture-description) and the number of options (four) were altered.

3.2 Method
3.2.1 Participants
In line with the original study, this replication study was conducted with Japanese university students (N = 68, mean age = 19.5, SD = 0.5, male = 39, female = 28) whose English proficiency levels ranged between A2.1 and B1.2 according to the CEFR-J*1 (Tono, 2013). All participants were native speakers of Japanese and had started learning English in junior high school at the age of 12. Each student had been studying English for at least six years. Prior to data collection, participants were informed that all identifying information would remain confidential, and their permission to participate was obtained.

3.2.2 Listening Test
While the original study adopted a three-option question-response task, this study used a four-option picture-description task consisting of 15 test items from a TOEIC preparation book (Educational Testing Service, 2011). In this task, test-takers heard four statements (i.e., options) about a picture and were asked to select the one statement that best described the picture. The four statements were only heard, not read, and each was played only once. On the original TOEIC CD attached to the preparation book, there was approximately a one-second pause between options and a three-second pause between items. We edited the original CD to have three-second pauses between options, for answering the questionnaire, and eight-second pauses between items, to allow participants ample time to answer the listening test and the questionnaire. The listening test was played on a CD player.

3.2.3 Questionnaire
The design of the questionnaire made it possible to generate a confidence score for a given option as well as an attractiveness score for the distractors. To elicit test-takers' perceptions of the keys and distractors, the current study adopted the original study's questionnaire, which served as both an answer sheet and a survey to determine confidence levels. Because the original questionnaire was designed for a three-option format, it was tailored to a four-option format. As seen in Figure 1, the response scale for each option was symmetrically divided at the middle point (0: I'm not sure) into two sides for correctness (correct or incorrect), with three confidence levels on each side (H: very confident; M: moderately confident; L: not very confident).

Figure 1. Sample from the questionnaire. (For each item, each of the four options A-D is judged on a seven-point line running from Incorrect: very confident (H), moderately confident (M), not very confident (L), through I'm not sure, to Correct: not very confident (L), moderately confident (M), very confident (H).)

Test-takers were asked to circle the part of the line corresponding to the confidence level they had in their judgment of each option's correctness. In addition, they were asked to circle one of the capital letters A, B, C, or D representing their chosen option. Figure 1 illustrates a case in which a participant judged A as incorrect with moderate confidence, B as incorrect with low confidence, C as correct with high confidence, and D as correct with low confidence. This test-taker therefore decided on C as his or her final answer. During the experiment, participants were instructed to mark the degree of confidence they had in Option A immediately after hearing that option. This step was repeated for Options B, C, and D. They were then asked to select and circle one of the four options

as their final answer for the item.

3.2.4 Scoring
As described above, the questionnaire was used to elicit the test-takers' perceptions of each option. As in the original study, the keys and the distractors were coded separately, because the questionnaire responses for keys were intended to elicit test-takers' confidence in the correct answer, whereas those for distractors were intended to determine each distractor's level of attractiveness, indicated by test-takers' confidence in the incorrect answers.

Confidence scoring
The questionnaire data were coded with values ranging from one to seven on the basis of the correctness of the response and the test-takers' confidence levels. Table 1 illustrates how test-takers' confidence levels were converted into scores. The possible scores ranged from one to seven; low and high scores indicated that participants felt very confident in an incorrect (e.g., one) or a correct (e.g., seven) answer, respectively.

Table 1
Scoring Table for Confidence Ratings on Keys and Attractiveness of Distractors

Confidence level        Correctness judgment    Confidence/attractiveness score
Very confident          Correct                 7
Moderately confident    Correct                 6
Not very confident      Correct                 5
I'm not sure                                    4
Not very confident      Incorrect               3
Moderately confident    Incorrect               2
Very confident          Incorrect               1
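To make the coding rule in Table 1 concrete, the following is a minimal sketch in Python (not part of the original study; the function and variable names are illustrative) of how a single questionnaire response could be converted into a 1-7 score.

```python
# Minimal sketch of the Table 1 coding rule: a test-taker's judgment of one
# option ("correct", "incorrect", or "not sure") plus a confidence level
# (H = very, M = moderately, L = not very confident) maps onto a 1-7 score.
# Names and data structures are illustrative, not taken from the study.

CONFIDENCE_STEP = {"H": 3, "M": 2, "L": 1}  # distance from the midpoint (4)

def option_score(judgment, confidence=None):
    """Return the 1-7 code for one option."""
    if judgment == "not sure":
        return 4                                   # midpoint: "I'm not sure"
    step = CONFIDENCE_STEP[confidence]
    return 4 + step if judgment == "correct" else 4 - step

# Example: the response illustrated in Figure 1
print(option_score("incorrect", "M"))  # Option A -> 2
print(option_score("incorrect", "L"))  # Option B -> 3
print(option_score("correct", "H"))    # Option C -> 7
print(option_score("correct", "L"))    # Option D -> 5
```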

Attractiveness scoring
Distractor attractiveness was coded on the basis of correctness and confidence using the same scale, with values ranging from one to seven. Table 1 illustrates the conversion procedure for generating an attractiveness score from correctness and confidence. The possible scores ranged from one to seven, with low and high scores indicating that a given distractor was deemed by participants as either incorrect or correct with a high level of confidence (although all distractors were in fact incorrect). Thus, a given distractor ranged from not attractive (e.g., one) to attractive (e.g., seven). An attractive distractor led test-takers to deem it correct with high confidence; therefore, a code of seven was assigned to responses in which a participant deemed the distractor correct with high confidence. An unattractive distractor, on the other hand, only led test-takers to believe it to be incorrect; therefore, a distractor judged as incorrect with high confidence was assigned a code of one.

4. Results and Discussion
4.1 Listening Test Performance
The average score on the listening test in this replication study corresponded to 67% correct (out of 15 items), while the mean in the original study was 9.15 (out of 15, or 61%). This difference can be interpreted in two ways: (a) the participants in the replication study were more proficient than those in the original study, or (b) the picture-description task in the replication study was easier than the question-response task in the original study. With regard to internal consistency reliability (Cronbach's alpha), the replication study (α = .61) was lower than the original study (α = .70), probably because reliability is affected by the amount of variance among test-takers (Green, 2013); in other words, the participants' proficiency range in the replication study (SD = 2.30) was smaller than that in the original study (SD = 2.96).
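For reference, the internal consistency estimates reported here and below can be computed directly from a respondents-by-items score matrix. The following is a generic sketch of Cronbach's alpha using numpy; the demonstration data are invented and the function name is ours, not the study's.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented 0/1 listening-test data: 6 test-takers x 4 items (illustration only)
demo = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
])
print(round(cronbach_alpha(demo), 2))
```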

Following the original study, as Table 2 shows, the participants were divided into three groups on the basis of the number of correct responses on the listening test: a low-score group (LG), a middle-score group (MG), and a high-score group (HG).

Table 2
Three Groups on the Basis of the Listening Test

Group (n)    95% CI
LG (14)      [6.10, 7.33]
MG (34)      [9.66, 10.23]
HG (20)      [12.44, 13.16]

Note. LG = low-score group; MG = middle-score group; HG = high-score group. Maximum possible score is 15. CI = confidence interval. Reliability (Cronbach's alpha) for the whole sample = .61.

4.2 Confidence and Attractiveness
4.2.1 Reliability of the Questionnaire
Internal consistency reliability (Cronbach's alpha) of the questionnaire was calculated separately for (a) confidence (i.e., keys) and (b) attractiveness (i.e., distractors). As Table 3 shows, the reliability for confidence (α = .67) was lower than that for attractiveness (α = .91), probably because the number of keys (15, one for each item) was smaller than the number of distractors (45, three for each item). This difference was also found in the original study (confidence = .80 and attractiveness = .92, respectively).

Table 3
Reliability of the Questionnaire Used in This Study

                    Confidence    Attractiveness
No. of items        15            45
Cronbach's alpha    .67           .91

4.2.2 Confidence in keys
Table 4 summarizes the questionnaire results. As can be seen, the mean confidence score for the keys increased as the English level (i.e., the three score groups) increased,

whereas the mean attractiveness score for the distractors decreased as the English level increased. A one-way ANOVA performed on the confidence scores indicated that the groups differed significantly, with a large effect size: F(2, 65) = 31.91, p < .001, η² = .50. Tukey's post-hoc tests revealed that all groups differed significantly from each other in confidence scores, p < .05. These results parallel the findings in the original study, where the three groups differed significantly with a large effect size, F(2, 72) = 53.22, p < .001, η² = .36, and the post-hoc test showed that they differed from each other at the .05 level.

Table 4
Average Test-Takers' Confidence in Keys and Attractiveness of Distractors

Score group       Low              Middle           High
                  M       SD       M       SD       M       SD
Confidence        4.55a   0.39
Attractiveness    3.19a   0.58

Note. Means in a row sharing subscripts (a and b) are significantly different from each other. The possible maximum score is seven.

These results suggest that advanced test-takers tended to have more confidence than less-advanced test-takers when they responded with correct answers. This may be related to the effective use of metacognitive strategies; that is, proficient listeners can monitor and evaluate their listening comprehension more efficiently than less-proficient listeners (e.g., Macaro, Graham, & Vanderplank, 2007; Vandergrift & Goh, 2012). It can therefore be concluded that advanced test-takers choose the correct answers with more confidence.
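As an illustration of the kind of group comparison just reported, the sketch below runs a one-way ANOVA with eta-squared and Tukey's HSD on simulated per-participant confidence scores using scipy and statsmodels; apart from the group sizes, the data are invented, and this is our reconstruction rather than the study's code.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Invented per-participant mean confidence scores for three proficiency groups
# (LG, MG, HG); the real data are not reproduced here.
rng = np.random.default_rng(0)
lg = rng.normal(4.5, 0.4, 14)
mg = rng.normal(5.0, 0.4, 34)
hg = rng.normal(5.5, 0.4, 20)

f, p = stats.f_oneway(lg, mg, hg)                # one-way ANOVA across groups

# Eta-squared: between-group sum of squares over total sum of squares
all_scores = np.concatenate([lg, mg, hg])
grand_mean = all_scores.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (lg, mg, hg))
ss_total = ((all_scores - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

# Tukey's HSD post-hoc comparisons between the three groups
groups = ["LG"] * len(lg) + ["MG"] * len(mg) + ["HG"] * len(hg)
print(f"F = {f:.2f}, p = {p:.4f}, eta^2 = {eta_squared:.2f}")
print(pairwise_tukeyhsd(all_scores, groups))
```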

4.2.3 Attractiveness of distractors
Another one-way ANOVA, performed on the attractiveness scores, again indicated that the groups differed significantly, with a large effect size: F(2, 65) = 12.01, p < .001, η² = .27. Tukey's post-hoc tests revealed that the scores of the LG and MG did not differ from each other, but that both were significantly higher than the scores of the HG, p < .05. These results are consistent with the original study, where the three groups differed significantly with a large effect size, F(2, 72) = 29.85, p < .001, η² = .21, and the post-hoc test demonstrated that the attractiveness of distractors for the LG and MG was significantly higher than that for the HG at the .05 level. These results indicate that lower-level test-takers tended to be more tempted by distractors than advanced test-takers. Thus, it can be concluded that less-advanced test-takers tend to be tempted by distractors.

4.2.4 Response frequency and discrimination in distractors
Table 5 reports the response frequency (i.e., how many test-takers chose each option) and the results of the chi-square tests for each of the 15 items. In Item 1, for example, 45.6% of the participants (n = 31) chose the key, 39.7% (n = 27) chose Distractor 1, 11.8% (n = 8) selected Distractor 2, and the rest (n = 2) chose Distractor 3. Fisher's exact test*2 was carried out to determine whether there was a significant difference in the number of responses between these four options, resulting in a sizable difference between them: χ²(3) = 35.41, p < .001. Overall, in all 15 items there was a significant difference in response frequencies between the four options.

Table 5 also shows the discriminatory power (point-biserial correlation coefficient, rpbi) for each item. According to Haladyna and Rodriguez (2013), distractors should be negatively correlated with the total score, whereas keys should be positively correlated with the total test score. As the table shows, the correlations for the keys were higher than those for the distractors (Distractors 1, 2, & 3) in all 15 items. Moreover, all the distractors produced negative or near-zero point-biserial correlations. Therefore, in line with the original study, all distractors in the replication study can be considered to have functioned properly in terms of discrimination.
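The two conventional indices discussed above can be computed with scipy. In the sketch below, the response counts are those reported for Item 1, while the option-choice and total-score vectors used for the point-biserial correlation are invented placeholders rather than data from the study.

```python
import numpy as np
from scipy import stats

# Response frequencies for Item 1 as reported above: key, D1, D2, D3
counts = np.array([31, 27, 8, 2])

# Chi-square goodness-of-fit test of equal response frequencies across the
# four options (the study reports Fisher's exact tests where expected counts
# were very low; see Note 2).
chi2, p = stats.chisquare(counts)
print(f"chi2(3) = {chi2:.2f}, p = {p:.4f}")

# Point-biserial discrimination for one distractor: correlation between
# choosing that option (1/0) and the total listening score.  The two
# vectors here are invented placeholders, not data from the study.
chose_distractor = np.array([0, 1, 0, 0, 1, 1, 0, 0, 1, 0])
total_score      = np.array([12, 7, 11, 13, 6, 8, 10, 14, 5, 12])
r_pbi, _ = stats.pointbiserialr(chose_distractor, total_score)
print(f"r_pbi = {r_pbi:.2f} (negative values indicate a functioning distractor)")
```

With the Item 1 counts, the goodness-of-fit statistic reproduces the χ²(3) = 35.41 reported above.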

Table 5
Response Frequency, Fisher's Exact Tests, and Discriminatory Power for Each Item

Item   Key/IF % (rpbi)   Distractor 1 % (rpbi)   Distractor 2 % (rpbi)   Distractor 3 % (rpbi)   χ²
1      45.6 (.39)        39.7 (.25)              11.8 (.08)              2.9 (.26)               35.41***
2           (.62)        29.4 (.27)              20.6 (.43)              4.4 (.05)               ***
3           (.54)        26.5 (.16)              22.1 (.43)              1.5 (.16)               ***
4           (.33)         8.8 (.23)               4.4 (.23)                                      ***
5           (.34)         4.4 (.34)                                                              ***
6           (.57)        23.5 (.41)              20.6 (.26)              2.9 (.02)               ***
7           (.32)        17.6 (.48)              16.2 (.04)              13.2 (.02)              ***
8           (.30)        54.4 (.22)              20.6 (.05)                                      **
9           (.38)         3.0 (.20)               3.0 (.33)                                      ***
10          (.23)         1.5 (.06)               1.5 (.26)                                      ***
11          (.19)         5.9 (.10)               1.5 (.21)                                      ***
12          (.30)         2.9 (.12)               2.9 (.26)              2.9 (.12)               ***
13          (.44)        13.2 (.41)               1.5 (.16)                                      ***
14          (.38)        23.5 (.33)               1.5 (.21)                                      ***
15          (.37)        38.8 (.31)              25.4 (.04)              11.9 (.07)              *

Note. IF = item facility. rpbi = point-biserial correlation. * p < .05, ** p < .01, *** p < .001.

4.2.5 The lowest attractiveness and response frequency
Table 6 presents the results of the 3 x 3 (Group x Distractor) two-way mixed ANOVAs that were carried out on the attractiveness scores for each of the 15 MC questions. The levels of the Distractor variable were based on response frequency, such that Dis 1 was the most frequently chosen distractor, Dis 2 the next, and Dis 3 the least frequently chosen. The alpha level was set to .003 (.05/15) because 15 significance tests were conducted, one for each item. The post-hoc analysis was performed with one-way ANOVAs followed by t-tests with Bonferroni correction.
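One way to carry out such a 3 x 3 mixed ANOVA is sketched below with the pingouin package; the study does not report its analysis software, so this toolchain is only an assumption, and the long-format data frame is invented.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Invented long-format data: one row per participant x distractor, with the
# participant's score group (between-subjects factor) and the distractor's
# frequency rank (within-subjects factor) plus an attractiveness score.
rng = np.random.default_rng(1)
rows = []
for pid in range(68):
    group = "LG" if pid < 14 else "MG" if pid < 48 else "HG"
    for dis in ("Dis1", "Dis2", "Dis3"):
        rows.append({"id": pid, "group": group, "distractor": dis,
                     "attractiveness": rng.integers(1, 8)})
df = pd.DataFrame(rows)

# 3 (Group) x 3 (Distractor) mixed ANOVA for a single item
aov = pg.mixed_anova(data=df, dv="attractiveness", within="distractor",
                     subject="id", between="group")
print(aov[["Source", "F", "p-unc", "np2"]])

# Bonferroni-adjusted alpha, as in the study: .05 / 15 items = .003
alpha = 0.05 / 15
print(aov["p-unc"] < alpha)
```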

The results of the ANOVAs can be divided into five categories in terms of statistically significant differences in the two factors (Group x Distractor): (a) interaction (Item 5), (b) Group (Items 2 & 11), (c) Distractor (Items 1, 3, 4, 6, 8, & 15), (d) Group and Distractor (Items 7, 12, 13, & 14), and (e) no difference (Items 9 & 10). Overall, in two-thirds of the items the three distractors differed significantly from each other in attractiveness, and in one-third of the items the level of distractor attractiveness varied across groups. This categorization based on the ANOVA results, however, may provide only a partial understanding of distractor attractiveness. It is prudent to also take into account the response frequencies shown in Table 5, cross-checking the rank of attractiveness against the rank of response frequency for the distractors of each item.

Table 6
Two-Way ANOVAs on Attractiveness of Three Distractors for 15 Items

Item      Source               p          η²      Post hoc
Item 1    Group                ns
          Distractor           *          0.27    Dis 3 < 2 < 1
          Group x Distractor   ns
Item 2    Group                < .001*    0.06    H < M = L
          Distractor           ns
          Group x Distractor   ns
Item 3    Group                ns
          Distractor           < .001*    0.34    Dis 3 < 1 = 2
          Group x Distractor   ns
Item 4    Group                ns
          Distractor           < .001*    0.11    Dis 3 < 2 = 1
          Group x Distractor   ns
(Continues)

Table 6 (Continued)
Two-Way ANOVAs on Attractiveness of Three Distractors for 15 Items

Item      Source               p          η²      Post hoc
Item 5    Group                ns
          Distractor           < .001*
          Group x Distractor   < .001*    0.06    L: Dis 2 = 3 < 1; M: Dis 2 < 1 = 3; H: Dis 2 = 1 = 3
Item 6    Group                ns
          Distractor           < .001*    0.13    Dis 3 < 2 = 1
          Group x Distractor   ns
Item 7    Group                < .001*    0.10    H < M = L
          Distractor           *          0.05    Dis 1 = 3 < 2
          Group x Distractor   ns
Item 8    Group                ns
          Distractor           < .001*    0.48    Dis 3 < 2 < 1
          Group x Distractor   ns
Item 9    Group                ns
          Distractor           ns
          Group x Distractor   ns
Item 10   Group                ns
          Distractor           ns
          Group x Distractor   ns
Item 11   Group                *          0.08    H < M = L
          Distractor           ns
          Group x Distractor   ns
(Continues)

Table 6 (Continued)
Two-Way ANOVAs on Attractiveness of Three Distractors for 15 Items

Item      Source               p          η²      Post hoc
Item 12   Group                < .001*    0.12    H < M = L
          Distractor           < .001*    0.05    Dis 3 = 2 < 1
          Group x Distractor   ns
Item 13   Group                < .001*    0.14    H < M = L
          Distractor           < .001*    0.28    Dis 2 = 3 < 1
          Group x Distractor   ns
Item 14   Group                < .001*    0.08    H < M = L
          Distractor           < .001*    0.32    Dis 3 = 2 < 1
          Group x Distractor   ns
Item 15   Group                ns
          Distractor           < .001*    0.08    Dis 3 = 2 < 1
          Group x Distractor   ns

Note. Dis 1 = the most frequently chosen distractor; Dis 2 = the second-most frequently chosen distractor; Dis 3 = the least frequently chosen distractor. H = high-scoring group; M = middle-scoring group; L = low-scoring group. * p < .003.

Table 7 presents four categories that combine the results of the ANOVAs with the response frequency results. It should be noted that in Table 7 the items were categorized by these results regardless of whether the main effect of Group was statistically significant; whether the Group effect was significant is, however, also indicated. Although this information may not be crucial for identifying the relationship between the least attractive and the least frequently chosen distractors, it reveals a tendency in the test-takers' perceptions of the overall attractiveness of the distractors.

Table 7
Items Categorized on the Basis of the Relationship between Attractiveness and Response Frequency

Category: The least frequently chosen distractor is...                 Main effect of Group: ns    Main effect of Group: sig
1. the least attractive                                                1, 3, 4, 6, 8
2. as attractive as the second most frequently chosen distractor       15                          2, 12, 13, 14
3. as attractive as the most frequently chosen distractor              9, 10                       7, 11
Others                                                                 (Item 5)

The first category, "The least frequently chosen distractor is the least attractive," was assigned when the main effect of Distractor was significant and the least attractive distractor was also the least frequently selected one. This category reflects the conventional assumption that a distractor's attractiveness is commensurate with its response frequency. Only five of the 15 items fell into this category (Items 1, 3, 4, 6, & 8).

The second category, "The least frequently chosen distractor is as attractive as the second most frequently chosen distractor," was assigned when (a) there was no significant main effect of Distractor among the three distractors, or (b) the main effect of Distractor was significant but the second most frequently chosen distractor was no more attractive than the least frequently chosen one. Five of the 15 items fell into this category (Items 2, 12, 13, 14, & 15).

The third category, "The least frequently chosen distractor is as attractive as the most frequently chosen distractor," was assigned when (a) no significant main effect of Distractor was found, or (b) the main effect of Distractor was significant but the most frequently chosen distractor was no more attractive than the least frequently chosen one. Four of the 15 items fell into this category (Items 7, 9, 10, & 11).

The last category (Others) was derived from the results in which there was a significant interaction between Group and Distractor (Item 5). More specifically, for the LG, the least frequently chosen distractors (Dis 2 & 3) were the least attractive; for the MG and HG, on the other hand, the least frequently chosen distractors were not the least attractive.

In summary, these results showed that in one-third of the items the least frequently selected distractor was the least attractive, but that in more than half of the items the least frequently chosen distractor was as attractive as the second most, or even the most, frequently chosen distractor. One might argue that some items in this study were too easy for their distractors to attract test-takers. In fact, the item facility of several items (Items 5, 9, 10, 11, & 12) exceeded 90%, and the mean attractiveness score of the distractors in those items was relatively low. Not all of the easy items, however, failed to attract test-takers. In Item 11, for instance, the mean distractor attractiveness was not low (M = 2.3 for the HG, M = 3.0 for the MG, and M = 3.4 for the LG, respectively). Hence, it can be said that distractors, even in easy items, did attract test-takers to some extent. This finding accords well with the original study, in which there were only five out of 15 items where the least frequently chosen distractor was also the least attractive, whereas in seven out of 15 items the least frequently chosen distractor was not the least attractive.

These findings lead us to reconsider the conventional way of evaluating distractors by response frequency. As found in this replication study as well as in the original study, there were several infrequently selected distractors that were still able to function properly. It is necessary to differentiate distractors that were neither chosen nor attractive from those that were not chosen but were plausible enough to attract test-takers. Therefore, it is possible that response frequency alone is insufficient for evaluating the effectiveness of distractors. Instead, tools that can elicit test-takers' perceptions of each distractor should be incorporated. The questionnaire developed by Iimura (2014) can be recommended for this purpose because it has shown strong internal consistency in both the original study and this replication. Having said that, one might ask how such tools could be incorporated into actual test design. In high-stakes test development, new items are usually checked through pilot testing or field testing. Examining the attractiveness of each distractor should be done in such trialing phases using the abovementioned questionnaire.
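In such a trialing phase, the cross-check recommended here could be automated along the following lines; this is a sketch with invented per-item response counts and mean attractiveness scores, not code or data from the study.

```python
# Sketch of the cross-check recommended above: for each piloted item, compare
# which distractor is chosen least often with which has the lowest mean
# attractiveness score from the questionnaire.  All data below are invented.

items = {
    # item: ({distractor: times chosen}, {distractor: mean attractiveness})
    1: ({"A": 27, "B": 8, "C": 2}, {"A": 4.1, "B": 3.0, "C": 2.2}),
    2: ({"A": 20, "B": 14, "C": 3}, {"A": 3.8, "B": 2.9, "C": 3.1}),
}

for item, (freq, attract) in items.items():
    least_chosen = min(freq, key=freq.get)
    least_attractive = min(attract, key=attract.get)
    if least_chosen != least_attractive:
        # The rarely chosen distractor still looks plausible to test-takers,
        # so deleting it on frequency grounds alone may be premature.
        print(f"Item {item}: review distractor {least_chosen} "
              f"(rarely chosen but not the least attractive)")
    else:
        print(f"Item {item}: frequency and attractiveness agree")
```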

5. Conclusion
The functionality of distractors has predominantly been evaluated by response frequency; that is, prior investigations have assumed a direct relationship between response frequency and the attractiveness of distractors. This study attempted to reexamine the relationship between the attractiveness and the frequency of distractors by using a questionnaire that revealed test-takers' confidence in each option. The findings of this study strongly support Iimura (2014) in that (a) upper-level test-takers had greater confidence when responding with the correct answers, (b) lower-level test-takers were more attracted to distractors, and (c) the least frequently chosen distractors were not necessarily the least attractive. Therefore, it can be surmised that conventional item analyses are insufficient for evaluating the effectiveness of distractors during test development. In other words, a new kind of survey, in which test-takers can evaluate each distractor independently, should be incorporated into MC test development.

This study examined a four-option picture-description listening task to investigate distractor attractiveness, which a previous study had examined with a three-option question-response listening task. That said, both question-response and picture-description tasks are relatively simple in comparison to passage comprehension tasks, which contain more complex information within the passage itself, the question stems, and the options. Future research should include passage comprehension tasks with longer dialogues or monologues.

Acknowledgements
This work was supported by JSPS (Japan Society for the Promotion of Science) KAKENHI (Grant-in-Aid for Scientific Research) Grant Number 15K02790.

Notes
1. English listening proficiency levels from A2.1 to B1.2 in the CEFR-J are equivalent to TOEIC listening scores from 110 (185) to 335 (395) (Tono, 2013, p. 229).
2. Fisher's exact tests were used instead of chi-square tests because they are suitable when the expected frequencies are too low (e.g., when the sample size is quite small; Field, 2009).

References
Bachman, L., & Palmer, A. (2010). Language assessment in practice. Oxford University Press.
Douglas, D. (2010). Understanding language testing. London: Hodder Education.
Downing, S. M. (2006). Selected-response item formats in test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development. Mahwah, NJ: Lawrence Erlbaum Associates.
Educational Testing Service. (2011). TOEIC test official practice: Listening. Tokyo: Institute for International Business Communication.
Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London: Sage.
Field, J. (2012). Cognitive validity. In A. Geranpayeh & L. Taylor (Eds.), Examining listening: Research and practice in assessing second language listening. Cambridge University Press.
Fulcher, G. (2010). Practical language testing. London: Hodder Education.
Green, R. (2013). Statistical analyses for language testers. Hampshire: Palgrave Macmillan.
Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Haladyna, T. M., & Downing, S. M. (1993). How many options is enough for a multiple-choice test item? Educational and Psychological Measurement, 53.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. New York:

Routledge.
Henning, G. (1987). A guide to language testing: Development, evaluation, research. Cambridge, MA: Newbury House Publisher.
Hughes, A. (2003). Testing for language teachers (2nd ed.). Cambridge University Press.
Iimura, H. (2014). Attractiveness of distractors in multiple-choice listening tests. JLTA Journal, 17.
Lee, H., & Winke, P. (2013). The differences among three-, four-, and five-option-item formats in the context of a high-stakes English-language listening test. Language Testing, 30.
Macaro, E., Graham, S., & Vanderplank, R. (2007). A review of listening strategies: Focus on sources of knowledge and on success. In A. D. Cohen & E. Macaro (Eds.), Language learner strategies. Oxford University Press.
Pashler, H., & Johnson, J. C. (1998). Attentional limitations in dual-task performance. In H. Pashler (Ed.), Attention. Hove, East Sussex: Psychology Press.
Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2).
Shizuka, T., Takeuchi, O., Yashima, T., & Yoshizawa, K. (2006). A comparison of three- and four-option English tests for university entrance selection purposes in Japan. Language Testing, 23.
Tono, Y. (Ed.). (2013). The CEFR-J handbook: A resource book for using CAN-DO descriptors for English language teaching. Tokyo: Taishukan.
Vandergrift, L., & Goh, C. C. M. (2012). Teaching and learning second language listening: Metacognition in action. New York, NY: Routledge.


More information

Report on FY2014 Annual User Satisfaction Survey on Patent Examination Quality

Report on FY2014 Annual User Satisfaction Survey on Patent Examination Quality Report on FY2014 Annual User Satisfaction Survey on Patent Examination Quality May 2015 Japan Patent Office ABSTRACT Ⅰ. Introduction High quality and globally reliable patents granted by the JPO (Japan

More information

This self-archived version is provided for scholarly purposes only. The correct reference for this article is as follows:

This self-archived version is provided for scholarly purposes only. The correct reference for this article is as follows: SOCIAL AFFILIATION CUES PRIME HELP-SEEKING INTENTIONS 1 This self-archived version is provided for scholarly purposes only. The correct reference for this article is as follows: Rubin, M. (2011). Social

More information

The Stability of Undergraduate Students Cognitive Test Anxiety Levels

The Stability of Undergraduate Students Cognitive Test Anxiety Levels A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute

More information

Title: Healthy snacks at the checkout counter: A lab and field study on the impact of shelf arrangement and assortment structure on consumer choices

Title: Healthy snacks at the checkout counter: A lab and field study on the impact of shelf arrangement and assortment structure on consumer choices Author's response to reviews Title: Healthy snacks at the checkout counter: A lab and field study on the impact of shelf arrangement and assortment structure on consumer choices Authors: Ellen van Kleef

More information

CHAMP: CHecklist for the Appraisal of Moderators and Predictors

CHAMP: CHecklist for the Appraisal of Moderators and Predictors CHAMP - Page 1 of 13 CHAMP: CHecklist for the Appraisal of Moderators and Predictors About the checklist In this document, a CHecklist for the Appraisal of Moderators and Predictors (CHAMP) is presented.

More information

A Cross-validation of easycbm Mathematics Cut Scores in. Oregon: Technical Report # Daniel Anderson. Julie Alonzo.

A Cross-validation of easycbm Mathematics Cut Scores in. Oregon: Technical Report # Daniel Anderson. Julie Alonzo. Technical Report # 1104 A Cross-validation of easycbm Mathematics Cut Scores in Oregon: 2009-2010 Daniel Anderson Julie Alonzo Gerald Tindal University of Oregon Published by Behavioral Research and Teaching

More information

Internal Consistency and Reliability of the Networked Minds Measure of Social Presence

Internal Consistency and Reliability of the Networked Minds Measure of Social Presence Internal Consistency and Reliability of the Networked Minds Measure of Social Presence Chad Harms Iowa State University Frank Biocca Michigan State University Abstract This study sought to develop and

More information

Running head: THE DEVELOPMENT AND PILOTING OF AN ONLINE IQ TEST. The Development and Piloting of an Online IQ Test. Examination number:

Running head: THE DEVELOPMENT AND PILOTING OF AN ONLINE IQ TEST. The Development and Piloting of an Online IQ Test. Examination number: 1 Running head: THE DEVELOPMENT AND PILOTING OF AN ONLINE IQ TEST The Development and Piloting of an Online IQ Test Examination number: Sidney Sussex College, University of Cambridge The development and

More information

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI Statistics Nur Hidayanto PSP English Education Dept. RESEARCH STATISTICS WHAT S THE RELATIONSHIP? RESEARCH RESEARCH positivistic Prepositivistic Postpositivistic Data Initial Observation (research Question)

More information

Gang Zhou, Xiaochun Niu. Dalian University of Technology, Liao Ning, China

Gang Zhou, Xiaochun Niu. Dalian University of Technology, Liao Ning, China Psychology Research, June 2015, Vol. 5, No. 6, 372-379 doi:10.17265/2159-5542/2015.06.003 D DAVID PUBLISHING An Investigation into the Prevalence of Voice Strain in Chinese University Teachers Gang Zhou,

More information

RESULTS. Chapter INTRODUCTION

RESULTS. Chapter INTRODUCTION 8.1 Chapter 8 RESULTS 8.1 INTRODUCTION The previous chapter provided a theoretical discussion of the research and statistical methodology. This chapter focuses on the interpretation and discussion of the

More information

Students and parents/guardians are highly encouraged to use Parent Connect to track their progress.

Students and parents/guardians are highly encouraged to use Parent Connect to track their progress. 1 Assessment and Grading Plan Grades earned at Arundel High School will be a reflection of student s mastery of the related and relevant national, state, and industry standards pertaining to the course

More information

Importance of Good Measurement

Importance of Good Measurement Importance of Good Measurement Technical Adequacy of Assessments: Validity and Reliability Dr. K. A. Korb University of Jos The conclusions in a study are only as good as the data that is collected. The

More information

Providing Evidence for the Generalizability of a Speaking Placement Test Scores

Providing Evidence for the Generalizability of a Speaking Placement Test Scores Providing Evidence for the Generalizability of a Speaking Placement Test Scores Payman Vafaee 1, Behrooz Yaghmaeyan 2 Received: 15 April 2015 Accepted: 10 August 2015 Abstract Three major potential sources

More information

Formulating and Evaluating Interaction Effects

Formulating and Evaluating Interaction Effects Formulating and Evaluating Interaction Effects Floryt van Wesel, Irene Klugkist, Herbert Hoijtink Authors note Floryt van Wesel, Irene Klugkist and Herbert Hoijtink, Department of Methodology and Statistics,

More information

Instrumental activity in achievement motivation1. Department of Child Study, Faculty of Home Economics, Japan Women's University, Bunkyo-ku, Tokyo 112

Instrumental activity in achievement motivation1. Department of Child Study, Faculty of Home Economics, Japan Women's University, Bunkyo-ku, Tokyo 112 Japanese Psychological Research 1981, Vol.23, No.2, 79-87 Instrumental activity in achievement motivation1 MISAKO MIYAMOTO2 Department of Child Study, Faculty of Home Economics, Japan Women's University,

More information

Biserial Weights: A New Approach

Biserial Weights: A New Approach Biserial Weights: A New Approach to Test Item Option Weighting John G. Claudy American Institutes for Research Option weighting is an alternative to increasing test length as a means of improving the reliability

More information

BASIC PRINCIPLES OF ASSESSMENT

BASIC PRINCIPLES OF ASSESSMENT TOPIC 4 BASIC PRINCIPLES OF ASSESSMENT 4.0 SYNOPSIS Topic 4 defines the basic principles of assessment (reliability, validity, practicality, washback, and authenticity) and the essential sub-categories

More information

On the purpose of testing:

On the purpose of testing: Why Evaluation & Assessment is Important Feedback to students Feedback to teachers Information to parents Information for selection and certification Information for accountability Incentives to increase

More information

THE EFFECTIVENESS OF VARIOUS TRAINING PROGRAMMES 1. The Effectiveness of Various Training Programmes on Lie Detection Ability and the

THE EFFECTIVENESS OF VARIOUS TRAINING PROGRAMMES 1. The Effectiveness of Various Training Programmes on Lie Detection Ability and the THE EFFECTIVENESS OF VARIOUS TRAINING PROGRAMMES 1 The Effectiveness of Various Training Programmes on Lie Detection Ability and the Role of Sex in the Process THE EFFECTIVENESS OF VARIOUS TRAINING PROGRAMMES

More information

Introduction to Meta-Analysis

Introduction to Meta-Analysis Introduction to Meta-Analysis Nazım Ço galtay and Engin Karada g Abstract As a means to synthesize the results of multiple studies, the chronological development of the meta-analysis method was in parallel

More information

INSPECT Overview and FAQs

INSPECT Overview and FAQs WWW.KEYDATASYS.COM ContactUs@KeyDataSys.com 951.245.0828 Table of Contents INSPECT Overview...3 What Comes with INSPECT?....4 Reliability and Validity of the INSPECT Item Bank. 5 The INSPECT Item Process......6

More information

Prosody Rule for Time Structure of Finger Braille

Prosody Rule for Time Structure of Finger Braille Prosody Rule for Time Structure of Finger Braille Manabi Miyagi 1-33 Yayoi-cho, Inage-ku, +81-43-251-1111 (ext. 3307) miyagi@graduate.chiba-u.jp Yasuo Horiuchi 1-33 Yayoi-cho, Inage-ku +81-43-290-3300

More information

Running Head: ADVERSE IMPACT. Significance Tests and Confidence Intervals for the Adverse Impact Ratio. Scott B. Morris

Running Head: ADVERSE IMPACT. Significance Tests and Confidence Intervals for the Adverse Impact Ratio. Scott B. Morris Running Head: ADVERSE IMPACT Significance Tests and Confidence Intervals for the Adverse Impact Ratio Scott B. Morris Illinois Institute of Technology Russell Lobsenz Federal Bureau of Investigation Adverse

More information

Principles of Sociology

Principles of Sociology Principles of Sociology DEPARTMENT OF ECONOMICS ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS [Academic year 2017/18, FALL SEMESTER] Lecturer: Dimitris Lallas Principles of Sociology 4th Session Sociological

More information

Original Article. Relationship between sport participation behavior and the two types of sport commitment of Japanese student athletes

Original Article. Relationship between sport participation behavior and the two types of sport commitment of Japanese student athletes Journal of Physical Education and Sport (JPES), 17(4), Art 267, pp. 2412-2416, 2017 online ISSN: 2247-806X; p-issn: 2247 8051; ISSN - L = 2247-8051 JPES Original Article Relationship between sport participation

More information

Record of the Consultation on Pharmacogenomics/Biomarkers

Record of the Consultation on Pharmacogenomics/Biomarkers This English version of the record of the consultation has been published by PMDA. In the event of inconsistency between the Japanese original and this English translation, the former shall prevail. (Attachment

More information

PTHP 7101 Research 1 Chapter Assignments

PTHP 7101 Research 1 Chapter Assignments PTHP 7101 Research 1 Chapter Assignments INSTRUCTIONS: Go over the questions/pointers pertaining to the chapters and turn in a hard copy of your answers at the beginning of class (on the day that it is

More information

Observational Category Learning as a Path to More Robust Generative Knowledge

Observational Category Learning as a Path to More Robust Generative Knowledge Observational Category Learning as a Path to More Robust Generative Knowledge Kimery R. Levering (kleveri1@binghamton.edu) Kenneth J. Kurtz (kkurtz@binghamton.edu) Department of Psychology, Binghamton

More information

Project exam in Cognitive Psychology PSY1002. Autumn Course responsible: Kjellrun Englund

Project exam in Cognitive Psychology PSY1002. Autumn Course responsible: Kjellrun Englund Project exam in Cognitive Psychology PSY1002 Autumn 2007 674107 Course responsible: Kjellrun Englund Stroop Effect Dual processing causing selective attention. 674107 November 26, 2007 Abstract This document

More information

Chapter Three. Methodology. This research used experimental design with quasi-experimental

Chapter Three. Methodology. This research used experimental design with quasi-experimental 37 Chapter Three Methodology This chapter presents research design, research setting, population and sampling, data collection method and data collection procedure. Data analysis is also presented in this

More information

4 Diagnostic Tests and Measures of Agreement

4 Diagnostic Tests and Measures of Agreement 4 Diagnostic Tests and Measures of Agreement Diagnostic tests may be used for diagnosis of disease or for screening purposes. Some tests are more effective than others, so we need to be able to measure

More information

Reliability and Validity of the Divided

Reliability and Validity of the Divided Aging, Neuropsychology, and Cognition, 12:89 98 Copyright 2005 Taylor & Francis, Inc. ISSN: 1382-5585/05 DOI: 10.1080/13825580590925143 Reliability and Validity of the Divided Aging, 121Taylor NANC 52900

More information