BACKGROUND CHARACTERISTICS OF EXAMINEES SHOWING UNUSUAL TEST BEHAVIOR ON THE GRADUATE RECORD EXAMINATIONS
Philip K. Oltman

GRE Board Professional Report GREB No. 82-8P
ETS Research Report
December 1985

This report presents the findings of a research project funded by and carried out under the auspices of the Graduate Record Examinations Board.
Copyright by Educational Testing Service. All rights reserved.
Abstract

We ordinarily expect item difficulty to be related to errors on tests; examinees generally tend to make errors on more difficult items and to answer easier items correctly. However, some examinees miss easy items and get more difficult items correct. The extent to which correct and incorrect responses are predicted by the difficulty of test items has been quantified in various ways. One method, originated by Sato (1975) and modified by Harnisch and Linn (1981), was used in the present study of item level data from the Graduate Record Examinations General Test. The modified Sato caution index was found to be of low reliability in these data and to be generally unrelated to a variety of background variables, although ethnic group showed a small but significant relation to the index. In these data the modified caution index showed a curvilinear relation to total test score, with examinees who showed very high or very low scores having higher index values than those in the middle range of test scores. Indexes calculated on the three sections of the test were uncorrelated with each other. Finally, the index did not moderate the relationship between test scores and self-reported grades, which it should have done if it indeed indicates how well the test measures the intended construct for any individual. The conclusion reached is that the modified caution index adds little information that would be of value in interpreting GRE test scores.
Introduction

An examinee taking a test may achieve a given number of correct responses in a variety of ways. To cite the example given by Harnisch and Linn (1981), it is possible to achieve a score of 10 correct answers on a 20-item test in 184,756 different ways, that is, by producing any of 184,756 different patterns of correct and incorrect responses. While the variations among most of these patterns probably do not make much substantive difference, in other cases they might be important. If one examinee's 10 correct answers were the easiest 10 items, and another's were the most difficult 10, one might hesitate to assert that the two scores of 10 mean the same thing. Admittedly, this large a difference in patterns would be a rare occurrence, but the example serves to illustrate the point that patterns of correct and incorrect responses may have information in them beyond what is provided by the total score. The total score, for example, would not tell us anything about patterns of strengths and weaknesses across different sections of the test, nor would it uncover evidence that the preparation of some examinees differed markedly from others in emphasis on certain areas.

On the hypothetical 20-item test, given the 184,756 patterns of 10 correct responses and the astronomical number of patterns producing all the other possible total scores, a way of imposing some structure on the mass of potential information would be extremely useful. One way might be to search for clusters of examinees that show similar patterns of correct and incorrect responses. Another approach, which was followed in the present study, is to compare each obtained pattern of correct and incorrect responses with a benchmark pattern and calculate an index to indicate the extent of deviation of each pattern from that benchmark. A number of approaches have been developed using the item difficulties for the group as a template against which to compare each examinee's pattern of responses.
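The count cited from Harnisch and Linn (1981) is the binomial coefficient "20 choose 10". As a quick check, the short Python sketch below reproduces it; the variable names are illustrative and do not come from the report.

```python
from math import comb

n_items = 20     # length of the hypothetical test
n_correct = 10   # total score of interest

# Number of distinct right/wrong patterns that yield exactly 10 correct answers.
patterns_for_score = comb(n_items, n_correct)

# Number of right/wrong patterns over all possible total scores (0 through 20).
total_patterns = 2 ** n_items

print(patterns_for_score)
print(total_patterns)
```

Under these assumptions, the patterns giving a score of 10 make up only part of the roughly one million possible response patterns, which is why some summary index of each pattern is attractive.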
Underlying these methods is the notion that an examinee behaving in a perfectly orderly way would achieve correct responses on all items up to some level of difficulty, and then would show incorrect responses on all items more difficult than that level. For example, in our hypothetical situation, an "ideal" examinee with a score of 10 would achieve correct responses on the 10 easiest items and would miss the 10 most difficult ones. Although this result would seldom be achieved exactly, most correct answers would be expected to come from the easier items and most errors would be expected to come from the more difficult items. The extent to which an examinee's pattern deviated from the "ideal" could well contain important information. At the very least, a tendency for an examinee to make errors on
easy items and correct responses on difficult items would serve as a caution flag that the test score was produced by an unusual pattern of responses that may not be interpretable in the same terms as a score obtained from a pattern that closely matches the item difficulties. If a group of examinees deviates from the expected pattern, then the normative item difficulties may not apply to them, and one might question whether the test measures the intended construct.

Some of the methods developed to study unusual response patterns have been based on item response theory (e.g., Levine & Rubin, 1979), while others (e.g., Donlon & Fischer, 1968; van der Flier, 1977; Tatsuoka & Tatsuoka, 1980; Sato, 1975, described in English by Tatsuoka, 1978) have directly compared the pattern of correct and incorrect responses with the item difficulties to derive an index of deviation, or "caution" in Sato's terminology. After comparing several methods of computing indexes of deviation from the usual or expected pattern of responses, Harnisch and Linn (1981) suggested that a modification of Sato's caution index was preferred over the others they studied because it was least confounded with total score. We therefore selected Harnisch and Linn's modification of Sato's caution index to apply to item level data files in a study of one administration of the Graduate Record Examinations (GRE) General Test.

The modified caution index ranges from 0 to 1, with higher values indicating greater departure from the usual pattern of response. That is, an individual with a high modified caution index has missed some easy items and gotten some difficult items correct. "Usual" patterns of response are defined by the total number of correct items achieved by an examinee. For example, for examinees achieving 5 items correct, the usual pattern would entail that those items be the 5 easiest items; for examinees with 10 items correct, the 10 easiest; and so on.
If the n items marked correctly are not the n easiest items, then that pattern is unusual; the greater the difficulty of the n correct items, the more unusual the pattern becomes. The highest possible modified caution index value would be obtained by a pattern consisting of n items correct that were the most difficult n items. Further details of the calculations producing the modified caution index can be found in Harnisch and Linn (1981).

Our interest was in the correlates of unusual test behavior on the GRE test. Would it be possible to find some pattern in the information available in the background data collected during the GRE test administration that is characteristic of examinees showing unusual test responses? For example, it seemed possible that ethnic minority groups might differ from the majority to the extent that ethnic background produces differing cultural and educational experiences that would be expressed in test performance.
The modified caution index was computed for each examinee, separately for each score on the GRE General Test. The computations were carried out using a computer program designed for this purpose (Harnisch, Kuo, & Torres, 1982), and the resulting indexes were added to the data record of each examinee.

Calculation of Modified Caution Index

If each examinee's record consists of a row of 1's and 0's representing correct and incorrect responses, the data from the entire sample can be portrayed as a matrix, with rows representing examinees and columns representing items. To calculate the modified caution index, the columns are rearranged so that the column sums decrease from left to right (that is, the easiest items are toward the left). Similarly the rows are rearranged so that the row sums decrease from top to bottom (that is, the examinees with higher scores are toward the top). Given this matrix, Sato's caution index, as modified by Harnisch and Linn (1981), is given by the following formula:

$$
\text{Caution Index} = \frac{\displaystyle\sum_{j=1}^{n_{i\cdot}} (1 - u_{ij})\, n_{\cdot j} \;-\; \sum_{j=n_{i\cdot}+1}^{J} u_{ij}\, n_{\cdot j}}{\displaystyle\sum_{j=1}^{n_{i\cdot}} n_{\cdot j} \;-\; \sum_{j=J+1-n_{i\cdot}}^{J} n_{\cdot j}}
$$

where $i = 1, 2, \ldots, I$ indexes the examinee, $j = 1, 2, \ldots, J$ indexes the item, $u_{ij} = 1$ if examinee $i$ answers item $j$ correctly and $0$ if examinee $i$ answers item $j$ incorrectly, $n_{i\cdot}$ = total correct for the $i$th examinee, and $n_{\cdot j}$ = total number of correct responses to the $j$th item.
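To make the calculation concrete, the following is a minimal Python sketch of the formula as defined above. It is not the SPP program cited in the text. The function name, the use of NumPy, the handling of all-correct or all-incorrect records (returned as NaN because the denominator is zero), and the tiny demonstration matrix are all assumptions made for illustration.

```python
import numpy as np

def modified_caution_index(responses):
    """Modified caution index for each row of a 0/1 matrix.

    responses: rows are examinees, columns are items, 1 = correct.
    Returns NaN where the index is undefined (zero denominator), an
    assumption of this sketch rather than a rule stated in the report.
    """
    u = np.asarray(responses, dtype=float)
    item_totals = u.sum(axis=0)                      # n_.j for each item
    order = np.argsort(-item_totals, kind="stable")  # easiest items first
    u = u[:, order]
    n_dot_j = item_totals[order]
    n_items = u.shape[1]
    scores = u.sum(axis=1).astype(int)               # n_i. for each examinee

    index = np.full(u.shape[0], np.nan)
    for i, n_i in enumerate(scores):
        # Weighted misses among the n_i easiest items, minus weighted
        # hits among the remaining, harder items.
        numerator = ((1.0 - u[i, :n_i]) * n_dot_j[:n_i]).sum() \
            - (u[i, n_i:] * n_dot_j[n_i:]).sum()
        # Same quantity for the most extreme pattern (n_i hardest items correct).
        denominator = n_dot_j[:n_i].sum() - n_dot_j[n_items - n_i:].sum()
        if denominator != 0:
            index[i] = numerator / denominator
    return index

# Hypothetical 7-examinee, 4-item data set (not from the GRE files).
demo = [
    [1, 1, 1, 1],   # all correct: index undefined
    [1, 1, 1, 0],   # three easiest items correct
    [1, 1, 0, 0],   # two easiest items correct
    [1, 0, 0, 0],   # easiest item correct
    [0, 0, 0, 0],   # none correct: index undefined
    [0, 0, 0, 1],   # only the hardest item correct
    [0, 0, 1, 0],   # only a middle-difficulty item correct
]
demo_index = modified_caution_index(demo)
print(demo_index)
```

Sorting the rows by total score, as the text describes, affects only the layout of the matrix, so the sketch skips it. In the demonstration data the three orderly patterns receive an index of 0, the examinee whose single correct answer is the hardest item receives 1, and the examinee whose single correct answer is a middle-difficulty item falls between them.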
Results

Score Means and Distributions of Sex, Ethnic Group, and Major

The means and standard deviations of scores on the verbal, quantitative, and analytical sections of the GRE General Test for each sample and for the population are shown in Table 1. Also shown in Table 1 are the distributions of sex, ethnic group, and undergraduate major for the samples and the population. From these data it is apparent that the samples accurately represent the population from which they were drawn. In none of these comparisons did the samples differ significantly from the population or from each other.

Characteristics of the Modified Sato Caution Index

The means and standard deviations of the modified caution indexes computed from the verbal, quantitative, and analytical scores on the test are shown in Table 2 for each sample. The samples did not differ from each other in means or standard deviations on any of the scores on the test. The means, ranging from .20 to .24, and the standard deviations, ranging from .06 to .08, are comparable to those reported by Harnisch and Linn (1981), although our data were from a general aptitude test taken by graduate-school-bound examinees and those of Harnisch and Linn were from an achievement test taken by fourth-graders. The distributions of the modified caution index for each of the three parts of the test are shown in Table 3. While the distributions of observed indexes are not markedly asymmetrical, they cluster in the lower range of possible values, with very few exceeding .50.

To assess reliability, pairs of indexes were computed for each examinee, one for the odd-numbered items in a test, and one for the even-numbered items. Spearman-Brown reliability estimates based on the correlations between the odd and even indexes were quite low, ranging between .15 and .29. Plots of odd indexes versus even indexes were examined, but nothing particularly unusual was found.
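The Spearman-Brown step mentioned above can be sketched as follows. The standard formula steps a half-test correlation r up to a full-length estimate 2r / (1 + r). The function names and the example values are hypothetical, and the odd-item and even-item indexes are assumed to have been computed already for each examinee.

```python
import numpy as np

def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability estimate."""
    return 2.0 * r_half / (1.0 + r_half)

def split_half_reliability(odd_index, even_index):
    """Correlate the two half-test indexes across examinees, then step up."""
    r_half = float(np.corrcoef(odd_index, even_index)[0, 1])
    return spearman_brown(r_half)

def implied_half_correlation(reliability):
    """Invert the Spearman-Brown formula: r = R / (2 - R)."""
    return reliability / (2.0 - reliability)

# Half-test correlations implied by the reliability range reported in the text.
low_half_r = implied_half_correlation(0.15)
high_half_r = implied_half_correlation(0.29)
print(round(low_half_r, 3), round(high_half_r, 3))
```

Inverting the formula suggests that reliabilities of .15 to .29 correspond to odd-even correlations of only about .08 to .17, which fits the description of the plots as showing a low relation between the two indexes.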
The plots had no extreme outliers and were otherwise unremarkable, except for the low relation between the two indexes. This apparent lack of reliability obviously makes it rather unlikely that the index will correlate with anything else.

Correlations between Indexes from Sections of the Test

Correlations were computed between the indexes calculated for each of the three measures: verbal, quantitative, and analytical. There was no evidence that unusual responding was a "trait," in the
sense that it might cut across test sections. The correlations among the indexes from the three measures hovered around zero.

Correlates of the Modified Sato Caution Index

Correlations and multiple regression analyses were performed to explore the data for possible correlates of degree of unusual responding. In each analysis the relation between the caution index and a given variable was computed with total score on the test held constant. The aim was to use the background data to characterize examinees with varying degrees of departure from the "usual" pattern of test response. In every case, the results from the two samples were almost identical. To conserve space in what follows, only the results for the first sample will be described.

Background information questionnaire. None of the background information items was substantially related to the caution indexes calculated from the verbal, quantitative, or analytical item data. The largest correlation observed was a multiple correlation of .14 between the set of all ethnic groups and the caution index calculated from the verbal item responses. The largest difference accounting for the multiple correlation was between the Black and White examinees: the 101 Black examinees had a mean index for the verbal measure of .26 (SD = .08), while the 1,725 White examinees' mean index was .21 (SD = .06). This difference was statistically significant (p < .01) because of the large sample size. However, given the size of the difference, it would be difficult to claim much utility for the modified caution index in the interpretation of GRE General Test scores. The test scores of the Black examinees were generally lower than those of the White group, which may have had some effect on the caution indexes as well.
To test the mean caution index difference more directly, a sample of 101 White examinees was drawn to match the 101 Black examinees on test scores; three separate White samples were drawn to match on each of the three parts of the test. To carry out the matching, the distributions of Black examinees' scores were divided into six intervals, and White examinees were randomly drawn from each interval to match the number of Black examinees from that interval. The mean indexes did not differ significantly between the Black and matched White groups, suggesting that the observed mean differences in caution indexes in the total sample were due to differences in test score levels. When the test score levels were made equivalent for Black and White groups, the difference in caution index disappeared.

Scores on sections of the test. One of the reasons that Harnisch and Linn (1981) recommended the use of the modified Sato caution index was that it showed a very low correlation with test scores in their data. If a particular index were to show a substantial correlation with test scores, it would be difficult to
use it as an independent source of information about test performance that goes beyond the total score. Over all examinees, we found quite low correlations also in our data. The linear correlations between the verbal, quantitative, and analytical scores and the modified caution indexes were -.16, .02, and -.10, respectively. However, when we calculated these correlations for each ethnic group separately, we found rather different results. Table 4 displays these results and shows that, while the correlations between test scores and index values for White examinees were indeed around zero, those for other groups were considerably higher. In particular, the Black examinees showed consistently negative correlations, indicating that the lower the score, the more unusual was the pattern of correct and incorrect responses.

Scatter plots of each test score plotted against its respective index value were examined to try to determine whether they might help explain why the correlations differed from one ethnic group to another. Curvilinearity was apparent, and Table 5 displays how much the multiple correlation was increased by adding a squared test score term to the regression. In each case the increase was significant. The shape of the curvilinearity suggested that examinees with extremely high or extremely low test scores had higher index values.

The observed curvilinearity probably accounts fully for the differences in correlations between test scores and indexes for the various ethnic groups. The case is most clear-cut for the Black examinee group, who scored significantly lower on each part of the test than did the White examinees. Given the curvilinearity in the scatter plots, any group scoring near the low end of the distribution would show a negative slope. Correlations were calculated between test scores and indexes for the matched White samples (described above) for the verbal, quantitative, and analytical scores.
These, along with the comparable correlations from the Black examinees' data, are displayed in Table 6, where it can be seen that the two sets of correlations are very similar. Thus it seems reasonable to conclude that the substantial negative correlations between test scores and caution index values for Black examinees can be attributed to the fact that the Black examinees' generally lower scores put that group on the descending arm of the U-shaped curve. As can be seen from the correlations calculated for the matched White samples, a group of White examinees with test scores similar to the Black group's scores showed similar correlations.
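The curvilinearity argument can be illustrated with a small Python sketch. It compares the linear correlation with the multiple correlation obtained after adding a squared score term, then looks at a low-scoring subgroup alone. The data are hypothetical and noise-free, built to have a U shape; they are not GRE data, and the helper name `multiple_r` is an assumption of this sketch.

```python
import numpy as np

def multiple_r(predictors, y):
    """Multiple correlation of y with the given predictor columns (intercept added)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    r_squared = 1.0 - residual.dot(residual) / ((y - y.mean()) ** 2).sum()
    return float(np.sqrt(max(r_squared, 0.0)))

# Hypothetical scores on a 200-800 scale, standardized for numerical stability.
scores = np.linspace(200.0, 800.0, 61)
z = (scores - scores.mean()) / scores.std()

# Hypothetical U-shaped index: lowest near the middle, higher at both extremes.
caution = 0.20 + 0.05 * (z - 0.1) ** 2

linear_r = float(np.corrcoef(z, caution)[0, 1])
curvilinear_r = multiple_r(np.column_stack([z, z ** 2]), caution)

# A subgroup drawn only from the low end of the score distribution.
low_group = z < -1.0
low_group_r = float(np.corrcoef(z[low_group], caution[low_group])[0, 1])

print(f"linear r = {linear_r:.2f}")
print(f"R with squared term = {curvilinear_r:.2f}")
print(f"r within low-scoring subgroup = {low_group_r:.2f}")
```

In this constructed example the overall linear correlation is weak, the squared term raises the multiple correlation sharply, and the low-scoring subgroup shows a strongly negative correlation. That is the pattern the text attributes to any group located on the descending arm of the curve.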
The Caution Index as a Moderator

The implication of a high caution index is that the test score accompanying it is somehow difficult to interpret and is perhaps not a valid indicator of the underlying dimension we intend to estimate. The caution index would be important information to have available when interpreting a test score if it indeed gave such information about the validity of the score. Following this reasoning, one might expect that the caution index would moderate the relationship between test scores and various criteria they are intended to predict. That is, one might expect that grades would be better predicted by test scores for examinees with low caution indexes than for those with high caution indexes. If high caution indexes in a group of examinees indicate that the meaning of their test scores is not clear, then their grades and other criteria should not be predicted as well.

While subsequent actual grades were not available to test this interpretation of the caution index, the examinees did provide self-reports of their undergraduate grades. Correlations between these self-reports of grades and scores on the three measures of the General Test are shown in Table 7 for the total group, for those "low caution" examinees who had indexes in the bottom 5 percent of the distribution, and for "high caution" examinees who were in the top 5 percent on the index. In each case, the correlations were higher for the "high caution" group, which is the reverse of what was expected. Regressions were compared for the high and low groups to check the possibility that differences in variances might have produced the differing correlations shown in Table 7, but the regressions showed the same pattern as the correlations.
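The moderator comparison described above can be sketched in Python as follows. The function name, the 5 percent tail parameter, and the synthetic data (generated so that the index is unrelated to both scores and grades) are assumptions for illustration only; they do not reproduce the values in Table 7.

```python
import numpy as np

def correlation_by_caution_group(scores, grades, caution, tail_percent=5.0):
    """Score-grade correlation for the total group and the two index tails.

    Returns a dict mapping group name to (group size, correlation).
    """
    scores = np.asarray(scores, dtype=float)
    grades = np.asarray(grades, dtype=float)
    caution = np.asarray(caution, dtype=float)
    low_cut = np.percentile(caution, tail_percent)
    high_cut = np.percentile(caution, 100.0 - tail_percent)
    groups = {
        "total": np.ones(len(caution), dtype=bool),
        "low_caution": caution <= low_cut,     # bottom tail of the index
        "high_caution": caution >= high_cut,   # top tail of the index
    }
    return {
        name: (int(mask.sum()),
               float(np.corrcoef(scores[mask], grades[mask])[0, 1]))
        for name, mask in groups.items()
    }

# Hypothetical synthetic data: the index is independent of scores and grades.
rng = np.random.default_rng(0)
n = 2000
scores = rng.normal(500.0, 100.0, n)
grades = 3.0 + 0.002 * (scores - 500.0) + rng.normal(0.0, 0.4, n)
caution = rng.uniform(0.05, 0.50, n)

moderator_table = correlation_by_caution_group(scores, grades, caution)
for name, (size, r) in moderator_table.items():
    print(f"{name:12s} n = {size:4d}  r = {r:.2f}")
```

With tail groups of roughly 100 examinees each, correlations in the two tails vary considerably from sampling error alone, so a comparison of this kind is best read alongside a regression comparison such as the one the text reports.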
Discussion

The aim of the study was to explore some of the correlates of unusual patterns of test response on the GRE General Test using the modified Sato caution index as an indicator of the extent to which a pattern of correct and incorrect responses could be considered unusual. Given the available background data, a search was conducted for evidence that would speak to the potential usefulness of the index in the interpretation of GRE General Test scores. Generally, the conclusion to be drawn from these analyses is that the index is of limited usefulness in this particular context.

Perhaps the most problematic finding was the low reliability of the index. Furthermore, indexes calculated for each of the three sections of the test were uncorrelated with each other. Thus it is not possible to interpret unusual response as a "trait." That is, index values apparently do not indicate anything general
about an examinee's approach across tests in general, or even across sections of a test. If indexes are interpretable, such interpretation would pertain only to the particular part of the test on which the index was calculated, at least in these data.

The means and distributions of the indexes we calculated were comparable to those reported by Harnisch and Linn (1981). However, we found more curvilinearity in the relationship between test score and index than was expected from Harnisch and Linn's report. There was a significant tendency for high index values to be associated with extreme test scores, either high or low.

The description of the index suggested that it would have considerable utility as a moderator variable. If indeed a high index value indicated that interpretation of its associated score must be made with caution, and a low index value indicated that the score was a straightforward reflection of the underlying construct, then correlations between test scores and criteria should be higher for examinees with low indexes than for those with high indexes. High caution implies doubts about validity. No evidence supporting such a use for the index was found in these data. Correlations between test scores and self-reported grades did not differ in the expected direction between high- and low-index groups.

Perhaps the caution index behaves differently in aptitude test data than it does in achievement test data. Little evidence was found in the GRE data of consistent individual differences in extent of unusual responding or of a strong source of variance independent of the test scores. Achievement tests that have a closer relation between instruction and assessment may be more likely to contain sources of unusual response pattern variance that would be usefully related to individual examinee characteristics.
In summary, none of the analyses that were performed provided evidence that examinees showing unusual patterns of test response on the GRE General Test, as indicated by the modified Sato caution index, had unique background characteristics. The caution index did not provide information that would be useful to users of the GRE General Test. While there may indeed be information in the pattern of test responses beyond what is provided by the total score, this particular index calculated from these data did not prove to be informative.
References

Donlon, T. F., & Fischer, F. E. (1968). An index of an individual's agreement with group-determined item difficulties. Educational and Psychological Measurement, 28.

Harnisch, D. L., Kuo, S., & Torres, R. T. (1982). SPP: Student Problem Package (Version 1.0). Champaign, IL: University of Illinois Office of Educational Testing, Research, and Service.

Harnisch, D. L., & Linn, R. L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18.

Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple choice test scores. Journal of Educational Statistics, 5.

Sato, T. (1975). [The construction and interpretation of S-P tables.] Tokyo: Meiji Tosho.

Tatsuoka, M. M. (1978). Recent psychometric developments in Japan: Engineers grapple with educational measurement problems. Paper presented at the ONR Contractors Meeting on Individualized Measurement, Columbia, MO.

Tatsuoka, K., & Tatsuoka, M. M. (1980). Detection of aberrant response patterns and their effects on dimensionality (Research Report 80-4). Urbana, IL: University of Illinois, Computer-Based Education Research Laboratory.

van der Flier, H. (1977). Environmental factors and deviant response patterns. In Y. H. Poortinga (Ed.), Basic problems in cross-cultural psychology. Amsterdam: Swets and Zeitlinger, B.V.
Table 1
Means, Standard Deviations, and Distributions of Major Descriptive Variables

Columns: Sample A (a), Sample B (a), Population (b)
GRE Scores (M, SD): Verbal, Quantitative, Analytical
Distributions (percent):
Sex: Female, Male
Ethnic Group: Amerindian, Black, Chicano, Oriental, Puerto Rican, Other Hispanic, White, Other
Undergraduate Major: Humanities, Social Sciences, Biological Sciences, Physical Sciences
a n = 1,994.
b N = 57,814.
[Cell values not preserved in this transcription.]
Table 2
Means and Standard Deviations of Modified Caution Indexes Calculated on Verbal, Quantitative, and Analytical Measures of the GRE General Test

Columns: Sample A (M, SD), Sample B (M, SD)
Rows (GRE Measures): Verbal, Quantitative, Analytical
[Cell values not preserved in this transcription.]
Table 3
Distributions of Modified Caution Index Calculated from the Verbal, Quantitative, and Analytical Measures of the GRE General Test

Columns (Measures): Verbal (Samples A, B), Quantitative (Samples A, B), Analytical (Samples A, B)
Rows: Percentiles (a)
a Scores in body of table are at the percentiles indicated by the row headings for the measures indicated by the column headings.
[Cell values not preserved in this transcription.]
Table 4
Correlations between GRE Verbal, Quantitative, and Analytical Scores and Modified Caution Index on Corresponding Measures by Ethnic Group

Columns: N, Verbal, Quantitative, Analytical
Rows (Ethnic Group): Amerindian, Black, Chicano, Oriental, Puerto Rican (a) 26, Hispanic, Other, White, Other
[Remaining cell values not preserved in this transcription.]
Table 5
Tests of Curvilinearity of Regression of Caution Index on Test Scores

Columns (Measures): Verbal, Quantitative, Analytical
Rows (Type of Correlation with Test Score): Linear (a), Curvilinear (b)
a Test score versus caution index.
b Multiple correlation, test score, and test score squared versus caution index.
[Cell values not preserved in this transcription.]
Table 6
Correlations Between Test Scores and Caution Indexes for Black and for White Examinees Matched on Test Scores

Columns (Measures): Verbal, Quantitative, Analytical
Rows (Group): Black; White Matched
Surviving values, White Matched row: Quantitative -.39 (b), Analytical -.24 (c)
a White group matched on verbal score.
b White group matched on quantitative score.
c White group matched on analytical score.
[Remaining cell values not preserved in this transcription.]
Table 7
Correlations Between Self-Reported Grades and Test Scores for Different Levels of the Caution Index

Columns (Measures): Verbal, Quantitative, Analytical
Rows (Groups): Total Sample; "Low Caution" (a); "High Caution" (b)
Surviving value, "Low Caution" row: .11
a Lower 5 percent on caution index.
b Upper 5 percent on caution index.
[Remaining cell values not preserved in this transcription.]
More informationUsing the Rasch Modeling for psychometrics examination of food security and acculturation surveys
Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,
More informationExamining the Psychometric Properties of The McQuaig Occupational Test
Examining the Psychometric Properties of The McQuaig Occupational Test Prepared for: The McQuaig Institute of Executive Development Ltd., Toronto, Canada Prepared by: Henryk Krajewski, Ph.D., Senior Consultant,
More information3.2 Least- Squares Regression
3.2 Least- Squares Regression Linear (straight- line) relationships between two quantitative variables are pretty common and easy to understand. Correlation measures the direction and strength of these
More informationConditional Distributions and the Bivariate Normal Distribution. James H. Steiger
Conditional Distributions and the Bivariate Normal Distribution James H. Steiger Overview In this module, we have several goals: Introduce several technical terms Bivariate frequency distribution Marginal
More informationRunning head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick.
Running head: INDIVIDUAL DIFFERENCES 1 Why to treat subjects as fixed effects James S. Adelman University of Warwick Zachary Estes Bocconi University Corresponding Author: James S. Adelman Department of
More informationSOME NOTES ON STATISTICAL INTERPRETATION
1 SOME NOTES ON STATISTICAL INTERPRETATION Below I provide some basic notes on statistical interpretation. These are intended to serve as a resource for the Soci 380 data analysis. The information provided
More informationStill important ideas
Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement
More informationChapter 2--Norms and Basic Statistics for Testing
Chapter 2--Norms and Basic Statistics for Testing Student: 1. Statistical procedures that summarize and describe a series of observations are called A. inferential statistics. B. descriptive statistics.
More informationReliability, validity, and all that jazz
Reliability, validity, and all that jazz Dylan Wiliam King s College London Introduction No measuring instrument is perfect. The most obvious problems relate to reliability. If we use a thermometer to
More informationMeasurement and Descriptive Statistics. Katie Rommel-Esham Education 604
Measurement and Descriptive Statistics Katie Rommel-Esham Education 604 Frequency Distributions Frequency table # grad courses taken f 3 or fewer 5 4-6 3 7-9 2 10 or more 4 Pictorial Representations Frequency
More information3 CONCEPTUAL FOUNDATIONS OF STATISTICS
3 CONCEPTUAL FOUNDATIONS OF STATISTICS In this chapter, we examine the conceptual foundations of statistics. The goal is to give you an appreciation and conceptual understanding of some basic statistical
More informationComplex Regression Models with Coded, Centered & Quadratic Terms
Complex Regression Models with Coded, Centered & Quadratic Terms We decided to continue our study of the relationships among amount and difficulty of exam practice with exam performance in the first graduate
More informationIAPT: Regression. Regression analyses
Regression analyses IAPT: Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a student project
More informationUnit 1 Exploring and Understanding Data
Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile
More informationThings you need to know about the Normal Distribution. How to use your statistical calculator to calculate The mean The SD of a set of data points.
Things you need to know about the Normal Distribution How to use your statistical calculator to calculate The mean The SD of a set of data points. The formula for the Variance (SD 2 ) The formula for the
More informationChapter 3: Describing Relationships
Chapter 3: Describing Relationships Objectives: Students will: Construct and interpret a scatterplot for a set of bivariate data. Compute and interpret the correlation, r, between two variables. Demonstrate
More informationInvestigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories
Kamla-Raj 010 Int J Edu Sci, (): 107-113 (010) Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories O.O. Adedoyin Department of Educational Foundations,
More informationClinical Trials A Practical Guide to Design, Analysis, and Reporting
Clinical Trials A Practical Guide to Design, Analysis, and Reporting Duolao Wang, PhD Ameet Bakhai, MBBS, MRCP Statistician Cardiologist Clinical Trials A Practical Guide to Design, Analysis, and Reporting
More informationDetecting Suspect Examinees: An Application of Differential Person Functioning Analysis. Russell W. Smith Susan L. Davis-Becker
Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis Russell W. Smith Susan L. Davis-Becker Alpine Testing Solutions Paper presented at the annual conference of the National
More informationSTATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS
STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS Circle the best answer. This scenario applies to Questions 1 and 2: A study was done to compare the lung capacity of coal miners to the lung
More informationExamining differences between two sets of scores
6 Examining differences between two sets of scores In this chapter you will learn about tests which tell us if there is a statistically significant difference between two sets of scores. In so doing you
More information1 The conceptual underpinnings of statistical power
1 The conceptual underpinnings of statistical power The importance of statistical power As currently practiced in the social and health sciences, inferential statistics rest solidly upon two pillars: statistical
More informationInterpreting the Item Analysis Score Report Statistical Information
Interpreting the Item Analysis Score Report Statistical Information This guide will provide information that will help you interpret the statistical information relating to the Item Analysis Report generated
More informationNondestructive Inspection and Testing Section SA-ALC/MAQCN Kelly AFB, TX
PROFICIENCY EVALUATION OF NDE PERSONNEL UTILIZING THE ULTRASONIC METHODOLOGY Mark K. Davis Nondestructive Inspection and Testing Section SA-ALC/MAQCN Kelly AFB, TX 7824-5000 INTRODUCTION The measure by
More informationA Comparison of Several Goodness-of-Fit Statistics
A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures
More informationUnderstanding Uncertainty in School League Tables*
FISCAL STUDIES, vol. 32, no. 2, pp. 207 224 (2011) 0143-5671 Understanding Uncertainty in School League Tables* GEORGE LECKIE and HARVEY GOLDSTEIN Centre for Multilevel Modelling, University of Bristol
More informationVariations in Mean Response Times for Questions on the. Computer-Adaptive General Test: Implications for Fair Assessment
Variations in Mean Response Times for Questions on the Computer-Adaptive GRE@ General Test: Implications for Fair Assessment Brent Bridgeman Frederick Cline GRE No. 96-2P June 2 This report presents the
More informationVARIABLES AND MEASUREMENT
ARTHUR SYC 204 (EXERIMENTAL SYCHOLOGY) 16A LECTURE NOTES [01/29/16] VARIABLES AND MEASUREMENT AGE 1 Topic #3 VARIABLES AND MEASUREMENT VARIABLES Some definitions of variables include the following: 1.
More informationInstrument equivalence across ethnic groups. Antonio Olmos (MHCD) Susan R. Hutchinson (UNC)
Instrument equivalence across ethnic groups Antonio Olmos (MHCD) Susan R. Hutchinson (UNC) Overview Instrument Equivalence Measurement Invariance Invariance in Reliability Scores Factorial Invariance Item
More informationIssues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy
Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus
More informationReadings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F
Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions
More informationCHAPTER 3 DATA ANALYSIS: DESCRIBING DATA
Data Analysis: Describing Data CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA In the analysis process, the researcher tries to evaluate the data collected both from written documents and from other sources such
More information2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%
Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3 Capstone #2: Review of Chapter 4 Capstone #3: Review of
More informationHow Lertap and Iteman Flag Items
How Lertap and Iteman Flag Items Last updated on 24 June 2012 Larry Nelson, Curtin University This little paper has to do with two widely-used programs for classical item analysis, Iteman and Lertap. I
More informationDescribe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo
Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment
More informationChapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE
Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE 1. When you assert that it is improbable that the mean intelligence test score of a particular group is 100, you are using. a. descriptive
More informationExamining Relationships Least-squares regression. Sections 2.3
Examining Relationships Least-squares regression Sections 2.3 The regression line A regression line describes a one-way linear relationship between variables. An explanatory variable, x, explains variability
More informationMantel-Haenszel Procedures for Detecting Differential Item Functioning
A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning H. Jane Rogers, Teachers College, Columbia University Hariharan Swaminathan, University of
More informationV. Measuring, Diagnosing, and Perhaps Understanding Objects
V. Measuring, Diagnosing, and Perhaps Understanding Objects Our purpose when undertaking this venture was not to explain data or even to build better instruments. It may not seem like it based on the discussion
More informationProblem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol.
Ho (null hypothesis) Ha (alternative hypothesis) Problem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol. Hypothesis: Ho:
More informationMeasuring the User Experience
Measuring the User Experience Collecting, Analyzing, and Presenting Usability Metrics Chapter 2 Background Tom Tullis and Bill Albert Morgan Kaufmann, 2008 ISBN 978-0123735584 Introduction Purpose Provide
More informationStatistical Techniques. Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview
7 Applying Statistical Techniques Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview... 137 Common Functions... 141 Selecting Variables to be Analyzed... 141 Deselecting
More informationCHARACTERISTICS OF EXAMINEES WHO LEAVE QUESTIONS UNANSWERED ON THE GRE GENERAL TEST UNDER RIGHTS-ONLY SCORING
CHARACTERISTICS OF EXAMINEES WHO LEAVE QUESTIONS UNANSWERED ON THE GRE GENERAL TEST UNDER RIGHTS-ONLY SCORING Jerilee Grandy GRE Board Professional Report No. 83-16P ETS Research Report 87-38 November
More informationLatent Trait Standardization of the Benzodiazepine Dependence. Self-Report Questionnaire using the Rasch Scaling Model
Chapter 7 Latent Trait Standardization of the Benzodiazepine Dependence Self-Report Questionnaire using the Rasch Scaling Model C.C. Kan 1, A.H.G.S. van der Ven 2, M.H.M. Breteler 3 and F.G. Zitman 1 1
More informationMultivariable Systems. Lawrence Hubert. July 31, 2011
Multivariable July 31, 2011 Whenever results are presented within a multivariate context, it is important to remember that there is a system present among the variables, and this has a number of implications
More informationConnexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan
Connexion of Item Response Theory to Decision Making in Chess Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Acknowledgement A few Slides have been taken from the following presentation
More informationTest Validity. What is validity? Types of validity IOP 301-T. Content validity. Content-description Criterion-description Construct-identification
What is? IOP 301-T Test Validity It is the accuracy of the measure in reflecting the concept it is supposed to measure. In simple English, the of a test concerns what the test measures and how well it
More informationLecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics
Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 3: Overview of Descriptive Statistics October 3, 2005 Lecture Outline Purpose
More informationA Case Study: Two-sample categorical data
A Case Study: Two-sample categorical data Patrick Breheny January 31 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/43 Introduction Model specification Continuous vs. mixture priors Choice
More informationSection on Survey Research Methods JSM 2009
Missing Data and Complex Samples: The Impact of Listwise Deletion vs. Subpopulation Analysis on Statistical Bias and Hypothesis Test Results when Data are MCAR and MAR Bethany A. Bell, Jeffrey D. Kromrey
More informationChapter 11 Nonexperimental Quantitative Research Steps in Nonexperimental Research
Chapter 11 Nonexperimental Quantitative Research (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.) Nonexperimental research is needed because
More informationReliability, validity, and all that jazz
Reliability, validity, and all that jazz Dylan Wiliam King s College London Published in Education 3-13, 29 (3) pp. 17-21 (2001) Introduction No measuring instrument is perfect. If we use a thermometer
More informationDescription of components in tailored testing
Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of
More informationStatistical Methods and Reasoning for the Clinical Sciences
Statistical Methods and Reasoning for the Clinical Sciences Evidence-Based Practice Eiki B. Satake, PhD Contents Preface Introduction to Evidence-Based Statistics: Philosophical Foundation and Preliminaries
More informationMULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES
24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter
More informationA Broad-Range Tailored Test of Verbal Ability
A Broad-Range Tailored Test of Verbal Ability Frederic M. Lord Educational Testing Service Two parallel forms of a broad-range tailored test of verbal ability have been built. The test is appropriate from
More informationRAG Rating Indicator Values
Technical Guide RAG Rating Indicator Values Introduction This document sets out Public Health England s standard approach to the use of RAG ratings for indicator values in relation to comparator or benchmark
More informationSheila Barron Statistics Outreach Center 2/8/2011
Sheila Barron Statistics Outreach Center 2/8/2011 What is Power? When conducting a research study using a statistical hypothesis test, power is the probability of getting statistical significance when
More informationMEASURING AFFECTIVE RESPONSES TO CONFECTIONARIES USING PAIRED COMPARISONS
MEASURING AFFECTIVE RESPONSES TO CONFECTIONARIES USING PAIRED COMPARISONS Farzilnizam AHMAD a, Raymond HOLT a and Brian HENSON a a Institute Design, Robotic & Optimizations (IDRO), School of Mechanical
More informationRisk Aversion in Games of Chance
Risk Aversion in Games of Chance Imagine the following scenario: Someone asks you to play a game and you are given $5,000 to begin. A ball is drawn from a bin containing 39 balls each numbered 1-39 and
More informationPolitical Science 15, Winter 2014 Final Review
Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically
More informationCRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys
Multiple Regression Analysis 1 CRITERIA FOR USE Multiple regression analysis is used to test the effects of n independent (predictor) variables on a single dependent (criterion) variable. Regression tests
More informationMBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION
MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION Variables In the social sciences data are the observed and/or measured characteristics of individuals and groups
More informationMEASURES OF ASSOCIATION AND REGRESSION
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 MEASURES OF ASSOCIATION AND REGRESSION I. AGENDA: A. Measures of association B. Two variable regression C. Reading: 1. Start Agresti
More informationPractitioner s Guide To Stratified Random Sampling: Part 1
Practitioner s Guide To Stratified Random Sampling: Part 1 By Brian Kriegler November 30, 2018, 3:53 PM EST This is the first of two articles on stratified random sampling. In the first article, I discuss
More informationAP Statistics. Semester One Review Part 1 Chapters 1-5
AP Statistics Semester One Review Part 1 Chapters 1-5 AP Statistics Topics Describing Data Producing Data Probability Statistical Inference Describing Data Ch 1: Describing Data: Graphically and Numerically
More informationAndré Cyr and Alexander Davies
Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander
More informationClincial Biostatistics. Regression
Regression analyses Clincial Biostatistics Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a
More information10. LINEAR REGRESSION AND CORRELATION
1 10. LINEAR REGRESSION AND CORRELATION The contingency table describes an association between two nominal (categorical) variables (e.g., use of supplemental oxygen and mountaineer survival ). We have
More informationLinking Assessments: Concept and History
Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.
More information