English Language Development Assessment (ELDA) 2004 Field Test Administration


English Language Development Assessment (ELDA)

TECHNICAL REPORT

2004 Field Test Administration

Submitted to the Council of Chief State School Officers on behalf of the LEP-SCASS by the American Institutes for Research and Measurement Incorporated

October 31, 2005

The contents of this document were developed under grant S368A030006, CFDA A, from the U.S. Department of Education. However, those contents do not necessarily represent the policy of the U.S. Department of Education, and you should not assume endorsement by the Federal Government.


TABLE OF CONTENTS

EXECUTIVE SUMMARY
INTRODUCTION
1. DEVELOPMENT OF STANDARDS AND SPECIFICATIONS
   Development of Items
   Performance Level Descriptors
   Forms of PLDs
   Numbers of Levels and Range
   Focus of PLDs
   PLD Development Procedures
   Development of Field Test Forms
   Forms Design
2. FIELD TEST SAMPLING PLAN
3. LOGISTICAL ISSUES IN FIELD TEST ADMINISTRATION
4. TEST ADMINISTRATION PROCEDURES
   Training Information
   Time to Take Each Field Test
   Accommodations Offered and Used
5. ITEM ANALYSIS
   Scoring Procedures
   Classical Item Analysis
   Differential Item Functioning Analysis
   Review of Items Not Meeting Specified Standards
   Strategy for Modifying Problematic Items
   Field Test Form Analyses
6. RASCH/IRT ANALYSIS
   Field Test Form Equating
   Vertical Linking
7. VALIDITY STUDIES
8. LONG-TERM OPERATION OF THE ELDA PROGRAM
REFERENCES

TABLE OF CONTENTS (CONTINUED)

APPENDIX A: TEST BLUEPRINTS AND ITEM SPECIFICATIONS
APPENDIX B: PERFORMANCE LEVEL DESCRIPTORS
APPENDIX C: SAMPLING PROCEDURE FOR THE MARCH 2004 ELDA FIELD TEST ADMINISTRATION
APPENDIX D: ITEM STATISTICS AND DIF FOR ALL FIELD TESTED ITEMS
APPENDIX E: FREQUENCY DISTRIBUTION OF DIF CATEGORIES
APPENDIX F: FIELD TEST FORM ANALYSIS BY LANGUAGE GROUPS
APPENDIX G: ITEM DIFFICULTIES AND FIT STATISTICS FROM WINSTEPS (BEFORE VERTICAL LINKING)
APPENDIX H: ITEM POOLS WITH CALIBRATED STEP DIFFICULTIES (BEFORE VERTICAL LINKING)
APPENDIX I: VERTICAL LINKING

Written by Judit Antal, Mathina Calliope, Wen-Hung Chen, Phoebe Winter, and Steve Ferrara

LIST OF TABLES

Table 1. Levels of Performance for ELDA
Table 2. Indicators Defining Each Language Domain
Table 3. Total Number of Items in ELDA Spring 2004 Field Test
Table 4. Distribution of Item Types
Table 5. Distribution of Item Types
Table 6. General Test Administration Time Guidelines
Table 7. Summary of DIF Classification Rules
Table 8. Summary of DIF Classification Rules for Polytomous Items
Table 9. Number of Items Flagged on the Basis of Classical Item Analysis
Table 10. Mean Coefficient Alpha Reliabilities
Table 11. Number of Items Flagged for Misfitting Values
Table 12. Number of Linking Items in Vertical Linking Analysis


EXECUTIVE SUMMARY

This report describes the March 2004 field test of the English Language Development Assessment (ELDA). The purpose of this field test was to ensure the development of an operational form by using a multi-stage review grounded in commonly accepted content and psychometric standards. This report details those standards and other procedures used to select items from the item pool for the operational form. The American Institutes for Research (AIR) collected data on items from field test forms across four skill domains and three grade clusters. In addition, the field test sample comprised four language groupings. The field test was designed to result in the construction of the initial operational assessment form for implementation during the following academic year. The 2004 field test assessments had the following components:

Skill Domains: Listening, Reading, Speaking, Writing
Grade Clusters: Grades 3-5, Grades 6-8, Grades 9-12
Language Groups: LEP Spanish, LEP Other, LEP Exited, Native English Speakers

Field test forms were constructed according to test specifications developed by the Steering Committee in collaboration with AIR and members of the Limited-English-Proficient State Collaborative on Assessment and Student Standards (LEP-SCASS). Items were developed by qualified item writers to match specifications, and all items passed a rigorous review process before being included on field test forms. A large sample of students in grades 3-12 participated in the field test.

Classical item analysis (IA) and differential item functioning (DIF) analyses were conducted to detect any potential test administration or scoring problems. Items flagged as too easy were found mostly for Native English Speakers, whereas the most difficult items were found mostly for LEP Spanish students. Two items were identified as too difficult and 352 as too easy for Native English Speakers. For LEP Spanish students, 5 items appeared too easy and 19 were classified as too difficult. The largest number of items with severe DIF was identified for the LEP vs. non-LEP contrast in grade cluster 9-12 Reading. Test difficulty (p-value, or proportion correct value), biserial correlation, and omit rate were calculated for each test form. The average omit rate was moderately low across skill

domains, grade clusters, test forms, and language groups. However, LEP Spanish and LEP Other students had a slight tendency to omit more items than non-LEP students, which suggests that they experienced more difficulty answering test questions. The average biserial and polyserial correlations were moderately high. Reliabilities of the test forms were consistently high across all skill domains, grade clusters, and language groups.

Items then passed through a two-stage review process consisting of reviews by a team of AIR psychometricians and by a Joint Committee formed by members of AIR, the Council of Chief State School Officers (CCSSO), Measurement Incorporated (MI), and the LEP-SCASS state membership. Recommendations were then made to include items in the operational pool, revise and resubmit items for administration in future field tests, or reject items from consideration. Only those items that passed all stages of review were included in the master item pool for operational form construction.

AIR used Masters' (1982) partial credit model to estimate ELDA item parameters. To implement the randomly equivalent groups design, IRT calibrations were conducted separately for each field test form in each grade cluster, setting the mean population ability to zero. Goodness-of-fit indices were also used to further analyze the appropriateness of items. The highest proportion of misfitting items (using Infit and Outfit statistics) was found in Reading, grade cluster 6-8 (26%).

Through IRT calibration of common items embedded in the test forms of adjacent grade clusters, it was possible to link the measurement scales of the three grade clusters onto one scale. To ensure the quality of the linking item pool, AIR used a stepwise deletion procedure when computing the linking constant. Overall, 75% of the anchor items remained in the final linking item pool.

Operational form construction was conducted jointly by AIR psychometricians, AIR content experts, and LEP-SCASS state member representatives. Form construction used the item locations from the Rasch/IRT analyses to pre-equate operational forms. This process resulted in the creation of an operational form for use in the following school year.
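As a rough illustration of the stepwise deletion idea (a minimal sketch, not the report's exact algorithm; the item names, difficulties, and retention threshold below are hypothetical), the linking constant can be taken as the mean difficulty shift on the common items, with the most discrepant anchor dropped and the constant recomputed until all remaining anchors shift consistently:

    # Sketch: mean/mean vertical linking with stepwise anchor deletion.
    # b_lower/b_upper map common-item IDs to Rasch difficulties calibrated
    # separately in adjacent grade clusters. All values are hypothetical.
    def linking_constant(b_lower, b_upper, max_resid=0.5):
        items = list(b_lower)
        while True:
            const = sum(b_upper[i] - b_lower[i] for i in items) / len(items)
            # Residual shift of each anchor relative to the current constant.
            resid = {i: abs((b_upper[i] - b_lower[i]) - const) for i in items}
            worst = max(resid, key=resid.get)
            if resid[worst] <= max_resid or len(items) <= 2:
                return const, items
            items.remove(worst)  # drop the most discrepant anchor, recompute

    b35 = {"R01": -0.8, "R02": 0.1, "R03": 0.6}   # 3-5 calibration
    b68 = {"R01": -1.6, "R02": -0.7, "R03": 0.9}  # 6-8 calibration, same items
    const, kept = linking_constant(b35, b68)
    print(f"linking constant: {const:.2f}, anchors kept: {kept}")

Dropping only the single worst anchor before recomputing reflects the idea that a few items functioning differently across clusters should not be allowed to distort the link.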

INTRODUCTION

This report describes the March 2004 field test of the English Language Development Assessment (ELDA). The purpose of this field test was to ensure the development of an operational form by using a multi-stage review grounded in commonly accepted content and psychometric standards. This report details those standards and other procedures used to select items from the item pool for the operational form. The purpose of this report is to provide a coherent overview of the March 2004 field test and the resultant pool of items used to construct the operational test form.

ELDA is a battery of tests designed to allow schools to measure annual progress in the acquisition of English language proficiency skills among non-native English speaking students in grades 3-12. The battery consists of separate tests for listening, speaking, reading, and writing, at each of three grade clusters: 3-5, 6-8, and 9-12. The tests are aligned with the ESL standards of project member states and are developed to provide content coverage across three academic topic areas (English/Language Arts, ELA; Math, Science, and Technology, MST; and Social Studies, SS) and one non-academic topic area related to the school environment (School-Environmental, S-E, which includes topics such as extra-curricular activities, student health, homework, classroom management, and lunchtime, among many others). They are tests of language skills with content drawn from age-appropriate school curricular and non-curricular sources. They are not tests of academic content; in other words, no external or prior content-related knowledge is required to respond to test questions. Nor is performance on the production-skills tests scored for the content of a response beyond what may be supplied in the test input.

While the main function of the ELDA tests is to measure annual progress in English language acquisition, they also permit the identification of students who have reached full English proficiency (FEP), or LEP-exit level; that is, a level considered appropriate for successful functioning within the school system at the appropriate grade level. It should be stressed that FEP is not intended to be synonymous with native English speaker. The tests are not designed to provide placement information relative to English language courses or programs offered at a school. Nor are they designed to provide diagnostic feedback to students and their language teachers. Such functions could, at a future date, be integrated into the current battery.

ELDA is designed to measure progress in the acquisition of English language proficiency across three grade clusters within grades 3-12. The three clusters (3-5, 6-8, and 9-12) reflect common administrative clustering in many school systems, common clustering in other similar tests, and cognitively and developmentally appropriate grouping. An important factor in decisions regarding grade clustering is the English language development characteristic of the target population, which is diverse across grades 3-12, ranging from complete beginners to fully English proficient students. Broad grade clustering, as determined for ELDA, allows for a more appropriate distribution of students across performance levels within each cluster than would have been possible with finer cluster distinctions. Broad grade clustering also reduces the challenges implied in vertical alignment procedures across clusters within each domain.

As required under NCLB, ELDA contains separate tests for each of the four skill domains of listening, speaking, reading, and writing, for which separate scores need to be reported. The March 2004 field tests included assessments covering four skill domains across three grade clusters. In addition, the field test sample comprised four language groupings. The American Institutes for Research (AIR) gathered data by using two field test forms per grade cluster, resulting in 24 stand-alone field test forms. The use of multiple forms was required to produce an item candidate pool large enough to create one operational form that used only those items passing content and psychometric reviews. The March 2004 field test assessments had the following components:

Skill Domains: Listening, Reading, Speaking, Writing
Grade Clusters: Grades 3-5, Grades 6-8, Grades 9-12
Language Groups: LEP Spanish, LEP Other, LEP Exited, Native English Speakers

The remainder of this report is organized into eight major sections. The first section is a brief description of the procedures used to construct field test forms for each skill domain and grade cluster combination to adequately represent relevant ESL standards and minimize the testing burden on participating schools. The second section describes the field test sampling plan. The third and fourth sections detail logistical issues and test administration procedures. Section 5 explains the procedures used to analyze the field test data (classical item analysis and differential item functioning analyses), and Section 6 describes the item response theory (IRT)

techniques used to calibrate item difficulty by test form within grade clusters and across grade clusters for each skill domain. The last two sections cover the validity studies and recommendations regarding ongoing development and maintenance of the psychometric integrity of ELDA.

1. DEVELOPMENT OF STANDARDS AND SPECIFICATIONS

The design, development, and implementation of ELDA have been headed by the Council of Chief State School Officers (CCSSO), based in Washington, D.C., specifically the Limited-English-Proficient State Collaborative on Assessment and Student Standards (LEP-SCASS). Nevada has headed the consortium of states. AIR has developed the test items and forms, and the Center for the Study of Assessment, Validity, and Evaluation (C-SAVE), based at the University of Maryland, has provided validity and reliability research to the project.

No Child Left Behind, the 2001 reauthorization of the Elementary and Secondary Education Act, was the impetus for the project, and six requirements highlighted in the act have shaped the architecture and conceptualization of ELDA. States must measure proficiency and show progress; assess all LEP students; independently measure the four skill domains of reading, writing, speaking, and listening; report a separate measure for comprehension; assess proficiency in academic language and in the language of social interaction; and align the assessments with their state English Language Development (ELD) Standards.

ELDA was aligned to state ESL standards through an analysis of the ESL standards of consortium states available to the project at the outset. From an analysis of state ESL standards for each of the four skill domains, AIR constructed, and the LEP-SCASS approved, a set of core ESL standards, which formed the basis for item design.

The content that forms the basis for test items in ELDA is distributed across four topic areas. Approximately 25 percent of the items use language from each of the three curriculum domains of mathematics, science, and technology; English language arts; and social studies. The remaining 25 percent of the items use the social language of interaction between students, teachers, other school personnel, and parents related to school issues.

AIR proposed, and the LEP-SCASS approved, test item specifications for each of the four skill domain areas. Listening consists of only multiple-choice items. Students listen to five types of texts read by a narrator and actors (short phrases, short dialogues, extended dialogues, short presentations, and extended presentations; the 3-5 grade cluster excludes extended presentations because of developmental inappropriateness) and answer comprehension questions. Reading is also entirely multiple choice. Students read three types of text (short, early reading comprehension passages or cloze items; instructions; and long passages) and answer comprehension questions. Writing comprises both multiple-choice and short- and extended-constructed-response items. Three broad standards (editing, revising, and planning and organizing) are assessed through multiple-choice items attached to short student-written passages. Speaking consists only of constructed-response items. Items come in sets of four prompts, each eliciting a different speaking function.

Development of Items

To develop items that measure these academic standards as specified by the content specifications, AIR brought together a highly competent pool of item writers, using a mix of external item writers, NAEP foreign language item writers, and other internal content experts. The LEP-SCASS states recommended to AIR teachers who had experience with assessment development, and AIR contacted those teachers and selected them based on their availability. Bill Eilfort and Natalie Chen, assessment development consultants, guided the item writers during a weekend workshop in Denver in February. The consultants, along with AIR staff, working in groups by domain and grade level, trained the item writers by explaining general item writing principles and were available to help the participants develop their items. AIR also provided books, texts, and other reference materials (such as the Encarta online encyclopedia) for the item writers, and they developed items individually and in small groups.

After items were drafted and reviewed by the writers, they were entered into the review protocol as part of AIR's Item Tracking System (ITS) database. The following review levels were then conducted:

Preliminary: Items reviewed by junior staff for formatting and basic item construction principles
LABS: Items reviewed by a trained and certified LABS (language accessibility, bias, and sensitivity) reviewer
Editor: Items reviewed for grammar, writing conventions, and clarity

Senior: Items reviewed by a senior content expert in ESL or English language arts, evaluating the items for their match to the standards and for their measurement integrity

Items that passed all these reviews were brought to LEP-SCASS meetings for review, comment, revision, and approval. At SCASS content review meetings, members split into grade-cluster groups, were instructed on the specifications for the items, the standards, and the benchmarks, and individually reviewed the items before meeting as a group to accept, revise, reject, or recommend revision and resubmission of the items. Those items that survived the final review entered the field-test item pool.

Performance Level Descriptors

AIR has developed performance level descriptors (PLDs) for each of the four language skills tested in the ELDA battery (listening, speaking, reading, and writing; see Appendix B). In addition to the requirement that independently obtained scores be reported for each of these four domains, federal regulations also require that a fifth score be reported for the composite skill of comprehension, derived from a combination of the listening and reading scores. As such, we have included comprehension as a fifth set of PLDs. The PLDs have undergone a series of review and revision procedures at AIR (outlined below) with both ELDA project staff and non-project staff.

Forms of PLDs

The PLDs exist in two forms, reflecting the two different functions that PLDs serve. One form is a matrix display of the descriptions of performance levels intended for standard-setting purposes. In this format we expect the cell-by-cell display of information to facilitate the task of capturing the distinctions between levels for each of the performance indicators that make up the language domain (the term indicators in this document refers to the set of component skills that define each of the language domains). The second form is a narrative of the information contained in the matrix intended for reporting purposes. In this format we expect the description to capture the character of a level across all the indicators, thus making it easier for stakeholders such as schools and parents to interpret test performance and progress in the

acquisition of English. This document first presents the matrix form of the PLDs, and then the narratives.

The PLDs are common across the grade clusters for which the tests have been designed. The PLD for listening, for example, provides common performance level descriptions for the 3-5, 6-8, and 9-12 clusters. Underlying the notion of a common performance scale to describe English language proficiency development across grades 3-12 is the assumption that the same domain performance indicators, as well as the values given to each indicator at each level, can be used to describe language development in third grade as in high school. We believe that this assumption is valid: indicators such as text type, text-level structure, or fluency, and their values across performance levels, are generalizable across age or grade. What is variable across age or grade, with respect to test design and performance measurement, are the specific age-appropriate test input materials (contexts for language use such as stimuli, the topics embedded in stimuli, specific features of grammar and vocabulary that are bound by cognitive development, and test graphics) and the cognitive skills required by the language tasks of the test.

Numbers of Levels and Range

It was determined at the Steering Committee meeting held at project start-up in Berkeley, California, in December 2002, that the PLDs should contain five levels of discrimination (see Table 1 below). Five levels were considered appropriate to (a) capture the construct of development in academic English language proficiency from Pre-functional to Full English Proficiency; (b) permit an acceptable resolution to the tension between cost effectiveness in test development and test administration efficiency versus psychometric needs; (c) permit students to show growth in English language development from one year to the next; and (d) permit students to reach the ultimate target of Full English Proficiency within a realistic period of time.

The levels range from Full English Proficiency, a level at which an LEP student is deemed able to function effectively and consistently through the medium of academic English in the school system (and thus ceases to be defined as LEP), to Pre-functional, a level at which an LEP student is consistently unable to communicate with any success in the English of the school environment, although he or she may have some limited knowledge of English. It should be pointed out that the proficiency required for entry into the Full English Proficiency level is not

synonymous with native-speaker proficiency in English; FEP students may function effectively and successfully in the school system while still exhibiting a non-native speaker accent, while making production errors (which typically would not impede communication), or while comprehending less than the full range of subtle meanings intended by a writer or speaker (again, with little negative effect on communication). By contrast, many aspects of an Advanced level proficiency, and some aspects of an Intermediate and even of a Beginners level proficiency, allow for the demonstration of an ability to function effectively in the school system, with less and less consistency and sophistication as one moves down the scale.

Table 1. Levels of Performance for ELDA

Level   Label
5       Full English Proficiency
4       Advanced
3       Intermediate
2       Beginners
1       Pre-functional

Note: The labels used to define each level are provisional and pending approval by the LEP-SCASS members.

Focus of PLDs

The PLDs developed by AIR, in agreement with guidelines established by the LEP-SCASS members at the project start-up Steering Committee meeting, are intended to describe threshold points rather than the full range implied by a level; that is, each description characterizes what is minimally required for entry into a level. This is true of other second or foreign language scales of performance, including those of the American Council on the Teaching of Foreign Languages (the ACTFL Proficiency Guidelines), the Interagency Language Roundtable Proficiency Levels (ILR, or Government Foreign Service scale), and the Council of Europe Proficiency Levels. The threshold approach is motivated by the constructs that are defined by the levels; that is, language skills that are cyclical, multidimensional, and expanding in their patterns of learning, rather than skills that are linear and unidimensional with learning that proceeds at a constant rate of complexity. For example, at level 1 a student may have no understanding of how to express present time in English; at level 2 a student may be able to express present time through the use of the present tense of some common verbs with simple adverbial present tense markers; at level 3 and beyond a student should have more extensive

ways of expressing present time, with the use of a greater range of verbs, more sophisticated time markers, and an ability to contrast present with other time references. The model is often described in the literature as an inverted pyramid in which, as one progresses up the scale, progressively more language skill is required to attain the next level.

The PLD for the bottom level is an exception; it does not conform to the threshold requirement but rather is a description of a range, from zero knowledge and ability to just below what is minimally required for entry into level 2. The range implied by level 1, a pre-functional or pre-communicative level (to continue the pyramid metaphor, the apex of the pyramid), is relatively uncontentious to define.

PLD Development Procedures

An initial draft version of the PLDs for each of the four language skills was created at the project start-up Steering Committee meeting. Documents from the California State and New York State English Language Proficiency Levels, which represented substantive consideration of the issue of defining proficiency levels in the field of standards-aligned assessment for ESL, were consulted in this initial process. The draft version of the PLDs provided an important initial understanding of, and agreement on, the type of characterization required at each of the five levels, reflecting a common understanding of the theoretical foundation for the descriptions. Particularly important was the determination of a working definition of level 5, Full English Proficient.

The initial draft version of the PLDs, however, lacked vertical and horizontal alignment. The PLDs were substantially reviewed and revised during the test development process to achieve alignment, both within domain and across the four domains. This review and revision process involved the following steps:

1. Analyzing the original draft versions of the PLDs within a matrix to determine the performance indicators used to define each domain;
2. Assessing the degree to which the indicators represented a complete and theoretically sound definition of the domain and aligned with test design specifications and scoring rubrics for constructed-response items, and then making appropriate revisions to the indicators;
3. Assessing the degree to which the indicators were vertically aligned across the five levels and then making appropriate revisions;

4. Assessing the degree to which the indicators were horizontally aligned across domains (particularly listening with reading, and speaking with writing) and then making appropriate revisions;
5. Analyzing both listening and reading PLDs to determine what may be considered common for the creation of comprehension PLDs;
6. Reviewing the content of all PLDs with internal AIR staff who are content experts but who are not related to the ELDA project; and
7. Submitting all PLDs to editorial review.

Table 2 below summarizes the performance indicators identified for each of the five domains for which PLDs have been constructed. The receptive skills are listening, reading, and comprehension; the productive skills are speaking and writing.

Table 2. Indicators Defining Each Language Domain

1. Text types: content area/non-content area
2. Discourse types: content area/non-content area
3. Speech types: connect, tell, expand, reason; content area/non-content area
4. Forming a general understanding: main idea, theme, problem, conflict, plot, character, event, mood, message, purpose
5. Developing an understanding: details
6. Linking information: communicator point of view; inference, conclusion, evaluation
7. Vocabulary and structure: academic/school-social; formal/informal
8. Use of vocabulary: academic/school-social; formal/informal
9. Text-level structure: organization; logic of argument; cohesive devices

Table 2. Indicators Defining Each Language Domain (continued)

10. Sentence-level structure: tense; modality; word order; inflection
11. Mechanics: punctuation; spelling; capitalization
12. Fluency: creativity, spontaneity, flexibility; pronunciation

Development of Field Test Forms

This section briefly describes the construction of field test forms. Field test form construction is critical to ensuring a large item pool from which operational forms can be constructed. Errors in the field test form-construction phase can result in a depleted item pool or a mis-estimation of item parameters that perpetuates throughout the form-construction process.

The specifications for the operational forms were used to develop specifications for the field test forms. AIR, with the LEP-SCASS technical advisory committee (TAC), determined that two field test forms (A and B) for each grade cluster for each skill domain would most likely yield the number of items necessary for building one operational form. For Reading and Listening, each 3-5 and 6-8 field test form contained 60 items, in contrast to the 50 items specified for the operational form. The 9-12 field test form contained 75 items, in contrast to 60 in the operational form. For Writing, Form A and Form B differed to maximize the number of constructed-response items field tested. For Speaking, the field test forms mirrored exactly the blueprints for the operational form. The bookmaps for the field-test forms for Reading, Writing, and Listening, which provide descriptions of standards, item types, and target difficulty levels, can be found in Appendix A, along with the forms map for Speaking, which lists task names and shared tasks.

The next step was to construct field test forms from the items in the field test item pool. Content experts from AIR selected items from this pool to meet the requirements described in the test specifications. Guidelines for form assembly were the following: balance across keys and across topic areas, minimization of item exposure, balanced coverage of the ESL standards per the specifications, and attention to item/set influence on one another. Forms were submitted to the LEP-SCASS for review and approval, and AIR staff made adjustments on the basis of feedback.

Forms Design

AIR used a randomly equivalent groups design for the initial field test of ELDA items. This design allowed us to maximize the number of unique items administered in the field test while keeping the number of field test forms to a minimum. To field test new items in subsequent years, AIR will adopt a common item equating design in which field test items are embedded in operational test forms. Embedded field test designs have the advantage of requiring only one administration per school year, easing the burden on participating schools and students, and they allow us to obtain field test item parameter estimates under operational test administration conditions.

In 2004, only multiple-choice items were included in the Reading and Listening test forms. Writing test forms included multiple-choice, extended constructed-response, and short constructed-response item formats. Speaking test forms comprised graphic prompts to which students provided oral responses. Student responses were recorded on audio tape for subsequent scoring. Table 3 specifies the total number of items appearing in the spring 2004 field test.

For vertical linking, some items were shared across forms. Items appeared on both the 3-5 and 6-8 instruments, and on both the 6-8 and 9-12 instruments. To determine which items to share, content experts developed selection criteria: the shared items should cover the standards, and there should be even text-type representation. Grade and curriculum appropriateness was also considered. For example, a passage shared across the 6-8 and 9-12 forms would not deal with high school graduation. Finally, difficulty level was considered: an attempt was made to choose the easiest 6-8 items to share with the 3-5 cluster or the most difficult 3-5 items to share with the 6-8 cluster, as sketched below.
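A minimal sketch of that difficulty-based screening, using hypothetical item identifiers and classical p-values (the operational selection also weighed standards coverage, text type, and grade appropriateness):

    # Sketch: rank items by classical difficulty (p-value; higher = easier)
    # to nominate vertical-linking candidates. All data are hypothetical.
    pool_68 = [("L11", 0.84), ("L12", 0.55), ("L13", 0.91), ("L14", 0.47)]
    pool_35 = [("L01", 0.38), ("L02", 0.72), ("L03", 0.29), ("L04", 0.66)]

    def easiest(pool, n):
        # Candidates from the upper cluster to share with the lower cluster.
        return sorted(pool, key=lambda item: item[1], reverse=True)[:n]

    def hardest(pool, n):
        # Candidates from the lower cluster to share with the upper cluster.
        return sorted(pool, key=lambda item: item[1])[:n]

    share_down = easiest(pool_68, 2)  # easiest 6-8 items -> 3-5 forms
    share_up = hardest(pool_35, 2)    # hardest 3-5 items -> 6-8 forms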

Table 3. Total Number of Items in ELDA Spring 2004 Field Test: item counts by grade cluster (3-5, 6-8, 9-12) and form (A, B) for Listening, Reading, Speaking, and Writing. As noted above, each 3-5 and 6-8 Reading and Listening form contained 60 items, and each 9-12 Reading and Listening form contained 75.

Table 4 provides information on item types and the maximum score points available in each form.

Table 4. Distribution of Item Types (Listening): counts of multiple-choice items and 2-, 3-, and 4-point constructed-response items, with total score points, by grade cluster (3-5, 6-8, 9-12) and form (A, B).

Table 5. Distribution of Item Types (Reading, Speaking, and Writing): counts of multiple-choice and constructed-response items, with total score points, by grade cluster and form.

For Reading and Writing, we spiraled the two field test forms within classrooms. For Listening and Speaking, we spiraled the two field test forms across classrooms within schools for those schools administering field tests to more than one classroom.

2. FIELD TEST SAMPLING PLAN

AIR developed a field test sampling plan for spring 2004 that was implemented by Measurement Incorporated (MI). The main purpose of spring 2004 field testing was to collect data for item screening and parameter calibration. The final goal was to construct one core ELDA form to be administered in the following academic year. A target of 1,000 students per form was set to obtain reliable estimates for item screening and parameter estimation. The number of students requested from each state was determined by the number of states participating in the field test. Thirteen states participated in the spring 2004 field test (Alabama, Georgia, Iowa, Indiana, Kentucky, Louisiana, Nebraska, New Jersey, Ohio, Oklahoma, South Carolina, Virginia, and West Virginia). A sample of 2,000 students per grade cluster was drawn equally from the participating member states.

The first sampling plan, approved by the Technical Advisory Committee, was distributed to the member states in December. According to this design, schools within each state were clustered into four groups on the basis of the average English proficiency of their LEP students. Following this scheme, schools were coded as Low, Medium Low, Medium High, and High with respect to the overall English language proficiency of their LEP students. Schools were then sampled proportionally to their group size, as measured by the total number of schools at each of the four levels of overall language proficiency. Within schools, students were selected to participate in the field test according to their native language background (LEP Spanish, LEP Other, LEP Exited, and native English speakers).
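A minimal sketch of this proficiency-based proportional draw (school identifiers, band memberships, and the sampling target below are hypothetical; within-school selection by language group is not shown):

    import random

    # Sketch: schools grouped into four bands by the average English
    # proficiency of their LEP students, then sampled proportionally to
    # band size. All data are hypothetical.
    bands = {
        "Low":         ["S01", "S02", "S03", "S04", "S05", "S06"],
        "Medium Low":  ["S07", "S08", "S09", "S10"],
        "Medium High": ["S11", "S12", "S13", "S14", "S15", "S16", "S17", "S18"],
        "High":        ["S19", "S20"],
    }
    total = sum(len(group) for group in bands.values())
    target = 10  # hypothetical number of schools to sample in the state

    random.seed(1)
    sample = {}
    for band, group in bands.items():
        n = round(target * len(group) / total)  # proportional allocation
        sample[band] = random.sample(group, min(n, len(group)))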

In February 2004, however, some states indicated that they did not collect student performance data on English proficiency and were therefore unable to follow this sampling procedure. For this reason, a revised plan (see Appendix C) was developed and distributed to these states. The new plan required states to select students according to the number of LEP students enrolled. Following this plan, schools in each state were listed within each grade cluster in order of the number of LEP students enrolled. The list of schools was then divided into equal thirds, called Large, Medium, and Small, corresponding to the number of LEP students enrolled. Some questions arose when the states actually implemented this procedure, and a supplementary document was later distributed to the member states to address those questions (see Appendix C).

3. LOGISTICAL ISSUES IN FIELD TEST ADMINISTRATION

Materials for each student were put into a plastic bag. The bag included a Reading test booklet, Writing test booklet, Speaking test booklet, Listening test booklet, blank Speaking response tape, and Student Background Questionnaire. Materials for each teacher were put into a separate bag. These included the Speaking and Listening prompt cassettes and the Administration Manual. All materials in the teacher and student bags (with the exception of the Administration Manual, which was not secure) were security barcoded.

4. TEST ADMINISTRATION PROCEDURES

Training Information

The primary training vehicle was the Administration Manual. In addition, a toll-free help line staffed with trained ELDA personnel was available throughout the project.

Time to Take Each Field Test

The field test was officially untimed. General guidelines for times were provided in the Administration Manual, as follows:

Table 6. General Test Administration Time Guidelines

Grade Cluster   Listening         Speaking   Reading           Writing
3-5             1 hour, 20 mins   25 mins    1 hour            1 hour
6-8             1 hour, 20 mins   25 mins    1 hour            1 hour
9-12            1 hour, 40 mins   25 mins    1 hour, 15 mins   1 hour

Accommodations Offered and Used

The Administration Manual provided test administrators with guidelines for offering and using test accommodations. Specifically, any accommodation offered should be related to the student's specific disability. No accommodations were allowed that might change the content of the assessment. For example, defining words used in the writing or reading passages, in any other stimulus materials, or in the assessment questions was not considered appropriate. Accommodations in the administration procedures for ELDA were allowable provided that they were specified in a student's IEP or 504 plan and provided for the ELDA. Because a student's assessment results should reflect her or his true ability and should not be influenced by inappropriate accommodations, the Administration Manual emphasized that any accommodations should be consistent with practices routinely used in the student's instruction and assessment. Test administrators were instructed that any accommodations provided for an individual must be specified before the student takes the assessment and must be documented in the student's IEP. They were directed to contact their District Coordinator for additional state guidelines on accommodations for the ELDA.

If a student with disabilities takes the ELDA, the administration of the assessment should be under standardized assessment conditions. Only those accommodations listed below or specifically identified in the student's IEP or 504 plan may be provided. Any accommodation provided to a student must be noted on the third page of his or her ELDA Student Background Questionnaire. The following accommodations may be provided to students with disabilities on the ELDA (in addition to any accommodations specified in the student's IEP or 504 plan):

Computerized Assessment: Students may use a computer to type their responses instead of writing in their test booklets. Spell check, glossaries, grammar check, dictionaries, and thesauruses are not allowed on the ELDA. Word-processed responses should be stapled into the student's original test booklet.

Dictation of Responses: Students who are unable to write due to a disability are allowed to dictate their responses to a transcriber or into an audio recorder for the Reading and Listening ELDA. The student's answers should be transferred into the student's original test booklet. A scribe may not be used for the Writing ELDA.

Extended/Adjusted Time: The ELDA is an untimed assessment. For students whose attention span or behavior interferes with regular testing sessions, test administration may be altered to allow for a number of shorter testing sessions. Testing may also be stopped and continued at a later time if behavior interferes with the testing session. The time of day the test is administered may also be adjusted to be most beneficial to the student. All testing sessions MUST be completed within the allotted testing window.

Individual/Small Group Administration: Tests may be administered to a small group or an individual requiring more attention than can be provided in a large group administration.

5. ITEM ANALYSIS

AIR received the data from MI for analysis. An initial evaluation of item quality and an examination of potential bias in item performance were conducted with classical item analysis (IA) and analysis of differential item functioning (DIF). Classical item analysis is a relatively straightforward approach to examining the quality of the items in each field test form. DIF analyses in language assessments are designed to determine whether students of similar levels of proficiency have different probabilities of answering items correctly (or receiving higher credit levels in the case of constructed-response items) because of language-group membership. In some cases, differential item functioning may indicate item bias. However, an external committee must review all items classified as having high levels of DIF to determine whether an item is unfair to members of various language subgroup populations. The following sections describe the steps of the item analyses, which include classical item analysis, DIF analysis, and a review of those items not meeting the specified item statistics. The final section discusses analyses of test reliability and test difficulty.

Scoring Procedures

Experienced MI professional readers and scoring leadership performed the Writing and Speaking handscoring. These same readers scored the 2005 census test that immediately preceded the field test scoring. Writing training materials were identical to those that came out of the 2004 Rangefinding meetings held by the CCSSO in Boston and used in the 2004 field test and 2005 census test. According to contract guidelines, 10% of the writing responses were second-read as a reliability check. Readers who scored the Speaking field test responses were trained using the same materials shipped to teachers who scored the 2005 Speaking census responses. These materials were developed by MI and representatives from a number of SCASS states, including Nebraska and Louisiana.
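A second-read check like this is commonly summarized by exact and adjacent agreement rates; a minimal sketch with hypothetical score pairs follows (the report does not specify which agreement statistics MI computed):

    # Sketch: reader agreement on double-scored Writing responses.
    # Each pair is (first read, second read); all scores are hypothetical.
    pairs = [(3, 3), (2, 3), (4, 4), (1, 1), (3, 2), (4, 3), (2, 2)]

    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    print(f"exact agreement: {exact:.2f}, within one point: {adjacent:.2f}")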

Classical Item Analysis

In addition to evaluating the statistical properties of test items, item analysis also provides an opportunity to detect possible data errors before the analysis moves further. For each item, the item analysis yields the proportion of students selecting each response option for multiple-choice items or the proportion of students scoring in each response category for constructed-response items. The omit rate for each item is also calculated and includes both the percentage of students skipping the item and the percentage of students not reaching the item. Biserial correlations (unadjusted) for multiple-choice items and polyserial correlations for constructed-response items are used to examine the correlation between a student's performance on each item and the student's overall performance on the test form. For purposes of calculating item statistics, omitted and not-reached items are treated as incorrect.

For multiple-choice items, the proportion correct value (p-value) is calculated as the number of students who answer an item correctly divided by the total number of students. The biserial correlation of the keyed response is the correlation between the item score and the total test score. Biserial correlations are also calculated for distracters. The biserial correlation for a distracter is the correlation between the item score, treating the target distracter as the correct response, and the total test score, restricting the sample to only those who chose either the target distracter or the keyed response (Attali & Fraenkel, 2000).

For constructed-response items, we calculated the proportion of students falling into each score-point category defined by the item's scoring rubric (e.g., 0, 1, 2 for constructed-response items with three score categories; 0, 1, 2, 3, 4 for constructed-response items with five score categories). We computed item difficulty as the mean score on the item across all students taking the form. For both multiple-choice and constructed-response items, omit rates were also calculated. High rates of response omission may indicate confusion among test takers about how to respond to the item, confusion among readers about how to score the item, excessive test speededness, or an item that is too difficult. Appendix D presents item statistics by language group for all field tested items.
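A minimal sketch of these statistics for a single multiple-choice item, with hypothetical responses and total scores; a point-biserial correlation is used here as a simple stand-in for the unadjusted biserial reported in Appendix D:

    # Sketch: p-value, omit rate, and item-total correlation for one MC
    # item. Omits ("") and not-reached responses are scored 0 (incorrect).
    responses = ["B", "B", "C", "", "B", "A", "B", "D", "B", ""]
    totals = [42, 38, 25, 12, 40, 22, 35, 18, 44, 9]  # form total scores
    key = "B"

    scores = [1 if r == key else 0 for r in responses]
    p_value = sum(scores) / len(scores)              # proportion correct
    omit_rate = responses.count("") / len(responses)

    def mean(xs):
        return sum(xs) / len(xs)

    # Point-biserial: Pearson correlation of the 0/1 item score with the
    # total test score.
    m_item, m_total = mean(scores), mean(totals)
    cov = sum((s - m_item) * (t - m_total)
              for s, t in zip(scores, totals)) / len(scores)
    sd_item = mean([(s - m_item) ** 2 for s in scores]) ** 0.5
    sd_total = mean([(t - m_total) ** 2 for t in totals]) ** 0.5
    point_biserial = cov / (sd_item * sd_total)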

Differential Item Functioning Analysis

AIR conducted analyses of differential item functioning (DIF) on all field test items to detect potential item bias across language groups. The purpose of these analyses is to identify items that may favor students in one group over students of similar ability in another group. For example, items that are more difficult for one language group (e.g., Spanish LEP students) may require background knowledge or skills that are less prevalent among these students than among another language group (e.g., Other LEP students) of similar ability, indicating potential item bias. This interpretation is referred to as construct irrelevance. AIR performed three DIF analyses for each item: (1) LEP Spanish versus LEP Other, (2) LEP Exited versus native English speakers, and (3) LEP (LEP Spanish and LEP Other combined) versus non-LEP (LEP Exited and native English speakers combined).

AIR employed the Mantel-Haenszel procedure to conduct the field test DIF analyses whenever the sample size was greater than N = 300 for both groups (Holland & Thayer, 1986, 1992). For detecting DIF in dichotomous items, the Mantel-Haenszel (MH; Holland & Thayer, 1988; Mantel & Haenszel, 1959) and generalized Mantel-Haenszel (GMH; Mantel & Haenszel, 1959; Somes, 1986) procedures are the most popular DIF detection procedures used in educational measurement (Fidalgo, Mellenbergh, & Muniz, 2000; Wang & Su, 2004). Recent research shows that MH procedures are robust and have sufficient power at the 5% significance level under conditions with 10% DIF items. The MH test also maintains good control of its Type I error rate and is more powerful than the likelihood ratio test when the comparison groups' latent trait distributions are identical (Ankenmann, Witt, & Dunbar, 1999). Also, Wang and Su (2004) showed that MH and GMH both yield better control of Type I error than other methods when used with the Rasch model because they use number-correct scores as the matching variable; furthermore, test length has little impact on their performance, and they can be used with smaller sample sizes.

Each student's total score on the test was used as the ability-matching variable. The MH delta (Δ̂MH), the log-odds ratio converted to the delta difficulty scale, where 0 indicates no DIF, is then computed. Items are classified into three categories, ranging from no DIF to mild DIF to severe DIF, according to the DIF classification convention established by the Educational Testing Service (Allen, Kline, & Zelenak, 1996) and summarized in Table 7, in which A refers to no DIF, B to mild DIF, and C to severe DIF.
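Before those rules are applied, the statistic itself must be computed. A compact sketch for one dichotomous item, with hypothetical counts; each stratum is a total-score level, and the standard conversion Δ̂MH = -2.35 ln(α̂MH) places the common odds ratio on the ETS delta scale:

    import math

    # Sketch: Mantel-Haenszel common odds ratio across score strata.
    # Per stratum: (reference correct, reference wrong, focal correct,
    # focal wrong). All counts are hypothetical.
    strata = [(55, 20, 40, 30), (70, 10, 58, 18), (90, 5, 80, 9)]

    num = sum(rc * fw / (rc + rw + fc + fw) for rc, rw, fc, fw in strata)
    den = sum(rw * fc / (rc + rw + fc + fw) for rc, rw, fc, fw in strata)
    alpha_mh = num / den                     # common odds ratio
    delta_mh = -2.35 * math.log(alpha_mh)    # delta scale; 0 indicates no DIF

    # A |delta| of at least 1.5 (and statistically significant) would fall
    # into category C under the rules in Table 7.
    print(f"alpha_MH = {alpha_mh:.2f}, MH delta = {delta_mh:.2f}")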

Table 7. Summary of DIF Classification Rules

Category   Rule
C          |Δ̂MH| is significantly larger than 1.0, and |Δ̂MH| ≥ 1.5.
B          |Δ̂MH| is significantly larger than zero and either (a) |Δ̂MH| < 1.5, or (b) |Δ̂MH| is not significantly different from 1.5.
A          Δ̂MH is not significantly different from zero, or |Δ̂MH| < 1.0.

For polytomous items, we calculated both the Mantel-Haenszel chi-square (MH χ²; Zwick & Thayer, 1996; Zwick, Donoghue, & Grima, 1993) and the Standardized Mean Difference (SMD) index (Dorans & Kulick, 1986). For constructed-response items, the classification rules are defined in Table 8. Appendix D exhibits the results of the three DIF analyses for each item.

Table 8. Summary of DIF Classification Rules for Polytomous Items

Category   Rule
C          The p-value of MH χ² is less than .05 and |SMD|/SD > 0.25.
B          The p-value of MH χ² is less than .05 and |SMD|/SD ≤ 0.25.
A          Otherwise.

Review of Items Not Meeting Specified Standards

The item analyses provided information about the quality of the items. Items were flagged for review for four main reasons:

1. The proportion correct value is out of range [0.2, 0.9].


More information

Oral Proficiency Interview Tester Training. Manual >>>CLICK HERE<<<

Oral Proficiency Interview Tester Training. Manual >>>CLICK HERE<<< Oral Proficiency Interview Tester Training Manual the interests, experiences, and the linguistic abilities of the test takers. The Oral Proficiency Interview-computer (OPIc) was developed as a computerized

More information

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

Performance Indicator INFORMATION FLUENCY - CONNECT

Performance Indicator INFORMATION FLUENCY - CONNECT Information Fluency Curriculum 1 Assessment Matrix June 2007 INFORMATION FLUENCY - 1. Draws on prior experience and knowledge. ELA 1,3 2. Identifies the big picture. ELA 1, Info. Lit 1 3. Participates

More information

ARTS IN MOTION CHARTER SCHOOL 7th Grade ELA CURRICULUM MAP

ARTS IN MOTION CHARTER SCHOOL 7th Grade ELA CURRICULUM MAP ARTS IN MOTION CHARTER SCHOOL 7th Grade ELA CURRICULUM MAP Projects Essential Questions Enduring Understandings Cognitive Skills CCSS Final Product Cultural Narratives Project Why is it important to tell

More information

Linking Assessments: Concept and History

Linking Assessments: Concept and History Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.

More information

Warren Consolidated Schools

Warren Consolidated Schools Creating Dynamic Futures through Student Achievement, High Expectations, and Strong Relationships 1.888.4WCS.KIDS www.wcskids.net Text WCSKIDS to 5778 ADMINISTRATION BUILDING 313 Anita, MI 4893 586.825.24

More information

School Annual Education Report (AER) Cover Letter

School Annual Education Report (AER) Cover Letter School (AER) Cover Letter May 11, 2018 Dear Parents and Community Members: We are pleased to present you with the (AER) which provides key information on the 2016-2017 educational progress for the Lynch

More information

Test-Taking Strategies and Task-based Assessment: The Case of Iranian EFL Learners

Test-Taking Strategies and Task-based Assessment: The Case of Iranian EFL Learners Test-Taking Strategies and Task-based Assessment: The Case of Iranian EFL Learners Hossein Barati Department of English, Faculty of Foreign Languages, University of Isfahan barati@yahoo.com Zohreh Kashkoul*

More information

2017 VERSION 1 TECHNICAL BULLETIN. Pre. Pre.

2017 VERSION 1 TECHNICAL BULLETIN. Pre. Pre. 2017 VERSION 1 TECHNICAL BULLETIN Pre Pre www.act.org Contents Contents............................................................ i Tables...............................................................

More information

COMPETENCY REVIEW GUIDE OFFICE OF EDUCATOR LICENSURE. How to Satisfy and Document Subject Matter Knowledge Competency Review Requirements

COMPETENCY REVIEW GUIDE OFFICE OF EDUCATOR LICENSURE. How to Satisfy and Document Subject Matter Knowledge Competency Review Requirements COMPETENCY REVIEW GUIDE OFFICE OF EDUCATOR LICENSURE How to Satisfy and Document Subject Matter Knowledge Competency Review Requirements February, 2017 Table of Contents: INTRODUCTION 1 HOW TO SATISFY

More information

College of Education and Human Services Exceptional Student & Deaf Education Course Descriptions

College of Education and Human Services Exceptional Student & Deaf Education Course Descriptions CATALOG 2010-2011 Undergraduate Information College of Education and Human Services Exceptional Student & Deaf Education Course Descriptions ASL2140: American Sign Language I 4 This course in American

More information

State Education Agency Accessibility and Accommodations Policies:

State Education Agency Accessibility and Accommodations Policies: State Education Agency Accessibility and Policies: 2017-2018 Within the ACCESS for ELLs 2.0 Accessibility and Supplement, there are a number of places where the text refers to the development of specific

More information

TExES Deaf and Hard-of-Hearing (181) Test at a Glance

TExES Deaf and Hard-of-Hearing (181) Test at a Glance TExES Deaf and Hard-of-Hearing (181) Test at a Glance See the test preparation manual for complete information about the test along with sample questions, study tips and preparation resources. Test Name

More information

California Subject Examinations for Teachers

California Subject Examinations for Teachers California Subject Examinations for Teachers TEST GUIDE AMERICAN SIGN LANGUAGE SUBTEST III Subtest Description This document contains the World Languages: American Sign Language (ASL) subject matter requirements

More information

FOURTH EDITION. NorthStar ALIGNMENT WITH THE GLOBAL SCALE OF ENGLISH AND THE COMMON EUROPEAN FRAMEWORK OF REFERENCE

FOURTH EDITION. NorthStar ALIGNMENT WITH THE GLOBAL SCALE OF ENGLISH AND THE COMMON EUROPEAN FRAMEWORK OF REFERENCE 4 FOURTH EDITION NorthStar ALIGNMENT WITH THE GLOBAL SCALE OF ENGLISH AND THE COMMON EUROPEAN FRAMEWORK OF REFERENCE 1 NorthStar Reading & Writing 3, 4th Edition NorthStar FOURTH EDITION NorthStar, Fourth

More information

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, April 23-25, 2003 The Classification Accuracy of Measurement Decision Theory Lawrence Rudner University

More information

May 15, Dear Parents and Community Members:

May 15, Dear Parents and Community Members: May 15, 2018 Dear Parents and Community Members: We are pleased to present you with the (AER) which provides key information on the 2017-18 educational progress for the Zemmer Campus 8/9 Building. The

More information

Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models

Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models Jin Gong University of Iowa June, 2012 1 Background The Medical Council of

More information

Critical Thinking Assessment at MCC. How are we doing?

Critical Thinking Assessment at MCC. How are we doing? Critical Thinking Assessment at MCC How are we doing? Prepared by Maura McCool, M.S. Office of Research, Evaluation and Assessment Metropolitan Community Colleges Fall 2003 1 General Education Assessment

More information

SPECIAL EDUCATION (SED) DeGarmo Hall, (309) Website:Education.IllinoisState.edu Chairperson: Stacey R. Jones Bock.

SPECIAL EDUCATION (SED) DeGarmo Hall, (309) Website:Education.IllinoisState.edu Chairperson: Stacey R. Jones Bock. 368 SPECIAL EDUCATION (SED) 591 533 DeGarmo Hall, (309) 438-8980 Website:Education.IllinoisState.edu Chairperson: Stacey R. Jones Bock. General Department Information Program Admission Requirements for

More information

Re-Examining the Role of Individual Differences in Educational Assessment

Re-Examining the Role of Individual Differences in Educational Assessment Re-Examining the Role of Individual Differences in Educational Assesent Rebecca Kopriva David Wiley Phoebe Winter University of Maryland College Park Paper presented at the Annual Conference of the National

More information

I. Language and Communication Needs

I. Language and Communication Needs Child s Name Date Additional local program information The primary purpose of the Early Intervention Communication Plan is to promote discussion among all members of the Individualized Family Service Plan

More information

Cross-validation of easycbm Reading Cut Scores in Washington:

Cross-validation of easycbm Reading Cut Scores in Washington: Technical Report # 1109 Cross-validation of easycbm Reading Cut Scores in Washington: 2009-2010 P. Shawn Irvin Bitnara Jasmine Park Daniel Anderson Julie Alonzo Gerald Tindal University of Oregon Published

More information

Arts and Entertainment. Ecology. Technology. History and Deaf Culture

Arts and Entertainment. Ecology. Technology. History and Deaf Culture American Sign Language Level 3 (novice-high to intermediate-low) Course Description ASL Level 3 furthers the study of grammar, vocabulary, idioms, multiple meaning words, finger spelling, and classifiers

More information

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure Rob Cavanagh Len Sparrow Curtin University R.Cavanagh@curtin.edu.au Abstract The study sought to measure mathematics anxiety

More information

The Vine Assessment System by LifeCubby

The Vine Assessment System by LifeCubby The Vine Assessment System by LifeCubby A Fully Integrated Platform for Observation, Daily Reporting, Communications and Assessment For Early Childhood Professionals and the Families that they Serve Alignment

More information

Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis. Russell W. Smith Susan L. Davis-Becker

Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis. Russell W. Smith Susan L. Davis-Becker Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis Russell W. Smith Susan L. Davis-Becker Alpine Testing Solutions Paper presented at the annual conference of the National

More information

Assistant Superintendent of Business &

Assistant Superintendent of Business & THE LAMPHERE SCHOOLS ADMINISTRATION CENTER 3121 Dorchester Madison Heights, Michigan 4871-199 Telephone: (248) 589-199 FAX: (248) 589-2618 DALE STEEN Superintendent Finance PATRICK DILLON Assistant Superintendent

More information

Reliability and Validity of a Task-based Writing Performance Assessment for Japanese Learners of English

Reliability and Validity of a Task-based Writing Performance Assessment for Japanese Learners of English Reliability and Validity of a Task-based Writing Performance Assessment for Japanese Learners of English Yoshihito SUGITA Yamanashi Prefectural University Abstract This article examines the main data of

More information

Providing Highly-Valued Service Through Leadership, Innovation, and Collaboration. July 30, Dear Parents and Community Members:

Providing Highly-Valued Service Through Leadership, Innovation, and Collaboration. July 30, Dear Parents and Community Members: Providing Highly-Valued Service Through Leadership, Innovation, and Collaboration July 30, 2011 Dear Parents and Community Members: We are pleased to present you with the Annual Education Report (AER)

More information

A Comparison of Traditional and IRT based Item Quality Criteria

A Comparison of Traditional and IRT based Item Quality Criteria A Comparison of Traditional and IRT based Item Quality Criteria Brian D. Bontempo, Ph.D. Mountain ment, Inc. Jerry Gorham, Ph.D. Pearson VUE April 7, 2006 A paper presented at the Annual Meeting of the

More information

Description of components in tailored testing

Description of components in tailored testing Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of

More information

Department of Middle, Secondary, Reading, and Deaf Education

Department of Middle, Secondary, Reading, and Deaf Education Department of Middle, Secondary, Reading, and Deaf Education 1 Department of Middle, Secondary, Reading, and Deaf Education Dr. Barbara Radcliffe, Department Head Room 1045, Education Building The Department

More information

To Thine Own Self Be True: A Five-Study Meta-Analysis on the Accuracy of Language-Learner Self-Assessment

To Thine Own Self Be True: A Five-Study Meta-Analysis on the Accuracy of Language-Learner Self-Assessment To Thine Own Self Be True: A Five-Study Meta-Analysis on the Accuracy of Language-Learner Self-Assessment Troy L. Cox, PhD Associate Director of Research and Assessment Center for Language Studies Brigham

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

linking in educational measurement: Taking differential motivation into account 1

linking in educational measurement: Taking differential motivation into account 1 Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to

More information

Purpose and Objectives of Study Objective 1 Objective 2 Objective 3 Participants and Settings Intervention Description Social peer networks

Purpose and Objectives of Study Objective 1 Objective 2 Objective 3 Participants and Settings Intervention Description Social peer networks 1 Title: Autism Peer Networks Project: Improving Social-Communication and Literacy for Young Children with ASD Funding: Institute of Education Sciences R324A090091 Session Presenters: Debra Kamps and Rose

More information

COLLEGE OF THE DESERT

COLLEGE OF THE DESERT COLLEGE OF THE DESERT Course Code ASL-001 Course Outline of Record 1. Course Code: ASL-001 2. a. Long Course Title: Elementary American Sign Language I b. Short Course Title: ELEMENTARY ASL I 3. a. Catalog

More information

Alignment in Educational Testing: What it is, What it isn t, and Why it is Important

Alignment in Educational Testing: What it is, What it isn t, and Why it is Important Alignment in Educational Testing: What it is, What it isn t, and Why it is Important Stephen G. Sireci University of Massachusetts Amherst Presentation delivered at the Connecticut Assessment Forum Rocky

More information

Characteristics of the Text Genre Nonfi ction Text Structure Three to eight lines of text in the same position on each page

Characteristics of the Text Genre Nonfi ction Text Structure Three to eight lines of text in the same position on each page LESSON 14 TEACHER S GUIDE by Karen J. Rothbardt Fountas-Pinnell Level J Nonfiction Selection Summary Children with special needs use a variety of special tools to help them see and hear. This simply written

More information

World Languages American Sign Language (ASL) Subject Matter Requirements

World Languages American Sign Language (ASL) Subject Matter Requirements World Languages American Sign Language (ASL) Subject Matter Requirements Part I: Content Domains for Subject Matter Understanding and Skill in World Languages American Sign Language (ASL) Domain 1. General

More information

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality

More information

Job Description: Special Education Teacher of Deaf and Hard of Hearing

Job Description: Special Education Teacher of Deaf and Hard of Hearing **FOR SCHOOL YEAR 2015-16** Reports to: Headmaster / Special Education Job Description: Special Education Teacher of Deaf and Hard of Hearing The Boston Arts Academy is looking for a high Teacher of Deaf

More information

MACOMB MONTESSORI ACADEMY School Annual Education Report (AER) Cover Letter - REVISED

MACOMB MONTESSORI ACADEMY School Annual Education Report (AER) Cover Letter - REVISED MACOMB MONTESSORI ACADEMY School (AER) Cover Letter - REVISED February 22, 217 Dear Parents and Community Members: We are pleased to present you with the (AER) which provides key information on the 215-216

More information

SEMINAR ON SERVICE MARKETING

SEMINAR ON SERVICE MARKETING SEMINAR ON SERVICE MARKETING Tracy Mary - Nancy LOGO John O. Summers Indiana University Guidelines for Conducting Research and Publishing in Marketing: From Conceptualization through the Review Process

More information

Guidelines for Captioning

Guidelines for Captioning Guidelines for Captioning TEXT: CASE Mixed case characters are preferred for readability Use Capital Letters for: Individual word Single Phrase to denote emphasis Shouting FONT USE: White Characters Medium

More information

Allen Independent School District Bundled LOTE Curriculum Beginning 2017 School Year ASL III

Allen Independent School District Bundled LOTE Curriculum Beginning 2017 School Year ASL III Allen Independent School District Bundled LOTE Curriculum Beginning 2017 School Year ASL III Page 1 of 19 Revised: 8/1/2017 114.36. American Sign Language, Level III (One Credit), Adopted 2014. (a) General

More information

NON-NEGOTIBLE EVALUATION CRITERIA

NON-NEGOTIBLE EVALUATION CRITERIA PUBLISHER: SUBJECT: COURSE: COPYRIGHT: SE ISBN: SPECIFIC GRADE: TITLE TE ISBN: NON-NEGOTIBLE EVALUATION CRITERIA 2017-2023 Group V World Language American Sign Language Level II Grades 7-12 Equity, Accessibility

More information

Scaling TOWES and Linking to IALS

Scaling TOWES and Linking to IALS Scaling TOWES and Linking to IALS Kentaro Yamamoto and Irwin Kirsch March, 2002 In 2000, the Organization for Economic Cooperation and Development (OECD) along with Statistics Canada released Literacy

More information

22932 Woodward Ave., Ferndale, MI School Annual Education Report (AER) Cover Letter

22932 Woodward Ave., Ferndale, MI School Annual Education Report (AER) Cover Letter 35 John R., Detroit, MI 4821 22932 Woodward Ave., Ferndale, MI 4822 313.831.351 School (AER) 248.582.81 Cover Letter School (AER) Cover Letter March 9, 217 Dear Parents and Community Members: We are pleased

More information

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education.

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education. The Reliability of PLATO Running Head: THE RELIABILTY OF PLATO Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO M. Ken Cor Stanford University School of Education April,

More information

New Mexico TEAM Professional Development Module: Deaf-blindness

New Mexico TEAM Professional Development Module: Deaf-blindness [Slide 1] Welcome Welcome to the New Mexico TEAM technical assistance module on making eligibility determinations under the category of deaf-blindness. This module will review the guidance of the NM TEAM

More information

A circumstance or event that precedes a behavior. Uneasiness of the mind, typically shown by apprehension, worry and fear.

A circumstance or event that precedes a behavior. Uneasiness of the mind, typically shown by apprehension, worry and fear. Glossary of Terms for ADHD Accommodations Making changes to school curriculum in order to better serve children with special needs or learning differences. Accommodations can include a variety of modifications

More information

WE RE HERE TO HELP ESTAMOS AQUÍ PARA AYUDARLE ACCORDING TO THE 2010 UNITED STATES CENSUS, approximately 25.1 million people (or 8% of the total U.S. population) are limited English proficient (LEP). Limited

More information

Academic Program / Discipline Area (for General Education) or Co-Curricular Program Area:

Academic Program / Discipline Area (for General Education) or Co-Curricular Program Area: PROGRAM LEARNING OUTCOME ASSESSMENT PLAN General Information Academic Year of Implementation: 2012 2013 Academic Program / Discipline Area (for General Education) or Co-Curricular Program Area: Pre-major

More information

Smarter Balanced Interim Assessment Blocks Total Number of Items and hand scoring Requirements by Grade and Subject.

Smarter Balanced Interim Assessment Blocks Total Number of Items and hand scoring Requirements by Grade and Subject. Smarter Balanced Interim Assessment Blocks of Items and hand scoring Requirements by Grade and Subject. The following tables are intended to assist coordinators, site coordinators, and test administrators

More information

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

School Annual Education Report (AER) Cover Letter

School Annual Education Report (AER) Cover Letter Lincoln Elementary Sam Skeels, Principal 158 S. Scott St Adrian, MI 49221 Phone: 517-265-8544 School (AER) Cover Letter April 29, 2017 Dear Parents and Community Members: We are pleased to present you

More information

BOARD CERTIFICATION PROCESS (EXCERPTS FOR SENIOR TRACK III) Stage I: Application and eligibility for candidacy

BOARD CERTIFICATION PROCESS (EXCERPTS FOR SENIOR TRACK III) Stage I: Application and eligibility for candidacy BOARD CERTIFICATION PROCESS (EXCERPTS FOR SENIOR TRACK III) All candidates for board certification in CFP must meet general eligibility requirements set by ABPP. Once approved by ABPP, candidates can choose

More information

ANN ARBOR PUBLIC SCHOOLS 2555 S. State Street Ann Arbor, MI www. a2schools.org Pioneer High School Annual Education Report (AER)!

ANN ARBOR PUBLIC SCHOOLS 2555 S. State Street Ann Arbor, MI www. a2schools.org Pioneer High School Annual Education Report (AER)! ANN ARBOR PUBLIC SCHOOLS 2555 S. State Street Ann Arbor, MI 48104 734-994-2200 www. a2schools.org Pioneer High School Annual Education Report (AER)!! Dear Pioneer Parents and Community Members: We are

More information

2015 Exam Committees Report for the National Physical Therapy Examination Program

2015 Exam Committees Report for the National Physical Therapy Examination Program 2015 Exam Committees Report for the National Physical Therapy Examination Program Executive Summary This report provides a summary of the National Physical Therapy Examination (NPTE) related activities

More information

Accessibility and Lecture Capture. David J. Blezard Michael S. McIntire Academic Technology

Accessibility and Lecture Capture. David J. Blezard Michael S. McIntire Academic Technology Accessibility and Lecture Capture David J. Blezard Michael S. McIntire Academic Technology IANAL WANL WANADACO What do they have in common? California Community Colleges California State University Fullerton

More information

ISA 540, Auditing Accounting Estimates, Including Fair Value Accounting Estimates, and Related Disclosures Issues and Task Force Recommendations

ISA 540, Auditing Accounting Estimates, Including Fair Value Accounting Estimates, and Related Disclosures Issues and Task Force Recommendations Agenda Item 1-A ISA 540, Auditing Accounting Estimates, Including Fair Value Accounting Estimates, and Related Disclosures Issues and Task Force Recommendations Introduction 1. Since the September 2016

More information

Georgia Performance Standards Framework for Biology 9-12

Georgia Performance Standards Framework for Biology 9-12 The following instructional plan is part of a GaDOE collection of Unit Frameworks, Performance Tasks, examples of Student Work, and Teacher Commentary. Many more GaDOE approved instructional plans are

More information

DISTRICT LETTERHEAD. REVISED TEMPLATE (Letter Sent on District s Letterhead) School Annual Education Report (AER) Cover Letter

DISTRICT LETTERHEAD. REVISED TEMPLATE (Letter Sent on District s Letterhead) School Annual Education Report (AER) Cover Letter DISTRICT LETTERHEAD REVISED 2017-18 TEMPLATE (Letter Sent on s Letterhead) School (AER) Cover Letter May 20, 2018 Dear Parents and Community Members: We are pleased to present you with the (AER) which

More information

JOHN C. THORNE, PHD, CCC-SLP UNIVERSITY OF WASHINGTON DEPARTMENT OF SPEECH & HEARING SCIENCE FETAL ALCOHOL SYNDROME DIAGNOSTIC AND PREVENTION NETWORK

JOHN C. THORNE, PHD, CCC-SLP UNIVERSITY OF WASHINGTON DEPARTMENT OF SPEECH & HEARING SCIENCE FETAL ALCOHOL SYNDROME DIAGNOSTIC AND PREVENTION NETWORK OPTIMIZING A CLINICAL LANGUAGE MEASURE FOR USE IN IDENTIFYING SIGNIFICANT NEURODEVELOPMENTAL IMPAIRMENT IN DIAGNOSIS OF FETAL ALCOHOL SPECTRUM DISORDERS (FASD) JOHN C. THORNE, PHD, CCC-SLP UNIVERSITY OF

More information

OFFICE OF EDUCATOR LICENSURE SUBJECT MATTER KNOWLEDGE. The Competency Review Made Simple

OFFICE OF EDUCATOR LICENSURE SUBJECT MATTER KNOWLEDGE. The Competency Review Made Simple OFFICE OF EDUCATOR LICENSURE SUBJECT MATTER KNOWLEDGE The Competency Review Made Simple Meeting and Verifying Subject Matter Competency Requirements January, 2016 Massachusetts Department of Elementary

More information

GMAC. Scaling Item Difficulty Estimates from Nonequivalent Groups

GMAC. Scaling Item Difficulty Estimates from Nonequivalent Groups GMAC Scaling Item Difficulty Estimates from Nonequivalent Groups Fanmin Guo, Lawrence Rudner, and Eileen Talento-Miller GMAC Research Reports RR-09-03 April 3, 2009 Abstract By placing item statistics

More information