Reliability and Validity Plan 2017 From CAEP

The principles for measures used in the CAEP accreditation process include: (a) validity and reliability, (b) relevance, (c) verifiability, (d) representativeness, (e) cumulativeness, (f) fairness, (g) stakeholder interest, (h) benchmarks, (i) vulnerability to manipulation, and (j) actionability.[1] CAEP requires valid and reliable assessments to demonstrate candidate quality, and various stakeholders must contribute to the validity of the assessments. Validity and reliability are two of the most important criteria when assessing instruments: reliability means consistency, and a test is valid if it measures what it is supposed to measure.

Before we start, here is a reminder of CAEP Standard 5 and component 5.2.

Standard 5: Provider Quality, Continuous Improvement, and Capacity
The provider maintains a quality assurance system comprised of valid data from multiple measures, including evidence of candidates' and completers' positive impact on P-12 student learning and development. The provider supports continuous improvement that is sustained and evidence-based, and that evaluates the effectiveness of its completers. The provider uses the results of inquiry and data collection to establish priorities, enhance program elements and capacity, and test innovations to improve completers' impact on P-12 student learning and development.

Component 5.2: Quality and Strategic Evaluation
The provider's quality assurance system relies on relevant, verifiable, representative, cumulative, and actionable measures, and produces empirical evidence that interpretations of data are valid and consistent.

According to CAEP's Evidence Guide, the responsibility lies with the EPP to provide valid (and reliable) evidence. CAEP is committed to stronger preparation and accreditation data. The profession needs evidence that assessment is intentional, purposeful, and addresses deliberately posed questions of importance. Such reporting entails interpretation and reflection; measures need to be integrated and holistic; and approaches to assessment can be qualitative and quantitative, direct and indirect. ALL EPP-created assessments used in the CAEP review must meet the Sufficient level on the CAEP Instrument Rubric.

[1] Ewell, P. (2013). Principles for measures used in the CAEP accreditation process.
Submission for EPP-Created Instruments
1. A copy of the assessment
2. Data chart(s)
3. Instructions to candidates
4. Can include a page on how you addressed validity and reliability, or you can simply respond to the five questions
5. (Optional) Can include an analysis of data for this instrument, or this may be in your Self-Study Report narrative
6. Responses to the five questions in AIMS

How to Address Reliability and Validity
MSU EPP self-studies need to include evidence related to the reliability and validity of the reported data. Reliability and validity are frequently measured quantitatively. The EPP's quantitative approach to assessing the reliability of instruments can involve four facets (the candidate and item facets are sketched at the end of this section):
1. Supervisor (e.g., inter-rater reliability, internal consistency, bias)
2. Candidate (e.g., distribution of ratings)
3. Item (e.g., variability of items)
4. Time (e.g., variability of candidate performance across time)

Before we review quantitative ways of assessing validity, we should consider that describing the reliability and validity of EPP-created instruments has not been an easy task for institutions completing their CAEP self-studies. CAEP has recommended including both quantitative and qualitative approaches when describing reliability and validity. The information provided next comes from Breaux and Elliot's presentation, Reliability and Validity: Establishing and Communicating Trustworthy Findings (CAEP Spring Conference 2015). Although the terms validity and reliability are traditionally associated with quantitative research, CAEP does not mean to imply that only quantitative data are expected or valued.
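Returning to the four reliability facets listed above, the candidate and item facets lend themselves to simple summaries. The following is a minimal Python sketch under hypothetical data; the item names and ratings are invented for illustration only.

```python
# Minimal sketch (hypothetical data) of the candidate and item facets:
# the distribution of ratings and the variability of each item.
from collections import Counter
from statistics import pvariance

# Hypothetical ratings on a 1-4 scale; item names are invented.
scores = {
    "plans_instruction": [3, 4, 3, 2, 4, 3],
    "manages_classroom": [4, 4, 4, 4, 4, 4],  # no variance: worth a closer look
}

for item, ratings in scores.items():
    dist = dict(Counter(ratings))  # candidate facet: distribution of ratings
    var = pvariance(ratings)       # item facet: variability of the item
    print(f"{item}: distribution={dist}, variance={var:.2f}")
```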
Differences Between Quantitative and Qualitative Methods When Establishing Validity and Reliability

Quantitative: These methods of establishing validity and reliability are easier to describe briefly, and the standards for judging the results are less subjective. However, they require statistical literacy, and the results are decontextualized.

Qualitative: These methods of establishing validity and reliability depend much more on anticipating and disconfirming a variety of potential doubts that various readers could have. The process takes more effort, and the reader's judgment is less predictable. They require strong skills in logical argumentation and expository writing, but the results are more contextualized.

Strategies Used in CAEP Self-Studies: Reliability
Quantitative studies explain how they manage subjectivity using common terminology, standard procedures, and uniform formats for reporting results. We need to make sure the correct procedures are selected, conducted properly, interpreted validly, and communicated clearly. Focus on key reliabilities (both checks are sketched below):
- Use inter-rater correlations for large samples.
- Use rater agreement for small samples.

The reliability section of CAEP's Instrument Rubric addresses the degree to which an assessment produces stable and consistent results. Ask the question: can the evidence be corroborated? Criteria:
- A detailed description or plan is provided
- Training of scorers and checking on inter-rater reliability are documented
- Steps are described that meet accepted research standards for establishing inter-rater reliability
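A minimal sketch of the two reliability checks named above, assuming two raters' scores on the same candidates are available as parallel lists; all scores here are hypothetical.

```python
# Minimal sketch (hypothetical scores) of the two reliability checks:
# inter-rater correlation for large samples, exact agreement for small ones.
from statistics import correlation  # Python 3.10+

supervisor   = [3, 4, 2, 4, 3, 4, 2, 3]
coop_teacher = [3, 4, 3, 4, 3, 4, 2, 2]

r = correlation(supervisor, coop_teacher)  # Pearson r between the two raters

agreements = sum(a == b for a, b in zip(supervisor, coop_teacher))
pct_agreement = agreements / len(supervisor)

print(f"inter-rater r = {r:.2f}, exact agreement = {pct_agreement:.0%}")
```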
Strategies Used in CAEP Self-Studies: Validity
It is not necessary to establish every form of validity. Some of the processes are qualitative and involve demonstrating alignment; others are quantitative and involve calculating values. Quantitative methods to assess validity focus on key validities:
- Content: all relevant elements of the construct are measured
- Construct: measures the intended attribute
- Criterion: measures of attributes predict target behaviors
- Concurrent: correlates with a known good measure
- Predictive: predicts the score on a significant future measure
- Convergent: the measure correlates with related measures

The types of validity that are needed are judgment calls that have to be justified in the self-study's rationale. However, content validity and construct validity should be included.

Example
When cooperating teachers and university supervisors rate a candidate, we need to show that the assessment is a valid measure of the construct or constructs, and that both raters understand the items and overall intent in the same way.

To show the assessment is a valid measure:
- Expert judgment: what do university supervisors and cooperating teachers say?
- Alignment with relevant standards
- Agreement with logically related measures
- Is there sufficient variance in the evidence?

To show the assessment is a reliable measure:
- Inter-rater agreement

More on Content Validity
Following Dr. Stevie Chepko's view, there are three important components to establishing content validity:
1. Determining the body of knowledge for the construct to be measured. Agreement among experts requires the use of recognized subject matter experts and is based on their judgment. It also relies on individuals who are familiar with the construct, such as faculty members, EPP-based clinical educators, and/or P-12-based clinical educators. The key is having them answer the fundamental question: do the indicators assess the construct to be measured?
2. Aligning indicators to the construct. Indicators must assess some aspect or segment of the construct, and indicators must align with the construct.
3. Using Lawshe's Content Validity Ratio (CVR), described in the next section.
Lawshe's Content Validity Ratio (CVR)
Performance domains:
- Behaviors that are directly observable
- Can be simple proficiencies
- Can be higher mental processes (inductive/deductive reasoning)

Operational definition: the extent to which overlap exists between performance on the assessment under investigation and the ability to function in the defined job. The CVR attempts to quantify the extent of that overlap.

The Content Evaluation Panel is composed of persons knowledgeable about the job, and it is most successful when it is a combination of P-12-based clinical educators, EPP-based clinical educators, and faculty. Each panel member is independently given the list of indicators or items and asked to rate each item as essential, useful but not essential, or not necessary. Items/indicators must be aligned with the construct being measured.

To quantify consensus: any item/indicator perceived as essential by more than half of the panelists has some degree of content validity. The more panelists (beyond 50%) who perceive the indicator as essential, the greater the extent or degree of its content validity.

Calculating the content validity ratio:

    CVR = (n_e - N/2) / (N/2)

where n_e = the number of panelists indicating essential and N = the total number of panelists.

The CVR is calculated for each indicator, and the minimum acceptable value is based on the number of panelists and is read from a CVR table; keep or reject individual items based on the table results. CVR values range from -1.0 to +1.0. The more panelists, the lower the minimum required CVR value. For example:
- 5 panelists require a minimum CVR value of .99
- 15 panelists require a minimum CVR value of .60
- 40 panelists require a minimum CVR value of .30
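The calculation is easy to script. Below is a minimal Python sketch assuming each indicator's panel ratings are collected as a list of strings; the panel data are hypothetical, and the resulting CVR must still be compared against the minimum value for the panel size in the CVR table.

```python
# Minimal sketch of a Lawshe CVR calculation. The panel ratings below
# are hypothetical.

def content_validity_ratio(ratings):
    """Lawshe (1975): CVR = (n_e - N/2) / (N/2)."""
    n = len(ratings)                                   # N: total panelists
    n_e = sum(1 for r in ratings if r == "essential")  # n_e: "essential" votes
    return (n_e - n / 2) / (n / 2)

# Hypothetical panel of 15 raters for one indicator:
panel = ["essential"] * 12 + ["useful"] * 2 + ["not necessary"]
cvr = content_validity_ratio(panel)
print(f"CVR = {cvr:.2f}")  # (12 - 7.5) / 7.5 = 0.60

# Compare against the minimum CVR for 15 panelists in the CVR table
# to decide whether to keep or reject the indicator.
```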
Another Method of Establishing Content Validity
- Conduct a job-task analysis to identify essential job tasks, knowledge areas, skills, and abilities
- Link job tasks, knowledge areas, or skills to the associated test construct or component that each is intended to assess
- Use subject-matter experts

The validity section of CAEP's Instrument Rubric addresses the following criteria:
- A description or plan is provided
- Describes the steps to be used for determining content validity
- Research was used in the development of the plan
- A pilot was completed prior to administration
- Steps meet accepted research standards/protocols (e.g., Lawshe's CVR method)

Questions to Be Answered for Each Submitted EPP-Created Instrument
1. During which part of the candidate's experience is the assessment used? Is the assessment used just once or multiple times during the candidate's preparation?
2. Who uses the assessment, and how are the individuals trained on its use?
3. What is the intended use of the assessment, and what is the assessment purported to measure?
4. Please describe how validity/trustworthiness was established for the assessment.
5. Please describe how reliability/consistency was established for the assessment.

MSU EPP Homegrown Instruments
We currently use eight EPP-created instruments to assess education programs at MSU. We need to determine the adequacy of these measures for the accreditation process. The measures are:
1. Experiential Log
2. Candidate Professional Disposition Traits
3. Missouri Educator Evaluation System (MEES) Rubric
4. Diversity Proficiencies
5. Comprehensive Exam Assessment Rubric for Advanced Programs
6. Research Rubric for Advanced Programs
7. Student teaching exit survey (based on a proprietary survey)
8. EDC345 Multiculturalism Lesson Plan

Next Steps
1. Revisit the instruments. Does each address the critical elements required by CAEP? Which ones?
2. Make necessary adjustments.
   - Define in specific terms what should be addressed and assessed
   - Align with program objectives, CAEP, InTASC, and/or state standards
   - Clarify language: make sure we include distinguishable and measurable statements in rubrics
3. Use CAEP's Assessment Rubric as a guide.
4. Establish content validity.
   - Create Content Evaluation Panels for each instrument, composed of P-12-based clinical educators, EPP-based clinical educators, and faculty
   - Through a survey, ask them to rate the items as essential, useful but not essential, or not necessary
   - Calculate Lawshe's Content Validity Ratio (CVR) for each item of the assessment to determine which items will remain
   - Hold focus groups to discuss the content of each instrument
5. Establish inter-rater reliability (a sketch of the agreement check follows this list).
   - Use spring 2017 data as a pilot of the instruments
   - When possible, instructors within a program should score at least 3 samples independently of one another
   - Collect results and calculate the percentage of agreement on each component and submission
   - If scores vary and yield <80% agreement, meet to discuss each item score on each submission. Pay attention to:
     - Discrepancies between/among scorers
     - Whether discrepancies are due to language or how items are defined
     - Resolving discrepancies with clarified language, rearranged items, or other changes
   - Make note of these changes and revise the assessment as necessary
   - If substantial changes are necessary, each instructor should score at least 2 work samples independently of one another, repeating until instructors reach at least 80% agreement
   - When possible, triangulate (compare cooperating teacher and supervisor results)
   - When possible, compare results over time (cohort- and panel-wise)
   - When summarizing reliability, try to include data on the following:
     - Supervisor (e.g., inter-rater reliability, internal consistency, bias)
     - Candidate (e.g., distribution of ratings)
     - Item (e.g., variability of items)
     - Time (e.g., variability of candidate performance across time)
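A minimal sketch of the agreement check in step 5, assuming three instructors each scored the same work samples on one rubric component; the scores are hypothetical, and "agreement" here means exact pairwise agreement against the 80% threshold named above.

```python
# Minimal sketch (hypothetical scores) of the percent-agreement check.
from itertools import combinations

# rows: work samples; columns: scores from instructors A, B, C on one component
scores = [
    (3, 3, 3),
    (4, 3, 4),
    (2, 2, 2),
    (4, 4, 3),
]

# Exact agreement across every pair of raters on every submission
pairs = [a == b for row in scores for a, b in combinations(row, 2)]
agreement = sum(pairs) / len(pairs)
print(f"pairwise agreement = {agreement:.0%}")

if agreement < 0.80:
    print("Below 80%: meet to discuss each item score on each submission.")
```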