Reliability and Validity

Today's Objectives
- Understand the difference between reliability and validity
- Understand how to develop valid indicators of a concept

Reliability and Validity

Reliability
- How accurate or consistent is the measure?
- Would two people understand a question in the same way?
- Would the same person give the same answers under similar circumstances?

Validity
- Does the measure capture what it is intended to measure?
- Does the measure actually reflect the concept?
- Do the findings reflect the opinions, attitudes, and behaviors of the target population?

Figure panels: Reliable but not valid; Valid but not reliable; Valid and reliable
Levels of Reliability

Example: Person's weight

LOW
- Estimate on the part of the subject
- Estimate on the part of the observer
- Old bathroom scale
- Industrial scale
HIGH

Reliability

Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects. In short, it is the repeatability of your measurement. A measure is considered reliable if a person's scores on the same test given twice are similar. It is important to remember that reliability is not measured; it is estimated.

Here is a simple example. Suppose you have a set of bathroom scales and the scales are broken. The scales represent the methodology. One person weighs you with these scales and obtains a result. The scales are then passed to a second person, who follows the same procedure and weighs you with the same broken scales. The two people, using the same broken scales, obtain similar measurements. The results are reliable: two (or more) people using the faulty scales get consistent readings. Although the results are reliable, they may not be valid. Because the scales are faulty, the readings are not a true indicator of your real weight.

Reliability
- Accuracy, precision, or consistency of measurement
- Degree to which measures are free from error and therefore yield consistent results
- Reliable measures mean the same data would have been collected under similar circumstances
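The broken-scale example above can be sketched numerically. This is only an illustration: the true weight and every reading below are made-up values, and the second "erratic" scale is a hypothetical addition used for contrast. The spread of repeated readings stands in for (un)reliability, and the gap between the average reading and the true weight stands in for (in)validity.

```python
from statistics import mean, stdev

# Hypothetical true weight of the person being weighed (assumption).
TRUE_WEIGHT_KG = 70.0

# Made-up repeated readings, for illustration only.
# Broken scale: readings agree closely with each other but are all about 5 kg too high.
broken_scale = [75.1, 74.9, 75.0, 75.2, 74.8]
# Erratic scale: readings centre on the true weight but vary widely.
erratic_scale = [66.0, 73.5, 70.2, 68.1, 72.2]


def spread(readings):
    """Sample standard deviation of repeated readings (small = consistent)."""
    return stdev(readings)


def bias(readings, true_value):
    """Average reading minus the true value (near zero = on target)."""
    return mean(readings) - true_value


for name, readings in [("broken", broken_scale), ("erratic", erratic_scale)]:
    print(
        f"{name} scale: spread = {spread(readings):.2f} kg, "
        f"bias = {bias(readings, TRUE_WEIGHT_KG):+.2f} kg"
    )
```

In this sketch the broken scale shows a small spread and a large bias (reliable but not valid), while the erratic scale shows the reverse.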
Methods used to determine reliability
- Test-retest method: administer the same measures to the same respondents at two separate points in time
- Split-half method: correlate one half of a scale with the other half
- Calculate a reliability coefficient: a statistical test that measures the internal consistency of a set of items

How to improve reliability?
- Quality of items: concise statements, homogeneous wording (some sort of uniformity)
- Adequate sampling of the content domain: comprehensiveness of items
- Longer assessments are less distorted by chance factors
- Develop a scoring plan (especially for subjective items: rubrics)
- Ensure VALIDITY

Food Quality
What items would you include to get adequate sampling of the content domain?

Program Satisfaction
- I like the after-school program
- I like the after-school teachers
- I would sign up again for the after-school program
Validity
- The ability of a scale to measure what it is intended to measure
- The extent to which a measure reflects the real meaning of the concept under consideration
- The extent to which a measure reflects the opinions and behaviors of the population under investigation
- A measure cannot be valid unless it is also reliable

Validity

Validity refers to the degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure. While reliability is concerned with the accuracy of the actual measuring instrument or procedure, validity is concerned with the study's success at measuring what the researchers set out to measure.

Validity
- Depends on the purpose of the measure, e.g. a ruler may be a valid measuring device for length, but isn't very valid for measuring volume
- Measuring what it is supposed to measure
- Must be inferred from evidence; cannot be directly measured

What would be valid measures of:
- Intelligence?
- Religiosity?
- Knowledge of RPTS 336 material?
- Tourism motivations?
- Commitment to a leisure activity?
- Satisfaction with a leisure service?
- Environmental ethic?
Types of validity

Face (content) validity: professional agreement that variables cover the range of meanings included within the concept
- Items should be evaluated for their presumed relevance
- Items should cover a range of ideas rather than a single topic area
- Items should be evaluated in terms of the abilities of the individuals under investigation

Types of validity

Construct validity: the degree to which a measure relates to other variables, as expected, within a given system of theoretical relationships
- Example: Satisfaction and Program Quality

Predictive validity: the extent to which a measure predicts some future event
- Example: Self-esteem and GPA

Factors that can lower validity
- Unclear directions
- Difficult reading vocabulary and sentence structure
- Ambiguity in statements
- Inadequate time limits
- Inappropriate level of difficulty
- Poorly constructed test items
- Test items inappropriate for the outcomes being measured

Factors that can lower validity (continued)
- Tests that are too short
- Improper arrangement of items (complex to easy?)
- Identifiable patterns of answers
- Teaching
- Administration and scoring
- Students
- Nature of criterion
External Validity
- Answers the question of generalizability
- To what populations or settings can this effect be generalized?
- Two aspects:
  - Population validity
  - Ecological validity

Population Validity
- Is the actual sample representative of the theoretical population?
- To determine, need to identify:
  - Theoretical population
  - Accessible population
  - Sampling design and selected sample
  - Actual sample