Statistical Methodology: II. Reliability and Validity Assessment in Study Design, Part A. David J. Karras, MD


64 ACADEMIC EMERGENCY MEDICINE JAN 1997 VOL 4/NO 1 SPECIAL CONTRIBUTIONS

Statistical Methodology: II. Reliability and Validity Assessment in Study Design, Part A. David J. Karras, MD

For any testing instrument to be considered useful, it must be both a reliable and a valid measure of the variable it is designed to assess. Reliability refers to the test's consistency with repeated trials, and indicates the extent to which differences in measurement of data are attributable to random variability inherent in the testing method rather than to actual differences in the variable being studied. Reliability is also referred to as the precision or internal consistency of a test, and does not require comparison with an external standard. Validity refers to how well the test truly assesses the characteristic it is intended to study as judged by external criteria; this is sometimes referred to as the test's accuracy or external consistency. In contrast to reliability, validity measures the nonrandom, systematic error inherent in a study instrument. A test that is reliable but not valid can be likened to a thermometer that consistently gives an incorrect reading. A test without reliability is also useless; furthermore, an unreliable test is likely to be invalid. Figure 1 illustrates the difference between reliability and validity in a hypothetical test. Unfortunately, many researchers pay little, if any, attention to assessment of reliability and validity when designing clinical tests or questionnaires. The following examples illustrate how reliability and validity may be determined in 3 different situations.

Example 1. An investigator has developed a rapid bedside test for serum creatinine and wishes to know the accuracy of the test. The assay is considered valid if the results are comparable to those obtained using the standard reference assay for creatinine. The test is considered reliable if identical results are obtained on repeated measures of a serum sample.
Both reliability and validity in this case can be measured by performing tests of correlation: validity is the correlation between the bedside assay and the reference assay; reliability is the correlation between repeated measures using the bedside test.

Example 2. An emergency medicine program director wishes to create a written examination to assess her residents' skill in performing conscious sedation. To determine the test's reliability, she could administer the test twice to a sample of residents and determine the correlation between the scores obtained from the first test and from the later retest. Determining the validity of this test is more problematic. The program director needs to ensure that a high score on her test is truly indicative of excellent conscious sedation skills, and that a low score reflects poor skills. To do this, she needs to use (or develop) a standard with which to compare her test. Because no criterion standard exists for assessing the ability to perform conscious sedation, she chooses as her reference a survey of faculty opinions regarding each resident's conscious sedation skills. She then compares the results of her test with this external standard. With this approach, the program director strives to achieve consistency with another indicator of the characteristic under study, but cannot assess the true validity of her test.

Example 3. An ethics researcher develops a questionnaire regarding emergency physicians' attitudes about performing specific procedures on newly deceased patients. He chooses to measure reliability by asking several similar questions in a single survey and assessing the correspondence between answers to these test items. For his survey to be considered reliable, subjects should give consistent answers to equivalent questions. This opinion survey does not have correct answers,

and there is neither a criterion standard nor an external measure to corroborate the study's findings. Therefore, there is no method to quantitatively assess the validity of this test. The validity of the questionnaire relies on the researcher's ability to pose questions that adequately assess physicians' attitudes toward performing procedures on newly deceased patients.

FIGURE 1. Results of 3 hypothetical assays for serum creatinine. The true value of the sample is 1.2 mg/dL, as measured by a laboratory reference standard. The data points obtained by 10 trials of each of 3 assays are indicated by an X. The first assay is both reliable and valid; the second is reliable but not valid; and the third is valid but not reliable.

These 3 examples demonstrate both the utility and the limitations of reliability and validity assessment. A number of statistical analyses are used to assess test reliability, which is inherently quantifiable. Although validity is central to interpretation and application of a test's results, assessment of this quality frequently poses difficulty and is often unmeasurable. In the following discussion, "test" refers to any chemical assay, written examination, or questionnaire that attempts to measure a variable. The variable may be quantitative or qualitative and may be a physical characteristic, behavior, belief, or attitude.

TESTS OF CORRELATION

Both reliability and validity are fundamentally measures of the strength of the association, or correlation, between different variables. Reliability is the correlation between results obtained on repeated administrations of a test, while validity is the correlation between the test and a reference standard. Tests of correlation yield a coefficient that can be interpreted as either the reliability or the validity coefficient, depending on the association under investigation.
The correlation coefficient ranges from -1 to 1, and its absolute magnitude reflects the strength of the relationship between the 2 variables. The sign of the coefficient indicates whether the variables increase together (positive correlation) or whether one increases while the other declines (negative correlation). The significance of a test of correlation is determined not only by the value of the correlation coefficient, but also by the associated p-value of the correlation.

Pearson Product-Moment Correlation: When test results are continuous variables and are drawn from a normally distributed population, the Pearson product-moment correlation coefficient (r) may be used to measure the strength of association between the results (Appendix A). This test further requires that the relationship between the 2 variables be linear. The value is defined as:

r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}

Example 1 describes a hypothetical bedside test for creatinine. Table 1 provides sample data for a validity study in which 10 serum samples are assayed using 2 experimental assays and a reference standard. The association between these experimental assays and the reference test is shown in Figure 2, and the correlations, representing the validity coefficients, are calculated using the Pearson r. While both assays correlate strongly with the true value of serum creatinine, assay 2 consistently yields results 1.5 times greater than the reference standard. If one were to look only at r to determine the validity of the test, one would be misled into believing assay 2 to be superior to assay 1.

FIGURE 2. Results of a validity test of 2 experimental assays for serum creatinine compared with a reference standard (x-axis: reference assay creatinine, mg/dL), using the Pearson product-moment correlation. Although assay B (circles; r = .99, p < .00001, slope = 1.5) is more highly correlated with the reference standard than is assay A (diamonds; r = .96, r² = .92, p < .0001, slope = 1.0), the results of assay B are consistently 1.5 times greater than the reference standard.

Spearman's Rank Correlation: Many tests, particularly questionnaires, yield ordinal rather than interval or continuous data. In example 2, for instance, the attending physician rating each resident's conscious sedation skills may be asked to score those skills on a scale of 1 to 3, with 1 being unacceptable, 2 representing marginal competence, and 3 indicating satisfactory performance. Despite the ratings' being assigned numerical values, a score of 2 does not represent a value twice as great as a score of 1, and the data cannot be analyzed using the Pearson r. To determine the reliability of tests using such ordinal data, Spearman's rank correlation coefficient (r_s) can be calculated (Appendix B). This test is also appropriately used when data are nonparametric. To calculate r_s, the results of the 2 tests are ranked in descending order; the Pearson r is then calculated between the respective ranks, rather than between the data values. The resulting r_s can be interpreted similarly to the Pearson r.

Limitations of Correlation Tests: Tests of correlation assume that the association between the 2 variables is linear. If the relationship is nonlinear, the resulting r may be low despite a strong association between variables (although this is rarely relevant in assessment of test validity and reliability). Furthermore, there may be clinically important relationships between variables that nevertheless yield low r values. For example, it may be irrelevant to the clinician whether a diagnostic test is accurate beyond a certain range, yet inaccuracy at the extremes of measurement may result in low r values.
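To make this pitfall concrete, the sketch below (not from the article; the sample values and function names are invented for illustration) computes the Pearson r, and the Spearman r_s via ranks, for a simulated assay that always reads exactly 1.5 times the reference value, much like assay B:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman rank correlation: the Pearson r computed on the ranks
    (assumes no tied values, for simplicity)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    return pearson_r(ranks(x), ranks(y))

# Hypothetical reference creatinine values (mg/dL) and a biased assay
# that always reads 1.5 times the true value, like assay B in Figure 2.
reference = [0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4]
biased_assay = [1.5 * v for v in reference]

print(round(pearson_r(reference, biased_assay), 4))   # 1.0: the bias is invisible to r
print(round(spearman_rs(reference, biased_assay), 4))
```

Both coefficients are a perfect 1.0 here even though every result is 50% too high, which is exactly why a high r alone cannot establish validity.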
Because of these limitations, one should always visually examine a scatter plot of the data when interpreting the significance of a correlation test. A critical limitation of correlation tests is that a strong linear relationship between 2 variables is not synonymous with strong agreement between the 2 measures. Systematic errors in measurement, or bias, may result in high correlation between a test and its reference standard, yet the results from the 2 measures may differ widely. An example of test bias would be the creatinine assay that consistently yields results 1.5 times those of the reference assay, as demonstrated by assay B in Table 1 and Figure 2. Although the correlation between this assay and the reference standard in this example is excellent, the test is clinically useless and the Pearson r is an inappropriate measure of test validity. An additional limitation of correlation tests is that statistical significance may be achieved when the correlation between 2 measures would be considered poor for practical purposes. When large numbers of subjects are studied, even very low correlations may achieve statistical significance. With 10 subjects, an r of 0.65 is significant at p < 0.05, while an r of 0.36 is significant at this level when 30 subjects are studied. Few would consider tests so poorly correlated with their reference assays to be clinically relevant, and for this reason r values of at least 0.80 (and usually much higher) are considered necessary.

INTRACLASS CORRELATION

Although the Pearson and Spearman correlation tests are commonly used to assess the reliability and validity of measurements, it has been suggested that the preceding limitations make them unsuitable for this purpose. A more useful method of comparing 2 measures involves analysis of the differences between results obtained by each test.
Measures with excellent agreement have small intertest differences, while those with poor agreement or with systematic errors in measurement have larger intertest differences.
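This difference-based analysis can be sketched in a few lines. The data and helper names below are invented for illustration; the sketch computes the intertest differences, a mean ± 2 SD agreement range, and an intraclass correlation coefficient in one published sums-and-differences form (sample SDs are an assumption, since the variance form is not fully specified here):

```python
import math

def mean(v):
    return sum(v) / len(v)

def sample_sd(v):
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

def agreement_limits(pairs):
    """Mean intertest difference plus/minus 2 SD of the differences."""
    diffs = [a - b for a, b in pairs]
    d, sd = mean(diffs), sample_sd(diffs)
    return d - 2 * sd, d + 2 * sd

def intraclass_r(pairs):
    """Intraclass correlation from paired scores, using a
    sums-and-differences form (sample SDs assumed)."""
    n = len(pairs)
    s_s = sample_sd([a + b for a, b in pairs])   # SD of pairwise sums
    s_d = sample_sd([a - b for a, b in pairs])   # SD of pairwise differences
    d_bar = mean([a - b for a, b in pairs])      # mean pairwise difference
    return (s_s ** 2 - s_d ** 2) / (
        s_s ** 2 + s_d ** 2 + (2.0 / n) * (n * d_bar ** 2 - s_d ** 2))

# Hypothetical creatinine pairs (reference, experimental), mg/dL.
reference = [0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4]
unbiased = [(v, v + e) for v, e in zip(
    reference, [0.02, -0.03, 0.01, 0.00, -0.02, 0.03, -0.01])]
biased = [(v, 1.5 * v) for v in reference]

print(round(intraclass_r(unbiased), 3))  # near 1: close agreement
print(round(intraclass_r(biased), 3))    # markedly lower: systematic bias is penalized
```

Unlike the Pearson r, which is 1.0 for the biased pairs, the intraclass coefficient drops well below 1 when one measure is systematically shifted.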

Difference assessment may be performed by plotting the intertest score differences against the mean score of the 2 tests (Fig. 3). When the 2 creatinine assays are analyzed using this technique, it becomes obvious that the results from assay B differ systematically from those of the reference test and that this difference becomes greater as the mean creatinine value increases. Assay A differs little from the reference assay and is shown to be the better test despite its inferior r. Once the data have been graphically analyzed and systematic bias has been ruled out, test utility can be assessed by calculating the mean and standard deviation (SD) of the intertest differences. Assuming a normal distribution, 95% of the intertest differences would be expected to fall in the range of the mean ± 2 SD. If differences in measurement of up to 2 SD between an experimental test and reference assay are acceptable for clinical purposes, then the test could be considered useful.

FIGURE 3. Analysis of test validity by assessment of intertest differences. The differences between the creatinine values obtained by a reference assay and 2 experimental tests are plotted against the mean value of the test results (x-axis: mean of reference and experimental assays, mg/dL). Assay A (diamonds) shows little difference between its results and those of the reference standard. Assay B (circles) shows greater differences with the reference standard; furthermore, the differences grow as the measured value of creatinine increases.

Intraclass Correlation Coefficient: Assessment of intertest differences and difference variance can be quantified and extended to any case where each subject receives several test scores. The term intraclass correlation refers to measurement of correspondence between ≥2 different methods used to assess the same individual (class). The intraclass correlation coefficient (r_I) (Appendix C) is defined as:

r_I = \frac{s_s^2 - s_d^2}{s_s^2 + s_d^2 + (2/n)(n\bar{d}^2 - s_d^2)}

where s_s and s_d are the SDs of the sums and differences, respectively, of each pair of measurements, \bar{d} is the mean difference of each pair of measurements, and n is the number of subjects studied. Unlike the Pearson and Spearman tests of correlation, systematic errors in measurement are reflected by a reduction in the value of r_I. The test is not immune to such biases, however, and is inherently influenced by intersubject variability.

TABLE 1. Sample Data Set for Assay of Serum Creatinine (columns: Sample No., Reference Standard, Assay 1, Assay 2).

TESTS OF CORRESPONDENCE FOR CATEGORICAL VARIABLES

Many tests produce categorical results, in which subjects select a single answer from a list of descriptive and mutually exclusive options. If there is a logical arrangement by magnitude among the responses (such as a point scale to indicate agreement with a test statement), the data may be regarded as ordinal and analyzed using Spearman's rank correlation test. Tests of knowledge, however, typically yield answers that are either correct or incorrect, while opinion surveys often have inherently nonordered multiple-choice responses. Such tests cannot be analyzed for reliability or validity using the above tests of correlation for continuous or ordinal data. The statistical tools commonly used to analyze interrater agreement are highly applicable to studies of reliability and validity. Rather than comparing the results obtained from 2 independent raters, these analyses can

calculate the association between the results of a test and a retest of the same subjects (in a reliability study) or between the results of an experimental test and a reference criterion (in a validity study). The simplest example of categorical data analysis involves assessment of 2 independent tests yielding dichotomous (correct/incorrect) outcomes. The results of this test can be arranged as shown in Table 2.

TABLE 2. Assessment of Categorical Data Agreement

                    Test 2
Test 1        Correct   Incorrect   Total
Correct          a          b        a+b
Incorrect        c          d        c+d
Total           a+c        b+d        n

Percentage Agreement: One way to assess correspondence between the 2 tests is to calculate how often the results agree and express this as a percentage of the total number of subjects. In this example, (a + d)/n represents the fraction of subjects achieving the same result in each of the tests. While straightforward and conceptually appealing, percentage agreement is rarely used to report test correspondence because it does not account for the number of times intertest agreement would occur by chance.

Kappa Statistic: Analogous to the χ² statistic, the Cohen kappa (κ) is used to calculate the extent of agreement, beyond that expected by chance alone, between 2 tests, each with at least 2 possible categorical outcomes (Appendix D). The statistic is defined as:

\kappa = \frac{p_o - p_e}{1 - p_e}

where p_o represents the observed agreement and p_e the expected agreement by chance. κ ranges from 0, indicating no agreement beyond that expected from chance, to 1, indicating complete agreement. In general, κ scores >0.75 represent excellent agreement between tests, while values <0.40 indicate poor agreement. Some tests have an inherent order between the categorical results. Resident intubation skills may be judged by the faculty as unacceptable, marginal, or competent. Alternatively, responses to an opinion questionnaire may include 5 choices ranging from strongly agree to strongly disagree.
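For the 2 × 2 layout of Table 2, both percentage agreement and the unweighted κ are straightforward to compute. The counts below are hypothetical, invented for this worked sketch:

```python
def percent_agreement(a, b, c, d):
    """Fraction of subjects with the same result on both tests."""
    return (a + d) / (a + b + c + d)

def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2 x 2 table laid out as in Table 2:
    a = correct on both tests, d = incorrect on both,
    b and c = the two kinds of disagreement."""
    n = a + b + c + d
    p_o = (a + d) / n  # observed agreement
    # Expected chance agreement, from the marginal totals:
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical test-retest counts: 40 subjects correct both times,
# 10 incorrect both times, and 10 discordant results.
a, b, c, d = 40, 5, 5, 10
print(round(percent_agreement(a, b, c, d), 3))  # 0.833
print(round(cohen_kappa(a, b, c, d), 3))        # 0.556
```

The raw agreement of 83% overstates the correspondence: once chance agreement is removed, κ ≈ 0.56, which by the thresholds above is neither excellent nor poor.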
When assessing intertest agreement of such measures, it may be desirable to distinguish results that have a wide disparity from those in which the responses differ by only 1 category. The weighted κ statistic yields a value that is greater when responses correspond more closely, with the highest scores being assigned to exact agreement. The value of the weighted κ is interpreted similarly to that of the unweighted κ statistic.

MEASUREMENT OF RELIABILITY

Test-Retest Method: The most straightforward means of determining the reliability of a test is to administer the test on ≥2 occasions to the same subjects. A highly reliable test would show a great degree of correspondence in results between the test and the retest given at a later date. This test-retest technique is useful in determining the reliability of purely objective tests, such as the creatinine assay illustrated in example 1. Although this approach has significant limitations when applied to other types of measurements, the concepts underlying this technique serve as the foundation for understanding reliability measurement using a variety of other methods. The reliability coefficient is the value of the r between 2 test administrations. The test is said to be reliable if correlation exists at p < 0.05, but as mentioned above, scores of at least 0.80 are considered desirable. The cause of variation in test scores can be expressed in terms of the square of the r: an r² of 0.90 means that 90% of the variability in scores is due to true differences in the characteristic being measured, while 10% of the variability is due to nonsystematic error within the testing method. Although useful for an objective test such as a chemical assay, the test-retest technique is problematic when applied to tests of knowledge, behavior, beliefs, and attitudes.
It is often difficult, if not impossible, to administer a test multiple times to the same individuals because of time or budgetary constraints, or because the study population can never be assembled more than once. Alternatively, a researcher may wish to assess a characteristic at a specific point in time, making retesting counterproductive. The greatest disadvantage of the test-retest method, however, is that the first administration of a test may influence the results of the second administration. In tests of achievement, the subject may learn information or skills by taking the test, and the acquired knowledge is reflected in improved scores on the second test administration. Attitudinal questionnaires may be affected by subjects' recalling the questions from the first test and repeating responses from memory on the retest. An additional problem is that of reactivity: the act of measuring a person's attitudes may lead to increased awareness of the phenomenon being studied, which may induce a change in the person's attitudes on retesting. Each of these factors leads to errors in measuring test reliability. Learning and memory will elevate the test-retest correlation, while reactivity causes the correlation to be lower than it would be otherwise. Only tests that are unaffected by repetition can accurately be assessed using the test-retest technique. Because of the practical and theoretical problems associated with this method, a number of other techniques for determining test reliability have been developed (Table 3). The most useful of these allow reliability to be determined from a single test administration.

Alternative-form Method: A refinement of the test-retest technique involves administration of similar, but not identical, tests on 2 separate occasions. The 2 forms of the test must be parallel, i.e., as close as possible in content and not differing in any systematic way. The alternative-form technique is used extensively in assessing the reliability of educational tests, where the test and the alternate-form test are typically administered 2 weeks apart. While the technique minimizes the problem of response memory, it shares many of the disadvantages of the test-retest method, including learning and reactivity. The practical limitations of administering a test on 2 occasions also remain.

Split-halves Method: The split-halves technique essentially administers 2 alternate forms of a test or questionnaire simultaneously. The test is divided into 2 parallel halves: typically, all even-numbered questions are designated as representing one half-test and all odd-numbered questions as representing the other half-test. Scores from each half-test are determined and correlated as though each represented a distinct test. The reliability coefficient using this method needs to be corrected by the following equation, known as the Spearman-Brown formula:

\text{reliability coefficient} = \frac{2 r_{hh}}{1 + r_{hh}}

where r_{hh} represents the correlation between the 2 half-tests. The split-halves technique offers the advantage of allowing a reliability coefficient to be calculated from a single test administration, eliminating the problem of the retest's being contaminated by the first test administration.
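The Spearman-Brown correction is a one-liner; the half-test correlation in the sketch below is hypothetical. The correction is needed because each half is only half the length of the full instrument, so the raw half-test correlation understates the full test's reliability:

```python
def spearman_brown(r_hh):
    """Project full-test reliability from the correlation r_hh
    between the two half-tests."""
    return 2 * r_hh / (1 + r_hh)

# A hypothetical half-test correlation of 0.60 corresponds to a
# corrected full-length reliability of 0.75.
print(round(spearman_brown(0.60), 2))  # 0.75
```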
However, there are many ways to arbitrarily split a test into half-tests, and each split will result in a distinct reliability coefficient. The differences can be large if the number of questions is relatively small or if the questions are less than perfectly equivalent. It is impossible to determine a single true reliability coefficient of tests analyzed by this technique.

Interitem Consistency Method: Rather than calculating the reliability between arbitrary half-tests, a single reliability coefficient can be calculated that is equivalent to the mean of the reliability coefficients of every possible half-test split. This technique is known as interitem consistency analysis and determines the correspondence among all equivalent test questions (Appendix E).

TABLE 3. Techniques for Determining Reliability
Test-retest. Advantages: straightforward, intuitively appealing. Disadvantages: effects of memory, learning, and reactivity confound reliability assessment; may be impractical or impossible to administer.
Alternative-form. Advantage: minimizes effect of memory. Disadvantages: learning and reactivity confound reliability assessment; may be impractical or impossible to administer; need to develop parallel tests.
Split-halves. Advantages: single test administration; minimizes effects of memory, learning, and reactivity. Disadvantages: need to develop parallel tests; unable to determine true reliability given multiple possible splits.
Interitem consistency. Advantages: single test administration; no arbitrary splitting of tests; single reliability score. Disadvantage: requires homogeneity between test questions.

The most popular and generalizable method for determining interitem consistency is Cronbach's α, defined as:

\alpha = \frac{N}{N-1}\left(1 - \frac{\sum \sigma^2(Y_i)}{\sigma_x^2}\right)

where N is the number of test items, Σσ²(Y_i) is the sum of the item variances, and σ_x² is the total variance. α ranges from 0 to 1.0, with 1.0 representing perfect interitem correspondence and 0 representing no correspondence.
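Cronbach's α as defined above can be sketched in a few lines. The item scores below are invented for illustration, and population variances are assumed (the variance form is an assumption on my part):

```python
def cronbach_alpha(items):
    """Cronbach's alpha.  `items` is one list of scores per question,
    all aligned by subject; population variances are used throughout."""
    def variance(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / len(v)
    n_items = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-subject total score
    item_var_sum = sum(variance(item) for item in items)
    return (n_items / (n_items - 1)) * (1 - item_var_sum / variance(totals))

# Three hypothetical 5-point questions answered by 5 subjects, with
# each subject responding fairly consistently across the questions.
items = [
    [4, 3, 5, 2, 4],
    [4, 3, 4, 2, 5],
    [5, 3, 4, 2, 4],
]
print(round(cronbach_alpha(items), 3))  # 0.926: high interitem consistency
```

Because the three items track each other closely for every subject, the item variances are small relative to the variance of the total scores and α comes out well above the 0.80 threshold discussed next.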
An α of at least 0.80 is considered desirable. A statistic similar to α can be calculated when answers consist of dichotomous variables, such as correct/incorrect responses. Known as the Kuder-Richardson formula, the reliability coefficient (KR20) is calculated as:

\mathrm{KR20} = \frac{N}{N-1}\left(1 - \frac{\sum p_i q_i}{\sigma_x^2}\right)

where N is the number of questions, p_i is the proportion responding correctly to the ith question, q_i = 1 - p_i, and σ_x² is the total variance. For a given degree of interitem correlation, α and KR20 increase markedly as the number of questions increases. All else being equal, therefore, reliability increases as more questions are added to assess a particular characteristic. However, the gain from adding further questions becomes marginal when more than about 8 questions are posed. Furthermore, if the additional questions are not truly equivalent, the interitem correlation may drop as further questions are added, reducing the test's reliability.

Applications and Limitations of Interitem Consistency: Interitem correlation provides an accurate method of determining the reliability of a test that is administered on a single occasion, without requiring arbitrary splitting into half-tests. Because it relies on the correlation of every item in the test with every other item, this technique is most useful when there is a great degree of homogeneity between the test questions. This lends itself well to educational tests, where the content is well defined and numerous equivalent questions can be developed to assess each specific characteristic. Before applying interitem correlation to a given test, one must be certain that there is a reasonable degree of homogeneity between the test questions. In determining the reliability of the conscious sedation examination described in example 2, the underlying assumptions are that each question on the test is a valid measure of the physician's true ability in performing the procedure, and that a subject would be expected to answer all questions consistently. In opinion questionnaires, multiple questions must be developed to address each parameter being measured, and interitem correlation can be determined among questions purported to measure the same characteristic. Reliability cannot be determined among questions that attempt to measure different characteristics. While it may be logical to ask questions related to similar topics in the same questionnaire, reliability must be calculated separately for each group of questions (each group representing an independent test of a characteristic). While determining the correlation between responses to heterogeneous questions may make for interesting study conclusions, it does not reflect the reliability of the test.

SUMMARY
Assessment of test reliability and validity is often complex. Although tests of correlation are frequently used to measure intertest agreement, such indexes measure only the strength of the linear relationship between variables and may not provide an accurate assessment of the correspondence between test results. Inspection of intertest differences, either visually or using the r_I, may provide a better indicator of the correspondence between test results and accounts for measurement biases. Strength of association between categorical variables can be measured using related tests such as the κ statistic. Test reliability may be assessed by retesting, but this is not practical in many cases when subject memory or learning may confound the results of repeated examinations. Several methods exist for determining reliability from a single test administration and for assessing the correspondence between answers to homogeneous test questions. In the continuation article (Part B) on this subject, the concept and assessment of validity will be examined in more detail, and techniques for maximizing the reliability and validity of questionnaires will be discussed.

SUGGESTED READINGS

1. Agresti A. Modeling patterns of agreement and disagreement. Stat Methods Med Res. 1992;1.
2. Anastasi A. Psychological Testing, 5th ed. New York: Macmillan Publishing, 1982.
3. Armitage P, Berry G. Statistical Methods in Medical Research, 3rd ed. Oxford, UK: Blackwell Scientific, 1994, pp 273-6.
4. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;i.
5. Carmines EG, Zeller RA. Reliability and Validity Assessment. Newbury Park, CA: Sage Publications, 1979.
6. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16.
7. Dunn G. Design and analysis of reliability studies. Stat Methods Med Res. 1992;1.
8. Fleiss JL. Statistical Methods for Rates and Proportions, 2nd ed. New York: John Wiley & Sons, 1981.
9. Kraemer HC. Measurement of reliability for categorical data in medical research. Stat Methods Med Res. 1992;1.

Key words: statistics; statistical methodology; study design; reliability; validity.

APPENDIX A. Pearson Product-Moment Correlation
Alternative names and related methods: Pearson r.
Data: continuous variables; linear relationship between variables; parametric data.
Principal results: r; range -1 to 1.
Advantages: easily computed measure of linear association.
Disadvantages: ignores measurement bias; low correlations may be statistically significant; may fail to reveal nonlinear associations.

APPENDIX B. Spearman's Rank Correlation
Alternative names and related methods: Spearman's r (r_s).
Data: interval or ordinal variables; linear relationship between variables; parametric or nonparametric data.
Principal results: r_s; range -1 to 1.
Advantages: measures linear association among ordinal or nonparametric data.
Disadvantages: ignores measurement bias; low correlations may be statistically significant; may fail to reveal nonlinear associations.

APPENDIX C. Intraclass Correlation
Alternative names and related methods: concordance correlation (r_I); analysis of differences.
Data: continuous variables; no assumptions.
Principal results: r_I; range -1 to 1.
Advantages: detects measurement bias; does not require linear association.
Disadvantages: influenced to some degree by measurement bias and between-subject variability.

APPENDIX D. Kappa Measure of Agreement
Alternative names and related methods: Cohen κ; weighted κ.
Data: categorical variables (including dichotomous results); no assumptions.
Principal results: κ; range 0 to 1.
Advantages: detects measurement bias; weighted κ accounts for degree of disagreement when results have inherent order.
Disadvantages: may be inappropriate when one test is more accurate than the other.

APPENDIX E. Interitem Consistency Calculation
Alternative names and related methods: internal correlation (consistency); Cronbach's α; Kuder-Richardson (KR20).
Data: continuous variables, ordinal variables, dichotomous variables (KR20); homogeneous test questions.
Principal results: α, KR20; range 0.0 to 1.0.
Advantages: determines reliability of a test with a single test administration.
Disadvantages: requires development of multiple equivalent questions.


More information

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4. Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

Survey research (Lecture 1)

Survey research (Lecture 1) Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

Measures of Clinical Significance

Measures of Clinical Significance CLINICIANS GUIDE TO RESEARCH METHODS AND STATISTICS Deputy Editor: Robert J. Harmon, M.D. Measures of Clinical Significance HELENA CHMURA KRAEMER, PH.D., GEORGE A. MORGAN, PH.D., NANCY L. LEECH, PH.D.,

More information

SURVEY TOPIC INVOLVEMENT AND NONRESPONSE BIAS 1

SURVEY TOPIC INVOLVEMENT AND NONRESPONSE BIAS 1 SURVEY TOPIC INVOLVEMENT AND NONRESPONSE BIAS 1 Brian A. Kojetin (BLS), Eugene Borgida and Mark Snyder (University of Minnesota) Brian A. Kojetin, Bureau of Labor Statistics, 2 Massachusetts Ave. N.E.,

More information