Statistical Analysis of Method Comparison Data


Statistical Analysis of Method Comparison Data: Testing Normality

GEORGE S. CEMBROWSKI, PH.D., JAMES O. WESTGARD, PH.D., WILLIAM J. CONOVER, PH.D., AND ERIC C. TOREN, JR., PH.D.

Departments of Pathology and Medicine and the Clinical Laboratories, University of Wisconsin, Center for Health Sciences, Madison, Wisconsin

Cembrowski, George S., Westgard, James O., Conover, William J., and Toren, Eric C., Jr.: Statistical analysis of method comparison data. Testing normality. Am J Clin Pathol 72: 21-26, 1979. A Lilliefors test of normality has been applied to data from precision and accuracy studies. Most data sets tested as non-normal. Simulation studies showed that the test is extremely sensitive to the rounded, narrowly distributed data that are typical of method performance studies in clinical chemistry. The Lilliefors test can be modified to be applicable to rounded data so that it gives fewer indications of non-normality. The authors conclude that the selection of a test of normality requires careful study of the properties of the test. Otherwise, the subsequent choice between parametric and nonparametric statistics may not be meaningful. (Key words: Method comparison studies; Statistics; Tests of normality; Lilliefors test; Kolmogorov-Smirnov test; Nonparametric statistics.)

THE ACCEPTABILITY of a laboratory method depends in part on the method's precision and accuracy. These measures are derived primarily from the statistical analysis of replication data and method-comparison data. The appropriateness of using either a parametric or a nonparametric statistical approach depends on the frequency distribution of the data. Parametric tests such as the t test, the F test, the Pearson product moment correlation coefficient, and regression analysis assume that distributions are normal, or gaussian. Nonparametric tests such as the sign test, Wilcoxon's test, and Spearman's rank correlation coefficient make no assumption of normality and can be applied to normal and non-normal data alike.

Wu and associates [11] recommended a form of the Lilliefors [5,7] statistical test for testing whether clinical laboratory data have a gaussian, or normal, frequency distribution. This test of normality, a specialized Kolmogorov test [8], was used to determine whether parametric or nonparametric technics should be employed for the analysis of method comparison data. An example illustrating the calculation of the Lilliefors test was included by Wu and associates [11]. Recently the use of this test has been recommended in a continuing education publication of ASCP [1]. Further applications are expected to appear in the clinical pathology literature.

We have studied the use of the recommended method for testing normality and have reservations about its general application to clinical chemical data. We have confirmed the observation of Wu and colleagues that the differences between paired results of patient specimens analyzed by two methods generally demonstrate non-normality when evaluated by the recommended test. We tested method-comparison differences for 12 commonly measured constituents, using data from studies comparing the Technicon SMAC and SMA 12/60 analyzers* and the DuPont ACA.† In addition, we tested precision data from lyophilized pools, which were analyzed by these instruments, all operating in routine service. These data also tested predominantly non-normal.

These observations about the non-normal distribution of data from clinical chemical performance studies assume that the chosen test of normality has appropriate sensitivity for the type of data being tested. There are many tests for normality available, each having its own properties [9]. We present here some investigations which evaluate the sensitivity of the Lilliefors test to rounded, narrowly distributed data that are typical of the performance studies for automated instruments and well-correlated chemical methodologies.

Received December 27, 1977; received revised manuscript and accepted for publication May 25, 1978.
Supported by Grants No. GM 978 and GN 24453 from the National Institutes of Health, Grant No. GP-43625X from the National Science Foundation, by a General Research Support Sub-Grant from the University of Wisconsin Medical School, a computing grant from the Graduate School, and by the Clinical Laboratories, University of Wisconsin Hospitals, Madison, Wisconsin.
Presented in part at the Twenty-eighth National Meeting of the American Association of Clinical Chemists, Houston, Texas, August 1-6, 1976.
Address reprint requests to Dr. Cembrowski: Department of Pathology, University of Wisconsin Hospitals, Clinical Science Center, 600 Highland Avenue, Madison, Wisconsin 53792.
* Technicon Corporation, Tarrytown, New York 10591.
† E. I. DuPont de Nemours and Co., Wilmington, Delaware 19898.
0002-9173/79/0700/0021 $00.80 © American Society of Clinical Pathologists

Sensitivity of the Lilliefors Test

For study purposes, we assumed the simplest case, where the differences between the test method and the reference method are the result of only the random error in the test method. If the test method had normally distributed errors, then the between-method differences would also be normally distributed. We therefore studied precision data, even though Wu and associates proposed the test for examining the differences between methods in a patient comparison study, rather than for testing the precision of an individual method.

A computer was used to generate normally distributed data whose means and standard deviations were similar to those of control observations produced by each test channel of the SMA 12/60. Twelve series of 800 simulated normally distributed control results (one series for each test of the SMA 12/60) were produced by the scaling and subsequent rounding of computer-generated random normally distributed numbers. Table 1 shows the tests, their averages and standard deviations, and the significant figures to which the test results were rounded. Sample sizes of […], 40, […], […], and 800 were tested for normality at the α = 0.05 significance level. As shown in Table 1, most of these normally distributed data sets tested as non-normal, especially data sets that had a large sample size or a narrow distribution with few concentration intervals. This suggested that the Lilliefors test was either overly sensitive or inappropriate for the rounded, narrowly distributed data that are typical of method performance studies in clinical chemistry.

Table 1. Computer-simulated Normally Distributed SMA 12 Control Data, Means, and Standard Deviations for a Test Sample of 800, Normality α = 0.05, as Tested by Wu and Associates [11]
Column headings: Test*; Mean; Standard Deviation; Figure to Which Test Results Were Rounded; Accepted as Normal (n = […], n = 40, n = […], n = […], n = 800)
Tests: Calcium (mg/dl); Phosphorus (mg/dl); Glucose (mg/dl); BUN (mg/dl); Uric acid (mg/dl); Cholesterol (mg/dl); Total protein (g/dl); Albumin (g/dl); Total bilirubin (mg/dl); ALP (/dl); LD (/); AST (/)
Mean: 6.43 3.73 65.9 46. 5.70 30.9 4.3 2.6 .44 8.2 52. 68.6
Standard Deviation: 0.8 95 3.32 .5 88 3.3 7 86 0.406 9.0 3.7
* Abbreviations used: BUN = blood urea nitrogen; LD = lactate dehydrogenase (L-lactate: NAD oxidoreductase, EC 1.1.1.27); ALP = alkaline phosphatase (orthophosphoric acid monoester phosphohydrolase, EC 3.1.3.1); AST = serum glutamic oxaloacetic transaminase (L-aspartate: 2-oxoglutarate aminotransferase, EC 2.6.1.1).

Rationale for a Modified Lilliefors Test of Normality

In the application of the Lilliefors test, the data's empirical cumulative distribution is compared with the normal cumulative distribution function. The distribution is classified as non-normal when the maximum vertical distance between the two functions exceeds the Lilliefors test statistic, which is tabulated by sample size (n) and confidence coefficient (p), where 1 - p = α, the significance level for the test [5].

As an example, Figure 1 shows a histogram of the first […] values of the simulated blood urea nitrogen (BUN) test data. The empirical distribution function of these data is presented in Figure 2. The empirical distribution function and the theoretical cumulative normal distribution function are compared in Figure 3.

FIG. 1. Frequency histogram of simulated blood urea nitrogen values (abscissa: BUN, mg/dl). X̄ = 46.1 mg/dl, SD = 1.02 mg/dl.

FIG. 2. Empirical distribution function of data of Figure 1 (abscissa: BUN, mg/dl; see text).

For each test value, the empirical distribution function is defined as the fraction of test results that are less than or equal to that test value. Because the BUN data have been rounded to the nearest integer, the fraction of test results that are less than a specific integral test result cannot be determined. For example, any test result ranging from 45.5 to 46.5 mg/dl will be rounded to 46. Therefore, the fraction of test results below 46 cannot be ascertained. However, the fraction of test results less than 45.5 or 46.5 mg/dl can easily be calculated. As shown in Figure 2, the value of the empirical distribution function for a BUN value of 46.5 mg/dl is 0.65, i.e., 65% of the BUN data are lower than 46.5 mg/dl.
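The point about rounded data can be made concrete with a minimal sketch. It assumes integer-rounded results and evaluates the empirical distribution function only at the upper class boundaries (each reported value plus half the rounding step), which is where the text says the fraction can be determined. The function name `ecdf_at_upper_boundaries` and the sample values are illustrative and do not come from the paper.

```python
from collections import Counter


def ecdf_at_upper_boundaries(values, lsd):
    """Return (boundary, fraction) pairs for rounded data.

    For each distinct reported value v, the boundary is v + 0.5 * lsd and the
    fraction is the share of results that are less than that boundary.
    `lsd` is the rounding step (1 for results rounded to the nearest integer).
    """
    counts = Counter(values)
    n = len(values)
    cumulative = 0
    points = []
    for value in sorted(counts):
        cumulative += counts[value]
        points.append((value + 0.5 * lsd, cumulative / n))
    return points


if __name__ == "__main__":
    # Hypothetical BUN results (mg/dl), already rounded to the nearest integer.
    bun = [44, 45, 45, 46, 46, 46, 46, 47, 47, 48]
    for boundary, fraction in ecdf_at_upper_boundaries(bun, lsd=1):
        print(f"fraction of results below {boundary} mg/dl: {fraction:.2f}")
```

A result reported as 46 could lie anywhere from 45.5 to 46.5 mg/dl, so the sketch never reports a fraction "below 46"; it reports fractions only at 45.5, 46.5, and so on.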
In order to compare the normal cumulative distribution function and the empirical distribution function, the test values are normalized by subtracting their mean and dividing by their standard deviation. This transformation does not change the shape of the empirical distribution curve; the test results are merely centered around zero. The position of each test value on the abscissa corresponds to the difference in standard deviations between the mean and that test value. In Figure 3, where the empirical distribution function (with normalized BUN values) and the normal cumulative distribution function (obtained from standard statistical tables [8]) are superimposed, the maximum vertical distance between the functions indicates the degree of non-normality.

With the use of the algorithm of Wu and associates [11], the vertical distances between the two functions are measured at each of the test values. Thus in Figure 3 the maximum vertical distance is D1, which has a value of 0.[…]. This must be compared with the test statistic, which is […] for n = […] and α = 0.01. Since D1 exceeds the test statistic, the distribution is classified as non-normal, with only a 1% probability that this could occur due to chance.

FIG. 3. Normalized empirical distribution function with the normal cumulative distribution function superimposed (abscissa: normalized BUN values). The vertical line at the normalized value of 0 on the X axis corresponds to the mean value of approximately 46, as shown in the above figures. Using the algorithm of Wu and associates, D1 is the maximum vertical distance between the empirical distribution function and the normal cumulative distribution function. D2 is the maximum vertical distance when measured at midpoints between consecutive test values, rather than at the test values.
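The two distances named in the caption of Figure 3 can be written compactly. This is a sketch in notation that is not the authors': $F_n$ is the empirical distribution function, $\Phi$ the standard normal cumulative distribution function, $x_{(1)} \le \dots \le x_{(n)}$ the ordered results, and $x^{*}_{j}$ the distinct reported values. The expression for $D_1$ is one reading of "measured at each of the test values" (one distance per data point); the expression for $D_2$ uses the half-step shift of the least significant digit (LSD) given in the Appendix.

```latex
z_i = \frac{x_{(i)} - \bar{X}}{SD}, \qquad
D_1 = \max_{1 \le i \le n} \left| \frac{i}{n} - \Phi(z_i) \right|, \qquad
D_2 = \max_{j} \left| F_n\!\left(x^{*}_{j}\right)
      - \Phi\!\left( \frac{x^{*}_{j} - \bar{X}}{SD} + \frac{0.5\,\mathrm{LSD}}{SD} \right) \right|
```

With heavily tied data, the points inside a run of identical values all share one $z_i$ while $i/n$ keeps climbing, which is why $D_1$ can be large even when the underlying distribution is normal.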
Because the empirical distribution function is not rigorously defined at each of the test values, the comparison of the empirical distribution function and the normal cumulative distribution function at each of these test values is not appropriate. The empirical distribution function can be determined only at points on the abscissa corresponding to midpoints between two successive test values. The maximum vertical distance measured at the midpoints is […] (D2), which is considerably smaller than […], the test statistic for α = 0.01.

The Appendix describes an algorithm for a modified Lilliefors test that takes into consideration the rounding of data. There are two differences between the modified algorithm and that of Wu and associates [11]. First, distances between the empirical distribution function and the normal cumulative distribution function are measured at points midway between successive test values and not at the test values. Second, only one distance is measured for each set of identical test points, whereas the algorithm cited by Wu and associates measured as many distances as there were points.

Characteristics of the Modified Lilliefors Test

Tests of the Kolmogorov type, when properly applied to discrete data, are conservative. This means that when the test classifies a method as non-normal, α is actually lower than what is being specified. The true probability that the maximum vertical distance will exceed the test statistic will be lower than the significance level of that test statistic [3,4]. For example, the Lilliefors test statistic is […] for n = […] and α = 0.01. Given a set of normal discrete data, the probability that the maximum vertical distance will exceed […] will actually be less than 0.01. Conover [4] has proposed a Kolmogorov test for discontinuous distributions in which the exact test statistic can be calculated. The calculations, however, become difficult with a sample size greater than 30.

To determine the degree of conservativeness of the proposed modified Lilliefors test, normally distributed SMA 12/60 data were simulated, grouped in various sample sizes, and tested. At least […] different groups of data were simulated at each sample size. Estimates of the probability that a normal set of data would test as non-normal were obtained from the fractions of non-normal groups. The simulation results are presented in Table 2.

Table 2. Probabilities That the Lilliefors Test Statistic Is Exceeded for Certain Significance Levels (α)*
Column headings: Test; Number in Group (n); Probability Test Statistic Exceeded, for each of five significance levels (α)
Calcium: 0 5 32 2 8
Phosphorus: 0 07 5 38 07
Glucose: 0 5 28 38 48 40 2 38 30 3 22 09
Blood urea nitrogen: 0
Uric acid: 0 2 8 07
Cholesterol: 0 46 68 62 88 98 3 52 48 68 86 23 39 22 46 56 3 8
Total protein: 0
Albumin: 0
Total bilirubin: 0
Alkaline phosphatase: 0 3 48 38 22 28 28
Lactate dehydrogenase: 0 32 40 74 86 96 2 23 52 60 76 2 36 36 52 2 o:o28
Serum glutamic oxaloacetic transaminase: 0 32 40 46 7 22 30 40 2 8 28 09
* For n = […] and n = […], 1,000 sets of normally distributed data were generated at each n. For n = […], […], and […], […] sets were generated.

The empirically derived probabilities are considerably lower than those expected with a continuous distribution, especially for those tests that produce very few different test values (cf. standard deviations and least significant digits, Table 1). These tests include total protein, albumin, and total bilirubin.
The probability that a set of data is found to be non-normal is greater for tests that produce more intervals in a histogram, e.g., cholesterol, the enzymes, and glucose. These results indicate that the modified Lilliefors test is extremely conservative.

To investigate the sensitivity of the method to outliers, groups of normal simulated SMA 12/60 data with single outliers were tested for normality. Outliers at 3, 4, and 5 standard deviations (SD) from the mean had no effect on the result of normality for groups as small as n = […]. Normal distributions of SD = […] and n = […] tested as non-normal with outliers more than 6 SD away from the mean. Outliers at 8 SD resulted in groups as large as n = […] testing as non-normal.

The computer-simulated normally distributed data of Table 1 all tested as normal with the use of the modified Lilliefors test. Application of the test to the somatotropin difference data of Figure 1 of Wu and associates [11] showed that the data previously classified as non-normal were also normally distributed. The data cited in Table 4 of Wu and associates [11] were reanalyzed using the modified algorithm. Two of the three previously non-normal sets were shown to be non-normal at lesser significance levels, α = 0.05 instead of α = 0.01. The data from Volume 17, page 79 (1971), contained an extreme outlier that, when removed, could not be proved as non-normal (α = 0.05). The other sets of data were normal.

Discussion

These investigations show that the choice of a test of normality requires careful study. One test may be overly sensitive and another very conservative. The choice of the test essentially determines whether the performance data test as normal or non-normal. For precision and accuracy studies of clinical chemical methods, the test proposed by Wu and associates will indicate a high frequency of non-normality, even with normally distributed data. The test as modified here will give a lower frequency of non-normality. The selection at present depends primarily on the point of view of the investigator. One who favors nonparametric statistics may select the more sensitive test, and one who favors parametric statistics, the less sensitive test. An objective choice of a test of normality requires first that studies be made to determine the nature of error distributions for clinical analytic methods and how the distribution affects the interpretation of the performance study data.
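The contrast between the two tests on rounded normal data can be explored with a small Monte Carlo sketch. Several things here are assumptions rather than the authors' procedure: the "at the test values" variant measures one distance per data point (one reading of the algorithm attributed to Wu and associates); the critical value uses Lilliefors' large-sample approximation 0.886/√n for α = 0.05 and n > 30 (from Lilliefors' 1967 paper, not from this text); and the BUN-like parameters (mean 46.1, SD 1.02, rounded to integers) are only illustrative.

```python
import math
import random
from statistics import NormalDist, mean, stdev

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def max_distance(data, lsd, modified):
    """Largest |normal CDF - empirical CDF| for rounded data.

    modified=False: one distance per data point, measured at the reported value
                    (assumed reading of the unmodified algorithm).
    modified=True:  one distance per distinct value, measured half a rounding
                    step above the reported value.
    """
    data = sorted(data)
    n = len(data)
    x_bar, sd = mean(data), stdev(data)
    d_max = 0.0
    for i, x in enumerate(data, start=1):
        if modified:
            if i < n and data[i] == x:
                continue  # stop only at the last member of a run of ties
            z = (x + 0.5 * lsd - x_bar) / sd
        else:
            z = (x - x_bar) / sd
        d_max = max(d_max, abs(PHI(z) - i / n))
    return d_max


def rejection_rate(n, trials, mu, sigma, lsd, modified, rng):
    """Fraction of simulated normal samples classified as non-normal."""
    critical = 0.886 / math.sqrt(n)  # assumed approximation, alpha = 0.05, n > 30
    rejected = 0
    for _ in range(trials):
        sample = [round(rng.gauss(mu, sigma) / lsd) * lsd for _ in range(n)]
        if max_distance(sample, lsd, modified) > critical:
            rejected += 1
    return rejected / trials


if __name__ == "__main__":
    for modified in (False, True):
        rate = rejection_rate(100, 200, 46.1, 1.02, 1, modified, random.Random(1))
        label = "modified" if modified else "at test values"
        print(f"{label}: fraction classified non-normal = {rate:.2f}")
```

Under these assumptions the unmodified variant should flag nearly every rounded normal sample, while the modified variant should flag very few, in line with the qualitative pattern the text reports.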

One other major consideration in the statistical approach is whether the judgment of performance should be based on statistical significance or clinical significance. Barnett [2] has long advocated the need for consideration of the clinical significance of laboratory results. The present application of nonparametric tests disregards this and considers only whether the observed differences are statistically significant. Small differences between methods may be tolerable, even when they are statistically significant. It is obvious that a systematic error or bias of 1 mg/dl for a glucose method is of no concern, even though it may be statistically significant. Judgments of the acceptability of a method's performance should require that estimates of errors be statistically reliable but that the acceptability of the error be judged relative to the clinical demands on the test. The present recommendations for the use of nonparametric tests do not take such elements into account. Guidelines for proper application are lacking, and decisions about a method's performance are likely to be more confusing, rather than more objective.

References

1. Arthur GL, Rawnsley HM: Statistical Analysis of Method Comparison Studies. Advanced Clinical Chemistry Check Sample. ACC-23. CCE Council on Clinical Chemistry, American Society of Clinical Pathologists, 1977
2. Barnett RN: Medical significance of laboratory results. Am J Clin Pathol 50:671-676, 1968
3. Bradley JV: Distribution-Free Statistical Tests. Englewood Cliffs, N. J., Prentice-Hall, Inc., 1968, pp 302-303
4. Conover WJ: A Kolmogorov goodness-of-fit test for discontinuous distributions. J Am Statist Assoc 67:591-596, 1972
5. Conover WJ: Practical Nonparametric Statistics. New York, John Wiley and Sons, 1971, pp 30[…]-306, p 398 (Table 5)
6. Kolmogorov A: Confidence limits for an unknown distribution function. Ann Math Statist 12:461-463, 1941
7. Lilliefors HW: On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J Am Statist Assoc 62:399-402, 1967
8. Natrella MG: Experimental Statistics. National Bureau of Standards Handbook 91. Washington, D. C., U. S. Government Printing Office, 1963, Table A-1, Cumulative normal distribution values of P, p T2
9. Shapiro SS, Wilk MB, Chen HJ: A comparative study of various tests for normality. J Am Statist Assoc 63:1343-1372, 1968
10. Westgard JO, de Vos DJ, Hunt MR, et al: Concepts and practices in the evaluation of clinical chemistry methods. Part III. Statistics. Part IV. Decisions on acceptability. Am J Med Technol 44:552-570, 727-742, 1978
11. Wu GT, Twomey SL, Thiers RE: Statistical evaluation of method comparison data. Clin Chem 21:315-320, 1975

APPENDIX

Implementation of the Modified Lilliefors Test

Arrange the data in ascending order. Calculate the mean (X̄) and the standard deviation (SD). Let the value of the maximum vertical distance be zero. Start at the beginning of the test results and move sequentially, stopping just before each different test result to do steps 1-4:

1. Normalize the present test value (X_i) by subtracting the mean and dividing by the SD. The result is:

$$\frac{X_i - \bar{X}}{SD}$$

2. Add to this quantity half of the interval between two consecutive normalized test results. This step corresponds to adding half of the least significant digit (LSD) divided by the SD and gives:

$$\frac{X_i - \bar{X}}{SD} + \frac{0.5\,\mathrm{LSD}}{SD}$$

With the use of statistical tables, look up the normal cumulative distribution for this quantity.

3. Calculate the value of the empirical distribution function for the present test result. This is simply the position of the current test value divided by the total number of points.

4. Calculate the absolute difference between the normal cumulative distribution and the empirical distribution functions. If the maximum vertical distance is less than the new absolute value, reassign its value to the value of the new absolute distance.

When all the results have been processed, compare the derived maximum absolute distance with the appropriate Lilliefors test statistic. Whenever the statistic is exceeded, the population can be said to be non-normally distributed.

Use of the modified Lilliefors test is illustrated in Table 3, the initial data being differences of patient comparison results measured by two different analytic methods. The least significant digit is the number to which the data have been rounded, which is 0.1 for these data. For n = 17 and α = 0.05 the Lilliefors test statistic is 0.2[…]. The maximum absolute distance is 0.220, found at X_i = 0.5, i = 15. Because this exceeds the Lilliefors test statistic, the data may be classified as non-normal at the α = 0.05 significance level.

Table 3. Example of Modified Lilliefors Test Calculation

X_i     (X_i - X̄)/SD    (X_i - X̄)/SD + (0.5 LSD*)/SD    NCDF†    EDF‡     |NCDF - EDF|
-2.4    -1.853           -1.814                            0.035    0.059    0.024
-1.2    -0.929           -0.890                            0.187    0.118    0.069
-1.0    -0.775           -0.736                            0.231    0.176    0.054
-0.9    -0.698           -0.660                            0.255    0.235    0.020
-0.7    -0.544           -0.506                            0.307    0.294    0.013
-0.1    -0.082           -0.043                            0.483    0.353    0.130
 0.0
 0.0
 0.0
 0.0    -0.005            0.034                            0.514    0.588    0.074
 0.1     0.072            0.111                            0.544    0.647    0.103
 0.2     0.149            0.188                            0.574    0.706    0.131
 0.3
 0.3     0.226            0.264                            0.604    0.824    0.219
 0.5     0.380            0.418                            0.662    0.882    0.220
 1.0     0.765            0.804                            0.789    0.941    0.152
 4.0     3.075            3.114                            0.999    1.000    0.001

n = 17, X̄ = 0.006, SD = 1.299.
* Least significant digit.
† NCDF = normal cumulative distribution function.
‡ EDF = empirical distribution function.
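The four appendix steps can be sketched in a few lines of Python. Several choices are assumptions: the SD is taken as the sample SD (n - 1 denominator), the normal cumulative distribution comes from `statistics.NormalDist` rather than printed tables, the function name is invented for illustration, and the 17 example differences are reconstructed from the surviving columns of Table 3 (they reproduce its n, SD, and EDF values). The critical value is left to the caller, to be read from a published Lilliefors table.

```python
from statistics import NormalDist, mean, stdev


def modified_lilliefors_distance(values, lsd):
    """Return (d_max, x_at_max) for rounded data, following the appendix steps.

    `lsd` is the least significant digit, i.e. the step to which the data
    were rounded (0.1 for data reported to one decimal place).
    """
    data = sorted(values)                  # arrange in ascending order
    n = len(data)
    x_bar = mean(data)
    sd = stdev(data)                       # sample SD; an assumption
    phi = NormalDist().cdf
    d_max, x_at_max = 0.0, None            # maximum vertical distance starts at zero
    for i, x in enumerate(data, start=1):
        if i < n and data[i] == x:
            continue                       # stop only just before a different result
        z_upper = (x - x_bar) / sd + 0.5 * lsd / sd   # steps 1 and 2
        edf = i / n                                   # step 3: position / n
        distance = abs(phi(z_upper) - edf)            # step 4
        if d_max < distance:
            d_max, x_at_max = distance, x
    return d_max, x_at_max


if __name__ == "__main__":
    # Differences reconstructed from Table 3 (an assumption, see lead-in).
    differences = [-2.4, -1.2, -1.0, -0.9, -0.7, -0.1, 0.0, 0.0, 0.0, 0.0,
                   0.1, 0.2, 0.3, 0.3, 0.5, 1.0, 4.0]
    d_max, x_at_max = modified_lilliefors_distance(differences, lsd=0.1)
    critical = 0.206  # assumed tabulated statistic for n = 17, alpha = 0.05; confirm in a published table
    print(f"maximum distance {d_max:.3f} at X = {x_at_max}")
    print("non-normal" if d_max > critical else "not shown to be non-normal")
```

Passing the critical value in from a table, rather than hard-coding one, keeps the sketch usable for any n and α; the single extreme difference of 4.0 inflates the SD and largely drives the result for this data set.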