A Clinical Evaluation of Various Delta Check Methods


CLIN. CHEM. 27/1, 5-9 (1981)

Lawrence A. Wheeler1 and Lewis B. Sheiner2

To evaluate the performance of delta check techniques, we analyzed 707 unselected pairs of continuous-flow test results, using three different delta check methods. If any of the test results (plus the urea nitrogen/creatinine ratio and the anion gap) failed one of the checks, the reason for the failure was sought by examining subsequent test results, retesting specimens, and (or) reviewing the patient's chart. Each delta check failure was accordingly classified as a true or false positive. The percentage of positives we judged to be true positives ranged from 5 to 29%. Each of the three methods had test types with low and high percentages of true positives. We conclude that delta check methods can detect errors that would otherwise be overlooked, but at the cost of investigating many false positives, because, in the population we studied, disease processes or therapy often caused large changes in a series of test results for a patient.

The concept of using prior test results from a patient to determine whether a newly obtained test result is likely to be in error ("delta checks") is very attractive (1-4). First, it is a direct approach in which the test results of interest are themselves evaluated, rather than an indirect method such as traditional quality-control techniques. The latter methods evaluate only the performance of the test procedure on a quality-control specimen; errors in specimen identification, test performance, and test-result reporting for the clinical specimen cannot be detected. Second, the magnitude of the delta can be so chosen that the delta check will always fail when the change, if it is not artifactual, is clinically important. This process will alert laboratory personnel to test results that, if incorrect, could result in inappropriate therapy.
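A delta check of this kind reduces to comparing the change between a patient's consecutive results against fixed limits. The sketch below illustrates the idea; the limit values are hypothetical illustrations and are not taken from any of the three methods evaluated in this paper.

```python
# Minimal delta check: flag a new result when the change from the patient's
# previous result falls outside an allowed range. The limits used below are
# hypothetical illustrations only.

def delta_check_fails(previous, current, lower, upper):
    """Return True if the delta (current - previous) lies outside [lower, upper]."""
    delta = current - previous
    return delta < lower or delta > upper

# A potassium result rising from 4.0 to 6.0 mmol/l, with illustrative limits
# of -1.0 and +1.0 mmol/l, would fail the check and be held for follow-up.
print(delta_check_fails(4.0, 6.0, -1.0, 1.0))  # -> True
```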
If an effective procedure is implemented to follow up delta check failures, two important advantages are realized. Many or all (depending on the extent of the follow-up procedure) incorrect test results (where the error has resulted in a test result that differs significantly from a previous value) can be detected and not released. In addition, those test results that represent actual clinically important changes and pass the follow-up procedure for a laboratory delta check failure can be so indicated on the test-results report, thus alerting the clinician to clinically important changes and increasing his or her confidence that these changes are not simply laboratory errors. This should eliminate some unnecessary retesting and allow appropriate clinical steps to be taken more promptly. These potential benefits of delta check techniques have prompted proposals for their adoption by many groups, including the College of American Pathologists, which in its Inspection and Accreditation Program specifies the absence of delta check techniques to be a Phase I (minor) deficiency for laboratories with laboratory computer systems. Unfortunately, while the concept of delta checking has been accepted, no clinical trial has tested the relative effectiveness of the delta check methods that have been proposed for clinical chemistry tests. Our purpose was to evaluate the three currently proposed delta check procedures (1-3) that are applicable to some or all of the SMA 6 continuous-flow analysis results.

1 Department of Pathology, Indiana University, Indianapolis, IN 46223.
2 Department of Medicine, Division of Clinical Pharmacology, and Department of Laboratory Medicine, University of California, San Francisco, CA 94143.
Received July 14, 1980; accepted Sept. 5, 1980.
This evaluation involved using subsequent test results, repeat determinations, and chart review to classify delta check failures into true positives (errors made in specimen identification, test performance, or test-result reporting) and false positives (changes ascribable to physiological responses to disease or therapy).

Materials and Methods

The test results used in this study were collected with the Community Health Computing clinical laboratory computer system of the California Medical Center. For five consecutive days (Monday-Friday), all of the SMA 6 (Technicon Instruments Corp., Tarrytown, NY 10591) tests done during the morning hours were evaluated by using the delta check methods of Ladenson (1), Whitehurst et al. (2), and Wheeler and Sheiner (3). The Wheeler-Sheiner method uses points on two probability density functions of delta values: one probability density function was obtained when the two results used to form the delta were 0.9 to 1.5 days apart, the other when they were 1.5 to 2.5 days apart. To allow this method to be evaluated, we included in the study only those SMA 6 results for which another set of SMA 6 results had been obtained 0.9 to 2.5 days previously. In addition, the values from the probability density functions that corresponded to the 0.05 and 0.95 points were used in the study (i.e., nominally 10% of any particular group of test results could be expected to fail the Wheeler-Sheiner delta check). We designed an algorithm to classify each test result that failed a delta check as a true or false positive. A test result that fails a delta check because of an actual change in the patient's analyte value was defined to be a false positive.
A positive that was due to any other reason was defined as a true positive (i.e., an error was made in identifying the specimen in the patient-care area, a specimen-identification error had occurred in the laboratory, the SMA 6 had malfunctioned, or the test result was incorrectly entered into the laboratory computer). Note that this algorithm operates at the individual test (e.g., Na+) level. If two or more test results from a specimen failed delta checks, each was independently classified as a true or false positive. We believed that having laboratory personnel immediately collect another specimen and re-run it on the SMA 6 would be the best approach to determine whether or not a test result that failed a delta check was an error, although in some cases even this process might not yield definitive evidence. The first step in the algorithm (see Figure 1) is an approximation of this method. In the discussion that follows, the previous test result that was compared with the current test result will be designated TR1; the current test result, TR2; and a subsequent test result, obtained within 24 h, TR3. If a TR3 was available, an arbitrary rule was used to judge whether it indicated that TR2 represented an actual change in the patient's serum, not an error: if TR3 was nearer to TR2
than to TR1, TR2 was considered to be correct and the delta check failure was specified to be a false positive. If the TR3 was closer to TR1 than to TR2, or if a TR3 was not obtained, the second step in the algorithm was carried out. This step included performing a repeat SMA 6 determination on the TR2 specimen (when there was sufficient specimen); the result of this determination is designated TR2R. When the difference between TR2 and TR2R exceeded three times the standard deviation of the corresponding test method, a laboratory test-performance error was judged to have occurred and the delta check failure was designated a true positive. If TR2R validated TR2, or if sufficient specimen was not available to perform a repeat determination, the third step of the algorithm was performed: review of the patient's chart. This review focused on a search for the etiology of the delta. Examples of some clinical situations that we accepted as being the cause of large changes in SMA 6 test results are:
1. Renal dialysis done between the times the two specimens were drawn.
2. Potassium supplementation given (as a reason for an increase).
3. Recent renal transplant, as a reason for decreasing urea nitrogen and creatinine (normal pattern) or increasing urea nitrogen and creatinine (rejection).
4. Intravenous therapy with an electrolyte-containing fluid.

Fig. 1. Delta check evaluation algorithm

Because we investigated only those test results that failed one or more of the delta check methods, we cannot divide the test results that did not fail a particular delta check into true and false negatives with total accuracy.
However, in several cases a test result that did not fail one method did fail one or both of the other methods and was therefore investigated, so we can classify these results as true or false negatives. Further, if it is assumed that all medically significant changes in test-result values will be detected by one of the three methods, true- and false-negative percentages can be computed for all the test-result/delta-check combinations except the urea nitrogen/creatinine ratio and the anion gap, which were evaluated only by the Wheeler-Sheiner method. This assumption yields a lower bound for the false-negative percentage rates, because two classes of undetected test-result errors will not be included. The first is the small error (e.g., reporting a K+ result that was actually determined to be 3.2 mmol/l as 3.4 mmol/l). This clearly is an error, but it should have no impact on patient care. Another example of this type of error would be a switch of labels on specimens from two patients with normal values for electrolytes, urea nitrogen, and creatinine: all the results would be in error, but they might not differ sufficiently to trigger a delta check failure. The second type of test-result error that would not be included in the false-negative calculation is one in which the true test result differs greatly from the previous test result, but the erroneous test result is nearly the same. For example, yesterday's K+ result was 4.0 mmol/l and the true value for today's K+ is 6.0 mmol/l, but an error is made such that a value of 3.9 mmol/l is reported. Delta check methods are by definition unable to detect this type of error.

Fig. 2. Delta check evaluation algorithm results for the Wheeler-Sheiner delta check method applied to urea nitrogen test results
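The three-step classification procedure described above can be sketched in code. The function and argument names here are illustrative; the decision rules themselves (the nearest-result comparison for TR3, the three-standard-deviation criterion for TR2R, and chart review as the final step) are those given in the text.

```python
# Sketch of the three-step algorithm used to classify a delta check failure as
# a true or false positive (Figure 1). tr1 is the previous result, tr2 the
# current (flagged) result, tr3 a subsequent result within 24 h, and tr2r a
# repeat determination on the TR2 specimen; sd is the test-method standard
# deviation. Chart review is represented by a simple flag; in the study it was
# a manual search for a clinical explanation of the delta.

def classify_failure(tr1, tr2, tr3=None, tr2r=None, sd=None, reason_in_chart=None):
    # Step 1: if a subsequent result exists and is nearer TR2 than TR1,
    # the change was real, so the failure is a false positive.
    if tr3 is not None and abs(tr3 - tr2) < abs(tr3 - tr1):
        return "false positive"
    # Step 2: if a repeat determination differs from TR2 by more than three
    # test-method standard deviations, a performance error occurred.
    if tr2r is not None and sd is not None and abs(tr2 - tr2r) > 3 * sd:
        return "true positive"
    # Step 3: chart review; a clinical explanation makes the failure a false
    # positive, and the absence of one makes it a true positive.
    if reason_in_chart is None:
        return "no judgment made"
    return "false positive" if reason_in_chart else "true positive"

print(classify_failure(10.0, 30.0, tr3=28.0))  # -> false positive
```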
Despite the above problems, we believe that method-performance estimates based on this assumption are useful, because small errors are not of clinical importance and the second type of error should be relatively rare.

Results

A total of 707 sets of SMA 6 results (including the urea nitrogen/creatinine ratio and anion gap) were included in the study. Of these, 253 had had an SMA 6 analysis performed 0.9 to 2.5 days previously and therefore satisfied the criterion for delta check evaluation. Of these 253 sets of SMA 6 results, 150 (59%) had at least one test result that failed one or more of the three delta check methods. The algorithm described in Figure 1 was followed in all the cases. Figure 2 presents the results of this process for urea nitrogen test results that failed the Wheeler-Sheiner delta check. A total of 27 test results failed the Wheeler-Sheiner delta check. Of these, 17 were judged to be false positives because the result of a determination made on a specimen the next day (TR3) was nearer to TR2 than to TR1. Three results had TR3 values nearer to TR1 than to TR2, and in seven cases a urea nitrogen determination was not done the next day. The 10 specimens in these two latter groups were candidates for repeat tests (i.e., obtaining TR2R values). In one case the repeat determination was not done; the reason repeat determinations were not done was not recorded for each case, but it was usually that an insufficient volume of specimen was available. In two cases the TR2R value exceeded three test-method standard deviations from TR2. In these cases the repeat value was judged to indicate that the TR2 value was in error, and therefore these two cases were deemed to represent true positives (i.e., an error had taken place in the performance or reporting of the test yielding TR2). In seven cases the absolute magnitude of the difference between TR2 and TR2R was less than or equal to three test-method standard deviations. In the eight cases for which a judgment could not be made based on repeat test values, the third step in the algorithm (chart review) was performed, if possible. In one case the chart was not reviewed, because it could not be located at the time. In six cases a clinical reason for the change in the test results was found on chart review; these six cases were judged to represent false positives, i.e., an actual physiological change had occurred in the patient. In the remaining case we could find no reason for the change in the chart, and therefore this case was specified to be a true positive.

Table 1 presents the performance of the three delta check methods in this study. The true- and false-positive results were obtained by use of the algorithm described and illustrated above. The "judgment made" column lists the number of test results of each type that failed one of the delta checks and that we were unable to assign to one of the other classes: 13 of 193 (7%) of the Wheeler-Sheiner method values, nine of 91 (9%) of the Ladenson method values, and seven of 141 (5%) of the Whitehurst method values. The "predictive value" column presents data on the relative efficiency of the methods in practice. The values range from a low of 5% for K+ by the Whitehurst method to 29% for creatinine by the Wheeler-Sheiner method. The "error incidence rate" column presents information on the (inferred) error rate for the tests included in the study. The error rate ranged from 1.2% for Na+ to 4% for urea nitrogen.
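The two summary columns of Table 1 follow from simple proportions. As a check, the sketch below reproduces the urea nitrogen figures for the Wheeler-Sheiner method from the counts worked through above (three true positives and 23 false positives among the judged failures) together with the false-negative count of seven from Table 1, out of the 253 evaluated sets.

```python
# Predictive value and error incidence rate as reported in Table 1, computed
# here for urea nitrogen under the Wheeler-Sheiner method. Counts are from the
# worked example in the text (TP = 3, FP = 23, one failure with no judgment
# made, which is excluded) and Table 1 (FN = 7, out of 253 evaluated sets).

def predictive_value(tp, fp):
    # Percentage of judged delta check failures that were genuine errors.
    return 100.0 * tp / (tp + fp)

def error_incidence(tp, fn, total):
    # Percentage of all evaluated results that were actually in error.
    return 100.0 * (tp + fn) / total

print(round(predictive_value(3, 23)))        # -> 12, matching Table 1
print(round(error_incidence(3, 7, 253), 1))  # -> 4.0, matching Table 1
```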
We examined the values of the deltas that we judged to be true or false positives, to see whether different choices of the delta check limits would yield better performance. For example, it could have been the case that most or all of the deltas judged to be true positives were larger in magnitude than the false positives. We found that if the delta check limits were selected to be as large as possible in magnitude while still having all the true positives fail the delta check (e.g., for Na+ the three deltas judged to be true positives were 7, -11, and -11 mmol/l, and therefore the Na+ delta check limits were selected to be -10 and 6 mmol/l), the delta check performance was not greatly improved. The percentage of the deltas failing this delta check that corresponded to true positives ranged from 14% for K+ to 37% for creatinine. These results show that, for the population considered in this study, no adjustment in the delta check limits would have resulted in greatly reduced false-positive rates.

An important but little-discussed version of the delta check is to combine results of delta checks of several separate tests performed on any single specimen, to determine whether a specimen-identification error has occurred. Table 2 presents the data obtained in this study. Each array of combinations of true and false positives adds up to 150, the total number of specimens for which one or more test results failed one or more of the delta check methods. The entry under zero false positives and zero true positives represents the number of specimens that had no test results failing that delta check method. Not surprisingly, the number of specimens in this category is inversely related to the number of tests included in the delta check method. The Wheeler-Sheiner method has a total of 16 specimens with only true positives, five with both true and false positives, and 91 with only false positives.
If a specimen is judged to be a true positive when at least one true positive is detected for it, then the percentage of positive specimens with true positives is 19% (21/112). This means that in this study approximately one-fifth of the specimens that had one or more tests fail the Wheeler-Sheiner delta check were found to have at least one test result in error. By the Ladenson method the true-positive rate was 16% (11/70); by the Whitehurst method, 18% (16/90).

Discussion

We evaluated the performance of three delta check methods in clinical use by applying them to 707 sets of SMA 6 results and attempting to determine the etiology of each delta check failure (Figure 1). Examination of the column of Table 1 that gives the total number of tests failing the delta check tells us about the relative stringency of the delta check methods. In general, the three delta check methods prescribe a more complicated decision procedure than simply flagging whenever a maximum difference between the current value and the most recent previous value is exceeded. For example, for the serum urea nitrogen (UN) test, with UN1 the previous result, UN2 the current result, and delta = UN2 - UN1, the failure criteria used by the three methods are:

Method            Delta failure values
Wheeler-Sheiner   delta < -7 or delta > 3 (UN2 < 10); delta < -11 or delta > 8 (10 <= UN2 <= 59); delta < -38 or delta > 39 (UN2 > 59)
Ladenson          |UN1 - UN2| > 0.5 x UN1
Whitehurst        |UN1 - UN2| > 0.5 x UN2 (UN2 <= 25); |UN1 - UN2| > 0.25 x UN2 (UN2 > 25)

Clearly, the Whitehurst rule is the most stringent and the Ladenson rule the least stringent; the Wheeler-Sheiner rule is more complex and falls between these two extremes. This property of the rules is reflected by the total number of UN tests failing each method (Ladenson 10, Wheeler-Sheiner 25, and Whitehurst 56). Next consider the true-positive and false-positive columns of Table 1. These values indicate the yield in error detection of each of the test/delta-check-method combinations. The predictive value column of Table 1 gives the percentage of positives that we judged to be true positives.
The values range from very low values for K+ by the Whitehurst method (5%), Na+ by the Wheeler-Sheiner method (6%), K+ by the Wheeler-Sheiner method (6%), and urea nitrogen by the Ladenson method (6%) to relatively high values for chloride by the Whitehurst method (20%), bicarbonate by the Wheeler-Sheiner method (20%), the urea nitrogen/creatinine ratio by the Wheeler-Sheiner method (28%), and creatinine by the Wheeler-Sheiner method (29%). Note that even with the best of these, more than 70% of the delta check failures are false positives. This result is a consequence of the fact that in the patient population to which these methods were applied (i.e., patients for whom two SMA 6s were ordered within 2.5 days), large variations in these types of test results are commonly seen that are attributable to disease or therapy. These results indicate the price in follow-up of false positives that the laboratory will have to pay to detect errors, when it is possible to do so by using delta check methods. It is entirely possible that if our patient population had been composed predominantly of healthy people (e.g., marines taking periodic physical examinations) or of patients with relatively well-controlled diseases (e.g., medicine clinic patients), the results would have been different. In such populations the large changes in test values that we attributed to disease processes or therapy in the current study should be infrequent.

Table 1. Delta Check Methods: Performance Characteristics

Method | Test | No. tests failing | TP: TR2R indicates error | TP: no reason for delta in chart | FP: TR3 confirms TR2 | FP: reason in chart for delta | False neg. | True neg. | No judgment made | Predictive value, % | Error incidence rate, %
Wheeler-Sheiner | Na | 18 | 0 | 1 | 13 | 4 | 2 | 233 | 0 | 6 | 1.2
Wheeler-Sheiner | K | 35 | 0 | 2 | 24 | 6 | 4 | 214 | 3 | 6 | 2.4
Wheeler-Sheiner | Cl | 24 | 3 | 0 | 15 | 3 | 1 | 228 | 3 | 14 | 1.6
Wheeler-Sheiner | HCO3 | 25 | 3 | 2 | 14 | 5 | 0 | 228 | 1 | 21 | 2.0
Wheeler-Sheiner | UN | 27 | 2 | 1 | 17 | 6 | 7 | 219 | 1 | 12 | 4.0
Wheeler-Sheiner | Cr | 23 | 6 | 0 | 10 | 5 | 1 | 229 | 2 | 29 | 2.8
Wheeler-Sheiner | UN/Cr | 26 | 5 | 2 | 14 | 4 | a | 227 | 1 | a | a
Wheeler-Sheiner | Anion gap | 15 | 1 | 0 | 8 | 4 | a | 238 | 2 | a | a
Ladenson | Na | 14 | 1 | 1 | 10 | 2 | 1 | 238 | 0 | 14 | 1.2
Ladenson | K | 49 | 1 | 5 | 26 | 10 | 0 | 204 | 7 | 12 | 2.4
Ladenson | UN | 17 | 0 | 1 | 13 | 2 | 9 | 227 | 1 | 6 | 4.0
Ladenson | Cr | 11 | 2 | 0 | 8 | 0 | 5 | 237 | 1 | 20 | 2.8
Whitehurst | Na | 18 | 1 | 2 | 11 | 4 | 0 | 235 | 0 | 17 | 1.2
Whitehurst | K | 22 | 0 | 1 | 15 | 4 | 5 | 226 | 2 | 5 | 2.4
Whitehurst | Cl | 10 | 1 | 1 | 5 | 3 | 2 | 241 | 0 | 20 | 1.6
Whitehurst | HCO3 | 17 | 2 | 1 | 10 | 3 | 2 | 234 | 1 | 19 | 2.0
Whitehurst | UN | 60 | 5 | 5 | 36 | 10 | 0 | 193 | 4 | 18 | 4.0
Whitehurst | Cr | 14 | 2 | 0 | 11 | 1 | 5 | 234 | 0 | 14 | 2.8

a False negatives could not be detected for these test results; therefore these calculations were not performed.
TP, true positive; FP, false positive; Cr, creatinine; UN, serum urea nitrogen.

Examination of the true-positive and false-negative columns of Table 1 indicates that relatively few test results of each type that we evaluated were judged to be in error. The error incidence rate column gives the exact values. The range of error incidence, from 1.2 to 4.0%, is somewhat higher than laboratorians would expect to find, and may be due to our algorithm overdiagnosing changes as being due to errors rather than to physiology.

One approach to utilizing delta checks in a computerized laboratory would be to have a message that the delta check has failed provided to the technologist at the time the test result is recorded. The technologist would check for transcription errors at that time. If the technologist could not find a reason for the delta check failure, the result would be referred to a pathologist for review. What happened next would depend on the pathologist's judgment of the potential adverse effect of the test result's being in error. Follow-up actions could include repeating the test or calling the physician who ordered the test to determine whether the patient's clinical condition or therapy provided an explanation for the large change in the test-result value. If it is decided to release the test result, it should be marked with an appropriate symbol so that the clinician will be aware that it represents a large change, that it has been reviewed in the laboratory by a pathologist, and that the review process did not reveal that the test result was in error.

Table 2. Specimen-Level True-Positive (TP) and False-Positive (FP) Results

                 FP/specimen
TP/specimen    0    1    2    3    4    5
Wheeler-Sheiner method (3)
0             38   58   21    9    2    1
1             11    0    2    1    0    0
2              5    0    1    1    0    0
3              0    0    0    0    0    0
Ladenson method (1)
0             80   47   12    0    0    0
1             11    0    0    0    0    0
2              0    0    0    0    0    0
3              0    0    0    0    0    0
Whitehurst method (2)
0             60   51   13    7    2    0
1             10    1    3    0    0    0
2              2    0    0    0    0    0
3              1    0    0    0    0    0

We can consider the performance of delta check methods that require two or more test results to fail delta checks for the specimen to fail by examining Table 2. Interestingly, many specimens had more than one test fail a delta check. By the Wheeler-Sheiner method, 43 (38%) of the specimens that failed one delta check also failed two or more; the Whitehurst method had 43% (39/90) and the Ladenson method 17% (12/70). Consider a delta check rule specifying that a specimen be judged as failing if two or more test results fail delta checks. Using this rule with the Wheeler-Sheiner method results, we find that 23% (10/43) of our specimens fail this specimen-level delta check and include at least one test-result delta check failure that was a true positive. This is a slight improvement over the rule that one test is needed to fail a delta check; however, this rule missed 52% (11/21) of the specimens that had at least one true positive. The number of specimens that would have to be evaluated has been reduced from 112 to 43; on the other
hand, missing 11 of 21 specimens with erroneous test results is clearly undesirable. In a recent paper (4) we evaluated the performance of delta check methods by using a simulation approach and found, on using the Wheeler-Sheiner method as discussed above (i.e., a specimen fails if two or more of the eight test results fail delta checks), that the true-positive rate was 84% and the false-positive rate was 20%. Therefore, if the specimen errors in this study were of the same type studied previously (i.e., mislabeled specimens), we would expect 44 specimens to have failed the "at least two of eight" delta check (18 true positives and 26 false positives). In fact, 43 specimens failed (10 true positives and 33 false positives), most likely owing to the fact that many errors other than specimen mislabeling can occur. In general, specimen mislabeling causes all the test results to be in error, while other error sources, such as machine malfunction and test-result transcription errors, tend to cause only one to be in error. Therefore the latter errors will not be detected by a delta check method that requires two or more test results to fail individual delta checks. Note that 11 specimens with one true positive and no false positives are missed by using this rule; these specimens probably represent cases of the second type of error. In summary, we show that the three delta check methods as applied to individual tests can detect erroneous test results, but unfortunately deltas of similar magnitude occurred as a result of disease or therapy two to 15 times as often as those due to errors (for this group of tests and this patient population). This result limits the efficiency of delta check methods in this setting, because the major effort in delta-check-failure follow-up would be spent on false positives. Nevertheless, we believe that delta check methods can serve a useful role.
First, they detect errors that escape standard quality-control techniques, and of course a primary goal of a clinical laboratory is to eliminate erroneous results. Second, flagging test results that fail the delta check but then pass the laboratory's review procedure should increase the clinician's confidence in these results and decrease unnecessary repeat tests.

References

1. Ladenson, J. H., Patients as their own controls: Use of the computer to identify laboratory error. Clin. Chem. 21, 1648-1653 (1975).
2. Whitehurst, P., DeSilvio, T. V., and Boyadjian, G., Evaluation of discrepancies in patients' results: an aspect of computer-assisted quality control. Clin. Chem. 21, 87-92 (1975).
3. Wheeler, L. A., and Sheiner, L. B., Delta check tables for the Technicon SMA 6 continuous-flow analyzer. Clin. Chem. 23, 216-219 (1977).
4. Sheiner, L. B., Wheeler, L. A., and Moore, J. K., The performance of delta check methods. Clin. Chem. 25, 2034-2037 (1979).