CLIN. CHEM. 27/1, 5-9 (1981)

A Clinical Evaluation of Various Delta Check Methods

Lawrence A. Wheeler (1) and Lewis B. Sheiner (2)

To evaluate the performance of delta check techniques, we analyzed 707 unselected pairs of continuous-flow test results, using three different delta check methods. If any of the test results (plus the urea nitrogen/creatinine ratio and the anion gap) failed one of the checks, the reason for the failure was sought by examining subsequent test results, retesting specimens, and (or) reviewing the patient's chart. Each delta check failure was accordingly classified as a true or false positive. The percentage of positives we judged to be true positives ranged from 5 to 29%. Each of the three methods had test types with low and high percentages of true positives. We conclude that with delta check methods one can detect errors otherwise overlooked, but at the cost of investigating many false positives, because, in the population we studied, disease processes or therapy often caused large changes in a series of test results for a patient.

The concept of using prior test results from a patient to determine whether a newly obtained test result is likely to be in error ("delta checks") is very attractive (1-4). First, it is a direct approach, in which the test results of interest are themselves evaluated, unlike indirect methods such as traditional quality-control techniques. The latter evaluate only the performance of the test procedure on a quality-control specimen; errors in specimen identification, test performance, and test-result reporting for the clinical specimen cannot be detected. Second, the magnitude of the delta can be so chosen that the delta check will always fail when the change, if it is not artifactual, is clinically important. This process will alert laboratory personnel to test results that, if incorrect, could result in inappropriate therapy.
If an effective procedure is implemented to follow up delta check failures, two important advantages are realized. Many or all (depending on the extent of the follow-up procedure) incorrect test results (those in which the error has produced a result that differs significantly from a previous value) can be detected and not released. In addition, those test results that represent actual clinically important changes and pass the follow-up procedure for laboratory delta check failure can be so indicated on the test-results report, thus alerting the clinician to clinically important changes and increasing his or her confidence that these changes are not simply laboratory errors. This should eliminate some unnecessary retesting and allow appropriate clinical steps to be taken more promptly. These potential benefits of delta check techniques have prompted proposals for their adoption by many groups, including the College of American Pathologists, which in its Inspection and Accreditation Program specifies the absence of delta check techniques to be a Phase I (minor) deficiency for laboratories with laboratory computer systems. Unfortunately, while the concept of delta checking has been accepted, no clinical trial has tested the relative effectiveness of the delta check methods that have been proposed for clinical chemistry tests. Our purpose was to evaluate the three currently proposed delta check procedures (1-3) that are applicable to some or all of the SMA 6 continuous-flow analysis results.

(1) Department of Pathology, Indiana University, Indianapolis, IN 46223.
(2) Department of Medicine, Division of Clinical Pharmacology, and Department of Laboratory Medicine, University of California, San Francisco, CA 94143.
Received July 14, 1980; accepted Sept. 5, 1980.
This evaluation involved using subsequent test results, repeat determinations, and chart review to classify delta check failures into true positives (errors made in specimen identification, test performance, or test-result reporting) and false positives (changes ascribable to physiological responses to disease or therapy).

Materials and Methods

The test results used in this study were collected with the clinical laboratory computer system (Community Health Computing) of the California Medical Center. For five consecutive days (Monday-Friday), all of the SMA 6 (Technicon Instruments Corp., Tarrytown, NY 10591) tests done during the morning hours were evaluated by using the delta check methods of Ladenson (1), Whitehurst et al. (2), and Wheeler and Sheiner (3). The Wheeler-Sheiner method uses points on two probability density functions of delta values: one obtained when the two results used to form the delta were 0.9 to 1.5 days apart, the other when they were 1.5 to 2.5 days apart. To allow this method to be evaluated, we included in the study only those SMA 6 results for which another set of SMA 6 results had been obtained 0.9 to 2.5 days previously. In addition, the values from the probability density functions that corresponded to the 0.05 and 0.95 points were used in the study (i.e., nominally 10% of any particular group of test results could be expected to fail the Wheeler-Sheiner delta check).

We designed an algorithm to classify each test result that failed a delta check as a true or false positive. A test result that fails a delta check because of an actual change in the patient's analyte value was defined to be a false positive.
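The construction of such limits from an empirical delta distribution can be sketched as follows. The data and helper functions below are our own illustration, not the published tables of ref. 3:

```python
# Sketch: derive Wheeler-Sheiner-style delta check limits as the 0.05 and
# 0.95 percentile points of an empirical distribution of observed deltas,
# computed separately for each time interval between the paired results.
# All values below are hypothetical illustrations.

def percentile(sorted_vals, p):
    """Linear-interpolation percentile of a pre-sorted list (0 <= p <= 1)."""
    idx = p * (len(sorted_vals) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def delta_limits(deltas, low_p=0.05, high_p=0.95):
    """Return (lower, upper) delta check limits from observed deltas."""
    s = sorted(deltas)
    return percentile(s, low_p), percentile(s, high_p)

def fails_check(prev, curr, lower, upper):
    """A new result fails the check if its delta falls outside the limits."""
    d = curr - prev
    return d < lower or d > upper

# Hypothetical K deltas (mmol/l) for result pairs drawn 0.9-1.5 days apart:
deltas_1day = [-0.9, -0.5, -0.4, -0.2, -0.1, 0.0, 0.0, 0.1, 0.3, 0.4, 0.6, 1.0]
lo, hi = delta_limits(deltas_1day)
# Nominally 10% of results are expected to fall outside (lo, hi).
```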
A positive that was due to any other reason was defined as a true positive (i.e., an error was made in identifying the specimen in the patient-care area, a specimen-identification error had occurred in the laboratory, the SMA 6 had malfunctioned, or the test result was incorrectly entered into the laboratory computer). Note that this algorithm operates at the individual test (e.g., Na+) level: if two or more test results from a specimen failed delta checks, each was independently classified as a true or false positive.

We believed that having laboratory personnel immediately collect another specimen and re-run it on the SMA 6 would be the best approach to determine whether or not a test result that failed a delta check was an error, although in some cases even this process might not yield definitive evidence. The first step in the algorithm (see Figure 1) is an approximation of this method. In the discussion that follows, the previous test result that was compared with the current test result will be designated TR1; the current test result will be TR2; and a subsequent test result, obtained within 24 h, TR3. If a TR3 was available, an arbitrary rule was used to judge whether it indicated that TR2 represented an actual change in the patient's serum, not an error: if TR3 was nearer to TR2
Fig. 1. Delta check evaluation algorithm

than to TR1, TR2 was considered to be correct and the delta check failure was specified to be a false positive. If the TR3 was closer to TR1 than to TR2, or if a TR3 was not obtained, the second step in the algorithm was carried out. This step included performing a repeat SMA 6 determination on the TR2 specimen (when there was sufficient specimen). The result of this determination is designated TR2R. When the difference between TR2 and TR2R exceeded three times the standard deviation of the corresponding test method, a laboratory test-performance error was judged to have occurred and the delta check failure was designated a true positive. If TR2R validated TR2, or if sufficient specimen was not available to perform a repeat determination, the third step of the algorithm was performed: review of the patient's chart. This review focused on a search for the etiology of the delta. Examples of some clinical situations that we accepted as being the cause of large changes in SMA 6 test results are:
1. Renal dialysis done between the times the two specimens were drawn.
2. Potassium supplementation given (as a reason for a K+ increase).
3. Recent renal transplant, as a reason for decreasing urea nitrogen and creatinine (normal pattern) or increasing urea nitrogen and creatinine (rejection).
4. Intravenous therapy with an electrolyte-containing fluid.
Because we investigated only those test results that failed one or more of the delta check methods, we cannot divide the test results that did not fail a particular delta check into true and false negatives with total accuracy.
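The three-step classification of Figure 1 can be sketched as follows. The function and argument names are ours; `chart_reason` encodes whether chart review found a clinical explanation for the delta (None if the chart was unavailable):

```python
# Sketch of the Figure 1 classification algorithm. TR1 = previous result,
# TR2 = result that failed the delta check, TR3 = result obtained within
# the next 24 h (None if absent), TR2R = repeat determination on the TR2
# specimen (None if insufficient specimen), sd = standard deviation of the
# test method, chart_reason = True/False/None as found on chart review.

def classify_failure(tr1, tr2, tr3=None, tr2r=None, sd=None, chart_reason=None):
    """Return 'false positive', 'true positive', or 'no judgment'."""
    # Step 1: if TR3 is nearer to TR2 than to TR1, the change was real.
    if tr3 is not None and abs(tr3 - tr2) < abs(tr3 - tr1):
        return "false positive"
    # Step 2: repeat the determination; a discrepancy beyond three
    # test-method standard deviations indicates a performance error.
    if tr2r is not None and sd is not None and abs(tr2 - tr2r) > 3 * sd:
        return "true positive"
    # Step 3: chart review for a physiological or therapeutic explanation.
    if chart_reason is None:
        return "no judgment"
    return "false positive" if chart_reason else "true positive"
```

For example, a urea nitrogen jump from 20 to 60 mg/dL whose next-day value is 58 is classified a false positive (real change), while the same jump with a discrepant repeat determination is a true positive.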
However, in several cases a test result that did not fail one method did fail one or both of the other methods and was therefore investigated, so for the method it passed we can classify it as a true or false negative. Further, if it is assumed that all medically significant changes in test-result values will be detected by one of the three methods, true- and false-negative percentages can be computed for all the test/delta check combinations except the urea nitrogen/creatinine ratio and the anion gap, which were evaluated only by the Wheeler-Sheiner method. This assumption yields a lower bound for the false-negative percentage, because two classes of undetected test-result errors will not be included. The first is the small error (e.g., reporting a K result that was actually determined to be 3.2 mmol/l as 3.4 mmol/l). This clearly is an error, but it should have no impact on patient care.

Fig. 2. Delta check evaluation algorithm results for the Wheeler-Sheiner delta check method applied to urea nitrogen test results

Another example of this type of error would be a switch of labels on specimens from two patients with normal values for electrolytes, urea nitrogen, and creatinine. All the results would be in error, but they might not differ sufficiently to trigger a delta check failure. The second type of test-result error that would not be included in the false-negative calculation is one in which the true test result differs greatly from the previous test result, but the erroneous test result is nearly the same as the previous one. For example, yesterday's K+ result was 4.0 mmol/l and the true value for today's K+ is 6.0 mmol/l, but an error is made such that a value of 3.9 mmol/l is reported. Delta check methods are by definition unable to detect this type of error.
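The first class of undetected error can be made concrete with a label swap between two patients whose chemistries are too similar to trip any plausible limit. All values and limits below are hypothetical illustrations, not any published table:

```python
# Sketch: a specimen label swap between two patients with near-normal
# chemistries. Every reported value is wrong, yet each delta is small,
# so no delta check fires. Values and limits are hypothetical.

patient_a_prev = {"Na": 140, "K": 4.0, "Cl": 102, "HCO3": 26, "UN": 14, "Cr": 1.0}
patient_b_prev = {"Na": 138, "K": 4.2, "Cl": 100, "HCO3": 25, "UN": 16, "Cr": 1.1}

# Today the labels are swapped: patient A receives B's results.
reported_for_a = patient_b_prev

# Illustrative fixed delta limits (absolute change that triggers a failure):
illustrative_limits = {"Na": 6, "K": 0.8, "Cl": 5, "HCO3": 4, "UN": 8, "Cr": 0.4}

flags = {t: abs(reported_for_a[t] - patient_a_prev[t]) > illustrative_limits[t]
         for t in patient_a_prev}
# No test exceeds its limit, so the swap goes undetected.
```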
Despite the above problems, we believe that method-performance estimates based on this assumption are useful, because small errors are not of clinical importance and the second type of error should be relatively rare.

Results

A total of 707 sets of SMA 6 results (including the urea nitrogen/creatinine ratio and anion gap) were included in the study. Of these, 253 had an SMA 6 analysis performed 0.9 to 2.5 days previously and therefore satisfied the criterion for delta check evaluation. Of these 253 sets of SMA 6 results, 150 (59%) had at least one test result that failed one or more of the three delta check methods. The algorithm described in Figure 1 was followed in all cases.

Figure 2 presents the results of this process for urea nitrogen test results that failed the Wheeler-Sheiner delta check. A total of 27 test results failed this check. Of these, 17 were judged to be false positives because the result of a determination made on a specimen obtained the next day (TR3) was nearer to TR2 than to TR1. Three results had TR3 values nearer to TR1 than to TR2, and in seven cases a urea nitrogen determination was not done the next day. The 10 specimens in these two latter groups were candidates for repeat tests (i.e., for obtaining TR2R values). In one case the repeat determination was not done. The reason a repeat determination was not done was not recorded in each case; usually, however, an insufficient volume of specimen was available. In two cases the TR2R value differed from TR2 by more than three test-method standard deviations. In these cases the repeat value was judged to indicate that the TR2 value was in error, and these two cases were therefore deemed to represent true positives (i.e., an error had taken place in the performance or reporting of the test yielding TR2). In seven cases the absolute magnitude of the difference between TR2 and TR2R was less than or equal to three test-method standard deviations.

In the eight cases for which a judgment could not be made on the basis of repeat test values, the third step in the algorithm (chart review) was performed, if possible. In one case the chart was not reviewed, because it could not be located at the time. In six cases a clinical reason for the change in the test results was found on chart review; these six cases were judged to represent false positives, i.e., an actual physiological change had occurred in the patient. In the remaining case we could find no reason for the change in the chart, and this case was therefore specified to be a true positive.

Table 1 presents the performance of the three delta check methods in this study. The true- and false-positive results were obtained by use of the algorithm described and illustrated above. The "judgment made" column lists the number of test results of each type that failed one of the delta checks and that we were unable to assign to one of the other classes: 13 of 193 (7%) of the Wheeler-Sheiner method values, nine of 91 (9%) of the Ladenson method values, and seven of 141 (5%) of the Whitehurst method values. The "predictive value" column presents data on the relative efficiency of the methods in practice; the values range from a low of 5% for K by the Whitehurst method to 29% for creatinine by the Wheeler-Sheiner method. The "error incidence rate" column presents the (inferred) error rate for the tests included in the study; the error rate ranged from 1.2% for Na to 4% for urea nitrogen.
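The urea nitrogen arithmetic can be reproduced directly from the counts in the Figure 2 walk-through. The two formulas below are our reading of how the Table 1 columns were computed:

```python
# Reproduce the urea nitrogen (Wheeler-Sheiner) numbers from Figure 2 and
# Table 1. The two formulas are our interpretation of the table columns.

failing = 27          # UN results failing the Wheeler-Sheiner check
tp_tr2r = 2           # repeat determination (TR2R) indicated an error
tp_chart = 1          # no clinical reason for the delta found in the chart
fp_tr3 = 17           # next-day result (TR3) confirmed TR2
fp_chart = 6          # chart review found a clinical reason for the delta
no_judgment = 1       # chart unavailable; no judgment made
false_neg = 7         # inferred errors the check did not flag
evaluated = 253       # SMA 6 sets eligible for delta check evaluation

true_pos = tp_tr2r + tp_chart
assert true_pos + fp_tr3 + fp_chart + no_judgment == failing

# Predictive value: fraction of judged failures that were real errors.
predictive_value = 100 * true_pos / (failing - no_judgment)
# Error incidence: all inferred errors over all evaluated result sets.
error_incidence = 100 * (true_pos + false_neg) / evaluated
```

These reproduce the tabulated 12% predictive value and 4.0% error incidence for urea nitrogen.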
We examined the values of the deltas that we judged to be true or false positives, to see whether different choices of the delta check limits would yield better performance. For example, it could have been that most or all of the deltas judged to be true positives were larger in magnitude than the false positives. We found that if the delta check limits were selected to be as large as possible in magnitude while still having all the true positives fail the delta check (e.g., for Na the three deltas judged to be true positives were 7, -11, and -11 mmol/l, and therefore the Na delta check limits were selected to be -10 and 6 mmol/l), the delta check performance was not greatly improved: the percentage of the deltas failing such a check that corresponded to true positives ranged from 14% for K to 37% for creatinine. These results show that, for the population considered in this study, no adjustment of the delta check limits would have resulted in greatly reduced false-positive rates.

An important but little-discussed version of the delta check is to combine results of delta checks of several separate tests performed on a single specimen, to determine whether a specimen-identification error has occurred. Table 2 presents the data obtained in this study. Each array of combinations of true and false positives adds up to 150, the total number of specimens for which one or more test results failed one or more of the delta check methods. The entry under zero false positives and zero true positives represents the number of specimens that had no test results failing that delta check method. Not surprisingly, the number of specimens in this category is inversely related to the number of tests included in the delta check method. The Wheeler-Sheiner method has a total of 16 specimens with only true positives, five with both true and false positives, and 91 with only false positives.
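The retuning experiment described above (the widest limits that still catch every true positive) can be sketched as follows; the Na deltas are those reported in the text, and the helper function is ours:

```python
# Sketch of the retrospective limit-tuning experiment: given the deltas
# labeled true positive (error), pick the widest limits that still flag
# every true positive. The +/- 1 step assumes integer-valued deltas, as
# for the Na results quoted in the text.

def widest_limits(tp_deltas):
    """Widest (lower, upper) limits such that every TP delta still fails."""
    neg = [d for d in tp_deltas if d < 0]
    pos = [d for d in tp_deltas if d >= 0]
    lower = max(neg) + 1 if neg else float("-inf")  # just inside the TPs
    upper = min(pos) - 1 if pos else float("inf")
    return lower, upper

# Na true-positive deltas (mmol/l) reported in the study:
na_tp = [7, -11, -11]
lo, hi = widest_limits(na_tp)  # yields the limits chosen in the paper
```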
If a specimen is counted as a true positive when at least one true positive is detected for it, then the percentage of positive specimens with true positives is 19% (21/112). That is, in this study approximately one-fifth of the specimens that had one or more tests fail the Wheeler-Sheiner delta check were found to have at least one test result in error. By the Ladenson method the true-positive rate was 16% (11/70); by the Whitehurst method, 18% (16/90).

Discussion

We evaluated the performance of three delta check methods in clinical use by applying them to 707 sets of SMA 6 results and attempting to determine the etiology of each delta check failure (Figure 1). Examination of the column of Table 1 that gives the total number of tests failing the delta check tells us about the relative stringency of the delta check methods. In general, the three delta check methods prescribe a more complicated decision procedure than simply being positive whenever a maximum difference between the current value and the most recent previous value is exceeded. For example, for the serum urea nitrogen (UN) test, the rules used by the three methods are as follows (UN1 = previous result, UN2 = current result, Δ = UN2 - UN1):

Method            Range of UN2     Delta failure values
Wheeler-Sheiner   UN2 < 10         Δ < -7 or Δ > 3
                  10 <= UN2 <= 59  Δ < -11 or Δ > 8
                  UN2 > 59         Δ < -38 or Δ > 39
Ladenson          all values       |UN1 - UN2| > 0.5 UN1
Whitehurst        UN2 <= 25        |UN1 - UN2| > 0.5 UN2
                  UN2 > 25         |UN1 - UN2| > 0.25 UN2

Clearly the Whitehurst rule is the most stringent and the Ladenson rule the least stringent; the Wheeler-Sheiner rule is more complex and falls between these two extremes. This property of the rules is reflected by the total number of UN tests failing each method (Ladenson 10, Wheeler-Sheiner 25, and Whitehurst 56). Next consider the true-positive and false-positive columns of Table 1. These values indicate the yield in error detection of each of the test/delta check method combinations. The predictive value column of Table 1 gives the percentage of positives that we judged to be true positives.
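The three urea nitrogen rules can be compared side by side in code. This is our transcription of the published rules (the sign convention Δ = UN2 - UN1 and the exact range boundaries are our reading), with concentrations in mg/dL:

```python
# The three UN delta check rules as we read them (our transcription;
# delta = un2 - un1, previous and current UN in mg/dL).

def wheeler_sheiner_un(un1, un2):
    delta = un2 - un1
    if un2 < 10:
        return delta < -7 or delta > 3
    if un2 <= 59:
        return delta < -11 or delta > 8
    return delta < -38 or delta > 39

def ladenson_un(un1, un2):
    # Relative change exceeding 50% of the previous value.
    return abs(un1 - un2) > 0.5 * un1

def whitehurst_un(un1, un2):
    # 50% of the current value at low UN, 25% at higher UN.
    frac = 0.5 if un2 <= 25 else 0.25
    return abs(un1 - un2) > frac * un2
```

For a change from 20 to 28 mg/dL, only the Whitehurst rule fires, illustrating its greater stringency; a change from 20 to 40 fails all three.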
The values range from very low for K by the Whitehurst method (5%), Na by the Wheeler-Sheiner method (6%), K by the Wheeler-Sheiner method (6%), and urea nitrogen by the Ladenson method (6%) to relatively high for chloride by the Whitehurst method (20%), bicarbonate by the Wheeler-Sheiner method (20%), urea nitrogen/creatinine ratio by the Wheeler-Sheiner method (28%), and creatinine by the Wheeler-Sheiner method (29%). Note that even with the best of these, more than 70% of the delta check failures are false positives. This result is a consequence of the fact that in the patient population to which these methods were applied (i.e., patients for whom two SMA 6 analyses were ordered within 2.5 days), large variations in these types of test results, attributable to disease or therapy, are commonly seen. These results indicate the price in follow-up of false positives that the laboratory will have to pay to detect errors, when it is possible to do so, by using delta check methods. It is entirely possible that if our patient population had been composed predominantly of healthy people (e.g., marines undergoing periodic physical examinations) or of patients with relatively well-controlled diseases (e.g., medicine clinic patients), the results would have been different. In such populations the large changes in test values that we attributed to disease processes or therapy in the current study should be infrequent.

Table 1. Delta Check Methods: Performance Characteristics

Columns: number of tests failing the delta check; true positives (TR2R indicates error / no reason for delta in chart); false positives (TR3 confirms TR2 / reason for delta in chart); false negatives; true negatives; judgment not made; predictive value, %; error incidence rate, %.

Wheeler-Sheiner  Na        18   0  1   13   4   2   233   0    6   1.2
                 K         35   0  2   24   6   4   214   3    6   2.4
                 Cl        24   3  0   15   3   1   228   3   14   1.6
                 HCO3      25   3  2   14   5   0   228   1   21   2.0
                 UN        27   2  1   17   6   7   219   1   12   4.0
                 Cr        23   6  0   10   5   1   229   2   29   2.8
                 UN/Cr     26   5  2   14   4   a   227   1    a   a
                 Anion gap 15   1  0    8   4   a   238   2    a   a
Ladenson         Na        14   1  1   10   2   1   238   0   14   1.2
                 K         49   1  5   26  10   0   204   7   12   2.4
                 UN        17   0  1   13   2   9   227   1    6   4.0
                 Cr        11   2  0    8   0   5   237   1   20   2.8
Whitehurst       Na        18   1  2   11   4   0   235   0   17   1.2
                 K         22   0  1   15   4   5   226   2    5   2.4
                 Cl        10   1  1    5   3   2   241   0   20   1.6
                 HCO3      17   2  1   10   3   2   234   1   19   2.0
                 UN        60   5  5   36  10   0   193   4   18   4.0
                 Cr        14   2  0   11   1   5   234   0   14   2.8

a False negatives could not be detected for these test results; therefore these calculations were not performed. Cr, creatinine; UN, serum urea nitrogen.

Examination of the true-positive and false-negative columns of Table 1 indicates that relatively few test results of each type that we evaluated were judged to be in error. The error incidence rate column gives the exact values: the range of error incidence, from 1.2 to 4.0%, is somewhat higher than laboratorians would expect to find, and may reflect our algorithm's overdiagnosing changes as being due to errors rather than to physiology.

Table 2. Specimen-Level True-Positive (TP) and False-Positive (FP) Results

One approach to utilizing delta checks in a computerized laboratory would be to have a message that the delta check has failed presented to the technologist at the time the test result is recorded.
The technologist would check for transcription errors at that time. If the technologist could not find a reason for the delta check failure, the result should be referred to a pathologist for review. What happens next should depend on the pathologist's judgment of the potential adverse effect of the test result's being in error. Follow-up actions could include repeating the test or calling the physician who ordered the test to determine whether the patient's clinical condition or therapy provided an explanation for the large change in the test-result value. If it is decided to release the test result, it should be marked with an appropriate symbol so that the clinician will be aware that it represents a large change, that it has been reviewed in the laboratory by a pathologist, and that the review process did not reveal that the test result was in error.

                    FP/specimen
TP/specimen     0    1    2    3    4    5
Wheeler-Sheiner method (3)
  0            38   58   21    9    2    1
  1            11    0    2    1    0    0
  2             5    0    1    1    0    0
  3             0    0    0    0    0    0
Ladenson method (1)
  0            80   47   12    0    0    0
  1            11    0    0    0    0    0
  2             0    0    0    0    0    0
  3             0    0    0    0    0    0
Whitehurst method (2)
  0            60   51   13    7    2    0
  1            10    1    3    0    0    0
  2             2    0    0    0    0    0
  3             1    0    0    0    0    0

We can consider the performance of delta check methods that require two or more test results to fail delta checks for the specimen to fail by examining Table 2. Interestingly, many specimens had more than one test fail a delta check. By the Wheeler-Sheiner method, 43 (38%) of the specimens that failed one delta check also failed two or more; the Whitehurst method had 43% (39/90) and the Ladenson method 17% (12/70). Consider a delta check rule specifying that a specimen be judged as failing if two or more of its test results fail delta checks. Using this rule with the Wheeler-Sheiner method results, we find that 23% (10/43) of our specimens fail this specimen-level delta check and include at least one test-result delta check failure that was a true positive. This is a slight improvement over the rule that one test is needed to fail a delta check; however, this rule missed 52% (11/21) of the specimens that had at least one true positive. The number of specimens that would have to be evaluated has been reduced from 112 to 43; on the other
hand, missing 11 of 21 specimens with erroneous test results is clearly undesirable. In a recent paper (4) we evaluated the performance of delta check methods by using a simulation approach and found, on using the Wheeler-Sheiner method as discussed above (i.e., a specimen fails if two or more of the eight test results fail delta checks), that the true-positive rate was 84% and the false-positive rate was 20%. Therefore, if the specimen errors in this study were of the same type studied previously (i.e., mislabeled specimens), we would expect 44 specimens to have failed the "at least two of eight" delta check (18 true positives and 26 false positives). In fact 43 specimens failed (10 true positives and 33 false positives), most likely because many errors other than specimen mislabeling can occur. In general, specimen mislabeling causes all the test results to be in error, while other error sources, such as machine malfunction and test-result transcription errors, tend to cause only one to be in error. Therefore the latter errors will not be detected by a delta check method that requires two or more test results to fail individual delta checks. Note that 11 specimens with one true positive and no false positives are missed by using this rule. These specimens probably represent cases of the second type of error.

In summary, we show that the three delta check methods as applied to individual tests can detect erroneous test results, but unfortunately deltas of similar magnitude occurred as a result of disease or therapy two to 15 times as often as those due to errors (for this group of tests and this patient population). This result limits the efficiency of delta check methods in this setting, because the major effort in delta check failure follow-up would be spent on false positives. Nevertheless, we believe that delta check methods can serve a useful role.
First, they detect errors that escape standard quality-control techniques, and of course a primary goal of a clinical laboratory is to eliminate erroneous results. Second, flagging test results that fail the delta check but then pass the laboratory's review procedure should increase the clinician's confidence in these results and decrease unnecessary repeat tests.

References

1. Ladenson, J. H., Patients as their own controls: Use of the computer to identify laboratory error. Clin. Chem. 21, 1648-1653 (1975).
2. Whitehurst, P., DeSilvio, T. V., and Boyadjian, G., Evaluation of discrepancies in patients' results: an aspect of computer-assisted quality control. Clin. Chem. 21, 87-92 (1975).
3. Wheeler, L. A., and Sheiner, L. B., Delta check tables for the Technicon SMA 6 continuous-flow analyzer. Clin. Chem. 23, 216-219 (1977).
4. Sheiner, L. B., Wheeler, L. A., and Moore, J. K., The performance of delta check methods. Clin. Chem. 25, 2034-2037 (1979).