THE EMPEROR'S NEW CLOTHES: APPLICATION OF POLYGRAPH TESTS IN THE AMERICAN WORKPLACE

CHARLES R. HONTS
Psychology Department
University of North Dakota
Grand Forks, North Dakota 58202

In a sudden blinding flash the vain Emperor saw how he had been tricked and cheated. But being a mighty Emperor, with purple blood in his veins, he held his head all the higher and walked on and on down that long street.
The Emperor's New Clothes, Hans Christian Andersen

Most of the private-sector uses of the polygraph in the United States were eliminated by the Employee Polygraph Protection Act of 1988. However, polygraph use by the federal government continues to grow unabated. The government uses polygraph tests in criminal investigations and in national security screening. All uses are controversial, but the screening uses are particularly so. In national security screening, polygraph tests are used both in the hiring process and with current employees. Polygraph tests used in the hiring process are without empirical support. Polygraphers' claims of high utility on the basis of information developed during interrogations are suspect because that information has never been shown to be predictive of future behavior. Research and analyses conducted on the Department of Defense's Counterintelligence Scope Polygraph (CSP) Screening Program indicate that the polygraph tests used in that program are unable to discriminate truthtellers from deceivers. It appears that the CSP polygraph examinations correctly classify only about 2% of the guilty subjects. Effective countermeasures exacerbate this problem and may render the CSP Screening Program completely ineffective at detecting deception. Politically unpleasant changes, which must involve calling a substantial number of innocent subjects deceptive, are necessary if national security screening polygraphs are to be applied effectively.
The author thanks John Kircher, David Raskin, and William Yankee for their helpful comments during the preparation of this article.

Forensic Reports, 4:91-116, 1991. Copyright 1991 by Hemisphere Publishing Corporation.

Introduction

Recently, the polygraph profession had the unique distinction of having a substantial portion of its practice outlawed by the Congress of the United States with the Employee Polygraph Protection Act (EPPA) of 1988. Prior to the EPPA, an estimated 2 million polygraph tests were given in the private sector each year (Goldzband, 1990). With the exception of the drug, security, and nuclear power industries, the EPPA outlawed virtually all of the employment screening uses of the polygraph in the private sector. Under the EPPA, investigative uses of the polygraph in the workplace are still permitted, but only under restrictive conditions. The actual impact of the EPPA on the number of tests run per year is not yet known. Some authors speculate that many tests will continue to be conducted under loopholes in the law (Shneour, 1990), whereas others believe that few tests will now be conducted in the private sector (Frierson, 1988). Polygraph use by all levels of government was unaffected by the EPPA, and the federal use of polygraph tests continues to increase dramatically (Department of Defense, 1986, 1987, 1988, 1989).

The polygraph has always been controversial, and this controversy swirls at many levels. Some scientists have embraced the polygraph as being as "American as apple pie" (Barland, 1988), whereas others have vilified its use (Goldzband, 1990; Kleinmuntz & Szucko, 1982, 1984; Lykken, 1974, 1981, 1987; Shneour, 1990). Other scientists have taken the position that some kinds of polygraph tests may have a useful role to play in some situations (Honts, 1987; Raskin, 1989; Raskin & Kircher, 1987). The controversies on the use of the polygraph generally fall into one of two realms. One of these realms concerns the balancing of the civil liberties of the individual in a free society against the rights of other individuals to protect themselves from thieves.
There is also the concern of balancing the rights of the individual against the right of a government to protect itself from its enemies. At another level, considerable controversy surrounds the scientific efforts to estimate the accuracy of the many techniques that are conjointly referred to as polygraph, or lie detector, tests. The balancing of civil liberties and protective rights is a formidable and complex task for legal and political scholars. I do not attempt it here. I do briefly describe the applications of polygraph techniques in the workplace and attempt to elucidate the scientific controversy concerning the accuracy and utility of the polygraph. Finally, I present an analysis of the data currently available from one large national security polygraph screening program in the federal government.

Polygraph Techniques

Initially, it is important to note that there is no one polygraph technique. There are a great many polygraph techniques known by many names. However, all polygraph tests have certain characteristics in common. They all involve the measurement of physiological responses with an instrument while the subject is being asked a series of questions. Usually, measures are taken of respiration, palmar sweating, and cardiovascular activity. Additionally, all currently applied polygraph tests have a more or less structured interview during which the examiner psychologically prepares the subject for the test. Usually, the physiological data resulting from the actual testing are formally evaluated by the examiner. Substantial differences exist between the various tests in the nature of the pretest interview, in the structure of the question series, and in the evaluation of the physiological data.

Generally, polygraph tests fall into two broad families of techniques. The first family of techniques is known as concealed-information tests. Examiners using these techniques attempt to discover whether an individual has hidden information that only the guilty person would know. These techniques have been described in detail by Lykken (1981), and they have been studied extensively in the laboratory. It is generally accepted that these techniques can be highly accurate (Iacono & Patrick, 1988; Raskin, 1989). However, concealed-information tests are almost never used in criminal or employment cases in the United States because most relevant case information is revealed either through the mass media or during interrogations associated with the investigation.
This apparently is not the case in Japan (Yamamura & Miyata, 1990), where concealed-information tests are reported to be widely applied. However, the impact of concealed-information techniques on the United States' workforce is negligible, and I do not discuss them further.

Most work-related testing in the United States is conducted with detection of deception tests. These tests infer deception by comparing the physiological responses to direct accusatory questions, usually referred to as relevant questions (e.g., Did you steal the money from the safe?), with some type of comparison question. Detection of deception techniques can be subdivided into two major families of tests depending on what type of question is used for comparison purposes.

The older of these families of tests is known as relevant-irrelevant tests. Relevant-irrelevant tests ask a series of relevant questions that are compared with innocuous neutral or irrelevant questions that the person is assumed to be answering truthfully (e.g., Do people call you Joe?). The assumption underlying this technique is that when people lie, they produce larger physiological responses than when they tell the truth. Almost all of the scientists involved in detection of deception research reject the notion that the relevant-irrelevant test could be a useful discriminator of truth and deception (Iacono & Patrick, 1988; Lykken, 1981; Raskin, 1986, 1989). It does seem intuitively obvious that both innocent and guilty subjects will be able to recognize that the relevant questions are the important questions on the test. Given that, it is equally obvious that most individuals will respond physiologically to the relevant questions regardless of their guilt. In support of this notion, two recent experiments have demonstrated that the relevant-irrelevant test is no better than chance at discriminating the guilty from the innocent (Horowitz, 1989; Horvath, 1988). However, in spite of this logical and empirical rejection, the relevant-irrelevant test still plays a major role in polygraph testing in the workplace, as I discuss in the following sections.

The second family of detection of deception tests is known as control question tests (CQTs). The CQT was developed by Reid (1947) as an attempt to fix the obvious deficiencies in the relevant-irrelevant test.
The CQT quickly became the detection of deception test of choice in law enforcement, and it currently has widespread application in the workplace (Barland, 1988). The rationale of the CQT predicts differential physiological responses between relevant and control questions presented to guilty and innocent individuals. As with the relevant-irrelevant test, it is assumed that guilty subjects will respond strongly to relevant questions that are direct and deal with the issues of the examination. However, with the CQT, innocent subjects are expected to respond strongly to broad general control questions (e.g., Have you ever lied to someone who trusted you?). These control questions are presented in such a manner that it is assumed that all subjects will be concerned about the veracity of their responses to them. Innocent individuals are expected to respond more to control than to relevant questions because they are sure of the veracity of their responses to the relevant questions, and they are assumed to be either lying or at least uncertain about the veracity of their responses to the control questions. Guilty individuals, conversely, are expected to respond more to relevant than to control questions. Equal physiological responses to both the relevant and the control questions result in an inconclusive outcome.

Applications of the Polygraph in the U.S. Workplace

The polygraph is used in the workplace in three distinctly different ways. First, polygraph tests can be used as a tool to investigate individual acts of wrongdoing. Second, polygraph tests may be used as a screening tool when individuals apply for employment. Third, polygraph tests can be used as a screening tool with current employees.

Investigative Forensic Applications

Investigative polygraph tests follow a specific event. A limited number of suspects are tested about a limited number of specific acts. Usually, a CQT is used. This use of the polygraph is not much different from the general investigative use of the polygraph by law enforcement. Most of the scientific research conducted on the CQT has centered on this type of investigative forensic application of the polygraph. Unfortunately, little consensus has been reached on how accurate polygraph tests are in this, the most straightforward of the polygraph's applications. Two recent reviews have reached different conclusions about the validity of investigative CQTs (Iacono & Patrick, 1988; Raskin, 1989). In general, the upper estimates for the validity of the CQT are approximately 90%, and the lower estimates are approximately 70%.
Most authors agree that the CQT makes more false positive errors (calling the innocent deceptive) than false negative errors (calling the guilty truthful) under normal circumstances (see the discussion of countermeasures later in this article). However, despite disagreements in the literature about the accuracy rates for polygraph tests, it is likely that most scientists would find the investigative forensic uses of the polygraph to be the least controversial of the polygraph's applications. This use of the polygraph was the one application in the commercial workplace that was left essentially intact by the EPPA.

Preemployment Screening

The second use of the polygraph is as a preemployment screening device. In this application, individuals applying for a job are required to take a polygraph test regarding their truthfulness to the potential employer about their background. Depending on the situation, a number of life-style questions may be asked. These life-style questions may include items regarding the use and sale of drugs, theft from previous employers, criminal activities, and deviant sexual behaviors. This type of testing was eliminated in the commercial workplace by the EPPA, with the exception of a few exempted professions. However, preemployment polygraphs continue to play a prominent role in the hiring processes of several federal agencies. The exact numbers of preemployment examinations given by the federal government are not made public. However, the Office of Technology Assessment (OTA; 1983) estimated that the National Security Agency alone conducted 6,700 preemployment examinations in 1982.

The preemployment screening uses of the polygraph have traditionally been among the most controversial but least studied applications of the polygraph. The OTA stated that at that time there was insufficient evidence in the scientific literature to make any formal assessment of the actual accuracy of screening polygraphs. Since the OTA report, the situation has not improved; there is still insufficient evidence in the scientific literature to make any formal assessment of the accuracy or the utility of preemployment screening polygraph tests. Research evaluating preemployment polygraph examinations must eventually address two questions.
The first question is the same question that is asked of the investigative forensic uses of the polygraph. That is, are the techniques being used valid detectors of deception? The answer to this question may well be no for preemployment polygraphs. The second question that must be addressed concerns predicting future behavior with a psychological test.

Validity of Preemployment Polygraph Tests.
Preemployment polygraphs have traditionally used the relevant-irrelevant test as the technique of choice. As I noted earlier, the application of the relevant-irrelevant test in investigative forensic examinations has been almost universally rejected by both the proponents (Raskin, 1986) and the opponents of forensic polygraph examinations (Lykken, 1981). As described earlier, recent laboratory research indicates that the relevant-irrelevant test is no better than chance in discriminating truthtellers from deceivers. Thus, any preemployment polygraph program that is based on the relevant-irrelevant technique is immediately suspect. Because the relevant-irrelevant test is held in such general disrepute, it must be assumed that the agencies that continue to use it, such as the National Security Agency (OTA, 1983), do so for reasons other than the detection of deception. Most probably, in those agencies the polygraph simply serves as a powerful psychological cudgel to aid their interrogations.

However, programs that use the more generally accepted CQT for preemployment purposes may fare no better in terms of likely validity. As described earlier, the CQT compares relevant questions about the issues at hand with control questions about the individual's past. In a preemployment polygraph, however, the relevant questions take the form of the control questions of an investigative forensic polygraph examination. For example, a typical preemployment relevant question might be, "Have you ever stolen from an employer?" Yet this very same question might have been chosen as a control question on an investigative forensic examination of a theft, on the assumption that the subject's "no" answer was a lie or that the subject was at least uncertain of the veracity of his or her denial.
The problem is obvious: Given that the areas covered by the questions of a preemployment examination are by nature broad and ambiguous, it seems likely that many individuals will respond to them regardless of whether they have committed serious transgressions. The false positive rates produced by such preemployment polygraph tests are therefore likely to be higher than those of investigative forensic polygraph examinations. This problem might be corrected by the application of the relatively new technique known as the directed lie control technique (Fuse, 1982; Honts & Raskin, 1988; Horowitz, 1989). In this technique, known lies are used as controls, and it has been shown to reduce the number of false positive outcomes as compared with the standard probable lie control test in both a laboratory study (Horowitz, 1989) and a field study (Honts & Raskin, 1988). Unfortunately, there currently is little or no utilization of the directed lie control test in federal preemployment screening. This is particularly puzzling in light of the fact that the directed lie control test has been demonstrated to be far more accurate than the widely applied relevant-irrelevant test (Horowitz, 1989).

Usefulness of Preemployment Confessions.
Issues of validity aside, polygraph examiners maintain that they are able to obtain an impressive number of confessions of past wrongdoing in response to interrogation on the relevant issues of preemployment examinations (Goldzband, 1990). However, the supposed utility of preemployment polygraph examinations on the basis of obtained confessions is also suspect on at least two counts. First, Lykken (1981) has persuasively argued that the individual who tries to be truthful during a preemployment examination and who, at the examiner's urging, bares all of his or her past wrongdoing is the very individual who is most likely to be rejected by the preemployment screening process, whereas the individual who makes minor admissions and then dishonestly maintains his or her innocence is more likely to be given the benefit of the doubt and passed through. Thus, individuals who are trying to be honest are likely to be penalized for their truthfulness, whereas the very individuals whom one would most like to reject with a life-style preemployment test (e.g., psychopaths) are actually the individuals most likely to pass, because they are unlikely to confess and are better able to lie convincingly in interpersonal situations.

Predicting Future Behavior.
The second problem with utility arguments for preemployment polygraph tests transcends discussions about the validity of the various techniques. Even if these techniques are as good as or better than investigative forensic examinations at discriminating truthtellers from deceivers, a larger problem remains, one that concerns predicting future behavior. Are deception or confessions regarding the issues covered by life-style preemployment polygraph examinations at all related to future behavior on the job? This is a difficult question that has yet to be addressed empirically. To adequately assess this question, longitudinal studies will be necessary. Such studies could contrast work settings that are as similar as possible, except for their use of polygraph tests. In one setting, polygraph tests would be administered and used in employee selection. In the other setting, polygraph tests would be administered, but the results would be neither used nor made available to anyone in the workplace. These will not be easy studies to conduct, and they raise many ethical and methodological questions. For example, consider the situation in which an individual taking a preemployment polygraph examination for a law enforcement agency admits to having been involved in undiscovered felonies. Would it be ethical to withhold that information from the agency and allow the individual to be hired? The methodological problems are also formidable. Adequate dependent measures may be difficult to develop, and to a great extent they will depend on the objectification of the goals of a preemployment polygraph program, something that seems to be lacking. Despite the difficulties, these studies will have to be conducted if the validity and usefulness of preemployment polygraph examinations are to be assessed with anything other than myth and speculation.

Periodic Screening

The final major application of the polygraph in the workplace is periodic screening. In the private sector, this type of screening was geared primarily toward finding out whether current employees were involved in internal theft or were in some other way injuring their employers. In this use, employees were periodically tested in a "fishing expedition" approach, regardless of whether they were suspected of wrongdoing. It is clear that there were many abuses of individuals through this practice in the private sector. Some of those abuses resulted in civil actions against employers and polygraph examiners that often ended with substantial settlements for the plaintiffs (Raskin, 1981).
This fishing expedition use of the polygraph was essentially outlawed in the private sector by the EPPA. However, the periodic testing of employees in the federal government continues and is increasing (Department of Defense, 1989). In the government, the focus of periodic testing is on espionage. Department of Defense employees and contractors with top secret special access clearances are all subject to periodic testing with the polygraph to determine whether they are involved in espionage against the United States. This program may soon be expanded to include new agencies, a counternarcotics effort, and all people with top secret clearances (Department of Defense, 1989). The Central Intelligence Agency and the National Security Agency subject all of their employees to periodic testing. Both control question and relevant-irrelevant tests are used for periodic testing.

Validity of Periodic Screening Examinations.
Some evidence is available on the validity of federal periodic screening. Barland, Honts, and Barger (1989a) conducted an experiment on the validity of periodic screening techniques used by Army INSCOM, the Air Force Office of Special Investigations, the National Security Agency, and the Central Intelligence Agency. The 207 subjects in this experiment were all federal employees, 74% of whom had access to classified materials at some point in their government careers. Roughly half of the subjects either participated in an act of mock espionage or were given knowledge about acts of espionage. The mock espionage crimes were complex and involved real-world espionage tradecraft. The subjects went to clandestine meetings with foreign-sounding agents in local bars, where they were required to use code words to initiate the contact. They stole or copied realistic mock classified documents and then made "dead drops" within the local community over a period of days. The subjects received money from their foreign contact for their espionage activities. These mock espionage and knowledge scenarios provided the kind of experiences that the national security screening programs were designed to uncover. The subjects in this experiment were told that under no circumstances were they to reveal their involvement in the experiment to anyone other than the experimenters. They were told to resist interrogation by the polygraph examiners as long as possible.
In addition, they were told that if they made admissions about real-world security violations or espionage, they would be subject to actual criminal investigation and possible criminal charges. The examinations were conducted by federal polygraph examiners whose main duty was to conduct periodic national security screening tests for their agencies. They were told to conduct their examinations as they would in the field, including interrogations of subjects they believed to be deceptive. If the examiners felt that repeat testing was necessary, they were empowered to schedule and conduct as many tests as they felt necessary, just as they would in the field.

The examiners in this study were adept at obtaining information from the subjects. Admissions of real-world security violations were obtained from about 20% of the subjects. One agency obtained a confession rate of real-world wrongdoing of 32%. Although most of the admissions were relatively insignificant (e.g., discussions of classified matters with a spouse), several of the confessions were serious enough that the confessions themselves were in turn classified. Unfortunately, the examiners were much less adept at detecting deception, as is shown in Table 1. Performance with innocent subjects was better than might have been expected on the basis of the previous research on investigative examinations. Four of the programmed innocent subjects who failed their examinations made confessions of real-world security violations that were covered by the relevant questions of their examinations, and so they should not be considered as errors. Excluding those 4 programmed innocent subjects who confessed and the 4 innocent inconclusive outcomes, 97% of the innocent subjects were correctly classified. However, excluding the 8 guilty inconclusive outcomes, only 34% of the guilty subjects were correctly classified. Although this represents better-than-chance performance, it is modest performance indeed, accounting for only about 10% of the variance in the guilt-innocence criterion. The examiners in many laboratory studies of investigative polygraph examinations have been able to account for 60% or more of the criterion variance (Kircher, Horowitz, & Raskin, 1988). When the Barland et al. (1989a) charts were blindly evaluated, the performance was worse, accounting for only 4% of the criterion variance. In the blind evaluations, only one agency produced better-than-chance discrimination of the innocent from the guilty.

TABLE 1
Outcomes for the Original Examiners in the Barland, Honts, and Barger (1989a) Study

Outcome         Innocent   Guilty
Truthful           105        55
Inconclusive         4         8
Deceptive            7        28

Subsequent to the Barland et al. (1989a) study, the Department of Defense conducted a series of four additional studies on various aspects of polygraph screening (Barland, Honts, & Barger, 1989b; Honts, 1989; Honts, Barland, & Barger, 1989; Honts & Carlton, 1990). Those studies were conducted in an effort to determine whether methodological problems or inherent weaknesses in screening polygraph examinations were responsible for the surprisingly poor results found by Barland et al. (1989a).

Honts et al. (1989) examined the effects of the relative specificity of the wording of relevant questions. As noted earlier, the relevant questions typically used in screening examinations are broad in their wording (e.g., Have you ever committed an act of espionage against the United States?) as compared with the relevant questions used in criminal specific-issue examinations (e.g., Did you steal that secret document from the secretary's desk?). Guilty subjects in the Honts et al. (1989) study stole a mock classified document from a secretary's desk in a military facility. Half of those subjects were tested with a CQT that used broad screening-type relevant questions, whereas the other half received a similar CQT, except that it used specific criminal issue-type relevant questions. Honts et al. (1989) also manipulated a methodological variable. In the Barland et al. (1989a) study, up to 2 months intervened between the commission of the mock espionage and the polygraph examinations. In most laboratory studies of the polygraph, the examinations were given immediately after the commission of the mock crime. It is possible that the intervening time could have weakened the manipulation and resulted in reduced accuracy rates. Half of the Honts et al. (1989) subjects were tested immediately after the mock espionage. The remaining subjects were tested 6 weeks following the act of mock espionage. Neither the manipulation of relevant question specificity nor the manipulation of the time interval between the mock espionage and the examination produced significant effects in either the decisions or the numerical scores.
Excluding the 6% inconclusive outcomes, the Honts et al. examiners correctly classified 90% of the innocent and 81% of the guilty subjects. Those results suggest that neither of these variables was responsible for the poor detection of deception in Barland et al. (1989a). Furthermore, they demonstrated that substantial rates of detection are possible in a mock espionage setting. However, the generalizability of the Honts et al. accuracy rates might have been limited by the fact that the CQT used in that experiment was developed for that experiment and was not specifically representative of any technique actually used in the field.
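The percentages reported for the original examiners in Barland et al. (1989a) can be recomputed from the counts in Table 1, read as outcome rows (truthful, inconclusive, deceptive) by group columns (innocent, guilty). The short Python sketch below is only an arithmetic check of that reading, not the authors' analysis; the variable names are illustrative. It applies the exclusions described in the text (the 4 innocent confessors and all inconclusive outcomes) and confirms that the counts sum to the 207 subjects reported.

```python
# Counts from Table 1 (original examiners, Barland, Honts, & Barger, 1989a),
# read as outcome rows by group columns.
innocent = {"truthful": 105, "inconclusive": 4, "deceptive": 7}
guilty = {"truthful": 55, "inconclusive": 8, "deceptive": 28}

# Per the text, 4 of the 7 innocent "deceptive" outcomes confessed to
# real-world violations covered by the relevant questions, so they are
# not counted as errors and are dropped from the denominator.
INNOCENT_CONFESSORS = 4


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Innocent subjects: exclude inconclusives and confessors;
# a correct classification is a truthful outcome.
innocent_decided = innocent["truthful"] + innocent["deceptive"] - INNOCENT_CONFESSORS
innocent_pct = percent(innocent["truthful"], innocent_decided)

# Guilty subjects: exclude inconclusives;
# a correct classification is a deceptive outcome.
guilty_decided = guilty["truthful"] + guilty["deceptive"]
guilty_pct = percent(guilty["deceptive"], guilty_decided)

total_subjects = sum(innocent.values()) + sum(guilty.values())

print(f"innocent correctly classified: {innocent_pct:.1f}% of {innocent_decided}")
print(f"guilty correctly classified:   {guilty_pct:.1f}% of {guilty_decided}")
print(f"total subjects: {total_subjects}")
```

Run as written, it reports 97.2% of 108 innocent and 33.7% of 83 guilty subjects correctly classified, which round to the 97% and 34% given in the text, and a total of 207 subjects.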

Barland et al. (1989b) compared two testing strategies that could be used when the examination must cover several relevant issues. Most previous mock crime experiments have studied only single relevant issue examinations, whereas the examinations in Barland et al. (1989a) addressed several relevant issues. Subjects in Barland et al. (1989b) were programmed to be guilty of 0, 1, 2, or 3 different acts of mock espionage. Half of the subjects were then tested with one triple-issue test, and the other half were tested with three single-issue tests. There were no differences in the results of these two approaches to testing multiple issues, either in the numerical scores or in decision accuracy. Excluding the 24% inconclusive outcomes, 79% of the innocent and 93% of the guilty subjects were correctly classified. However, neither approach was better than chance at identifying specifically which crime the guilty subjects had committed. Those results suggest that the false negative rate in Barland et al. (1989a) was not caused by the use of multiple-issue testing or by the use of mock espionage scenarios.

Honts (1989) explored the possibility that examiners who exclusively conduct screening examinations over a long period of time alter their technique in such a way that their accuracy decreases as compared with examiners who conduct primarily criminal issue examinations. The examiners in the Barland et al. (1989b) and the Honts et al. (1989) experiments primarily conducted criminal examinations in their assignments, whereas the examiners in Barland et al. (1989a) almost exclusively conducted screening examinations. Honts's (1989) guilty subjects committed an act of mock espionage by stealing, copying, and passing a mock classified document to a foreign-sounding agent.
Half of Honts's subjects were tested by examiners who primarily conducted criminal issue examinations, whereas the remainder were tested by examiners who almost exclusively conducted screening examinations. All of the examiners administered the Counterintelligence Scope Polygraph (CSP) examination used by the Department of Defense for screening, and all were instructed to conduct the examinations according to regulations. There were no significant differences between the two groups of examiners in either the accuracy of decisions or the numerical scores. The detection of deception performance of these examiners was poor, with only 40.3% of the guilty subjects correctly identified. These results suggest that the weakness lies in the screening test and not directly with the type of examiner who conducts the examination.

However, none of the subjects in the Department of Defense studies just described were motivated with specific incentives associated with the outcome of their polygraph examination. Podlesny and Raskin (1977) have suggested that incentives are a critical part of establishing a deceptive context in laboratory detection of deception experiments. Barland et al. (1989a) strongly suggested that the lack of incentives accounted for the inability of their examiners to detect deception.

Honts and Carlton (1990) explored the effects of motivation on the detection of deception. Subjects in the Honts and Carlton study were U.S. Army basic trainees who were either innocent or guilty of having committed a mock theft of a gun. Half of the Honts and Carlton subjects were motivated by the offer of an afternoon off without duties if they could pass their examination. Data indicated that these basic trainees preferred the afternoon off to a monetary reward of $25. Honts and Carlton's subjects were tested using the standard CQT developed at the University of Utah (Kircher & Raskin, 1988). They reported no effects of the motivation manipulation on either the accuracy of decisions or the numerical scores. Excluding inconclusives, the Honts and Carlton examiners correctly classified 87% of their guilty subjects and 78% of their innocent subjects. These results strongly suggest that the lack of incentives cannot account for the lack of detection of deception in the Barland et al. (1989a) results.

The results of the four studies conducted by the Department of Defense as follow-ups to the Barland et al. (1989a) study all support and strengthen the results of that study. No methodological problems with that study have been found. Given the high level of realism of the mock crimes and the examinations used by Barland et al.
(1989a), the results of that study clearly represent the best available experimental estimate of the validity of federal periodic screening polygraph tests.

Applying Validity Estimates to Real-World Applications

Unfortunately, one cannot take the results of any study or group of studies and apply them directly to the field to determine the accuracy of a technique in an applied setting. In estimating the accuracy of tests given in the field, accuracy must be considered within the context of the specific application of the technique. One of the major factors that must be considered is the base rate. In the case of polygraph tests, the base rate is the proportion of the population to be tested that is guilty of the acts under investigation. If the base rate of guilt is 50%, that is, half of the subjects are guilty and half are innocent, the solution is simple and has been described by Raskin (1986) with a conditional probability analysis. Table 2 shows the results of a conditional probability analysis for a test that is 90% accurate on a population of 1,000 subjects, 500 of whom are innocent and 500 of whom are guilty. Of the 500 truthful outcomes, 450 will be correct, so the confidence in any one truthful outcome is 90%. Similarly, of the 500 deceptive outcomes, 450 will be correct, and the confidence in any one deceptive outcome is 90%. However, if the base rate departs substantially from 50%, or if the test is less accurate with one type of subject than with another, then the confidence in an individual test outcome may be different. For example, consider Tables 3 and 4. Table 3 shows a situation in which the base rate is 50% but the test is more accurate with guilty than with innocent subjects. In this illustration, there are 375 truthful outcomes, 93% of which are correct. However, of the 625 deceptive outcomes, 150 are incorrect. Thus, for this illustration, the confidence in a deceptive outcome is only 76%. Table 4 shows the effect of a skewed base rate. In the Table 4 illustration, 1,000 subjects are tested with a technique that is 90% accurate with all subjects, but only 1% of the subjects are guilty. In this case, there are 892 truthful outcomes, 99.9% of which are correct. However, there are 108 deceptive outcomes, of which only 9 are correct. Here the confidence in a deceptive examination outcome is only 8.3%.
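The conditional probability analyses in Tables 2 through 4 reduce to a few lines of arithmetic. The Python sketch below is an illustration added for this discussion, not a procedure taken from the cited sources; the function names are hypothetical, and it assumes, as the tables do, that accuracy can be stated separately as the proportion of guilty subjects called deceptive and the proportion of innocent subjects called truthful.

```python
from fractions import Fraction as F


def conditional_table(n, base_rate, acc_guilty, acc_innocent):
    """Expected outcome counts for n examinees (exact, using Fractions).

    base_rate    -- proportion of examinees who are guilty
    acc_guilty   -- proportion of guilty examinees called deceptive
    acc_innocent -- proportion of innocent examinees called truthful
    """
    guilty = n * base_rate
    innocent = n - guilty
    return {
        ("innocent", "truthful"): innocent * acc_innocent,
        ("innocent", "deceptive"): innocent * (1 - acc_innocent),  # false positives
        ("guilty", "truthful"): guilty * (1 - acc_guilty),        # false negatives
        ("guilty", "deceptive"): guilty * acc_guilty,
    }


def confidence(table):
    """Return (share of truthful outcomes that are correct,
               share of deceptive outcomes that are correct)."""
    truthful = table[("innocent", "truthful")] + table[("guilty", "truthful")]
    deceptive = table[("innocent", "deceptive")] + table[("guilty", "deceptive")]
    return (table[("innocent", "truthful")] / truthful,
            table[("guilty", "deceptive")] / deceptive)


# (base rate of guilt, accuracy with guilty, accuracy with innocent)
scenarios = {
    "Table 2": (F(1, 2), F(90, 100), F(90, 100)),
    "Table 3": (F(1, 2), F(95, 100), F(70, 100)),
    "Table 4": (F(1, 100), F(90, 100), F(90, 100)),
}

for name, (base, acc_g, acc_i) in scenarios.items():
    table = conditional_table(1000, base, acc_g, acc_i)
    n_deceptive = table[("innocent", "deceptive")] + table[("guilty", "deceptive")]
    conf_truthful, conf_deceptive = confidence(table)
    print(f"{name}: {int(n_deceptive)} deceptive outcomes; "
          f"confidence truthful {float(conf_truthful):.1%}, "
          f"deceptive {float(conf_deceptive):.1%}")
```

Run as written, the Table 4 line reports 108 deceptive outcomes with 99.9% confidence in a truthful outcome and 8.3% confidence in a deceptive one. The count of deceptive outcomes is also the number of suspects that would remain after a first screening hurdle. Exact fractions are used only so that the expected counts come out as whole numbers without floating-point rounding.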
TABLE 2
Conditional Probability Analysis of 500 Innocent and 500 Guilty Subjects

                 Outcome
Subject     Truthful Deceptive     Total
Innocent         450        50       500
Guilty            50       450       500
Total            500       500     1,000

Note. The technique used is 90% accurate.

TABLE 3
Conditional Probability Analysis of 500 Innocent and 500 Guilty Subjects

                 Outcome
Subject     Truthful Deceptive     Total
Innocent         350       150       500
Guilty            25       475       500
Total            375       625     1,000

Note. The technique used is 95% accurate with guilty subjects but only 70% accurate with innocent subjects.

TABLE 4
Conditional Probability Analysis of 990 Innocent and 10 Guilty Subjects

                 Outcome
Subject     Truthful Deceptive     Total
Innocent         891        99       990
Guilty             1         9        10
Total            892       108     1,000

Note. The technique used is 90% accurate.

The problem faced by national security screening programs is roughly the one illustrated in Table 4. Accurate estimates of the number of people with security clearances who are involved in espionage are either lacking or not available in the open literature. Lykken (1987) has suggested that the rate must be less than 0.1%, and one would hope that the actual rate of espionage is much lower than even that small percentage. Given that the stated targets of national security screening represent such a small proportion of the population to be tested, national security screening programs seem to have faced a nearly insurmountable problem since their inception. Because the base rate of espionage is far lower than the error rates implied by even the most liberal estimates of polygraph accuracy, it is difficult to see how polygraph tests can be useful. Researchers will have to look further for a possible role for polygraph tests in national security screening.

Raskin and Kircher (1987) have noted that even with a formidable base-rate problem, a reasonably accurate screening test can still be useful. The procedure they advocate is known as a system of successive hurdles (Meehl & Rosen, 1955). In a system of successive hurdles, the polygraph would be used as an initial screening that would effectively reduce the number of suspects. Using the illustration in Table 4, it can be seen that if polygraph tests are reasonably accurate, they could reduce the number of suspects from 1,000 to 108. Those 108 individuals could then be subjected to additional scrutiny to determine which of them were actually innocent and which were guilty. Those additional investigations would have to be conducted with the knowledge that many of the deceptive polygraph outcomes must be false positive errors, and great care would have to be taken to ensure that simply failing a polygraph would not damage an individual's career. Reducing the field of suspects by an order of magnitude could represent considerable savings to the government, because investigations are difficult and expensive to conduct.

The Counterintelligence Scope Polygraph Program

The Department of Defense's Counterintelligence Scope Polygraph (CSP) Program was established (Department of Defense, 1984) to be a system of successive hurdles. Individuals with special access clearances were to be given periodic polygraph tests on issues of espionage and security violations. As currently implemented, no disciplinary action can be taken against an individual for refusing to take a test or for failing a polygraph test, except that the individual's clearance can be denied or revoked. The individual must be retained in a position of equal pay and responsibility except for the clearance. Supposedly, refusal or failure should have no adverse effect on an individual's career in government service, unless of course the individual admits to having committed espionage or other illegal acts. Four years' worth of data from the Department of Defense CSP Program (Department of Defense, 1986, 1987, 1988, 1989) are available for analysis. During those 4 years, 31,150 examinations were conducted as part of this program.
Of those 31,150 examinations, no opinion was rendered in 43 cases, and 22 cases were reported to be inconclusive. Of the 31,085 cases in which decisions were given, 167 individuals were reported as being deceptive and 30,918 were reported as being truthful. Of the 167 deceptive outcomes, 129 were followed by confessions to some actions that might have accounted for the individual's failing the examination. From this, one can conclude with some certainty that the proportion of false positive errors in the CSP Program cannot be greater than .00123, that is, 38/(30,918 + 38), where 38 is the number of deceptive outcomes not followed by a confession. This is a truly remarkable outcome given the expectations based on the conditional probability analyses described earlier and illustrated in Table 4.

Converging the Evidence from the Laboratory and the Field

At first glance, it is difficult to reconcile the results of the conditional probability analyses described earlier with the results from the Department of Defense reports to Congress. It would seem that either the polygraph is a fantastically accurate screening device or that something is wrong with the logic of the conditional probability analysis. Because neither of these alternatives is likely, one must look further. The results of the Barland et al. (1989a) study suggest that the polygraph techniques used by the federal government for periodic screening are accurate with innocent subjects but perform poorly with guilty subjects. Those results could be converged with the results from the CSP Program if a reasonable estimate of the base rate of guilt could be developed. A closer examination of the Department of Defense (1986, 1987, 1988, 1989) reports reveals that the vast majority of the reported confessions concerned security violations rather than acts of espionage. Given that the main issue being detected by the CSP Program appears to be security violations rather than espionage, the Barland et al. (1989a) study provides an estimate of the base rate of guilt. Using confessions and admissions obtained in their study, Barland et al. (1989a) estimated the base rate of security violations to be 20%, although they suggested that this was likely to be a conservative estimate. Table 5 shows the results of a conditional probability analysis solved by working backward from the outcomes obtained in the CSP Program under the assumption of a 20% base rate of guilt.
The analysis assumes that all 38 deceptive outcomes that were not followed by confessions were false positive errors; treating them instead as deceptive subjects who failed to confess would make a negligible difference in the conclusions of this analysis. The results of this analysis suggest that the polygraph examinations in the CSP Program were close to 100% accurate with innocent subjects but that only 2.1% of the guilty subjects were detected in their deception.
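The bound on false positive errors and the backward-worked Table 5 can both be reproduced from the reported CSP figures. The Python sketch below is illustrative only: it hard-codes the counts quoted in the text, its variable names are hypothetical, and, like Table 5, it assumes a 20% base rate of guilt and that all 38 unconfirmed deceptive outcomes were false positives.

```python
from fractions import Fraction as F

# Decision counts for the CSP Program as quoted in the text (1986-1989 reports).
TRUTHFUL = 30_918
DECEPTIVE = 167
CONFESSED = 129                        # deceptive outcomes followed by confessions
UNCONFIRMED = DECEPTIVE - CONFESSED    # 38, assumed to be false positives

total = TRUTHFUL + DECEPTIVE           # 31,085 decisions

# Bound as computed in the text: the 38 unconfirmed deceptive calls divided by
# those calls plus all truthful outcomes, i.e. 38 / (30,918 + 38).
fp_bound = F(UNCONFIRMED, TRUTHFUL + UNCONFIRMED)

# Work backward from an assumed base rate of guilt (Barland et al.'s 20%).
base_rate = F(20, 100)
guilty = total * base_rate                         # 6,217
innocent = total - guilty                          # 24,868
guilty_deceptive = CONFESSED                       # 129
guilty_truthful = guilty - guilty_deceptive        # 6,088
innocent_deceptive = UNCONFIRMED                   # 38
innocent_truthful = innocent - innocent_deceptive  # 24,830

# The generated cells must reproduce the empirical truthful total.
assert innocent_truthful + guilty_truthful == TRUTHFUL

print(f"false positive bound:   {float(fp_bound):.5f}")
print(f"accuracy with innocent: {float(innocent_truthful / innocent):.1%}")
print(f"accuracy with guilty:   {float(guilty_deceptive / guilty):.1%}")
```

Run as written, it prints a bound of 0.00123, 99.8% accuracy with innocent subjects, and 2.1% accuracy with guilty subjects, in line with the values discussed above. Changing `base_rate` shows how sensitive the backward-worked detection rate is to the assumed proportion of guilty examinees.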

TABLE 5
Conditional Probability Analysis Worked Backward from the CSP Program Data and an Assumed Base Rate of Deception of 20%

                 Examination outcome
Subject         Truthful   Deceptive       Total
Innocent         24,830ᵃ         38ᵇ     24,868ᵃ
Guilty            6,088ᵃ        129       6,217ᵃ
Total            30,918         167      31,085

Note. CSP = Counterintelligence Scope Polygraph Screening Program.
ᵃThese numbers were generated by the analysis; other values were empirical.
ᵇThe 38 subjects who were called deceptive but who did not confess were assumed to be false positive errors for this analysis. Locating them in the guilty-deceptive cell makes no noticeable change in the conclusions drawn from this analysis.

There is evidence that some federal agencies are operating in this accuracy range. Agency 4 in the Barland et al. (1989a) study produced results that were 100% accurate on innocent subjects but only 8% accurate on guilty subjects. This is a very close match with the actual CSP data and gives considerable support to the notion that the CSP Program is highly ineffective at detecting deception. It appears that, rather than allowing the polygraph to serve as the first hurdle in a series of screening procedures, the program has made an effort to limit false positive errors at the first step in the process. As a result, nearly all of the screening ability of the polygraph has been lost. It seems clear that significant changes will have to be made if the CSP Program is to function as it was designed. These programs will have to accept their role as a first hurdle, and their examiners must be willing to call individuals deceptive, even at the cost of making a substantial number of false positive errors. However, any national security screening program that includes a substantial number of false positive errors at any stage may prove to be an unacceptable solution in a public and political climate that fears and condemns falsely accusing the innocent.
The Countermeasure Problem

However, even if accurate security screening polygraph tests were being correctly applied in a successive hurdle system, there would still be one major barrier to their effective implementation: the problem posed by countermeasures. A countermeasure is anything that an individual could do in an effort to produce a truthful outcome on a polygraph test. Between 1981 and 1988, I was involved in a series of programmatic studies that explored the effects of several physical and mental countermeasures on the CQT (Honts, 1986; Honts & Hodes, 1983; Honts, Hodes, & Raskin, 1985; Honts, Raskin, & Kircher, 1987; Honts, Raskin, Kircher, & Hodes, 1988). In order to defeat the CQT, a guilty subject must be able to produce larger responses to the control questions than to the relevant questions in a way that is not noticeable to the polygraph examiner. My colleagues and I found that this was a relatively easy task to accomplish. We gave subjects less than 30 minutes of countermeasure training. During that training, subjects were taught to recognize control and relevant questions, and they were instructed to do things such as bite their tongue or count backward by 7s during the control questions of the examination. In one study (Honts et al., 1987), this brief training was so effective that none of the subjects who received it were detected in their deception. In the same study, the accuracy of decisions with guilty subjects who did not receive the countermeasure training was 100%. In the other studies, the countermeasure subjects have not been as successful, but over the series of studies the false negative rate has averaged more than 50% for those subjects who received our brief training and information. It should be noted that all of these countermeasure studies were laboratory studies. It is nearly impossible to conduct valid countermeasure research in the field (Honts, 1987), and it is possible that the results of laboratory countermeasure studies may not generalize to field applications.
However, recent newspaper reports have indicated that a number of Cuban double agents were able to defeat repeated federal polygraph testing over a period of 10 years (Safire, 1989; Wines & Ostrow, 1987). The conjunction of this evidence from the field and the powerful results from the laboratory presents a strong case for the danger posed by countermeasures. Unless strong evidence can be presented to indicate otherwise, it would seem conservative and prudent to assume that hostile intelligence agents have the knowledge, ability, and motive to use countermeasures effectively in the field. For polygraph tests to serve effectively in national security screening, solutions to the countermeasure problem will have to be found and implemented. However, to date, no truly successful solutions to the countermeasure problem have been reported (Honts, 1987). Honts et al. (1987) found that electromyographic recordings were useful in detecting muscle contraction countermeasures. Unfortunately, the standard polygraph instrument cannot be used to measure electromyographic activity, and the number of muscle sites that might have to be measured could be large. In any event, electromyographic measures could not be used to detect mental countermeasures, and mental countermeasures have been demonstrated to be just as effective as physical countermeasures (Honts, 1986). Other movement detectors popular with polygraph examiners suffer the same limitations. The only avenue that seems to offer promise in solving the countermeasure problem is the use of advanced quantitative techniques in the scoring of polygraph data. Kircher and Raskin (1988) have described a discriminant analysis-based system that performs as well as the best human evaluators at scoring polygraph charts, both in the laboratory (Honts, 1986; Honts & Carlton, 1990; Horowitz, 1989; Kircher & Raskin, 1988) and in the field (Raskin, Horowitz, & Kircher, 1989; Raskin, Kircher, Honts, & Horowitz, 1988). This discriminant analysis system has been demonstrated to substantially reduce countermeasure-produced false negative errors in the laboratory (Honts, 1986). Unfortunately, with the exception of the U.S. Secret Service, this system has had little or no application in the federal government.
Additional quantitative research has suggested that the false negative rate attributable to successful countermeasures could be reduced to approximately 15% through discriminant classification analysis of autonomic patterning (Honts, Kircher, & Raskin, 1988), but this research has yet to be replicated or to achieve any field implementation.

Summary and Conclusions

With the strong limitations and restrictions placed on the use of the polygraph in the private sector by the EPPA of 1988, the federal government has become the primary user of polygraph tests in the U.S. workplace. The federal government generally uses two polygraph techniques. The older of those techniques is the relevant-irrelevant test. Although still widely used, the relevant-irrelevant test has been thoroughly discredited by both logical arguments and empirical data. The motive for using the relevant-irrelevant technique cannot legitimately be the instrumental detection of deception; rather, in those settings in which the relevant-irrelevant technique is used, it probably serves simply as an interrogative prop. The other technique widely used by the federal government is the CQT. This technique, although controversial, has more scientific support than the relevant-irrelevant test. In the federal workplace, the polygraph is used in three applications: as an investigative tool, as a preemployment screening device, and as a periodic screening device. The investigative use of the polygraph is similar to the polygraph's application in law enforcement. Although there is controversy about the validity of the polygraph in investigations, it is here that the polygraph has its strongest support and its least controversial application. Little evidence is available to support the use of the polygraph in preemployment screening; to date, there are no adequate studies of the validity of the preemployment use of the polygraph. Polygraph examiners note that they obtain a great deal of information with preemployment polygraph examinations. However, there is no published research, to my knowledge, demonstrating that such information is actually useful in predicting future behavior. It may well be that those individuals who are most desirable are rejected, whereas those who are the least desirable are passed.
There are many reasons to regard preemployment polygraph tests as suspect, and until adequate research demonstrates the effectiveness of the preemployment polygraph, its use will remain questionable. The final application of the polygraph in the federal government is as a national security screening test. A large laboratory study (Barland et al., 1989a) of national security screening polygraph tests indicated that those techniques were very good at passing truthful individuals but very ineffectual at detecting deception. Conditional probability analysis of field data from the CSP Program (Department of Defense, 1986, 1987, 1988, 1989) suggests that polygraph examinations in that program are about 2% accurate with