Deception Detection Accuracy

Judee K. Burgoon
University of Arizona, USA

The ability of technologies and humans to detect deception varies widely. Based on estimates derived from signal detection theory, the detection rate for unaided humans falls far short of instruments used to measure veracity. Research investigating whether human detection rates can be improved sufficiently to reach parity with the best technologies has addressed demographic, personality, and cognitive factors that produce biased rather than accurate judgments, and how the resulting inaccuracies can be mitigated through training and other interventions.

Signal detection theory

Contemporary detection accuracy research relies on signal detection theory (Green & Swets, 1966) to produce various calculations of accurate or inaccurate detection. Hits (also known as the true positive rate, TPR, the genuine accept ratio, GAR, or sensitivity) are the percentage of actual target cases (in this case, deception) that are classified accurately by judges or instruments as deceit. Misses (also known as the false negative rate, FNR) are the percentage of actual target cases of deception that are erroneously misclassified as truthful. False alarms (also called the false positive rate, FPR, or false accept rate, FAR) are the percentage of nontarget (truthful) cases that are erroneously identified as deceptive. True negatives (or the true negative rate, TNR) are the percentage of truthful cases that are correctly identified as truthful. The relationships among these terms are shown in Figure 1. Other labels frequently used to refer to classification accuracy are sensitivity, the ability to detect deception when it is present, and specificity, the ability to recognize truth when it is present. Highest accuracy occurs when true deception is identified along with a low rate of false alarms.
In graphical form, this is displayed as a receiver operating characteristic (ROC) curve, shown in Figure 2. The y-axis represents how sensitive the judge or instrument is in detecting true deception. The x-axis shows how specific the judge or instrument is by not committing false alarms and identifying actual truths as truths. The diagonal line on the graph represents chance. The farther the ROC moves away from chance and toward the upper left corner, the higher the accuracy in classifying both deception and truth.

The International Encyclopedia of Interpersonal Communication, First Edition. Edited by Charles R. Berger and Michael E. Roloff. 2016 John Wiley & Sons, Inc. Published 2016 by John Wiley & Sons, Inc. DOI: 10.1002/9781118540190.wbeic0106
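The area under the ROC curve (AUC) summarizes this trade-off in a single number, with 0.5 corresponding to the chance diagonal. As a minimal sketch, assuming the curve is given as a handful of (false positive rate, true positive rate) points, the area can be approximated with the trapezoidal rule; the points below are invented for illustration and are not those of Figure 2:

```python
# Minimal sketch: approximating AUC from ROC points with the
# trapezoidal rule. The (FPR, TPR) points are illustrative only.

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve for points sorted by ascending FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]             # step along the x-axis (FPR)
        avg_height = (tpr[i] + tpr[i - 1]) / 2  # mean TPR over the step
        area += width * avg_height
    return area

# A chance-level judge sits on the diagonal: AUC = 0.5.
print(auc_trapezoid([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))                # 0.5

# A curve bowed toward the upper left corner scores higher.
print(round(auc_trapezoid([0.0, 0.1, 0.4, 1.0], [0.0, 0.6, 0.9, 1.0]), 3))
```

With more points the trapezoidal approximation converges to the true area; library implementations work the same way.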
Figure 1 Types of signal detection theory classifications.

                     Predicted: Truth       Predicted: Deception
  Actual: Truth      True negative (TNR)    False alarm (FPR)
  Actual: Deception  Miss (FNR)             Hit (TPR)

Figure 2 Sample receiver operating characteristic (ROC) curve: true positive rate plotted against false positive rate, with AUC = 0.83.

One way to commit errors is to have a truth bias, which is overestimating truth relative to actual truth. (A less common bias, lie bias, consists of declaring more of a target's messages as lies than are actually deceptive.) Humans have a strong tendency to be truth biased, believing others are being truthful. This bias can lead to the appearance of high accuracy in judging truth just by judging every message as truthful. This would result in 100% truth accuracy, but all the deception cases would be missed, resulting in 0% deception detection accuracy. The goal is to minimize errors in both types of judgments. The ROC provides a handy way to visualize whether judgments are biased one way or another.

Detection accuracy by instrumentation

When the lay public thinks of lie detection, the first thing that comes to mind is the polygraph. The polygraph is the standard against which other devices and human judgments are typically compared. In 2003, the National Academy of Sciences
issued a report that raised serious questions about the validity of the polygraph and the science behind it. The polygraph was seen as most valid and reliable when used for specific-issue criminal investigations. The report was more critical of its use for employment screening and for projecting future misconduct. Subsequently, the American Polygraph Association established stringent standards: evidentiary, event-specific diagnostic examinations were required to achieve a criterion accuracy rate of .90 or higher with no more than .20 inconclusive judgments. A meta-analysis that assessed the effectiveness of the polygraph in achieving high sensitivity (Gougler et al., 2011) showed that, after removing outliers, accuracy aggregated across multiple nonindependent questions was .89 with an inconclusive rate of .11. (Other specific techniques produced somewhat lower accuracy rates of .85 to .87.) These results create a very high bar against which other instruments and humans can be compared.

Other lie detection instruments that have been tested and fielded include, among others, computerized vocal stress analysis, layered voice analysis, electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and brain fingerprinting. Some of these devices have received significant empirical testing, with mixed results; others have yet to be subjected to rigorous validation and replication research. For example, several tests of voice assessment instruments have found that they are no more accurate than chance, although tests of some of the features produced by the instruments have achieved high levels of accuracy. Claims made in the popular literature and by manufacturers selling these devices to law enforcement far outstrip the accuracy found when rigorous testing results are available. The jury is therefore still out on how accurately instruments beyond the polygraph classify truth-tellers and prevaricators.
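The signal detection quantities defined earlier, and the truth-bias pattern in which calling every message truthful yields perfect truth accuracy but zero deception detection, can be sketched in a few lines of Python (the counts are invented for illustration):

```python
# Sketch: the four signal detection rates from confusion-matrix counts.
# Counts are invented for illustration; "positive" means deceptive.

def detection_rates(hits, misses, false_alarms, true_negatives):
    """Return TPR, FNR, FPR, and TNR as proportions."""
    deceptive = hits + misses                  # actual deception cases
    truthful = false_alarms + true_negatives   # actual truthful cases
    return {
        "TPR (sensitivity)": hits / deceptive,
        "FNR (miss rate)": misses / deceptive,
        "FPR (false alarm rate)": false_alarms / truthful,
        "TNR (specificity)": true_negatives / truthful,
    }

# A fully truth-biased judge calls all 100 messages truthful:
# perfect specificity, but every one of the 50 lies is missed.
biased = detection_rates(hits=0, misses=50, false_alarms=0, true_negatives=50)
print(biased["TNR (specificity)"])  # 1.0 (100% truth accuracy)
print(biased["TPR (sensitivity)"])  # 0.0 (0% deception detection)
```

The same four proportions underlie both instrument evaluations, such as the polygraph criterion rates above, and the human-judge accuracy figures discussed next.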
Detection accuracy by humans

Far more research has investigated humans' ability to judge accurately the veracity of another. Meta-analyses (e.g., Hartwig & Bond, 2011) have shown that unaided detection ability by humans averages 54%, which is only slightly better than chance. That estimate combines accuracy in detecting truth and accuracy in detecting deceit. When separate estimates are calculated, truth detection accuracy approximates 61% and deception detection accuracy approximates 47%, which is below chance. This dismal showing for human judges comes largely from studies using students and laypeople rather than professionals trained in judging deceit. Several investigations and meta-analyses, however, have reported that trained professionals do not fare much better (and sometimes fare worse) than untrained individuals, leading to disputes about whether professionals are in fact more accurate or merely more confident and lie biased than novices (Meissner & Kassin, 2002). A significant amount of research continues to address the question of whether training actually improves accuracy and, if so, which features are most beneficial, such as feedback, repeated sessions, computer aids, and training in telltale content, language, or nonverbal behaviors (e.g., Crews, Cao, Lin, Nunamaker, & Burgoon, 2007; Kluger & DeNisi, 1996).
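The combined 54% figure follows from weighting the two separate accuracies by the share of truths and lies in the judged sample, here an even split. A short sketch illustrates the weighting (the percentages are the meta-analytic estimates cited above; the function itself is only an illustration of the arithmetic):

```python
# Sketch: overall accuracy as a base-rate-weighted average of truth
# accuracy and deception accuracy. The 0.61 / 0.47 figures are the
# meta-analytic estimates cited in the text.

def overall_accuracy(truth_acc, deception_acc, truth_base_rate):
    """Proportion correct across a sample with the given share of truths."""
    return truth_base_rate * truth_acc + (1 - truth_base_rate) * deception_acc

# An even split of truths and lies reproduces the combined estimate:
print(round(overall_accuracy(0.61, 0.47, 0.5), 2))   # 0.54

# The same judge looks better when most messages are truthful:
print(round(overall_accuracy(0.61, 0.47, 0.8), 3))   # 0.582
```

The second call previews the base-rate point taken up below: a truth-biased judge's apparent accuracy rises with the proportion of truthful messages, without any change in underlying skill.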
Moderators of detection accuracy

The mixed results on detection success have led to the pursuit of numerous factors that might moderate accurate detection. Among the factors thought to make a difference are these: base rate; deceiver social skills; judge's detection skills; context and amount of exposure; and questioning strategies.

Base rate

Base rate refers to the proportion of deceptive to truthful stimuli that are to be judged. The more judges are exposed to truthful rather than deceptive cases, the more accurate they are likely to be, owing to their truth bias (Levine, Kim, Park, & Hughes, 2006). This has been called the veracity effect because judgments of truthfulness are a function of the proportion of truthful utterances to which the judge is exposed. If there are very few instances of deception, it becomes difficult to spot them.

Deceiver social skills

Some communicators just naturally look truthful and are more masterful at appearing believable. The greater the sender's skill, the more difficult it is for their deceit to be detected (Burgoon, Buller, & Guerrero, 1995). Conversely, the poorer the sender's skill, the more transparent the lies will be and therefore the more readily detected (Bond & DePaulo, 2008). Part of the skill may also be the deceiver's motivation to succeed: those reporting higher motivation are judged as more credible and evade detection when lying.

Judge's detection skills

Contrary to much early hypothesizing, it appears that there is very little difference among judges in their individual skills at detecting deceit (Bond & DePaulo, 2008). Although the provocative notion that some people are wizards at detecting deception has been bandied about, the empirical evidence does not support there being a special class of individuals gifted at detecting deception. Efforts to find such individuals who share common characteristics that would explain their accuracy have failed.
Instead, it appears that humans are similarly poor in their ability to recognize lies and other forms of dissembling.

Context and amount of exposure

Many of the studies showing poor detection accuracy by humans use designs in which judges see, hear, or read very brief samples of truthful and deceptive utterances extracted from unknown circumstances and must declare them as truth or deception.
When judges are given more context about where the utterances have come from, have access to more of what was said and done before and after the samples, and are exposed to longer samples of discourse, their accuracy rates improve. For example, professionals who have access to an entire interview attain much higher accuracy than those who must pass judgment based on very brief snippets.

Questioning strategies

Recent research has begun to delve more deeply into understanding what kinds of questions expose interviewee deceit. While assorted questioning techniques have been the staple of professional training in interviewing and interrogation, academic research has begun to put various strategies to the test. Benign control questions intended to elicit a baseline of neutral responding are compared to ones intended to create an emotional charge or greater cognitive exertion. The behavioral analysis interview goes beyond questioning about details of a crime to assess respondents' willingness to help solve the crime (the "Sherlock Holmes effect") and their attitudes about the crime and possible punishment. The cognitive interview requires repeated and deeper inquiry that should elicit more cognitive engagement and amplification by truthful respondents. Strategic withholding of evidence until later in the interview creates possible story conflicts that guilty parties must resolve. Increasing cognitive exertion by requiring narrative retellings in reverse order should prove more taxing for deceivers than truth-tellers, thereby producing more tells of deception. Asking unexpected questions that blindside the interviewee may expose more contradictions and hesitations by deceivers. Other strategies under test include interspersing opinion questions that indirectly reveal guilt and employing questions that make indirect accusations.
These varied interviewing techniques may reveal the extent to which indicators of deceit are linked to specific kinds of questions and circumstances and should only be taken as signs of deceit when those circumstances are in force. To the extent that the detection of deceit becomes more nuanced, accuracy may improve.

SEE ALSO: Deceptive Message Production; Interpersonal Deception Theory

References

Bond, C. F., Jr., & DePaulo, B. M. (2008). Individual differences in judging deception: Accuracy and bias. Psychological Bulletin, 134, 477–492. doi: 10.1037/0033-2909.134.4.477

Burgoon, J. K., Buller, D. B., & Guerrero, L. K. (1995). Interpersonal deception: IX. Effects of social skill and nonverbal communication on deception success and detection accuracy. Journal of Language and Social Psychology, 14, 289–311. doi: 10.1177/0261927X95143003

Crews, J. M., Cao, J., Lin, M., et al. (2007). A comparison of instructor-led vs. web-based training for detecting deception. Journal of Science, Technology, Engineering and Math Education, 8, 31–39.

Gougler, M., Nelson, R., Handler, M., et al. (2011). Meta-analytic survey of criterion accuracy of validated polygraph techniques. Polygraph, 40, 203–305.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York, NY: Wiley.
Hartwig, M., & Bond, C. F., Jr. (2011). Why do lie catchers fail? A lens model meta-analysis of human lie judgments. Psychological Bulletin, 137(4), 643–659. doi: 10.1037/a0023589

Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119, 254–284. doi: 10.1037/0033-2909.119.2.254

Levine, T. R., Kim, R. K., Park, H. S., & Hughes, M. (2006). Deception detection accuracy is a predictable linear function of message veracity base-rate: A formal test of Park and Levine's probability model. Communication Monographs, 73, 243–260. doi: 10.1080/03637750600873736

Meissner, C. A., & Kassin, S. M. (2002). "He's guilty!": Investigator bias in judgments of truth and deception. Law and Human Behavior, 26(5), 469–480. doi: 10.1023/A:1020278620751

Further reading

Driskell, J. E. (2012). Effectiveness of deception detection training: A meta-analysis. Psychology, Crime & Law, 18(8), 713–731. doi: 10.1080/1068316X.2010.535820

Frank, M. G., & Feeley, T. H. (2003). To catch a liar: Challenges for research in lie detection training. Journal of Applied Communication Research, 31, 58–75.

Granhag, P. A., & Strömwall, L. A. (Eds.). (2004). The detection of deception in forensic contexts. New York, NY: Cambridge University Press.

Memon, A. A., Vrij, A., & Bull, R. (2003). Psychology and law: Truthfulness, accuracy and credibility (2nd ed.). Chichester, UK: Wiley.

Judee K. Burgoon is professor of communication and director of research for the Center for the Management of Information, Eller College of Management, University of Arizona, USA. She is a fellow of the International Communication Association, a recipient of the ICA Steven H. Chaffee Career Achievement Award, a National Communication Association Distinguished Scholar, and a recipient of the Mark L. Knapp Award in Interpersonal Communication.
Her current research, which has been supported extensively with extramural funding, centers on interpersonal deception and automated deception detection. She has published eight books and monographs and over 300 articles and chapters related to interpersonal, nonverbal, and verbal communication.