A Bayesian Network Model of Knowledge-Based Authentication

Similar documents
An Empirical Investigation of Knowledge-Based Authentication

How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection

Trust in E-Commerce Vendors: A Two-Stage Model

Subjective randomness and natural scene statistics

Analyzing Spammers Social Networks for Fun and Profit

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India

Who Subscribe to Identity Theft Protection Service

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

Coherence Theory of Truth as Base for Knowledge Based Systems

Understanding Social Norms, Enjoyment, and the Moderating Effect of Gender on E-Commerce Adoption

BAYESIAN NETWORK FOR FAULT DIAGNOSIS

External Variables and the Technology Acceptance Model

Modeling Terrorist Beliefs and Motivations

A Trade-off Between Number of Impressions and Number of Interaction Attempts

Identifying Parkinson s Patients: A Functional Gradient Boosting Approach

CLASSIFICATION OF BRAIN TUMOUR IN MRI USING PROBABILISTIC NEURAL NETWORK

Physicians' Acceptance of Web-Based Medical Assessment Systems: Findings from a National Survey

Bayesian Networks Representation. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University

International Journal of Software and Web Sciences (IJSWS)

Identifying the Zygosity Status of Twins Using Bayes Network and Estimation- Maximization Methodology

EECS 433 Statistical Pattern Recognition

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Identifying a Computer Forensics Expert: A Study to Measure the Characteristics of Forensic Computer Examiners

Scientific Working Group on Digital Evidence

On Trust. Massimo Felici. Massimo Felici On Trust c

Mammogram Analysis: Tumor Classification

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction

Application of Bayesian Network Model for Enterprise Risk Management of Expressway Management Corporation

TWO HANDED SIGN LANGUAGE RECOGNITION SYSTEM USING IMAGE PROCESSING

COMMITMENT. &SOLUTIONS Act like someone s life depends on what we do. UNPARALLELED

A HMM-based Pre-training Approach for Sequential Data

A Vision-based Affective Computing System. Jieyu Zhao Ningbo University, China

Case Studies of Signed Networks

Section 6.1 "Basic Concepts of Probability and Counting" Outcome: The result of a single trial in a probability experiment

MS&E 226: Small Data

Seeing is Behaving : Using Revealed-Strategy Approach to Understand Cooperation in Social Dilemma. February 04, Tao Chen. Sining Wang.

Bayesian (Belief) Network Models,

Modeling Terrorist Beliefs and Motivations

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

Recognizing Scenes by Simulating Implied Social Interaction Networks

Event Classification and Relationship Labeling in Affiliation Networks

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING. CP 7026-Software Quality Assurance Unit-I. Part-A

Using AUC and Accuracy in Evaluating Learning Algorithms

Skin color detection for face localization in humanmachine

Construction of Theoretical Model for Antiterrorism: From Reflexive Game Theory Viewpoint

Computational Cognitive Science

APPROVAL SHEET. Uncertainty in Semantic Web. Doctor of Philosophy, 2005

Challenges of Fingerprint Biometrics for Forensics

MACHINE learning-as-a-service has seen an explosion

The 29th Fuzzy System Symposium (Osaka, September 9-, 3) Color Feature Maps (BY, RG) Color Saliency Map Input Image (I) Linear Filtering and Gaussian

Can We Assess Formative Measurement using Item Weights? A Monte Carlo Simulation Analysis

CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL

Chapter 1. Introduction

The Open Access Institutional Repository at Robert Gordon University

Assurance Activities Ensuring Triumph, Avoiding Tragedy Tony Boswell

How to use the Lafayette ESS Report to obtain a probability of deception or truth-telling

Applying Machine Learning Techniques to Analysis of Gene Expression Data: Cancer Diagnosis

Bayesian relevance and effect size measures. PhD thesis booklet. Gábor Hullám. György Strausz, PhD (BME)

Intent Inference and Action Prediction using a Computational Cognitive Model

Interim Report. Establish Steering Committee. Gather information

An Experimental Study on Authorship Identification for Cyber Forensics

Systems Theory: Should Information Researchers Even Care?

Sensory Cue Integration

An assistive application identifying emotional state and executing a methodical healing process for depressive individuals.

Framework for Comparative Research on Relational Information Displays

Local Image Structures and Optic Flow Estimation

Position Paper: How Certain is Recommended Trust-Information?

Predicting Heart Attack using Fuzzy C Means Clustering Algorithm

The Development and Application of Bayesian Networks Used in Data Mining Under Big Data

Bayesian Nonparametric Methods for Precision Medicine

The Role of Trust in Ubiquitous Computing Marc Langheinrich ETH Zurich, Switzerland

On the Usability and Security of Pseudo-signatures

Evaluation of Linguistic Labels Used in Applications

Machine Learning-based Inference Analysis for Customer Preference on E-Service Features. Zi Lui, Zui Zhang:, Chenggang Bai-', Guangquan Zhang:

TEACHING YOUNG GROWNUPS HOW TO USE BAYESIAN NETWORKS.

Challenges for U.S. Attorneys Offices (USAO) in Opioid Cases

Using Bayesian Networks to Direct Stochastic Search in Inductive Logic Programming

Natural Revocability in Handwritten Signatures to Enhance Biometric Security

AN INVESTIGATION OF DECISION- MAKING AND THE TRADEOFFS INVOLVING COMPUTER SECURITY RISK

Decisions and Dependence in Influence Diagrams

INTERNATIONAL STANDARD ON ASSURANCE ENGAGEMENTS 3000 ASSURANCE ENGAGEMENTS OTHER THAN AUDITS OR REVIEWS OF HISTORICAL FINANCIAL INFORMATION CONTENTS

CHAPTER 4 THE QUESTIONNAIRE DESIGN /SOLUTION DESIGN. This chapter contains explanations that become a basic knowledge to create a good

Introduction to Bayesian Analysis 1

Clustering mass spectrometry data using order statistics

CS 4365: Artificial Intelligence Recap. Vibhav Gogate

A Learning Method of Directly Optimizing Classifier Performance at Local Operating Range

Blackout risk, cascading failure and complex systems dynamics

Keywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset.

NIH Public Access Author Manuscript Conf Proc IEEE Eng Med Biol Soc. Author manuscript; available in PMC 2013 February 01.

Identity Verification Using Iris Images: Performance of Human Examiners

CISC453 Winter Probabilistic Reasoning Part B: AIMA3e Ch

LEAF Marque Assurance Programme

The Significance of Mixed Methods Research in Information Systems Research

Confluence: Conformity Influence in Large Social Networks

Inference Methods for First Few Hundred Studies

MBA SEMESTER III. MB0050 Research Methodology- 4 Credits. (Book ID: B1206 ) Assignment Set- 1 (60 Marks)

Test-Driven Development

An Improved Algorithm To Predict Recurrence Of Breast Cancer

Automatic Definition of Planning Target Volume in Computer-Assisted Radiotherapy

SUPPLEMENTARY INFORMATION

Transcription:

Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2007 Proceedings Americas Conference on Information Systems (AMCIS) December 2007 A Bayesian Network Model of Knowledge-Based Authentication Ye Chen University of Wisconsin, Madison Follow this and additional works at: http://aisel.aisnet.org/amcis2007 Recommended Citation Chen, Ye, "A Bayesian Network Model of Knowledge-Based Authentication" (2007). AMCIS 2007 Proceedings. 423. http://aisel.aisnet.org/amcis2007/423 This material is brought to you by the Americas Conference on Information Systems (AMCIS) at AIS Electronic Library (AISeL). It has been accepted for inclusion in AMCIS 2007 Proceedings by an authorized administrator of AIS Electronic Library (AISeL). For more information, please contact elibrary@aisnet.org.

A BAYESIAN NETWORK MODEL OF KNOWLEDGE-BASED AUTHENTICATION Extended Abstract Ye Chen Dissertation Advisor: Dr. Divakaran Liginlal Department of Operations and Information Management, School of Business University of Wisconsin-Madison yechen@bus.wisc.edu Abstract In this paper, we adapt a statistical modeling approach to the problem of knowledgebased authentication (KBA). The study consists of three parts, namely, model selection, feature selection, and empirical investigation. First, a nonparametric Bayesian network model of KBA is presented, which is grounded in probabilistic reasoning and information theory. Second, we propose an approach to feature selection in KBA that is based on the principle of maximum entropy with proper underlying probabilistic semantics in the security domain. Third, an empirical study extends the analytical modeling to the behavioral space of KBA involving both legitimate users and attackers. Our empirical investigation is comprised of a pilot study and a large-scale experiment with online social networking data. The pilot study validated that the proposed Bayesian model makes a sensible approximation to the human cognitive process. Our experiments with online social networking data showed that, with the cutting-edge statistical machine learning techniques and the abundant data available from the Internet, the guessability an attacker would achieve can be significantly improved. The results of the analytical modeling and the empirical study bear direct implications to the KBA implementation and establish a foundation for future research in the KBA area. Keywords: Knowledge-based authentication, Bayesian networks, probabilistic model, security, feature selection, information theory, maximum entropy, simulation, online social networks Introduction Knowledge-based authentication (KBA) is a method of verifying a claimed identity through challenges (questions) and responses (answers), where answers are from the individual s personal knowledge (factoids) and not memorized as is the case of passwords (Hastings et al. 2004; Regan 2004). A prominent example is ebay s consumer authentication service (CAS) (Trilli et al. 2004) which challenges users with questions such as: what was your previous address/street number? or what are the last 4 digits of your discovery card? KBA has unique advantages over other authentication methods such as passwords and biometrics, 1

especially in situations where users need to verify themselves less frequently as in credit verification or tax filing. KBA is considered convenient for both users (claimants) and verifiers (Lawler 2004). No a priori relationship between the claimant and the verifier needs to be established specifically for the sake of authentication, since organizations usually acquire the knowledge about the user from previous transactions or public data sources such as Social Security Administration (SSA) and Consumer Reporting Agency (CRA). These factoids are also relatively effortless to remember, compared with strong passwords (Kaliski 2005). Moreover, KBA appears to be integral to the identity verification procedures mandated by the USA PATRIOT Act (2001). A fundamental challenge in KBA research and practice is the lack of a mathematical formalism of the KBA problem; specifically how to quantify the level of assurance of a KBA system (Burr 2004). To answer this question one needs to consider a set of systematic metrics, i.e., guessability and memorability, for the individual factoids and the selected factoid vector constituting a KBA system, along with a framework to unify these components and their dependency relationships. Model Selection: A Bayesian Network Model of KBA We apply statistical modeling approach, specifically Bayesian networks (BNs) (Friedman et al. 1997) to KBA (Chen et al. 2007). From the model learning point of view, we can neither readily obtain in practice the malicious side of training data from the potential impostors, nor theoretically assume that the inherently unpredictable future attacking behaviors can be extrapolated from the past fraudulent data. This challenge precludes us from employing the state-of-the-art classifiers such as Support Vector Machines (SVMs) (Burges 1998). Instead, we adopt the generative view to model the KBA domain in order to deliver strong authentication systems. Formally, Definition 1: A Bayesian network model for KBA (BN-KBA) is a six-tuple DAG (directed acyclic graph): The vertices of the DAG BN - KBA def = ( y, x, k, E, E, E ). (1) y x V = ( y, x, k) index three types of variables: y is the authenticity of a claimant s identity; = { x, x, Κ, x } is the random vector of n selected features or factoids; and k = { k, k2, Κ, k 1 m x 1 2 n } is the random vector of knowledge sources. The edges of the DAG E = { E y, E x, Ek} represent three types of dependency relationships: E y denotes the dependencies of k all factoids on the class variable; E k denotes the dependency edges from knowledge sources to the associated factoids; and E x denotes the interdependences among factoids. 2

The BN-KBA model essentially defines a joint probability distribution, in a compact manner, over the three types of variables of interest: p ( y, x, k) = p( y) p(k) p( x i π( x i )), (2) where π x ) denotes the set of parent nodes of x in the BN DAG. With a learned BN-KBA model, i.e., ( i the joint distribution, we can compute the posterior probability of the class variable by Bayes rule: p(e y = 1) p( y = 1) p( y = 1 e) =, (3) p(e) where e i n i= 1 denotes entered evidence regarding correctness of responses and trustworthiness of knowledge sources. The authentication decision is made by comparing the posterior with a predefined threshold. The BN-KBA model is intuitively appealing in that it captures two key metrics of KBA as the model parameters, particularly the likelihoods: 1. Memorability: m( x) def = p( x = 1 y = 1) is the probability that a claimant with true identity recalls the factoid correctly. 2. Guessability: g( x) = p( x = 1 y = 1) is the probability that an impostor correctly guesses the factoid. def To estimate the model parameters in Equation (2), we apply two distinct approaches: (1) for the prior probabilities, such as p ( y) and p (k), and the memorabilities m( x), we use the traditional statistical learning methods including the maximum likelihood (ML) and maximum a posteriori (MAP) estimation; and (2) for the guessabilities g( x), we adopt a game-theoretic analysis since the data about attacking is unattainable, and derive the optimal guessability of a KBA system: g *(x) = max p(x), (4) x A where x is the vector of the selected factoids constituting a KBA system, and A denotes the set of possible realizations of x. In other words, a rational attacker should always try the highest frequent realizations of the feature vector x. On the other hand, a secure KBA implementation should select factoids such that g *( x ) = max p(x) x A is minimized. Feature Selection: A Maximum Entropy Approach Another critical task for KBA design is to select a relevant subset of factoids to achieve the best level of security assurance. The challenge lies in how to measure the relevance of a factoid subset. Under an 3

information-theoretic view, a population of legitimate users defines an empirical distribution over the selected feature space p(x) ; while an attacking strategy can be formulated as another distribution q(x) that approximates the true distribution p(x). A natural distance measure between these two distributions is the cross entropy, thus the feature selection problem can be formulated as the following optimization problem: f* = arg max H ( p, q f ), (5) f F where f indexes the selected subset of factoids, and F is the set of all available factoids. The closed-form solutions to this optimization problem lead to three feature selection criteria, at different granularity levels: 1. Domain-level feature selection is to select the optimal f with regard to the average user identity id, thus to be used in all subsequent challenge-response sessions. Formally, f* = arg min max p(x f ). (6) f F x A 2. Identity-level feature selection is to select the optimal f with regard to a specific id, thus to be adaptive to different identities under attack. Formally, f* = arg max h(x id f ), (7) f F where x is the vector realization of id, and x f ) is the entropy of the realization given f. id h( id 3. Factoid-level feature selection is to select the optimal f in a stepwise manner such that the selection of the next factoid is based on the claimant s responses to the previous questions, and the remaining selected factoids as a whole will give the lowest guessability or the maximum guessing entropy. Formally, f*' = arg max h (x' id x prev = rprev,f '), (8) f ' F' where the superscript ' stands for the remaining factoid vector to be selected, the subscript prev denotes the previous factoids having been challenged, and r denotes the claimant s responses. An Empirical Investigation The objective of the empirical study is two-fold: both exploratory and confirmatory. The confirmatory part attempts to experimentally validate the hypothesis that our statistical model is a sensible approximation to the human cognitive process for the KBA problem. The preliminary results show that our statistical model captures the principal components of the goodness of a KBA design; specifically, security and convenience factors altogether account for 84% of all human subject votes. The exploratory part aims to gain new insights into the user perceptions and preferences of the criteria to determine the goodness of a KBA design, 4

and novel attacking strategies from an attacker s perspective as well. Findings from our pilot study include, but not limited to, 1. Convenience of a KBA system is valued as important as the obscurity (difficulty of guessing) variable; 2. Dependency relationship among factoids is weighted high (the second highest variable in the security factor) by both legitimate users and attackers; and 3. Other novel resources, particularly online social networks, may put imprudent KBA designs at risk. One intriguing finding from our pilot study is that information disclosure on the Internet has raised serious security and privacy concerns. In the context of KBA, the personal knowledge revealed from a variety of online sources can be directly or indirectly exploited by imposters to attack a KBA system. Motivated by this finding, we further experimentally investigate the security threats posed by online information disclosure, specifically from online social networks. Instead of considering the trivial task of locating the information that can be directly used for guessing a KBA factoid, e.g., high school mascot, we focus on how much hidden knowledge about individuals can be derived from the publicly available online social networking data. We plan to apply sophisticated machine learning techniques to learn the underlying statistical patterns of these online social networking user profiles. Our preliminary experimental results show that, with such a rich data repository and the cutting-edge machine learning techniques, the guessability an attacker would achieve can be substantially improved. On the other hand, this framework for analyzing self-disclosed data can also be used as a vulnerability analysis tool for KBA implementations. References "Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism (USA PATRIOT Act)," in: Act of 2001, 107th US Congress, 1st Session, HR 3162 Public Law 107-45, 2001. Burges, C.J.C. "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery (2) 1998, pp 121-167. Burr, W. "NIST E-Authentication Guidance: Can We Add KBA?," KBA Symposium-Knowledge Based Authentication: Is It Quantifiable?, Gaithersburg, MD, 2004. Chen, Y., and Liginlal, D. "Bayesian Networks for Knowledge-Based Authentication," IEEE Transaction of Knowledge and Data Engineering (19:5) 2007, pp 695-710. Friedman, N., Geiger, D., and Goldszmidt, M. "Bayesian Network Classifiers," Machine Learning (29:2-3) 1997, pp 131-163. Hastings, N.E., and Dodson, D.F. "Quantifying Assurance of Knowledge Based Authentication," ECIW 2004: The 3rd European Conference on Information Warfare and Security, London, UK, 2004. Kaliski, B. "Future Directions in User Authentication," IT-DEFENSE 2005, Cologne, Germany, 2005. Lawler, B. "Models of Knowledge Based Authentication (KBA)," KBA Symposium-Knowledge Based Authentication: Is It Quantifiable?, Gaithersburg, MD, 2004. Regan, T. "Information Source Metrics," KBA Symposium-Knowledge Based Authentication: Is It Quantifiable?, Gaithersburg, MD, 2004. Trilli, K., and Andrews, B. "KBA Challenge Response System," KBA Symposium-Knowledge Based Authentication: Is It Quantifiable?, Gaithersburg, MD, 2004. 5