Outline. What s inside this paper? My expectation. Software Defect Prediction. Traditional Method. What s inside this paper?

Similar documents
Lecture 9 Internal Validity

Bayesian Belief Network Based Fault Diagnosis in Automotive Electronic Systems

Representation and Analysis of Medical Decision Problems with Influence. Diagrams

Improving statistical estimates used in the courtroom. Precis. Bayes Theorem. Professor Norman Fenton. Queen Mary University of London and Agena Ltd

You can use this app to build a causal Bayesian network and experiment with inferences. We hope you ll find it interesting and helpful.

ERA: Architectures for Inference

Application of Bayesian Networks to Quantitative Assessment of Safety Barriers Performance in the Prevention of Major Accidents

You can use this app to build a causal Bayesian network and experiment with inferences. We hope you ll find it interesting and helpful.

Decisions and Dependence in Influence Diagrams

Bayesian (Belief) Network Models,

Artificial Intelligence Programming Probability

Lecture 3: Bayesian Networks 1

Defect Removal. RIT Software Engineering

3. L EARNING BAYESIAN N ETWORKS FROM DATA A. I NTRODUCTION

A Vision-based Affective Computing System. Jieyu Zhao Ningbo University, China

Assignment Question Paper I

Application of Analytical Hierarchy Process and Bayesian Belief Networks for Risk Analysis

Expert judgements in risk and reliability analysis

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Introduction to Bayesian Analysis 1

Modeling State Space Search Technique for a Real World Adversarial Problem Solving

How to use the Lafayette ESS Report to obtain a probability of deception or truth-telling

Defect Density Prediction with Six Sigma. Software Measurement European Forum Roma, 28-May-2009

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks

in Engineering Prof. Dr. Michael Havbro Faber ETH Zurich, Switzerland Swiss Federal Institute of Technology

DECISION ANALYSIS WITH BAYESIAN NETWORKS

Understanding Uncertainty in School League Tables*

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s

Defect Removal Metrics. SE 350 Software Process & Product Quality 1

Lesson 87 Bayes Theorem

Supplementary notes for lecture 8: Computational modeling of cognitive development

Progress in Risk Science and Causality

STIN2103. Knowledge. engineering expert systems. Wan Hussain Wan Ishak. SOC 2079 Ext.: Url:

TESTING at any level (e.g., production, field, or on-board)

IE 5203 Decision Analysis Lab I Probabilistic Modeling, Inference and Decision Making with Netica

Introduction to Computational Neuroscience

Expert System Profile

International Journal of Software and Web Sciences (IJSWS)

Causal Knowledge Modeling for Traditional Chinese Medicine using OWL 2

Exploring Experiential Learning: Simulations and Experiential Exercises, Volume 5, 1978 THE USE OF PROGRAM BAYAUD IN THE TEACHING OF AUDIT SAMPLING

Pythia WEB ENABLED TIMED INFLUENCE NET MODELING TOOL SAL. Lee W. Wagenhals Alexander H. Levis

A Brief Introduction to Bayesian Statistics

Bayes Linear Statistics. Theory and Methods

Bayes Theorem Application: Estimating Outcomes in Terms of Probability

A Bayesian Network Analysis of Eyewitness Reliability: Part 1

Computational Cognitive Neuroscience

High-level Vision. Bernd Neumann Slides for the course in WS 2004/05. Faculty of Informatics Hamburg University Germany

Knowledge is rarely absolutely certain. In expert systems we need a way to say that something is probably but not necessarily true.

Ecological Statistics

Diagnostic Reasoning: Approach to Clinical Diagnosis Based on Bayes Theorem

Cognitive Modeling. Lecture 12: Bayesian Inference. Sharon Goldwater. School of Informatics University of Edinburgh

Hypothesis-Driven Research

Bayes, Data and NUREG/CR-6928 Caveats Nathan Larson, Carroll Trull September 2017

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012

Continuous/Discrete Non Parametric Bayesian Belief Nets with UNICORN and UNINET

A probabilistic method for food web modeling

The Development and Application of Bayesian Networks Used in Data Mining Under Big Data

MS&E 226: Small Data

Sequential Decision Making

A Decision-Theoretic Approach to Evaluating Posterior Probabilities of Mental Models

Convolutional Coding: Fundamentals and Applications. L. H. Charles Lee. Artech House Boston London

New Effects-Based Operations Models in War Games. Lee W. Wagenhals Larry K. Wentz

How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection

Analysis of Competing Hypotheses using Subjective Logic (ACH-SL)

Reasoning with Bayesian Belief Networks

The Century of Bayes

Evaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY

CS 4365: Artificial Intelligence Recap. Vibhav Gogate

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

Do not copy, post, or distribute

Estimating the number of components with defects post-release that showed no defects in testing

TEACHING YOUNG GROWNUPS HOW TO USE BAYESIAN NETWORKS.

A Survey of Bayesian Network Models for Decision Making System in Software Engineering

Natural Scene Statistics and Perception. W.S. Geisler

University of Cambridge Engineering Part IB Information Engineering Elective

Bayesians methods in system identification: equivalences, differences, and misunderstandings

Computational Models for Belief Revision, Group Decisions and Cultural Shifts

Goal-Oriented Measurement plus System Dynamics A Hybrid and Evolutionary Approach

Fodor on Functionalism EMILY HULL

Signal Detection Theory and Bayesian Modeling

CISC453 Winter Probabilistic Reasoning Part B: AIMA3e Ch

CHAPTER ONE CORRELATION

Probability II. Patrick Breheny. February 15. Advanced rules Summary

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN)

How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis?

Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011

Irrationality in Game Theory

10CS664: PATTERN RECOGNITION QUESTION BANK

Analysis of complex patterns of evidence in legal cases: Wigmore charts vs. Bayesian networks

From data to models: incorporating uncertainty into decision support systems. Outline. Probabilistic vs Mechanistic models.

TEACHING BAYESIAN METHODS FOR EXPERIMENTAL DATA ANALYSIS

A Bayesian Network Model for Analysis of the Factors Affecting Crime Risk

Argument Analysis Using Bayesian Networks

Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 2010

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Spatial Cognition for Mobile Robots: A Hierarchical Probabilistic Concept- Oriented Representation of Space

Modelling crime linkage with Bayesian Networks

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India

Transcription:

Outline A Critique of Software Defect Prediction Models Norman E. Fenton Dongfeng Zhu What s inside this paper? What kind of new technique was developed in this paper? Research area of this technique? 2002-12-1 CMSC88 Paper Presentation 1 2002-12-1 CMSC88 Paper Presentation 2 My expectation What s inside this paper? Software defect prediction Traditional Method Using size and complexity metrics Using Testing Metrics Using process quality data Multivariate approaches Bayesian Belief Network (BBN) 2002-12-1 CMSC88 Paper Presentation 2002-12-1 CMSC88 Paper Presentation Software Defect Prediction Traditional Method Defects are commonly defined as deviations from specifications or expectations which might lead to failures in operation. Three problem perspectives Predict the number of defects in the system Estimate the reliability of the system in terms of time to failure Understanding the impact of design and testing processes on defect counts and failure densities 2002-12-1 CMSC88 Paper Presentation Complexity and size metrics Testing Process Design and development process Multivariate studies 2002-12-1 CMSC88 Paper Presentation 1

Size and Complexity Lines of Code (LOC) Lines of Code (LOC) Lines of Code McCabe s Cyclomatic Complexity Function Point D?.8? 0.018L D? 0.09? 0.001 L? 0.000000 L 2 2002-12-1 CMSC88 Paper Presentation 2002-12-1 CMSC88 Paper Presentation 8 Lines of Code (LOC) McCabe s Cyclomatic Complexity Strength Can be easily counted Used in many existing software estimation models Weaknesses Programming language dependent Smaller programs penalized Based on a control flow representation of a program. Measure the logical complexity of a module G; Good indicator of module complexity Represents the minimum number of linearly independent paths through G Cyclomatic complexity is a measure of the amount of control structure or decision logic in a program. Study have shown a high correlation between v(g) and the occurrence of errors and it has become a widely accepted indicator of software reliability. 2002-12-1 CMSC88 Paper Presentation 9 2002-12-1 CMSC88 Paper Presentation 10 Cyclomatic complexity (cont) Cyclomatic complexity (Cont) V = e n + 2p e is the number of edges n is the number of nodes p is the number of components Strengths Directly comparable across different projects, coding style and languages Used to keep modules within an understandable and workable size Weaknesses Formula is incorrect when edges cross Fails to distinguish between different kinds of control flow structures Does not consider the level of nesting of control structure 2002-12-1 CMSC88 Paper Presentation 11 2002-12-1 CMSC88 Paper Presentation 12 2

Function Point Function Point (cont) Function points were introduced in order to replace LOC, provide a more reliable size metric that would capture the added benefit to the user obtained in the developing software, that would be independent of source code and language. weights: Simple Average Complex Function point can be estimated early in the software lifecycle. W 1 Concept of FP Proposed by Albrecht in 199 UFP = number of inputs * w1 + number of outputs * w2 + number of user inquiries * w + number of files * w + number of external references * w W 2 W W W 10 1 10 2002-12-1 CMSC88 Paper Presentation 1 2002-12-1 CMSC88 Paper Presentation 1 Function point counting procedure Testing Process Selection the project to count Identify the boundaries of the application Select experts Calculate the 1 general system characteristic and the value adjustment factor Count and weigh EIF (External interface file) ILF (Internal logical file) EI (External inputs) EO (External outputs) EQ (External inquiries) Get the unadjusted function point count Get the adjusted function point count Prediction based on the collection of data about defects discovered during early inspection and testing phases. In the absence of an extensive local database it may be possible to use published bench-marking data to help with this kind of prediction 2002-12-1 CMSC88 Paper Presentation 1 2002-12-1 CMSC88 Paper Presentation 1 Process quality Data Process quality Data (cont) The quality of the development process is the best predictor of product quality. The Capability Maturity Model for Software (CMM or SW-CMM) is a model for judging the maturity of the software processes of an organization and for identifying the key practices that are required to increase the maturity of these processes. The Software CMM has become a standard for assessing and improving software processes. Through the SW-CMM, the SEI and community have put in place an effective means for modeling, defining, and measuring the maturity of the processes used by software professionals. Need more information, go to : http://www.sei.cmu.edu/cmm/cmm.html 2002-12-1 CMSC88 Paper Presentation 1 2002-12-1 CMSC88 Paper Presentation 18

Multivariate Approaches General problems Many of the metrics are colinear Design method to choose the principal components Develop the relative complexity metric This metric is calculated using the magnitude of variability from each of the factor analysis dimensions as the input weights in a weighted sum 2002-12-1 CMSC88 Paper Presentation 19 Different definition of defects Postrelease defects The total of known defects The set of defects discovered after some arbitrary fixed point in the software life cycle The weakness of defect count itself as a measure of software reliability Difficult of determining in advance the seriousness of a defect Wide variations of operational profiles Large numbers of defects may not necessarily lead to improved reliability 2002-12-1 CMSC88 Paper Presentation 20 Problem with size and complexity method Assume a straightforward relationship with defects defects are a function of size or defects are caused by program complexity Problem with Multivariate Approach It s hard to see how you might advise a programmer or designer on how to redesign the programs to achieve a better metric The modal ignore the causal effects of programmers and designers Complex program are themselves a consequence of poor design ability or problem difficult Defects may be introduced at the design stage control? a HNK? a PRC? a E? a VG? a MMC? a Error? a HNP? a LOC 1 2 8 2002-12-1 CMSC88 Paper Presentation 21 2002-12-1 CMSC88 Paper Presentation 22 Problem with data quality and statistical methodology Wake up Lack of attention to the assumptions necessary for successful use of a particular statistical technique Lack of distinction made between model fitting and model prediction Unjustified removal of data points Misuse of averaged data 2002-12-1 CMSC88 Paper Presentation 2 2002-12-1 CMSC88 Paper Presentation 2

BBN Method BBN Method (cont) Bayesian Belief Networks (BBNs) allow us to express complex interrelations within the model at a level of uncertainty commensurate with the problem. Bayesian Belief Network, Belief Networks, Causal Probabilistic Networks, Causal Nets, Graphical Probability Networks, Probabilistic Cause-Effect Models, and Probabilistic Influence Diagrams A BBN is a graphical network that represents probabilistic relationships among variables. BBNs enable reasoning under uncertainty and combine the advantages of an intuitive visual representation with a sound mathematical basis in Bayesian probability. 2002-12-1 CMSC88 Paper Presentation 2 A BBN is a special type of diagram together with an associated set of probability tables. The graph is made up of nodes and arcs where the nodes represent uncertain variables and the arcs the causal/relevance relationships between variables. The nodes represent discrete or continuous variables. BBN can predict events based on partial or uncertain data. It has the ability to represent and manipulate complex models that might never be implemented using conventional methods 2002-12-1 CMSC88 Paper Presentation 2 BBN Method (cont) Bayesian Method Bayes set out his theory in 1 In making sense of new data, traditional statistical methods ignore the past. Bayesian techniques, in contrast, allow you to start with what you already believe and then see how new information changes your confidence in that belief. Putting Bayes theorem to work requires heavy calculus now aided by computer models but the fundamental concept is quite straightforward. 2002-12-1 CMSC88 Paper Presentation 2 2002-12-1 CMSC88 Paper Presentation 28 Bayesian Method (cont) Bayesian Method (cont) P( H)? P( D / H) P( H / D)? P( D) P(H) prior distribution P(D/H) likelihood function P(D) evidence distribution The theorem says that the probability of the hypothesis, given the data, is equal to the probability of the data, given that the hypothesis is correct, multiplied by the probability of the hypothesis before obtaining the data provided by the averaged probability of the data. Advantages Provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence. Bayesian statistics allows researchers to use everything from hunches to hard data to compute the probability that a hypothesis is correct. provides a mechanism for updating state-of-knowledge when the information is accumulated in stages pieces. 2002-12-1 CMSC88 Paper Presentation 29 2002-12-1 CMSC88 Paper Presentation 0

Bayesian Method (cont) BBN Method Disadvantage It s difficult to implement before the birth of computers. In theorem, Bayesians say enough evidence will lead people who start with dramatically different priors to essentially the same posterior answer. When the data are scarce, however, the choice of a prior probability can heavily influence the posterior. NPT node probability table The NPTs capture the conditional probabilities of a node given the stateof its parent nodes. Propagation - propagation algorithms Although Bayesian probability been around for a long time it is only in the last few years that efficient algorithms (and tools to implement them) have been developed to enable propagation in networks with a reasonable number of variables Hugin tools --- Very powerful BBN tool with excellent graphical interface and fast implementation of Bayesian propagation. Check at www.hugin.com 2002-12-1 CMSC88 Paper Presentation 1 2002-12-1 CMSC88 Paper Presentation 2 BBN method (cont) More resource Doing propagation is difficult when the number of variable increase or the node dependence increase. There is no universal efficient algorithm for doing the computation. In the 1980s researchers discovered propagation algorithms that were effective for large classes of BBNs. It is now possible to use BBNs to solve complex problems without doing any of the Bayesian calculations by hand The research area in BBN 2002-12-1 CMSC88 Paper Presentation BBN tutorial http://www.dcs.qmul.ac.uk/~norman/bbns/bb Ns.htm BBN paper http://www.cs.cmu.edu/afs/cs.cmu.edu/user/ldc/ WWW/bayes-net-research/bayes-netresearch.html 2002-12-1 CMSC88 Paper Presentation Thank you! 2002-12-1 CMSC88 Paper Presentation