Defect Removal Metrics. SE 350 Software Process & Product Quality


Objectives

- Understand some basic defect metrics and the concepts behind them: defect density metrics, defect detection and removal effectiveness, etc.
- Look at the uses and limitations of quality metrics
- Build some intuition to help balance investment in quality (Cost of Quality) against the cost of poor quality (Cost of Poor Quality)

Defect Removal Metrics: Concepts

- All defect removal metrics are computed from the measurements identified last time: inspection reports, test reports, field defect reports
- Used to get different views on what's going on
- Each metric can tell us something about the development process or its results
- Many are amazingly useful, though all have limitations; we need to learn how to use each metric and tool effectively
- For most defect metrics, filter out minor and cosmetic defects: many metrics can easily be made to look good by finding more or fewer cosmetic problems (the level of nitpicking)

Measuring Total Number of Defects

- Many metrics have parameters such as "total number of defects", for example: total number of requirements defects
- Clearly, we only ever know about the defects that are found, so we never know the true value of many of these metrics
- Further, as we find more defects, this number will increase
  - Hopefully, finding defects is asymptotic over time: we find fewer defects as time goes along, especially after release
- So metrics that require total-defects information will change over time, but hopefully converge eventually
- The later in the lifecycle we compute the metric, the more meaningful the results, but also the less useful for the current project
- If and when we use these metrics, we must be aware of this lag effect and account for it

Measuring Size

- Many defect metrics have size parameters
- The most common size metric is KLOC (thousands of lines of code)
  - Depends heavily on language, coding style, competence
  - Code generators may produce lots of code and distort measures; are included libraries counted?
  - Does not take the complexity of the application into account
  - Easy to compute automatically and reliably (but can be manipulated)
- An alternative size metric is function points (FPs)
  - A partly subjective measure of functionality delivered
  - Directly measures the functionality of the application: number of inputs and outputs, files manipulated, interfaces provided, etc.
  - More valid but less reliable, and more effort to gather
- We use KLOC in our examples, but they work just as well with FPs
- Be careful when using feature counts in agile processes
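Since the slide notes that KLOC is easy to compute automatically, here is a minimal sketch of one such count. It assumes Python source files and '#' line comments; real counting tools differ on exactly what they include (blank lines, comments, generated code, bundled libraries), which is part of why KLOC can be manipulated.

```python
# Minimal KLOC sketch: count non-blank, non-comment lines in a source tree.
# The suffix and comment convention are illustrative assumptions.
from pathlib import Path

def kloc(root: str, suffix: str = ".py") -> float:
    """Thousands of non-blank, non-comment lines under `root`."""
    lines = 0
    for path in Path(root).rglob(f"*{suffix}"):
        for line in path.read_text(errors="ignore").splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                lines += 1
    return lines / 1000.0

# print(kloc("src/"))   # e.g. 12.4 (KLOC)
```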

Defect Density

- The most common metric: number of defects / size
- Defect density in released code ("defect density at release") is a good measure of organizational capability
  - Defects found after release / size of released software
- Can compute defect densities per phase, per increment, per component, etc.
  - Useful to identify problem components that could use rework or deeper review
  - Heuristic: defects tend to cluster
  - Note that problem components will typically be high-complexity code at the heart of systems
  - Focus early increments on complex functionality to expose defects and issues early
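To make the per-component use concrete, here is an illustrative sketch that computes defect density per component and flags outliers. The component names and counts are invented, and the 2x-mean threshold is an arbitrary choice for the example, not a standard rule.

```python
# Hypothetical per-component data; minor/cosmetic defects already filtered out.
defects = {"parser": 46, "ui": 9, "report": 12}          # defects found
size_kloc = {"parser": 8.2, "ui": 14.0, "report": 6.5}   # component size in KLOC

density = {c: defects[c] / size_kloc[c] for c in defects}
mean = sum(density.values()) / len(density)

for component, d in sorted(density.items(), key=lambda kv: -kv[1]):
    flag = "  <-- candidate for rework / deeper review" if d > 2 * mean else ""
    print(f"{component:8s} {d:5.1f} defects/KLOC{flag}")
# parser stands out at ~5.6 defects/KLOC vs ~0.6 and ~1.8 for the others
```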

Using Defect Density

- Defect densities (and most other metrics) vary a lot by domain; can only compare across similar projects
- Very useful as a measure of organizational capability to produce defect-free outputs
  - Can be compared with other organizations in the same application domain
- Outlier information is useful to spot problem projects and problem components
- Can be used in-process, if the comparison is with defect densities of other projects in the same phase or increment
  - If much lower, may indicate defects are not being found
  - If much higher, may indicate poor quality of work
  - (Need to go behind the numbers to find out what is really happening; metrics can only provide triggers)

Defect Density: Limitations

- Size estimation has problems of reliability and validity
- Total-defects problem: can only count the defects you detect
- Criticality and criticality assignment
  - Combining defects of different criticalities reduces validity
  - Criticality assignment is itself subjective
- Defects may not equal reliability: users experience failures, not defects
- Statistical significance, when applied to phases, increments, and components
  - The actual number of defects may be so small that random variation can mask significant variation

Defect Removal Effectiveness

- Percentage of defects removed during a phase or increment:
  (Defects found during that phase) / (Defects found during that phase + Defects not found during that phase)
- Since we can only count the defects we detect, this is approximated by:
  (Defects found during that phase) / (Defects found during that phase + Defects found later)
- Includes defects carried over from previous phases or increments
- A good measure of the effectiveness of defect removal practices: test effectiveness, inspection effectiveness
- Correlates strongly with output quality
- Other terms: defect removal efficiency, error detection efficiency, fault containment, etc.
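The approximation is a one-liner. In the usage example, the numbers are the unit-test figures from the table that follows: 32 defects found during unit test, and 45 of the defects present at unit-test entry found later.

```python
# Minimal sketch of the DRE approximation above.
def dre(found_in_phase: int, found_later: int) -> float:
    """Approximate Defect Removal Effectiveness for one phase or increment."""
    return found_in_phase / (found_in_phase + found_later)

# Unit-test phase figures from the example table below.
print(f"{dre(32, 45):.0%}")   # -> 42%  (i.e., 32/77)
```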

DRE Table Example

Columns: Phase of Origin. Rows: Phase Found. ("-" marks cells that cannot occur.)

Phase Found     | Req  Des  Code   UT   IT   ST  Field | Total Found | Cum. Found
Req             |   5    -     -    -    -    -      - |           5 |          5
Des             |   2   14     -    -    -    -      - |          16 |         21
Code            |   3    9    49    -    -    -      - |          61 |         82
UT              |   0    2    22    8    -    -      - |          32 |        114
IT              |   0    3     5    0    5    -      - |          13 |        127
ST              |   1    3    16    0    0    1      - |          21 |        148
Field           |   4    7     6    0    0    0      1 |          18 |        166
Total Injected  |  15   38    98    8    5    1      1 |         166 |
Cum. Injected   |  15   53   151  159  164  165    166 |             |

(Illustrative example, not real data)

DRE Table Example: Increments

The same structure works for incremental development. Columns: Increment of Origin. Rows: Increment Found.

Increment Found | I-1  I-2  I-3  I-4  I-5  I-6  Field | Total Found | Cum. Found
I-1             |   5    -    -    -    -    -      - |           5 |          5
I-2             |   2   14    -    -    -    -      - |          16 |         21
I-3             |   3    9   49    -    -    -      - |          61 |         82
I-4             |   0    2   22    8    -    -      - |          32 |        114
I-5             |   0    3    5    0    5    -      - |          13 |        127
I-6             |   1    3   16    0    0    1      - |          21 |        148
Field           |   4    7    6    0    0    0      1 |          18 |        166
Total Injected  |  15   38   98    8    5    1      1 |         166 |
Cum. Injected   |  15   53  151  159  164  165    166 |             |

(Illustrative example, not real data)

Requirements Phase DRE Example

- In the requirements phase, 5 requirements defects were found and removed
- But additional requirements defects were found in later phases; the total number of requirements defects found by the end of all phases (plus field operations) is 15, so 15 requirements defects were injected in total
- DRE in the requirements phase is 5/15 (# found / # available to find)

(Refer to the DRE table above; illustrative example, not real data)

Design Phase DRE Example

- In the design phase, 14 design defects plus 2 requirements defects were found and removed: (14+2) = 16 defects in total
- Additional design defects were found in later phases: 38 design defects were injected in total
- To compute removal effectiveness in the design phase, we count how many defects (requirements and design) were still in the system at entry; we do not count those already found and removed in the requirements phase
  - 15 requirements defects were injected in total, but 5 had already been found and removed in the requirements phase, leaving 10 requirements defects available to find
  - All 38 injected design defects were available to find
  - Total defects available to find: 10 + 38 = 48 (cumulative injected minus cumulative found in prior phases)
- Design phase DRE = (2+14)/(10+38) = 16/48 = 33%

(Refer to the DRE table above; illustrative example, not real data)

Coding Phase DRE Example

- Total defects removed prior to the coding phase: 21
- Total defects found in the coding phase: 61
- Total coding defects injected: 98
- Total defects available to find: cumulative injected (151) minus already removed (21) = 130
- Coding phase DRE = 61/130 = 47%

(Refer to the DRE table above; illustrative example, not real data)

Unit Test Phase DRE Example

- Total defects removed prior to unit test: 82
- Total defects found in the unit test phase: 32
- Total defects available to find: cumulative injected (159) minus already removed (82) = 77
- Unit test phase DRE = 32/77 = 42%

(Refer to the DRE table above; illustrative example, not real data)
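The whole series of worked examples can be reproduced mechanically from the table. Here is a sketch using the illustrative numbers above, where found[p][o] is the count of defects found in phase p that originated in phase o; it recovers 5/15, 16/48, 61/130, and 32/77.

```python
# Per-phase DRE from the illustrative DRE table (not real data).
phases = ["Req", "Des", "Code", "UT", "IT", "ST", "Field"]
found = {
    "Req":   {"Req": 5},
    "Des":   {"Req": 2, "Des": 14},
    "Code":  {"Req": 3, "Des": 9, "Code": 49},
    "UT":    {"Req": 0, "Des": 2, "Code": 22, "UT": 8},
    "IT":    {"Req": 0, "Des": 3, "Code": 5, "UT": 0, "IT": 5},
    "ST":    {"Req": 1, "Des": 3, "Code": 16, "UT": 0, "IT": 0, "ST": 1},
    "Field": {"Req": 4, "Des": 7, "Code": 6, "UT": 0, "IT": 0, "ST": 0, "Field": 1},
}

cum_injected = 0   # cumulative defects injected through the current phase
cum_removed = 0    # cumulative defects removed before the current phase
for phase in phases:
    cum_injected += sum(found[p].get(phase, 0) for p in phases)  # column total
    found_here = sum(found[phase].values())                      # row total
    available = cum_injected - cum_removed       # still in the product at entry
    print(f"{phase:6s} DRE = {found_here}/{available} = {found_here/available:.0%}")
    cum_removed += found_here
# Req 5/15, Des 16/48, Code 61/130, UT 32/77, IT 13/50, ST 21/38, Field 18/18
```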

DRE Value

- Compute effectiveness of tests and reviews: actual defects found / defects present at entry to the review/test
  - Phasewise Defect Removal Effectiveness (PDRE); for incremental development, Increment Defect Removal Effectiveness (IDRE)
- Compute overall defect removal effectiveness: problems fixed before release / total originated problems
- Analyze cost-effectiveness of tests vs. reviews: hours spent per problem found in reviews vs. tests (see the sketch below)
  - Need to factor in the effort to fix a problem found during review vs. the effort to fix a problem found during test
- To be more exact, we must use a defect removal model: it shows the pattern of defect removal, where defects originate ("injected") and where they get removed
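A hedged sketch of the review-vs-test comparison just described. All hours and defect counts are hypothetical; "cost per defect" here includes both finding and fixing effort, as the slide suggests.

```python
# Hypothetical effort data (hours) for one project.
review_hours, review_defects, fix_per_review_defect = 120.0, 40, 0.5
test_hours,   test_defects,   fix_per_test_defect   = 300.0, 50, 4.0

review_cost = review_hours / review_defects + fix_per_review_defect
test_cost   = test_hours / test_defects + fix_per_test_defect

print(f"Cost per defect: reviews {review_cost:.1f} h, tests {test_cost:.1f} h")
# -> reviews 3.5 h, tests 10.0 h: defects found in test are typically costlier to fix
```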

DRE Implications

- Counter-intuitive implication: if testing reveals lots of bugs, it is likely that the final product will be very buggy too
- It is not true that "we have found and fixed a lot of problems, so now our software is OK"
- We can only make that second assertion if testing reveals lots of bugs early on, but the latter stages of testing reveal hardly any bugs
- And even then, only if you are not simply repeating the same tests!

DRE Limitations

- Statistical significance: note how small the numbers in each box are
  - Hard to draw conclusions from data about one project; at best a crude indicator of which phases and reviews worked better
  - Organization-wide data has far more validity
- Remember that when the numbers are small, it is better to show the raw numbers
  - Even if you show DRE percentages, include the actual defect counts in each box ("DRE = 32/77" is preferred to "DRE = 42%")
- The full picture emerges only after project completion
- Easily influenced by underreporting of problems found

Other Related Metrics

- Phase Containment Effectiveness (PCE): % of problems introduced during a phase that were found within that phase
  - For example, in the DRE table above, design PCE = 14/38 = 0.37 (37%); a short sketch follows this list
  - A PCE of 70% is considered very good
- Phasewise Defect Injection Rate: number of defects introduced during a phase / size
  - High injection rates (across multiple projects) indicate a need to improve the way that phase is performed
  - Possible solutions: training, stronger processes, tools, checklists, etc.
- Similarly for Increment Defect Injection Rate
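The sketch referenced above: PCE falls straight out of the DRE table, as the diagonal entry (found in the phase of origin) divided by the column total (all defects injected in that phase).

```python
# PCE(phase) = defects found in their phase of origin / defects injected there.
# Numbers are the Req/Des/Code columns of the illustrative DRE table.
injected = {"Req": 15, "Des": 38, "Code": 98}        # column totals
caught_in_phase = {"Req": 5, "Des": 14, "Code": 49}  # diagonal entries

for phase, total in injected.items():
    caught = caught_in_phase[phase]
    print(f"{phase:4s} PCE = {caught}/{total} = {caught / total:.0%}")
# Req 33%, Des 37%, Code 50% -- all below the 70% "very good" level
```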

Defect Removal Model (from the Kan text)

The model tracks, for each phase: defects existing on phase entry plus defects injected during development; defect detection splits these into detected and undetected defects; defect fixing removes the detected ones, but some fixes are incorrect; the result is defects removed vs. defects remaining after phase exit.

- Can predict defects remaining, given:
  - Historical data for phasewise defect injection rates
  - Historical data for rates of defect removal
  - Historical data for rates of incorrect fixes
  - Actual phasewise defects found
- Can statistically optimize defect removal, given (in addition to the rates):
  - Phasewise costs of finding defects (through reviews and testing)
  - Phasewise costs of fixing defects
- Can decide whether it is worthwhile to reduce fault injection rates, by providing additional training, adding more processes and checklists, etc.
- This is statistical process control; remember all the disclaimers on its validity
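A rough sketch of the prediction the model enables, under loudly hypothetical rates: defects flow in by phase (historical injection rate times size), each phase removes a fraction of what is present (historical DRE), and a small fraction of fixes inject new defects.

```python
# Hypothetical sketch of a Kan-style defect removal model: predict
# defects remaining after each phase. All rates below are invented.
size_kloc = 50.0
injection_per_kloc = {"Req": 0.3, "Des": 0.8, "Code": 2.0}    # defects/KLOC by phase
removal_effectiveness = {"Req": 0.33, "Des": 0.33, "Code": 0.47,
                         "UT": 0.42, "IT": 0.26, "ST": 0.55}  # historical DRE
bad_fix_rate = 0.05   # fraction of fixes that inject a new defect

remaining = 0.0
for phase, dre in removal_effectiveness.items():
    remaining += injection_per_kloc.get(phase, 0.0) * size_kloc  # newly injected
    removed = dre * remaining                                    # detected and fixed
    remaining -= removed * (1 - bad_fix_rate)                    # net removal
    print(f"after {phase:4s}: {remaining:6.1f} defects predicted to remain")
```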

Additional Metrics for Inspections

Several simple (secondary) metrics can be tracked and managed within control limits (see the sketch below):

- Inspection rate: size / duration of inspection meeting
  - Very high or very low rates may indicate problems
- Inspection effort: (preparation + meeting + tracking) / size
- Inspection preparation time: make sure preparation happens; avoid overloading others on the team

Inspection effectiveness is still the bottom line; these just help with optimizing inspections.
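The sketch referenced above: a control-limit check on inspection rate. The 150-400 LOC per meeting hour band is an assumption for illustration, not a standard; real limits should come from your own historical inspection data.

```python
# Control-limit check on inspection rate (limits are illustrative assumptions).
def check_inspection(loc_inspected: int, meeting_hours: float,
                     lo: float = 150.0, hi: float = 400.0) -> str:
    rate = loc_inspected / meeting_hours
    if rate > hi:
        return f"{rate:.0f} LOC/h: very fast, review may be superficial"
    if rate < lo:
        return f"{rate:.0f} LOC/h: unusually slow, check for problems"
    return f"{rate:.0f} LOC/h: within control limits"

print(check_inspection(900, 1.5))   # -> "600 LOC/h: very fast, ..."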

Cost of Quality (COQ)

- Total effort put into quality-related activities:
  - Testing and test development effort
  - Inspections and reviews
  - Quality assessments and preparation
- COQ is expressed as a percentage of project effort: a pure number, suitable for comparisons across projects and organizations
- Can observe relationships between COQ and defect removal efficiency, and between COQ and release defect density
- Note that defect prevention reverses the normal relationships: it reduces both COQ and release defect density, but will NOT show up in defect removal effectiveness!

Cost of Poor Quality (COPQ)

- Total effort put into rework:
  - Cost of fixing defects
  - Cost of revising/updating affected documentation
  - Cost of re-testing and re-inspecting
  - Cost of patches and patch distribution
  - Cost of tracking defects
- A percentage of project effort: a pure number
- Would generally correlate well with defect densities: if there are fewer defects, less rework is needed
- COPQ < 10% is very good
- Note that early defect detection (inspections) and defect prevention reduce COPQ
- Here, COPQ is the cost of rework; also consider the cost of redeployment (patches), customer dissatisfaction, etc.
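Both COQ and COPQ reduce to the same arithmetic: sum the relevant effort and divide by total project effort. A sketch with hypothetical person-hour figures, using the activity categories from the two slides above:

```python
# Hypothetical effort figures (person-hours).
total_effort = 4000.0
quality_effort = {"testing": 620, "inspections": 230, "assessments": 110}
rework_effort = {"fixing": 180, "re-testing": 90, "doc updates": 40, "patches": 50}

coq = sum(quality_effort.values()) / total_effort
copq = sum(rework_effort.values()) / total_effort
print(f"COQ = {coq:.0%}, COPQ = {copq:.0%}")   # -> COQ = 24%, COPQ = 9%
# COPQ under 10% is very good by the rule of thumb above
```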

Quality Sweet Spot

[Figure: number of missed defects and cost of quality plotted against effort on quality; the curves cross between the low-COQ and high-COQ extremes, marking the optimal amount of quality effort]

Optimizing Quality Efforts

- Normally, there is a balance between COQ and COPQ: to reduce rework, you need to spend more effort on quality up front
- Note that high COPQ increases COQ, because of re-testing and other rework
- Defect prevention and superior quality approaches (better test methodologies, more effective reviews, etc.) cut both COQ and COPQ
- The objective is a lower COQ while maintaining good (low) COPQ and low release defect density
- More quality effort will always improve quality, but there is a point of diminishing returns
- COPQ and release defect density within targets -> adequate quality

Limitations of COQ/COPQ

- Assumes the numbers are accurate: that they fairly reflect all defect removal activities and all rework activities
- COPQ is easily distorted if a single requirements or design bug creates a large amount of rework
- Balancing COQ/COPQ is an organization-level activity
  - Improves statistical significance, averages out variations
  - Evens out distortions in COPQ due to a couple of high-rework bugs
- Need to wait until the product has been in the field to get truer COPQ numbers
- Should COPQ include customer expectation management? It is more expensive to gain a customer than to keep one

Limitations of COQ/COPQ (cont'd)

- Can use COQ/COPQ at the project level as indicators, but need to go behind the numbers to interpret them properly
- Hard for COQ to account properly for unit testing when developers do it along with coding
- Inspections often involve additional hidden effort, because developers go through their code extra carefully before submitting it for inspection

Conclusions

- Defect densities tell us a lot about the quality of the product
- Need multiple stages of defect removal
  - Inspections are well known to be cost-effective
  - Early detection of defects saves work; it is more expensive to fix bugs late in the lifecycle
- DRE and similar charts help us to:
  - Compute inspection and test effectiveness
  - Predict field defect rates
  - See the pattern of defect removal
- Defect removal metrics can also help optimize the effort spent on quality activities: COQ vs. COPQ
- Notice how all these fancy metrics come from just the basic review and test reports!
- Don't gather too much data; focus on meaningful analysis