Decision Analysis: Rates, Proportions, and Odds; Decision Table Statistics; Receiver Operating Characteristic (ROC) Analysis
Paul Barrett
email: p.barrett@liv.ac.uk
http://www.liv.ac.uk/~barrett/paulhome.htm
Affiliations: The State Hospital, Carstairs; Dept. of Clinical Psychology, Univ. of Liverpool
November, 1999

Definitions.1 Base Rate or Prevalence of a Test
The probability of an occurrence, usually expressed as a %. Base rates are defined for specific populations of interest - and are restricted to them. A base rate is equivalent to a proportion. E.g. 5 out of 100 patients commit an assault on clinical staff; this is a base rate of 5/100 = 0.05 = 5%. The base rate for personality disorder as a diagnosis for patients at Hospital X is about 0.20. Given there are 600 patients at Hospital X, we would expect there to be 600 x 0.20 = 120 patients with a diagnosis of personality disorder.

Definitions.2 Odds
The ratio of the probability of an occurrence to the probability of non-occurrence. Odds are thus related to, BUT not the same as, a proportion or a base rate. E.g. 5 out of 100 patients commit an assault on clinical staff; this is a base rate of 5/100 = 0.05 = 5%. The odds of any one patient committing an assault are: 0.05/(1 - 0.05) = 0.05263, i.e. 0.05263:1. We can multiply both sides of the ratio (say by 100) to re-express it as 5.26:100. This is interpreted as: for every 100 patients who do not commit an assault, we would expect to see 5.26 who do.

Definitions.3 Odds (cont.)
BUT, how is it that we have 5 out of every 100 patients who commit an assault, yet we have 5.26 who are expected to commit an assault for every 100 who do not commit an assault? Look carefully at the words: "for every 100 who do not commit an assault". We know that 5 out of every 100 are expected to commit an assault, so, out of 100 patients who do not commit an assault, we expect to see 5 + ((5/95) x 5) = 5.26. Why? Well, we know that 5 out of 100 will commit an assault (which is a ratio of 5 who do to every 95 who don't). Therefore, for every new patient who doesn't, there will be a 5/95 chance that one will.

Definitions.4 Odds (cont.)
The other example... The base rate for personality disorder as a diagnosis for patients at Hospital X is about 0.20. The odds of any one patient having a personality disorder diagnosis are: 0.20/(1.0 - 0.20) = 0.25, or 25:100. We could also express this as 4:1 odds against any patient having a personality disorder. That is, for every patient who has a diagnosis of personality disorder, we would expect 4 NOT to have this diagnosis.

More Formally
The odds of an event E occurring are: O(E) = P(E) / [1 - P(E)]
The probability of the same event is: P(E) = O(E) / [1 + O(E)]
So, for odds of 0.05263 in our example above, we can convert these back to a probability of occurrence: P(E) = 0.05263 / [1 + 0.05263] = 0.05
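A minimal sketch of these two conversions (Python; the function names are mine, purely illustrative, with the values from the slides):

```python
def probability_to_odds(p):
    """Odds of an event from its probability: O(E) = P(E) / (1 - P(E))."""
    return p / (1.0 - p)

def odds_to_probability(odds):
    """Probability of an event from its odds: P(E) = O(E) / (1 + O(E))."""
    return odds / (1.0 + odds)

# Base rate of 0.05 (5 assaults per 100 patients)
odds = probability_to_odds(0.05)
print(round(odds, 5))                        # 0.05263, i.e. roughly 5.26 : 100
print(round(odds_to_probability(odds), 2))   # back to 0.05
```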

An outcome or classification table.1

                         Actual Outcome Event
    Predicted Outcome    Yes                 No
    Yes                  A: Success (TP)     B: Failure (FP)
    No                   C: Failure (FN)     D: Success (TN)

Where: TP = probability of a True Positive decision, FP = probability of a False Positive decision, FN = probability of a False Negative decision, TN = probability of a True Negative decision. Which can also be displayed as:

An outcome or classification table.2

                         Actual Outcome Event
    Predicted Outcome    Yes                 No
    Yes                  A: True +ve (TP)    B: False +ve (FP)
    No                   C: False -ve (FN)   D: True -ve (TN)

Where: TP = probability of a True Positive decision, FP = probability of a False Positive decision, FN = probability of a False Negative decision, TN = probability of a True Negative decision.

An example - using the example of violent behaviour in slide 2 above. We have a base rate of 0.05 of violence. The table looks like...

                         Actual Outcome
    Predicted Outcome    Yes      No       Row Marginals
    Yes                  TP       FP       0.05
    No                   FN       TN       0.95
    Column Marginals     0.05     0.95     1.0

Given we know that 5 out of every 100 patients will commit an assault, and given we pick 5 patients at random from each 100, how accurate would we be in our prediction of assault - JUST USING the BASE RATE?

                         Actual Outcome
    Predicted Outcome    Yes      No       Row Marginals
    Yes                  TP       FP       0.05
    No                   FN       TN       0.95
    Column Marginals     0.05     0.95     1.0

Remember that, under the law of probabilities for events that occur independently of one another, we multiply the probabilities/proportions for each event occurring alone - in order to compute the probability of observing the joint occurrence. So, for the cell Predicted Yes and Actual Yes, we multiply together the probability for the category Predicted Yes (the Row Marginal) and the probability for the category Actual Yes (the Column Marginal)... which is 0.05 x 0.05 = 0.0025

We can do the same for each cell in the same way. What we see is that we would predict the True Negatives correctly with 0.9025. However, by chance alone, we would only predict the True Positives with 0.0025.

                         Actual Outcome
    Predicted Outcome    Yes          No           Row Marginals
    Yes                  TP  .0025    FP  .0475    0.05
    No                   FN  .0475    TN  .9025    0.95
    Column Marginals     0.05         0.95         1.0

Multiplying through by 100 (so as to express the figures as percentages):

                         Actual Outcome
    Predicted Outcome    Yes       No        Row Marginals
    Yes                  0.25%     4.75%     5%
    No                   4.75%     90.25%    95%
    Column Marginals     5%        95%       100%
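A small sketch of this chance-expectation step (Python; the function name and layout are mine, purely illustrative):

```python
def chance_table(base_rate):
    """Expected cell proportions when predictions are made at random:
    each cell = row marginal x column marginal (independence)."""
    tp = base_rate * base_rate               # Predicted Yes, Actual Yes
    fp = base_rate * (1 - base_rate)         # Predicted Yes, Actual No
    fn = (1 - base_rate) * base_rate         # Predicted No,  Actual Yes
    tn = (1 - base_rate) * (1 - base_rate)   # Predicted No,  Actual No
    return tp, fp, fn, tn

print(chance_table(0.05))   # (0.0025, 0.0475, 0.0475, 0.9025), as in the slide
```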

But how do we summarise the information in the table so that we can decide whether the results are good, bad, or indifferent?

Classification Table Indices.1
Sensitivity of the prediction: SE = TP / (TP + FN)
The probability that an actual (outcome) observed event is predicted correctly. E.g. the probability of a patient who commits a violent act being predicted to do so.

                         Predicted Outcome
    Outcome Event        Yes             No
    Yes                  Success (TP)    Failure (FN)
    No                   Failure (FP)    Success (TN)

Classification Table Indices.2
Specificity of the prediction: SP = TN / (TN + FP)
The probability that the actual non-occurrence (outcome) of an event is predicted correctly. E.g. the probability of a patient who does not commit a violent act being predicted correctly. (Reference table as in Indices.1.)

Classification Table Indices.3
Positive Power of Prediction: PPP = TP / (TP + FP)
The probability that a Yes prediction correctly predicted the occurrence of an outcome event. E.g. the probability that a patient predicted to be violent was actually violent. (Reference table as in Indices.1.)

Classification Table Indices.4
Negative Power of Prediction: NPP = TN / (TN + FN)
The probability that a No prediction correctly predicted the non-occurrence of an outcome event. E.g. the probability that a patient predicted Not to (No) commit an assault did not in fact commit any violent offences. (Reference table as in Indices.1.)

Classification Table Indices.5
Overall Predictive Accuracy - or the Efficiency of the test: PE = TP + TN
The overall probability of prediction success: the probability that the predictions and outcomes agree with one another. (Reference table as in Indices.1.)
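The five indices are simple functions of the four cell values. A minimal sketch (Python; the function is my own illustration, not the DICHOT implementation):

```python
def classification_indices(tp, fp, fn, tn):
    """Sensitivity, Specificity, PPP, NPP and Predictive Efficiency
    from the four cells of a 2x2 table (proportions or frequencies)."""
    return {
        "SE":  tp / (tp + fn),                   # Sensitivity
        "SP":  tn / (tn + fp),                   # Specificity
        "PPP": tp / (tp + fp),                   # Positive Power of Prediction
        "NPP": tn / (tn + fn),                   # Negative Power of Prediction
        "PE":  (tp + tn) / (tp + fp + fn + tn),  # Efficiency (= TP + TN for proportions)
    }

# The chance-level table with a 0.05 base rate:
print(classification_indices(0.0025, 0.0475, 0.0475, 0.9025))
# SE = 0.05, SP = 0.95, PPP = 0.05, NPP = 0.95, PE = 0.905 (up to floating-point noise)
```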

So - to our example...
Sensitivity of the prediction: SE = TP / (TP + FN) = 0.0025 / (0.0025 + 0.0475) = 0.05
The probability that an actual (outcome) observed event is predicted correctly. E.g. the probability of a patient who commits a violent act being predicted to do so.

                         Actual Outcome
    Predicted Outcome    Yes          No
    Yes                  TP  0.0025   FP  0.0475
    No                   FN  0.0475   TN  0.9025

Specificity of the prediction: SP = TN / (TN + FP) = 0.9025 / (0.9025 + 0.0475) = 0.95
The probability that the actual non-occurrence (outcome) of an event is predicted correctly. E.g. the probability of a patient who does not commit a violent act being predicted correctly. (Same table as above.)

Positive Power of Prediction: PPP = TP / (TP + FP) = 0.0025 / (0.0025 + 0.0475) = 0.05
The probability that a Yes prediction correctly predicted the occurrence of an outcome event. E.g. the probability that a patient predicted to be violent was actually violent. (Same table as above.)

Negative Power of Prediction: NPP = TN / (TN + FN) = 0.9025 / (0.9025 + 0.0475) = 0.95
The probability that a No prediction correctly predicted the non-occurrence of an outcome event. E.g. the probability that a patient predicted Not to (No) commit an assault did not in fact commit any violent offences. (Same table as above.)

Overall Predictive Accuracy - or the Efficiency of the test: PE = TP + TN = 0.0025 + 0.9025 = 0.905
The overall probability of prediction success: the probability that the predictions and outcomes agree with one another. (Same table as above.)

Reviewing the results - we have:
Sensitivity = 0.05
Specificity = 0.95
PPP = 0.05
NPP = 0.95
Predictive Accuracy/Efficiency = 0.905
Base Rate = 0.05
Let's change the base rate to 0.20 and see what happens:

We compute the expected probabilities of occurrence given our base rate marginal probabilities, under a hypothesis that we allocate patients at random into our predicted categories...

                         Actual Outcome
    Predicted Outcome    Yes        No         Row Marginals
    Yes                  TP  .04    FP  .16    0.20
    No                   FN  .16    TN  .64    0.80
    Column Marginals     0.20       0.80       1.0

Sensitivity of the prediction: SE = TP / (TP + FN) = 0.20
Specificity of the prediction: SP = TN / (TN + FP) = 0.80
Positive Power of Prediction: PPP = TP / (TP + FP) = 0.20
Negative Power of Prediction: NPP = TN / (TN + FN) = 0.80
Overall Predictive Accuracy - or the Efficiency of the test: PE = TP + TN = 0.68

Reviewing both sets of results - we now have:

                   Base Rate = 0.05    Base Rate = 0.20
    Sensitivity    0.05                0.20
    Specificity    0.95                0.80
    PPP            0.05                0.20
    NPP            0.95                0.80
    Efficiency     0.905               0.68

As can be seen, for a random test, the sensitivity equals the base rate (prevalence) of the test, with specificity equal to (1.0 - the base rate).

Let us now see what happens when we introduce an intervention into our prediction scenario: we use the Barrett ICPE test of assaultive behaviour and use this to make predictions of behaviour amongst patients. My test correctly classifies all patients who commit assaults (5 out of 100), and predicts 8 others who do not actually commit an assault. How does my test shape up against a purely random selection using the base rate of 5%?

                         Actual Outcome
    Predicted Outcome    Yes        No         Row Marginals
    Yes                  TP  .05    FP  .08    0.13
    No                   FN  .00    TN  .87    0.87
    Column Marginals     0.05       0.95       1.0

Sensitivity of the prediction: SE = TP / (TP + FN) = 1.0
Specificity of the prediction: SP = TN / (TN + FP) = 0.916
Positive Power of Prediction: PPP = TP / (TP + FP) = 0.385
Negative Power of Prediction: NPP = TN / (TN + FN) = 1.0
Overall Predictive Accuracy - or the Efficiency of the test: PE = TP + TN = 0.92

Is there a single statistic that might be used to summarise all these parameters? Well, try these two, which express the classification indices taking into account the base rate for the data in your table...

Loeber and Dishion's (1983) RIOC statistic

    RIOC = (PE - C) / ([1 - (PPP - BR)] - C)

    where C = (BR)(PPP) + (1 - BR)(1 - PPP)

RIOC is short for Relative Improvement Over Chance
PE = Predictive Efficiency
PPP = Positive Power to Predict
BR = the Base Rate
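A sketch of the RIOC calculation exactly as the formula is written on this slide (Python; the two calls reproduce the worked examples that follow):

```python
def rioc(pe, ppp, br):
    """Relative Improvement Over Chance, per the slide's formula:
    RIOC = (PE - C) / ([1 - (PPP - BR)] - C),
    with C = BR*PPP + (1 - BR)*(1 - PPP)."""
    c = br * ppp + (1 - br) * (1 - ppp)
    return (pe - c) / ((1 - (ppp - br)) - c)

print(round(rioc(0.905, 0.05, 0.05), 2))   # 0.0  - the chance-level table
print(round(rioc(0.92, 5 / 13, 0.05), 2))  # 5.14 - the ICPE example (PPP = 0.05/0.13)
```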

Kraemer's (1992) Quality Indices.1

    κ(1) = (Sensitivity - Q) / (1 - Q)
    κ(0) = (Specificity - Q') / (1 - Q')

    where Q = TP + FP (the proportion of Yes predictions) and Q' = 1 - Q

κ(1) and κ(0) vary between 0 and 1.0, with 1.0 being maximum possible quality (perfection).
κ(1) and κ(0) are also called weighted kappas.

Kraemer's (1992) Quality Indices.2
Using these two quality indices (for Sensitivity and Specificity), it is possible to compute a Chi-Square statistic that provides a significance test for our classification data.

    χ² = N x κ(1) x κ(0)

    where N = the total no. of observations

The test has 1 degree of freedom. The H0 is that the cell proportions in your classification table are those expected under a model of independent assignment of observations.

Kraemer's (1992) Quality Indices.3
Although this statistical test looks as though it is special to Kraemer's indices, it is not. It is just a straightforward chi-square test of independence for a 2x2 table, more commonly expressed as:

    χ² = Σ (f_o - f_e)² / f_e

    where f_o = observed frequency and f_e = expected frequency, with (R-1)(C-1) df
    R = No. of Rows, C = No. of Columns

Kraemer's (1992) Quality Indices.4
Further, we can write the formula for the phi coefficient as:

    φ = sqrt( κ(0) x κ(1) )

which is the geometric mean of the two weighted kappas (see the DICHOT 3.0 program help on weighted kappa), and whose significance can be computed as a chi-square given by:

    χ² = N x φ² = N x κ(0) x κ(1)
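A sketch pulling the quality indices, phi, and the chi-square together (Python; the function name is mine, and the example values anticipate the ICPE table worked through below):

```python
from math import sqrt

def kraemer_quality(tp, fp, fn, tn):
    """Kraemer's (1992) weighted kappas k(1), k(0) and phi
    from the four cell proportions of a 2x2 table."""
    se = tp / (tp + fn)        # Sensitivity
    sp = tn / (tn + fp)        # Specificity
    q = tp + fp                # proportion of Yes predictions
    k1 = (se - q) / (1 - q)    # kappa(1): quality of sensitivity
    k0 = (sp - (1 - q)) / q    # kappa(0): quality of specificity
    return k1, k0, sqrt(k0 * k1)

k1, k0, phi = kraemer_quality(0.05, 0.08, 0.00, 0.87)   # the ICPE table
print(round(k1, 2), round(k0, 2))     # 1.0 0.35
print(round(100 * k1 * k0, 1))        # chi-square with N = 100: ~35.2
                                      # (the deck rounds kappa(0) to 0.35 and reports 35.0)
```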

Anyway, let's see how our examples fare using these statistics. Note that in our first example, with a base rate of 0.05, we allocated observations (created expected proportions) under an independence model (at random).

                         Actual Outcome
    Predicted Outcome    Yes          No           Row Marginals
    Yes                  TP  .0025    FP  .0475    0.05
    No                   FN  .0475    TN  .9025    0.95
    Column Marginals     0.05         0.95         1.0

RIOC = (PE - C) / ([1 - (PPP - BR)] - C), where C = (BR)(PPP) + (1 - BR)(1 - PPP)

    = (0.905 - C) / ([1 - (0.05 - 0.05)] - C), where C = (0.05)(0.05) + (1 - 0.05)(1 - 0.05) = 0.905

    RIOC = 0.0

κ(1) = (Sensitivity - Q) / (1 - Q) = (0.05 - 0.05) / 0.95 = 0
κ(0) = (Specificity - Q') / (1 - Q') = (0.95 - 0.95) / 0.05 = 0
where Q = TP + FP = 0.05 and Q' = 1 - Q = 0.95

Thus, regardless of the actual size of the Sensitivity and Specificity, we can see that both are of zero quality. The chi-square value is also zero, as expected, since we generated our proportions under a hypothesis of independence:

    χ² = N x κ(1) x κ(0) = 0.0

Now, we look at the ICPE test results - to assess their legitimacy and subsequent utility...

                         Actual Outcome
    Predicted Outcome    Yes        No         Row Marginals
    Yes                  TP  .05    FP  .08    0.13
    No                   FN  .00    TN  .87    0.87
    Column Marginals     0.05       0.95       1.0

The base rate is still 0.05.

RIOC = (PE - C) / ([1 - (PPP - BR)] - C), where C = (BR)(PPP) + (1 - BR)(1 - PPP)

    = (0.92 - C) / ([1 - (0.385 - 0.05)] - C), where C = (0.05)(0.385) + (1 - 0.05)(1 - 0.385) = 0.6035

    RIOC = 5.14

κ(1) = (Sensitivity - Q) / (1 - Q) = (1.0 - 0.13) / 0.87 = 1.0
κ(0) = (Specificity - Q') / (1 - Q') = (0.916 - 0.87) / 0.13 = 0.35
where Q = TP + FP = 0.13 and Q' = 1 - Q = 0.87

Thus, although our sensitivity is of excellent quality, our specificity is marred by the number of false positives, given the low base rate. For the chi-square, we might assume we have 100 patients, so:

    χ² = N x κ(1) x κ(0) = 100 x 1.0 x 0.35 = 35.0 with 1 d.f.
    P < 0.0001, highly significant

In conclusion
What matters is the relative cost of making Type I (False Positive) vs Type II (False Negative) errors. If you want to rule out a certain disorder (or, say, risk of violence) with a particular test cut-score, then focus on the False Negative Rate. If you want to err on the side of detecting even the possibility of a disorder, then permit the False Positive Rate to increase whilst decreasing the False Negative Rate - for example, breast cancer screening. The various summary statistics are just that - they are there to aid you in your decision making, and for comparative analysis of competing decision solutions.

In Practice!
Current prediction/classification accuracy with data from the VRAG (Violence Risk Appraisal Guide), used to make actuarial assessments of violence probability on a 9-point scale: 55% PPP (using a binary cut-off for high/low risk), 72% classification accuracy, and a Relative Improvement Over Chance prediction of 0.88, with a base rate of 31% - categorising high scorers as those scoring above the 80th percentile raw VRAG score (which ranges between -22 and +28).

In Practice!
These results have a 72% classification accuracy with a base rate of 31% (N = 618).

                         Actual Outcome Event
    Predicted Outcome    Yes     No
    Yes                  115     94
    No                   76      333

Is this good, bad, or indifferent?

Receiver/Relative Operating Characteristic (ROC) Analysis
Well, there is another method of analysis which we can use to assist us in our decision making, based upon 2x2 tables of data. This method originated both from statistical decision theory (Neyman-Pearson hypothesis testing and the concept of POWER), and from signal detection theory within electronic signal detection (radar signals). It is concerned with examining the accuracy of stimulus or event discrimination as a function of all possible values of a decision criterion. For example, in violence risk assessment, we might wish to determine the accuracy of our discrimination between those who recidivate and those who do not, as a function of some kind of criterion score or value.

ROC Analysis.1
The method has been used within psychology to investigate decision processes in humans, with especial regard to psychophysical recognition and detection judgements (e.g. threshold determination, stimulus discrepancy detection), as well as cognitive phenomena associated with attention and memory. It has also become a standard technique in general medical research with regard to evidence-based practice associated with treatment successes/failures, factors as causes of illness, and drug effectiveness investigations. It has now come to the fore in actuarial risk research within the forensic mental health domain, and is used as a key methodology for examining the utility or otherwise of any proposed prediction tool. It is important to understand exactly what the technique involves, how it is implemented, and the range of diagnostic parameters that can be computed using it (and the assumptions made).

ROC Analysis.2 Critical Points
ROC analysis permits an investigator to evaluate how well a fixed or to-be-inferred criterion value makes an accurate discrimination between two outcomes (whether they be 2 different stimuli, stimulus vs no-stimulus, violent offence vs no violent offence, or risk factor vs no risk factor). The criterion value may be a test score, level of risk factor, drug dosage, cognitive judgement, or any other kind of variable that can possess at least binary categorical, multiple ordinal, or even equal-interval magnitudes. The ROC curve, in every case, consists of plotting the True Positive rate on the ordinate (vertical, y-axis) and the False-Alarm rate (False Positive proportion or rate) on the abscissa (horizontal, x-axis).

ROC Analysis.3 Critical Point
In order to remain focussed on the use of ROC within the forensic mental health domain, and to give the presentation real meaning, I will base all further discussion on the use of the methodology within actuarial risk research, and use the data of Quinsey et al (1994) - the VRAG standardisation sample - to exemplify both the calculations and the range of parameters that can be used to evaluate the risk instrument.

ROC Analysis.4 The Violence Risk Appraisal Guide
Quinsey et al (1994) collected personal, clinical, and offence-related information from case-records on 618 Canadian male patients who had been released from their high security mental health institution during a period prior to and up to a final date of April 1998. They also tracked the violent offence recidivism of these patients post-discharge - and reported the initial findings in 1993-4. What they were trying to do was isolate those patient-oriented variables (case-record information) that were key predictors of recidivist behaviour. Their first task was to determine the key predictors. Their second task was to form a scale of these predictors, assign weights to them (reflecting their importance to the prediction), and subsequently develop an additive scale of risk (the VRAG).

ROC Analysis.5
We join them after they have developed the 1-9 score VRAG scale. Let's assume we are working with them as co-investigators. We have 3 questions to answer:
1. What are the violent recidivism rates per VRAG score?
2. What is the overall relationship between the VRAG scores and recidivist outcome?
3. How accurate is our test over the range of VRAG scores? That is, how accurate are we in correctly detecting violent recidivist vs non-recidivist patients from their VRAG scores alone?
The answer to question 3 is where we use ROC analysis, along with some of the other measures we have been discussing.

ROC Analysis.6
What are the violent recidivism rates per VRAG score?
[Figure: rates of violent recidivism for subjects at each of 9 risk levels (7-year recidivism follow-up), taken from Rice (1997). X-axis: Violence Risk Level (VRAG), scores 1-9; y-axis: rate of violent recidivism (0.0-1.0). The numbers of patients who obtained each VRAG score were: 11, 71, 101, 111, 116, 96, 74, 29, 9.]

ROC Analysis.7
What is the overall relationship between the VRAG scores and recidivist outcome?
The VRAG predicts violent recidivism in Canadian patients (over 7 and 10 years) with r = 0.45 (source: Webster, Harris, Rice, Cormier, Quinsey (1994), p. 37, the Violence Prediction Scheme manual).
Clinician prediction of violent recidivism in 3 unique studies (meta-analysis, N = 1,080 patients): r = 0.09 (source: Bonta, Law, and Hanson (1998), p. 136, The prediction of criminal and violent recidivism among mentally disordered offenders: A Meta-Analysis. Psychological Bulletin, 123, 2, 123-142).

ROC Analysis.8
This was calculated in the case of the VRAG using a simple Pearson product-moment r (it might be called a point-biserial r in this case - one continuous variable (VRAG score) and one dichotomous variable (Outcome) - but it is mathematically equivalent to Pearson r). E.g.:

    Patient    VRAG Score    Outcome
    1          2             1
    2          1             0
    3          5             1
    4          2             0
    5          2             0
    6          3             0
    etc.
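A small illustration of that equivalence (Python; the six rows are just the toy data from this slide):

```python
import numpy as np

vrag = np.array([2, 1, 5, 2, 2, 3])      # VRAG scores (toy data from the slide)
outcome = np.array([1, 0, 1, 0, 0, 0])   # 1 = violent recidivism, 0 = none

# Pearson product-moment r between a continuous and a dichotomous variable
# is exactly the point-biserial correlation.
r = np.corrcoef(vrag, outcome)[0, 1]
print(round(r, 2))   # ~0.56 for this toy data
```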

ROC Analysis.9
How accurate is our test over the range of VRAG scores? That is, how accurate are we in correctly detecting violent recidivist vs non-recidivist patients from their VRAG scores alone?
First, we take each VRAG score (1, 2, 3 ... 9) in turn. We treat this score as a cut-score. That is, we decide that everybody scoring at or above the score is to be predicted as likely to commit a violent offence. We then look at our actual outcome data - and determine how accurate our predictions were. Although Quinsey et al do not provide these data explicitly, they do provide certain data which enable us to compute their observed frequencies for each VRAG score. We are effectively reverse-engineering their study - in order to re-create the steps they would have followed.

ROC Analysis.10
In order to achieve this (and for any other dataset that might come your way in future), we need:
- The base rate for the violent outcomes in the sample
- The recidivism rates/probabilities at each score level
- The numbers of patients scoring at each level
- The total N (number of patients in the sample)
From these parameters, we can compute all necessary information to re-create the 2x2 decision table for each score level.

ROC Analysis.11
The base rate for violent recidivism in the Canadian data was 0.31.
The recidivism probabilities at each score level are given in Table A.1 in Quinsey et al (1998) - and can also be seen on the graph in ROC Analysis.6 above.
The numbers of patients scoring at each VRAG score are given in Rice (1997) - also shown on that graph.
We know the total N of the sample is 618.
We can now compute our 2x2 tables for each score level...
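Before the score-by-score walk-through on the next slides, here is a compact sketch of the same reverse-engineering step (Python). The per-score counts and the four recidivism probabilities quoted in this deck are used; the probabilities for scores 5-9 are not reproduced in this transcription, so they are left as placeholders and the loop only covers the cut-scores the deck works through by hand:

```python
import numpy as np

n_total = 618
base_rate = 0.3090615                       # 191 recidivists among 618 patients
counts = np.array([11, 71, 101, 111, 116, 96, 74, 29, 9])  # patients at VRAG scores 1..9 (Rice, 1997)
# Recidivism probability at each score; only the first four are quoted in this deck,
# the remainder (None) would come from Table A.1 of Quinsey et al (1998).
recid_prob = [0.00, 0.08, 0.12, 0.17, None, None, None, None, None]

n_recid = round(n_total * base_rate)        # = 191
for cut in range(2, 5):                     # cut-scores 2..4, where all needed probabilities are quoted
    below = np.arange(cut - 1)              # indices of scores below the cut-score
    fn = sum(int(round(recid_prob[i] * counts[i])) for i in below)  # recidivists predicted "safe"
    tn = int(counts[below].sum()) - fn
    tp = n_recid - fn
    fp = (n_total - n_recid) - tn
    sens, fpr = tp / n_recid, fp / (n_total - n_recid)
    print(cut, tp, fp, fn, tn, round(sens, 3), round(fpr, 3))
```

For cut-score 3, for example, this reproduces the deck's figures: 82 predicted safe, of whom 6 are false negatives, and cell A (TP) = 191 - 6 = 185.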

ROC Analysis.12 For a VRAG score of 1
We know that with a base rate of 0.31 (actually 0.3090615), there were 618 x 0.3090615 = 191 recidivist offenders in the total sample of 618 patients. We are predicting that everybody who scores 1 or above will commit a violent offence. Because this is the lowest possible score, we are saying that all 618 patients will commit an offence. We now compare our predictions to the actual outcome data in a 2x2 table. I'm using the freely downloadable DICHOT 3.0 program to exemplify the data tables and analyses - see the web-site reference at the end of the presentation.

ROC Analysis.13 For a VRAG score of 1
We can see that we make no negative predictions (will not commit a violent offence), hence we make massive false-positive errors. Take a look at the various indices we have discussed to see how they describe such a table... [DICHOT 3.0 output not reproduced here.]

ROC Analysis.14 For a VRAG score of 1
Note the figures in black: these are the ROC curve coordinate values that you will plot later. [DICHOT 3.0 output not reproduced here.]

ROC Analysis.15 For a VRAG score of 2
We are predicting that everybody who scores 2 or above will commit a violent offence. Anybody scoring 1 is predicted NOT to commit an offence. We know from Rice (1997) that 11 patients scored at 1, and (618 - 11) = 607 scored 2 and above. We are thus predicting that 607 patients will commit a violent offence, and 11 will not. Now, for the 11 patients given a negative prediction, we know from the recidivism probabilities given by Quinsey et al that no-one recidivated at a score of 1; hence, our negative predictions will be error-free (no false negatives - predicted no, but the outcome occurred)...

ROC Analysis.16 For a VRAG score of 2 [DICHOT 3.0 output not reproduced here.]

ROC Analysis.17 For a VRAG score of 2 [DICHOT 3.0 output not reproduced here.]

ROC Analysis.18 For a VRAG score of 3
We are predicting that everybody who scores 3 or above will commit a violent offence. Anybody scoring 2 or below is predicted NOT to commit an offence. We know from Rice (1997) that 11 patients scored at 1 and 71 scored at 2 (= 82), so (618 - 82) = 536 scored 3 and above. We are thus predicting that 536 patients will commit a violent offence, and 82 will not. Now, for the 82 patients given a negative prediction, we know from the recidivism probabilities given by Quinsey et al that no-one recidivated at a score of 1, but the recidivism probability at a score of 2 was 0.08; hence, our negative predictions will contain (0.08 x 71 = 6) patients out of 82 who were predicted as safe, but who went on to commit a violent offence. Thus...

ROC Analysis.19 For a VRAG score of 3
Knowing that 6 patients who did commit an offence were wrongly predicted as safe, we also know the value for cell A (the True Positives), as (191 - 6) = 185. Remember, the base rate remains constant, as it is indexing the actual occurrence of violence in the total sample.

ROC Analysis.20 For a VRAG score of 3 [DICHOT 3.0 output not reproduced here.]

ROC Analysis.21 For a VRAG score of 4
We are predicting that everybody who scores 4 or above will commit a violent offence. Anybody scoring 3 or below is predicted NOT to commit an offence. We know from Rice (1997) that 11 patients scored at 1, 71 scored at 2, and 101 scored at 3 (= 183), so (618 - 183) = 435 scored 4 and above. We are thus predicting that 435 patients will commit a violent offence, and 183 will not. Now, for the 183 patients given a negative prediction, we know from the recidivism probabilities given by Quinsey et al that no-one recidivated at a score of 1, but the recidivism probability was 0.08 at a score of 2 and 0.12 at a score of 3; hence, our negative predictions will contain (0.08 x 71 = 6) + (0.12 x 101 = 12) = 18 patients out of 183 who were predicted as safe, but who went on to commit a violent offence. Thus...

ROC Analysis.22 For a VRAG score of 4 [DICHOT 3.0 output not reproduced here.]

ROC Analysis.23 For a VRAG score of 4 [DICHOT 3.0 output not reproduced here.]

ROC Analysis.24 For a VRAG score of 5
We are predicting that everybody who scores 5 or above will commit a violent offence. Anybody scoring 4 or below is predicted NOT to commit an offence. We know from Rice (1997) that 11 patients scored at 1, 71 scored at 2, 101 scored at 3, and 111 scored at 4 (= 294), so (618 - 294) = 324 scored 5 and above. We are thus predicting that 324 patients will commit a violent offence, and 294 will not. Now, for the 294 patients given a negative prediction, we know from the recidivism probabilities given by Quinsey et al that no-one recidivated at a score of 1, but the recidivism probability was 0.08 at a score of 2, 0.12 at a score of 3, and 0.17 at a score of 4; hence, our negative predictions will contain (0.08 x 71 = 6) + (0.12 x 101 = 12) + (0.17 x 111 = 19) = 37 patients out of 294 who were predicted as safe, but who went on to commit a violent offence. Thus...

ROC Analysis.25 For a VRAG score of 5 [DICHOT 3.0 output not reproduced here.]

ROC Analysis.26 For a VRAG score of 5 [DICHOT 3.0 output not reproduced here.] And so on, until a VRAG score of 9...

ROC Analysis.27 For a VRAG score of 9
We are predicting that everybody who scores 9 will commit a violent offence. Anybody scoring below 9 is predicted NOT to commit an offence. We know from Rice (1997) that 9 patients scored at 9, all of whom committed a violent offence. Hence we are making just 9 predictions of violence, and we know that all 9 commit a violent offence. However, we are now saying that (618 - 9) = 609 patients will not commit a violent offence. This means our false negative frequency is (191 - 9) = 182 (patients predicted to be safe but who went on to commit a violent offence). Because all patients predicted to be violent were violent, we have no false positives. Thus, we have...

ROC Analysis.28 For a VRAG score of 9 [DICHOT 3.0 output not reproduced here.]

ROC Analysis.29 For a VRAG score of 9
The reason the RIOC is reported as Invalid is a division by zero in the formula (no false positives, perfect PPP).

ROC Analysis.30
[Figure: ROC curve for the Quinsey et al (1994) VRAG instrument, using 7-year recidivism probabilities. Y-axis: Sensitivity (probability of correctly predicting an offence); x-axis: False Alarm / False Positive Rate (1 - Specificity); both run from 0.0 to 1.0. The points for VRAG scores 4 and 5 are labelled; the empirical ROC curve is drawn in red and the 50/50 chance level as a black diagonal line.]

ROC Analysis.31 Some Key Points...
- The area below the black line indicates discrimination less than chance.
- The area above the black line indicates discrimination greater than chance.
- We can compute the area under our ROC curve (the red line) as a base-rate-insensitive measure of the accuracy of our test. We can also use this measure as a means to compare tests for their efficiency over a score range.
- We can also use the distance d' (d-prime, Cohen's d) between the means of our two hypothetical sampling distributions as an estimate of our effect size (discrimination).
- Finally, we can estimate r (the relationship) from our d' - if we don't have access to all the data.

ROC Analysis.32
Let's take a look at what we are doing - in terms of our two hypothetical distributions (the null (no-outcome) and alternative (outcome) distributions). Our exposition here assumes that both distributions are standard unit-normal, have the same standard deviation (SD) of 1.0, but differ in their mean locations along the x-axis SD scale. Essentially, as we move our criterion value along this standardized x-axis, we can compute expected values for the 4 cells of our 2x2 table. The graph below shows the cell proportions and terminology - within the decision-theoretic framework. The graph is scaled correctly for the d' parameter as shown.

ROC Analysis.33
[Figure: two unit-normal distributions - the null ("No Signal + Noise") distribution with mean mu0 and the alternative ("Signal + Noise") distribution with mean mu1 - separated by d' = 2.12 SD units. The decision criterion value x_c divides the area under the two curves into True Negatives, False Negatives, False Positives and True Positives; with FPR = 0.10 and Sensitivity = 0.80, x_c lies 1.28 SD units above mu0 and 0.84 SD units below mu1. X-axis: standard deviation units; y-axis: probability.]
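The d' shown in that figure follows directly from the two rates via inverse-normal (z) transforms; a quick check (Python, scipy assumed available):

```python
from scipy.stats import norm

fpr, sensitivity = 0.10, 0.80
# Equal-variance signal detection: d' = z(hit rate) - z(false-alarm rate)
d_prime = norm.ppf(sensitivity) - norm.ppf(fpr)
print(round(d_prime, 2))              # 2.12
print(round(norm.ppf(1 - fpr), 2))    # 1.28 - the criterion lies 1.28 SD above the null mean
print(round(norm.ppf(sensitivity), 2))  # 0.84 - the alternative mean lies 0.84 SD above the criterion
```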

ROC Analysis.34
Now, let's compute the Area under the Curve (AUC) for the ROC curve. We can do this by converting our observed ROC coordinate values for each sampled point into normal-distribution z-deviate coordinate values. The slope and linear regression of the z-transformed Sensitivity on the z-transformed false-alarm rate allow us to estimate the AUC as:

    Z_AUC = a / sqrt(1 + b²)

where a = the intercept parameter of the linear regression and b = the slope parameter of the linear regression.
So - let's run through what we do...

ROC Analysis.35 The Data File
ZSENS and ZFPR are the deviate z-scores of the corresponding values of SENSITIV and FPR. [Data listing not reproduced here.]

ROC Analysis.36 Plotting the Data, using only valid pairs...
[Figure: scatter plot of the deviate z-scores for the valid pairs of observations, with the fitted regression line and its 95% confidence band. Regression: Z-score Sensitivity = 1.0093 + 0.97762 x Z-score False Alarms. X-axis: Z-score FPR (False Alarm Rate); y-axis: Z-score Sensitivity.]

ROC Analysis.37
And finally, the calculation...

    Z_AUC = a / sqrt(1 + b²)

where:
    a = the intercept parameter of the linear regression = 1.0093
    b = the slope parameter of the linear regression = 0.97762

    Z_AUC = 1.0093 / sqrt(1 + 0.97762²) = 0.723

However, if we look at the relationships that should hold between a, b, and d', and the SDs in both distributions...

ROC Analysis.38

    d' = (mu1 - mu0) = a / b
    b = sigma0 / sigma1

where sigma0 and sigma1 are the standard deviations of the null and alternative distributions, and each is equal to 1.0.
From our data we have:

    d' = 1.0093 / 0.97762 = 1.0324
    b = s0 / s1 = 0.97762

where s0 and s1 are sample estimates of our standard deviations. However, the ratio of standard deviations (b) should equal 1.0 (as each population sigma is assumed equal to 1.0).

ROC Analysis.39
So, if we solve the linear regression equation for the intercept, with a constant slope value of 1.0, we obtain a = 1.02124, and d' = a/b = a (because b = 1.0).

    Z_AUC = 1.02124 / sqrt(1 + 1.0²) = 0.722

The published value for d' for the VRAG is 1.06. The published value for Z_AUC for the VRAG is 0.76. The reason for the discrepancy between mine and the published parameters is that Quinsey et al computed their ROC using raw VRAG scores (their graph on p. 149 of the 1998 book shows 15 computed points, compared to my 6 using the scaled VRAG).
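A sketch of the whole z-transform-and-regress estimation (Python). The six sensitivity/FPR pairs below are placeholders for the "valid pairs" in the author's data file, which are not reproduced in this transcription; only the final lines, which reuse the reported intercept and slope, match the slide's numbers:

```python
import numpy as np
from scipy.stats import norm

# Placeholder ROC coordinates (sensitivity, FPR) for the valid cut-scores;
# the real values come from the per-score 2x2 tables built earlier.
sens = np.array([0.97, 0.91, 0.83, 0.70, 0.52, 0.30])
fpr  = np.array([0.90, 0.75, 0.57, 0.38, 0.20, 0.07])

z_sens, z_fpr = norm.ppf(sens), norm.ppf(fpr)
b, a = np.polyfit(z_fpr, z_sens, 1)           # slope, intercept of the z-ROC regression
print(round(a, 3), round(b, 3), round(a / np.sqrt(1 + b**2), 3))

# With the slide's reported regression parameters:
a, b = 1.0093, 0.97762
print(round(a / np.sqrt(1 + b**2), 3))        # 0.722 (reported as 0.723 on the slide)
# In the binormal model the corresponding area under the curve is Phi(Z_AUC):
print(round(norm.cdf(a / np.sqrt(1 + b**2)), 3))   # ~0.765
```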

ROC Analysis.40
As to the estimation of d' for single 2x2 tables, and for the estimation of r (the overall relationship), see the single-sheet A4 handouts - which are simply a printout of the online help in DICHOT 3.0.