Review: Conditional Probability. Using tests to improve decisions: Cutting scores & base rates

Conditional probabilities arise when the probability of one thing [A] depends on the probability of something else [B]. In such cases, we want to factor in the probability of B before we worry about A. This amounts to focusing on the elements that are likely to be picked out by both A and B.

Three ways
We can consider three ways to solve conditional probability questions (all exactly equivalent):
Common sense
Probability tables
Bayes' Theorem

a.) Common sense
5 males, one of whom wears a dress; 5 females, 4 of whom wear dresses. What is the probability that you wear a dress, given that you are female?
First: we want to know how many people are both dress wearers and female: $P(A \text{ and } B) = 4/10$.
Second: we want to know what proportion of all women is accounted for by the dress-wearing females: dress-wearing females / females $= P(\text{Female and Dress}) / P(\text{Female}) = (4/10) / (5/10) = 4/5$.

b.) Probability Tables
What is the probability that you sometimes wear a dress, given that you are female?

          Dress   No dress
Male        1        4
Female      4        1

JUST IGNORE ALL THE MALES! Of the 5 females, 4 wear a dress: 4/5.
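The claim that the three routes are exactly equivalent can be checked on the dress example. This is a minimal sketch, assuming the counts 5 males (1 in a dress) and 5 females (4 in dresses), which are the counts that make the worked answer come out to 4/5. It computes a.) the common-sense ratio, b.) the female row of the table, and c.) Bayes' Theorem, $P(\text{Dress} \mid \text{Female}) = P(\text{Female} \mid \text{Dress})\,P(\text{Dress}) / P(\text{Female})$, all with exact fractions. The names `counts` and `p` are illustrative only.

```python
from fractions import Fraction

# Assumed counts: (sex, clothing) -> number of people
counts = {
    ("male", "dress"): 1, ("male", "no dress"): 4,
    ("female", "dress"): 4, ("female", "no dress"): 1,
}
total = sum(counts.values())


def p(pred):
    """Probability of the cells satisfying pred, as an exact fraction."""
    return Fraction(sum(n for cell, n in counts.items() if pred(cell)), total)


p_female = p(lambda cell: cell[0] == "female")
p_dress = p(lambda cell: cell[1] == "dress")
p_both = p(lambda cell: cell == ("female", "dress"))

# a.) Common sense: P(Female and Dress) / P(Female)
common_sense = p_both / p_female

# b.) Probability table: ignore the males and read the female row only
female_row_total = counts[("female", "dress")] + counts[("female", "no dress")]
table = Fraction(counts[("female", "dress")], female_row_total)

# c.) Bayes: P(Dress | Female) = P(Female | Dress) P(Dress) / P(Female)
p_female_given_dress = p_both / p_dress
bayes = p_female_given_dress * p_dress / p_female

print(common_sense, table, bayes)  # 4/5 4/5 4/5
```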

c.) Bayes' Theorem
What is the probability that you sometimes wear a dress, given that you are female?
$P(A \mid B) = P(B \mid A)\,P(A) / P(B)$
Proof: By definition,
(1.) $P(A \mid B) = P(A \text{ and } B) / P(B)$
(2.) $P(B \mid A) = P(A \text{ and } B) / P(A)$
(3.) $P(B \mid A)\,P(A) = P(A \text{ and } B)$ [Multiply (2.) by P(A)]
(4.) $P(B \mid A)\,P(A) = P(A \mid B)\,P(B)$ [Substitute $P(A \text{ and } B) = P(A \mid B)\,P(B)$, from (1.), in (3.)]
(5.) $P(B \mid A)\,P(A) / P(B) = P(A \mid B)$ [Divide by P(B)]
$P(\text{Dress} \mid \text{Female}) = P(\text{Female} \mid \text{Dress})\,P(\text{Dress}) / P(\text{Female}) = (4/5 \times 5/10) / (5/10) = 4/5$

Why use ^$#*! Bayes' Theorem?
Bayes' Theorem is not intended to confuse, but to simplify: you can use it to get the probability relation between any two cells in the 2x2 table. It can also be generalized to more complex situations. However, in this class we won't go outside of 2x2 conditional probability tables, so just draw a picture or think it through if you prefer!

What is a cutting score or cutting line? How shall we evaluate how good any given test is?
In many tests we have criteria: if a subject scores above score X, they are likely to be Y [a genius, a moron, a prospect, likely to die in six months]. X is a cutting score.
Note that this is a conditional probability: $P(\text{diagnosis} \mid \text{test result})$.
Note also that in this case the probability of X [test result] is not "given by God": we test designers are free to change the cutting score as we like. In doing so, we can change $P(\text{diagnosis} \mid \text{test result})$.
As an example, think of the probability that a person is a genius (defined, let's say, as IQ > 130) given that they got an IQ score of 128, on the one hand, or 110, on the other.
Assume the standard error for IQ is 10 points. Then there is a fair chance that a person who got 128 has an IQ above 130, but a very small (but non-zero) chance that a person who got 110 has an IQ above 130.
If we used 110 as a cutting score for genius, we'd be wrong a lot: $P(\text{diagnosis} \mid \text{test result})$ is very low. If we used 128 as a cutting score for genius, we'd be wrong less often: $P(\text{diagnosis} \mid \text{test result})$ is higher.
What we want is some principled way of deciding what a cutting score should be for any particular purpose. Clearly, our choice of cutting score will depend on that purpose.
When we are diagnosing a brain tumour, we want to be wrong almost never if the person does have a brain tumour, AND we don't care too much if we make a false positive.
When we are trying to identify criminals, we might be more worried about minimizing false positives (we could ruin a life if we say someone is a criminal when they are not) and willing to pay the price by letting some real criminals go free (increasing our false negative rate).
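A rough sketch of why the two observed scores behave so differently, taking the genius cut-off as IQ > 130, the two observed scores as 128 and 110, and the standard error as 10 points. It assumes (a simplification the text does not make) that a person's true IQ is normally distributed around the observed score with a standard deviation equal to the standard error, and it ignores the population base rate of genius. The function name `p_true_above` is hypothetical.

```python
import math


def p_true_above(observed, threshold=130.0, se=10.0):
    """P(true IQ > threshold) if true IQ ~ Normal(observed, se).

    Simplifying assumption: ignores the population base rate, so there is
    no regression toward the mean.
    """
    z = (threshold - observed) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability


for score in (128, 110):
    print(score, round(p_true_above(score), 3))
# 128 0.421
# 110 0.023
```

Under this model, roughly 42% of people who score 128 would have a true IQ above 130, against only about 2% of those who score 110.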

[Figure: two cutting lines. False negative: incorrectly undiagnosed. False positive: incorrectly diagnosed. One line gives a low false negative rate and a high false positive rate ("rewarding incompetence"); the other gives a high false negative rate and a low false positive rate ("ignoring competence").]

How shall we evaluate how good a test is?
Three things need to be taken into account:
i.) The size of the correlation between test scores and criterion. The higher the correlation, the narrower the scatterplot (i.e. the ellipse) and the smaller the error rates.
ii.) The base rate
iii.) The cutting score
What is the relation between these last two measures?
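The interplay of the three factors can be explored with a small simulation. This is an illustrative sketch, not a method from the text: it assumes test score and criterion are standard bivariate normal with correlation r, treats the top 5% of the criterion as true positives (the base rate), and selects anyone scoring above a cutting score. The helper names `simulate` and `error_rates` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate(r, n=50_000, seed=0):
    """Draw n (criterion, score) pairs from a standard bivariate normal with correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        criterion = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        score = r * criterion + math.sqrt(1.0 - r * r) * noise
        pairs.append((criterion, score))
    return pairs


def error_rates(pairs, base_rate, cut):
    """Return (false negative rate, false positive rate, overall error rate).

    The top `base_rate` of the criterion distribution counts as truly positive;
    anyone whose test score exceeds `cut` is selected.
    """
    criterion_cut = NormalDist().inv_cdf(1.0 - base_rate)
    tp = fn = fp = tn = 0
    for criterion, score in pairs:
        positive = criterion > criterion_cut
        selected = score > cut
        if positive and selected:
            tp += 1
        elif positive:
            fn += 1
        elif selected:
            fp += 1
        else:
            tn += 1
    return fn / (tp + fn), fp / (fp + tn), (fn + fp) / len(pairs)


base_rate = 0.05
cut = NormalDist().inv_cdf(1.0 - base_rate)  # select the same 5% on the test

# i.) Higher correlation: narrower ellipse, fewer errors at a fixed cut.
by_r = {r: error_rates(simulate(r), base_rate, cut)[2] for r in (0.3, 0.6, 0.9)}

# iii.) Moving the cut on one sample trades false negatives for false positives.
pairs = simulate(0.6)
by_cut = {c: error_rates(pairs, base_rate, c)[:2] for c in (0.5, 1.0, 1.5, 2.0)}

print(by_r)
print(by_cut)
```

With the seed fixed, raising r lowers the overall error at a fixed cut, and raising the cut on a single sample can only move people from false positive to false negative, never the reverse.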

The relation between base rate and cutting score
Example from Meehl:
Group A: 415 well-adjusted soldiers
Group B: 89 mal-adjusted soldiers
A scale diagnosed 55% of Group B, and only 19% of Group A, so the authors advocated its use.
Example: Assume N = 10,000.
500 are bad. 55% (275) are classified as bad.
9500 are good. 81% (7695) are not classified as bad.
(7695 + 275)/10,000 = 79.7% are correctly classified.
Why should this bother us? We could have correctly classified 95% without using a test, just by calling everyone good!

Let's use Bayes' Theorem: Is bad bad?
$P(\text{Bad} \mid \text{Diagnosed}) = P(\text{Diagnosed} \mid \text{Bad})\,P(\text{Bad}) / P(\text{Diagnosed})$
$P(\text{Diagnosed} \mid \text{Bad}) = 0.55$ (given)
$P(\text{Bad}) = 0.05$ (assumed)
$P(\text{Diagnosed}) = (0.55 \times 0.05) + (0.19 \times 0.95) = 0.208$
$P(\text{Bad} \mid \text{Diagnosed}) = 0.13$
When we take base rates into account, an identification of a person as bad actually has only a 13% chance of being correct, not a 55% chance as claimed.

Let's use Bayes' Theorem: Is not-bad good?
$P(\text{Good} \mid \text{Not Diagnosed}) = P(\text{Not Diagnosed} \mid \text{Good})\,P(\text{Good}) / P(\text{Not Diagnosed})$
$P(\text{Not Diagnosed} \mid \text{Good}) = 0.81$ (given)
$P(\text{Good}) = 0.95$ (assumed)
$P(\text{Not Diagnosed}) = (0.45 \times 0.05) + (0.81 \times 0.95) = 0.7920$
$P(\text{Good} \mid \text{Not Diagnosed}) = 0.97$ (by Bayes' Theorem)
When we take base rates into account, a failure to identify a person as bad has a 97% chance of being correct, but remember that we were already 95% sure before we bothered to do the calculation!

The relation between base rate and cutting score, II
Let's do the math! A certain Rorschach configuration is seen in 8.1% of schizophrenics, and 0% of nonschizophrenics. The authors claim this is clinically useful: is it really?
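Meehl's point can be reproduced in a few lines. The sketch below assumes the slide's figures (N = 10,000, a 5% base rate of maladjustment, a scale that flags 55% of maladjusted and 19% of well-adjusted soldiers); `posterior` is a hypothetical name for a plain two-class application of Bayes' Theorem.

```python
def posterior(likelihood, prior, likelihood_if_not):
    """P(H | E) by Bayes' theorem, with likelihood = P(E | H) and
    likelihood_if_not = P(E | not H)."""
    p_evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / p_evidence


n = 10_000
p_bad = 0.05               # assumed base rate of maladjustment
p_diag_given_bad = 0.55    # scale flags 55% of maladjusted soldiers
p_diag_given_good = 0.19   # ...and 19% of well-adjusted soldiers

bad = n * p_bad
good = n - bad
correct = p_diag_given_bad * bad + (1 - p_diag_given_good) * good
accuracy = correct / n     # share of soldiers the scale classifies correctly
baseline = 1 - p_bad       # accuracy of simply calling everyone good

ppv = posterior(p_diag_given_bad, p_bad, p_diag_given_good)              # P(Bad | Diagnosed)
npv = posterior(1 - p_diag_given_good, 1 - p_bad, 1 - p_diag_given_bad)  # P(Good | Not diagnosed)

print(round(accuracy, 3), baseline, round(ppv, 2), round(npv, 2))  # 0.797 0.95 0.13 0.97
```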
$P(\text{Schizo} \mid \text{Rorschach}) = P(\text{Rorschach} \mid \text{Schizo})\,P(\text{Schizo}) / P(\text{Rorschach})$
$P(\text{Rorschach} \mid \text{Schizo}) = 0.081$ (the empirical finding)
$P(\text{Schizo}) = 0.0085$ (known base rate for schizophrenia)
$P(\text{Rorschach}) = (0.0085 \times 0.081) + (0.9915 \times 0) = 0.0006885$
$P(\text{Schizo} \mid \text{Rorschach}) = 1.00$
Although the sign is certain in this case, it is itself so rare, and applies to a group with such a low base rate, that it is P(Rorschach) that is worrying: this information would be diagnostically helpful in only 7 cases out of 10,000! That is, it is clinically useless.
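The same calculation for the Rorschach sign, as a sketch using the slide's figures (seen in 8.1% of schizophrenics and 0% of nonschizophrenics, base rate 0.0085). With a 0% false positive rate the posterior is exactly 1; the whole problem lies in how rarely the sign appears at all.

```python
p_sign_given_schizo = 0.081   # configuration seen in 8.1% of schizophrenics
p_sign_given_not = 0.0        # ...and in 0% of nonschizophrenics
p_schizo = 0.0085             # base rate of schizophrenia used on the slide

# Total probability of seeing the sign, then Bayes' theorem
p_sign = p_sign_given_schizo * p_schizo + p_sign_given_not * (1 - p_schizo)
p_schizo_given_sign = p_sign_given_schizo * p_schizo / p_sign

# How many people in 10,000 would show the sign at all?
helped_per_10k = p_sign * 10_000

print(p_schizo_given_sign, round(helped_per_10k))  # 1.0 7
```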

What can we do? Rule 1
In order for a positive diagnostic assertion to be more likely true than false, the ratio of positive to negative base rates in the examined population must exceed the ratio of the false positive rate to the valid (true) positive rate:
$$\frac{\text{Base rate of positives}}{\text{Base rate of negatives}} > \frac{\text{False positive rate of test}}{\text{True positive rate of test}}$$
Example: Rule 1
A cutting score identifies 80% of brain-damaged patients. 15% of nondamaged patients also exceed that cut-off. What base rates can justify the use of such a test?
.15 (false positive) / .80 (true positive) = 0.19
The ratio of brain-damaged to non-brain-damaged patients in the population under consideration must be greater than about .19, i.e. roughly 1 brain-damaged patient for every 5 non-damaged ones.

The easiest case: Equal base rates (Rule 2)
Iff base rates are equal, then the probability that a positive diagnosis is correct is the ratio of the true positive rate to the sum of the true and false positive rates. Put more simply: equal base rates render Bayes' Theorem unnecessary.
Example: Equal base rates (Rule 2)
Two kinds of cancers occur equally often. A test diagnoses Type B with 68% accuracy, but is at chance for Type A. You get a positive test result. What is the probability you have Type B cancer?
For once life is simple. The probability is 68%: 0.68 / (0.68 + 0.32) = 0.68.
Example 2: Equal base rates (Rule 2)
A test picks out 75% of people who will continue in school (true positives) but also 40% of those who will not (false positives). It is claimed that about half of all students in the population drop out of school. How far off can that claim be without the test being useless?
With an equal split, the probability that a positive diagnosis is correct is the ratio of the true positive rate to the sum of the true and false positive rates: 0.75 / (0.75 + 0.40) = 0.65. So the test gets about 65% right. If less than 35% of the students actually drop out, the test will not do better than base rates, since predicting "continues" for everyone would then be right more than 65% of the time.

When can a test help? (Rule 3)
A test result can only help if the base rate of the more numerous class (here, positive) is less than the ratio of the true negative rate to the sum of the true and false negative rates.
That is: if it is a matter of fact that (say) only 10% of students drop out, then there is no use giving this test: it can't beat the 90% odds you have of being correct before you bothered to give the test.
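Rules 1 and 2 both follow from Bayes' Theorem, which a short sketch can confirm. Assuming the figures from the examples (80% hits and 15% false positives for the brain-damage test; 68%/32% for the cancer test; 75%/40% for the school test), the code computes the Rule 1 ratio, checks that a positive result is exactly a coin flip when the base-rate odds equal that ratio, and evaluates Rule 2 as Bayes' Theorem with a base rate of 0.5. The function names are hypothetical.

```python
def min_base_rate_ratio(false_pos_rate, true_pos_rate):
    """Rule 1: positive/negative base-rate ratio a population must exceed
    for a positive diagnosis to be more likely true than false."""
    return false_pos_rate / true_pos_rate


def ppv(true_pos_rate, false_pos_rate, base_rate):
    """P(condition | positive result), by Bayes' theorem."""
    hits = true_pos_rate * base_rate
    false_alarms = false_pos_rate * (1 - base_rate)
    return hits / (hits + false_alarms)


# Rule 1, brain-damage example: 80% hits, 15% false positives.
ratio = min_base_rate_ratio(0.15, 0.80)        # about 0.19
base_at_ratio = ratio / (1 + ratio)            # convert odds to a probability
at_threshold = ppv(0.80, 0.15, base_at_ratio)  # exactly at the boundary: 0.5

# Rule 2, equal base rates: PPV reduces to TP / (TP + FP).
cancer = ppv(0.68, 0.32, 0.5)
school = ppv(0.75, 0.40, 0.5)

print(round(ratio, 2), round(at_threshold, 2), round(cancer, 2), round(school, 2))
# 0.19 0.5 0.68 0.65
```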

Example: When can a test help? (Rule 3)
A test of maladjustment correctly classifies 85% of maladjusted girls, and misidentifies only 15% of adjusted girls. What base rates are needed to support these ratios? (Assume, reasonably, that there are more adjusted than maladjusted girls.)
The ratio of the true negative rate to the sum of the true and false negative rates = 0.85 [true negative] / (0.85 [true negative] + 0.15 [false negative]) = 0.85. The test can only help if less than 85% of girls are well-adjusted.

What does this have to do with cutting lines?
The proportion of people selected (diagnosed, chosen) from a sample is called the selection ratio. When positive/negative base rates are not equal, there is a (fairly brutal) trade-off between the accuracy (error rate) of a diagnosis or prediction and the size of the selection ratio.
The brutal trade-off
If you want to be very sure you are right, you can speak of only a very small proportion of the sample (and you need a very large sample to get the cut-off points!). If you want to say something about everyone, then you must be prepared to be uncertain about your cut-off points, and wrong very often.
In short: you can be certain about a few people, or uncertain about a lot of people: take your pick!
[Figure: two cutting lines. False negative: incorrectly unselected. False positive: incorrectly selected. One line gives a low false negative rate and a high false positive rate ("rewarding incompetence"); the other gives a high false negative rate and a low false positive rate ("ignoring competence").]
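The maladjustment example can be checked against the base-rate strategy of calling every girl well-adjusted. This sketch assumes the slide's figures (85% of maladjusted girls flagged, 15% of adjusted girls flagged) and follows the slide's labels for Rule 3; the helper names are hypothetical.

```python
def rule3_threshold(true_neg_rate, false_neg_rate):
    """Rule 3 as stated on the slide: a test can only help if the base rate of
    the more numerous class is below TN / (TN + FN)."""
    return true_neg_rate / (true_neg_rate + false_neg_rate)


def overall_accuracy(p_adjusted, flag_rate_maladjusted, flag_rate_adjusted):
    """Share of girls the test classifies correctly at a given base rate."""
    return (p_adjusted * (1 - flag_rate_adjusted)
            + (1 - p_adjusted) * flag_rate_maladjusted)


threshold = rule3_threshold(0.85, 0.15)  # 0.85

for p_adjusted in (0.70, 0.80, 0.90, 0.95):
    acc = overall_accuracy(p_adjusted, 0.85, 0.15)
    # Baseline: call every girl well-adjusted, which is right p_adjusted of the time.
    verdict = "test helps" if acc > p_adjusted else "no better than base rate"
    print(p_adjusted, round(acc, 2), verdict)
```

Because this test is right 85% of the time in both groups, its overall accuracy is 0.85 at every base rate, so it beats the base-rate guess exactly when fewer than 85% of girls are well-adjusted, as Rule 3 says.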

Sensitivity & Specificity
The sensitivity of a test = the probability of having a positive test result when the disease is present = $P(\text{Result} \mid \text{Disease})$ = true positive rate.
The specificity of a test = the probability of having a negative test result when the disease is absent = $P(\lnot\text{Result} \mid \lnot\text{Disease})$ = true negative rate.
[Figure: the four outcomes of a cutting line. True positive: correctly selected; false negative: incorrectly unselected (together, the basis of SENSITIVITY). True negative: correctly unselected; false positive: incorrectly selected (together, the basis of SPECIFICITY).]

What to do?
1.) Obviously, sometimes we can be satisfied with a small improvement on true negative base rates and with a large false positive rate. As we have said, we don't mind mistaking 90 brain tumours in order not to miss 20.
2.) Successive hurdles: take a chance, allow errors, and give the expensive, time-consuming, but accurate tests to those who are selected out by a first pass of a less expensive, less time-consuming, and less accurate test. Repeat as necessary...
3.) Sometimes we can find sub-populations with less extreme base rates than in the world at large. If our referrals are well screened, we can have more confidence in base rates that are less onerous (= closer to being equal) than they would be in the world at large.
4.) Sometimes "so what?" is the right thing to say. Since testing with any accuracy is so difficult to do well, we should not bother to give tests that don't lead to real changes in therapy or other treatment. If you can identify therapy candidates with 70% accuracy, so what? Will you then ignore or refuse to treat those who don't make the cut? If not, don't waste time and effort giving the test. Gather base rate information.
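Sensitivity, specificity, and the successive-hurdles idea can be sketched together. The first test uses the counts from Meehl's soldier example at N = 10,000 and a 5% base rate. The second-stage test (sensitivity 0.90, specificity 0.95) is purely hypothetical, and the update assumes the two tests err independently given a person's true status, which real tests need not do.

```python
def sensitivity(tp, fn):
    """P(positive result | condition present): the true positive rate."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """P(negative result | condition absent): the true negative rate."""
    return tn / (tn + fp)


def update(prior, sens, spec):
    """P(condition | positive result) by Bayes' theorem."""
    hits = sens * prior
    false_alarms = (1 - spec) * (1 - prior)
    return hits / (hits + false_alarms)


# First hurdle: Meehl's scale, as counts out of 10,000 soldiers (500 maladjusted).
tp, fn = 275, 225
fp, tn = 1805, 7695
sens1 = sensitivity(tp, fn)   # 0.55
spec1 = specificity(tn, fp)   # 0.81
after_first = update(0.05, sens1, spec1)

# Second hurdle: a hypothetical, more accurate (and more expensive) test given
# only to those selected by the first; assumes independent errors given true status.
after_second = update(after_first, sens=0.90, spec=0.95)

print(round(sens1, 2), round(spec1, 2), round(after_first, 2), round(after_second, 2))
# 0.55 0.81 0.13 0.73
```

Under those assumptions, a positive result on both hurdles raises the probability of being correct from about 13% to about 73%.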