STAT 200. Guided Exercise 4

1. Let's Revisit this Problem. Fill in the table again.

Diagnostic tests are not infallible. We can experience a false positive or a false negative with any test. There are further terms which we will discuss in this exercise. Imagine that the probability is 0.95 that a certain test will diagnose a diabetic correctly as being diabetic, and it is 0.05 that it will diagnose a person who is not diabetic as being diabetic. It is known that roughly 10% of the population is diabetic. What is the probability that a person diagnosed as being diabetic actually is diabetic?

Hint: This is a Bayes' theorem problem, which we did not cover in the lectures. There is another way to handle this problem: make a mock 2 by 2 table of the data based on the information you already know. Once the table is complete, you can solve for the conditional probability. Since some of the probabilities are small, I would suggest you make a table that is based on 100,000 people. I have started the table for you.

                          Test Results
Diabetes Status     Diabetic    Not Diabetic      Total
Diabetic               9,500             500     10,000
Not Diabetic           4,500          85,500     90,000
Total                 14,000          86,000    100,000

a. What is the probability that a person diagnosed as being diabetic actually is diabetic?
   P(D | Test says D) = 9,500/14,000 = .6786

b. What are the odds of the test results saying you are a diabetic (versus not a diabetic) for those who truly are diabetic?
   Odds = 9,500/500 = 19

c. What are the odds of the test results saying you are a diabetic (versus not a diabetic) for those who are not diabetic?
   Odds = 4,500/85,500 = .052632

d. What is the odds ratio for the test results saying you are a diabetic (versus not a diabetic) comparing diabetics to non-diabetics? Interpret this odds ratio in words.
   Odds Ratio = 19/.052632 = 361. The odds of getting a test result that says "diabetic" are 361 times as high for those who are diabetic as for those who are not diabetic.

e. We can think of our table in the following way:

                          Test Results
Diabetes Status     Diabetic           Not Diabetic
Diabetic            True Positive      False Negative
Not Diabetic        False Positive     True Negative

The sensitivity of a test is expressed as the probability of a positive test among patients with the disease. The formula is given as:

   Sensitivity = True Positive / (True Positive + False Negative)

What is the sensitivity of this test?
   This is P(Pos Test | Diabetic) = 9,500/10,000 = .95. A conditional probability!

f. The specificity of a test is expressed as the probability of a negative test among patients without the disease. The formula is given as:

   Specificity = True Negative / (True Negative + False Positive)

What is the specificity of this test?
   This is P(Neg Test | Not Diabetic) = 85,500/90,000 = .95. A conditional probability!
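The arithmetic in parts a through f can be checked with a short script. This is only a sketch: the variable names are made up for illustration, and the Bayes' theorem line at the end is added as a cross-check on the table method, not as something the exercise asks for.

```python
# Cell counts from the mock 2 by 2 table of 100,000 people.
true_pos, false_neg = 9_500, 500      # truly diabetic: test says diabetic / not diabetic
false_pos, true_neg = 4_500, 85_500   # truly not diabetic: test says diabetic / not diabetic
total = true_pos + false_neg + false_pos + true_neg

# Part a: P(diabetic | test says diabetic)
p_diabetic_given_pos = true_pos / (true_pos + false_pos)

# Parts b-d: odds of a positive test within each group, then their ratio
odds_diabetic = true_pos / false_neg
odds_not_diabetic = false_pos / true_neg
odds_ratio = odds_diabetic / odds_not_diabetic

# Parts e-f: sensitivity and specificity
sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)

# Cross-check of part a using Bayes' theorem (an addition, not part of the exercise)
prevalence = (true_pos + false_neg) / total
bayes = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)

print(f"P(D | test says D) = {p_diabetic_given_pos:.4f}")
print(f"odds (diabetic) = {odds_diabetic:.0f}, odds (not diabetic) = {odds_not_diabetic:.6f}")
print(f"odds ratio = {odds_ratio:.0f}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print(f"Bayes cross-check = {bayes:.4f}")
```

The table method and Bayes' theorem give the same answer because the table is just the theorem written out in counts.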

2. Discrete Random Variable: The Number of Games in a Baseball World Series.

Based on past results found in the Information Please Almanac, there is a 0.1809 probability that a baseball World Series contest will last four games, a 0.2234 probability that it will last five games, a 0.2234 probability that it will last six games, and a 0.3723 probability that it will last seven games. The probability table is given below:

X           4        5        6        7
P(X)    .1809    .2234    .2234    .3723

a. What is the mean (expected value) number of games in a World Series?
   E(x) = 4*.1809 + 5*.2234 + 6*.2234 + 7*.3723 = 5.7871

b. What is the variance of the number of games in a World Series?
   Var = (4 - 5.7871)^2 * .1809 + (5 - 5.7871)^2 * .2234 + (6 - 5.7871)^2 * .2234 + (7 - 5.7871)^2 * .3723
   Var = 1.2740

c. Is it unusual for a team to sweep the World Series (win all four games in a row)?
   It is not unusual. We expect that 18.09% of the time. However, it is the lowest probability of the possible outcomes, and there is an 81.91% chance of more than 4 games. I would expect that networks look at the probabilities associated with a sweep when bidding on the coverage of the World Series.
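As a check on parts a and b, the mean and variance of a discrete random variable can be computed directly from the probability table. The sketch below uses plain Python lists; the names games and probs are illustrative only.

```python
# Probability distribution for the number of games in a World Series.
games = [4, 5, 6, 7]
probs = [0.1809, 0.2234, 0.2234, 0.3723]

# E(x) = sum of x * P(x)
mean = sum(x * p for x, p in zip(games, probs))

# Var(x) = sum of (x - mean)^2 * P(x)
variance = sum((x - mean) ** 2 * p for x, p in zip(games, probs))

# Part c: chance of a series lasting more than four games
p_more_than_four = 1 - probs[0]

print(f"mean = {mean:.4f}")
print(f"variance = {variance:.4f}")
print(f"P(more than 4 games) = {p_more_than_four:.4f}")
```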

3. Consider an experiment in which 10 identical small boxes are placed side-by-side on a table. A crystal is placed, at random, inside one of the boxes. A self-professed psychic is asked to pick the box that contains the crystal. This experiment is repeated seven times, and x is the number of correct decisions in seven tries. Thus, it is a Binomial random variable.

a. If the psychic is guessing, what is the value of p, the probability of a correct decision on each trial?
   P(success) = 1/10 = .1
   This means a random person just guessing which of the 10 boxes holds the crystal has a 1 in 10, or 10%, chance of being right.

b. Fill in the remaining portions of this table reflecting the probability distribution for this variable using the binomial table or the binomial formula.
   The Binomial Table for n = 7 and p = .10 is much easier!

   X          0       1       2       3       4       5       6       7
   p(x)   .4783   .3720   .1240   .0230   .0026   .0002   .0000   .0000

c. If the psychic is guessing, what is the expected number of correct decisions in seven trials, and what is the variance?
   E(x) = n*p = 7 * .1 = .7
   V(x) = n*p*q = 7 * .1 * .9 = .63; Std dev. = .7937

d. If the psychic is guessing, what is the probability of no correct decisions in seven trials?
   Just read the answer from the table: P(x = 0) = .4783. It is pretty high - there is a high probability you won't get any right.

e. One of the psychics who took the test got all seven wrong. Suppose the criterion for having ESP is that you could guess right with p = .5. In other words, if you are a psychic you might not get it right all the time, but you should be doing much better than chance. If p = .5 instead of .10, what is the probability of guessing incorrectly on all seven trials?

   X          0       1       2       3       4       5       6       7
   P(x)   .0078   .0547   .1641   .2734   .2734   .1641   .0547   .0078

   P(x = 0) = .0078. If a person really was a psychic, it would be rare that such a person would guess none right in 7 tries.
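Both binomial tables above can be reproduced from the binomial formula P(x) = C(n, x) * p^x * q^(n - x). The following is a minimal sketch using only the standard library; the helper name binomial_pmf is an assumption made for this example.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 7

# Part b: the psychic is guessing (p = .1)
guessing = [binomial_pmf(k, n, 0.1) for k in range(n + 1)]

# Part e: the ESP criterion (p = .5)
esp = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]

# Part c: mean, variance, and standard deviation when guessing
mean = n * 0.1
variance = n * 0.1 * 0.9
std_dev = sqrt(variance)

print("p = .1:", [f"{pr:.4f}" for pr in guessing])
print("p = .5:", [f"{pr:.4f}" for pr in esp])
print(f"mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.4f}")
```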

4. If a single bit of data (0 or 1) is transmitted over a noisy communication channel, it has a probability p of being incorrectly transmitted. To improve the reliability of the transmission, the bit is transmitted n times, where n is odd. A decoder at the receiving end, called a majority decoder, decides that the correct message is the one carried by the majority of the received bits. This means that if there are five transmissions of a (0,1) bit, the bit used by at least three of the transmissions would be considered correct. Assume that each bit is independently subject to being corrupted with the same probability p, and that p = .1. Note, p is the probability of an error, and in terms of a binomial problem we will think of X as the number of errors in n transmissions.

a. If a company sent only one transmission, what is the probability of it being received without an error?
   p = .1, which is the probability of an incorrect transmission. So q = 1 - p = .90. The probability of it being received without an error is .9. If the information is important, this probability might seem too low.

b. A company decides to use 5 transmissions as a strategy to reduce errors (n = 5). Set up the outcomes for 5 transmissions and the probabilities associated with each outcome using the binomial distribution.

   X          0       1       2       3       4       5
   p(x)   .5905   .3281   .0729   .0081   .0005   .0000

c. Calculate the mean, variance, and standard deviation for this problem.
   E(x) = n*p = 5 * .1 = .5
   V(x) = n*p*q = 5 * .1 * .9 = .45; Std dev. = .6708

d. If five messages are sent for each bit, the probability that the message is correctly received is the probability of two or fewer errors. This is not easy to see, but think it through with me. If the system sends 3, 4, or 5 wrong messages, the majority decoder strategy will accept the wrong message and make a wrong decision. But if the wrong message is sent 2, 1, or 0 times, the right message will be accepted. Look at the probability of 0, 1, or 2 errors from our binomial table above.

What is the probability that the message is correctly received in five transmissions (i.e., 2 or fewer errors)? Compare that with the answer you derived in Part a. Did sending five transmissions improve the chances of sending the message correctly?
   P(x=0) + P(x=1) + P(x=2) = .59049 + .32805 + .07290 = .9914
   This is much better than .9. The majority decoder strategy with n = 5 transmissions greatly improved the chance of a right transmission.
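The majority decoder argument can be sketched in code: the bit is decoded correctly whenever fewer than half of the n copies are corrupted. The function name prob_correct is hypothetical, and the loop over n = 1, 3, 5, 7 goes slightly beyond the exercise (which only asks about n = 1 and n = 5) to show the trend.

```python
from math import comb

def prob_correct(n, p):
    """Probability that a majority decoder recovers the bit.

    n is the (odd) number of transmissions and p is the probability that any
    single transmission is corrupted. The decoder is right when the number of
    errors is at most n // 2, i.e. errors are in the minority.
    """
    max_errors = n // 2
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(max_errors + 1))

p_error = 0.1
for n in (1, 3, 5, 7):
    print(f"n = {n}: P(correct) = {prob_correct(n, p_error):.5f}")
```

With n = 1 this reduces to q = .9 from part a, and with n = 5 it sums P(x=0), P(x=1), and P(x=2) from the binomial table.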

5. Discrete Random Variable Problem. A concert producer has scheduled an outdoor concert on a Saturday. If it does not rain, the producer expects to make $20,000 profit from the concert. If it does rain, the producer will be forced to cancel the concert and lose $12,000 (from fees, advertising, stadium rental and so forth). The probability of rain on Saturday is .4.

a. What is the expected profit from the concert? Hint: write out the probability distribution and solve for the expectation. The values that your random variable can take are the dollar values.

   x       $20,000    -$12,000
   P(x)         .6          .4

   E(x) = 20,000*.6 - 12,000*.4 = $12,000 - $4,800 = $7,200

b. For a fee of $1,000 an insurance company will insure against all losses from a rained out concert. If the producer buys the insurance, what is the expected profit from the concert? Note: an insurance fee is a fixed cost incurred regardless of whether it rains or not.

   Before the fee:

   x       $20,000          $0
   P(x)         .6          .4

   E(x) = 20,000*.6 + 0*.4 = $12,000; subtracting the $1,000 fee gives $12,000 - $1,000 = $11,000

   Equivalently, with the fee built into each outcome:

   x       $19,000     -$1,000
   P(x)         .6          .4

   E(x) = 19,000*.6 - 1,000*.4 = $11,400 - $400 = $11,000

c. Assuming the forecast is accurate, do you believe the insurance company has charged too much or too little? Hint: reformulate the problem to express outcomes in terms of the insurance company and what they expect to pay out.

   x            $0    -$12,000
   P(x)         .6          .4

   E(x) = 0*.6 - 12,000*.4 = -$4,800 payout
   Yet they only charged $1,000 - they charged too little.
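The three expectations in this problem all follow the same pattern, so one small helper covers them. This is a sketch only; the function expected_value and the list names are assumptions for illustration, and the insurer's payout is written as a positive amount here, so it matches the -$4,800 above in magnitude.

```python
def expected_value(distribution):
    """Expected value of a discrete random variable given (value, probability) pairs."""
    return sum(value * prob for value, prob in distribution)

P_RAIN = 0.4
P_DRY = 1 - P_RAIN
FEE = 1_000

# Part a: producer's profit without insurance
no_insurance = [(20_000, P_DRY), (-12_000, P_RAIN)]

# Part b: producer's profit with insurance (losses covered, fee paid either way)
with_insurance = [(20_000 - FEE, P_DRY), (0 - FEE, P_RAIN)]

# Part c: what the insurer expects to pay out
insurer_payout = [(0, P_DRY), (12_000, P_RAIN)]

print(f"expected profit, no insurance:   ${expected_value(no_insurance):,.0f}")
print(f"expected profit, with insurance: ${expected_value(with_insurance):,.0f}")
print(f"insurer's expected payout:       ${expected_value(insurer_payout):,.0f} vs. fee ${FEE:,}")
```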

6. Normal Distribution Problem. Plastic bags used for packaging produce are manufactured so that the breaking strength of the bag is normally distributed with a mean of 5 pounds per square inch and a standard deviation of 1.5 pounds per square inch. What proportion of the bags produced have a breaking strength of:

a. Less than 3.17 pounds per square inch?
   Z = (3.17 - 5)/1.5 = -1.22; P(Z <= -1.22) = .5 - .3888 = .1112

b. At least 3.6 pounds per square inch?
   Z = (3.6 - 5)/1.5 = -.9333; P(Z >= -.93) = .3238 + .5 = .8238

c. Between 5 and 5.5 pounds per square inch?
   Z = (5.5 - 5)/1.5 = .3333; P(5 < X < 5.5) = P(0 < Z < .33) = .1293

d. Between 3.2 and 4.2 pounds per square inch?
   Z = (3.2 - 5)/1.5 = -1.20; area between Z and the mean = .3849
   Z = (4.2 - 5)/1.5 = -.5333; area between Z and the mean = .2019
   Answer = .3849 - .2019 = .1830

e. Between what two values symmetrically distributed around the mean will 95% of the breaking strengths fall?
   Be careful here! With the normal distribution we need to be more precise than 2 standard deviations.
   5 ± 1.96(1.5) = 2.06 to 7.94
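For readers who want to check the table lookups, here is a sketch using NormalDist from Python's standard statistics module (Python 3.8 or later). The variable names are illustrative. Because the software does not round Z to two decimals before looking it up, its answers can differ from the table-based answers above in the third decimal place.

```python
from statistics import NormalDist

# Breaking strength: mean 5 psi, standard deviation 1.5 psi
strength = NormalDist(mu=5, sigma=1.5)

# a. Less than 3.17
p_a = strength.cdf(3.17)

# b. At least 3.6
p_b = 1 - strength.cdf(3.6)

# c. Between 5 and 5.5
p_c = strength.cdf(5.5) - strength.cdf(5)

# d. Between 3.2 and 4.2
p_d = strength.cdf(4.2) - strength.cdf(3.2)

# e. Central 95%: cut off 2.5% in each tail
z_975 = NormalDist().inv_cdf(0.975)
low, high = 5 - z_975 * 1.5, 5 + z_975 * 1.5

print(f"a = {p_a:.4f}, b = {p_b:.4f}, c = {p_c:.4f}, d = {p_d:.4f}")
print(f"e: z = {z_975:.2f}, interval = {low:.2f} to {high:.2f}")
```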

7. Normal Distribution Problem. You have been hired as a consultant to provide analysis for the Personnel Department at ZTel company, a large communications company. Every applicant to ZTel must take a standardized exam, and the hire or no-hire decision depends in part on this exam. The exam was purchased from a company which says the exam scores are approximately normally distributed with:

   µ = 525
   σ = 55

The current interview policy has two phases. The first phase separates all applicants into one of three categories:

   Automatic Interview    score of 600 or above
   Maybe Interview        score of 500 to 600
   Automatic Rejects      score less than 500

The Maybe group are passed on to a second phase where their previous experiences, education, special skills, and other factors are taken into consideration in whether to grant an interview or not. No one at the company can remember why the values of 600 and 500 were used as the standards for automatic interview or rejection, and most likely they were decided arbitrarily by a former Personnel Manager. The current Personnel Manager of ZTel needs to know the following:

a. The probability associated with the current standard of being automatically rejected - what proportion of the applicants are automatically rejected?
   Z = (500 - 525)/55 = -.4545
   P(Z <= -.4545) = .5 - .1753 = .3247
   Automatic Reject (< 500) = .3247

b. The probability associated with the current standard of being automatically interviewed - what proportion of the applicants are automatically interviewed?
   Z = (600 - 525)/55 = 1.364
   P(Z >= 1.364) = .5 - .4137 = .0863
   Automatic Interview (>= 600) = .0863

c. The manager notices that applicants that score between 535 and 580 tend to be good hires, having both good skills and a higher probability of accepting an offer to the company. She would like to give this group a higher priority in the second phase of evaluation. What percentage of the applicants should she expect to fall within this range?
   Z = (580 - 525)/55 = 1.000; area between Z and the mean = .3413
   Z = (535 - 525)/55 = .182; area between Z and the mean = .0721
   P(535 <= X <= 580) = .3413 - .0721 = .2692
   26.9%, or about 27 percent, are in the sweet spot.

d. The manager would prefer that the exam score for automatic interview be set at the top 15% (the 85th percentile) and the score for automatic rejection be set at the bottom 20% (the 20th percentile). What are the exam values in this distribution associated with these probabilities (in this case, round to whole numbers)?
   For the top 15% automatically interviewed, it would be at the 85th percentile, z = 1.04
   1.04 = (x - 525)/55, so x = (1.04*55) + 525 = 582.2, or about 582
   For the bottom 20% it would be at the 20th percentile, z = -.8416
   -.8416 = (x - 525)/55, so x = (-.8416*55) + 525 = 478.71, or about 479

Summarize your results as a recommendation to your client.
   The old approach used thresholds that were arbitrary. With the new approach we could identify the percentage of applicants in the good high range as well as defend the automatic interview and automatic reject in terms of percentiles in the distribution.
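The proportions in parts a through c and the percentile cutoffs in part d can be checked with NormalDist from Python's standard statistics module (Python 3.8 or later). This is a sketch under the exam vendor's stated assumption of a normal distribution with mean 525 and standard deviation 55; the variable names are illustrative.

```python
from statistics import NormalDist

# Exam scores as described by the vendor: approximately normal, mean 525, sd 55
exam = NormalDist(mu=525, sigma=55)

# a. Automatic reject: score below 500
p_reject = exam.cdf(500)

# b. Automatic interview: score of 600 or above
p_interview = 1 - exam.cdf(600)

# c. "Sweet spot": score between 535 and 580
p_sweet_spot = exam.cdf(580) - exam.cdf(535)

# d. Proposed cutoffs: 85th percentile for interview, 20th percentile for rejection
cut_interview = exam.inv_cdf(0.85)
cut_reject = exam.inv_cdf(0.20)

print(f"automatic reject:    {p_reject:.4f}")
print(f"automatic interview: {p_interview:.4f}")
print(f"sweet spot:          {p_sweet_spot:.4f}")
print(f"new interview cutoff: {cut_interview:.1f}, new reject cutoff: {cut_reject:.1f}")
```

Here inv_cdf goes from a percentile back to a score, which is the same step as solving z = (x - 525)/55 for x by hand.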