STAT 200. Guided Exercise 4


1. Let's Revisit this Problem. Fill in the table again.

Diagnostic tests are not infallible. Any test can produce a false positive or a false negative. There are further terms which we will discuss in this exercise.

Imagine that the probability is 0.95 that a certain test will diagnose a diabetic correctly as being diabetic, and it is 0.05 that it will diagnose a person who is not diabetic as being diabetic. It is known that roughly 10% of the population is diabetic. What is the probability that a person diagnosed as being diabetic actually is diabetic?

Hint: This is a Bayes' theorem problem, which we did not cover in the lectures. There is another way to handle this problem: make a mock 2 by 2 table of the data based on the information you already know. Once the table is complete, you can solve for the conditional probability. Since some of the probabilities are small, I would suggest you make a table that is based on 100,000 people. I have started the table for you.

                       Test Results
Diabetes Status    Diabetic   Not Diabetic     Total
Diabetic              9,500            500    10,000
Not Diabetic          4,500         85,500    90,000
Total                14,000         86,000   100,000

a. What is the probability that a person diagnosed as being diabetic actually is diabetic?
P(D | Test says D) = 9,500/14,000 = .6786

b. What are the odds of the test results saying you are a diabetic (versus not a diabetic) for those who truly are diabetic?
Odds = 9,500/500 = 19

c. What are the odds of the test results saying you are a diabetic (versus not a diabetic) for those who are not diabetic?
Odds = 4,500/85,500 = .052632

d. What is the odds ratio for the test results saying you are a diabetic (versus not a diabetic) comparing diabetics to non-diabetics? Interpret this odds ratio in words.
Odds Ratio = 19/.052632 = 361. The odds of getting a test result that says "diabetic" are 361 times higher for those who are diabetic than for those who are not diabetic.
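
The table-building steps in Problem 1 can be sketched in a few lines of Python. This is only an illustration: the variable names are made up for this example and are not part of the worksheet, and the counts are rounded to whole people as in the mock table.

```python
# Build the mock 2-by-2 table from the three numbers given in the problem.
population = 100_000
prevalence = 0.10             # P(diabetic)
p_pos_given_diabetic = 0.95   # P(test says diabetic | diabetic)
p_pos_given_healthy = 0.05    # P(test says diabetic | not diabetic)

diabetic = round(population * prevalence)
not_diabetic = population - diabetic

true_pos = round(diabetic * p_pos_given_diabetic)
false_neg = diabetic - true_pos
false_pos = round(not_diabetic * p_pos_given_healthy)
true_neg = not_diabetic - false_pos

# Part a: P(diabetic | test says diabetic)
p_diabetic_given_pos = true_pos / (true_pos + false_pos)

# Parts b-d: odds of a positive test in each group, then their ratio
odds_diabetic = true_pos / false_neg
odds_not_diabetic = false_pos / true_neg
odds_ratio = odds_diabetic / odds_not_diabetic

print(round(p_diabetic_given_pos, 4))
print(round(odds_diabetic, 4), round(odds_not_diabetic, 6), round(odds_ratio, 2))
```

Working with counts rather than probabilities mirrors the hint in the problem: every cell is a whole number of people, so the conditional probability is just one cell divided by its column total.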

e. We can think of our table in the following way:

                       Test Results
Diabetes Status    Diabetic          Not Diabetic
Diabetic           True Positive     False Negative
Not Diabetic       False Positive    True Negative

The sensitivity of a test is expressed as the probability of a positive test among patients with the disease. The formula is given as:

Sensitivity = True Positive / (True Positive + False Negative)

What is the sensitivity of this test?
This is P(Pos Test | Diabetic) = 9,500/10,000 = .95. A conditional probability!

f. The specificity of a test is expressed as the probability of a negative test among patients without the disease. The formula is given as:

Specificity = True Negative / (True Negative + False Positive)

What is the specificity of this test?
This is P(Neg Test | Not Diabetic) = 85,500/90,000 = .95. A conditional probability!
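
As a quick check of parts e and f, the two formulas can be written as small Python functions. The function names are hypothetical, chosen for this sketch only; the counts come from the table in Problem 1.

```python
def sensitivity(true_pos, false_neg):
    """P(positive test | disease) = TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """P(negative test | no disease) = TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Counts from the 100,000-person table
print(sensitivity(true_pos=9_500, false_neg=500))
print(specificity(true_neg=85_500, false_pos=4_500))
```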

2. Discrete Random Variable: The Number of Games in a Baseball World Series. Based on past results found in the Information Please Almanac, there is a 0.1809 probability that a baseball World Series contest will last four games, a 0.2234 probability that it will last five games, a 0.2234 probability that it will last six games, and a 0.3723 probability that it will last seven games. The probability table is given below:

X       4      5      6      7
P(X)  .1809  .2234  .2234  .3723

a. What is the mean (expected value) number of games in a World Series?
E(x) = 4*.1809 + 5*.2234 + 6*.2234 + 7*.3723 = 5.7871

b. What is the variance of the number of games in a World Series?
Var = (4 - 5.7871)^2 * .1809 + (5 - 5.7871)^2 * .2234 + (6 - 5.7871)^2 * .2234 + (7 - 5.7871)^2 * .3723
Var = 1.2740

c. Is it unusual for a team to sweep the World Series (win all four games in a row)?
It is not unusual. We expect a sweep 18.09% of the time. However, it is the lowest probability of the possible outcomes, and there is an 81.91% chance of more than 4 games. I would expect that networks look at the probabilities associated with a sweep when bidding on the coverage of the World Series.
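
The mean and variance in Problem 2 follow directly from the definitions E(x) = sum of x*P(x) and Var = sum of (x - E(x))^2 * P(x). A minimal Python sketch, with illustrative variable names:

```python
games = [4, 5, 6, 7]
probs = [0.1809, 0.2234, 0.2234, 0.3723]

# Expected value: each outcome weighted by its probability
mean = sum(x * p for x, p in zip(games, probs))

# Variance: squared distance from the mean, weighted by probability
variance = sum((x - mean) ** 2 * p for x, p in zip(games, probs))
std_dev = variance ** 0.5

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```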

3. Consider an experiment in which 10 identical small boxes are placed side-by-side on a table. A crystal is placed, at random, inside one of the boxes. A self-professed psychic is asked to pick the box that contains the crystal. This experiment is repeated seven times, and x is the number of correct decisions in seven tries. Thus, it is a Binomial random variable.

a. If the psychic is guessing, what is the value of p, the probability of a correct decision on each trial?
P(success) = 1/10 = .1. This means a random person just guessing which of the 10 boxes holds the crystal has a 1 in 10, or 10%, chance of being right.

b. Fill in the remaining portions of this table reflecting the probability distribution for this variable using the binomial table or the binomial formula.
The Binomial Table for n = 7 and p = .10 is much easier!

x       0      1      2      3      4      5      6      7
p(x)  .4783  .3720  .1240  .0230  .0026  .0002  .0000  .0000

c. If the psychic is guessing, what is the expected number of correct decisions in seven trials, and what is the variance?
E(x) = n*p = 7 * .1 = .7
V(x) = n*p*q = 7 * .1 * .9 = .63; Std dev. = .7937

d. If the psychic is guessing, what is the probability of no correct decisions in seven trials?
Just read the answer from the table: P(x = 0) = .4783. It is pretty high - there is a high probability you won't get any right.

e. One of the psychics who took the test got all seven wrong. Suppose the criterion for having ESP is that you could guess right with p = .5. In other words, if you are a psychic you might not get it right all the time, but you should be doing much better than chance. If p = .5 instead of .10, what is the probability of guessing incorrectly on all seven trials?

x       0      1      2      3      4      5      6      7
p(x)  .0078  .0547  .1641  .2734  .2734  .1641  .0547  .0078

P(x = 0) = .0078. If a person really was a psychic, it would be rare that such a person would guess none right in 7 tries.
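
Both tables in Problem 3 can be reproduced with the binomial formula P(x = k) = C(n, k) * p^k * (1 - p)^(n - k) instead of a printed binomial table. The sketch below assumes Python 3.8 or later (for math.comb); binomial_pmf is a helper name invented for this example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 7
guessing = [round(binomial_pmf(k, n, 0.1), 4) for k in range(n + 1)]
psychic = [round(binomial_pmf(k, n, 0.5), 4) for k in range(n + 1)]

print(guessing)  # distribution when p = .1 (part b)
print(psychic)   # distribution when p = .5 (part e)
```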

4. If a single bit of data (0 or 1) is transmitted over a noisy communication channel, it has a probability p of being incorrectly transmitted. To improve the reliability of the transmission, the bit is transmitted n times, where n is odd. A decoder at the receiving end, called a majority decoder, decides that the correct message is the one carried by the majority of the received bits. This means that if there are five transmissions of a (0,1) bit, the bit used by at least three of the transmissions would be considered correct. Assume that each bit is independently subject to being corrupted with the same probability p, and that p = .1. Note, p is the probability of an error, and in terms of a binomial problem we will think of X as the number of errors in n transmissions.

a. If a company sent only one transmission, what is the probability of it being received without an error?
p = .1, which is the probability of an incorrect transmission. So q = 1 - p = .90. The probability of it being received without an error is .9. If the information is important, this probability might seem too low.

b. A company decides to use 5 transmissions as a strategy to reduce errors (n = 5). Set up the outcomes for 5 transmissions and the probabilities associated with each outcome using the binomial distribution.

x       0      1      2      3      4      5
p(x)  .5905  .3281  .0729  .0081  .0005  .0000

c. Calculate the mean, variance, and standard deviation for this problem.
E(x) = n*p = 5 * .1 = .5
V(x) = n*p*q = 5 * .1 * .9 = .45; Std dev. = .6708

d. If five messages are sent for each bit, the probability that the message is correctly received is the probability of two or fewer errors. This is not easy to see, but think it through with me. If the system sends 3, 4, or 5 wrong messages, the majority decoder strategy will accept the wrong message and make a wrong decision. But if the wrong message is sent 2, 1, or 0 times, the right message will be accepted. Look at the probability of 0, 1, or 2 errors from our binomial table above.

What is the probability that the message is correctly received in five transmissions (i.e., 2 or fewer errors)? Compare that with the answer you derived in Part a. Did sending five transmissions improve the chances of sending the message correctly?
P(x=0) + P(x=1) + P(x=2) = .5905 + .3281 + .0729 = .9914 (using unrounded probabilities; the rounded table entries sum to .9915).
This is much better than .9. The majority decoder strategy with n = 5 transmissions greatly improved the chance of a right transmission.
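
The reasoning in part d generalizes to any odd n: the majority decoder is right whenever the number of errors is at most (n - 1)/2. A minimal Python sketch under that reading, assuming Python 3.8 or later; the function name p_majority_correct is hypothetical and not from the worksheet.

```python
from math import comb


def p_majority_correct(n, p):
    """P(majority decoder picks the right bit) for odd n and error rate p.

    The decoder is correct when at most (n - 1) // 2 of the n copies are
    corrupted, so we sum the binomial probabilities of 0..(n - 1) // 2 errors.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    max_errors = (n - 1) // 2
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(max_errors + 1)
    )


for n in (1, 3, 5):
    print(n, round(p_majority_correct(n, 0.1), 4))
```

With n = 1 this reduces to the single-transmission answer from part a, and with n = 5 it gives the sum computed in part d.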

5. Discrete Random Variable Problem. A concert producer has scheduled an outdoor concert on a Saturday. If it does not rain, the producer expects to make $20,000 profit from the concert. If it does rain, the producer will be forced to cancel the concert and lose $12,000 (from fees, advertising, stadium rental and so forth). The probability of rain on Saturday is .4.

a. What is the expected profit from the concert? Hint: write out the probability distribution and solve for the expectation. The values that your random variable can take are the dollar values.

x      $20,000   -$12,000
P(x)      .6         .4

E(x) = 20,000*.6 - 12,000*.4 = $12,000 - $4,800 = $7,200

b. For a fee of $1,000 an insurance company will insure against all losses from a rained out concert. If the producer buys the insurance, what is the producer's expected profit from the concert? Note: an insurance fee is a fixed cost incurred regardless of whether it rains or not.

Treating the fee separately:

x      $20,000      $0
P(x)      .6        .4

E(x) = (20,000*.6 + 0*.4) - 1,000 = $12,000 - $1,000 = $11,000

Equivalently, with the fee included in each outcome:

x      $19,000   -$1,000
P(x)      .6        .4

E(x) = 19,000*.6 - 1,000*.4 = $11,400 - $400 = $11,000

c. Assuming the forecast is accurate, do you believe the insurance company has charged too much or too little? Hint: reformulate the problem to express outcomes in terms of the insurance company and what they expect to pay out.

x        $0      -$12,000
P(x)     .6         .4

E(x) = 0*.6 - 12,000*.4 = -$4,800, an expected payout of $4,800. Yet they only charged $1,000 - they charged too little.
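
The three expectations in Problem 5 can be compared side by side. In this Python sketch each scenario is a list of (dollar outcome, probability) pairs; expected_value and the scenario names are illustrative, and the insurer's outcomes are written net of the $1,000 fee it collects, which is a small extension of part c.

```python
def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)


p_rain = 0.4
p_dry = 1 - p_rain

no_insurance = [(20_000, p_dry), (-12_000, p_rain)]
with_insurance = [(19_000, p_dry), (-1_000, p_rain)]       # fee paid either way
insurer_net = [(1_000, p_dry), (1_000 - 12_000, p_rain)]   # fee minus payout

print(round(expected_value(no_insurance), 2))    # producer, part a
print(round(expected_value(with_insurance), 2))  # producer, part b
print(round(expected_value(insurer_net), 2))     # insurer's expected net
```

The insurer's expected net is the $1,000 fee minus the $4,800 expected payout, which is negative and so agrees with the conclusion that the fee is too low.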

6. Normal Distribution Problem. Plastic bags used for packaging produce are manufactured so that the breaking strength of the bag is normally distributed with a mean of 5 pounds per square inch and a standard deviation of 1.5 pounds per square inch. What proportion of the bags produced have a breaking strength of:

a. Less than 3.17 pounds per square inch?
Z = (3.17 - 5)/1.5 = -1.22; P(Z <= -1.22) = .5 - .3888 = .1112

b. At least 3.6 pounds per square inch?
Z = (3.6 - 5)/1.5 = -.9333; P(Z >= -.93) = .3238 + .5 = .8238

c. Between 5 and 5.5 pounds per square inch?
Z = (5.5 - 5)/1.5 = .3333; P(5 < X < 5.5) = P(0 < Z < .33) = .1293

d. Between 3.2 and 4.2 pounds per square inch?
Z = (3.2 - 5)/1.5 = -1.20; P(-1.20 <= Z <= 0) = .3849
Z = (4.2 - 5)/1.5 = -.5333; P(-.53 <= Z <= 0) = .2019
Answer = .3849 - .2019 = .1830

e. Between what two values symmetrically distributed around the mean will 95% of the breaking strengths fall?
Be careful here! With the normal distribution we need to be more precise than 2 standard deviations.
5 ± 1.96(1.5) = 2.06 to 7.94
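
The table lookups in Problem 6 can be cross-checked with the cumulative distribution function. The sketch below uses statistics.NormalDist from the Python standard library (Python 3.8 or later); the variable names are illustrative. Because it does not round z to two decimals before the lookup, parts b through d differ from the table-based answers in the third decimal place.

```python
from statistics import NormalDist

strength = NormalDist(mu=5, sigma=1.5)  # breaking strength, psi

part_a = strength.cdf(3.17)                      # P(X < 3.17)
part_b = 1 - strength.cdf(3.6)                   # P(X >= 3.6)
part_c = strength.cdf(5.5) - strength.cdf(5)     # P(5 < X < 5.5)
part_d = strength.cdf(4.2) - strength.cdf(3.2)   # P(3.2 < X < 4.2)

# Part e: the central 95% leaves 2.5% in each tail
low, high = strength.inv_cdf(0.025), strength.inv_cdf(0.975)

for label, value in [("a", part_a), ("b", part_b), ("c", part_c), ("d", part_d)]:
    print(label, round(value, 4))
print("e", round(low, 2), "to", round(high, 2))
```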

7. Normal Distribution Problem. You have been hired as a consultant to provide analysis for the Personnel Department at ZTel, a large communications company. Every applicant to ZTel must take a standardized exam, and the hire or no-hire decision depends in part on this exam. The exam was purchased from a company which says the exam scores are approximately normally distributed with:

µ = 525
σ = 55

The current interview policy has two phases. The first phase separates all applicants into one of three categories:

Automatic Interview: score of 600 or above
Maybe Interview: score of 500 to 600
Automatic Reject: score less than 500

The Maybe group is passed on to a second phase where previous experience, education, special skills, and other factors are taken into consideration in deciding whether to grant an interview. No one at the company can remember why the values of 600 and 500 were used as the standards for automatic interview or rejection, and most likely they were decided arbitrarily by a former Personnel Manager. The current Personnel Manager of ZTel needs to know the following:

a. The probability associated with the current standard of being automatically rejected - what proportion of the applicants are automatically rejected?
Z = (500 - 525)/55 = -.4545
P(Z <= -.4545) = .5 - .1753
Automatic Reject (< 500) = .5 - .1753 = .3247

b. The probability associated with the current standard of being automatically interviewed - what proportion of the applicants are automatically interviewed?
Z = (600 - 525)/55 = 1.364
P(Z >= 1.364) = .5 - .4137
Automatic Interview (>= 600) = .5 - .4137 = .0863

c. The manager notices that applicants who score between 535 and 580 tend to be good hires, having both good skills and a higher probability of accepting an offer from the company. She would like to give this group a higher priority in the second phase of evaluation. What percentage of the applicants should she expect to fall within this range?
Z = (580 - 525)/55 = 1.000; P(0 <= Z <= 1.000) = .3413
Z = (535 - 525)/55 = .182; P(0 <= Z <= .182) = .0721
P(535 <= X <= 580) = .3413 - .0721 = .2692
26.9%, or about 27 percent, are in the sweet spot.

d. The manager would prefer that the exam score for automatic interview be set at the top 15% (the 85th percentile) and that the automatic rejection be set at the bottom 20% (the 20th percentile). What are the exam values in this distribution associated with these probabilities (in this case, round to whole numbers)?

For the top 15% automatically interviewed, the cutoff is the 85th percentile, z = 1.04:
1.04 = (x - 525)/55, so x = (1.04*55) + 525 = 582.2, or about 582

For the bottom 20%, the cutoff is the 20th percentile, z = -.8416:
-.8416 = (x - 525)/55, so x = (-.8416*55) + 525 = 478.71, or about 479

Summarize your results as a recommendation to your client.
The old approach used thresholds that were arbitrary. With the new approach we can identify the percentage of applicants in the good-hire range, as well as defend the automatic interview and automatic reject cutoffs in terms of percentiles of the distribution.
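
Part d runs the normal calculation in reverse: instead of turning a score into a probability, it turns a percentile into a score via x = µ + zσ. A hedged Python sketch using statistics.NormalDist (Python 3.8 or later) is shown below; the variable names are invented for this example, and the exact percentile is used rather than the rounded z = 1.04, so the unrounded cutoff differs slightly from 582.2.

```python
from statistics import NormalDist

exam = NormalDist(mu=525, sigma=55)

# Proposed cutoffs: top 15% interviewed, bottom 20% rejected
interview_cutoff = exam.inv_cdf(0.85)
reject_cutoff = exam.inv_cdf(0.20)

# Proportions under the current 500 / 600 policy, for comparison
current_reject = exam.cdf(500)
current_interview = 1 - exam.cdf(600)
sweet_spot = exam.cdf(580) - exam.cdf(535)

print(round(interview_cutoff), round(reject_cutoff))
print(round(current_reject, 4), round(current_interview, 4), round(sweet_spot, 4))
```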