Chapter 15: Continuation of probability rules


Example: HIV-infected women attending an infectious disease clinic in Bangkok were screened for high-risk HPV and received a Pap test; those with abnormal cervical cytologies were referred for diagnosis and treatment. [1] The table below shows some of the data.

Table 1: Data extracted from Table 2 of "Screening HIV-Infected Women for Cervical Cancer in Thailand: Findings From a Demonstration Project," Sirivongrangson et al.

                      Abnormal cytology
High-risk HPV         yes      no    Total
negative                6     123      129
positive               37      44       81
Total                  43     167      210

This data set will be used to develop some more probability rules beyond those presented in Chapter 14. The 210 women cross-classified in the table will be treated as a population.

1. Let A denote the event that a person selected at random from the population tests positive for high-risk HPV. One individual is to be drawn at random from among the 210, and each is equally likely to be selected. Since 81 of the 210 tested positive for high-risk HPV, $P(A) = 81/210 = .386$.

2. Let B denote the event that an individual selected at random from the population has an abnormal cytology. The probability of B is the number found to have abnormal cytology divided by the number screened; hence, $P(B) = 43/210 = .205$.

3. The probability that an individual tests positive for high-risk HPV and was found to have abnormal cytology is $P(A \text{ and } B) = 37/210 = .176$.

The probability that a randomly selected individual tests positive for high-risk HPV or is found to have abnormal cytology is the proportion of the 210 that fall in one of three groups:

1. Individuals that tested positive for high-risk HPV and were not found to have abnormal cytology (there are 44 individuals in this group),

2. Individuals that tested negative for high-risk HPV and were found to have abnormal cytology (there are 6 individuals in this group), and

3. Individuals that tested positive for high-risk HPV and were found to have abnormal cytology (there are 37 individuals in this group).

Thus, the probability that an individual tests positive for high-risk HPV or is found to have abnormal cytology is

$$P(A \text{ or } B) = \frac{44 + 6 + 37}{210} = .414.$$

Another method of computing $P(A \text{ or } B)$ uses the general addition rule:

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B).$$

Why subtract $P(A \text{ and } B)$? Because the individuals in this group were counted once in computing $P(A)$ and again in computing $P(B)$; they've been counted twice. Subtracting $P(A \text{ and } B)$ corrects for the double-counting.

The first calculation avoids over-counting by decomposing the event of interest (A or B) into three mutually exclusive events. (Two mutually exclusive events have no individuals in common.)

[Venn diagram: two overlapping circles labeled "Cytology abnormal" and "HPV positive" inside a box. Region counts: 6 (abnormal cytology only), 37 (both), 44 (HPV positive only), 123 (neither).]

The Venn diagram above illustrates the situation. Note that the box represents the entire population, and that the 2 circles partition the box into 4 non-overlapping regions. The regions correspond to the sub-groups of the population.

Remark: A useful mathematical result is that any event A can be written as the union of two mutually exclusive events. Let B denote another event in the sample space. Then, $A = (A \text{ and } B) \text{ or } (A \text{ and } B^c)$. As the events $(A \text{ and } B)$ and $(A \text{ and } B^c)$ are mutually exclusive,

$$P(A) = P(A \text{ and } B) + P(A \text{ and } B^c). \qquad (1)$$

[1] Sexually Transmitted Diseases, February 2007, Volume 34, Issue 2, pp. 104-107.
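
The two calculations of P(A or B) for the HPV data (Table 1) can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative and not part of the original notes, and exact fractions are used so that the two methods can be compared without rounding error.

```python
from fractions import Fraction

# Counts from Table 1 (high-risk HPV result by abnormal cytology yes/no).
hpv_neg_abnormal = 6
hpv_neg_normal = 123
hpv_pos_abnormal = 37
hpv_pos_normal = 44
n = hpv_neg_abnormal + hpv_neg_normal + hpv_pos_abnormal + hpv_pos_normal  # 210

# A = tests positive for high-risk HPV, B = abnormal cytology.
p_a = Fraction(hpv_pos_abnormal + hpv_pos_normal, n)
p_b = Fraction(hpv_pos_abnormal + hpv_neg_abnormal, n)
p_a_and_b = Fraction(hpv_pos_abnormal, n)

# Method 1: add the three mutually exclusive groups.
p_or_direct = Fraction(hpv_pos_normal + hpv_neg_abnormal + hpv_pos_abnormal, n)

# Method 2: general addition rule.
p_or_rule = p_a + p_b - p_a_and_b

# Equation (1): P(A) = P(A and B) + P(A and B^c).
p_a_and_not_b = Fraction(hpv_pos_normal, n)
decomposition_holds = (p_a == p_a_and_b + p_a_and_not_b)

print(f"P(A) = {float(p_a):.3f}, P(B) = {float(p_b):.3f}, "
      f"P(A and B) = {float(p_a_and_b):.3f}")
print(f"P(A or B): direct = {float(p_or_direct):.3f}, "
      f"addition rule = {float(p_or_rule):.3f}")
print("Equation (1) holds:", decomposition_holds)
```

With the rounded values, the addition rule gives .386 + .205 - .176 = .415, which differs from .414 only because each term was rounded before adding.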

Re-arranging equation (1) provides another useful result:

$$P(A \text{ and } B) = P(A) - P(A \text{ and } B^c).$$

Salk polio vaccine trials of 1954

One of the largest randomized experiments ever (in terms of numbers of subjects) used 401,983 volunteer children. Of these, 200,745 received the Salk vaccine and 201,229 received a placebo. Of those that received the vaccine, 33 contracted paralytic polio [2] during the 1954 polio season, and of those that received the placebo, 115 contracted paralytic polio. Let V denote the event that a child received the vaccine and I denote the event that a child contracted paralytic polio. The data are summarized in Table 2.

Table 2: Experimental results of the 1954 Salk vaccine trials.

                             Vaccinated    Placebo      Total
Number of children              200,745    201,229    401,983
Number of paralytic cases            33        115        148

For this population of 401,983 volunteer children, find the probabilities of the following events using the rules above. [3]

1. A child selected at random contracted polio.

2. A child selected at random contracted polio and received the vaccine.

3. A child selected at random contracted polio and did not receive the vaccine.

The probabilities are

1. $P(I) = 148/401{,}983 = .000368$ is the probability that a child selected at random contracted paralytic polio.

2. $P(V \text{ and } I) = 33/401{,}983 = .000082$ is the probability that a child selected at random contracted paralytic polio and received the vaccine.

3. $P(V^c \text{ and } I)$ is the probability that a child selected at random contracted paralytic polio and did not receive the vaccine. We can compute it according to

$$P(V^c \text{ and } I) = P(I) - P(V \text{ and } I) = .000368 - .000082 = .000286.$$

[2] Paralytic polio does not include diagnosed polio cases without paralysis; there were 83 total diagnosed cases.

[3] The population is sufficiently large that some of the probabilities below may (in principle) be treated as probabilities applying to the larger population of all U.S. resident children during the 1950s.
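
The three Salk trial probabilities can be reproduced with a short Python sketch. The variable names are illustrative, and the total of 401,983 children is taken as stated in the text. The third probability is computed by subtraction, as in the notes, and then compared with the direct count of placebo cases.

```python
# Counts from Table 2 (1954 Salk vaccine trials), as stated in the text.
total_children = 401_983
cases_vaccinated = 33
cases_placebo = 115
cases_total = cases_vaccinated + cases_placebo  # 148

# V = received the vaccine, I = contracted paralytic polio.
p_i = cases_total / total_children
p_v_and_i = cases_vaccinated / total_children

# Re-arranged equation (1): P(V^c and I) = P(I) - P(V and I).
p_vc_and_i = p_i - p_v_and_i

# Direct count of unvaccinated cases, for comparison.
p_vc_and_i_direct = cases_placebo / total_children

print(f"P(I)         = {p_i:.6f}")
print(f"P(V and I)   = {p_v_and_i:.6f}")
print(f"P(V^c and I) = {p_vc_and_i:.6f} (by subtraction)")
print(f"P(V^c and I) = {p_vc_and_i_direct:.6f} (direct count)")
```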

How much more likely is it to select at random an unvaccinated child that contracted paralytic polio than it is to select at random a vaccinated child that contracted paralytic polio? Was this vaccine a great success? [4]

$$\frac{.000286}{.000082} = 3.49 \text{ times.}$$

Conditional probability

An excellent approach to assessing the effectiveness of vaccines is through conditional probability. We discussed conditional distributions in Chapter 3 (a conditional distribution was obtained from a two-way table by considering the distribution of a variable when the data were limited to a single level of the second variable). The approach is extended now to probabilities.

Suppose that we examine the probability of contracting paralytic polio given that the child received the Salk vaccine. The probability is expressed as $P(I \mid V)$, and it is the proportion of vaccinated children that contracted the disease:

$$P(I \mid V) = \frac{33}{200{,}745} = .000164. \qquad (2)$$

In comparison, the probability of contracting paralytic polio given that the child received the placebo is

$$P(I \mid V^c) = \frac{115}{201{,}229} = .000571,$$

and the chance that a child contracts paralytic polio is (approximately) $.000571/.000164 = 3.48$ times greater if the child is not vaccinated than if the child is vaccinated. Note that even if the vaccine caused some of the polio cases, it's better to be vaccinated than not.

The conditional probability of A given B can be computed from $P(B)$ and $P(A \cap B)$. The principle behind the calculation is that if A is to happen given that B has happened, then $A \cap B$ must happen. $P(A \cap B)$ is not the conditional probability, since $P(A \cap B)$ is, in the Salk vaccine example, the fraction of children that contract paralytic polio and have been vaccinated out of all children, including those that were not vaccinated. These additional children should not be counted, since we are interested only in the proportion of vaccinated children that contract polio out of all vaccinated children.
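
The difference between the joint probability and the conditional probability comes down to the choice of denominator, which a short Python sketch makes concrete. The variable names, including `risk_ratio`, are illustrative and do not come from the notes; the counts are those of the Salk trial.

```python
# Counts from the 1954 Salk vaccine trial, as stated in the text.
total_children = 401_983
vaccinated = 200_745
placebo = 201_229
cases_vaccinated = 33
cases_placebo = 115

# Joint probability: vaccinated cases out of ALL children.
p_i_and_v = cases_vaccinated / total_children

# Conditional probabilities: cases out of the children in each group only.
p_i_given_v = cases_vaccinated / vaccinated
p_i_given_vc = cases_placebo / placebo

# How many times greater the chance of paralytic polio is without the vaccine.
risk_ratio = p_i_given_vc / p_i_given_v

print(f"P(I and V) = {p_i_and_v:.6f}")
print(f"P(I | V)   = {p_i_given_v:.6f}")
print(f"P(I | V^c) = {p_i_given_vc:.6f}")
print(f"ratio      = {risk_ratio:.2f}")
```

The ratio computed from unrounded probabilities agrees with the 3.48 obtained in the text from the rounded values.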
To convert $P(A \cap B)$ to a conditional probability, divide it by the probability of the conditioning event, $P(B)$. Dividing by $P(B)$ scales $P(A \cap B)$ to reflect the likelihood of the event B. The formula for the conditional probability is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad (3)$$

provided $P(B) > 0$. If $P(B) = 0$, then it is impossible for B to occur, and the conditional probability of A given that B has occurred is meaningless.

For example, let's compute $P(I \mid V)$ again using the conditional probability formula. First, $P(V) = 200745/401983 = .499386$; then,

$$P(I \mid V) = \frac{P(I \cap V)}{P(V)} = \frac{.000082}{.499386} = .000164,$$

which is the same as the result above (equation 2).

Examine the last calculation again. Recall that $P(I \cap V) = 33/401983$ and $P(V) = 200745/401983$. Thus,

$$P(I \mid V) = \frac{P(I \cap V)}{P(V)} = \frac{33}{401983} \times \frac{401983}{200745} = \frac{33}{200745},$$

which is the proportion of the vaccinated children that contracted paralytic polio. Dividing by the probability of the conditioning event (V in this example) in formula (3) changes the denominator from the number of all children to the number in the conditioning set (vaccinated children).

The conditional probability formula (equation 3) can be rearranged to provide a formula for computing the probability of $A \cap B$ from the probability of B and the conditional probability of A given B:

$$P(A \cap B) = P(A \mid B)P(B). \qquad (4)$$

Also, $P(A \cap B) = P(B \mid A)P(A)$.

Independence: A mathematical definition of independence can be formulated at this point. Events A and B are independent if $P(A \mid B) = P(A)$. In other words, conditioning on B, or knowing that B has occurred, does not alter the probability of A; no information is gained about the likelihood of A by knowing that B has occurred. For example, if A is the event that a randomly selected individual has black hair, then knowing that the individual is female does not change the probability of A. If A is the event that the randomly selected individual is at least 6 feet tall, then knowing that the individual is male does alter the probability of A. [5]

If events A and B are independent, then the following mathematical statements are true:

$$P(A \mid B) = P(A), \qquad P(B \mid A) = P(B), \qquad P(A \cap B) = P(A)P(B).$$

Suppose two events are disjoint (they cannot happen simultaneously). For instance, suppose that A is the event that a randomly selected individual is male and B is the event that the individual is able to give birth. Are A and B independent? [6]

Example: Consider the Titanic data. Is there an association between the ticket class of the passenger and survivorship? Table 3 is a contingency table showing the cross-classification of passengers by survival status and ticket class.

Table 3: Titanic data from Table 2, p. 23, Intro Stats.

Class       First   Second   Third   Crew   Total
Survived      203      118     178    212     711
Died          122      167     528    673    1490
Total         325      285     706    885    2201

The conditional distributions of survivorship (conditioning on ticket class) give the relative frequency of survivorship given ticket class. For example, given that the ticket class was first-class, the relative frequency of survival is $203/325 = .625$, and the relative frequency of death is $1 - .625 = 122/325 = .375$.

[4] "Upon hearing the results of the study people openly wept with relief," p. 203, Polio: An American Story, D.M. Oshinsky, 2005.

[5] Six feet is approximately the 80th percentile of 20-year-old male heights, whereas 6 feet is approximately the 97th percentile of 20-year-old female heights. Suppose we have 200 randomly selected 20-year-old individuals, about half of them male. About 23 of them (20 males and 3 females), or $100 \times 23/200 = 11.5\%$, will be at least 6 feet tall, and the conditional probabilities of being at least 6 feet tall are .2 for males and .03 for females. The unconditional probability of being at least 6 feet tall is $.115/.03 = 3.83$ times larger than the conditional probability given that the individual is female.

[6] No, since $P(A) = .5 \neq 0 = P(A \mid B)$, where $P(A \mid B)$ is the probability that an individual is male given that they are able to give birth.
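
Formula (3), its rearrangement (4), and the definition of independence can be tried out in Python. The sketch below uses a hypothetical helper named `conditional` and illustrative variable names. It applies the formula to the Salk counts and then compares a conditional and an unconditional survival probability from Table 3. Testing exact equality on a finite table is a strict criterion, so the comparison only illustrates the definition.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_b):
    """Return P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(B) = 0: the conditional probability is undefined")
    return p_a_and_b / p_b


# Salk trial: V = vaccinated, I = contracted paralytic polio.
total_children = 401_983
p_v = Fraction(200_745, total_children)
p_i_and_v = Fraction(33, total_children)
p_i_given_v = conditional(p_i_and_v, p_v)  # the total cancels, leaving 33/200745

# Equation (4): P(I and V) = P(I | V) P(V).
multiplication_rule_holds = (p_i_given_v * p_v == p_i_and_v)

# Titanic (Table 3, crew included): is survival independent of holding
# a first-class ticket?
n = 2201
p_survived = Fraction(711, n)
p_first = Fraction(325, n)
p_survived_and_first = Fraction(203, n)
p_survived_given_first = conditional(p_survived_and_first, p_first)
independent = (p_survived_given_first == p_survived)

print(f"P(I | V) = {float(p_i_given_v):.6f}")
print("Equation (4) holds:", multiplication_rule_holds)
print(f"P(survived | first) = {float(p_survived_given_first):.3f}")
print(f"P(survived)         = {float(p_survived):.3f}")
print("Independent by the definition:", independent)
```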

The event that a passenger holds a first-class ticket is denoted by A, and S denotes the event that a passenger survives. The conditional probability of survival is $P(S \mid A) = .625$, and the conditional probability of death is $P(S^c \mid A) = 1 - P(S \mid A) = .375$. Table 4 shows the conditional probabilities of survival and death by ticket class (including crew).

Table 4: Conditional probabilities of survival and death for Titanic passengers.

Class       First   Second   Third   Crew
Survived     .625     .414    .252   .240
Died         .375     .586    .748   .760

Let's ignore the crew and concentrate on passengers (there are 1316 passengers tabulated in Table 3). If a passenger is randomly selected, then the probability that the passenger is first-class is $P(A) = 325/1316 = .247$; the probability that the passenger is second-class is $P(B) = 285/1316 = .217$; and the probability that the passenger is third-class is $P(C) = 1 - .247 - .217 = .536$. The probability that a randomly selected passenger survives is

$$P(S) = \frac{203 + 118 + 178}{1316} = .379,$$

and the probability that a randomly selected passenger does not survive is $P(S^c) = 1 - .379 = .621$.
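
The conditional probabilities by ticket class and the passenger-only probabilities can be recomputed from the Table 3 counts. The following Python sketch uses illustrative names. The final cross-check, which weights each class's conditional survival probability by the class probability, is not in the notes; it follows from equation (4) applied to each class.

```python
# Counts from Table 3 (Titanic data).
survived = {"First": 203, "Second": 118, "Third": 178, "Crew": 212}
died = {"First": 122, "Second": 167, "Third": 528, "Crew": 673}

# Conditional probability of survival given each class (crew included).
cond_survival = {}
for ticket_class in survived:
    class_total = survived[ticket_class] + died[ticket_class]
    cond_survival[ticket_class] = survived[ticket_class] / class_total

# Passengers only: drop the crew.
passenger_classes = ["First", "Second", "Third"]
n_passengers = sum(survived[c] + died[c] for c in passenger_classes)  # 1316
p_class = {c: (survived[c] + died[c]) / n_passengers for c in passenger_classes}
p_survive = sum(survived[c] for c in passenger_classes) / n_passengers

# Cross-check: P(S) as the sum of P(S | class) * P(class) over passenger classes.
p_survive_weighted = sum(cond_survival[c] * p_class[c] for c in passenger_classes)

for ticket_class, p in cond_survival.items():
    print(f"P(survived | {ticket_class}) = {p:.3f}, P(died | {ticket_class}) = {1 - p:.3f}")
for c in passenger_classes:
    print(f"P({c}) = {p_class[c]:.3f}")
print(f"P(S) = {p_survive:.3f}, P(S^c) = {1 - p_survive:.3f}")
```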