www.stat.ubc.ca/~bouchard/courses/stat302-sp2017-18/ Intro to Probability Instructor: Alexandre Bouchard
Plan for today:
- Bayesian inference 101
- Decision trees for non-equally-weighted problems
- Bayes rule
- Law of total probability
NSERC USRA: paid internship program to do research in a research group. Deadline: Jan 18. I will hire one student for a project on probabilistic inference. Looking for someone with programming experience, esp. on the JVM, plus a strong interest in probability/machine learning. If interested, drop me an email with CV + GitHub. Many other projects are offered by other faculty members.
Logistics. What's new/recent on the website: WebWork: the first set is released, due on the 18th. Some problems are still being fixed.
Review
Def. 7 (review) Conditional probability. Conditioning on an observation E, for a query A:
P(A | E) = P(A ∩ E) / P(E)
Prop. 5 Properties. Check these:
- 0 <= P(A | E) <= 1
- P(S | E) = 1
- If A1, A2, ... are disjoint: P(A1 ∪ A2 ∪ ... | E) = P(A1 | E) + P(A2 | E) + ...
In other words: P(. | E) is a probability.
Bonus: a more intuitive view of independent events. Recall: events A and B are independent if P(A ∩ B) = P(A) P(B). Equivalent definition (when P(B) > 0): events A and B are independent if P(A | B) = P(A).
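The equivalence can be checked by brute force on a small sample space. A sketch; the two-dice events below are my own example, not from the slides:

```python
from fractions import Fraction

# Sample space: all 36 equally likely outcomes of rolling two dice
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """Probability of an event (a set of outcomes) under equal weighting."""
    return Fraction(len(event), len(outcomes))

A = {(i, j) for (i, j) in outcomes if i % 2 == 0}   # first die is even
B = {(i, j) for (i, j) in outcomes if i + j == 7}   # sum is 7

# Both definitions of independence agree on this pair of events:
print(prob(A & B) == prob(A) * prob(B))   # True
print(prob(A & B) / prob(B) == prob(A))   # P(A|B) = P(A): True
```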
Decision trees and conditional probabilities
Ex. 23 Two bags (a bag of red shapes and a bag of blue shapes). 1) Flip a coin to pick one of the two bags. 2) Pick one shape from the selected bag. Question: probability to pick the star? A. 1/6 B. 1/5 C. 1/4 D. 1/2
Decision tree: not-equally-weighted outcomes. [Tree: the root S branches to the blue bag B and the red bag R with probability 1/2 each, e.g. P(R | S) = P(R) = 1/2; B then branches to its 2 shapes with probability 1/2 each, and R branches to its 3 shapes with probability 1/3 each.]
The probability of a leaf is the product of the branch probabilities along its path:
- each blue-bag leaf: (1/2) * (1/2) = 1/4
- each red-bag leaf: (1/2) * (1/3) = 1/6
In particular, for the leaf L corresponding to the star (in the red bag): P(L) = P(R) P(L | R) = (1/2) * (1/3) = 1/6, so the answer is A.
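The leaf probabilities of the two-bag tree can be checked with a short enumeration. A sketch; the shape names are made up, only the star and the branch probabilities 1/2 and 1/3 come from the slides:

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)

# Leaf probability = product of branch probabilities along the path
# (chain rule): P(bag, shape) = P(bag) * P(shape | bag).
leaves = {}
for shape in ["blue-1", "blue-2"]:            # two shapes in the blue bag
    leaves[shape] = half * half               # (1/2) * (1/2) = 1/4
for shape in ["star", "red-2", "red-3"]:      # three shapes in the red bag
    leaves[shape] = half * third              # (1/2) * (1/3) = 1/6

print(leaves["star"])        # 1/6 -> answer A
print(sum(leaves.values()))  # leaf probabilities sum to 1
```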
Prop. 6 Chain rule. For any events A, B with P(A) > 0:
P(A ∩ B) = P(A) P(B | A)
Useful trick: it can be done in the other order (when P(B) > 0):
P(A ∩ B) = P(B) P(A | B)
Medical tests and the law of total probability
Ex. 24 HIV tests: background. Ways of detecting HIV infections: Direct: try to match conserved regions of HIV's genome. Indirect: try to detect the immune system's reaction to the virus. Modern tests often use combinations of these methods and are highly accurate. Today: let's assume a hypothetical, less-than-perfect test. NOTE: the probabilities we will compute today are NOT representative of modern HIV test accuracies, but the ideas apply to rare/harder-to-diagnose diseases.
Ex. 24 HIV testing An HIV test has the following two modes of failure: When a patient has the disease, the test will still be negative 2% of the time (false negative) When a patient does not have the disease, the test will turn positive 1% of the time (false positive)
Ex. 24 HIV prevalence. [Figure omitted.] Source: Wikipedia, http://tinyurl.com/n7kakso
Ex. 24 HIV testing. An HIV test has the following two modes of failure: When a patient has the disease (event H), the test will still be negative (T-) 2% of the time (false negative). When a patient does not have the disease (H^c), the test will turn positive (T+) 1% of the time (false positive). Question: if 0.5% of the population has HIV, what % of the tests will turn positive? A. 2.0% B. 2.198% C. 1.485% D. 0.02%
Ex. 24 Translation of the problem into probabilities and conditional probabilities. When a patient has the disease (event H), the test will still be negative (T-) 2% of the time: P(T- | H) = 0.02. When a patient does not have the disease (H^c), the test will turn positive (T+) 1% of the time: P(T+ | H^c) = 0.01. 0.5% of the population has HIV: P(H) = 0.005. We want: P(T+) = ?
Decision tree (events Ei and their probabilities P(Ei)):
- S -> H (0.005), H -> T+ (0.98): P(H ∩ T+) = 0.005 * 0.98 = 0.0049
- S -> H (0.005), H -> T- (0.02): P(H ∩ T-) = 0.005 * 0.02 = 0.0001
- S -> H^c (0.995), H^c -> T- (0.99): P(H^c ∩ T-) = 0.995 * 0.99 = 0.98505
- S -> H^c (0.995), H^c -> T+ (0.01): P(H^c ∩ T+) = 0.995 * 0.01 = 0.00995
P(T+) = P(H ∩ T+) + P(H^c ∩ T+) = 0.01485. Why?
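The tree computation can be reproduced in a few lines. A sketch using exact fractions; the variable names are mine:

```python
from fractions import Fraction

p_H       = Fraction(5, 1000)   # prevalence: P(H) = 0.005
p_Tneg_H  = Fraction(2, 100)    # false negative rate: P(T-|H) = 0.02
p_Tpos_Hc = Fraction(1, 100)    # false positive rate: P(T+|Hc) = 0.01

# Chain rule on each branch of the tree leading to T+:
p_H_and_Tpos  = p_H * (1 - p_Tneg_H)    # P(H ∩ T+)  = 0.005 * 0.98 = 0.0049
p_Hc_and_Tpos = (1 - p_H) * p_Tpos_Hc   # P(Hc ∩ T+) = 0.995 * 0.01 = 0.00995

# Law of total probability:
p_Tpos = p_H_and_Tpos + p_Hc_and_Tpos
print(float(p_Tpos))   # 0.01485 -> answer C
```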
Justification for: P(T+) = P(H ∩ T+) + P(H^c ∩ T+). First, recall that...
Def. 4 Partition of an event. The events Ei form a partition of the event F if:
1. The union of the Ei's is equal to F: ∪i Ei = F
2. The Ei's are disjoint: if i ≠ j, then Ei ∩ Ej = ∅
Why is this useful? As a consequence of 1, 2 and the axioms of probability, P(F) = P(E1) + P(E2) + ...
Second, check that E1 = H ∩ T+ and E2 = H^c ∩ T+ form a partition of F = T+:
1. E1 ∪ E2 = F
2. E1 ∩ E2 = ∅
Prop. 7 Law of total probability. If A is any event of interest (for example, A = T+ in the previous example), then we can decompose its probability as:
P(A) = P(H ∩ A) + P(H^c ∩ A)
This is true for any event H. Why?
P(A) = P(S ∩ A) = P((H ∪ H^c) ∩ A) = P((H ∩ A) ∪ (H^c ∩ A)) = P(H ∩ A) + P(H^c ∩ A)
Ex. 24 HIV testing, answer: P(T+) = P(H ∩ T+) + P(H^c ∩ T+) = 0.0049 + 0.00995 = 0.01485, i.e. 1.485% of the tests will turn positive. Answer: C.
Prop. 8 Bayes rule. Notation: H, H^c: a partition of the space into hypotheses (example: whether the patient has HIV or not). E: an observation (example: test positive, E = T+). We want: P(H | E). We have: P(E | H), P(E | H^c), P(H).
P(H | E) = P(E, H) / P(E)   [definition of conditional probability]
         = P(H) P(E | H) / (P(E, H) + P(E, H^c))   [chain rule on top; law of total probability on bottom]
         = P(H) P(E | H) / (P(H) P(E | H) + P(H^c) P(E | H^c))   [chain rule]
Ex. 26 Applying Bayes rule. An HIV test has the following two modes of failure: When a patient has the disease, the test will still be negative 2% of the time (false negative). When a patient does not have the disease, the test will turn positive 1% of the time (false positive). Suppose now 25% of the population has HIV. Question: given that the test is positive, what is the updated (posterior) probability that the patient is indeed affected by HIV?
P(H | E) = P(H) P(E | H) / (P(H) P(E | H) + P(H^c) P(E | H^c))
(H, H^c: a partition of the space into hypotheses; E: an observation.)
A. ~34% B. ~55% C. ~93% D. ~97%
Ex. 26 Applying Bayes rule. An HIV test has the following two modes of failure: When a patient has the disease, the test will still be negative 2% of the time (false negative). When a patient does not have the disease, the test will turn positive 1% of the time (false positive). Questions: (a) If 25% of the population has HIV, what % of the tests will turn positive? (b) Given that the test is positive, what is the updated (posterior) probability that the patient is indeed affected by HIV?
P(H | E) = P(H) P(E | H) / (P(H) P(E | H) + P(H^c) P(E | H^c))
(H, H^c: a partition of the space into hypotheses; E: an observation.)
A. ~34% B. ~55% C. ~93% D. ~97%
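Both parts of Ex. 26 can be computed directly from the formula above. A minimal sketch; the helper name `posterior` is mine:

```python
from fractions import Fraction

def posterior(prior, p_E_given_H, p_E_given_Hc):
    """Bayes rule: P(H|E) = P(H)P(E|H) / (P(H)P(E|H) + P(Hc)P(E|Hc))."""
    num = prior * p_E_given_H
    return num / (num + (1 - prior) * p_E_given_Hc)

prior     = Fraction(25, 100)        # now P(H) = 0.25
p_Tpos_H  = 1 - Fraction(2, 100)     # P(T+|H)  = 0.98
p_Tpos_Hc = Fraction(1, 100)         # P(T+|Hc) = 0.01

# (a) P(T+) by the law of total probability:
p_Tpos = prior * p_Tpos_H + (1 - prior) * p_Tpos_Hc
print(float(p_Tpos))                                   # 0.2525

# (b) posterior probability of HIV given a positive test:
print(float(posterior(prior, p_Tpos_H, p_Tpos_Hc)))    # ~0.97 -> answer D
```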
Law of total probability: extended version? If A is any event of interest (for example, A = T+ in the previous example), we can decompose it as P(A) = P(H ∩ A) + P(H^c ∩ A); this is true for any H, H^c. Which feature of H, H^c did we use? They form a partition of S into 2 blocks. The law of total probability still works with partitions of more than 2 blocks.
Prop. 7 (v2) Law of total probability. If A is any event of interest, and H1, ..., Hn is any partition of S, then we can decompose P(A) as:
P(A) = P(H1 ∩ A) + ... + P(Hn ∩ A) = Σ_{i=1}^{n} P(Hi ∩ A)
[Figure: S partitioned into blocks H1, H2, H3, H4, with A overlapping several blocks.]
Prop. 8 (v2) Bayes rule (extended version). Notation: H1, ..., Hn: a partition of the space into hypotheses. E: an observation. We want: P(Hi | E). We have: P(E | Hi), P(Hi).
P(Hi | E) = P(Hi) P(E | Hi) / (P(H1) P(E | H1) + ... + P(Hn) P(E | Hn))
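The extended rule is the same computation with n joint probabilities instead of 2. A sketch for a general partition; the 3-hypothesis numbers below are made up for illustration, not from the slides:

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Extended Bayes rule: posteriors P(Hi|E) for a partition H1..Hn.
    priors[i] = P(Hi), likelihoods[i] = P(E|Hi)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Hi) P(E|Hi)
    p_E = sum(joint)                  # law of total probability: P(E)
    return [j / p_E for j in joint]

# Hypothetical example: three bags chosen with unequal probabilities;
# E = drawing a star from the chosen bag.
priors      = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
likelihoods = [Fraction(1, 3), Fraction(1, 2), Fraction(0, 1)]

post = bayes(priors, likelihoods)
print([str(p) for p in post])   # posteriors over the three hypotheses
print(sum(post) == 1)           # posteriors sum to 1: True
```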
Prop. 6 (v2) Chain rule. For any events A, B with P(A) > 0:
P(A ∩ B) = P(A) P(B | A)
Extended version: for any events A, B, C with P(A ∩ B) > 0:
P(A ∩ B ∩ C) = P(A) P(B | A) P(C | A ∩ B)
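The three-event chain rule can be verified numerically on a small sample space. A sketch; the coin/die events below are illustrative, not from the slides:

```python
from fractions import Fraction

# Sample space: two coin flips followed by a die roll, 24 equally
# likely outcomes.
outcomes = [(c1, c2, d) for c1 in "HT" for c2 in "HT" for d in range(1, 7)]

def prob(event, given=None):
    """P(event | given) by counting; `given` defaults to the whole space."""
    space = given if given is not None else set(outcomes)
    return Fraction(len(event & space), len(space))

A = {o for o in outcomes if o[0] == "H"}   # first flip heads
B = {o for o in outcomes if o[1] == "H"}   # second flip heads
C = {o for o in outcomes if o[2] <= 2}     # die shows 1 or 2

# P(A ∩ B ∩ C) = P(A) P(B|A) P(C|A ∩ B)
lhs = prob(A & B & C)
rhs = prob(A) * prob(B, given=A) * prob(C, given=A & B)
print(lhs == rhs)   # True
```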