LSP 121 Math and Tech Literacy II
Stats Wrapup, Intro to Probability
Greg Brewster, DePaul University

Topics

Statistics Wrapup
- Binning Variables
- Statistics that Deceive

Intro to Probability
- Events
- Sample spaces
- Empirical probabilities
- Theoretical probabilities
- Subjective probabilities

Binning

Grouping values (and their frequencies) into bins so that meaningful data analysis is possible.
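As a rough illustration of the binning idea (this is plain Python, not PASW's Visual Binning), the sketch below groups a handful of exam scores into bins of width 10. The scores, the bin width, and the name `bin_scores` are all made up for the example.

```python
from collections import Counter

BIN_WIDTH = 10  # assumed bin width, chosen only for illustration


def bin_scores(scores, bin_width=BIN_WIDTH):
    """Count how many scores fall into each bin of the given width."""
    counts = Counter()
    for score in scores:
        # Floor each score to the lower edge of its bin, e.g. 87 -> 80.
        lower_edge = (score // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical exam scores: every score is different, so a histogram
# of the raw values would show a frequency of 1 for each of them.
scores = [52, 58, 61, 67, 70, 73, 74, 78, 81, 85, 88, 93]
binned = bin_scores(scores)

# Text histogram of the binned frequencies.
for lower_edge, count in binned.items():
    print(f"{lower_edge}-{lower_edge + BIN_WIDTH - 1}: {'*' * count}")
```

With the raw scores every bar has height 1. After binning, the bars differ in height, so the shape of the distribution becomes visible.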
Why is it valuable?
- If 100 students took an exam and everyone received a different score, it would be very difficult to analyze the results without binning.
- A histogram of the frequencies would be flat.

How does PASW support binning?
- PASW provides a Visual Binning transform.
- It allows you to create a new variable that groups values from an old variable.
- The old variable must be of type Scale.
- Then create a histogram with the binned variable.

Statistics Can Deceive

People who present statistical results often have an agenda. They want their results to prove a certain point. For this reason, they may present them in a way that favors their outcome.
Generalizing Sample Results

Statistics are often taken from a small sample and then generalized to a larger group.
- Example: someone takes a survey of 100 DePaul students, then assumes that the results reflect the opinions of all 23,000 DePaul students.

In order to generalize sample results to a larger group, you should:
- Make sure that the sample group is chosen randomly from the larger group in an unbiased way.
- Calculate a margin of error that specifies how much the larger group results might differ from the smaller sample results.
- Keep in mind that the larger the sample group, the smaller the margin of error.

Example: "Our survey shows that 57% of customers prefer Brand A, while only 43% prefer Brand B." (Margin of error: +/- 8%)
- Do these results prove anything? Not really.
- It could be that 57% - 8% = 49% prefer Brand A, while 43% + 8% = 51% prefer Brand B in the larger population.
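The Brand A / Brand B check can be written out as a small sketch. The helper names `interval` and `intervals_overlap` are invented for this illustration. It only tests whether the two ranges implied by the margin of error overlap, which is a simplified reading of the survey, not a formal significance test.

```python
def interval(percent, margin):
    """Return the (low, high) range implied by a result and its margin of error."""
    return (percent - margin, percent + margin)


def intervals_overlap(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


brand_a = interval(57, 8)  # 57% +/- 8%
brand_b = interval(43, 8)  # 43% +/- 8%

print("Brand A range:", brand_a)
print("Brand B range:", brand_b)
print("Ranges overlap:", intervals_overlap(brand_a, brand_b))
```

Because the ranges overlap, the survey alone cannot tell us which brand the larger population prefers.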
Some Typical Problems

- Descriptive statistics
  - Biased choice of sample
  - Presentation of only the results you want
  - Grouping (e.g. Simpson's Paradox)
- Generalizing sample results to a larger group
  - Non-random sampling
  - Sample size
  - Margin of error may not be shown
- Incorrect choice of statistic

Simpson's Paradox

- From a data sampling viewpoint, the larger the data set, the better.
- Simpson's Paradox demonstrates that a great deal of care has to be taken when combining smaller data sets into a larger one.
- Sometimes the conclusions from the larger data set are the opposite of the conclusions from the smaller data sets.

Example: Simpson's Paradox

Baseball batting statistics for two players:

            First Half   Second Half   Total Season
Player A    .400         .250          .264
Player B    .350         .200          .336

How could Player A beat Player B in both halves individually, but then have a lower total season batting average?

If someone wants to deceive you, they could just publish the First and Second Half results and leave out the Total Season results.
Example Continued

We weren't told how many at-bats each player had:

            First Half      Second Half      Total Season
Player A    4/10 (.400)     25/100 (.250)    29/110 (.264)
Player B    35/100 (.350)   2/10 (.200)      37/110 (.336)

Player A's dismal second half and Player B's great first half had higher weights than the other two values.

To avoid this problem:
- Be very careful when you combine data subsets into a larger set.
- Be an informed consumer of statistics.
- Be skeptical.

Ask questions (courtesy of Dr. Jost):
- Who says so?
- How do they know?
- What's missing?
- Did somebody change the subject?
- Does it make sense?
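The batting example can be recomputed from the hits and at-bats in the table above. The sketch below uses illustrative function names. It pools the hits and at-bats for the season instead of averaging the two half-season averages, which is where the weighting comes from.

```python
from fractions import Fraction

# (hits, at-bats) for the first and second half, taken from the table above.
players = {
    "Player A": [(4, 10), (25, 100)],
    "Player B": [(35, 100), (2, 10)],
}


def batting_average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def season_average(halves):
    """Pool hits and at-bats across halves before dividing."""
    total_hits = sum(hits for hits, _ in halves)
    total_at_bats = sum(at_bats for _, at_bats in halves)
    return batting_average(total_hits, total_at_bats)


for name, halves in players.items():
    per_half = [f"{float(batting_average(h, ab)):.3f}" for h, ab in halves]
    print(name, per_half, f"season: {float(season_average(halves)):.3f}")
```

Player A's season total is dominated by the 100 at-bats in the weak second half, while Player B's is dominated by the 100 at-bats in the strong first half.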
Probability

How likely is it that something will happen?

Expressing Probability

Event
- Something that occurs
- For example: roll of dice, flip of coin, weather forecast, election
Outcome
- Result of an event
- Has a value of interest to us
- For example: value on rolled die is 6, heads, rain, Bob Smith is treasurer

Sample Space = All Possible Outcomes

All possible combinations of outcomes.
- Example: flip one coin. Sample space: T, H
- Example: roll a die. Sample space: 1, 2, 3, 4, 5, 6
- Example: flip two coins. Sample space: HH, HT, TH, TT
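The sample spaces listed above can also be generated mechanically. This is a minimal sketch using Python's `itertools.product`. The helper name `sample_space` is invented here, and each combined outcome is written as a short string such as "HT".

```python
from itertools import product


def sample_space(*events):
    """List every combination of one outcome from each event."""
    return ["".join(str(outcome) for outcome in combo) for combo in product(*events)]


coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

print("One coin: ", sample_space(coin))
print("One die:  ", sample_space(die))
print("Two coins:", sample_space(coin, coin))
```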
Size of Sample Space = Number of Possible Outcomes

If there are M possible outcomes for one event and N possible outcomes for a second event, then the total number of possible outcomes (size of the sample space) for the two events combined is M x N.
- How many outcomes are possible when you roll two dice?

Sample Space / Possible Outcomes (cont.)
- A restaurant menu offers two choices for an appetizer, five choices for a main course, and three choices for a dessert. How many different possible three-course meals are there?
- A college offers 12 natural science classes, 15 social science classes, 10 English classes, and 8 fine arts classes. How many possible four-class combinations are there?

Expressing Probability
- As a proportion: 0.0 to 1.0
- As a percentage: 0% to 100%
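The M x N counting rule above extends to any number of events by multiplying the counts together. The sketch below applies it to the three questions, assuming that a four-class combination means one class from each of the four categories. The function name `sample_space_size` is made up for the illustration.

```python
from itertools import product
from math import prod


def sample_space_size(*outcome_counts):
    """Multiply the number of possible outcomes of each event."""
    return prod(outcome_counts)


two_dice = sample_space_size(6, 6)
meals = sample_space_size(2, 5, 3)
# Assumption: one class is chosen from each of the four categories.
schedules = sample_space_size(12, 15, 10, 8)

# Cross-check the two-dice count by listing every (die1, die2) pair.
enumerated_two_dice = len(list(product(range(1, 7), repeat=2)))

print("Two dice:", two_dice, "(enumerated:", enumerated_two_dice, ")")
print("Three-course meals:", meals)
print("Four-class combinations:", schedules)
```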
Probability of an Event Occurring

P(A) = probability of A occurring = proportion of the possible event(s) in which a particular outcome (A) occurs.

For example:
- Rolling a die (event) has 6 possible outcomes (1, 2, 3, 4, 5, 6), so the sample space size is 6.
- The probability of any one of those outcomes is 1/6. For example, P(die roll = 3) = 1/6.

Probability of an Event Not Occurring

P(not A) = 1 - P(A)

For example, if the probability of rolling a 3 with one die is 1/6, then the probability of NOT rolling a 3 with one die is 1 - 1/6 = 5/6.

Types of Probability
Three Basic Types of Probability
- Theoretical, or a priori
- Empirical
- Subjective

Theoretical or A Priori Probability

Theoretical, or a priori, probability is based on situations in which all outcomes are known to be equally likely.
- Probabilities can be calculated before the event.
- Examples: coin toss, dice roll, draw a card, spin a roulette wheel.

Theoretical P(A) = (number of ways A can occur) / (total number of outcomes, i.e. sample space size)

For example:
- Probability of a head landing in a coin toss: 1/2
- Probability of rolling a 7 using two dice:
- Probability that a family of 3 will have two boys and one girl:
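One way to work through theoretical probabilities like these is to list the sample space and count the outcomes in which A occurs. The sketch below does that with exact fractions. It assumes that "a family of 3" means three children and that boys and girls are equally likely. The function name `theoretical_probability` is invented for the example.

```python
from fractions import Fraction
from itertools import product


def theoretical_probability(sample_space, is_favorable):
    """(number of ways A can occur) / (sample space size), for equally likely outcomes."""
    outcomes = list(sample_space)
    favorable = sum(1 for outcome in outcomes if is_favorable(outcome))
    return Fraction(favorable, len(outcomes))


die = range(1, 7)

# One die: P(roll = 3) and its complement P(not 3) = 1 - P(3).
p_three = theoretical_probability(die, lambda roll: roll == 3)
p_not_three = 1 - p_three

# Two dice: the sample space is every (die1, die2) pair.
p_seven = theoretical_probability(product(die, repeat=2), lambda roll: sum(roll) == 7)

# Assumption: three children, each equally likely to be a boy (B) or a girl (G).
p_two_boys = theoretical_probability(
    product("BG", repeat=3), lambda kids: kids.count("B") == 2
)

print("P(3) =", p_three, " P(not 3) =", p_not_three)
print("P(sum of two dice = 7) =", p_seven)
print("P(two boys, one girl) =", p_two_boys)
```

Counting this way only gives the right answer when every listed outcome is equally likely, which is the defining condition of theoretical probability.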
Empirical Probability

Empirical probability is based on the results of observations or experiments. It is used to predict the probability of future events based on how often they happened in the past.

Example: Records indicate that a river has crested above flood level just four times in the past 2000 years. What is the empirical probability that the river will crest above flood level this year?

4/2000 = 1/500 = 0.002

Comparing Theoretical and Empirical Probabilities
- Theoretical probability of a coin flip resulting in heads = 0.5
- But your actual results when flipping a coin (empirical probability) may not be exactly the same.
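The flood calculation is just a count of occurrences divided by a count of observations. The sketch below repeats it and then applies the same function to a made-up record of ten coin flips (the flip sequence is hypothetical) to show an empirical result that differs from the theoretical 0.5.

```python
from fractions import Fraction


def empirical_probability(times_occurred, times_observed):
    """Observed frequency: how often the outcome happened / how often we looked."""
    return Fraction(times_occurred, times_observed)


# River example: 4 floods in 2000 years of records.
flood = empirical_probability(4, 2000)
print("P(flood this year) =", flood, "=", float(flood))

# Hypothetical record of ten coin flips, invented for illustration.
flips = "HTHHTHHHTH"
heads = empirical_probability(flips.count("H"), len(flips))
print("Empirical P(heads) =", float(heads), "vs theoretical 0.5")
```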
Law of Large Numbers

The theoretical probability of tossing a coin and landing tails is 0.5. But what if you toss it 5 times and you get HHHHH?

The Law of Large Numbers says that if you toss the coin a very large number of times, then the empirical probability will approach the theoretical probability.

Law of Large Numbers applet: http://bcs.whfreeman.com/ips4e/cat_010/applets/expectedvalue.html

Gambler's Fallacy

You are playing craps in Vegas. You have had a string of bad luck. You figure that since your luck has been so bad, it has to balance out and turn good.

Bad assumption! Each event is independent of the others and has nothing to do with the previous run, especially in the short run. It takes a very large number of tries before the Law of Large Numbers kicks in.

This is called the Gambler's Fallacy.
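The Law of Large Numbers can be explored with a small simulation. The sketch below flips a simulated fair coin using Python's `random` module. The seed and the numbers of flips are arbitrary choices for the illustration, and the exact fractions printed depend on them.

```python
import random


def heads_fraction(num_flips, rng):
    """Flip a simulated fair coin num_flips times and return the fraction of heads."""
    # Each flip is generated independently of all earlier flips.
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


rng = random.Random(121)  # fixed seed so the run is repeatable
for num_flips in (5, 50, 500, 5_000, 100_000):
    fraction = heads_fraction(num_flips, rng)
    print(f"{num_flips:>7} flips: fraction of heads = {fraction:.4f}")
```

Short runs can land far from 0.5, while long runs tend to settle close to it. Nothing in the code makes a flip depend on the flips before it, which is why a streak does not make the opposite result "due".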
Subjective Probability

Subjective (personal) probability uses personal judgment or intuition.
- For example: "If you go to college today, you will be more successful in the future."
- Often this probability is not quantified with any number.