Population
The complete collection of ALL individuals (scores, people, measurements, etc.) to be studied. The population is usually too big to be studied directly, so statistics is used.

Parameter (population parameter)
A numerical measurement (value) describing some characteristic of a population.

Example: US population (about 323 million), N = 323,000,000
Examples of parameters:
1. Proportion of people supporting Health Care Reform. One can denote it by p (0 < p < 1).
2. Average weight of Americans. One can denote it by μ (μ > 0).

Census versus Sample
Census: collection of data from every member of a population (must include all N measurements).
Sample: subcollection of members selected from a population (n measurements; n < N; n is the sample size).

Statistic (sample statistic)
A numerical measurement describing some characteristic of a sample.

Common Pitfalls in Statistical Studies
Voluntary response sample: occurs when respondents themselves decide whether to be included in the sample.
Example: Internet polls
Example: Mail-in polls
Example: Telephone call-in polls
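The difference between a parameter (computed from all N members) and a statistic (computed from a sample of n members) can be illustrated with a small simulation. This is only a sketch: the population of 100,000 people, the 40% support rate, the sample size, and the function name sample_proportion are all made-up assumptions for illustration, not data from the course.

```python
import random

def sample_proportion(population, n, seed=0):
    """Draw a simple random sample of size n and return the proportion of 1s in it."""
    rng = random.Random(seed)           # fixed seed so the sketch is repeatable
    sample = rng.sample(population, n)  # sampling without replacement
    return sum(sample) / n

# Hypothetical population: 1 = supports the reform, 0 = does not (assumed values).
N = 100_000
population = [1] * 40_000 + [0] * 60_000

p = sum(population) / N                         # parameter: uses every member (a census)
p_hat = sample_proportion(population, n=1_000)  # statistic: uses only the sample

print(f"parameter p = {p:.3f}, statistic p_hat = {p_hat:.3f}")
```

The statistic changes from sample to sample (try a different seed), while the parameter is a fixed property of the population.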
Misleading Conclusions
Example: A statistical analysis might justify the claim that there is a correlation between the number of cigarettes smoked and pulse rate, but it could not justify the claim that the number of cigarettes smoked causes one's pulse rate to change.

Reported Results
Example: asking subjects to report their weight rather than measuring it

Small Samples

Loaded Questions
Questions intentionally worded to elicit a desired response.

More listed on pg. 10 of textbook.

Statistical Significance
Achieved when the results obtained in a study are very unlikely to occur by chance.
Example: A coin landing on heads 95 out of 100 flips is statistically significant, since this is very unlikely to happen when flipping a fair coin. Flipping 54 heads out of 100 flips, by contrast, is not statistically significant, since this is likely to occur by chance.

Practical Significance
Does a treatment or finding make enough of a difference to justify its use or to be practical?
Example: In a test of the Atkins weight loss program, 40 subjects using the program had a mean weight loss of 4.6 lbs. after one year. Using statistical methods, it was concluded that the mean weight loss of 4.6 lbs. is statistically significant. However, common sense tells us that it doesn't seem very worthwhile to pursue a weight loss program with such small results. Someone starting a weight loss program would probably want to lose a lot more than 4.6 lbs. in a year! Hence, even though the weight loss is statistically significant, it isn't practically significant.

Data
Collections of observations (such as measurements, records, survey responses, etc.). The next slides describe types of data.

Quantitative Data
Quantitative (or numerical) data consists of numbers representing counts or measurements.
Example: The weights of selected people
Example: The ages of respondents

Categorical Data
Categorical (or qualitative) data consists of names or labels (representing categories).
Example: The genders (male/female) of professional athletes
Example: Answers in a poll (yes/no)
Example: Students' grades (A, B, C, D, F)
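Returning to the coin example under Statistical Significance: the claim that 95 heads is "very unlikely" while 54 heads is "likely" can be checked with the binomial distribution. The sketch below assumes a fair coin (probability of heads 0.5) and independent flips. The helper name prob_at_least is made up for this illustration.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p95 = prob_at_least(95, 100)  # chance of 95 or more heads in 100 fair flips
p54 = prob_at_least(54, 100)  # chance of 54 or more heads in 100 fair flips

print(f"P(at least 95 heads) = {p95:.3g}")
print(f"P(at least 54 heads) = {p54:.3f}")
```

The first probability is vanishingly small, so 95 heads is strong evidence against a fair coin. The second is roughly one in four, so 54 heads is unremarkable.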
Working with Quantitative Data
Quantitative (numerical) data can further be described by distinguishing between discrete and continuous types.

Discrete Data
Numbers whose possible values are either a finite list of values or a countable list of values (for instance, possible values are 0, 1, 2, 3, ...).
Example: The number of children in a family

Continuous Data
Continuous (numerical) data are numbers with infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps.
Example: The weight of a person
Example: The volume of cola in a can

Levels of Measurement
Alternatively, data can be described by distinguishing between nominal, ordinal, interval, and ratio types.

Nominal level of measurement
Data that consist of names, labels, or categories only, i.e. the data cannot be arranged in any ordering scheme (such as low to high).
Example: Survey responses (yes, no, undecided)
Example: Genders (male, female)

Ordinal level of measurement
Data that can be arranged in some order, but differences between data values either cannot be determined or are not meaningful.
Example: Course grades (A, B, C, D, or F)
Example: Survey responses such as highly satisfied/satisfied/unsatisfied/very unsatisfied
Interval level of measurement
Data that can be arranged in some order, and for which the numerical differences between data values are meaningful, but which have no natural zero starting point (where none of the quantity is present).
Example: Temperature in Celsius
Example: Years

Ratio level of measurement
Data that can be arranged in some order, for which the numerical differences between data values are meaningful, and for which there is a natural zero starting point where none of the quantity is present.
Example: Temperature in Kelvin
Example: Length
Example: Weight

Simple Random Sample
A simple random sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen.
Example: drawing names out of a hat

Random Sample
A random sample is selected in such a way that every member of the population has the same chance of being selected.
Example: Suppose you use a coin to select a group of 3 from a class of 6 students seated in two rows of 3, one row labeled heads and the other tails. If you flip the coin to select a row, each student has the same chance of being selected. This is a random sample. However, each group of 3 does not have the same chance of being selected, so this is not a simple random sample.
[Figure: two rows of three students, one row labeled "heads" and the other "tails"]

Review

Pg. 15 #39
The Newport Chronicle ran a survey by asking readers to call in their response to the question: "Do you support the development of atomic weapons that could kill millions of innocent people?" It was reported that 20 readers responded and 87% said no while 13% said yes. Identify four major flaws in this survey.
1. The wording of the question is biased and tends to encourage a negative response.
2. The sample size of 20 is too small.
3. It is a voluntary response sample.
4. If 20 readers responded, the percentages should be multiples of 5, so 87% and 13% are not possible results. If x readers call in and say yes, then the percentage of yes responses is (x / 20) × 100% = 5x%.
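The coin-and-rows example above (a random sample that is not a simple random sample) can be verified by listing every possible group of 3. This is a sketch under assumptions: the student labels A to F and their assignment to the heads and tails rows are hypothetical.

```python
from itertools import combinations

# Hypothetical labels for the 6 students, seated in two rows of 3.
students = ["A", "B", "C", "D", "E", "F"]
rows = {"heads": ("A", "B", "C"), "tails": ("D", "E", "F")}

# Coin-flip scheme: each row (as a whole group) is chosen with probability 1/2.
coin_samples = {frozenset(row): 0.5 for row in rows.values()}

# Chance that each individual student ends up in the sample.
inclusion = {
    s: sum(prob for group, prob in coin_samples.items() if s in group)
    for s in students
}

# Every possible group of 3, and its chance of being chosen by the coin scheme.
all_groups = [frozenset(g) for g in combinations(students, 3)]
group_prob = {g: coin_samples.get(g, 0.0) for g in all_groups}
reachable = [g for g in all_groups if group_prob[g] > 0]

print("per-student chance:", inclusion)
print("possible groups of 3:", len(all_groups))
print("groups the coin can produce:", len(reachable))
```

Every student has the same chance of selection, so the sample is random. But only 2 of the 20 possible groups can ever be chosen, whereas a simple random sample would give each of the 20 groups a chance of 1/20.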
Pg. 36 #4
"72% of Americans squeeze their toothpaste tube from the top." This and other not-so-serious findings are in The First Really Important Survey of American Habits. Those results are based on 7,000 responses from the 25,000 questionnaires that were mailed.
a. What is wrong with this survey?
It uses a voluntary response sample, and those with special interests are more likely to respond, so it's very possible that the sample is not representative of the population.

Pg. 35 #3
A data set includes depths (km) of the sources of earthquakes. Are these values discrete or continuous?
Continuous
Continuous numbers have infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps.

Pg. 35 #4
Are the earthquake depths described in the previous problem quantitative data or categorical data?
Quantitative data
Quantitative data consists of numbers representing counts or measurements.

Pg. 35 #5
Which of the following best describes the level of measurement of the earthquake depths described in #3: nominal, ordinal, interval, ratio?
Ratio
Ratio data can be arranged in some order, the numerical differences between data values are meaningful, and there is a natural zero starting point where none of the quantity is present.

Types of Studies

Observational Study
1. Data are observed and collected on each subject.
2. NO manipulation of the subject's environment occurs.
Example: Randomly select a sample of subjects and observe them in order to record data for each subject on amount of exercise and number of colds within a set amount of time.
Experiment
1. Manipulate the subject's environment, then
2. measure the effects.
Example: Obtain a group of study participants (often volunteers). Manipulate: randomly assign the participants to the treatment group (exercise) or the control group (no exercise). After a set amount of time, record the amount of exercise and the number of colds for each person.

Confounding
Occurs in an experiment when the investigators are not able to distinguish among the effects of different factors.
Example: A drug manufacturer tests a new cold medicine with 200 volunteer subjects: 100 men and 100 women. The men receive the drug, and the women do not. At the end of the test period, the men report fewer colds. Here the investigators cannot tell whether the drug or the difference between men and women accounts for the result.

This experiment could be strengthened with a better design. Women and men could each be randomly assigned to the treatments. One treatment group could receive a placebo, with blinding. Then, if the treatment group (i.e., the group getting the medicine) had sufficiently fewer colds than the other group, it would be reasonable to conclude that the medicine was effective in preventing colds. This is an example of a Randomized Block Design.

Systematic Sampling
Pick a starting point and then select every kth element in the population.
Example: You want to estimate the defect rate of phones coming off an assembly line. You decide to walk over and pick up the 1st phone that comes off the assembly line and then pick every 5th phone after that.

Convenience Sampling
Use results that are easy to get.
Example: The CBS News station in New York often obtains opinions by interviewing neighbors of a person who is the focus of a news story.
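The systematic sampling rule (pick a starting point, then take every kth element) is simple enough to write out directly. This is a minimal sketch: the list of 20 phone serial numbers and the function name systematic_sample are made up for illustration.

```python
def systematic_sample(items, k, start=0):
    """Return the item at position `start` and every k-th item after it."""
    return items[start::k]

# Hypothetical serial numbers of 20 phones in the order they leave the line.
phones = list(range(1, 21))

# Start with the 1st phone, then take every 5th phone after that.
picked = systematic_sample(phones, k=5)
print(picked)  # [1, 6, 11, 16]
```

Once the starting point is fixed, the whole sample is determined, which is why a systematic sample is not a simple random sample.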
Stratified Sampling
Subdivide the population into at least two subgroups so that subjects within the same subgroup share the same characteristics (such as age group), then draw a sample from each subgroup.
Example: Divide a population of people into males and females, and then select a number of people from each group.
Note: This is the same basic idea as a randomized block design, but stratified sampling is used for surveys, whereas a randomized block design is used when designing experiments.

Cluster Sampling
Divide the population into sections (clusters), then randomly select some of the clusters and choose all of the members from the selected clusters.
Example: On the day of the last presidential election, ABC News organized an exit poll in which specific polling stations were randomly selected, and all voters were surveyed as they left the premises.

Pg. 35 #6
True or false: If you construct a sample by selecting every sixth earthquake depth from a list, the result is a simple random sample.
False
This is an example of systematic sampling.

Pg. 32 #9
You collected sample data by randomly selecting 12 different pages from Harry Potter and the Sorcerer's Stone and then finding the number of words in each sentence on each of those pages. What type of sampling is this?
Cluster Sampling
You divided the book into clusters (pages), randomly selected some of the pages, and then used all the sentence lengths from those pages.

Pg. 37 #8
In a recent poll of 1500 adults, 52% of respondents said the use of marijuana should not be made legal. In the same poll, 23% of the respondents said that the use of medical marijuana should not be made legal.
a. The sample of 1500 adults was selected from the population of all adults in the US. The method used to select the sample was equivalent to placing the names of all adults in a giant bowl, mixing the names, and then drawing 1500 names. What type of sampling is this? (simple random, random, systematic, convenience, stratified, cluster)
Simple Random
Each group of size 1500 has the same chance of being selected.
b. If the sampling method consisted of random selection of 30 adults from each of the 50 states, what type of sampling would this be? (random, systematic, convenience, stratified, cluster)
Stratified
The US population was divided into subgroups (states), and then a random sample was drawn from each subgroup.

c. What is the level of measurement of the responses yes, no, don't know, and refused to respond?
Nominal
The data consist of labels, and the data cannot be arranged in an ordering scheme.

d. Is the value of 52% a statistic or a parameter?
Statistic
It is based on a sample and not on everyone in the US.

What is the purpose of Statistics?
To collect and analyze data in order to gain knowledge of aspects of the population that would otherwise be unknown. Statistics should help you understand the world around you and help you make better informed decisions.
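As a closing illustration, the stratified and cluster sampling schemes used in the review problems can be contrasted in a few lines. This is only a sketch: the five "states" of 100 people each, the member labels, and both function names are hypothetical, chosen to keep the example small.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw `per_stratum` members at random from EVERY subgroup (stratum)."""
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick `n_clusters` whole clusters at random and keep ALL of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 5 groups ("states") of 100 people each.
groups = {f"state_{i}": [f"s{i}_p{j}" for j in range(100)] for i in range(5)}

strat = stratified_sample(groups, per_stratum=30, rng=rng)  # 30 people from each group
clust = cluster_sample(groups, n_clusters=2, rng=rng)       # everyone from 2 groups

print({name: len(members) for name, members in strat.items()})
print({name: len(members) for name, members in clust.items()})
```

Stratified sampling takes some members from every subgroup. Cluster sampling takes every member from some subgroups.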