Chapter 5: Producing Data

Key Vocabulary: observational study vs. experiment; confounded variables; population vs. sample; sampling vs. census; sample design; voluntary response sampling; convenience sampling; bias; chance; simple random sample; table of random digits; probability sample; stratified random sample; strata; cluster sampling; multistage sampling design; undercoverage; nonresponse; response bias; probability; sampling frame; systematic random sample; margin of error; experimental units; subjects; treatment; explanatory vs. response variables; factor; level; placebo effect; causation; control group; randomization; randomized comparative experiment; completely randomized experiment; statistically significant; replication; double-blind experiment; lack of realism; blocking and block design; randomized paired comparison; matched pairs design; repeated measures design; relative frequency; probability model; simulation; independent trial

Calculator Skills: randint(a, b, c) or randint(a, b)

5.0 Introduction (pp. 266-270)

1. Who is considered to be the father of statistics, and why? What was his greatest contribution?
2. To what do the conclusions we draw from data analysis apply?
3. We must produce data in a way that is designed to do what?
4. What is meant by a sample?
5. What is the difference between an observational study and an experiment?
6. IMPORTANT: Complete the statement: Experiments and samples provide useful data only when ______.
7. Which is preferable, an observational study or an experiment? Why?
8. Example 5.1: Why is it that an observational study cannot tell us what the effects of the described welfare policy would be? How do we remedy the shortcomings of the observational study?
9. What do we mean by confounded?
10. Well-designed experiments take steps to eliminate what?
11. Name a few reasons why observational studies or experiments may be difficult.
12. What is an alternate method for producing data (more on this in section 5.3)?
13. Statistical inference is an important topic on which we will spend a lot of time this year. In what way are data production and statistical inference connected?

5.1 Designing Samples (pp. 270-289)

1. What is the key difference between a population and a sample?
2. What is the key difference between sampling and a census?
3. What is meant by a sample design, and why is it important?
4. What is a voluntary response sample, and why is it a poor sample design?
5. What is convenience sampling, and why is it a poor sample design?
6. What is a two-word definition of bias (look on page 272), what does this mean, and why is it undesirable?
7. What is the primary difference between simple random samples and voluntary response or convenience samples?
8. IMPORTANT: What is the most primitive method of simple random sampling? We typically want to try to do something as similar to this as possible!
9. Pay close attention to the definition of an SRS: it does NOT say only that every individual has an equal chance of being selected; it says that every possible sample of n individuals has an equal chance of being selected. Make sure you understand this difference.
10. What two properties of a table of random digits make it a good choice for creating a simple random sample?
11. Understand Example 5.4, which demonstrates how to use a random-number table to choose an SRS. In particular, make sure you understand that the way in which you label the population is somewhat arbitrary (although not always completely so). TIPS: (1) Use the fewest digits possible. (2) Start at 1 or 01 or 001, etc., instead of 0, 00, or 000, because this is more natural for counting.
12. The most general framework for designs that use chance to choose a sample is a probability sample. What must we know about the sample in order to call it a true probability sample?
13. IMPORTANT: What is the essential principle of statistical sampling?
14. What is a typical situation in which a sample more complex than an SRS might be used?
15. Describe a stratified random sample.
16. How are the strata chosen in a stratified random sample?
17. Understand Example 5.5, a discussion of how stratified random sampling is used.
18. Describe a multistage sampling design, and give a primary example of when this type of sample is used. Also be sure to understand how the Current Population Survey is conducted and what makes it a multistage design.
19. Understand that in AP Statistics, we only talk about a very small number of the MANY sample designs.
20. When dealing with humans, even a well-designed sample suffers inaccuracies from biases such as undercoverage and nonresponse. What is the difference between these two types of bias? Give an example of each.
21. In the last paragraph of Example 5.6, we see that a sample design actually became politically controversial. It is important to realize that when it comes to analyzing social data, we are often working in gray areas, and that there is not necessarily a right answer.
22. Under what conditions might response bias occur (think of race, gender, legality, attitude of interviewer, memory), and how can we minimize this bias?
23. In what two ways can the wording of questions cause bias in a sample?
24. Are results from a sample exactly the same as what they would be for the entire population?
25. Can we expect to obtain the same results if we draw two different samples of the same size from the same population?
26. IMPORTANT: Properly designed samples avoid systematic bias, but their results are rarely exactly correct and they vary from sample to sample.
27. The results of random sampling, because they are based on chance, obey the laws that govern chance behavior, and we will learn how to attach a margin of error to our sampling inaccuracies in later chapters.
28. Is it true that a larger sample always provides more accurate results than a smaller sample?
29. We can add the same word in two different places in the above sentence to make it true. What word, and where should it be added?
30. What is the difference between a sampling frame and the population?
31. What is a systematic random sample, and when is it often used?
32. Compare and contrast a systematic random sample and a stratified random sample.
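
As a study aid, three of the chance-based designs from this section (simple random, stratified, and systematic samples) can be sketched in Python. This is only an illustrative sketch, not the textbook's procedure: the function names, the population labeled 1 to 100, and the two strata are hypothetical, and Python's random module stands in for a table of random digits.

```python
import random


def simple_random_sample(population, n, seed=None):
    """SRS: every possible set of n individuals is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS inside each stratum, then combine the results."""
    rng = random.Random(seed)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen


def systematic_random_sample(population, n, seed=None):
    """Pick a random start among the first k individuals, then take every k-th one."""
    rng = random.Random(seed)
    k = len(population) // n   # spacing between chosen individuals (assumes n <= len(population))
    start = rng.randrange(k)   # random position within the first k labels
    return [population[start + i * k] for i in range(n)]


# Hypothetical population labeled 1..100, split into two made-up strata.
population = list(range(1, 101))
strata = {"labels 1-60": population[:60], "labels 61-100": population[60:]}

print(simple_random_sample(population, 5))
print(stratified_random_sample(strata, 3))
print(systematic_random_sample(population, 10))
```

Running the sketch a few times shows the point of questions 24 to 26: each run gives a different sample. Note also that the systematic sample always keeps equal spacing between labels, so not every set of 10 individuals can be chosen, which is one way it differs from an SRS.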
5.2 Designing Experiments (pp. 290-309)

1. Explain the difference, if any, between experimental units and subjects.
2. Define treatment.
3. IMPORTANT VOCABULARY: A response variable (or predicted or dependent variable) represents the outcome you are studying, while the explanatory variable (or predictor or independent variable) represents an attempt to explain the observed outcomes.
4. Give an example of at least two levels of a factor in an experiment.
5. Describe the placebo effect.
6. In Example 5.10, can we say that studying foreign languages causes differences in students' verbal abilities in English?
7. IMPORTANT: What is the biggest advantage of experiments over observational studies?
8. Describe another advantage of experiments over observational studies.
9. The simplest design of a comparative experiment is: Units → Treatment → Observe response. Why does this not always work with living subjects (read Example 5.11)?
10. What is the purpose of using a control group?
11. Control is the first basic principle of statistical design of experiments. What is the simplest form of control?
12. What might be the result if we do not have control in our experiments?
13. What needs to be identified first in the design of an experiment?
14. Describe the second aspect of design.
15. KEY QUESTION: How can we assign experimental units to treatments in a way that is fair to all of the treatments?
16. Is matching typically adequate (not to be confused with "matched pairs")?
17. What is the statistician's remedy to bias in experimental design?
18. Define randomization.
19. How does the design's flowchart outline describe the experiment in Example 5.12?
20. What is the logic behind a randomized comparative design?
21. How do we get the effects of chance to even out in the assignment of experimental units to treatments?
22. IMPORTANT: Summarize the principles of experimental design, namely, control, randomization, and replication.
23. Define statistically significant.
24. What do we typically hope to see when we conduct an experiment?
25. What makes an experimental design completely randomized? Make sure you understand why Figures 5.3 and 5.4 show completely randomized designs.
26. Is there a maximum number of treatments that completely randomized designs can compare?
27. What do we mean by a double-blind study, and what are the advantages of such a study?
28. What is the most serious potential weakness of experiments? Give an example.
29. IMPORTANT: Statistical analysis of the original experiment cannot tell us how far the results will generalize.
30. What is the simplest statistical design for experiments?
31. Describe two versions of matched pairs design. Be sure to include the number of subjects, treatments, and order of treatments in your description.
32. What advantages does blocking offer?
33. On what should we base the formation of blocks?

5.3 Simulating Experiments (pp. 309-318)

1. What are the three methods we can use to answer questions involving chance (first paragraph of 5.3)? Which of these is probably the most feasible?
2. What is simulation, and for what is it used?
3. IMPORTANT: List the five steps for conducting a simulation:
4. IMPORTANT: What is typically the most difficult part of running a simulation?
5. What do we mean by independent tosses of a coin, or independent trials in general?
6. What can we say about its associated probabilities if a model does not correctly describe the random phenomena it is intended to describe?
7. BE SURE you understand ALL of the examples in this section, particularly EXAMPLE 5.25, as it pertains to the use of your calculator to carry out simulations! If you want additional practice with simulation, do the questions that have a little spinner symbol in the margin.
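
For extra practice away from the calculator, a simulation of the kind described in this section can be sketched in Python, where random.randint(a, b) plays a role similar to the calculator's randint(a, b). The question simulated here is a made-up example, not one taken from the textbook: how often do 10 independent tosses of a fair coin contain a run of at least 3 identical outcomes in a row? The function names and the coding 0 = tails, 1 = heads are assumptions chosen for illustration.

```python
import random


def has_run(tosses, length=3):
    """Return True if the sequence contains `length` identical outcomes in a row."""
    run = 0
    prev = None
    for cur in tosses:
        run = run + 1 if cur == prev else 1   # extend the current run or start a new one
        prev = cur
        if run >= length:
            return True
    return False


def estimate_run_probability(repetitions=10_000, tosses_per_rep=10, seed=None):
    """Estimate the chance of a run by the relative frequency over many repetitions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Model assumption: each toss is independent, with 0 (tails) and 1 (heads) equally likely.
        tosses = [rng.randint(0, 1) for _ in range(tosses_per_rep)]
        if has_run(tosses):
            hits += 1
    return hits / repetitions


print(estimate_run_probability())
```

The printed value is a relative frequency, so it changes slightly from run to run and settles down as the number of repetitions grows. It is only as trustworthy as the model behind it: if the tosses were not really independent or the coin not really fair, the estimate would describe the model rather than the real coin.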