CHAPTER 5: PRODUCING DATA

5.1: Designing Samples

Exploratory data analysis seeks to _____ what data say by using: _____

These conclusions apply only to the _____ we examine. To answer questions about some _____ of individuals we must _____ data in a way that allows us to answer our questions.

In an _____, we observe individuals and measure variables of interest but _____ attempt to influence the _____.

In an _____, we deliberately impose some treatment on individuals in order to _____.

In most cases we want to gather information about a group of individuals. Why not contact every individual of interest?
1.
2.
3.

In these cases, we gather information from only a _____ of the group in order to _____ about the whole.

Vocabulary

(Def) Population:

(Def) Sample:

(Def) Sampling:

(Def) Census:

Note: Poor sampling methods can produce misleading conclusions.

(ex-1) Call-in opinion polls: Voluntary response

Television programs and magazines like to conduct call-in or write-in polls of public opinion. This involves announcing a question and asking the public to call (or text) one number for Yes and another for No, or mail in a response. Call-in opinion polls are an example of _____.

(Def) A voluntary response sample consists of _____ who choose _____ by responding to a general appeal. Voluntary response samples are _____ because people with _____ opinions, especially negative opinions, are _____ to respond.

Note: In general, people who take the time and trouble to respond to an open invitation are _____ representative of the population.

IMPORTANT: Voluntary response is one of the common types of sampling methods. Another is convenience sampling.

(Def) Convenience sampling: choosing individuals who are _____.

(ex-2) Interviewing at the mall: Convenience sampling

Manufacturers and advertising agencies often use interviews at shopping malls to gather information about the habits of consumers and the effectiveness of ads.

Voluntary response sampling and convenience sampling choose samples that are almost guaranteed _____ to represent the population.

BIAS: The previous sampling methods display _____ or _____. The sampling method is _____ if it systematically favors _____.

* The remedy for bias in choosing a sample is to _____ impersonal _____ to do the choosing.

SIMPLE RANDOM SAMPLES:

A sample chosen by _____ allows neither favoritism by the sampler nor self-selection by respondents. The simplest way to use chance to select a sample is to place names _____ (the population) and draw out _____ (the sample). This is the idea of a simple random sample.

A _____ (SRS) of size n consists of n individuals from the _____ chosen in such a way that every set of n individuals has an _____ to be in the sample actually _____.

IMPORTANT SRS FACTS: In an SRS, each _____ has an equal chance to be chosen and every possible _____ has an equal chance to be chosen.

To get an SRS the idea is to draw _____ out of a hat. Realistically, we will use _____ (calculator) or a table of _____ to choose an SRS.

RANDOM DIGIT TABLES: A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties:
1. Each entry in the table is _____ to be any of the 10 digits 0 through 9.
2. The entries are _____ of each other. That is, knowledge of one part of the table gives _____ about any other part.

* Table B in the back of your book is a table of random digits.

How to choose an SRS in four steps:
1. Label. Assign a numerical _____ to every _____ in the population.
2. Table. Use Table B to _____ labels at random.
3. Stopping rule. Indicate when you should _____ sampling.
4. Identify sample. Use the labels to _____ subjects selected to be in the sample.

Example 3: Help out Joan's accounting firm: How to choose an SRS

Joan's small accounting firm serves 30 business clients. Joan wants to interview a sample of 5 clients in detail to find ways to improve client satisfaction. To avoid bias, she chooses an SRS of size 5.

Label:

Table: use the following line from a random digit table to complete this example

69051 64817 87174 09517 84534 06489 87201 97245

Stopping Rule: why did we stop?

Identify the sample:

The general framework for a method that uses chance to choose a sample is a _____. A probability sample is a sample chosen by _____. We must know _____ samples are possible and what _____ (or _____) each possible sample has.
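The four steps in Example 3 can be traced in a short Python sketch. It assumes the 30 clients are labeled 01 through 30 (the worksheet leaves the labeling to the reader, so this choice is an assumption), reads the given line of digits in two-digit groups, skips groups outside 01 to 30 and repeats, and stops once 5 labels have been chosen. The function name `srs_from_digits` is hypothetical.

```python
def srs_from_digits(digit_line, population_size, sample_size):
    """Choose an SRS by reading fixed-width labels from a line of random digits."""
    digits = "".join(digit_line.split())   # ignore the spacing between groups of five
    width = len(str(population_size))      # 30 clients -> two-digit labels (assumed 01..30)
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip labels that match no client and labels already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        # Stopping rule: quit as soon as the sample is full.
        if len(chosen) == sample_size:
            break
    return chosen


line = "69051 64817 87174 09517 84534 06489 87201 97245"
sample = srs_from_digits(line, population_size=30, sample_size=5)
print(sample)  # [5, 16, 17, 20, 19]
```

Under this labeling, the groups 69, 48, 87, 40, and so on are skipped because no client has that label, and the repeated 17s are ignored because a client cannot be chosen twice.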

* The use of chance to select the sample is the essential principle of statistical sampling.

Methods for sampling from populations spread out over a _____ are usually more complex than an SRS. (Recall our sampling issues of cost, time, and convenience.)

STRATIFIED RANDOM SAMPLES:

To select a _____, first split up the population into groups of _____ individuals, called _____.

Note: Strata are created so that the _____ are _____ in some way that is _____ to the _____.

Then choose a separate _____ in each _____ and _____ these SRSs to form the full sample.

Example 4: Suppose we want to survey local high schools. We may choose to divide high schools into public and private schools. Then take an SRS from the public schools and an SRS from the private schools. This way we are guaranteed to have both represented in our study. Then we put the two samples together for the full sample.

In general, a stratified sample can give good information about each _____ separately as well as about the _____. Also, if the individuals in each stratum are _____ than the population as a whole, a stratified sample can produce _____ information about the population than an _____ of the _____.

IN YOUR OWN WORDS, A STRATIFIED RANDOM SAMPLE IS
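The procedure in Example 4 can be sketched in Python as follows. The school names, the number of schools in each stratum, and the per-stratum sample sizes are all made up for illustration; only the structure (a separate SRS in each stratum, then combine) comes from the notes. The function name `stratified_sample` is hypothetical.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS inside each stratum and combine them into one sample."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # random.Random.sample draws without replacement, i.e. an SRS of the stratum.
        sample.extend(rng.sample(members, sizes[name]))
    return sample


# Hypothetical population: 20 public and 8 private high schools.
strata = {
    "public": [f"public_{i}" for i in range(1, 21)],
    "private": [f"private_{i}" for i in range(1, 9)],
}
sizes = {"public": 5, "private": 2}   # assumed sample size per stratum

full_sample = stratified_sample(strata, sizes, seed=1)
print(full_sample)
```

Whatever the random draw, both kinds of school appear in the full sample, which is the guarantee Example 4 describes.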

CLUSTER SAMPLING:

Another common type of probability sampling is the _____.

Cluster sampling:
1st: Divide the population into _____ or _____.
2nd: _____ some of these clusters.
3rd: Sample _____ of the _____ in the selected clusters.

Example 5: What do AP students think? Cluster sampling

Suppose you want to survey opinions of AP Statistics students to see if they feel they have enough time on the free-response section of the AP exam. How can this be done?

SRS?

Stratified Random Sample?

Cluster Sample?

By the end of this lesson we should understand these cartoons.
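One way to picture the three cluster-sampling steps for Example 5 is the Python sketch below. It treats each AP Statistics class section as a cluster and surveys every student in the randomly chosen sections; taking all students in a chosen cluster is an assumption made here, and the section names and rosters are invented. The function name `cluster_sample` is hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every individual in them."""
    rng = random.Random(seed)
    # Step 2: randomly select some of the clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3 (assumed): sample all individuals in the selected clusters.
    sample = [student for name in chosen for student in clusters[name]]
    return chosen, sample


# Step 1: the population is already divided into groups (hypothetical class sections).
clusters = {
    "section_A": ["A1", "A2", "A3", "A4"],
    "section_B": ["B1", "B2", "B3"],
    "section_C": ["C1", "C2", "C3", "C4", "C5"],
    "section_D": ["D1", "D2", "D3"],
}

chosen, sample = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen, sample)
```

Unlike a stratified sample, only some of the groups contribute to the sample, which is what makes the method cheaper when the population is spread out.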

Undercoverage and Nonresponse

_____ occurs when some groups in the _____ are _____ of the process of choosing the sample.

_____ occurs when an _____ chosen for the sample _____ be contacted or does _____.

Difference between the two: If I am a prospective survey respondent, undercoverage occurs when:

And nonresponse occurs when:

In one case, my group isn't even in the sampling frame. In the other case, I don't respond.

The behavior of the respondent or of the interviewer can cause _____. For example,
- respondents may _____ or _____
- race/sex of interviewer can _____ responses

The _____ of questions is the most important influence on the _____ given to a sample survey.

Example 6: Should we ban disposable diapers?

A survey paid for by the makers of disposable diapers found that 84% of the sample opposed banning disposable diapers. The question asked in the survey was:

"It is estimated that disposable diapers account for less than 2% of the trash in today's landfills. In contrast, beverage containers, third-class mail, and yard waste are estimated to account for about 21% of the trash in landfills. Given this, in your opinion, would it be fair to ban disposable diapers?"

*** Insist on knowing the exact questions asked, the rate of nonresponse, and the date and method of the survey before you trust the poll result.

*** Samples vary!
*** Parameters are fixed!

We can _____ our results by knowing that _____ random samples give more accurate results than smaller samples.
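The closing point, that samples vary while the parameter stays fixed, can be illustrated with a small simulation. This is only a sketch: the population proportion of 0.6, the sample sizes 25 and 400, and the 500 repetitions are arbitrary choices for illustration, not values from the notes. The function name `sample_proportions` is hypothetical.

```python
import random
import statistics


def sample_proportions(p, n, repetitions, seed=None):
    """Simulate many random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n   # proportion of "yes" in one sample
        for _ in range(repetitions)
    ]


p = 0.6  # fixed parameter (assumed value)
small = sample_proportions(p, n=25, repetitions=500, seed=1)
large = sample_proportions(p, n=400, repetitions=500, seed=1)

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)
print(f"n=25:  spread of sample proportions = {spread_small:.3f}")
print(f"n=400: spread of sample proportions = {spread_large:.3f}")
```

The sample proportions from the larger samples cluster much more tightly around the fixed parameter than those from the smaller samples.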

SECTION 5.2: Designing Experiments

Vocabulary

The individuals on which the experiment is done are the _____. When the units are human beings, they are called the _____. A specific experimental condition applied to the units is called a _____.

The explanatory variables in an experiment are often called _____. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a _____) of each of the factors.

Example 7: Effects of class size

Do smaller classes in elementary school really benefit students in areas such as scores on standardized tests, staying in school, and going on to college?

Observational study:

The Tennessee STAR program was an experiment on the effects of class size.

In principle, experiments can give good evidence for _____.

Laboratory experiments in science and engineering often have a simple design with only a single treatment, which is applied to all of the experimental units. We rely on the environment of the laboratory to protect us from lurking variables. However, when experiments are conducted in the field or with living subjects, simple designs can yield invalid data.

Medical example: Patients' response may be due to the _____ effect. A placebo is a dummy treatment. Many patients respond favorably to treatment, even a placebo. This may be due to _____ in the doctor and _____ of a cure or simply to the fact that medical _____ often improve _____ treatment. This is why we use a group of patients to receive a placebo. We call this group the _____ group, because it enables us to _____ the effects of _____ variables on the outcome.

* Control is the first basic principle of statistical design of experiments. Comparison of treatments in the _____ environment is the simplest form of control. Don't confuse control and control group. Control refers to the overall effort to minimize variability in the way the experimental units are obtained and treated.

Replication

Even with control, there will still be _____ among experimental units. Looking back at the Tennessee STAR experiment: There would be some difference even if all three groups were treated the same. This is because the _____ among children means that some are _____ than others. _____ assigns the smartest students to one group or another, so that there is a chance difference among groups. If we assign _____ students to each group, however, the _____ of chance will _____.

* The second principle of statistical design of experiments is replication: use enough subjects to reduce chance variation.

Randomization

The third basic principle of design is randomization: the rule used to assign the experimental units to the treatments. Statisticians rely on _____ to make an assignment that does not depend on any _____ of the experimental units and that does not rely on the _____ of the experimenter in any way.

Example 8: Cell phones and driving: Random assignment

Does talking on a hands-free cell phone distract drivers? Undergraduate students drove in a high-fidelity driving simulator equipped with a hands-free cell phone. The car ahead brakes: how quickly does the subject respond?

The use of chance to divide experimental units into groups is called _____.

The logic behind the randomized comparative design in Example 8 is as follows:
- Randomization produces two groups of subjects that we expect to be _____ in all respects before the treatments are applied.
- Comparative design helps ensure that influences other than the cell phone operate _____ on both groups.
- Therefore, _____ in average brake reaction time must be due either to _____ or to the play of _____ in the random assignment of subjects to the two groups.

Principles of Experimental Design

The basic principles of statistical design of experiments are:
1. Control the effects of lurking variables on the response, most simply by comparing two or more treatments.
2. Replicate each treatment on many units to reduce chance variation in the results.
3. Randomize: use impersonal chance to assign experimental units to treatments.

We hope to see a difference in the responses so _____ that it is _____ to happen just because of chance variation. An observed effect so large that it would rarely occur by chance is called _____.

When all experimental units are allocated at random among all treatments, the experiment is said to have a _____.

Example 9: TV commercial: Completely randomized design

What are the effects of repeated exposure to an advertising message? The answer may depend both on the length of the ad and on how often it is repeated. All subjects will view a 40-minute television program that includes ads for a digital camera. Some subjects will see a 30-second commercial; others will see a 90-second version. The same commercial will be shown 1, 3, or 5 times during the program. Suppose that we have 150 students who are willing to serve as subjects.

Outline:

Random assignment:
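The random assignment in Example 9 could be carried out along the lines of the Python sketch below. The two factors (commercial length and number of repetitions) combine into 2 x 3 = 6 treatments; splitting the 150 students into equal groups of 25 is an assumption, since the worksheet leaves the outline to the reader. Students are represented by the labels 1 to 150, and the function name `completely_randomized` is hypothetical.

```python
import random
from itertools import product


def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                      # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


lengths = [30, 90]          # seconds
repetitions = [1, 3, 5]     # times shown during the program
treatments = list(product(lengths, repetitions))   # 6 (length, repetitions) combinations

subjects = list(range(1, 151))                     # 150 student labels
groups = completely_randomized(subjects, treatments, seed=0)

for treatment, members in groups.items():
    print(treatment, len(members))
```

Every student has the same chance of landing in any treatment group, so the groups should be roughly alike before the commercials are shown.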

Blocking (Block Design)

A _____ is a group of experimental units or subjects that are known before the experiment to be _____ in some way that is expected to systematically _____ the _____ to the treatments. In a _____, the random assignment of units to treatments is carried out separately _____.

Blocks are another form of control. They control the effects of some outside variables by bringing those variables into the experiment to form the blocks.

Example 10: Comparing cancer therapies: Block Design

The progress of a type of cancer differs in women and men. A clinical experiment to compare three therapies for this cancer therefore treats gender as a blocking variable. Two separate randomizations are done, one assigning the female subjects to the treatments, and the other assigning the male subjects.

Blocks allow us to draw separate conclusions about each block, for example, about men and women in the cancer study in Example 10. A wise experimenter will form blocks based on the most important unavoidable sources of variability among the experimental units.

Control what you can, block what you can't control, and randomize the rest!!!
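The two separate randomizations in Example 10 can be sketched in Python as shown below. The number of subjects in each block and the therapy names are invented for illustration; the only feature taken from the notes is that units are shuffled and assigned within each block separately. The function name `randomized_block` is hypothetical.

```python
import random


def randomized_block(blocks, treatments, seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)                   # a separate randomization for this block
        k = len(treatments)
        # Deal the shuffled units out so group sizes differ by at most one.
        assignment[block_name] = {
            t: shuffled[i::k] for i, t in enumerate(treatments)
        }
    return assignment


# Hypothetical subject labels: 12 women and 9 men.
blocks = {
    "women": [f"W{i}" for i in range(1, 13)],
    "men": [f"M{i}" for i in range(1, 10)],
}
treatments = ["therapy_1", "therapy_2", "therapy_3"]   # placeholder names

assignment = randomized_block(blocks, treatments, seed=2)
for block_name, groups in assignment.items():
    print(block_name, {t: len(g) for t, g in groups.items()})
```

Because no subject ever crosses from one block to the other, each therapy is compared among women and among men separately.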

Matched Pairs Design

Completely randomized designs are often inferior to more elaborate statistical designs. In particular, matching the subjects in various ways can produce more precise results than simple randomization. The simplest use of matching is a _____ which compares just two treatments.

Example 11: Cell phones and driving, part 2: Matched Pairs

Recall Example 8, where we considered the effects of talking on a cell phone while driving. The experiment compared two treatments: driving in a simulator and driving in a simulator while talking on a hands-free cell phone. The response variable is the time the driver takes to apply the brake when the car in front brakes suddenly. In Example 8, 40 student subjects were assigned at random, 20 students to each treatment.

As a matched pairs experiment:

Matched pairs designs compare just _____ treatments. We choose blocks of two units that are as closely matched as possible. We assign one of the treatments to each unit by tossing a coin or reading odd and even digits from Table B. Alternatively, each block in a matched pairs design may consist of just _____ subject, who gets _____ treatments, one after the other. Each subject serves as his or her own control. The _____ of the treatments can influence the subject's response, so we _____ the order for each subject, again by a coin toss.

Matched pairs are an example of block designs.
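For the version of Example 11 in which each student drives under both conditions, the coin toss that decides the order might look like the Python sketch below. The treatment labels and the student numbering are illustrative assumptions, and the function name `matched_pairs_order` is hypothetical.

```python
import random

TREATMENTS = ("simulator only", "simulator with cell phone")


def matched_pairs_order(subjects, treatments=TREATMENTS, seed=None):
    """Toss a (simulated) fair coin for each subject to decide which treatment comes first."""
    rng = random.Random(seed)
    orders = {}
    for subject in subjects:
        first, second = treatments
        if rng.random() < 0.5:      # "heads": swap the order for this subject
            first, second = second, first
        orders[subject] = (first, second)
    return orders


subjects = list(range(1, 41))       # the 40 student subjects, labeled 1..40
orders = matched_pairs_order(subjects, seed=4)

phone_first = sum(order[0] == "simulator with cell phone" for order in orders.values())
print(f"{phone_first} of {len(subjects)} students talk on the phone in their first drive")
```

Each student still receives both treatments, so every student serves as his or her own control; only the order is left to chance.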

Cautions about Experimentation

In a _____ experiment, neither the subjects nor those who measure the response variable know which treatment a subject received. This is a way to control the placebo effect.

Many (perhaps most) experiments have some weaknesses in detail. The _____ of an experiment can influence the outcomes in unexpected ways. The most serious potential weakness of experiments is _____.

Example 12: Placebo cigarettes: Lack of Realism

A study of the effects of marijuana recruited young men who used marijuana. Some were randomly assigned to smoke marijuana cigarettes, while others were given placebo cigarettes. This failed:

Lack of realism can _____ our ability to apply the _____ of an experiment to the settings of _____. Most experimenters want to _____ their conclusions to some setting wider than that of the actual experiment. Statistical analysis of an experiment cannot tell us how far the results will _____ to other settings. Nonetheless, the randomized comparative experiment, because of its ability to give evidence for causation, is one of the most important ideas in statistics.