Chapter 3. Producing Data

Introduction

Data are mostly collected for the specific purpose of answering certain questions. For example:

- Is smoking related to lung cancer?
- Is the use of hand-held cell phones associated with brain cancer?
- Are smaller class sizes better for students' learning?
- Is a new therapy better than standard therapy in reducing pain?
- What percent of college students consider themselves conservatives?
- How long do light bulbs last when the mean lifetime of the bulbs specified by the manufacturer is 1,000 hours?

To establish a causal link between an explanatory variable and a response, we need to conduct a carefully designed experiment in which we deliberately impose some treatment on individuals to observe their responses and control the effects of other possible variables. Experiments can give good evidence for causation. However, in many cases, conducting an experiment is neither practical nor ethical.

In an observational study, we simply observe individuals and measure variables without influencing the responses. There may be limitations in the conclusions that we draw from observational studies.

(EXAMPLE: Television Viewing and Aggressive Behavior during Adolescence and Adulthood) A typical hour of prime-time television shows three to five violent acts. Linking family interviews and police records shows a clear association between time spent watching TV as a child and later aggressive behavior.

Question: Despite the observed association, why would it be difficult to conclude that more TV causes more aggressive behavior?

How to produce trustworthy data

Statistical designs for producing data in sample surveys or experiments refer to arrangements for collecting data from individuals. Statistical designs address the following questions:

- How shall we select the individuals to be studied?
- What treatments shall we consider?
- How shall we assign each individual to the treatments?
- How many individuals shall we collect data from?

Design of Experiments

(EXAMPLE: Do Antioxidants Prevent Cancer?)

Basic vocabulary of experiments

- Experimental units are the individuals on which the experiment is done.
- A treatment is a specific experimental condition applied to the units.
- Factors are explanatory variables.
- A level of a factor is a specific value of the factor.
- The placebo effect is the response to a dummy treatment.
- A control group is the group of individuals who receive a sham treatment, which enables us to control the effects of outside variables on the outcome, such as the placebo effect.
- The design of a study is called biased if it systematically favors certain outcomes.

Designing an experiment

(EXAMPLE: Clinical Experiment for Drug Comparison) To compare the effectiveness of a new drug with that of a standard drug in curing a disease, 8 patients are included in a clinical study. They are assigned to the two drugs, and the response of main interest is the time to cure.

How do we assign experimental units to treatments?

Randomize: let chance make the assignment, so that it does not depend on any characteristic of the experimental units. Unless the assignment is fair to all the treatments, comparisons among treatments are not valid.

Outline of a randomized comparative experiment

Random allocation
  Experimental group (4 patients): New drug
  Control group (4 patients): Standard drug
Compare the time to cure

Completely Randomized Design: All experimental units are allocated at random among all treatments.

i. Give a number label to each experimental unit.
ii. Put the numbers in a hat and mix them up.
iii. Draw four numbers at random, and assign the corresponding subjects to one treatment. The remaining subjects receive the other treatment.

What if gender affects the response? The variation among experimental units due to their gender may hide the systematic effect of the treatment. Can we improve on the completely randomized design? Form blocks of experimental units that are similar in some way, to remove undesirable sources of variation in the response.

Block Design: The random assignment of units to treatments is carried out separately within each block (in this example, the blocks are men and women).
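The hat-drawing steps and the within-block randomization can be sketched in a few lines of Python. This is only an illustration: the patient labels 1 to 8, the split into four men and four women, and the function names are assumptions, not part of the original example.

```python
import random

def completely_randomized(units, n_treated, seed=None):
    """Draw n_treated labels 'from the hat'; everyone else gets the standard drug."""
    rng = random.Random(seed)
    treated = set(rng.sample(units, n_treated))
    return {u: ("new drug" if u in treated else "standard drug") for u in units}

def block_randomized(blocks, seed=None):
    """Randomize separately within each block, giving half of each block the new drug."""
    rng = random.Random(seed)
    assignment = {}
    for members in blocks.values():
        treated = set(rng.sample(members, len(members) // 2))
        for u in members:
            assignment[u] = "new drug" if u in treated else "standard drug"
    return assignment

# Hypothetical labels for the 8 patients in the drug comparison example.
patients = list(range(1, 9))
crd = completely_randomized(patients, n_treated=4, seed=1)

# Hypothetical blocks: patients 1-4 are men, patients 5-8 are women.
blocks = {"men": [1, 2, 3, 4], "women": [5, 6, 7, 8]}
rbd = block_randomized(blocks, seed=1)

print("Completely randomized:", crd)
print("Block design:         ", rbd)
```

In the completely randomized version nothing prevents all four men from landing in the same group; the block version guarantees two men and two women per treatment.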

Outline of a block design

Men: Random allocation
  Experimental group (2 patients): New drug
  Control group (2 patients): Standard drug
Compare the time to cure

Women: Random allocation
  Experimental group (2 patients): New drug
  Control group (2 patients): Standard drug
Compare the time to cure

What if other physiological characteristics of the subjects also affect the outcome? Choose blocks of two units that are as closely matched as possible in terms of gender, age, height, weight, and so on. Assign the treatments within each block in a random order.

Matched Pairs Design: The matched pairs form blocks of size two for comparing just two treatments, and each unit receives one treatment.

Treatment Effect vs Chance

Is any difference between the experimental group and the control group due to the effect of the treatment? Not necessarily: it can be attributed either to the effect of the treatment or to chance. However, using enough experimental units will reduce chance variation. We call an observed effect statistically significant if it is so large that it would rarely occur by chance alone.
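One way to make "would rarely occur by chance alone" concrete is to re-randomize the group labels many times and count how often chance alone produces a difference at least as large as the one observed. The sketch below does this for a completely randomized two-group comparison. The times to cure are invented for illustration, and the re-randomization (permutation) approach is one possible technique, not necessarily the one used in the course.

```python
import random

def permutation_p_value(treated, control, n_shuffles=10000, seed=0):
    """Estimate how often random relabeling gives a mean difference
    (control minus treated) at least as large as the observed one."""
    rng = random.Random(seed)
    observed = sum(control) / len(control) - sum(treated) / len(treated)
    pooled = list(treated) + list(control)
    k = len(treated)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # a fresh random allocation of the same units
        diff = sum(pooled[k:]) / (len(pooled) - k) - sum(pooled[:k]) / k
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles

# Hypothetical times to cure (days) for 4 + 4 patients; not real data.
new_drug = [5, 6, 7, 6]
standard = [9, 8, 10, 9]

observed, p_value = permutation_p_value(new_drug, standard)
print(f"Observed difference in mean time to cure: {observed} days")
print(f"Share of random allocations with a difference this large: {p_value:.3f}")
```

A small share means the observed difference would rarely arise from the random allocation alone, which is what calling it statistically significant expresses.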

Principles of Experimental Design

- Control the effects of lurking variables on the response.
- Randomize: let chance assign experimental units to treatments.
- Replicate each treatment on many units to reduce the role of chance variation in the results.

Sampling

What percent of adults in the states favor a national system of health insurance? What is the mean amount of student loans for undergraduates at OSU? Both questions involve gathering information about a large group of individuals. The idea of sampling is to study a part in order to gain information about the whole population. Sampling is widely used in opinion polls and market research.

Vocabulary of Sampling

- The population is the entire group of individuals that we want information about.
- A sample is a part of the population that we actually examine.
- The design of a sample survey is the method used to select the sample from the population.
- A sampling scheme that systematically favors some parts of the population over others is called biased. For example, a voluntary response sample consists of people who choose themselves by responding, and a convenience sample consists of the individuals who are easiest to reach in the population.

Sampling Design

For conclusions based on a sample to be valid for the entire population, a sound sampling design is required. How do we select a representative sample? Random selection of a sample eliminates bias by giving all individuals an equal chance to be chosen.

Simple Random Sampling (SRS) gives every sample of a given size the same chance to be chosen. It also gives each individual an equal chance to be chosen. Label the members of the population and use random digits to select a sample of a given size.
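The label-and-draw recipe for an SRS can be sketched as follows. A software random number generator stands in for a table of random digits, and the population of 200 labeled students is a made-up example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical population: 200 labeled individuals.
population = [f"student{i:03d}" for i in range(1, 201)]

sample = simple_random_sample(population, n=10, seed=42)
print(sample)
```

Because the draw is without replacement, no individual can appear twice, and each individual has the same chance (here 10 out of 200) of being in the sample.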

Probability sampling means using chance to select a sample.

(EXAMPLE: A Survey on Regulating Guns) In 1999, the University of Chicago's National Opinion Research Center carried out the National Gun Policy Survey on gun attitudes in the United States. The survey includes questions on gun ownership and opinions about government regulation of firearms. Participating households were identified through random-digit dialing, which is a practical method for obtaining almost an SRS of households.

Stratified Random Sampling

i. First divide the population into homogeneous groups, called strata.
ii. Choose a separate SRS in each stratum.
iii. Combine these SRSs to form the full sample.

Like a block design, stratified sampling tends to produce samples that resemble the population more closely than an SRS does.

[Figure: a Population divided into Urban, Suburban, and Rural strata, leading to the Sample]

Multistage Sampling

For large-scale surveys, sending interviewers to widely scattered individuals in a simple random sample would be too costly. Most large-scale sample surveys use multistage samples: select successively smaller groups within the population in stages. The final sample consists of clusters of individuals. Each stage may employ an SRS or a stratified sample.
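Both designs can be sketched with the same SRS building block. In this illustration the strata sizes, the per-stratum sample sizes, the town names, and the function names are all assumptions chosen for the example. The notes do not say how many individuals to draw from each stratum, so the sizes are passed in explicitly.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS in each stratum, then combine them."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of individuals within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical strata with made-up sizes.
strata = {
    "urban": [f"U{i}" for i in range(500)],
    "suburban": [f"S{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(200)],
}
sizes = {"urban": 50, "suburban": 30, "rural": 20}  # assumed allocation
strat = stratified_sample(strata, sizes, seed=7)

# Hypothetical clusters: 20 towns with 100 households each.
towns = {f"town{t:02d}": [f"town{t:02d}-hh{h}" for h in range(100)] for t in range(20)}
staged = two_stage_sample(towns, n_clusters=4, n_per_cluster=5, seed=7)

print(len(strat), "individuals in the stratified sample")
print({town: len(households) for town, households in staged.items()})
```

The stratified sample is guaranteed to contain every stratum in the chosen proportions, while the two-stage sample concentrates the interviews in a few towns, which is what makes it cheaper to carry out.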

[Figure: multistage sampling stages, labeled State, County, Town, Block]

Cautions about sample surveys

- Undercoverage occurs when some groups in the population are left out of the sampling procedure. For example, selecting a sample of households from the telephone directory would miss those households without residential phones.
- Nonresponse occurs when an individual chosen for the sample cannot be contacted or does not cooperate.
- Respondents may lie, especially if asked about illegal or unpopular behavior, which can result in response bias.
- The wording of questions may influence the survey outcome.

Toward Statistical Inference

(EXAMPLE: The latest New York Times/CBS News Poll) According to the latest poll, based on telephone interviews conducted in September with 1,042 adults throughout the United States, 65% of the people favor the government offering everyone a health insurance plan like Medicare as an alternative to private insurers. What can we say about the entire population of all adults?

Statistical inference means drawing conclusions about the entire population based on a sample.

Vocabulary of Statistical Inference

- A parameter is a number that describes the population. It is a fixed number, but unknown in practice.
- A statistic is a number that describes a sample. It is used to estimate an unknown parameter. The value of a statistic is known once a sample is taken.

Parameter                   Statistic
population proportion (p)   sample proportion (p̂)
population mean (µ)         sample mean (x̄)
population variance (σ²)    sample variance (s²)
...                         ...

How trustworthy is a statistic?

- Is the statistic based on a biased sample? Random sampling eliminates bias in choosing a sample.
- Even with a random sample, the value of a statistic changes from sample to sample. This fact is called sampling variability.
- How variable is the statistic when we repeat random sampling? How do we examine the variability of a statistic?

Sampling distribution of a statistic: the distribution of values taken by the statistic in all possible samples of the same size from the population.

How to get the sampling distribution of a statistic?

- The sampling distribution can be approximated by a histogram of the values of the statistic obtained from repeated random samples.
- If we postulate a model for the population, then the sampling distribution of the statistic can be described exactly with the aid of probability theory.

The sampling distribution of a statistic often shows a regular pattern. Just as we examine the distributions of variables in data, we examine the shape, center, and spread of the sampling distribution.
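The histogram approximation can be tried directly by simulation. The sketch below assumes a population in which the true proportion is p = 0.6 (an arbitrary value chosen for illustration), draws 2,000 random samples of size 100, and summarizes the resulting sample proportions.

```python
import random
import statistics
from collections import Counter

def simulate_sample_proportions(p, n, n_samples, seed=0):
    """Draw n_samples random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_samples):
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions

P_TRUE = 0.6  # assumed population proportion (the parameter)
values = simulate_sample_proportions(P_TRUE, n=100, n_samples=2000)

center = statistics.mean(values)   # center of the sampling distribution
spread = statistics.stdev(values)  # spread of the sampling distribution
print(f"center = {center:.3f}, spread = {spread:.3f}")

# Crude text histogram: 5-percentage-point bins, one '#' per 20 samples.
histogram = Counter(round(v * 100) // 5 * 5 for v in values)
for lower in sorted(histogram):
    print(f"{lower:3d}-{lower + 4:3d}%  " + "#" * (histogram[lower] // 20))
```

The printed histogram should look roughly symmetric and bell-shaped, centered near the assumed parameter value, which is the kind of regular pattern the notes refer to.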

Bias of a statistic: the difference between the mean of its sampling distribution and the true value of the parameter. A statistic is called unbiased if the mean of its sampling distribution is equal to the true value of the parameter estimated by the statistic.

Variability of a statistic: the spread of its sampling distribution.

- The spread is determined by the sampling design and the sample size.
- Statistics from larger samples have smaller spreads.
- The variability of a statistic from a random sample does not depend on the population size, as long as the population is much larger than the sample.

Bias and Variability
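These statements about bias and variability can be checked with a small simulation. The sketch below uses made-up populations in which 60% of individuals have some characteristic, and compares the sample proportion from repeated SRSs for two population sizes and two sample sizes. All sizes and names are assumptions chosen for illustration.

```python
import random
import statistics

def make_population(size, p=0.6):
    """Hypothetical population of 0/1 values with a share p of ones."""
    ones = round(size * p)
    return [1] * ones + [0] * (size - ones)

def sampling_summary(population, n, n_repeats=1000, seed=0):
    """Mean and standard deviation of the sample proportion over repeated SRSs."""
    rng = random.Random(seed)
    p_hats = [sum(rng.sample(population, n)) / n for _ in range(n_repeats)]
    return statistics.mean(p_hats), statistics.stdev(p_hats)

small_pop = make_population(10_000)
large_pop = make_population(1_000_000)

mean_small, sd_small = sampling_summary(small_pop, n=100)
mean_large, sd_large = sampling_summary(large_pop, n=100)
mean_big_n, sd_big_n = sampling_summary(large_pop, n=400)

print(f"N = 10,000,    n = 100: mean = {mean_small:.3f}, spread = {sd_small:.3f}")
print(f"N = 1,000,000, n = 100: mean = {mean_large:.3f}, spread = {sd_large:.3f}")
print(f"N = 1,000,000, n = 400: mean = {mean_big_n:.3f}, spread = {sd_big_n:.3f}")
```

In all three cases the mean of the sample proportions should sit close to 0.6, consistent with an unbiased statistic. The two runs with n = 100 should show nearly the same spread even though one population is 100 times larger, and quadrupling the sample size should roughly halve the spread.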