UNIT I SAMPLING AND EXPERIMENTATION: PLANNING AND CONDUCTING A STUDY (Chapter 4)

A DATA COLLECTION (Overview)

When researchers want to draw conclusions (inferences) about an entire population, they often must rely on a smaller portion of the group, a sample, to gather information. In such cases, it is essential that the selected sample be representative of its population. We rely on chance (randomness) to obtain a sample that best represents the larger population; otherwise, the data and any subsequent analyses cannot be trusted.

Collecting data from every member of a population is called a census. For example, when reporting on American demographics, the U.S. Census Bureau attempts to contact every individual but never obtains a complete count. In practice, reaching every member of a population is difficult, costly, and time-consuming, and therefore often unreasonable.

Two methods of data collection are observational studies and experiments. In an observational study, information is only collected; the researcher does not influence the subjects. In an experiment, a treatment is imposed on the experimental units. In both cases, proper design is essential for obtaining useful data: identify the population you want to describe and define the variable you want to measure.

It is important to distinguish between observational studies and experiments:

Observational study
- Observe and measure, but do not influence the responses.
- A data source for establishing correlations (associations) between variables.
- Useful for obtaining data on everything from voter opinion to animal behavior.

Types of observational studies:
- Cross-sectional study: seeks to find the prevalence of a phenomenon, problem, attitude, or issue by taking a snapshot (cross-section) of the population at one point in time.
  Example: Identify students who play an instrument and collect data on their current grades.
- Longitudinal study: information is collected repeatedly over a long period of time.
  Example: Identify students who play an instrument and repeatedly collect data on their grades over the next 10 years.
- Retrospective study (case-control): investigates a phenomenon or issue that occurred in the past.
  Example: Identify students who play an instrument and collect data on their past grades.
- Prospective study (cohort study): follows subjects forward in time to estimate the likelihood of a future event or problem; the researcher must wait until the study runs its course to examine the effects. Unlike a randomized controlled trial, no treatment is assigned, so it remains an observational study.
  Example: Identify students who play an instrument and collect data on their future grades.

Experiment
- Apply some treatment to individuals in order to observe their responses.
- The most persuasive data source for establishing causal relationships between variables.

B SURVEYS

A survey is a type of observational study. With a sample survey, it is important to select a group that accurately represents the entire population. To avoid bias and gather a representative sample, chance (randomness) is used to select the respondents. If a sample is not representative of the population, analysis of the sample statistics and any inference from them are futile.

SAMPLING DESIGN METHODS: the method used to select the sample from the population in order to conduct surveys and/or experiments.

1. PROBABILITY SAMPLING DESIGNS (good)

Good sampling designs (probability sampling designs) use an element of randomness (subjects are chosen by chance) in order to best avoid bias. Bias is the systematic favoring of certain outcomes, so that values are consistently overestimated or consistently underestimated. An unbiased statistic misses the true value only by chance: its errors have no systematic direction and average out to zero over many samples. With the following methods of collecting a random sample, bias is minimized, and the value obtained from the sample, known as a statistic, can be used to make inferences about the true value for the population, known as the parameter.

Random Sampling

SRS (Simple Random Sample)
The sample is randomly chosen using computer software, names drawn from a hat, a table of random digits, etc. An SRS of size n consists of individuals chosen from a population in such a way that each individual AND each possible group of n individuals has an equal chance of being chosen.
*An SRS is the ONLY probability sampling design in which every possible group of n individuals is equally likely to be the sample!

How to choose an SRS:
1. Assign a numerical label to each of the N individuals in the population, using the smallest number of digits possible. For example, if there are 10 individuals, use 0-9, not 01-10; if there are 100, use 00-99 (or 01-99 with 00 standing for 100), not 001-100.
2. Use random numbers to select n different labels; the corresponding individuals form the sample.

Example: Using the table of random digits, find an SRS starting at line 149 for:
N = 80 (capital N represents the population size)
n = 9 (lower-case n represents the sample size)
1. With a population of 80, assign the labels 01-80 to the individuals.
2. Reading line 149 from left to right in two-digit groups (ignoring spaces), select the first 9 labels that fall in the range 01-80, skipping those outside the range and skipping repeats.
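The two SRS steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: `srs_labels` lets software do the random selection (Python's `random.sample` draws without replacement, so every set of n labels is equally likely), while the hypothetical helper `srs_from_digits` mimics reading a random-digit table in two-digit groups. The digit string used below is made up for illustration; it is NOT line 149 of the table.

```python
import random

def srs_labels(N, n, rng=None):
    """Return an SRS of n distinct labels from 1..N, chosen by software."""
    rng = rng or random.Random()
    # random.sample draws without replacement: every set of n labels is equally likely.
    return sorted(rng.sample(range(1, N + 1), n))

def srs_from_digits(digits, N, n):
    """Mimic a random-digit table: read two-digit groups left to right,
    keep labels 01..N, skip out-of-range values and repeats.
    Assumes two-digit labels, so N must be at most 99."""
    if not 1 <= N <= 99:
        raise ValueError("two-digit labels only cover N <= 99")
    digits = digits.replace(" ", "")  # ignore the spaces between digit groups
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= N and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

# Hypothetical digits (not the real line 149), with N = 80 and n = 9.
demo_digits = "45 81 12 45 07 93 66 30 12 58 80 02 77 19 88 41"
print(srs_from_digits(demo_digits, N=80, n=9))  # [45, 12, 7, 66, 30, 58, 80, 2, 77]
print(srs_labels(80, 9, random.Random(0)))
```

In the demo, 81 and 93 are skipped as out of range and the second 45 and second 12 are skipped as repeats, exactly as step 2 describes.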

Stratified Sample
SRSs are chosen from within homogeneous groups, called strata (gender, sport, grade, etc.). When each stratum is sampled in proportion to its size, each individual in the population has an equal chance of being chosen.

How to choose a stratified random sample:
1. Divide the population into homogeneous groups (strata); the researcher chooses the grouping variable.
2. Take an SRS from within each of these groups and combine them to form the sample.
*Groups should be weighted if they are of unequal sizes.

Example: Choose a stratified random sample of size 200 from the 1300 students of Pompano Beach HS.
1. We want to ensure that all classes are represented, so the stratified design makes the most sense. We will stratify according to class (grade level). There are 320 freshmen, 310 sophomores, 330 juniors, and 340 seniors.
2. Because the classes are of unequal sizes, it is NOT correct to simply take an SRS of size 50 from each class. We must weight the classes by their share of the 1300 students:
Freshmen: $\frac{320}{1300}(200) \approx 49$
Sophomores: $\frac{310}{1300}(200) \approx 48$
Juniors: $\frac{330}{1300}(200) \approx 51$
Seniors: $\frac{340}{1300}(200) \approx 52$
(49 + 48 + 51 + 52 = 200.)

Cluster Sample
Consists of ALL individuals from randomly chosen groups (clusters), selected in such a way that each individual has an equal chance of being chosen. Useful when the population is large or spread out; saves time and money.

How to choose a cluster random sample:
1. Divide the population into heterogeneous groups (clusters).
2. Randomly select a specified number of these groups.
3. Include EVERY individual in the selected groups to form the sample.

Example: Choose a cluster sample of students from study hall classes. Each study hall classroom is viewed as a cluster.
1. Randomly select 3 of the study hall classrooms.
2. Include ALL the students in these 3 classrooms to form the sample.
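The weighting and the two procedures above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the student rosters are just consecutive ID numbers, and the "10 study halls of 25 students" are invented for the demo. `proportional_allocation` reproduces the Pompano Beach HS weights (49, 48, 51, 52); note that ordinary rounding does not always make the allocations sum exactly to n, so it is worth checking the total.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.
    Ordinary rounding can leave the total off by one or two, so check the sum."""
    N = sum(strata_sizes.values())
    return {name: round(size / N * n) for name, size in strata_sizes.items()}

def stratified_sample(strata, allocation, rng):
    """Take an SRS of the allocated size from each stratum and combine them."""
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, allocation[name]))
    return sample

def cluster_sample(clusters, k, rng):
    """Randomly choose k whole clusters and include every member of each."""
    chosen = rng.sample(list(clusters), k)
    return [person for c in chosen for person in clusters[c]]

rng = random.Random(2024)
sizes = {"freshmen": 320, "sophomores": 310, "juniors": 330, "seniors": 340}
alloc = proportional_allocation(sizes, 200)
print(alloc)  # {'freshmen': 49, 'sophomores': 48, 'juniors': 51, 'seniors': 52}

# Hypothetical rosters: consecutive student ID numbers for each class.
strata, start = {}, 0
for name, size in sizes.items():
    strata[name] = list(range(start, start + size))
    start += size
sample = stratified_sample(strata, alloc, rng)
print(len(sample))  # 200

# Hypothetical study halls: 10 rooms of 25 students each.
halls = {f"room{r}": [f"room{r}-s{i}" for i in range(25)] for r in range(1, 11)}
print(len(cluster_sample(halls, 3, rng)))  # 75
```

Notice the design difference: the stratified sample draws some students from every class, while the cluster sample takes everyone from only a few rooms.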

Systematic Sample
The sample is chosen according to a fixed pattern (selecting every kth person from a list of the population). When N is a multiple of n, each individual has an equal chance of being chosen.

How to choose a systematic random sample:
1. Divide the population size by the sample size to get the interval k = N/n.
2. Choose an SRS of size 1 from 1 to k to determine the starting point.
3. Select every kth individual after the starting point.

Example: Choose a sample of size 4 from a population of size 100.
1. Divide the population size by the sample size: 100/4 = 25. Therefore, we will select every 25th individual.
2. Choose an SRS of size 1 to determine the starting point (a random number from 01 to 25, i.e., from 1 to k, not 1 to N). If we randomly select 3, then the first individual chosen is the 3rd.
3. Choose every 25th individual thereafter; the systematic random sample will include individuals 3, 28, 53, and 78.

Multi-Stage Sample
Multi-stage sampling combines two or more of the above random sampling methods. For example, the Gallup poll organization uses a multi-stage sampling procedure: it chooses an SRS of locations nationwide, stratifies by neighborhood, and then selects an SRS of households within each neighborhood.

Sampling variability
With the above methods of collecting a random sample, bias is avoided, and the value obtained from the sample, the statistic, can be used to make inferences about the true value for the population, the parameter. Sampling variability refers to the way the value of a statistic varies from sample to sample. Even though the statistic varies and will likely not equal the population parameter exactly, it is an unbiased estimator of the parameter because chance (randomization) was used to select the sample. Sampling variability occurs naturally due to chance.
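The three systematic-sampling steps translate directly into a short function. This is a sketch, not a standard library routine: `systematic_sample` is a hypothetical name, and it assumes N is a multiple of n (as in the 100/4 example) so that every label has the same 1/k chance. The optional `start` argument lets you reproduce the worked example exactly.

```python
import random

def systematic_sample(N, n, start=None, rng=None):
    """Systematic random sample of the labels 1..N.

    k = N // n is the sampling interval; the start is an SRS of size 1
    from 1..k (not 1..N), and every kth label after it is taken.
    Assumes N is a multiple of n so every label has chance 1/k."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)  # randint includes both endpoints
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return list(range(start, N + 1, k))

print(systematic_sample(100, 4, start=3))  # [3, 28, 53, 78]
print(systematic_sample(100, 4, rng=random.Random(42)))
```

Drawing the start from 1..N instead of 1..k would often run off the end of the list before n individuals are chosen, which is why step 2 restricts it.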
Sampling error: the difference between the sample statistic and the population parameter. It is expected, due to chance (the natural variability between samples), and occurs because only part of the population is measured.

Non-sampling error: occurs when data are improperly collected (see Non-Probability Sampling Designs below); due to human error.

Sample size
When using probability sampling methods, a larger sample size is also preferable. The more subjects in your sample, the less sampling variability and the more closely the statistic estimates the population parameter. This holds regardless of population size, as long as the population is much larger than the sample. For example, an SRS of size 500 represents a population of 100,000 about as well as it represents a population of 1,000,000. The statistics from many repeated samples form a sampling distribution; for an unbiased statistic, the mean of this sampling distribution equals the population parameter. (If the population is normally distributed, the sampling distribution of the sample mean is normal for any sample size; for other populations, a sample size of about 30 or more is a common rule of thumb for the sample mean to be approximately normal.)
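A quick simulation makes the sample-size claims concrete. This is a sketch under assumptions chosen for illustration: a hypothetical population in which 60% would answer "yes", 500 repeated SRSs per setting, and population sizes of 100,000 and 1,000,000. The spread (standard deviation) of the sample proportions is the sampling variability; it should shrink as n grows from 50 to 500, but barely change when the population grows tenfold.

```python
import random
import statistics

def sample_proportions(pop_size, n, p_true=0.6, reps=500, seed=1):
    """Draw `reps` SRSs of size n from a hypothetical population of pop_size
    people, of whom a fraction p_true would answer "yes"; return the sample
    proportions of "yes" answers."""
    rng = random.Random(seed)
    cutoff = int(p_true * pop_size)  # people labelled 0..cutoff-1 answer "yes"
    props = []
    for _ in range(reps):
        chosen = rng.sample(range(pop_size), n)  # SRS of labels, no replacement
        props.append(sum(1 for i in chosen if i < cutoff) / n)
    return props

for pop_size in (100_000, 1_000_000):
    for n in (50, 500):
        props = sample_proportions(pop_size, n)
        print(f"N={pop_size:>9,}  n={n:>3}  "
              f"mean={statistics.mean(props):.3f}  "
              f"spread (SD)={statistics.stdev(props):.3f}")
```

Theory suggests spreads of roughly $\sqrt{0.6 \cdot 0.4/50} \approx 0.07$ for n = 50 and $\sqrt{0.6 \cdot 0.4/500} \approx 0.02$ for n = 500, for either population size, with the means centered near 0.6.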

2. NON-PROBABILITY SAMPLING DESIGNS (bad: biased, not representative of the population)

Bad sampling designs (non-probability sampling designs) do not use chance and randomness and, therefore, produce biased samples that miss the truth about a population, repeatedly underestimating or repeatedly overestimating the value you are trying to measure.

Convenience Sampling: individuals who are easiest to reach, convenient for the researcher.
Example: Mall interviewing; a surveyor standing outside a stadium.

Voluntary Response: individuals select themselves; people with strong opinions usually make up the majority of the sample, and people can sometimes respond more than once.
Example: Online surveys; American Idol and Dancing with the Stars voting.

ADDITIONAL PROBLEMS/BIAS: issues that can result in inaccurate, unrepresentative answers and, therefore, unusable data.

UNDER-COVERAGE: not all individuals can be reached or selected for participation.
Example: Mailed surveys will not include homeless people, inmates, or students in dorms; telephone surveys won't reach those without a phone.

NON-RESPONSE BIAS: individuals selected for the sample cannot be contacted, refuse to participate, or do not answer.
Example: Mailed surveys are not returned; survey questions are left unanswered.

RESPONSE BIAS: any part of a survey that influences the responses of the individuals selected for the sample, intentionally or not; pressure on respondents to answer untruthfully.
- Wording bias (confusing or leading questions): individuals may misunderstand the question being asked, or the question may be worded to lead the respondent to a particular answer.
  Example: Asking, "How incompetent do you think (Politician X) is?"
- Sensitive topics: individuals may lie or give inaccurate information.
  Example: Questions involving religion, age, politics, personal/private issues, etc.
- Appearance of the interviewer: individuals may feel obligated to answer in a certain way.
  Example: A pregnant woman conducting interviews about maternity leave.
- Self-interest study (funding bias): results favor the publishers or companies that funded the study.
  Example: A study sponsored entirely by McArthur Dairy LLC to look at the health benefits of milk.

Considering the above problems that can arise when collecting information from individuals, one should read the survey questions before trusting research claims. We can infer about a population only if individuals were randomly selected.
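To see why voluntary response samples are biased rather than just noisy, here is a small simulation. Everything in it is a hypothetical assumption for illustration: 40% of the population would answer "no", and (by assumption) unhappy "no" voters volunteer their opinion with probability 0.30 while "yes" voters volunteer with probability 0.05. An SRS lands near the truth; the voluntary sample misses it in the same direction every time, no matter how many people respond.

```python
import random

def simulate(pop_size=100_000, p_no=0.4, resp_no=0.30, resp_yes=0.05,
             n=1000, seed=7):
    """Compare an SRS with a voluntary response sample in a hypothetical
    population. Returns (true share "no", SRS estimate, voluntary estimate)."""
    rng = random.Random(seed)
    # Hypothetical population: True means the person would answer "no".
    population = [rng.random() < p_no for _ in range(pop_size)]
    true_p = sum(population) / pop_size

    # SRS: every person is equally likely to be asked, and everyone answers.
    srs = rng.sample(population, n)
    srs_p = sum(srs) / n

    # Voluntary response: people decide for themselves whether to respond;
    # by assumption, "no" voters respond far more often than "yes" voters.
    volunteers = [x for x in population
                  if rng.random() < (resp_no if x else resp_yes)]
    vol_p = sum(volunteers) / len(volunteers)
    return true_p, srs_p, vol_p

true_p, srs_p, vol_p = simulate()
print(f"true share saying no: {true_p:.3f}")
print(f"SRS estimate:         {srs_p:.3f}")
print(f"voluntary response:   {vol_p:.3f}")
```

Under these assumed response rates the voluntary estimate settles near 0.12 / (0.12 + 0.03) = 0.8, about double the true 0.4; rerunning with other seeds does not fix this, because the error is systematic.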

C EXPERIMENTS

Observational studies that show a correlation between variables often prompt researchers to conduct experiments in order to determine whether one variable causes changes in the other. With a well-designed experiment, one can determine whether changes in the explanatory variable caused changes in the response variable. A major goal of experimenting is to generalize the results to a broader population; therefore, repeating experiments in a variety of settings is crucial. Because most experiments use volunteers rather than a random sample, one must remember to apply the results only to a population similar to the subjects used.

Treatment
In experiments, treatments are applied to experimental units. Experimental units can be people (known as subjects), animals, or objects. The explanatory variables are called factors, and each factor can be applied at different levels; a treatment is a specific experimental condition (a combination of one level of each factor) applied to the experimental units in order to observe the response variable. For example, consider an experiment to test a particular heart medication. The experiment has two explanatory variables (factors): dose, the amount of medication taken at one time, and dosage, the number of doses taken per day. These factors have different levels: two doses, 50 mg and 100 mg, and three dosages, 1, 2, or 3 times a day. In this case, there are 2 × 3 = 6 treatments: the six combinations of one level of each factor. The response variable is heart rate.

Principles of Experimental Design

1. Control (to reduce confounding and variability in the response variable)
Comparison (of two or more treatments, to reduce confounding): controlling other influences is also essential for proper experimental design. Identifying a cause-and-effect relationship requires that other factors, other explanations for the resulting responses, have been accounted for and reduced by researcher control. Confounding occurs when the effects of two or more variables on the response cannot be distinguished from one another.

Control, keeping other variables that might affect the response as similar as possible for all groups, is important because it helps reduce confounding and reduces variability in the responses. For example, in the heart medication experiment, we may want to control for the amount of activity/exercise in which the subjects engage. We can separate the subjects into two groups based on activity level so that the effects of the medication and of activity level will not be confounded.
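Listing the treatments in a multi-factor experiment is just forming every combination of one level from each factor. A minimal sketch using `itertools.product` from Python's standard library, with the heart-medication levels from the example (the dictionary keys are hypothetical names chosen for readability):

```python
from itertools import product

# Levels of each factor in the heart-medication example.
factors = {
    "dose_mg": [50, 100],
    "times_per_day": [1, 2, 3],
}

# Each treatment takes exactly one level of every factor.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for t in treatments:
    print(t)
print(len(treatments), "treatments")  # 6 treatments
```

Adding a third factor (or another level) multiplies the count, which is why multi-factor experiments quickly need many more experimental units.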

Grouping subjects who are similar in ways that might affect the response to treatment is called blocking. Blocking limits confounding and reduces variability within groups. Without control, it may be difficult to attribute a change in response to the treatment as opposed to other factors (confounding variables). Just as random selection reduces the chance of bias in a sample, a well-designed comparative experiment helps prevent confounding. At least two treatment groups must be compared to distinguish the effect of a treatment from no treatment (or from another treatment). A control group receives either no treatment/an inactive treatment (a placebo) or an existing treatment (such as a standard medication), and provides baseline data for comparison.

The placebo effect is a response to receiving a treatment, even an inactive one, that comes from the subject's expectations rather than from the treatment itself. The placebo effect, along with experimenters' biased interpretation of subjects' responses, can be counteracted by blinding, in which subjects and/or experimenters do not know which group (control or treatment) each subject is in. There are two types:
- Single-blind: either the subjects do not know which treatment they are receiving OR the experimenters do not know which subjects are receiving which treatment.
- Double-blind: neither the subjects NOR the experimenters know which subjects are receiving which treatment.

2. Randomization
Use chance (randomization) to assign treatments to the experimental units in order to reduce bias and to minimize the effects of other, lurking (background) variables. Random assignment spreads the unknown, uncontrollable differences among subjects so that factors other than the treatment itself that might affect the results are, on average, equally dispersed among the treatment groups. Because the groups are then comparable before treatment, differences in their responses can be attributed to the treatments or to chance.

3. Replication
Replicate each treatment on many experimental units to reduce chance variation in the results. The more subjects in the sample, the more apparent any real differences in response become. An observed difference is judged to be either statistically significant or explainable by chance (natural variation): when the observed result is so large that it would rarely occur by chance alone, it is considered statistically significant. A statistically significant association in data from a well-designed experiment implies causation.
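Randomization and "would rarely occur by chance alone" can both be illustrated with a few lines of Python. This is a sketch, not the notes' prescribed procedure: `randomly_assign` shuffles the units and deals them into groups (a completely randomized assignment), and `permutation_p_value` is a randomization (permutation) test, one common way to measure how often chance alone would produce a difference as large as the one observed. Both function names are hypothetical, and the response values are made up purely for illustration.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def randomly_assign(units, n_groups, rng):
    """Completely randomized assignment: shuffle the units, then deal them
    into n_groups groups whose sizes differ by at most one."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return [shuffled[g::n_groups] for g in range(n_groups)]

def permutation_p_value(treat, control, reps=10_000, rng=None):
    """Randomization test: re-randomize the pooled responses many times and
    return the fraction of re-randomizations whose difference in means is at
    least as large (in absolute value) as the observed difference."""
    rng = rng or random.Random()
    observed = abs(mean(treat) - mean(control))
    pooled = list(treat) + list(control)
    k = len(treat)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            hits += 1
    return hits / reps

rng = random.Random(11)
groups = randomly_assign([f"subject{i}" for i in range(1, 21)], 2, rng)
print([len(g) for g in groups])  # [10, 10]

# Made-up responses, for illustration only.
treat = [12, 15, 14, 16, 13, 17, 15, 16]
control = [10, 11, 12, 9, 11, 10, 12, 11]
print(permutation_p_value(treat, control, rng=rng))
```

A small value (here, well under 0.01) means random assignment alone almost never produces a gap this large, so the difference would be called statistically significant. Note that the test works by re-randomizing the assignment, which only makes sense because the treatments were randomly assigned in the first place.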

Types of Experimental Design

Completely Randomized design: all experimental units are assigned to the treatments completely at random, producing groups of similar subjects; influences other than the treatments operate equally on all groups, so differences in responses are presumed to be due to the treatments (or to chance).

Randomized Block design: the experimenter first forms blocks (for example, by age, gender, or race), and random assignment of treatments is then carried out separately within each block; allows explicit control of sub-groups and allows results to be generalized by characteristic (age, gender, race, etc.).
- Matching/Matched-Pairs design: used when the experiment has only two treatment conditions and participants can be grouped into pairs based on some blocking variable.
  2 subjects per pair (twins, husband and wife, ...): each subject receives one of the two treatments; randomization is used within each pair to determine who receives which treatment.
  1 subject: the subject receives both treatments separately; randomization determines the order in which the treatments are given (take a pre-/post-test; complete a puzzle with/without music, ...).

We can infer about cause and effect only if treatments were randomly assigned.
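The block and matched-pairs designs differ from a completely randomized design only in where the randomization happens. A minimal sketch, with hypothetical function names and made-up subject labels: `randomized_block_assignment` shuffles within each block separately (here, low- and high-activity blocks, echoing the heart-medication example), `matched_pairs_assignment` flips a coin within each pair, and `treatment_order` randomizes the order for the one-subject version.

```python
import random

def randomized_block_assignment(blocks, treatments, rng):
    """Randomized block design: within each block separately, shuffle the
    members and deal them out to the treatments in turn."""
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

def matched_pairs_assignment(pairs, treatments, rng):
    """Two subjects per pair: a coin flip decides which member of each pair
    receives which of the two treatments."""
    a, b = treatments
    result = []
    for first, second in pairs:
        if rng.random() < 0.5:
            result.append({first: a, second: b})
        else:
            result.append({first: b, second: a})
    return result

def treatment_order(subjects, treatments, rng):
    """One subject per 'pair': each subject receives both treatments, and
    chance decides which comes first."""
    return {s: rng.sample(treatments, 2) for s in subjects}

rng = random.Random(3)
blocks = {"low_activity": [f"L{i}" for i in range(6)],
          "high_activity": [f"H{i}" for i in range(6)]}
print(randomized_block_assignment(blocks, ["drug", "placebo"], rng))
print(matched_pairs_assignment([("twinA1", "twinA2"), ("twinB1", "twinB2")],
                               ["drug", "placebo"], rng))
print(treatment_order(["s1", "s2"], ["music", "silence"], rng))
```

Because assignment happens inside each block, every block ends up with the same number of subjects on each treatment, so block membership (activity level) cannot be confounded with the treatment.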

D DATA ISSUES

Complex issues of data ethics arise when we collect data from people.
- All planned studies must be reviewed in advance by an institutional review board (IRB) (or independent ethics committee) that is charged with protecting the safety and well-being of the subjects.
- All individuals who are subjects in a study must give their informed consent before data are collected.
- All individual data must be kept confidential. Only statistical summaries for groups of subjects may be made public.

Observational studies to show causal relationships
Sometimes it is not practical or ethical to conduct experiments, for example, to test whether smoking causes lung cancer or whether texting while driving increases the risk of accidents. Sometimes we can establish causation from observational data using these criteria:
- A strong association that is consistent across many studies
- Larger values of the explanatory variable associated with stronger responses
- The alleged cause is plausible and precedes the effect in time

Observational studies & experiments for inference
Once again: We can infer about a population only if individuals were randomly selected. We can infer about cause and effect only if treatments were randomly assigned.