I. Introduction and Data Collection
B. Sampling
In this section: Bias, Random Sampling, Sampling Error

1. Bias

Bias: a prejudice in one direction. Bias occurs when the sample is selected in a way that favors elements with a particular characteristic. A biased statistic tends to systematically overestimate or underestimate the population value we are trying to estimate. We can eliminate (or at least minimize) bias by using proper sampling techniques.

Samples that include bias

Convenience sampling: uses results or data that are conveniently and readily obtained. Data from this type of sample runs the risk of being severely biased and should not be generalized to the overall population (it cannot be used to conduct inferential statistics). Descriptive statistics can still be calculated for the data; however, these numbers describe only the sample, so they are of little practical use for making inferences about a larger population.

Voluntary response sample: a common convenience sample in which subjects volunteer. This kind of sample typically over-represents people with strong opinions.

Examples

1. Restaurant comment cards: Who fills out restaurant comment cards? If a manager looks at the cards at the end of the day, will they accurately portray the service the restaurant provided that day? People are most likely to fill out these cards if they were unhappy with their service, and they may fill them out if they were extremely happy. It is uncommon for someone to fill one out just to say the service was average. As with most volunteer samples, people with strong opinions are much more likely to be included in the sample. Therefore, this data should never be extended to the population to determine the level of service at the restaurant. This does not mean the data is meaningless: more than anything, it gives a sense of how many customers were unhappy with their service. Obviously, 2 negative comment cards are better than 20.
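
To make the comment-card argument concrete, here is a small Python sketch. Every number in it is hypothetical: the ratings of one day's 200 customers (`customers`) and the chance that a customer with each rating fills out a card (`p_card`) are assumptions chosen only to mimic the pattern described above, where unhappy and very happy customers respond far more often than average ones. Exact fractions compare the share of each kind of customer in the population with their share among the expected cards.

```python
from fractions import Fraction

# Hypothetical: customers at each satisfaction rating (1 = very unhappy, 5 = very happy)
customers = {1: 10, 2: 20, 3: 100, 4: 50, 5: 20}

# Hypothetical: chance that a customer with each rating fills out a comment card
p_card = {1: Fraction(6, 10), 2: Fraction(2, 10), 3: Fraction(2, 100),
          4: Fraction(1, 10), 5: Fraction(4, 10)}

# Expected number of cards written at each rating
expected_cards = {r: customers[r] * p_card[r] for r in customers}

def share(counts, ratings):
    """Exact fraction of the total that falls in the given ratings."""
    total = sum(counts.values())
    return Fraction(sum(counts[r] for r in ratings), total)

print("average (3) among customers:", share(customers, [3]))
print("average (3) among cards:    ", share(expected_cards, [3]))
print("extreme (1 or 5) among customers:", share(customers, [1, 5]))
print("extreme (1 or 5) among cards:    ", share(expected_cards, [1, 5]))
```

With these assumed numbers, half of the customers rated their service as average, but only 2 of the 25 expected cards do, while the extreme ratings (1 or 5) grow from 3/20 of the customers to 14/25 of the cards.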

2. Medical volunteers: When testing a drug, it is very common to begin testing on volunteers because that is the only ethical option; you cannot force people to take a drug. To illustrate, suppose we have a new treatment for a certain type of cancer. Who is most likely to volunteer for this experimental drug? The most likely volunteers are individuals with the most severe cases, who have already tried traditional treatments and are out of other options. Testing drugs on a sample that is not representative of the population is problematic, and the use of volunteer samples is part of the reason drugs are sometimes pulled off the shelf after a couple of years.

2. Random Sampling

For a sample to be useful when conducting inferential statistics, it needs to be representative of the population. "Representative" is the common term for a sample whose results can be extended to the population.

Random sample: a sample determined completely by chance. This is one way to get a representative sample. A random sample will typically be similar to the population with respect to demographic and other variables. Also, we can control the probability of making a mistake (the probability of error).

For illustration, suppose we are interested in estimating the average income of residents of the state of Kentucky. If we sample 1000 males from Lexington, would the sample be representative of the population? No. The sample would be biased and would tend to overestimate average income, because males tend to earn more on average than females, and people from Lexington tend to earn more than the state average. What we want is for the sample to mirror the population: if 50% of the population is male, then about 50% of the sample should be male, and if 10% of the population is from Lexington, then about 10% of the sample should be from Lexington.
There are countless other variables on which we would want the sample to resemble the population, and a random sample takes care of all of them at once. Keep in mind that this is not a guarantee. However, if 10% of the population is from Lexington and we take a random sample, the probability is high that roughly 10% of our sample will be from Lexington. More generally, the probability is high that the sample is similar to the population on all demographic and other variables. Just as important, we can calculate the probability of getting a good sample versus a bad one; we will look at these calculations later in the semester.

Finally, we can make this probability higher by taking a larger sample. It works much like flipping a coin. If you flip a coin 10 times, what is the probability of getting tails about 50% of the time? What if you flip it 50 times? The important point is that the more times you flip the coin, the more likely you are to get a percentage close to the population value of 50%.
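
The coin-flip question can be answered exactly with the binomial distribution. The short Python sketch below assumes that "about 50% of the time" means a share of tails between 40% and 60% inclusive; that window (`low_pct`, `high_pct`) is our own choice, not something fixed by the text. It adds up the binomial counts for all outcomes in the window and divides by the 2^n equally likely flip sequences.

```python
from fractions import Fraction
from math import comb

def prob_tails_near_half(n: int, low_pct: int = 40, high_pct: int = 60) -> Fraction:
    """Exact P(low_pct% <= share of tails <= high_pct%) for n fair coin flips."""
    # k tails is inside the window when low_pct/100 <= k/n <= high_pct/100;
    # cross-multiplying keeps the comparison in exact integer arithmetic.
    favorable = sum(comb(n, k) for k in range(n + 1)
                    if low_pct * n <= 100 * k <= high_pct * n)
    return Fraction(favorable, 2 ** n)

for n in (10, 50, 500):
    p = prob_tails_near_half(n)
    print(f"{n:>4} flips: P(40%-60% tails) = {float(p):.4f}")
```

For 10 flips the window covers 4, 5, or 6 tails, giving 672/1024 (about 0.66); the printed probabilities grow as n increases, which is exactly the pattern the coin analogy describes.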

Types of Random Samples

1) Simple random sample: a sample of n measurements selected from a population in such a manner that every sample of size n from the population has an equal probability of being selected (n is the standard notation for sample size). Everything we do in this course will be based on a simple random sample. This is the most common method and is preferred in most cases.

Suppose the students in this class are my population. For illustration, assume we are in a classroom with 5 rows and 10 students in each row. I want a sample of 10 students, so I randomly select one row as my sample. Is this a simple random sample? Here you must be careful about how you think of the probability. The mistake people tend to make is asking whether each person has an equal chance of being selected. For this scenario the answer is yes: every person has a one-in-five chance of being selected, since 1 of the 5 rows is chosen. The question that should be asked is whether every sample of size 10 has an equal chance of being selected, and the answer is no. You can only be in a sample with the people in your row; the probability of being in a sample with someone from another row is zero. So not every sample has an equal probability of being selected, and this is not a simple random sample. Whenever you group individuals and select a group or groups, you no longer satisfy the definition of a simple random sample. To actually get a simple random sample, you must do something equivalent to drawing names out of a hat; then every possible sample has an equal probability of being selected.

I want to introduce some other methods of random sampling, even though we will not analyze data coming from these types of samples, because there are times when a simple random sample is not the best choice.
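
The classroom example can be checked with a short Python sketch. It assumes students are numbered 0 to 49, with student `i` sitting in row `i // 10`; `row_sample` and `simple_random_sample` are hypothetical helpers written for this illustration. Both designs give every student the same one-in-five chance of selection, but they differ sharply in which samples are possible.

```python
import random
from fractions import Fraction
from math import comb

ROWS, PER_ROW, N = 5, 10, 10            # classroom from the text: 5 rows of 10, sample of 10
STUDENTS = list(range(ROWS * PER_ROW))  # hypothetical ids; student i sits in row i // PER_ROW

def row_sample(rng: random.Random) -> list[int]:
    """Pick one whole row at random (NOT a simple random sample)."""
    r = rng.randrange(ROWS)
    return STUDENTS[r * PER_ROW:(r + 1) * PER_ROW]

def simple_random_sample(rng: random.Random) -> list[int]:
    """'Names out of a hat': every set of N students is equally likely."""
    return rng.sample(STUDENTS, N)

# Each individual student has the same inclusion chance under both designs ...
inclusion_row = Fraction(1, ROWS)
inclusion_srs = Fraction(N, len(STUDENTS))
# ... but the designs differ in how many distinct samples are possible.
possible_row_samples = ROWS
possible_srs_samples = comb(len(STUDENTS), N)
# Chance that students 0 and 10 (different rows) are selected together
together_row = Fraction(0)
together_srs = Fraction(N, len(STUDENTS)) * Fraction(N - 1, len(STUDENTS) - 1)

print(inclusion_row, inclusion_srs)  # both 1/5
print(possible_row_samples, possible_srs_samples)
print(together_row, together_srs)
rng = random.Random(0)
print(sorted(simple_random_sample(rng)))
print(row_sample(rng))
```

Row selection can only ever produce 5 different samples, while drawing names out of a hat can produce any of C(50, 10) = 10,272,278,170, and two students from different rows end up together with probability 9/245 instead of 0.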
2) Stratified sampling: the population is divided into at least two distinct strata (groups), and then a (simple) random sample of a chosen size is drawn from each stratum. Suppose we define our population as all surgeons in the United States, and our research question compares male and female surgeons on job satisfaction. The problem with a simple random sample here comes from the makeup of the population: about 75% of the population is male and only 25% is female. A simple random sample of 100 surgeons would therefore typically contain about 75 males and 25 females. Since our purpose is to compare the groups, this is not ideal; we would prefer equal numbers of males and females. A stratified sample accomplishes this. We stratify on gender, meaning we separate the males and the females, and then sample 50 male surgeons and 50 female surgeons. This is a much better option.
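
A stratified sample can be sketched in a few lines of Python. The frame of 750 male and 250 female surgeons is hypothetical (it only reproduces the 75%/25% split from the example), and `stratified_sample` is an illustrative helper, not a library function: it groups the frame by stratum and draws a simple random sample of the requested size within each stratum.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(frame, stratum_of, sizes, rng):
    """Group the frame by stratum, then draw a simple random sample of the requested size from each."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for stratum, n in sizes.items():
        sample.extend(rng.sample(strata[stratum], n))
    return sample

# Hypothetical frame of 1,000 surgeons with the 75% / 25% split from the example
surgeons = [("male", i) for i in range(750)] + [("female", i) for i in range(250)]

rng = random.Random(2024)
sample = stratified_sample(surgeons, lambda s: s[0], {"male": 50, "female": 50}, rng)
print(Counter(gender for gender, _ in sample))
```

Because the sizes are fixed per stratum, the sample always contains exactly 50 surgeons of each gender, whereas a simple random sample of 100 from this frame would only average 75 and 25.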

3) Systematic sampling: select a (random) starting point and then select every kth person (for example, every 4th) to be part of the sample. This only really works if the list is in random order to begin with. Lists are almost always in some order, and if there is any pattern to that order, the sample is no longer random, so you must be careful when using a systematic sample. This technique is mainly used to save time and money.

I once worked on a project that studied inmates in the South Carolina prison system. There are three major prisons in the state, so they made up our population. We could get information on inmates, but only in hard-copy form, as we were not given access to the prisons' database. To get the needed sample size, we decided to take the 7th person (7 being a randomly chosen number) from each page of the hard-copy inmate lists. Further information was then collected for those individuals in order to perform the analysis. The one problem in this study is that the names were in alphabetical order, which could call the randomness of the sample into question.

4) Cluster sampling: divide the population into sections (clusters) and then randomly select some of the clusters. Depending on the size of the clusters, every member or a random sample of the members of each selected cluster may be used. Suppose we were interested in the population of elementary school children in New York City. This is a complicated situation because the city has many elementary schools; for illustration, assume there are about 200. A simple random sample would probably include children from all 200 schools, which would be time-consuming and expensive. This is the perfect situation for a cluster sample. The clusters are already set, since each school can serve as a cluster. First, we randomly select a certain number of schools, say 30. Then we go to each of those 30 schools and collect data from either all of the students or a sample of students within each school.
In this case, the time and money saved would be worth dealing with a more complicated sample. It is still a random sample and we can still calculate the probability of error; the mathematics of the analysis is just more difficult.
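
Systematic and cluster sampling can be sketched the same way. In this Python sketch, `systematic_sample` and `cluster_sample` are hypothetical helpers, and the frame of 1,000 units and the 200 schools with 100 students each are made-up numbers chosen to echo the examples above. The systematic sampler is only as random as the order of `frame`: if the list has a pattern, such as the alphabetical order in the prison example, every kth unit inherits it.

```python
import random

def systematic_sample(frame, k, rng):
    """Random start among the first k units, then every kth unit after it."""
    start = rng.randrange(k)
    return frame[start::k]

def cluster_sample(clusters, n_clusters, rng, per_cluster=None):
    """Randomly choose whole clusters; keep all members or a random subset of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None or per_cluster >= len(members):
            sample.extend(members)  # take every member of the cluster
        else:
            sample.extend(rng.sample(members, per_cluster))  # subsample within the cluster
    return chosen, sample

rng = random.Random(42)

# Hypothetical frame of 1,000 units: every 4th unit after a random start gives 250 units
frame = list(range(1000))
print(len(systematic_sample(frame, 4, rng)))

# Hypothetical city: 200 schools with 100 students each; pick 30 schools, 20 students per school
schools = {f"school_{i:03d}": [f"school_{i:03d}_student_{j}" for j in range(100)]
           for i in range(200)}
chosen, students = cluster_sample(schools, 30, rng, per_cluster=20)
print(len(chosen), len(students))
```

Only the 30 chosen schools need to be visited, which is where the time and money savings come from; the price is that students within the same school tend to be more alike, which complicates the analysis.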

3. Sampling Error

The value of a statistic depends on which items are selected for the sample. When we take a random sample, a statistic is a random variable: we are randomly selecting one value of the statistic out of all its possible values. This is what allows us to calculate probabilities. In this class we will look at the probability of a random variable and then extend that to a statistic.

Variability: the spread of a set of values.
Sampling variability: how much a statistic varies from sample to sample.

We want the possible values of a statistic to be free of bias and to have a small amount of variability. To eliminate bias in a statistic, we should use random sampling. We can control the variability of a statistic because a larger sample produces less variation.

Margin of error: from the sampling variability we can calculate the margin of error. We can say with 95% confidence that the amount by which a proportion obtained from a sample differs from the population proportion will not exceed $1/\sqrt{n}$.

Example: Suppose you take a random sample of 1600 adults and 800 enjoy amusement park rides.
1. What is the sample proportion of adults who enjoy amusement park rides?
2. What is the margin of error?
3. Give a confidence statement for the proportion of all adults who enjoy amusement park rides.

Answers
1. $\hat{p} = 800/1600 = 0.5 = 50\%$
2. $1/\sqrt{n} = 1/\sqrt{1600} = 1/40 = 0.025 = 2.5\%$
3. We are 95% confident that the population proportion of adults who enjoy amusement park rides is between 47.5% and 52.5%.
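
The amusement-park answers can be reproduced in a few lines of Python. The $1/\sqrt{n}$ rule is a conservative shortcut for a 95% margin of error on a proportion: since $\hat{p}(1-\hat{p}) \le 1/4$, the usual large-sample margin $1.96\sqrt{\hat{p}(1-\hat{p})/n}$ is at most $0.98/\sqrt{n}$, which rounds to $1/\sqrt{n}$. The function names below are our own, and `moe_95` is included only for comparison.

```python
import math

def conservative_moe(n: int) -> float:
    """Shortcut from the text: 95% margin of error for a proportion is about 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def moe_95(p_hat: float, n: int, z: float = 1.96) -> float:
    """Large-sample formula z * sqrt(p_hat * (1 - p_hat) / n), for comparison."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

n, successes = 1600, 800
p_hat = successes / n
margin = conservative_moe(n)
low, high = p_hat - margin, p_hat + margin

print(f"sample proportion: {p_hat:.1%}")
print(f"margin of error:   {margin:.1%}")
print(f"95% interval:      {low:.1%} to {high:.1%}")
print(f"1.96 formula:      {moe_95(p_hat, n):.2%}")
```

Here the shortcut gives 0.025 and the interval 0.475 to 0.525, matching the answers above; the 1.96 formula gives about 0.0245, slightly smaller, as expected from a conservative rule.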

Besides sample size, other factors affect the margin of error: how confident we want to be in our conclusion, and how much variability is in our data. The effect of these factors can always be offset by a large enough sample size, and they will be discussed in detail later in the course.

Sampling errors: errors that result from sampling.

Random sampling error: the error that comes from taking a random sample. We expect this error and can calculate it. The margin of error covers only random sampling error; other sources of error cause bias.

Bad sampling methods: any deviation from a random sample is a bad sampling method, which causes bias and increases the error in our estimate.

Sampling frame: a list of every individual in the population.

Under-coverage: when some groups in the population are left out of the sample.

Suppose your population is all Lexington residents and you are going to conduct phone surveys. A list of all residents of Lexington would be your sampling frame. If you use a phone book to make the calls, the result is under-coverage, because not every Lexington resident is listed in the phone book. This is an example of a bad sampling method.
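
The point that the margin of error covers only random sampling error can be seen in a small simulation. Everything in this Python sketch is hypothetical: a town of 10,000 residents, 2,000 of whom are not in the phone book, and a yes/no trait that is more common among unlisted residents. Sampling only from the phone-book frame targets the listed residents' rate (30%) rather than the population rate (38%).

```python
import random

# Hypothetical town: 1 = resident has the trait we want to estimate, 0 = does not
listed = [1] * 2400 + [0] * 5600    # 8,000 residents in the phone book, 30% with the trait
unlisted = [1] * 1400 + [0] * 600   # 2,000 residents not in the phone book, 70% with the trait
population = listed + unlisted

true_share = sum(population) / len(population)  # population value we want: 0.38
frame_share = sum(listed) / len(listed)         # value the phone-book frame can reach: 0.30

def estimate_from_frame(frame, n, rng):
    """Proportion with the trait in a simple random sample of size n drawn from the frame."""
    return sum(rng.sample(frame, n)) / n

rng = random.Random(7)
for n in (100, 1000, 5000):
    est = estimate_from_frame(listed, n, rng)
    print(f"n={n:>5}: estimate {est:.3f}  (frame {frame_share:.2f}, population {true_share:.2f})")
```

As n grows the estimates settle more tightly around 0.30, but that extra precision never moves them toward 0.38: the gap is bias from the frame, not random sampling error, so a larger sample cannot remove it.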