Principle underlying all of statistics

Similar documents
MATH& 146 Lesson 4. Section 1.3 Study Beginnings

Math 124: Module 3 and Module 4

Bias in Sampling. MATH 130, Elements of Statistics I. J. Robert Buchanan. Fall Department of Mathematics

Math 124: Modules 3 and 4. Sampling. Designing. Studies. Studies. Experimental Studies Surveys. Math 124: Modules 3 and 4. Sampling.

MATH-134. Experimental Design

Introduction to Statistics

Introduction; Study design

STATISTICS: METHOD TO GET INSIGHT INTO VARIATION IN A POPULATIONS If every unit in the population had the same value,say

Confidence Intervals and Sampling Design. Lecture Notes VI

Math 140 Introductory Statistics

Data = collections of observations, measurements, gender, survey responses etc. Sample = collection of some members (a subset) of the population

Sampling and Data Collection

Sampling Controlled experiments Summary. Study design. Patrick Breheny. January 22. Patrick Breheny Introduction to Biostatistics (BIOS 4120) 1/34

Study Design. Study design. Patrick Breheny. January 23. Patrick Breheny Introduction to Biostatistics (171:161) 1/34

CHAPTER 2 SOCIOLOGICAL RESEARCH METHODS

Oncology Clinical Research & Race: Statistical Principles

Population. population. parameter. Census versus Sample. Statistic. sample. statistic. Parameter. Population. Example: Census.

BIAS: The design of a statistical study shows bias if it systematically favors certain outcomes.

THE ARAB AMERICAN VOTE September 27, 2012

ELECTION NIGHT 2018 HOT SHEET

For each of the following cases, describe the population, sample, population parameters, and sample statistics.

How are polls conducted? by Frank Newport, Lydia Saad, David Moore from Where America Stands, 1997 John Wiley & Sons, Inc.

Chapter 7. Sampling Techniques

Probability Models for Sampling

Arab American Voters 2014

Sampling Reminders about content and communications:

Collecting Data Example: Does aspirin prevent heart attacks?

THIS CHAPTER COVERS: The importance of sampling. Populations, sampling frames, and samples. Qualities of a good sample.

Political Science 15, Winter 2014 Final Review

Problems for Chapter 8: Producing Data: Sampling. STAT Fall 2015.

Ch. 1 Collecting and Displaying Data

TIPSHEET QUESTION WORDING

If you could interview anyone in the world, who. Polling. Do you think that grades in your school are inflated? would it be?

P. 266 #9, 11. p. 289 # 4, 6 11, 14, 17

Lecture (chapter 1): Introduction

Homework #2 is due next Friday at 5pm.

Polling. Polling. History of Polling

OPIOID USE IN NEW JERSEY: PUBLIC FAVORS TREATMENT MORE THAN PENALTIES

[POLS 4150] R, Randomization and Sampling

Sampling for Success. Dr. Jim Mirabella President, Mirabella Research Services, Inc. Professor of Research & Statistics

North Carolina Survey Results

Why do Psychologists Perform Research?

Illusory Correlation

Health Care Callback Survey Topline August 2001

Majority think marijuana dispensaries should be allowed in Toronto

BIOSTATS 540 Fall 2018 Exam 2 Page 1 of 12

Stat 13, Intro. to Statistical Methods for the Life and Health Sciences.

THE PUBLIC AND GENETIC EDITING, TESTING, AND THERAPY

Chapter Problem. Why was the Literary Digest poll so wrong?

Diageo/The Hotline Poll Conducted By:

UTAH VOTERS FAVOR RAISING THE LEGAL AGE FOR THE SALE OF TOBACCO TO AGE 21.

8.2 Relative Frequency

Establishing Causality Convincingly: Some Neat Tricks

WEDNESDAY JUNE 20, 2018

Why randomize? Rohini Pande Harvard University and J-PAL.

Chapter 1: Introduction to Statistics

"Everything you need to know about Ethics Iowa Municipal Management Institute March Martha Perego ICMA Director of Ethics

People have used random sampling for a long time

Handout 16: Opinion Polls, Sampling, and Margin of Error

Serbia Tracking Poll Fourth Wave. Prepared By Penn, Schoen & Berland Associates Inc. September 20, 2000

Statistics 13, Midterm 1

ZIKA VIRUS AND THE ELECTION SEASON

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Research Methods Posc 302 ANALYSIS OF SURVEY DATA

Proposal: To remove organs from imminently dying patients before circulatory death.

Creative Commons Attribution-NonCommercial-Share Alike License

SURVEY RESEARCH. Topic #9. Measurement and assessment of opinions, attitudes, etc. Usually by means of questionnaires and sampling methods.

SAMPLING AND SAMPLE SIZE

RECOMMENDED CITATION: Pew Research Center, October 2014, Public Divided over Whether Secret Service Lapses Signal Broader Problems

Unit 5. Thinking Statistically

Jeffrey leads in Brampton

Patrick Breheny. January 28

TOPIC: Introduction to Statistics WELCOME TO MY CLASS!

Majority approve of legalized, regulated, taxed cannabis

Serbia Election Tracking Polls. Prepared By Penn, Schoen & Berland Associates Inc. September 1, 2000

The Logic of Causal Order Richard Williams, University of Notre Dame, Last revised February 15, 2015

Lecture 1. Benefits of Using Statistics

Draft 0-25 special educational needs (SEN) Code of Practice: young disabled people s views

FUNDRAISING HELP PACK. Registered Charity

1.1 Goals and Learning Objectives. 1.2 Basic Principles. 1.3 Criteria for Good Measurement. Goals and Learning Objectives

Oregon Political Issues N = 618 Active Voters April 28 30

A Probability Puzzler. Statistics, Data and Statistical Thinking. A Probability Puzzler. A Probability Puzzler. Statistics.

GENDER AND RESPONSE EFFECTS IN A PRE-ELECTION POLL: ILLINOIS 1992

Review+Practice. May 30, 2012

3/12/2011. You Know The Story. What is missing? Balance, Fairness and Bias

STA 291 Lecture 4 Jan 26, 2010

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests

Mentoring. Awards. Debbie Thie Mentor Chair Person Serena Dr. Largo, FL

FAQ: Heuristics, Biases, and Alternatives

Majority approve of legal marijuana

How do you know if a newspaper article is giving a balanced view of an issue? Write down some of the things you should look for.

Theory Building and Hypothesis Testing. POLI 205 Doing Research in Politics. Theory. Building. Hypotheses. Testing. Fall 2015

Section Introduction

How Bad Data Analyses Can Sabotage Discrimination Cases

AP Statistics Hypothesis Testing and Confidence Intervals Take Home Test 40 Points. Name Due Date:

Majority approve of grow your own ruling

Objectives. Data Collection 8/25/2017. Section 1-3. Identify the five basic sample techniques

CHANCE YEAR. A guide for teachers - Year 10 June The Improving Mathematics Education in Schools (TIMES) Project

(happiness, input, feedback, improvement)

THE OPIOID CRISIS: AMERICANS FAVOR COMPASSION AND TREATMENT FOR ADDICTS OF ALL STRIPES

STA Module 1 The Nature of Statistics. Rev.F07 1

Transcription:

Sampling Bias

Principle underlying all of statistics Drawing inferences from a suitable sample of a population is far less labor intensive than, but can be equally as informative as, studying the entire population. What s the catch? A suitable sample must look like the population from which it is drawn, which is easier said than done. Also, bigger is better. An unsuitable sample is usually biased in some way.

iclicker Q: Sampling Bias Who won the 1948 U.S. Presidential Election? A: Harry Truman B: Thomas Dewey C: Someone Else Image source

The News Got it Wrong Because of a strike at the paper, the Chicago Tribune had printed the papers before the election results became official. Almost every pre-election poll had Dewey winning, so the paper assumed a Dewey victory. The Tribune was not the only paper to make this mistake. The Journal of Commerce had 8 articles about what to expect from the Dewey administration the day after the election. Errors in how the pre-election polls were conducted led to the wrong prediction. Pre-election polls were not a random sample of voters; Republicans were more likely to appear in pre-election polls than Democrats. Most of these polls were conducted by telephone; Republicans were more likely to have telephones than Democrats, and were thus overrepresented in the sample.

Types of Bias

Selection Bias When each member of the relevant population does not have an equal chance of being selected for the study. Iowa straw poll (now defunct!) Republican candidates descend upon Ames, IA in a year before an election Both a poll and a fundraiser: it costs $30 to cast your vote Very poor predictor of future Republican candidate

Self-Selection Bias Participants choose whether or not to participate, so the sample biases towards the types of people that opt in Examples: Surveys being conducted at airports or highway rest areas What would bias someone to participate? Extra time Strong opinions Incentives are not equally appealing to all How can studies avoid self-selection bias? Thorough outreach; no barriers to participation Carefully designed incentives Opt out system

Exclusion Bias Researchers cannot sample some subgroups of the population: e.g., We cannot reach Brown students who don t read Morning Mail. Phone surveys cannot reach people without phones. Government records exclude illegal immigrants. Exclusion bias is the opposite of selection bias because you cannot self select if you are not presented with the opportunity to do so. Not necessarily motivated by an intent to discriminate/gather biased data; it s just much more difficult to reach people without phones, or email, etc.

Famous Examples of Exclusion Bias Literary Digest Poll of 1936 FDR or Alf Landon (Republican) Ten million subscribers / prospective voters! Prediction: Landon over FDR, with 57% of the popular vote Actual: FDR over Landon with 60% of the popular vote Problem: Magazine subscribers were wealthy Americans Wealthier Americans lean Republican Dewey vs. Truman (1948): same problem; they didn t learn!

Survivorship Bias Rather than sample everyone, sample only those that survived a treatment. Examples: Studying the effects of a medicine only in patients that survived the clinical trials. Studying test scores in high-school students, and claiming higher scores over the years (each year more and more high school dropouts are excluded). Studying mutual funds over the last ten years, but not including mutual funds that do not survive the ten year period of study. Need not involve actual death or destruction, only the removal of certain parties from the subject group.

Example of survivorship bias During World War 2, the military hired a statistician to tell them where to armour their aircrafts. Below, is a diagram of bullet holes on planes after they returned from war. Where should you put the armour? Image source

Recall Bias From the New York Times Magazine (2011) The diagnosis of breast cancer had not just changed a woman s present and the future; it had altered her sense of her past. Women with breast cancer had (unconsciously) decided that a higher-fat diet was a likely predisposition for their disease and (unconsciously) recalled a high-fat diet. It was a pattern poignantly familiar to anyone who knows the history of this stigmatized illness: these women, like thousands of women before them, had searched their own memories for a cause and then summoned that cause into memory.

Publication Bias From the New York Times (2008) The makers of antidepressants like Prozac and Paxil never published the results of about a third of the drug trials that they conducted to win government approval, misleading doctors and consumers about the drugs true effectiveness. 94% of studies with positive fundings on the effectiveness of the drugs were published But only 14% of the studies with non-positive findings were published

Random Sampling Everyone in the population is equally likely to be selected, so that the sample is reflective of the entire population. And since the sample reflects the population, we can make inferences about the population just by looking at the sample.

Avoiding Bias

Made-up example of selection bias We want to find out the political preferences of Brown students. We post a morning mail asking students to choose their favorite among candidates in various elections. Is there selection bias in this method of sampling?

Made-up example of selection bias We want to find out the political preferences of Brown students. We post a morning mail asking students to choose their favorite among candidates in various elections. Is there selection bias in this method of sampling? Who would answer this Morning Mail? Students who have time to read morning mail Students who also have time to act on morning mail content

Made-up example of exclusion bias Amy wants to find out which CS department in the US is the best. She picks ten of her colleagues to survey, and then asks them to ask ten of their colleagues for their top picks. She declares the winner to be the department with the most votes. Do you see any potential for exclusion bias in this method of sampling?

Made-up example of exclusion bias Amy wants to find out which CS department in the US is the best. She picks ten of her colleagues to survey, and then asks them to ask ten of their colleagues for their top picks. She declares the winner to be the department with the most votes. Do you see any potential for exclusion bias in this method of sampling? Which departments are likely more (less) connected to Amy? What about the voters might exclude certain groups?

Made-up example of survivorship bias We want to know if muggles can survive a cruciatus curse. We know that Presmouth was recently attacked by Death Eaters. We ask muggles if they survived a cruciatus curse. Who are we missing? Is the actual survival rate likely lower or higher?