[POLS 4150] R, Randomization and Sampling

Similar documents
Sampling Reminders about content and communications:

MATH-134. Experimental Design

Experimental Design There is no recovery from poorly collected data!

Unit 5. Thinking Statistically

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Research Methods Posc 302 ANALYSIS OF SURVEY DATA

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1

Theory Building and Hypothesis Testing. POLI 205 Doing Research in Politics. Theory. Building. Hypotheses. Testing. Fall 2015

Omnibus Poll April 11-12, 2013

Data = collections of observations, measurements, gender, survey responses etc. Sample = collection of some members (a subset) of the population

Diageo/The Hotline Poll Conducted By:

Test Bank for Privitera, Statistics for the Behavioral Sciences

Principle underlying all of statistics

Creative Commons Attribution-NonCommercial-Share Alike License

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN)

Survey of Young Americans Attitudes toward Politics and Public Service 30th Edition: June 21 July 3, 2016

Population. population. parameter. Census versus Sample. Statistic. sample. statistic. Parameter. Population. Example: Census.

Lecture (chapter 1): Introduction

Attitudes about Opioids among North Carolina Voters

THE ARAB AMERICAN VOTE September 27, 2012

How are polls conducted? by Frank Newport, Lydia Saad, David Moore from Where America Stands, 1997 John Wiley & Sons, Inc.

Arab American Voters 2014

Chapter 1 Data Collection

Math 140 Introductory Statistics

Issues in the 2000 Election: Health Care

Sheila Barron Statistics Outreach Center 2/8/2011

STA 291 Lecture 4 Jan 26, 2010

Political Science 15, Winter 2014 Final Review

Oregon Political Issues N = 618 Active Voters April 28 30

CAMPAIGN FOR TOBACCO-FREE KIDS SURVEY OF WASHINGTON VOTERS Raising the Age of Tobacco Sales DECEMBER 2015.

Causality and Treatment Effects

Teen Civic Engagement: Action to Inspire Change. Action to Inspire Change 11/13/2018. What does Civic Engagement mean to you?

For each of the following cases, describe the population, sample, population parameters, and sample statistics.

THE PUBLIC S PRIORITIES FOR CONGRESS AND PRESIDENT TRUMP IN THE POST- THANKSGIVING PERIOD

Vocabulary. Bias. Blinding. Block. Cluster sample

ZIKA VIRUS AND THE ELECTION SEASON

The motivation to volunteer varies for each

Bias in Sampling. MATH 130, Elements of Statistics I. J. Robert Buchanan. Fall Department of Mathematics

Exam 4 Review Exercises

Research Methods in Social Psychology. Lecture Notes By Halford H. Fairchild Pitzer College September 4, 2013

Economics 345 Applied Econometrics

Final Exam Practice Test

Math 124: Modules 3 and 4. Sampling. Designing. Studies. Studies. Experimental Studies Surveys. Math 124: Modules 3 and 4. Sampling.

5.3: Associations in Categorical Variables

Chapter 1: Introduction to Statistics

66 Questions, 5 pages - 1 of 5 Bio301D Exam 3

Survey of Nevada Voters Examines Views Toward Smoke-Free Law and SB 372

Stat 13, Intro. to Statistical Methods for the Life and Health Sciences.

What type of graph should I use?

P. 266 #9, 11. p. 289 # 4, 6 11, 14, 17

Statistical Sampling: An Overview for Criminal Justice Researchers April 28, 2016

ELECTION NIGHT 2018 HOT SHEET

Op-Ed There's scientific consensus on guns -- and the NRA won't like it

Chapter Three: Sampling Methods

Serbia Tracking Poll Fourth Wave. Prepared By Penn, Schoen & Berland Associates Inc. September 20, 2000

Mayoral Runoff: Close

III. WHAT ANSWERS DO YOU EXPECT?

Statistics Mathematics 243

Math 124: Module 3 and Module 4

Sampling and Data Collection

10 Nov of 6 Bio301D Exam 3

A STARTER BOOK OF CAMPAIGN TEMPLATES

Alaskan Opinions Regarding Statewide Smoke-Free Workplace Law

EWMBA 296 (Fall 2015)

North Carolina Survey Results

The Hunger Games. An Examination of How Hunger Affects Judgments of Political Figures

Reflection Questions for Math 58B

Political Science 30: Political Inquiry Section 1

MAKING A JOY JAR DISCOVERING GRATITUDE DAY BY DAY

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Introduction: Statistics, Data and Statistical Thinking Part II

8.2 Relative Frequency

Status of Cubans. Dario Moreno Inc.

Creative Commons Attribution-NonCommercial-Share Alike License

Chapter 12. The One- Sample

SAN DIEGO COUNTY DEMOCRATS IN A Growing Progressive Movement

Sociology 301. Sampling + Research Ethics + Exam Review. Non-Probability Sampling

Chapter 5: Producing Data

Slide 1 - Introduction to Statistics Tutorial: An Overview Slide notes

An Escalation Model of Consciousness

Jeffrey leads in Brampton

Mental capacity and mental illness

Statistics Success Stories and Cautionary Tales

VOSI. The following questions are asking for your views related to science and scientific investigations. There are no right or wrong answers.

Forming a Friends of the Park Group

Islington LINk Annual Report

AUTHORITIES & STATISTICS

Chapter 4 Review. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Introduction to Research Methods

Majority think marijuana dispensaries should be allowed in Toronto

More than Half Approve of the Sale of Marijuana Edibles

BIOSTATS 540 Fall 2018 Exam 2 Page 1 of 12

VERMONT SUICIDE PREVENTION & INTERVENTION PROTOCOLS FOR PRIMARY CARE PROFESSIONALS

2015 Survey on Prescription Drugs

Chapter 3. Producing Data

Visual semantics: image elements. Symbols Objects People Poses

UTAH VOTERS FAVOR RAISING THE LEGAL AGE FOR THE SALE OF TOBACCO TO AGE 21.

[Source: Freedman, Pisani, Purves (1998) Statistics, 3rd ed., Ch 2.]

For groups of young people aged 15 to 21

What is: regression discontinuity design?

Transcription:

Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017

Variables and their measurement When we talk about political phenomena and their relationships we are talking about variables.

Variables and their measurement: example Claim: more guns lead to more crime. Variables: gun ownership in states, crime rates in states & their relationship.

Variables and their measurement: example Claim: more guns lead to more crime. Variables: gun ownership in states, crime rates in states & their relationship.

Variables and their measurement: definition So what is a variable after all? It is something (usually a quantity) that can vary among subjects in a population.

Variables types Quantitative and Categorical/Qualitative relates to the qualities of the variable. Discrete and Continuous relates to qualities of the variable and the statistical methods used to analyze them.

Quantitative variables Quantitative variables are variables which contain a measure of the magnitude of something on a scale. Hillary Clinton s vote share in the 2016 election by state. Age of each registered voter in Georgia. Years of education.

Categorical variables Categorical variables are variables which contain information about a category. Numbers are usually used as placeholders. Gender: Male = 0 or Female = 1. Political party affiliation: Democrat = 0, Republican = 1, Independent = 2.

Discrete variables Discrete variables are those in which the possible values for a finite set of numbers. Gender: Male = 0 or Female = 1. Political party affiliation: Democrat = 0, Republican = 1, Independent = 2.

Continuous variables Variable can take on a (theoretically) infinite number of values on R (the real number line). Used mostly to treat variables that can take on many, but not necessarily and infinity number of values for statistical reasons. Eg) age, vote share, etc are all treated as continuous even though they may not be technically.

Dealing with variables in R Loading data. Accessing data. Basic numerical operations.

Data Data: 2016 Presidential Election Data by precinct and state Thanks to former UGA undergrad (now at Harvard) Stephen Pettigrew! https://goo.gl/hhvd2q. Variables state abbreviated state. jurisdiction election precinct. candidate presidential candidate. votes number of votes for the candidate. totalvotes total number of votes in the precinct.

Loading the data in R Getting started 1 Download the data: president-long.csv 2 Download the code: election-data.r

Randomization Randomization is probably one of the most important concepts in statistical inference. It allows us to make causal inferences. It allows us to design studies which allow us to make inferences about populations from samples.

Randomization and causal inference Randomization is used in experiments to make causal inferences about an intervention. It can also be used to make causal inferences outside of experiments.

Randomization and causal inference: clinical trials Assessing the causal effect of a drug is done via clinical trials as stipulated by the FDA. Patients randomly assigned to a treatment (the actual drug) or control/placebo (a sugar pill) Outcomes measured to test effectiveness of drug.

Randomization and causal inference: clinical trials Who receives treatment or placebo is controlled by the experimenter allows for causal inference. Who decides to participate in the clinical trial is not voluntary participation. Potential issues?

Randomization and design Randomization is also what allows us to design studies and polls to obtain accurate information about populations from samples. Allows us to estimate parameters using statistics.

Pew research polls

Pew research polls

Pew research polls

Randomization in the context of polling/survey research X = {x 1, x 2,, x n } Select a set of individuals, N, with p(n 1 ) = p(n 2 ) = Randomization, for this class, will involve a process known as simple random sampling. This process involves randomly choosing a sample of individuals N of size k from, for example a list. The list of subjects in the population is known as the sampling frame.

Simple random sampling There are two ways to randomly sample: 1 With replacement randomly choose one person from the sampling frame. Put it back. Randomly choose another. 2 Without replacement randomly choose one person from the sampling frame, set it aside, randomly choose another person from the sample frame.

Simple random sampling with replacement Possible Samples: (A, B), (A, A), (B, E), # of Possible Samples: N k = 5 2 = 25 P(Any Sample): 1 N k = 1 25 Imagine a sampling frame containing only 5 people: ABCDE. Let s say we want to randomly sample 2 out of the 5 with replacement.

Simple random sampling without replacement Possible Samples: (A, B), (A, C), (B, C), ( ) N N! # of Possible Samples: = k (N k)!k! = 5! (5 2)!2! = 120 12 = 10 P(Any Sample): 1 ) = 1 10 ( N k Imagine a sampling frame containing only 5 people: ABCDE. Let s say we want to randomly sample 2 out of the 5 with replacement.

Sampling with v. without replacement Sampling without replacement in the real world is almost always preferred. Why? Now algorithms perform sampling almost exclusively.

Randomization and sampling in action: polling Athens Polls rely on random sampling to make inferences about populations. We might be interested, for example, in knowing what % of people in Athens-Clarke County support the building of a border wall.

Randomization in action: polling Athens Step 1 Design a survey which asks respondents for information relating to your question. This can be done using Qualtrics or other survey software: https://ugeorgia.qualtrics.com/se/?sid=sv_0huotnb91o4iloz Or can be accomplished with a paper survey.

Randomization in action: polling Athens Step 2 Find a list of registered voters in Athens-Clarke County: http://sos.ga.gov/admin/uploads/voter_list_order_form_2-15-16.pdf. This list typically contains voters names and addresses but may also contain email contact information. The list of registered voters is now the population that you will be sampling from.

Randomization in action: polling Athens Step 3 Determine sample size this will depend on a lot of factors including how much money you have and how accurate you would like the poll to be. We will discuss sample size calculation in greater detail later on. For the sake of argument let s say that we ll sample k = 2, 000 registered voters from the pool of about N = 500, 000.

Randomization in action: polling Athens In R using the sample() function "sample(x, size, replace = FALSE, prob = NULL)" > sampling.frame = c("john", "Mary", "Joe", "Lefteris",..) > random.sample = sample(x = sampling.frame, size = 2000) > random.sample [1] "Jack" "John"... Step 4 Randomly select k = 2, 000 registered voters using an algorithm (with replacement). In R we might use the sample() function.

Randomization in action: polling Athens Step 5 Collect data by sending out emails or mailing surveys to the people in your sample. Step 6 Analyze the data and write a paper/report or sell it to someone for $$$.