Introduction to Survey Sample Weighting. Linda Owens


Content of Webinar

- What are weights
- Types of weights
- Weighting adjustment methods
- General guidelines for weight construction/use

What are weights?

- A weight is a value assigned to each case in the data file to restore the proportional representation of the target population.
- The value of a weight indicates how much each case will count in a statistical procedure (or how many cases it will represent).
- A case with a weight of 1 represents only itself. A case with a weight of 2 represents itself plus one other unit.
- Weights are always positive and nonzero, but can be fractions.

Simple example

- In a simple random sample of 1,000 drawn from a population of 100,000, each sampled member has a weight of 100 and represents 100 members of the population (the case itself, plus 99 others): 1,000/100,000 = .01; 1/.01 = 100.
- If only half of the 1,000 sampled members respond, the weight doubles to 200 to account for both nonsampled members and sampled members who did not respond: 500/100,000 = .005; 1/.005 = 200.
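The arithmetic in the simple example can be sketched in a few lines of Python. The function name `expansion_weight` is an illustrative choice, not something from the webinar; the numbers are the ones on the slide.

```python
def expansion_weight(population_size: int, n_cases: int) -> float:
    """Inverse of the sampling fraction: 1 / (n_cases / population_size)."""
    if population_size <= 0 or n_cases <= 0:
        raise ValueError("population_size and n_cases must be positive")
    return population_size / n_cases


# All 1,000 sampled members respond: each case stands for 100 people.
sampled_weight = expansion_weight(100_000, 1_000)

# Only 500 of the 1,000 respond: each respondent stands for 200 people.
respondent_weight = expansion_weight(100_000, 500)

print(sampled_weight, respondent_weight)
```

In both scenarios the weights sum back to the population of 100,000 (1,000 x 100 or 500 x 200).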

Conditions for using weights

- Weights allow the researcher to make inferences to the population from which the sample was drawn, e.g. what percent of the population engages in regular exercise?
- Weights are used to make adjustments in probability samples, not to fix poorly designed samples or convenience samples.
- The sample must be:
  - drawn with probabilistic methods
  - high quality
  - sufficiently large

Reasons for using weights

- Members of the population are sampled with varying probabilities (e.g. freshmen sampled at a higher rate than seniors).
- Nonresponse varies by some characteristic of sampled respondents (e.g. women have higher response rates than men).
- To make sample characteristics consistent with population characteristics (e.g. percent of sample by gender matches percent of population by gender).

Types of weights

- Base weights (selection weights)
  - Expansion weights
  - Relative weights
- Nonresponse weights
- Post-stratification weights
- Final analysis weight (generally a combination of the above types)

Base/Selection weights (1)

- Base weights adjust for different probabilities of selection among sampled population members.
- Epsem (equal probability selection methods) designs result in a sample in which each member has the same probability of selection.
- Epsem samples are sometimes called self-weighting; using weights is rarely necessary for these samples.

Base/Selection weights (2)

- The base/selection weight is the inverse of the probability of selection: $w_i = 1 / f_i$, where $f_i = n / N$.
- Example: sample 100 from a population of 10,000. Then $f_i = 100 / 10{,}000 = .01$ and $w_i = 1 / .01 = 100$.

Base/Selection weights (3)

- When population members are sampled with unequal probabilities, base weights are necessary to ensure proper representation of the population.
- Example: if women are sampled from a list at a rate of 1/10 and men at a rate of 1/5, men will be overrepresented in the sample if the data are not weighted. Women have a weight of 10, men a weight of 5.
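A minimal sketch of the unequal-probability example, assuming a hypothetical list of 1,000 women and 1,000 men (the list sizes are not in the slides; only the sampling rates of 1/10 and 1/5 are). It shows that men make up two-thirds of the unweighted sample but half of the weighted one.

```python
from fractions import Fraction

# Sampling rates from the slide.
sampling_rate = {"women": Fraction(1, 10), "men": Fraction(1, 5)}

# Base weight is the inverse of the probability of selection.
base_weight = {group: 1 / rate for group, rate in sampling_rate.items()}

# Hypothetical list sizes (an assumption for illustration only).
list_size = {"women": 1000, "men": 1000}
sample_n = {g: int(list_size[g] * sampling_rate[g]) for g in list_size}

# Share of men without and with base weights.
unweighted_share_men = Fraction(sample_n["men"], sum(sample_n.values()))
weighted_n = {g: sample_n[g] * base_weight[g] for g in sample_n}
weighted_share_men = weighted_n["men"] / sum(weighted_n.values())

print(base_weight, unweighted_share_men, weighted_share_men)
```

Exact fractions are used here only to keep the illustration free of floating-point noise.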

Expansion weights

- Expansion weights are weights that inflate the number of sampled cases to the population N.
- Base weights can sometimes also serve as expansion weights.
- Expansion weights should be used only to estimate total numbers of the population who possess the characteristic of interest.
- Never use expansion weights for model testing, as they will inappropriately inflate the sample size being used for analysis.

Relative weights

- Relative weights are appropriate for analytic studies because they do not inflate the sample size.
- They are constructed by normalizing expansion weights:
  - divide each case's expansion weight by the mean expansion weight, or
  - multiply each weight by the ratio of the actual sample size to the sum of the expansion weights.

Expansion weights vs. relative weights

- Expansion weights sum to the total population size (N).
- Relative weights sum to the study sample size (n), the number of cases in the data file.

Weight construction example

Stratum   N_i   n_i   f_i   w_i   rw_i   (n_i)(rw_i)
1         100   5     .05   20    1.33   6.65
2         100   5     .05   20    1.33   6.65
3         100   5     .05   20    1.33   6.65
4         100   5     .05   20    1.33   6.65
5         100   5     .05   20    1.33   6.65
6         50    5     .10   10    .67    3.35
7         50    5     .10   10    .67    3.35
8         50    5     .10   10    .67    3.35
9         50    5     .10   10    .67    3.35
10        50    5     .10   10    .67    3.35
Totals    750   50          150   10     50

- $N_i$ = total population in stratum i
- $n_i$ = number sampled from stratum i
- $f_i$ = probability of selection in stratum i = $n_i / N_i$
- $w_i$ = base (expansion) weight in stratum i = $N_i / n_i = 1 / f_i$
- $\bar{w}$ = mean expansion (base) weight = $\left[\sum_i w_i n_i\right] / n = 750 / 50 = 15$
- $rw_i$ = relative weight = $w_i / \bar{w}$
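The table can be reproduced with a short sketch. The stratum sizes are taken from the slide; the variable names are illustrative. It also checks that the two normalizations described for relative weights (divide by the mean weight, or multiply by n over the sum of weights) give the same result.

```python
# (N_i, n_i) for strata 1-10, as in the table.
strata = [(100, 5)] * 5 + [(50, 5)] * 5

n = sum(n_i for _, n_i in strata)            # total sample size (50)
f = [n_i / N_i for N_i, n_i in strata]       # selection probabilities
w = [N_i / n_i for N_i, n_i in strata]       # expansion (base) weights

# Sum of expansion weights over all sampled cases equals N.
sum_w_cases = sum(w_i * n_i for w_i, (_, n_i) in zip(w, strata))
mean_w = sum_w_cases / n                     # mean expansion weight

# Two equivalent ways to build relative weights.
rw = [w_i / mean_w for w_i in w]
rw_alt = [w_i * n / sum_w_cases for w_i in w]

# Sum of relative weights over all sampled cases equals n.
rw_case_total = sum(rw_i * n_i for rw_i, (_, n_i) in zip(rw, strata))

for i, ((N_i, n_i), f_i, w_i, rw_i) in enumerate(zip(strata, f, w, rw), start=1):
    print(i, N_i, n_i, round(f_i, 2), w_i, round(rw_i, 2), round(n_i * rw_i, 2))
```

The unrounded products are about 6.67 and 3.33; the 6.65 and 3.35 in the table come from multiplying the already-rounded relative weights 1.33 and .67 by 5.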

Nonresponse weights (1)

- Nonresponse occurs when some sampled units do not respond to the survey:
  - 40% of men respond to the survey compared to 50% of women
  - 30% of smokers respond compared to 45% of nonsmokers
- Nonresponse weights adjust base weights so that responding units represent those that don't respond.

Nonresponse weights (2)

- Respondents are assigned to weighting adjustment cells.
- Characteristics defining cells (gender, race, age) must be on the sample frame.
- The nonresponse (NR) adjustment is the reciprocal of the response rate in each cell:
  - NR adjustment for men = 1/.40 = 2.5; women = 1/.50 = 2
  - NR adjustment for smokers = 1/.30 = 3.3; nonsmokers = 1/.45 = 2.2
- These adjustments assume the characteristics defining cells are the only variables associated with nonresponse.
- The NR adjustment is multiplied by the base weight.
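A minimal sketch of a weighting-cell nonresponse adjustment, using the 40% and 50% response rates from the slide. The sample counts and the common base weight of 10 are hypothetical, chosen only so the response rates come out as stated.

```python
BASE_WEIGHT = 10.0  # hypothetical: every sampled case has the same base weight

# Hypothetical counts per adjustment cell (response rates .40 and .50).
cells = {
    "men": {"sampled": 200, "responded": 80},
    "women": {"sampled": 200, "responded": 100},
}


def nonresponse_adjustment(sampled: int, responded: int) -> float:
    """Reciprocal of the cell response rate (responded / sampled)."""
    if responded <= 0:
        raise ValueError("a cell with no respondents cannot be adjusted")
    return sampled / responded


adjusted = {}
for name, cell in cells.items():
    nr = nonresponse_adjustment(cell["sampled"], cell["responded"])
    adjusted[name] = {"nr_adjustment": nr, "weight": BASE_WEIGHT * nr}

print(adjusted)
```

After the adjustment, the respondents in each cell carry the same total weight as all sampled units in that cell (for men, 80 x 25 = 200 x 10).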

Post-stratification (1)

- Adjusts the sample marginal distribution to match the population distribution on key variables (generally demographic).
- Requires an auxiliary dataset to provide the population estimates (census, American Community Survey, etc.).
- Post-stratification formula: $w_{ps} = p_p / p_s$, where $p_p$ = population proportion and $p_s$ = sample proportion.

Post-stratification example

Gender   Population proportion   Sample proportion   Population/Sample   Weight
Female   .52                     .60                 .52/.60             .8667
Male     .48                     .40                 .48/.40             1.2

- Women are over-represented; men are under-represented. Their weights are adjusted by the post-stratification ratios.
- Post-stratification weights are used to adjust for minor differences in nonresponse.
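The gender example can be sketched as follows. The function name `poststrat_factors` is illustrative; the proportions are those in the table.

```python
def poststrat_factors(pop_props: dict, sample_props: dict) -> dict:
    """Population proportion divided by sample proportion, per category."""
    return {cat: pop_props[cat] / sample_props[cat] for cat in pop_props}


population = {"female": 0.52, "male": 0.48}
sample = {"female": 0.60, "male": 0.40}

factors = poststrat_factors(population, sample)

# Applying the factors moves the sample distribution onto the population one.
weighted = {cat: sample[cat] * factors[cat] for cat in sample}
total = sum(weighted.values())
weighted_shares = {cat: value / total for cat, value in weighted.items()}

print({cat: round(v, 4) for cat, v in factors.items()})
print({cat: round(v, 2) for cat, v in weighted_shares.items()})
```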

Post-stratification (2)

- Post-stratification adjustments often use multiple factors: gender, race, age, education, etc.
- How to incorporate all of them?
  - One large cross-classification of all factors
    - Often results in a huge number of cells
    - Sample sizes in cells are too small to work with
  - Iteratively adjust to factors one at a time
    - Raking
    - Manually or with software designed for it

Raking how-to

(P = population proportion, p = weighted sample proportion)

1. Weight the data with base weights or adjusted base weights ($w_b$).
2. Run a frequency of the first demographic variable (e.g. gender).
3. Adjust the weighted sample proportion to the population proportion: $w_g = w_b \times P_f / p_f$ for women and $w_g = w_b \times P_m / p_m$ for men.
4. Apply this new weight ($w_g$) and run a frequency on the next demographic variable (e.g. race).
5. Adjust the weighted sample proportion to the population proportion: $w_r = w_g \times P_{nhb} / p_{nhb}$ for non-Hispanic Black, $w_r = w_g \times P_{nhw} / p_{nhw}$ for non-Hispanic White, etc.
6. Do this for each demographic variable; repeat until the sample proportions on all demographic variables are close to the population proportions.
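The raking steps can be sketched in plain Python. This is an illustration under stated assumptions, not the software the webinar has in mind: the sample cell counts, the race targets, the tolerance, and the function names `weighted_shares` and `rake` are all hypothetical (only the .52/.48 gender target comes from the earlier example). It assumes every category in the data has a target proportion.

```python
def weighted_shares(cases, weights, var):
    """Weighted proportion of cases in each category of `var`."""
    total = sum(weights)
    shares = {}
    for case, w in zip(cases, weights):
        shares[case[var]] = shares.get(case[var], 0.0) + w / total
    return shares


def rake(cases, base_weights, targets, tol=1e-8, max_iter=100):
    """Adjust weights to one variable at a time until all margins match."""
    weights = list(base_weights)
    for _ in range(max_iter):
        largest_gap = 0.0
        for var, target in targets.items():
            shares = weighted_shares(cases, weights, var)
            gaps = [abs(shares.get(cat, 0.0) - p) for cat, p in target.items()]
            largest_gap = max(largest_gap, max(gaps))
            # New weight = old weight * population share / weighted sample share.
            weights = [
                w * target[c[var]] / shares[c[var]] for c, w in zip(cases, weights)
            ]
        if largest_gap < tol:  # margins were already close before this pass
            break
    return weights


# Hypothetical sample of 100 cases, built from gender-by-race cell counts.
cell_counts = {
    ("female", "black"): 10, ("female", "white"): 50,
    ("male", "black"): 10, ("male", "white"): 30,
}
cases = [
    {"gender": g, "race": r} for (g, r), k in cell_counts.items() for _ in range(k)
]
base = [1.0] * len(cases)
targets = {
    "gender": {"female": 0.52, "male": 0.48},
    "race": {"black": 0.30, "white": 0.70},  # hypothetical race targets
}

raked = rake(cases, base, targets)
print(weighted_shares(cases, raked, "gender"))
print(weighted_shares(cases, raked, "race"))
```

Each single-variable adjustment leaves the sum of the weights unchanged, so the raked weights still sum to the sum of the starting weights.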

Trimming weights

- Sometimes weights have a large range, or a few cases have unusually large weights.
- These may cause problems in the analysis.
- Sometimes researchers trim these weights (create a cutoff for large weights).
- There are no clear standards for how to do it.
- Use of trimming should be limited.

To trim or not to trim?

- "How One 19-Year-Old Illinois Man Is Distorting National Polling Averages," The Upshot, Nate Cohn, 10/12/16
- A respondent (R) in the U.S.C./LAT poll had a final weight that was 30 times larger than the average R and 300 times larger than the least-weighted R.
- "Jill Darling, the survey director at the U.S.C. Center for Economic and Social Research, noted that they had decided not to trim the weights (that's when a poll prevents one person from being weighted up by more than some amount, like five or 10) because the sample would otherwise underrepresent African-American and young voters. This makes sense. Gallup got itself into trouble for this reason in 2012: It trimmed its weights, and nonwhite voters were underrepresented."
- https://www.nytimes.com/2016/10/13/upshot/how-one-19-year-old-illinois-man-is-distorting-national-polling-averages.html?_r=0
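Since the slides stress that there is no clear standard for trimming, the sketch below shows just one possible rule, chosen here for illustration: cap each weight at a multiple of the mean weight and spread the removed weight proportionally over the cases below the cap, so the total is preserved. The function name `trim_weights`, the multiple of 5, and the data are assumptions, not a recommendation from the webinar.

```python
def trim_weights(weights, cap_multiple=5.0):
    """Cap weights at cap_multiple * mean and redistribute the excess.

    One pass only: with very skewed data the redistributed weights could
    themselves exceed the cap, in which case the rule would need repeating.
    """
    cutoff = cap_multiple * sum(weights) / len(weights)
    capped = [min(w, cutoff) for w in weights]
    excess = sum(weights) - sum(capped)
    below_total = sum(w for w in weights if w < cutoff)
    if excess == 0 or below_total == 0:
        return capped
    scale = 1 + excess / below_total
    return [c * scale if w < cutoff else c for w, c in zip(weights, capped)]


# Hypothetical data: 99 ordinary cases and one case weighted 30 times higher.
weights = [1.0] * 99 + [30.0]
trimmed = trim_weights(weights)

print(max(weights), round(max(trimmed), 2), round(sum(trimmed), 2))
```

The trade-off noted in the Cohn excerpt applies: capping the one large weight reduces its influence, but the group that case was standing in for ends up with less weight overall.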

Final analysis weights

- Only one weight per case can be used for data analysis.
- The final weight is typically the product of the base weight and the adjustments made for nonresponse and post-stratification, e.g. $w_f = w_b \times w_{nr} \times w_{ps}$.
- Due to rounding, the sum of the final weights often differs from the analysis sample size, e.g. n = 1,500 cases; sum of final weights = 1,508.6.
- Make a final adjustment by multiplying each final weight by the ratio of the actual sample size to the sum of the final weights (1,500/1,508.6).

Use of weights

- If the sample design uses unequal probabilities of selection, weights are necessary when making population inferences with descriptive statistics (e.g. 30% of the population smokes).
- In multivariate analysis (e.g. regression):
  - There is not as much consensus about using weights.
  - If the variables used to construct the weights are predictors in the regression model, it may not be necessary to use weights.
  - Run the analysis both ways and compare results.
- Weights almost always increase the variance of estimates.
- Understand how your software (Stata, SAS, SPSS) uses weights.
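The construction of the final analysis weight can be sketched as below. The four cases and their component weights are hypothetical, loosely reusing values from the earlier examples; the function names are illustrative.

```python
def final_weights(base, nonresponse, poststrat):
    """Per-case product of base weight, NR adjustment, and PS adjustment."""
    return [b * nr * ps for b, nr, ps in zip(base, nonresponse, poststrat)]


def rescale_to_n(weights):
    """Multiply by (sample size / sum of weights) so the weights sum to n."""
    ratio = len(weights) / sum(weights)
    return [w * ratio for w in weights]


# Hypothetical component weights for four cases.
w_b = [1.33, 0.67, 1.33, 0.67]    # relative base weights
w_nr = [2.5, 2.0, 2.0, 2.5]       # nonresponse adjustments
w_ps = [1.2, 0.8667, 0.8667, 1.2] # post-stratification adjustments

w_f = final_weights(w_b, w_nr, w_ps)
w_analysis = rescale_to_n(w_f)

# The ratio from the slide: n = 1,500 cases, weights summing to 1,508.6.
slide_ratio = 1_500 / 1_508.6

print([round(w, 3) for w in w_analysis], round(slide_ratio, 4))
```

Rescaling changes every weight by the same factor, so the relative contribution of each case is unchanged; only the total is brought back to the sample size.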