IPUMS - CPS Extraction and Analysis

Minnesota Population Center Training and Development

Exercise 2

OBJECTIVE: Gain an understanding of how the IPUMS dataset is structured and how it can be leveraged to explore your research interests. This exercise uses the IPUMS dataset to explore associations between parent and child health and to analyze relationships between disability variables and marital status.

11/13/2017

Research Questions

Is there an association between parent and child health?
What are the trends in disabilities and marital status?

Objectives

Create and download an IPUMS data extract
Decompress data file and read data into R
Analyze the data using sample code
Validate data analysis work using answer key

IPUMS Variables

AGE: Age
SEX: Sex
MARST: Marital status
HEALTH: Health status
DIFFHEAR: Hearing difficulty
DIFFEYE: Vision difficulty

R Code to Review

This tutorial's sample code and answers use the so-called "tidyverse" style, but R has the blessing (and curse) that there are many different ways to do almost everything. If you prefer another programming style, please feel free to use it. For reference, here are quick explanations of the commands this tutorial uses:

%>% - The pipe operator, which makes code with nested function calls easier to read. When reading code, read it as "and then". With the pipe, ingredients %>% stir() %>% cook() is equivalent to cook(stir(ingredients)) (read as "take ingredients and then stir and then cook").
as_factor - Converts the value labels provided for IPUMS data into a factor variable for R
summarize - Summarizes a dataset's observations into one row per group
group_by - Sets the groups for the summarize function to group by
filter - Filters the dataset so that it only contains the rows that meet a condition
mutate - Adds a new variable to a dataset

ggplot - Makes graphs using ggplot2
weighted.mean - Gets the weighted mean of a variable

Review Answer Key (End of document)

Common Mistakes to Avoid

1) Not changing the working directory to the folder where your data is stored
2) Mixing up = and ==. To assign a value when generating a variable, use "<-" (or "="). Use "==" to test for equality.
3) Not including missing values when needed. The attached characteristics have missing values when the person doesn't have the relationship, but sometimes you want to treat that as a "No", not as a missing value.

Note: In this exercise, for simplicity we will use "weighted.mean". For analysis where variance estimates are needed, use the survey or srvyr package instead.

Registering with IPUMS

Go to http://cps.ipums.org, click on CPS Registration and apply for access.
On the login screen, enter your email address and password and submit.

Step 1: Make an Extract

Go to the homepage and go to Select Data
Click the Select/Change Samples box, check the boxes for the 2010 and 2011 ASEC samples, then click Submit Sample Selections
Using the drop-down menu or search feature, select the following variables:
    AGE: Age
    SEX: Sex
    MARST: Marital status
    HEALTH: Health status
    DIFFHEAR: Hearing difficulty
    DIFFEYE: Vision difficulty

Step 2: Request the Data

Click the orange VIEW CART button under your data cart
Review variable selection. Click the orange Create Data Extract button
Click on "Attach Characteristics"

The following screen will allow you to select who you would like to attach variables for.
Describe your extract and click Submit Extract
To get to the page to download the data, follow the link in the email, or follow the Download and Revise Extracts link on the homepage
Use the steps in the next section to get the data into your statistical package

Getting the data into your statistics software

The following instructions are for R. If you would like to use a different stats package, see: http://cps.ipums.org/cps/extract_instructions.shtml

Step 1: Download the Data

Go to http://cps.ipums.org and click on Download or Revise Extracts
Right-click on the data link next to the extract you created
Choose "Save Target As..." (or "Save Link As...")
Save into "Documents" (that should pop up as the default location)
Do the same thing for the DDI link next to the extract
(Optional) Do the same thing for the R script
You do not need to decompress the data to use it in R

Step 2: Install the ipumsr package

Open R from the Start menu
If you haven't already installed the ipumsr package, type the following command at the command prompt:

    install.packages("ipumsr")
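The "if you haven't already installed" condition can also be checked in code. The following is a minimal sketch rather than part of the original exercise: it assumes you want ipumsr plus the dplyr and ggplot2 packages that the tutorial loads later, and the names `needed`, `installed` and `to_install` are illustrative only.

```r
# Packages this tutorial relies on (ipumsr now; dplyr and ggplot2 are loaded later).
needed <- c("ipumsr", "dplyr", "ggplot2")

# requireNamespace() returns TRUE when a package is already installed.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Keep only the packages that are missing and install those.
to_install <- needed[!installed]
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Running this a second time should install nothing, because `to_install` is empty once every package is present.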

Step 3: Read in the data

Set your working directory to where you saved the data above by adapting the following command. (RStudio users can also use the "Project" feature to set the working directory: in the menu bar, select File -> New Project -> Existing Directory and then navigate to the folder.)

    setwd("~/") # "~/" goes to your Documents directory on most computers

Run the following commands from the console, adapting them so they refer to the extract you just created (the number may differ depending on how many extracts you've already made):

    library(ipumsr)
    ddi <- read_ipums_ddi("cps_00001.xml")
    data <- read_ipums_micro(ddi)

    # Or, if you downloaded the R script, the following is equivalent:
    # source("cps_00001.r")

This tutorial also relies on the dplyr and ggplot2 packages, so if you want to run the same code, run the following commands (but if you know other ways better, feel free to use them):

    library(dplyr)
    library(ggplot2)

To stay consistent with the exercises for other statistical packages, this exercise does not spend much time on the helpers that translate the way IPUMS uses labelled values into the way base R does. You can learn more about these in the value-labels vignette in the R package. From R, run the command:

    vignette("valuelabels", package = "ipumsr")

Analyze the Sample - Part I: Creating New Variables

A) What are the names of the attached variables (can be found on extract request screen or in the data)?
B) On the website, find the FAQ entry for attaching characteristics. What value will the respondents without a parent or spouse present have for the attached variables?
C) What are the MARST codes for married respondents?
D) Create a variable for married men equal to the difference in spouses' age.

    # Age difference is defined only for married men (SEX == 1, MARST 1 or 2)
    data <- data %>%
      mutate(age_diff = ifelse(SEX == 1 & (MARST %in% c(1, 2)), AGE - AGE_SP, NA))

E) What is the mean age difference between married men and their spouses? For men aged 30 and under? For 50 and over?

    data %>%
      summarize(mn_all_married_men = weighted.mean(age_diff, WTSUPP, na.rm = TRUE))

    data %>%
      filter(AGE <= 30) %>%
      summarize(mn_under30 = weighted.mean(age_diff, WTSUPP, na.rm = TRUE))

    data %>%
      filter(AGE >= 50) %>%
      summarize(mn_over50 = weighted.mean(age_diff, WTSUPP, na.rm = TRUE))

Analyze the Sample - Part II: Relationships in the Data

A) What is the universe for DIFFEYE and DIFFHEAR? What is the Code for NIU (Not in Universe)?
B) What percent of the population (in the universe) is deaf or has a serious hearing difficulty? What percent of the population (in the universe) is blind or has serious sight difficulties?

    # Drop NIU cases (code 0) before computing the weighted share with code 2
    data %>%
      filter(DIFFHEAR != 0) %>%
      summarize(DIFFHEAR = weighted.mean(DIFFHEAR == 2, WTSUPP))

    data %>%
      filter(DIFFEYE != 0) %>%
      summarize(DIFFEYE = weighted.mean(DIFFEYE == 2, WTSUPP))

C) What percent of the deaf population is married with a spouse present?

    data %>%
      filter(DIFFHEAR == 2) %>%
      summarize(MARST = weighted.mean(MARST == 1, WTSUPP))

D) What percent of the deaf population is married to a spouse who is also deaf?

    # A missing DIFFHEAR_SP (no spouse present) counts as "No" rather than as missing
    data %>%
      filter(DIFFHEAR == 2) %>%
      mutate(diffhear_sp_bin = !is.na(DIFFHEAR_SP) & DIFFHEAR_SP == 2) %>%
      summarize(COUPLE_DEAF = weighted.mean(diffhear_sp_bin, WTSUPP, na.rm = TRUE))

Analyze the Sample - Part III: Relationships in the Data

A) What ages of respondents have their parents identified through the attach characteristics? (Hint: see variable descriptions for MOMLOC and POPLOC.)
B) Does there seem to be a relationship between parents' and children's health?

    # Weighted counts of each child/mother health combination, then shares
    data_summary <- data %>%
      filter(!is.na(HEALTH_MOM)) %>%
      group_by(HEALTH = as_factor(HEALTH), HEALTH_MOM = as_factor(HEALTH_MOM)) %>%
      summarize(n = sum(WTSUPP)) %>%
      mutate(pct = n / sum(n))

    ggplot(data_summary, aes(x = HEALTH_MOM, y = pct, fill = HEALTH)) +
      geom_col(position = "dodge")

C) What other tests could you do to examine this relationship?
D) Could there be a sampling issue affecting the relationship between children's and parents' health?

ANSWERS: Analyze the Sample - Part I: Creating New Variables

A) What are the names of the attached variables (can be found on extract request screen or in the data)?
AGE_SP, age of spouse; HEALTH_MOM, health of mother; HEALTH_POP, health of father; HEALTH_SP, health of spouse; DIFFHEAR_SP, hearing disability of spouse; DIFFEYE_SP, vision disability of spouse

B) On the website, find the FAQ entry for attaching characteristics. What value will the respondents without a parent or spouse present have for the attached variables?
A missing code

C) What are the MARST codes for married respondents?
1 Married, spouse present; 2 Married, spouse absent

D) Create a variable for married men equal to the difference in spouses' age.

    data <- data %>%
      mutate(age_diff = ifelse(SEX == 1 & (MARST %in% c(1, 2)), AGE - AGE_SP, NA))

E) What is the mean age difference between married men and their spouses? 2.3
For men aged 30 and under? -0.2
For 50 and over? 3.2

    data %>%
      summarize(mn_all_married_men = weighted.mean(age_diff, WTSUPP, na.rm = TRUE))
    #>   mn_all_married_men
    #> 1           2.266375

    data %>%
      filter(AGE <= 30) %>%
      summarize(mn_under30 = weighted.mean(age_diff, WTSUPP, na.rm = TRUE))
    #>   mn_under30
    #> 1 -0.1605179

    data %>%
      filter(AGE >= 50) %>%
      summarize(mn_over50 = weighted.mean(age_diff, WTSUPP, na.rm = TRUE))
    #>   mn_over50
    #> 1  3.230359

ANSWERS: Analyze the Sample - Part II: Relationships in the Data

A) What is the universe for DIFFEYE and DIFFHEAR? What is the Code for NIU (Not in Universe)?
Persons age 15+; 0 is the NIU code.

B) What percent of the population (in the universe) is deaf or has a serious hearing difficulty? 3.1%
What percent of the population (in the universe) is blind or has serious sight difficulties? 1.7%

    data %>%
      filter(DIFFHEAR != 0) %>%
      summarize(DIFFHEAR = weighted.mean(DIFFHEAR == 2, WTSUPP))
    #>     DIFFHEAR
    #> 1 0.03093566

    data %>%
      filter(DIFFEYE != 0) %>%
      summarize(DIFFEYE = weighted.mean(DIFFEYE == 2, WTSUPP))
    #>      DIFFEYE
    #> 1 0.01666509

C) What percent of the deaf population is married with a spouse present? 49.7%

    data %>%
      filter(DIFFHEAR == 2) %>%
      summarize(MARST = weighted.mean(MARST == 1, WTSUPP))
    #>       MARST
    #> 1 0.4965242

D) What percent of the deaf population is married to a spouse who is also deaf? 7.7%

    data %>%
      filter(DIFFHEAR == 2) %>%
      mutate(diffhear_sp_bin = !is.na(DIFFHEAR_SP) & DIFFHEAR_SP == 2) %>%
      summarize(COUPLE_DEAF = weighted.mean(diffhear_sp_bin, WTSUPP, na.rm = TRUE))
    #>   COUPLE_DEAF
    #> 1  0.07663705
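The Part II answers all rest on the same idea: the weighted mean of a TRUE/FALSE condition is the weighted share of cases where the condition holds. The sketch below illustrates this in base R on a small made-up data frame; `toy`, its values and the helper names are hypothetical and are not taken from the CPS extract.

```r
# Hypothetical toy data: DIFFHEAR uses 0 = NIU, 1 = no difficulty, 2 = has difficulty.
toy <- data.frame(
  DIFFHEAR = c(0, 1, 2, 1, 2),
  WTSUPP   = c(100, 200, 100, 300, 400)
)

# Restrict to the universe by dropping NIU cases.
in_universe <- toy[toy$DIFFHEAR != 0, ]

# Weighted mean of a logical vector = weighted share of TRUE values.
share <- weighted.mean(in_universe$DIFFHEAR == 2, in_universe$WTSUPP)

# The same share computed by hand: weight of cases with code 2 over total weight.
by_hand <- sum(in_universe$WTSUPP[in_universe$DIFFHEAR == 2]) / sum(in_universe$WTSUPP)

all.equal(share, by_hand)
```

The two calculations agree, which is why the tutorial can use `weighted.mean(DIFFHEAR == 2, WTSUPP)` as a shortcut for a weighted percentage.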

ANSWERS: Analyze the Sample - Part III: Relationships in the Data

A) What ages of respondents have their parents identified through the attach characteristics? (Hint: see variable descriptions for MOMLOC and POPLOC.)
Children under age 19

B) Does there seem to be a relationship between parents' and children's health?
Parents' health and children's health seem to be directly correlated.

    data_summary <- data %>%
      filter(!is.na(HEALTH_MOM)) %>%
      group_by(HEALTH = as_factor(HEALTH), HEALTH_MOM = as_factor(HEALTH_MOM)) %>%
      summarize(n = sum(WTSUPP)) %>%
      mutate(pct = n / sum(n))

    ggplot(data_summary, aes(x = HEALTH_MOM, y = pct, fill = HEALTH)) +
      geom_col(position = "dodge")

C) What other tests could you do to examine this relationship?
Correlation matrix, covariance analysis, regression analysis

D) Could there be a sampling issue affecting the relationship between children's and parents' health?
Parents are reporting children's health
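As a rough illustration of the "other tests" named in answer C, the sketch below runs a rank correlation and a weighted linear regression in base R. It is not part of the original answer key: the `toy` data frame is made up, and treating the HEALTH codes as ordinal scores is an assumption made only for this example. As the earlier note says, proper variance estimates would need the survey or srvyr package.

```r
# Hypothetical toy data standing in for child health, mother's health and weights.
toy <- data.frame(
  HEALTH     = c(1, 2, 2, 3, 4, 5, 1, 3),
  HEALTH_MOM = c(1, 1, 2, 3, 3, 5, 2, 4),
  WTSUPP     = c(120, 80, 150, 90, 110, 60, 100, 130)
)

# Spearman rank correlation (unweighted) between the two ordinal codes.
rho <- cor(toy$HEALTH, toy$HEALTH_MOM, method = "spearman")

# Weighted least-squares regression of child health on mother's health.
fit <- lm(HEALTH ~ HEALTH_MOM, data = toy, weights = WTSUPP)

rho
coef(fit)
```

A positive `rho` and a positive `HEALTH_MOM` coefficient would both point the same way as the bar chart, though neither addresses the reporting issue raised in answer D.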