ANALYSIS OF SURVEYS WITH EPI INFO AND STATA

Similar documents
Problem Two (Data Analysis) Grading Criteria and Suggested Answers

Multiple Linear Regression Analysis

Biostatistics II

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale.

Applied Medical. Statistics Using SAS. Geoff Der. Brian S. Everitt. CRC Press. Taylor Si Francis Croup. Taylor & Francis Croup, an informa business

Subject index. bootstrap...94 National Maternal and Infant Health Study (NMIHS) example

Use of GEEs in STATA

m 11 m.1 > m 12 m.2 risk for smokers risk for nonsmokers

Daniel Boduszek University of Huddersfield


Lesson: A Ten Minute Course in Epidemiology

ESM1 for Glucose, blood pressure and cholesterol levels and their relationships to clinical outcomes in type 2 diabetes: a retrospective cohort study

Controlling Bias & Confounding

Intro to SPSS. Using SPSS through WebFAS

Practical Multivariate Analysis

Part 8 Logistic Regression

Chapter 13 Estimating the Modified Odds Ratio

Meta-analysis using RevMan. Yemisi Takwoingi October 2015

Ethnicity and Maternal Health Care Utilization in Nigeria: the Role of Diversity and Homogeneity

An Introduction to Modern Econometrics Using Stata

CD21 CD24 CD38 CD27 CD27. IgD IgD. bm3+4. bm2. early bm5. bm2. bm1. late bm5. unswitched memory. switched memory. naive

Estimating Heterogeneous Choice Models with Stata

Biostatistics 513 Spring Homework 1 Key solution. See the Appendix below for Stata code and output related to this assignment.

Section D. Another Non-Randomized Study Design: The Case-Control Design

Section F. Measures of Association: Risk Difference, Relative Risk, and the Odds Ratio

Investigation of relative survival from colorectal cancer between NHS organisations

Review of Veterinary Epidemiologic Research by Dohoo, Martin, and Stryhn

Empirical assessment of univariate and bivariate meta-analyses for comparing the accuracy of diagnostic tests

Main objective of Epidemiology. Statistical Inference. Statistical Inference: Example. Statistical Inference: Example

Introduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015

Index. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI /

Rapid Survey Methodology. Rapid Survey Methodology. Rapid Survey Methodology. Rapid Survey Methodology. Rapid Survey Methodology

Table. A [a] Multiply imputed. Outpu

PREVALENCE OF HIV INFECTION AND RISK FACTORS OF TUBERCULIN INFECTION AMONG HOUSEHOLD CONTACTS IN AN HIV EPIDEMIC AREA: CHIANG RAI PROVINCE, THAILAND

Comparing Multiple Imputation to Single Imputation in the Presence of Large Design Effects: A Case Study and Some Theory

Benchmark Dose Modeling Cancer Models. Allen Davis, MSPH Jeff Gift, Ph.D. Jay Zhao, Ph.D. National Center for Environmental Assessment, U.S.

Sociology Exam 3 Answer Key [Draft] May 9, 201 3

Industrial and Manufacturing Engineering 786. Applied Biostatistics in Ergonomics Spring 2012 Kurt Beschorner

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

Today Retrospective analysis of binomial response across two levels of a single factor.

Supplementary Materials

Data Analysis with SPSS

Trends in hepatocellular carcinoma, Victoria, Australia,

Why Mixed Effects Models?

Supplementary Online Content

STAT362 Homework Assignment 5

Measures of association: comparing disease frequencies. Outline. Repetition measures of disease occurrence. Gustaf Edgren, PhD Karolinska Institutet

Analysis of TB prevalence surveys

Notes for laboratory session 2

Hour 2: lm (regression), plot (scatterplots), cooks.distance and resid (diagnostics) Stat 302, Winter 2016 SFU, Week 3, Hour 1, Page 1

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H

Biology 345: Biometry Fall 2005 SONOMA STATE UNIVERSITY Lab Exercise 5 Residuals and multiple regression Introduction

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations)

A SAS Macro for Adaptive Regression Modeling

ASDA2 ANALYSIS EXAMPLE REPLICATION SPSS C6

Basic Biostatistics. Chapter 1. Content

Choice of place for childbirth: prevalence and correlates of utilization of health facilities in Chongwe district, Zambia

COSTS AND EFFECTIVENESS OF IMMUNIZATION PROGRAMS

The Australian longitudinal study on male health sampling design and survey weighting: implications for analysis and interpretation of clustered data

Daniel Boduszek University of Huddersfield

Estimating Adjusted Prevalence Ratio in Clustered Cross-Sectional Epidemiological Data

What to Expect Today. Example Study: Statin Letter Intervention. ! Review biostatistic principles. ! Hands on application

Generalized Estimating Equations for Depression Dose Regimes

patients actual drug exposure for every single-day of contribution to monthly cohorts, either before or

Introduction to SPSS S0

On-going and planned colorectal cancer clinical outcome analyses

Trends in Hospital Admissions For Diabetes Complications

Age (continuous) Gender (0=Male, 1=Female) SES (1=Low, 2=Medium, 3=High) Prior Victimization (0= Not Victimized, 1=Victimized)

First birth and the trajectory of women s empowerment in Egypt

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

Statistical Analysis Plan: Post-hoc analysis of the CALORIES trial

Umakaltume Abubakar 1 J. Kwaga 2, E. Okolocha 2, G. Poggensee 1

California s Experiences Using Tuberculosis Whole Genome Sequencing

Methods to control for confounding - Introduction & Overview - Nicolle M Gatto 18 February 2015

From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1

Certificate Courses in Biostatistics

Daniel Boduszek University of Huddersfield

Mark Asbridge, PhD Donald Langille, MD Jennifer Cartwright, MA. Department of Community Health and Epidemiology Dalhousie University

Meta-analysis of diagnostic test accuracy studies with multiple & missing thresholds

The relationship between adherence to clinic appointments and year-one mortality for HIV infected patients at a Referral Hospital in Western Kenya

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

PTHP 7101 Research 1 Chapter Assignments

Supplementary Online Content

Design and Analysis of a Cancer Prevention Trial: Plans and Results. Matthew Somerville 09 November 2009

Correlation and regression

Title:Postpartum contraceptive use in Gondar town, Northwest Ethiopia: a community based cross-sectional study

Exploring HIV Evolution: An Opportunity for Research Sam Donovan and Anton E. Weisstein

SBIRT IOWA THE IOWA CONSORTIUM FOR SUBSTANCE ABUSE RESEARCH AND EVALUATION. Iowa Army National Guard. Biannual Report Fall 2015

Allergy Basics. This handout describes the process for adding and removing allergies from a patient s chart.

Using Direct Standardization SAS Macro for a Valid Comparison in Observational Studies

Confounding Bias: Stratification

Poisson regression. Dae-Jin Lee Basque Center for Applied Mathematics.

FACTORS ASSOCIATED WITH CHOICE OF POST-ABORTION CONTRACEPTIVE IN ADDIS ABABA, ETHIOPIA. University of California, Berkeley, USA

Analysis of Variance (ANOVA) Program Transcript

Risk Factors, Comorbid Conditions, and Epidemiology of Autism in Children

Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.

ROC Curves. I wrote, from SAS, the relevant data to a plain text file which I imported to SPSS. The ROC analysis was conducted this way:

Applied Regression The University of Texas at Dallas EPPS 6316, Spring 2013 Tuesday, 7pm 9:45pm Room: FO 2.410

A Concise Guide to Market

Transcription:

Department of Epidemiology Course EPI 418 School of Public Health University of California, Los Angeles Session 11 ANALYSIS OF SURVEYS WITH EPI INFO AND STATA Note: prepared with Epi Info (Windows) and Stata 8 For rapid surveys most measures of importance to epidemiologists can be derived with Epi Info (a DOS program) and Stata (a Windows program). The Stata program is more sophisticated, with options for multivariate analysis that are not included in Epi Info. Yet Epi Info remains useful for data entry and preliminary analysis, especially involving stratification. In this handout, I will present some of the common analyses of both programs, focusing on a single example that uses data from a study of neonatal mortality given to students for the Individual Problem. I will view two variables: use of a razor to cut the umbilical cord shortly after delivery, and death during the first 27 days after birth. The data are from an imaginary cohort study. Implied Model: Use of razor (-) < Neonatal death Death during Neonatal Period Yes (1) No (2) Cut cord with razor No (2) 186 822 1008 Yes (1) 88 1004 1092 274 1826 2100 Table 1. Epi Info analysis as simple random sample (SRS) and cluster sample (CLU). Sampling assumption Variable(s) Measure 95% Confidence Interval Point Estimate Lower Limit Upper Limit Design effect Section with example output SRS DEAD=1 incidence 0.130 0.117 0.146 -- 1 CLU DEAD=1 incidence 0.130 0.099 0.162 4.40 2 SRS RAZOR=2 prevalence 0.480 0.458 0.502 -- 1 CLU RAZOR=2 prevalence 0.480 0.452 0.508 1.63 2 SRS DEAD & RAZOR risk ratio 2.290 1.803 2.907 -- 3 CLU DEAD & RAZOR risk ratio 2.290 1.530 3.427 NA 4 SRS DEAD & RAZOR odds ratio 2.582 1.971 3.381 -- 3 CLU DEAD & RAZOR odds ratio 2.582 1.640 4.058 NA 4 SRS DEAD & RAZOR risk difference 0.104 0.075 0.133 -- 3 CLU DEAD & RAZOR risk difference 0.104 0.055 0.153 NA 4 NA = not applicable (but could be calculated by comparing the variance estimates for SRS and CLU) Analysis of Surveys: Epi Info and Stata Page 1

Table 2. Stata analysis as simple random sample (SRS) and cluster sample (CLU). Sampling assumption Variable(s) Measure 95% Confidence Interval Point Estimate Lower Limit Upper Limit Design effect Section with example output SRS DEAD = 1 incidence 0.130 0.116 0.145 -- 5 CLU DEAD = 1 incidence 0.130 0.099 0.162 4.398 6 SRS RAZOR = 1 prevalence 0.480 0.459 0.501 -- 5 CLU RAZOR = 1 prevalence 0.480 0.452 0.508 1.632 6 SRS DEAD & RAZOR risk ratio 2.290 1.777 2.951 -- 7 CLU DEAD & RAZOR risk ratio 2.290 1.530 3.427 2.619 8 SRS DEAD & RAZOR odds ratio 2.582 1.971 3.381 -- 9 CLU DEAD & RAZOR odds ratio 2.582 1.642 4.058 2.580 10 SRS DEAD & RAZOR risk difference 0.104 0.075 0.133 -- 11 CLU DEAD & RAZOR risk difference 0.104 0.055 0.153 2.624 12 Analyses with Epi Info 1. Under Analysis Commands, the Options command for Statistics is set to Advanced. Then under Statistics the Frequencies command is used for DEAD and RAZOR. FREQ DEAD FREQ RAZOR 2. Under Analysis Commands, the Advanced Statistics option for Complex Sample Frequencies is used for DEAD and RAZOR. Analysis of Surveys: Epi Info and Stata Page 2

FREQ DEAD PSUVAR=CLUSTER FREQ RAZOR PSUVAR=CLUSTER 3. Under Analysis Commands, the Statistics option for Tables is used to compare RAZOR (exposure variable) with DEAD (outcome variable). First, however, RAZOR is recoded from 1 and 2 to no and yes so that it lines up correctly in the Epi Info table [note: no is exposed to the risk of neonatal mortality from not using a razor to cut the umbilical cord]. Analysis of Surveys: Epi Info and Stata Page 3

RECODE RAZOR TO RAZOR 1 = "Yes" 2 = "No" END TABLES RAZOR DEAD 4. Under Analysis Commands, the Advanced Statistics option for Complex Sample Tables is used to compare RAZOR (exposure variable) with DEAD (outcome variable). As before, RAZOR is recoded from 1 and 2 to no and yes so that it lines up correctly in the Epi Info table. Analysis of Surveys: Epi Info and Stata Page 4

TABLES RAZOR DEAD PSUVAR=CLUSTER Analyses with Stata 5. The variables DEAD and RAZOR in the Epi Info file were recoded from the original values of 1 and 2 for DEAD to 1 (dead) and 0 (alive) and for RAZOR to 1 (not used) and 0 (used). The *.mdb file was then saved as a *.rec file (named with 8 letters or less), moved to the DATA subdirectory in Stata, and converted to a *.dct file with the epi2dct.exe program. Thereafter the *.dct file was read into Stata with the infile using command and saved as a *.dta file. Once in Stata, the means of the binomial variables DEAD and RAZOR were analyzed as if the data came from a simple random sample. means dead razor 6. To analyzed the means correct as a cluster survey, the primary sampling unit (PSU cluster) needs to be recognized. The recognition is created with the svyset, psu(cluster) command, followed by svymean dead razor to derive the means. Analysis of Surveys: Epi Info and Stata Page 5

svymean dead razor 7. For the risk ratio, first assume the data were derived from a simple random sample. To calculate the risk ratio comparing RAZOR (the exposure variable coded as 1 for not used and 0 for used) to DEAD (the outcome variable coded as 1 for dead and 0 for alive), the poisson regression command is used. poisson dead razor, irr 8. Next correctly assume the data were derived from a cluster survey. To calculate the risk ratio comparing RAZOR to DEAD, the survey version of the poisson regression command is used. svypois dead razor, irr ci deff 9. For the odds ratio, first assume the data were derived from a simple random sample. To calculate the odds ratio comparing RAZOR (the exposure variable coded as 1 for not used and 0 for used) to DEAD (the outcome variable coded as 1 for dead and 0 for alive), the logistic regression command is used. Analysis of Surveys: Epi Info and Stata Page 6

logit dead razor, or 10. Next correctly assume the data were derived from a cluster survey. To calculate the odds ratio comparing RAZOR to DEAD, the survey version of the logistic regression command is used. svylogit dead razor, or ci deff 11. For the risk difference, first assume the data were derived from a simple random sample. To calculate the risk difference comparing DEAD (coded as 1 for dead and 0 for alive) among RAZOR coded as 1 (i.e., not used) versus RAZOR coded as 0 (i.e., used), the binomial regression command is used. binreg dead razor, rd 12. Lastly, correctly assume the data were derived from a cluster survey. To calculate the risk difference comparing DEAD (coded as 1 for dead and 0 for alive) among RAZOR coded as 1 (i.e., not used) versus RAZOR coded as 0 (i.e., used), the linear regression command is used. Analysis of Surveys: Epi Info and Stata Page 7

svyregress dead razor, ci deff Analysis of Surveys: Epi Info and Stata Page 8