Notes for laboratory session 2

Similar documents
Age (continuous) Gender (0=Male, 1=Female) SES (1=Low, 2=Medium, 3=High) Prior Victimization (0= Not Victimized, 1=Victimized)

Multiple Linear Regression Analysis

2. Scientific question: Determine whether there is a difference between boys and girls with respect to the distance and its change over time.

Final Exam - section 2. Thursday, December hours, 30 minutes

ANOVA. Thomas Elliott. January 29, 2013

1. Objective: analyzing CD4 counts data using GEE marginal model and random effects model. Demonstrate the analysis using SAS and STATA.

m 11 m.1 > m 12 m.2 risk for smokers risk for nonsmokers

This tutorial presentation is prepared by. Mohammad Ehsanul Karim

Introduction to regression

Name: emergency please discuss this with the exam proctor. 6. Vanderbilt s academic honor code applies.

MULTIPLE REGRESSION OF CPS DATA

Regression Output: Table 5 (Random Effects OLS) Random-effects GLS regression Number of obs = 1806 Group variable (i): subject Number of groups = 70

Sociology 63993, Exam1 February 12, 2015 Richard Williams, University of Notre Dame,

4. STATA output of the analysis

MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

Modeling unobserved heterogeneity in Stata

Week 8 Hour 1: More on polynomial fits. The AIC. Hour 2: Dummy Variables what are they? An NHL Example. Hour 3: Interactions. The stepwise method.

Sociology Exam 3 Answer Key [Draft] May 9, 201 3

Vessel wall differences between middle cerebral artery and basilar artery. plaques on magnetic resonance imaging

ECON Introductory Econometrics Seminar 7

Regression models, R solution day7

Least likely observations in regression models for categorical outcomes

Chapter 9. Factorial ANOVA with Two Between-Group Factors 10/22/ Factorial ANOVA with Two Between-Group Factors

Effect of Source and Level of Protein on Weight Gain of Rats

Cross-validation. Miguel Angel Luque Fernandez Faculty of Epidemiology and Population Health Department of Non-communicable Diseases.

Answer all three questions. All questions carry equal marks.

Professor Rose-Helleknat's PCR Data for Breast Cancer Study

HZAU MULTIVARIATE HOMEWORK #2 MULTIPLE AND STEPWISE LINEAR REGRESSION

Multiple Regression Analysis

Psych 5741/5751: Data Analysis University of Boulder Gary McClelland & Charles Judd. Exam #2, Spring 1992

Poisson regression. Dae-Jin Lee Basque Center for Applied Mathematics.

Use the above variables and any you might need to construct to specify the MODEL A/C comparisons you would use to ask the following questions.

Data Analysis in the Health Sciences. Final Exam 2010 EPIB 621

Cross-validation. Miguel Angel Luque Fernandez Faculty of Epidemiology and Population Health Department of Non-communicable Disease.

Statistical reports Regression, 2010

Heidi Hardt. *** For questions about this online resource, contact Prof. Heidi Hardt at hhardt at uci dot edu ***

Demystifying causal inference in randomised trials. Lecture 3: Introduction to mediation and mediation analysis using instrumental variables

NEUROBLASTOMA DATA -- TWO GROUPS -- QUANTITATIVE MEASURES 38 15:37 Saturday, January 25, 2003

Multivariate dose-response meta-analysis: an update on glst

Simple Linear Regression

G , G , G MHRN

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Linear Regression in SAS

STAT Factor Analysis in SAS

Generalized Linear Models of Malaria Incidence in Jubek State, South Sudan

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes

Answer to exercise: Growth of guinea pigs

Answer Key to Problem Set #1

Cross-over trials. Martin Bland. Cross-over trials. Cross-over trials. Professor of Health Statistics University of York

5 To Invest or not to Invest? That is the Question.

Effects of Nutrients on Shrimp Growth

Limited dependent variable regression models

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

The Stata Journal. Editor Nicholas J. Cox Geography Department Durham University South Road Durham City DH1 3LE UK

Generalized Mixed Linear Models Practical 2

Editor Executive Editor Associate Editors Copyright Statement:

EXECUTIVE SUMMARY DATA AND PROBLEM

Inferential Statistics

Research Analysis MICHAEL BERNSTEIN CS 376

DF-Analyses of Heritability with Double-Entry Twin Data: Asymptotic Standard Errors and E cient Estimation

Data with Python - Examples

Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, 2nd Ed.

Cross-validated AUC in Stata: CVAUROC

4Stat Wk 10: Regression

Using SAS to Conduct Pilot Studies: An Instructors Guide

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale.

General Example: Gas Mileage (Stat 5044 Schabenberger & J.P.Morgen)

NORTH SOUTH UNIVERSITY TUTORIAL 2

Estimating average treatment effects from observational data using teffects

Final Exam. Thursday the 23rd of March, Question Total Points Score Number Possible Received Total 100

(a) Perform a cost-benefit analysis of diabetes screening for this group. Does it favor screening?

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

PARALLELISM AND THE LEGITIMACY GAP 1. Appendix A. Country Information

A Comparison of Methods of Analysis to Control for Confounding in a Cohort Study of a Dietary Intervention

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Business Statistics Probability

Careful with Causal Inference

Lab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups.

Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties

Daniel Boduszek University of Huddersfield

Example of Interpreting and Applying a Multiple Regression Model

Sarwar Islam Mozumder 1, Mark J Rutherford 1 & Paul C Lambert 1, Stata London Users Group Meeting

CLASSICAL AND. MODERN REGRESSION WITH APPLICATIONS

Econometric analysis and counterfactual studies in the context of IA practices

Business Research Methods. Introduction to Data Analysis

Biology 345: Biometry Fall 2005 SONOMA STATE UNIVERSITY Lab Exercise 5 Residuals and multiple regression Introduction

Math Section MW 1-2:30pm SR 117. Bekki George 206 PGH

Math 215, Lab 7: 5/23/2007

Final Research on Underage Cigarette Consumption

Study Guide #2: MULTIPLE REGRESSION in education

The following lines were read from file C:\Documents and Settings\User\Desktop\Example LISREL-CFA\cfacommand.LS8:

Regression Including the Interaction Between Quantitative Variables

STAT 503X Case Study 1: Restaurant Tipping

Two-Way Independent Samples ANOVA with SPSS

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine

Determinants and Status of Vaccination in Bangladesh

Implementing double-robust estimators of causal effects

Extending Rungie et al. s model of brand image stability to account for heterogeneity

WEB APPENDIX: STIMULI & FINDINGS. Feeling Love and Doing More for Distant Others:

Chapter 11 Multiple Regression

Transcription:

Notes for laboratory session 2 Preliminaries Consider the ordinary least-squares (OLS) regression of alcohol (alcohol) and plasma retinol (retplasm). We do this with STATA as follows:. reg retplasm alcohol Source SS df MS Number of obs = 314 -------------+------------------------------ F( 1, 312) = 16.19 Model 671843.17 1 671843.17 Prob > F = 0.0001 Residual 12948338.7 312 41501.0855 R-squared = 0.0493 -------------+------------------------------ Adj R-squared = 0.0463 Total 13620181.9 313 43514.958 Root MSE = 203.72 retplasm Coef. Std. Err. t P> t [95% Conf. Interval] alcohol 9.365251 2.327637 4.02 0.000 4.785401 13.9451 _cons 578.8857 13.04634 44.37 0.000 553.2158 604.5556 Try to locate the following: a. What is the overall significance of the model and how is it being assessed? b. What is the effect of alcohol on plasma retinol? c. For each unit of alcohol consumption increase what is the unit-change in plasma retinol? What is the 95% confidence interval? Now do the same using the glm command of STATA.. glm retplasm alcohol Iteration 0: log likelihood = -2113.9991 Optimization : ML: Newton-Raphson Residual df = 312 Scale parameter = 41501.09 Deviance = 12948338.69 (1/df) Deviance = 41501.09 Pearson = 12948338.69 (1/df) Pearson = 41501.09 Log likelihood = -2113.999055 AIC = 13.4777 BIC = 12946544.88 alcohol 9.365251 2.327637 4.02 0.000 4.803167 13.92734 _cons 578.8857 13.04634 44.37 0.000 553.3153 604.4561

Try to notice the similarities between the two approaches. Specifically, notice the following: d. Note the type of link and variance function. Why do you think these are the links and variance function used? Produce the predicted regression line for alcohol consumption, along with a scatter plot of the observed values.. quietly glm retplasm alcohol. predict yhat (option mu assumed; predicted mean retplasm).sc yhat retplasm alcohol,xlab() ylab() c(l.) ms(i o)scheme(s2mono) 0 500 1000 1500 2000 0 10 20 30 40 Alcohol intake predicted mean retplasm Plasma retinol (ng/ml)

Model building Assess the effect of adding variable fat after alcohol has been added to the model. Recall from your previous experience that this can be done with the general linear model procedure as follows:. anova retplasm alcohol fat, continuous(alcohol fat) seq Number of obs = 314 R-squared = 0.0617 Root MSE = 202.716 Adj R-squared = 0.0556 Source Seq. SS df MS F Prob > F -----------+---------------------------------------------------- Model 839986.495 2 419993.248 10.22 0.0001 alcohol 671843.17 1 671843.17 16.35 0.0001 fat 168143.325 1 168143.325 4.09 0.0439 Residual 12780195.4 311 41093.8758 -----------+---------------------------------------------------- Total 13620181.9 313 43514.958 e. What is the criterion of whether fat intake has a significant effect on plasma retinol levels after adjusting for alcohol intake? f. What is the decision about whether we should include fat intake in a model of plasma retinol levels? Now using the glm command in STATA:. glm retplasm alcohol fat Iteration 0: log likelihood = -2111.9469 Optimization : ML: Newton-Raphson Residual df = 311 Scale parameter = 41093.88 Deviance = 12780195.36 (1/df) Deviance = 41093.88 Pearson = 12780195.36 (1/df) Pearson = 41093.88 Log likelihood = -2111.946946 AIC = 13.471 BIC = 12778407.3 alcohol 9.990265 2.336708 4.28 0.000 5.410401 14.57013 fat -.6975524.3448463-2.02 0.043-1.373439 -.0216661 _cons 630.7705 28.7483 21.94 0.000 574.4249 687.1162 We carry out the calculations that lead to the decision about adding or not of fat in the model. Consider using the deviance of the joint model (with fat and alcohol included) versus the model with only alcohol included.

We do this as follows: The deviance of the former model is D ( X ) 12948338.69, while the 1 one for the latter model is D ( X, X ) 12780195.36, where X 1 2 1 is alcohol and X 2 is fat. D( X1) D( X1, X 2) The criterion is 4. 09. D( X, X ) n 3 1 2 g. We can compare this to a chi-square distribution with one degree of freedom. Why? This is done as follows:. display chi2tail(1,4.09).04313765 The p-value is 0.043<0.05 which suggests that fat should be included into the model. Alternatively, you can use the test command as follows:. test fat ( 1) [retplasm]fat = 0 chi2( 1) = 4.09 Prob > chi2 = 0.0431 If you wanted to assess whether two variables added, after alcohol consumption has been entered in the model, are significant, you can use the same method. Consider the following:. glm retplasm alcohol fat fiber Iteration 0: log likelihood = -2111.9263 Optimization : ML: Newton-Raphson Residual df = 310 Scale parameter = 41221.03 Deviance = 12778518.55 (1/df) Deviance = 41221.03 Pearson = 12778518.55 (1/df) Pearson = 41221.03 Log likelihood = -2111.926345 AIC = 13.47724 BIC = 12776736.24 alcohol 9.964244 2.343874 4.25 0.000 5.370336 14.55815 fat -.6767318.3604768-1.88 0.060-1.383253.0297898 fiber -.4526052 2.244069-0.20 0.840-4.850899 3.945688 _cons 635.0317 35.71259 17.78 0.000 565.0363 705.0271

Now you can test the addition of fat and fiber intake in the model as follows:. test fat fiber ( 1) [retplasm]fat = 0 ( 2) [retplasm]fiber = 0 chi2( 2) = 4.12 Prob > chi2 = 0.1275 The results imply that the joint effect of fat and fiber intake is not significant when considered in addition to alcohol intake. h. Can you replicate these results by hand, by considering this model and compare it to the one with only alcohol consumption included?