Statistical Reasoning in Public Health 2009 Biostatistics 612, Homework #2

1. Suppose it is the year 1985 and you are doing research on the differences in wages earned by men and women in the U.S. workforce. You gain access to a data set that contains information on a random sample of 534 U.S. workers surveyed in 1985. The data set contains information about the hourly wage (in U.S. dollars) and the sex of each of the workers surveyed, as well as information about each worker's age, union membership, and type of occupation (collapsed into 8 different categories). You decide to use linear regression to estimate unadjusted differences in the mean hourly wage for female workers as compared to males, as well as adjusted gender-wage differences, adjusted for various other worker characteristics. Below find the estimated coefficient for sex, along with its standard error, from 4 different linear regression models, all of which include sex as a predictor.

Linear Regression of Wages ($/hr) on Sex, and Other Predictors

MODEL   Predictors (xs) in Model                Estimated Regression Coefficient (Slope)   Standard Error of
                                                for Sex (1 = Female, 0 = Male)             Slope for Sex
A       Sex                                     -2.1                                       0.44
B       Sex, Age                                -2.2                                       0.43
C       Sex, Age, Union Membership              -2                                         0.43
D       Sex, Age, Union Membership, Job Type    -1.9                                       0.43

a. What is the estimated unadjusted mean difference in hourly wages for females as compared to males? Give a 95% confidence interval for this difference. Write a sentence interpreting both the unadjusted mean difference and the corresponding confidence interval. (3 points)
b. What is the estimated adjusted mean difference in hourly wages for females as compared to males, adjusting for age, union membership, and job type? Give a 95% confidence interval for this difference. Write a sentence interpreting both the adjusted mean difference and the corresponding confidence interval. (3 points)
c. Comment on any disparities in the estimated mean difference in hourly wages between males and females in the four models whose results are listed above. Does it appear from these results that the wage/gender relationship is confounded by other worker characteristics such as worker age, membership in a union, and job type? Why or why not? (3 points)
d. Use the results from Model D to estimate the mean difference in hourly wages for females, age 42, who are union members with manufacturing jobs, as compared to 42-year-old male union members with manufacturing jobs. (Note that you have already done this in a previous portion of the problem; I am just trying to drill into you how to interpret multiple linear regression coefficients.) (1 point)
e. Does the given information allow you to assess whether the relationship between hourly wages and sex is modified by age? If not, what additional results would you need to see? (2 points)

2. Carotid artery intima-media thickness (IMT) is a measure of thickening in the arterial wall. Higher values are associated with the development of atherosclerosis (thickening and hardening of the arterial walls resulting in restricted blood flow). A study published in 2003 in the Journal of the American Medical Association (1) concerns potential risk factors associated with increased IMT. This was a large population-based study in Finland involving over 2,000 subjects. One of the analyses utilized multiple linear regression to assess the relationship between average IMT and other subject characteristics for persons in the age range 29-39 years. The results are presented in the following table taken directly from the article:

[Table from the article; not reproduced in this transcription]

The above results estimate a model of the form

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_6 x_6$

where $\hat{y}$ is estimated mean IMT (in mm), and the predictors $x_1, \dots, x_6$ are in units described in the table footnotes.

a) According to the table footnotes, what unit did the authors use for age in the multiple linear regression? (1 point)

(1) Raitakari O et al. Cardiovascular Risk Factors in Childhood and Carotid Artery Intima-Media Thickness in Adulthood: The Cardiovascular Risk in Young Finns Study. (2003) Journal of the American Medical Association, Vol 290 No 17. 2277-2283.

b) What is the estimated mean difference in IMT for two groups of persons who differ by one year in age, adjusted for the other predictors in the model? (1 point)
c) Estimate a 95% CI for the quantity estimated in part b. (1 point)
d) Interpret the slope of sex in words. (1 point)
e) Give a 95% CI for the slope of sex. (1 point)
f) What additional information would you need to assess whether the relationship between IMT and sex is confounded by at least some of the additional predictors from the given multiple linear regression? (1 point)
g) What additional information would you need to assess whether sex modifies the relationship between IMT and smoking, after adjusting for age, LDL, BMI and SBP? (1 point)
h) Given the above results, can you ascertain whether the linear relationship between IMT and the six predictors in the regression is strong? Why or why not? (1 point)
i) Suppose the above results are used to compare average IMT between 39-year-olds and 29-year-olds after adjusting for the other 5 predictors in the model. What would be the estimated average difference in IMT? Compute a 95% confidence interval for this difference. (2 points)
j) Would it be appropriate to use the above results to estimate the adjusted average difference in IMT levels for 80-year-olds compared to 70-year-olds? Why or why not? (1 point)

3. Total lung capacity (TLC) is a key indicator of pulmonary function. TLC is important in lung transplantation because it is important for the donor's lungs to be similar to those of the recipient. We have data on pre-transplant TLC (liters) of 32 recipients of heart-lung transplants, obtained by whole body plethysmography, and their age (years, ranging from 11-52), sex (1 = female, 0 = male), and height (cm, range 138-189). The data are also in a file, and more details on how to access the data are on the course web page (see the homework section of the web page). The necessary Stata commands for completing each part of this exercise appear at the end of this document. Also included on the course website is the Stata output you will get if you use the commands listed at the end of this document, so you may do this assignment without using Stata directly.

IMPORTANT: Please do not include any Stata output in your responses! If you wish to include graphics in your document, this is fine; however, it is also fine just to describe what you see in a graph where asked.

a. Graph the relationship between TLC and age in a scatterplot. Comment on the nature of the relationship between TLC and age. (1 point)
b. Perform a simple linear regression of TLC on age. Are the results consistent with what you saw in the scatterplot? Interpret the estimated coefficient (slope) of age in a sentence. Report a 95% confidence interval for the (true) coefficient of age for this population. (3 points)
c. Graph the relationship between TLC and height in a scatterplot. Comment on the nature of the relationship between TLC and height. (1 point)
d. Perform a simple linear regression of TLC on height. Are the results consistent with what you saw in the scatterplot? Interpret the estimated coefficient (slope) of height in a sentence. Report a 95% confidence interval for the (true) coefficient of height for this population. (3 points)

e. Graph the relationship between TLC and patient's sex in a scatterplot. Is this a useful exploratory approach for assessing gender differences in TLC? Can you suggest other ways of exploring the relationship between a continuous outcome and a binary predictor? (1 point)
f. Perform a simple linear regression of TLC on sex. Interpret the estimated coefficient (slope) of sex in a sentence. Report a 95% confidence interval for the (true) coefficient of sex for this population. (3 points)
g. Now perform a multiple linear regression of TLC on height, age, and sex together. Interpret the slope estimates for height, age, and sex in words. (3 points)
h. Which predictors are statistically significantly associated with TLC (α = .05) in the multiple linear regression model? (1 point)
i. Compare the unadjusted relationship between TLC and sex to the height- and age-adjusted association between TLC and sex. Is there any suggestion of confounding? Why/why not? (1 point)
j. What is the R² value for the multiple regression model you fit in (g)? What is the interpretation of this value? (1 point)
k. Using the regression model results from part (g), estimate the mean TLC level for:
   a. 42-year-old males, 170 cm tall (1 point)
   b. 35-year-old females, 145 cm tall (1 point)
l. Using the regression model results from part (g), estimate the mean difference in TLC between 50-year-old females 150 cm tall and 40-year-old males 160 cm tall. (1 point)

Parts (m) and (n) are extra credit!

m. Create a scatterplot of TLC versus height separately for males and for females. Does the relationship between TLC and height appear similar for both sexes? (up to 2 points extra credit)
n. Run a regression of TLC on height, sex, and an interaction between sex and height. Based on this result: (up to 3 points extra credit)
   a. What is the estimated mean difference in TLC for two groups of men who differ by 1 cm in height?
   b. What is the estimated mean difference in TLC for two groups of women who differ by 1 cm in height?
   c. Is there a statistically significant interaction between sex and height?
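As an aid to reading the interaction model in part (n), here is a minimal Python sketch of how the sex-specific height slopes fall out of a model of the form TLC = b0 + b_height × height + b_sex × sex + b_interaction × (sex × height). The coefficient values are hypothetical placeholders chosen only for illustration; they are not estimates from the TLC data, and the function name `mean_tlc` is likewise made up for this sketch.

```python
def mean_tlc(height_cm, female, b0, b_height, b_sex, b_interaction):
    """Estimated mean TLC (liters) from a model with a sex-by-height interaction.

    female is coded 1 = female, 0 = male, matching the homework's coding.
    """
    return (b0
            + b_height * height_cm
            + b_sex * female
            + b_interaction * female * height_cm)


# HYPOTHETICAL coefficients, for illustration only (not fitted to the data).
coef = dict(b0=-8.0, b_height=0.08, b_sex=2.0, b_interaction=-0.02)

# Difference in estimated mean TLC per 1 cm of height, within each sex.
slope_men = mean_tlc(171, 0, **coef) - mean_tlc(170, 0, **coef)
slope_women = mean_tlc(171, 1, **coef) - mean_tlc(170, 1, **coef)

print("per-cm difference, men:  ", round(slope_men, 3))    # equals b_height
print("per-cm difference, women:", round(slope_women, 3))  # equals b_height + b_interaction
```

The point of the sketch is that the height slope for men is the height coefficient alone, while the height slope for women is the height coefficient plus the interaction coefficient. Whether the two slopes differ significantly is judged from the interaction coefficient and its standard error in the regression output.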

Sample Quiz Questions:

Choose the correct answer for each of the following multiple choice questions. Include a sentence or two justifying your answer choice. (1 point for each correct answer, 1 point for correct justification)

The objective of a study is to understand the factors that are associated with systolic blood pressure in infants. Systolic blood pressure, weight (ounces) and age (days) are measured in 100 infants. A multiple linear regression is performed to predict blood pressure (mm Hg) from age and weight. The following results are presented in a journal article. (Questions 4-6 refer to these results)

Multiple Linear Regression Analysis of the Predictors of Systolic Blood Pressure in Infants

                 Coefficient    SE of Coefficient
Intercept        50.            4.0
Birth Weight     0.10           0.3
Age (days)       4.0            0.60

4. How much higher would you expect the blood pressure to be of an infant who weighed 120 ounces compared to an infant who weighed 90 ounces if both infants were of exactly the same age?
a. 0.1 mm Hg
b. 1.0 mm Hg
c. 2.0 mm Hg
d. 3.0 mm Hg
e. 4.0 mm Hg

5. Which of the following is a 95% confidence interval for the difference in SBP between two infants of the same weight who differ by 2 days in age (older compared to younger)?
a. 2.8 mm Hg to 5.2 mm Hg
b. 5.6 mm Hg to 10.4 mm Hg
c. 6.8 mm Hg to 9.6 mm Hg
d. 0.5 mm Hg to 0.7 mm Hg

6. Suppose the R² from the above regression model is .57, which means that roughly 57% of the variability in the infants' blood pressure measurements is explained by the infants' age and weight. What would happen to this R² value if weight had been recorded in kilograms instead of ounces?
a. R² would increase.
b. R² would decrease.
c. R² would equal .57.
d. Not enough information to determine.
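The arithmetic behind questions of this kind can be sketched in a few lines of Python. The sketch assumes the usual large-sample approach: the estimated mean difference for a change of delta units in one predictor (others held fixed) is slope × delta, and an approximate 95% confidence interval is that estimate ± 1.96 × SE × |delta|. The function name `scaled_difference` is invented for this illustration; the slopes and standard errors are copied from the quiz table above. The same estimate ± 1.96 × SE pattern underlies the confidence-interval questions in Problems 1 and 2 (where delta is 1 unless stated otherwise).

```python
Z_95 = 1.96  # normal-approximation multiplier; often rounded to 2 by hand


def scaled_difference(slope, se, delta, z=Z_95):
    """Return (estimate, lower, upper) for a delta-unit difference in one predictor.

    The slope and its standard error are both multiplied by delta, so the
    confidence interval scales along with the estimate.
    """
    estimate = slope * delta
    margin = z * se * abs(delta)
    return estimate, estimate - margin, estimate + margin


# Values copied from the quiz table (mm Hg per ounce, mm Hg per day).
weight_slope, weight_se = 0.10, 0.3
age_slope, age_se = 4.0, 0.60

# 120 oz versus 90 oz at the same age: a 30-ounce difference in weight.
weight_diff = scaled_difference(weight_slope, weight_se, 120 - 90)

# Two infants of the same weight who differ by 2 days in age.
age_diff = scaled_difference(age_slope, age_se, 2)

print("weight difference (estimate, lower, upper):", [round(v, 2) for v in weight_diff])
print("age difference (estimate, lower, upper):   ", [round(v, 2) for v in age_diff])
```

Using 2 in place of 1.96 (pass `z=2`) gives slightly wider intervals, which is why hand calculations may differ from this sketch in the second decimal place.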

Appendix: Stata Commands for Problem 3

If you are using Stata, the commands are as follows:

a. twoway (scatter tlc age)
b. regress tlc age
c. twoway (scatter tlc height)
d. regress tlc height
e. twoway (scatter tlc sex)
f. regress tlc sex
g. regress tlc age height sex
h-l. No command necessary.
m. twoway (scatter tlc height) if sex == 0
   twoway (scatter tlc height) if sex == 1
n. This is up to you to figure out if you want the extra credit. You can find information on how to do this in the notes.

Note that the == in the commands for part m is actually two adjacent equals (=) signs.
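For readers who want to see what a command such as `regress tlc age` is estimating, the following Python sketch computes the least-squares intercept and slope for a single predictor by hand. It is an illustration only: the function name `simple_ols` and the small data set are hypothetical (the values are not the course's TLC data), and the sketch does not reproduce the standard errors, confidence intervals, or R² that Stata reports.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for a regression of y on a single x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x about its mean, and cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# HYPOTHETICAL ages (years) and TLC values (liters), for illustration only.
ages = [15, 22, 30, 38, 45, 52]
tlc = [4.1, 5.0, 5.6, 6.1, 6.3, 6.8]

intercept, slope = simple_ols(ages, tlc)
print("intercept:", round(intercept, 3))
print("slope (liters per year of age):", round(slope, 3))
```

The slope is read the same way as in the homework: the estimated difference in mean TLC between two groups of patients who differ by one year in age.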