First of two parts Joseph Hogan Brown University and AMPATH


Overview
What is regression?
Does regression have to be linear?
Case study: Modeling the relationship between weight and CD4 count
  Exploratory analysis
  Linear model (???)
  Nonlinear models: quadratic and exponential
Summary and practical recommendations

What is regression?
One explanatory variable
Functional form is $E(Y \mid X = x) = g(x)$
Translation: the average value of Y when X = x follows the function g(x)

What is the correct function?
Most of the time, regression is thought of as linear: $E(Y \mid X = x) = \alpha + \beta x$
Most of the time, life is not linear!
Examples
  Growth over time
  PK/PD characteristics of a new drug

Example: Weight vs CD4
1,100 individuals initiating cART
At cART initiation, measure weight and CD4 count
What is the relationship between these variables?

[Figure: scatterplot of weight (kg) versus CD4 count]

Fitting a model
Objective: fit a model to characterize this relationship
Steps
  Explore relationship without a model
  Find a function that best characterizes the relationship
  Interpret the model
Try not to be confined to linearity

Exploration
Can use model-free methods to estimate the functional relationship
Key idea:
  Within windows of X, compute mean of Y
  Move the window a little bit at a time
  Connect the dots in a smooth way
Tool: LOWESS (LOcally WEighted Scatterplot Smoothing)
  Available in Stata
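The key idea above (windowed means, moved a little at a time) can be sketched in a few lines of Python. This is a plain unweighted moving-window mean, not LOWESS itself, which fits weighted local regressions inside each window. The function name `window_means`, the window settings, and the synthetic data are all assumptions made for illustration; none of them come from the lecture data.

```python
import random

def window_means(x, y, half_width, step):
    """Mean of y inside a window of x that slides from min(x) to max(x)."""
    lo, hi = min(x), max(x)
    centers, means = [], []
    c = lo
    while c <= hi + 1e-9:
        # Collect the y values whose x lies inside the current window.
        in_window = [yi for xi, yi in zip(x, y) if abs(xi - c) <= half_width]
        if in_window:  # skip windows that contain no observations
            centers.append(c)
            means.append(sum(in_window) / len(in_window))
        c += step  # move the window a little bit at a time
    return centers, means

# Synthetic data (made up): weight rises with CD4 and then levels off.
random.seed(0)
cd4_100 = [random.uniform(0, 10) for _ in range(500)]
weight = [60 - 10 * 0.6 ** x + random.gauss(0, 5) for x in cd4_100]

centers, means = window_means(cd4_100, weight, half_width=1.0, step=0.5)
for c, m in zip(centers, means):
    print(f"CD4/100 near {c:4.1f}: mean weight {m:5.1f} kg")
```

Plotting `means` against `centers` and joining the points gives a rough, model-free picture of the relationship; a wider window gives a smoother but less local curve.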

[Figure: scatterplot of weight (kg) versus CD4 count]

[Figure: Lowess smoother, bandwidth = .4; weight (kg) versus CD4 count]

[Figure: fitted values and lowess curve of weight on cd4_100]

Linear regression (?)
Regress Y on X (weight on CD4)
Assumes linear relationship: $E(Y \mid X = x) = \alpha + \beta x$
  $\alpha$ = intercept (mean weight when CD4 = 0)
  $\beta$ = slope (difference in mean weight for those who differ by 100 CD4 units)

Fitted linear regression
$\alpha = 53.7$
$\beta = 1.21$, SE = 0.14, $p < 0.001$
How well does this model fit?
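As a rough illustration of how such estimates arise and how they are read, the sketch below computes the least-squares intercept and slope in closed form and then evaluates the fitted line at a few CD4 values. The names `ols_fit` and `linear_mean` and the toy data are hypothetical; only the values 53.7 and 1.21 come from the slide, and CD4 is assumed to be measured in units of 100 cells, as the slope interpretation suggests.

```python
def ols_fit(x, y):
    """Closed-form least squares for y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

def linear_mean(cd4_100, alpha=53.7, beta=1.21):
    """Fitted mean weight (kg) at a given CD4 count in units of 100 cells."""
    return alpha + beta * cd4_100

# Toy data (made up) lying exactly on a line, to check the formulas.
x = [0, 1, 2, 3, 4]
y = [50.0, 51.5, 53.0, 54.5, 56.0]
alpha_hat, beta_hat = ols_fit(x, y)
print(f"toy fit: alpha = {alpha_hat:.2f}, beta = {beta_hat:.2f}")

# Fitted means implied by the estimates reported on the slide.
for cd4 in (0, 2, 5, 10):
    print(f"CD4/100 = {cd4:2d}: fitted mean weight {linear_mean(cd4):.2f} kg")
```

Note that the straight line keeps rising by the same amount per 100 CD4 units across the whole range, which is exactly the assumption the lecture goes on to question.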

[Figure: fitted values and lowess curve of weight on cd4_100]

[Figure: fitted values with lower and upper bounds (lpred1, upred1) and lowess curve of weight on cd4_100]

Brief digression
What does the slope mean in a linear model?
$\beta = 1.21$, SE = 0.14, $p < 0.001$
(a) If CD4 changes by 100 units, weight will change by 1.21 kg
(b) Two individuals who differ by 100 CD4 units will differ, on average, by 1.21 kg in weight

Correct answer is (b)
First interpretation assumes that if we increase CD4 by 100 units, we will increase that person's weight by 1.21 kg
  Longitudinal comparison; requires repeated measures
Second interpretation compares weights of separate individuals who differ by 100 units in CD4 count
  Compares different individuals at a single point in time

Think outside the line
Quadratic model
  Adds curvature
  Can be restrictive
Exponential model
  Useful for capturing leveling-off behavior

Quadratic model
$g(x) = \alpha + \beta_1 x + \beta_2 x^2$
Applied to CD4 and weight:
$E(\mathrm{Wt} \mid \mathrm{CD4}) = \alpha + \beta_1 \, \mathrm{CD4} + \beta_2 \, \mathrm{CD4}^2$

Fitted quadratic model
Linear term: $\beta_1 = 2.0$, SE = 0.21, $p < 0.001$
  On average, higher CD4 associated with higher weight
Quadratic term: $\beta_2 = -0.26$, SE = 0.05, $p < 0.001$
  Implies negative curvature
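One consequence of negative curvature is that the fitted curve has a peak, after which fitted mean weight declines. A small sketch of that arithmetic follows, using the two coefficients reported above. The function names are hypothetical, and the sketch assumes the coefficients are expressed per 100 CD4 units, like the linear slope. The intercept is not needed to locate the peak.

```python
def quadratic_turning_point(beta1, beta2):
    """x at which alpha + beta1*x + beta2*x**2 stops rising (slope is zero)."""
    if beta2 == 0:
        raise ValueError("beta2 = 0: the model is linear and has no turning point")
    return -beta1 / (2 * beta2)

def quadratic_slope(x, beta1, beta2):
    """Derivative of the quadratic mean function at x."""
    return beta1 + 2 * beta2 * x

BETA1, BETA2 = 2.0, -0.26  # estimates reported on the slide

peak = quadratic_turning_point(BETA1, BETA2)
print(f"fitted curve peaks near CD4/100 = {peak:.2f}")
for x in (0, 2, 4, 8):
    slope = quadratic_slope(x, BETA1, BETA2)
    print(f"slope at CD4/100 = {x}: {slope:+.2f} kg per 100 CD4 units")
```

Beyond the peak the slope is negative, so the quadratic model cannot level off; it must turn downward.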

[Figure: Quadratic Model with lower and upper bounds (l2, u2) and lowess curve of weight on cd4_100]

Quadratic model assessment
Technically the model fits OK
Does not capture leveling off
Degree of curvature implies highest CD4 associated with lower weights

Exponential model
Form of regression is nonlinear: $g(x) = \alpha + \beta \phi^{x}$
In terms of Wt and CD4: $E(\mathrm{Wt} \mid \mathrm{CD4}) = \alpha + \beta \phi^{\mathrm{CD4}}$

Interpretation
$E(\mathrm{Wt} \mid \mathrm{CD4}) = \alpha + \beta \phi^{\mathrm{CD4}}$
$\alpha$ = leveling-off point for Wt at high CD4
$\beta$ = difference in Wt between those with CD4 near zero and the leveling-off point
$\phi$ = how fast Wt reaches its limiting value
  Quickly if near zero
  Slowly if near 1
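The lecture does not say how this nonlinear model is fitted. One simple possibility, sketched here purely as an assumption, exploits the fact that for a fixed $\phi$ the model is linear in $\alpha$ and $\beta$: try a grid of $\phi$ values, fit $\alpha$ and $\beta$ by ordinary least squares for each, and keep the combination with the smallest residual sum of squares. The function names, the grid, and the noiseless test data are all made up for illustration.

```python
def ols_fit(z, y):
    """Closed-form least squares for y = alpha + beta * z."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    szz = sum((zi - z_bar) ** 2 for zi in z)
    beta = szy / szz
    return y_bar - beta * z_bar, beta

def fit_exponential(x, y, phi_grid):
    """Grid search over phi; alpha and beta by least squares for each phi."""
    best = None
    for phi in phi_grid:
        z = [phi ** xi for xi in x]  # with phi fixed, the model is linear in z
        alpha, beta = ols_fit(z, y)
        sse = sum((yi - alpha - beta * zi) ** 2 for yi, zi in zip(y, z))
        if best is None or sse < best[0]:
            best = (sse, alpha, beta, phi)
    return best[1], best[2], best[3]

# Noiseless data generated from made-up parameter values, to check recovery.
x = [i * 0.5 for i in range(21)]     # CD4/100 from 0 to 10
y = [60 - 10 * 0.6 ** xi for xi in x]
phi_grid = [i / 100 for i in range(1, 100)]

alpha_hat, beta_hat, phi_hat = fit_exponential(x, y, phi_grid)
print(f"alpha = {alpha_hat:.2f}, beta = {beta_hat:.2f}, phi = {phi_hat:.2f}")
```

In practice a statistics package's nonlinear least-squares routine would normally be used instead; the grid search is only meant to show why the three parameters are identifiable from the shape of the curve.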

Fitted model
$\alpha$ = leveling-off Wt for large CD4 values
  Estimate =.4 kg
$\beta$ = difference in Wt between CD4 near zero and leveling-off point
  Estimate = -9.9 kg
$\phi$ = how fast Wt reaches its leveling-off point
  Estimate = 0.59
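To make the estimates of $\beta$ and $\phi$ concrete, the sketch below computes how far fitted mean weight sits below the leveling-off point at several CD4 values, that is, the term $\beta \phi^{\mathrm{CD4}}$ on its own. The leveling-off weight $\alpha$ is deliberately left out, since only the gap is needed. The function names are hypothetical, and CD4 is assumed to be in units of 100 cells.

```python
import math

BETA_HAT = -9.9   # estimate reported on the slide (kg)
PHI_HAT = 0.59    # estimate reported on the slide

def gap_from_plateau(cd4_100, beta=BETA_HAT, phi=PHI_HAT):
    """Fitted mean weight minus the leveling-off weight, in kg."""
    return beta * phi ** cd4_100

def halving_distance(phi=PHI_HAT):
    """Increase in CD4/100 over which the remaining gap is cut in half."""
    return math.log(0.5) / math.log(phi)

for cd4 in (0, 1, 2, 4, 8):
    print(f"CD4/100 = {cd4}: {gap_from_plateau(cd4):6.2f} kg relative to plateau")

print(f"gap halves roughly every {halving_distance():.2f} units of CD4/100")
```

Because each additional 100 CD4 units multiplies the remaining gap by $\phi$, a value of $\phi$ closer to zero means the plateau is approached faster, matching the interpretation given earlier.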

[Figure: Exponential Model with 95% lower and upper bounds (wtpred3) and lowess curve of weight on cd4_100]

Summary
Regression characterizes mean value of Y as function of X
Today's example:
  Y = Weight in kg
  X = CD4 count
Regression is a very broad topic
Today's theme: think outside the line

Practical suggestions
Use scatterplots and exploratory analysis
Use LOWESS curves to approximate relationships in the scatterplot
If the relationship is nonlinear, take this into account
  Especially important for predictive models
  For example, if you want to predict Wt from CD4 count

Next lecture
Multiple regression (more than one predictor)
Focus:
  Analysis of change from baseline
  Adjusting for one or more variables when testing hypothesis about a primary variable