Stat 342 Wk 10: Regression

Loading data with datalines
Regression (Proc glm)
- with interactions
- with polynomial terms
- with categorical variables
(Proc glmselect)
- with model selection (this is mostly chapter 6 material)
Stat 342 Notes. Week 10 Page 1 / 57

In Week 8, we saw correlations, which are the first step to regression. In Week 9, we saw ANOVA, treated as a regression on categorical variables. This week we look at a suite of examples surrounding regression and PROC GLM. As time permits, we will also look at t-tests and power analysis.

First, let's load up the 'mtcars' dataset. Rather than relying on a .csv file, let's try loading it in through a data step and the DATALINES command. The advantages of loading text this way are...
1) It can be done without knowing in advance the folder structure of your system.
2) You have complete control over how variables are interpreted.


    LENGTH Make $ 10. Model $ 22.;
Establish the variables 'make' and 'model' as 10 and 22 characters long, respectively. If this is not done, SAS will assume that the variables are 8 characters long and will cut off anything after that.
    INFILE DATALINES TRUNCOVER;
The file source isn't an external file, but a set of data lines to be written later in this data step.

    INFILE DATALINES TRUNCOVER;
'TRUNCOVER' is short for TRUNCate OVER missing, meaning that every line of the datalines becomes exactly one observation in the dataset: if a line ends early, the remaining variables are set to missing rather than read from the next line. Other options include 'missover' (similar), and the default 'flowover', which keeps filling variables from the next line when the current line runs out.

    INPUT Make $ Model $ mpg...
Take the following datalines and put them in the variables 'make' (character string), 'model' (character string), 'mpg' (numeric),... and so on. Every space starts a new variable. You could also tell SAS to put two words (with a single space between them) into 'model' with &, such as
    INPUT Model & $
With &, a value ends only at two or more consecutive spaces.

    DATALINES Mazda RX Mazda RX4_Wag Volvo 142E ;
The actual data to be entered. Only one semicolon is used at the very end of the data. (If you need a semicolon IN the data somewhere, use DATALINES4 instead, which ends the data with a line of four semicolons: ;;;; )
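Putting the four statements above together, a minimal sketch of the whole data step might look like the following. The dataset name mtcars_demo, the column order, and the three sample rows are assumptions made for illustration (values as commonly listed for the R 'mtcars' data); this is not the full table from the slides.

```sas
/* Sketch only: dataset name, column order, and the three rows are assumptions. */
data mtcars_demo;
    /* Reserve enough room for the two character variables */
    length Make $ 10 Model $ 22;
    /* Read from the lines below; short lines give missing values */
    infile datalines truncover;
    input Make $ Model $ mpg cylinders displacement hp weight;
    datalines;
Mazda RX4 21.0 6 160 110 2.620
Mazda RX4_Wag 21.0 6 160 110 2.875
Volvo 142E 21.4 4 121 109 2.780
;
run;

/* Check that every variable was read as intended */
proc print data = mtcars_demo;
run;
```

Note the underscore in RX4_Wag: with plain list input, a space would start a new variable.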

Finally, I wanted the company ('make') to show up in the model category as well, so I concatenated make and model together (and put the result into model).

To concatenate two strings means to take one and put it on the end of the other. Three or more strings can also be concatenated.
    DATA mtcars;
    SET mtcars;
    model = cat(make,model);
    run;
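One thing to watch for: CAT joins its arguments exactly as stored, including any trailing blanks that pad 'make' out to its declared length, and it adds no separator. A hedged alternative is CATX, which strips leading and trailing blanks and inserts a delimiter between the pieces. The output dataset mtcars_named and the variable full_name are hypothetical names used only for this sketch.

```sas
/* Sketch only: mtcars_named and full_name are hypothetical names. */
data mtcars_named;
    set mtcars;
    /* 10 (make) + 1 (space) + 22 (model) characters at most */
    length full_name $ 33;
    /* CATX trims each piece and joins them with a single space */
    full_name = catx(' ', make, model);
run;
```

Writing to a new variable also avoids truncation, since 'model' itself can only hold 22 characters.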


(Break)

Now let's dig into the actual regression, starting with a simple one: fuel economy vs weight.
    proc glm data = mtcars;
    model mpg = weight / solution;
    run;
The SOLUTION option for the model tells SAS to print the estimates of the intercept and slope coefficients. Without this, we get much simpler model summaries.


For simple regression, we also get a scatterplot with a line of best fit (i.e., the least-squares line, the regression line) with two bands around it:
The inner band (shaded) shows the confidence limits of the MEAN, also called the confidence interval. This is where the line COULD BE if we incorporated the variance of the coefficients (95% of the time).
The outer band (dotted lines) shows the confidence limits of INDIVIDUAL PREDICTIONS. This is where new data points could be if we predicted them from this model (95% again).
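For reference, the two bands correspond to the standard textbook intervals for simple linear regression at a value x_0. The notation here is an assumption, not taken from the slides: s is the root mean square error, n the number of observations, and S_xx the sum of squared deviations of x from its mean.

```latex
% Confidence limits for the mean response at x_0 (inner band)
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Prediction limits for an individual new observation at x_0 (outer band)
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% where  s^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
% and    S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
```

The extra 1 under the square root is the variance of a single new observation, which is why the outer band is always wider than the inner one.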


The 'clparm' option gives you the confidence limits of the parameters.
    model mpg = weight / solution clparm;

You can use alpha to change the confidence level of these limits, as well as the confidence bands.
    model mpg = weight / solution clparm alpha=0.01;

Other options, like p and clm, add predictions and confidence limits of the mean response for each observation. These are output into a separate table.
    proc glm data = mtcars;
    model mpg = weight / p clm;
    run;


...and this table can be appended to the existing dataset so you can do further processing.
    proc glm data = mtcars;
    model mpg = weight / p clm;
    output out = mtcars_model P=predicted_mpg R=residual_mpg;
    run;

(SAS demo)

...such as comparing residuals to predicted values, which is very useful for detecting unequal variance. The most typical sign of unequal variance in this plot is a fan or cone shape.
    proc sgplot data=mtcars_model;
    scatter x=predicted_mpg y=residual_mpg;
    ellipse x=predicted_mpg y=residual_mpg;
    run;
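A small variation on the plot above, offered as a sketch rather than as part of the original demo: adding a horizontal reference line at zero makes it easier to judge whether the residuals spread evenly above and below it. It assumes the mtcars_model dataset created by the OUTPUT statement earlier.

```sas
/* Sketch only: assumes mtcars_model exists from the earlier OUTPUT statement. */
proc sgplot data = mtcars_model;
    scatter x = predicted_mpg y = residual_mpg;
    /* Residuals should scatter evenly around this line */
    refline 0 / axis = y;
run;
```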


All of these options just tell SAS to add them to the list of things to calculate and/or include in the output. You can use all of them together. One drawback/feature is that 'alpha' will apply to ALL the output of that model.
    proc glm data = mtcars;
    model mpg = weight / alpha = 0.01 solution clparm p clm;
    run;


Let's try a more sophisticated model of fuel economy. Instead of just looking at the weight of a car, let's also look at its horsepower and displacement.
    proc glm data = mtcars;
    model mpg = weight hp displacement / solution;
    run;

What if there's an interaction between weight and horsepower? We can include an interaction term with an asterisk. Note that I've explicitly included the main effects 'weight' and 'hp' in here as well. This is good statistical practice.
    proc glm data = mtcars;
    model mpg = weight hp weight*hp displacement / solution;
    run;
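Written out, the interaction model above has the following form. The coefficient symbols are assumptions for illustration; SAS labels the estimates by effect name instead.

```latex
% Model with a weight-by-horsepower interaction
\mathrm{mpg} = \beta_0 + \beta_1\,\mathrm{weight} + \beta_2\,\mathrm{hp}
             + \beta_3\,(\mathrm{weight}\times\mathrm{hp})
             + \beta_4\,\mathrm{displacement} + \varepsilon

% The effect of one extra unit of weight now depends on horsepower:
\frac{\partial\,\mathrm{mpg}}{\partial\,\mathrm{weight}} = \beta_1 + \beta_3\,\mathrm{hp}
```

This is why the main effects are kept in the model: without them, the slope for weight would be forced to be zero whenever hp is zero.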

(SAS demo, comparing these two models) (Document camera work)

What about polynomial terms? Option one is to make an interaction of a variable with itself. The following code will let you see how the fuel economy of a car changes with horsepower AND with horsepower squared.
    proc glm data = mtcars;
    model mpg = hp hp*hp / solution;
    run;

However, other mathematical functions won't work in the model statement.
    proc glm data = mtcars;
    model mpg = hp hp**2 / solution;
    run;
...not even premade ones
    proc glm data = mtcars;
    model mpg = hp sqrt(hp) / solution;
    run;

To regress against transformations of variables, or polynomial terms of variables, you need to create these transformations with a data step.
    data mtcars;
    set mtcars;
    hp2 = hp**2;
    hp_sqrt = sqrt(hp);
    hp_log = log(hp);
    run;

Then you can regress against these:
    proc glm data = mtcars;
    model mpg = hp hp2 hp_sqrt hp_log / solution;
    run;

What about categorical variables, like number of cylinders? We have cars with 4, 6, or 8 cylinders, but it doesn't make sense to treat this as a continuous variable. Predicting the fuel economy of a car with 5.5 cylinders is meaningless, because no such car exists. This is where the CLASS statement from last week comes back into play.

We need to specify to SAS which variables are categorical. After that, we can use those variables like any other in a model. Each category's coefficient is the amount by which the mean response is increased or decreased for observations in that category, all else being equal.
    proc glm data = mtcars;
    class cylinders;
    model mpg = weight cylinders / solution;
    run;


(Document camera work)

We can even include interactions between numeric and categorical variables. This will produce a separate slope coefficient under each category.
    proc glm data = mtcars;
    class cylinders;
    model mpg = weight cylinders weight*cylinders / solution;
    run;


Note that the LAST category is considered the baseline. This is the opposite of R.
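To make the baseline idea concrete, here is the model mpg = weight cylinders written with indicator (dummy) variables, assuming the categories are 4, 6 and 8 so that 8 is the last one. The symbols are assumptions for illustration only.

```latex
% I(.) equals 1 when the condition holds and 0 otherwise.
% Cylinders = 8 is the baseline, so it gets no term of its own.
\mathrm{mpg} = \beta_0 + \beta_1\,\mathrm{weight}
             + \gamma_4\, I(\mathrm{cylinders}=4)
             + \gamma_6\, I(\mathrm{cylinders}=6) + \varepsilon

% Mean response by category, at a fixed weight w:
%   8 cylinders: \beta_0 + \beta_1 w
%   6 cylinders: \beta_0 + \gamma_6 + \beta_1 w
%   4 cylinders: \beta_0 + \gamma_4 + \beta_1 w
```

Each category coefficient is therefore a difference from the 8-cylinder group, not an absolute level.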

(Document camera work)

As with two-way (or multi-way) ANOVA, we can include more than one term involving a categorical variable. Here both weight and hp interact with cylinders. (Additional categorical variables would simply be added to the CLASS statement.)
    proc glm data = mtcars;
    class cylinders;
    model mpg = weight hp cylinders weight*cylinders hp*cylinders / solution;
    run;


PROC GLM is very flexible, but also very generalist. For more detailed results from regression, you can use PROC REG, which includes options like...
- cross-validation (Does a model derived from part of your data fit 'new' observations from the rest of your data?)
- model selection (Is the model you're using now the best one? How do I efficiently compare many different models?)
- diagnostics (Are some of my observations overly influential?)
PROC REG is less general, however. It's designed for simple and multiple regression where all the explanatory variables are continuous.

To incorporate categorical data, we would need to manually create dummy variables from categories using a data step.
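A minimal sketch of that manual approach, assuming 'cylinders' is stored as a number taking the values 4, 6 and 8. The dataset name mtcars_dummies and the variables cyl4 and cyl6 are hypothetical names chosen for this example.

```sas
/* Sketch only: mtcars_dummies, cyl4 and cyl6 are hypothetical names. */
data mtcars_dummies;
    set mtcars;
    /* A comparison evaluates to 1 when true and 0 when false */
    cyl4 = (cylinders = 4);
    cyl6 = (cylinders = 6);
    /* No cyl8 variable: 8 cylinders is left as the baseline */
run;

proc reg data = mtcars_dummies;
    model mpg = weight cyl4 cyl6;
run;
quit;
```

Leaving out one category keeps the dummy variables from adding up to the intercept column, which mirrors what the CLASS statement does in PROC GLM.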

However... PROC GLMSELECT has the advantages of both.

Model selection methods aim to find models that do two things:
1. Fit the data well. That is, models with small residuals, high R-squared, and low root-mean-square-error (RMSE).
2. Describe the data simply / parsimoniously. This means having few terms in the model, estimating few parameters, and using few degrees of freedom.


We can take our previous model of weight, horsepower, cylinders, and two interaction terms and apply a model selection method called 'stepwise' to determine if this model is the best.
    proc glmselect data = mtcars;
    class cylinders;
    model mpg = weight hp cylinders weight*cylinders hp*cylinders / selection=stepwise(select = AIC);
    run;


'stepwise' is just one method of model selection. It's popular, and it's a nice combination of 'forward selection' and 'backward elimination', but it's outdated. A much more popular method these days is LASSO, which is also available in SAS with...
    selection=lasso
(but requires a special package in R). The LASSO method can handle HUNDREDS of different variables at once, even if there are more variables than observations!

Likewise, Akaike Information Criterion (AIC) is just one criterion of selection for models. Other options include:
BIC (higher preference towards simpler models, if your sample size is large)
AICc (AIC with a correction for small sample sizes)
ADJRSQ (Adjusted R-squared. Your typical coefficient of determination with a penalty per term)
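For orientation, the usual textbook forms of these criteria are given below. The notation is an assumption: k is the number of estimated parameters, n the sample size, p the number of model terms excluding the intercept, and L-hat the maximized likelihood. The values SAS prints may differ from these by constants or by its own definitions, so treat this as a sketch of the idea rather than the exact formulas PROC GLMSELECT uses.

```latex
% Smaller is better for AIC, AICc and BIC
\mathrm{AIC}  = 2k - 2\ln\hat{L}

\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}

\mathrm{BIC}  = k\ln n - 2\ln\hat{L}

% Larger is better for adjusted R-squared
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
```

The penalty per parameter is 2 for AIC and ln(n) for BIC, so BIC penalizes extra terms more heavily once n exceeds about 7.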

    plots = criterionpanel
will show you if the other criteria agree.

Try something with LOTS of variables.
    proc glmselect data = mtcars plots=criterionpanel;
    class cylinders gear;
    model mpg = weight hp cylinders weight*cylinders hp*cylinders hp*gear displacement*gear weight*weight / selection=lasso;
    run;
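As a hedged follow-up, LASSO can be asked to pick the final model by cross-validation instead of the default criterion. The sketch below assumes the same mtcars variables as above; the seed value and the choice of 5 folds are arbitrary illustrations, and with only a few dozen cars the result can change noticeably from one seed to another.

```sas
/* Sketch only: the seed and the number of folds are arbitrary choices. */
proc glmselect data = mtcars seed = 342 plots = criterionpanel;
    class cylinders gear;
    model mpg = weight hp cylinders weight*cylinders hp*cylinders
                hp*gear displacement*gear weight*weight
          / selection = lasso(choose = cv stop = none)
            cvmethod = random(5);
run;
```

Here stop=none lets the LASSO path run to the end, and choose=cv then picks the step with the smallest cross-validated prediction error.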

Additional Proc GLMSELECT slides from nt/dam/sas/en_ca/user %20Group%20Presentations/Winnipeg-User- Group/SylvainTremblay-PROCGLMSELECT-Spring2012.pdf
GLMSELECT for Model Selection, Winnipeg SAS User Group Meeting, May 11, 2012, Sylvain Tremblay, SAS Canada Education


More information

Following is a list of topics in this paper:

Following is a list of topics in this paper: Preliminary NTS Data Analysis Overview In this paper A preliminary investigation of some data around NTS performance has been started. This document reviews the results to date. Following is a list of

More information

Generalized Estimating Equations for Depression Dose Regimes

Generalized Estimating Equations for Depression Dose Regimes Generalized Estimating Equations for Depression Dose Regimes Karen Walker, Walker Consulting LLC, Menifee CA Generalized Estimating Equations on the average produce consistent estimates of the regression

More information
