Multiple Linear Regression Analysis


Revised July 2018

This set of notes shows how to use Stata for multiple regression analysis. It assumes that you have set up Stata on your computer (see the Getting Started with Stata handout) and that you have read in the data set you want to analyze (see the Reading in Stata Format (.dta) Data Files handout). In Stata, most tasks can be performed either by issuing commands in the Stata command window or by using the menus. These notes illustrate both approaches, using the data file GSS2016.DTA (posted here: https://canvas.harvard.edu/courses/53958).

To estimate the linear regression of a response variable on K explanatory variables, issue the following command:

    regress <depvar> <indepvar1> <indepvar2> ... <indepvarK>

where you fill in the name of your response variable in place of <depvar> and the names of your explanatory variables in place of <indepvar1>, <indepvar2>, etc. The response variable must always be listed first; the order in which the explanatory variables are listed does not matter. The explanatory variables may include indicator variables (see the Multiple Linear Regression Analysis with Indicator Variable handout) or other terms that you construct yourself, as well as quantitative variables.

For a regression of vocabulary test score (wordsum) on education (educ) and father's education (paeduc), the command would be:

    regress wordsum educ paeduc

Using the Stata menus, you can estimate the linear regression as follows:

    click on "Statistics"
    click on "Linear models and related"
    click on "Linear regression"
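The steps above can be collected into a short do-file (a minimal sketch; the path in the `use` line is an assumption -- substitute wherever you saved the data):

```stata
* Load the GSS data (adjust the path to your own location)
use GSS2016.DTA, clear

* Regress vocabulary test score on own and father's education
regress wordsum educ paeduc
```

Running commands from a do-file rather than the command window makes the analysis easy to reproduce later.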

A window like this will open up. Fill in the name of your response variable in the "Dependent variable:" box and the names of your explanatory variables in the "Independent variables:" box, and click "OK". Whether via the menus or via the command line, the following output will appear in the Stata Results window:

    . reg wordsum educ paeduc

          Source |       SS           df       MS      Number of obs   =     1,355
    -------------+----------------------------------   F(2, 1352)      =    167.68
           Model |  963.189642         2  481.594821   Prob > F        =    0.0000
        Residual |  3883.11589     1,352  2.87212714   R-squared       =    0.1987
    -------------+----------------------------------   Adj R-squared   =    0.1976
           Total |  4846.30554     1,354  3.57925076   Root MSE        =    1.6947

    ------------------------------------------------------------------------------
         wordsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            educ |   .2456815    .017593    13.96   0.000     .2111689    .2801941
          paeduc |   .0720562   .0126831     5.68   0.000     .0471755     .096937
           _cons |   1.912368   .2409929     7.94   0.000     1.439607    2.385129
    ------------------------------------------------------------------------------

The estimates that determine the regression equation (a plane, when there are two explanatory variables) are in the "Coef." column in the bottom panel of this output. The number beside "_cons" is the constant or intercept a, here 1.91; the number beside each explanatory variable is its partial regression coefficient or slope b. Here, the coefficient of 0.25 for respondent's education (educ) indicates that a 1-year rise in respondent's education is associated with a 0.25-word rise in vocabulary test score, among persons having the same level of father's education; the coefficient of 0.07 for father's education (paeduc) indicates that a 1-year rise in father's

education is associated with a 0.07-word rise in vocabulary test score, among persons having the same level of education.

The coefficient of determination is reported in the upper right panel as "R-squared": about 20% of the variation in vocabulary test score is "explained" by respondent's education and father's education (it can be recovered from the analysis-of-variance table at the upper left as SS_Model/SS_Total = 963.19/4846.31 ≈ 0.1987). The standard error of estimate (or residual standard deviation) is reported as "Root MSE": the standard deviation of the residuals or "errors" (that is, a typical distance of an actual observation from its value predicted by the regression) around the regression plane is about 1.7. The analysis-of-variance table also leads to the F statistic reported at the upper right (167.7, with 2 and 1352 df). At the bottom are standard errors and t statistics for the slopes and intercept, as well as 95% confidence intervals for those statistics.

If you want Stata to calculate and print the standardized (beta) coefficients, select the "Reporting" tab of the above screen and check the box for "Standardized beta coefficients" before clicking OK. Alternatively, add the beta option to the command-line version of the command, as follows for this example:

    regress wordsum educ paeduc, beta

In this case, you will see the following output:

    . regress wordsum educ paeduc, beta

          Source |       SS           df       MS      Number of obs   =     1,355
    -------------+----------------------------------   F(2, 1352)      =    167.68
           Model |  963.189642         2  481.594821   Prob > F        =    0.0000
        Residual |  3883.11589     1,352  2.87212714   R-squared       =    0.1987
    -------------+----------------------------------   Adj R-squared   =    0.1976
           Total |  4846.30554     1,354  3.57925076   Root MSE        =    1.6947

    ------------------------------------------------------------------------------
         wordsum |      Coef.   Std. Err.      t    P>|t|                     Beta
    -------------+----------------------------------------------------------------
            educ |   .2456815    .017593    13.96   0.000                 .3672722
          paeduc |   .0720562   .0126831     5.68   0.000                 .1494177
           _cons |   1.912368   .2409929     7.94   0.000                        .
    ------------------------------------------------------------------------------
As you can see, this output is identical to the one above, except that the "Beta" column appears in place of the confidence intervals. (Each standardized coefficient is simply the raw slope rescaled by the ratio of standard deviations: beta_j = b_j × s_xj / s_y.)

Predicted values of the response variable based on a regression are often useful. If you want to compute and save them, follow the exact same procedures used for bivariate regression (see the Stata handout for bivariate regression).

Testing hypotheses about subsets of explanatory variables. The output shown earlier for regression analysis allows you to test hypotheses about single partial coefficients (via the t statistic) or about all of the partial regression coefficients taken together (via the F statistic).
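Before turning to those tests: the predicted values mentioned above can be computed in Stata with the predict post-estimation command, issued immediately after regress (a minimal sketch; the new variable names yhat and resid are arbitrary choices, not required names):

```stata
* Run the regression first
regress wordsum educ paeduc

* Save predicted values and residuals as new variables
predict yhat, xb
predict resid, residuals
```

The xb option stores the fitted values from the regression equation; the residuals option stores the observed-minus-predicted differences.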

Sometimes one will wish to test a hypothesis about a subset of two or more, but not all, of the regression coefficients. You can do this by using one of Stata's post-estimation commands immediately after you have estimated the regression. For example, here is the regression shown earlier, but with the variable age added to the set of explanatory variables:

    . regress wordsum educ paeduc age

          Source |       SS           df       MS      Number of obs   =     1,350
    -------------+----------------------------------   F(3, 1346)      =    124.90
           Model |  1051.18418         3  350.394727   Prob > F        =    0.0000
        Residual |  3776.00174     1,346  2.80535048   R-squared       =    0.2178
    -------------+----------------------------------   Adj R-squared   =    0.2160
           Total |  4827.18593     1,349   3.5783439   Root MSE        =    1.6749

    ------------------------------------------------------------------------------
         wordsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            educ |   .2357611   .0174946    13.48   0.000     .2014413    .2700808
          paeduc |   .0965124   .0131911     7.32   0.000     .0706351    .1223897
             age |   .0154274   .0026967     5.72   0.000     .0101372    .0207175
           _cons |   .9948572   .2872937     3.46   0.001     .4312652    1.558449
    ------------------------------------------------------------------------------

Perhaps you would like to test the hypothesis that the partial coefficients for father's education and age are both zero; if retained (i.e., not rejected), this hypothesis would suggest that only the respondent's own education shapes vocabulary test score. Using the command line, you can accomplish this by typing

    test paeduc age

right after you estimate the regression. Alternatively, you can use the menus to do this, again immediately after the regression:

    Click on "Post-estimation"
    Click on the arrow next to "Tests, contrasts, and comparisons"
    Select "Linear tests of parameter estimates"
    Click "Launch"

The screen below will appear. Click "Create", and the following window will pop up:

Fill in the names of the variables corresponding to the coefficients in your joint hypothesis, then click "OK". Either way, the following output will appear:

    . test paeduc age

     ( 1)  paeduc = 0
     ( 2)  age = 0

           F(  2,  1346) =   33.40
                Prob > F =    0.0000

This F statistic for removing both paeduc and age from the regression (while leaving educ in the regression) offers strong evidence against the hypothesis that the regression coefficients for father's education and age, controlling for education, are simultaneously zero.
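The statistic Stata reports here is the standard "extra sum of squares" F test comparing the full model to the restricted model that omits the tested variables. In general form (with q the number of restrictions, here 2; RSS_F the residual sum of squares of the full model, here 3776.00 on n − k − 1 = 1346 df; and RSS_R the residual sum of squares from refitting without paeduc and age on the same observations, which does not appear in the output above):

```latex
F \;=\; \frac{\left(\mathit{RSS}_R - \mathit{RSS}_F\right)/q}{\mathit{RSS}_F/(n-k-1)}
\;\sim\; F_{q,\;n-k-1} \quad \text{under } H_0
```

Stata computes this for you from the stored estimation results, so you never need to refit the restricted model by hand.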