STAT/SOC/CSSS 221 Statistical Concepts and Methods for the Social Sciences. Introduction to Mulitple Regression

Similar documents
STAT/SOC/CSSS 221 Statistical Concepts and Methods for the Social Sciences. Introduction to Bivariate Regression

POLS 205 Political Science as a Social Science. Analyzing Bivariate Relationships

מדינת ישראל. Tourist Visa Table

Current State of Global HIV Care Continua. Reuben Granich 1, Somya Gupta 1, Irene Hall 2, John Aberle-Grasse 2, Shannon Hader 2, Jonathan Mermin 2

מדינת ישראל. Tourist Visa Table. Tourist visa exemption is applied to national and official passports only, and not to other travel documents.

APPENDIX II - TABLE 2.3 ANTI-TOBACCO MASS MEDIA CAMPAIGNS

THE CARE WE PROMISE FACTS AND FIGURES 2017

World Health organization/ International Society of Hypertension (WH0/ISH) risk prediction charts

CALLING ABROAD PRICES FOR EE SMALL BUSINESS PLANS

ADMINISTRATIVE AND FINANCIAL MATTERS. Note by the Executive Secretary * CONTENTS. Explanatory notes Tables. 1. Core budget

Maternal Deaths Disproportionately High in Developing Countries

Eligibility List 2018

FRAMEWORK CONVENTION ALLIANCE BUILDING SUPPORT FOR TOBACCO CONTROL. Smoke-free. International Status Report

Hearing loss in persons 65 years and older based on WHO global estimates on prevalence of hearing loss

AGaRT The Advisory Group on increasing access to Radiotherapy Technology in low and middle income countries

Supplementary appendix

3.5 Consumption Annual Prevalence Opiates

Main developments in past 24 hours

Drug Prices Report Opioids Retail and wholesale prices * and purity levels,by drug, region and country or territory (prices expressed in US$ )

WELLNESS COACHING. Wellness & Personal Fitness Solution Providers

WHO report highlights violence against women as a global health problem of epidemic proportions

Copyright 2011 Joint United Nations Programme on HIV/AIDS (UNAIDS) All rights reserved

TOBACCO USE PREVALENCE APPENDIX II: The following definitions are used in Table 2.1 and Table 2.3:

Why Invest in Nutrition?

ANNEX 3: Country progress indicators

Challenges and Opportunities to Optimizing the HIV Care Continuum Can We Test and Treat Enough People to Make a Seismic Difference by 2030?

1. Consent for Treatment This form must be completed in order to receive healthcare services in the campus clinic.

ICM: Trade-offs in the fight against HIV/AIDS

BCG. and your baby. Immunisation. Protecting babies against TB. the safest way to protect your child

Annex 2 A. Regional profile: West Africa

World Health Organization Department of Communicable Disease Surveillance and Response

Analysis of Immunization Financing Indicators from the WHO-UNICEF Joint Reporting Form (JRF),

GLOBAL RepORt UNAIDS RepoRt on the global AIDS epidemic

#1 #2 OR Immunity verified by immune titer (please attach report) * No titer needed if proof of two doses of Varicella provided

Articles. Funding Bill & Melinda Gates Foundation.

This portion to be completed by the student Return by July 1 Please use ballpoint pen

The worldwide societal costs of dementia: Estimates for 2009

WELLNESS COACHING. Wellness & Personal Fitness Solution Providers NZ & Australia

Certificate of Immunization

Outcomes of the Global Consultation Interim diagnostic algorithms and Operational considerations

Tobacco: World Markets and Trade

Global and regional burden of first-ever ischaemic and haemorrhagic stroke during : findings from the Global Burden of Disease Study 2010

The Single Convention on Narcotic Drugs- Implementation in Six Countries: Albania, Bangladesh, India, Kyrgyzstan, Sri Lanka, Ukraine

we are daisy Daisy Conferencing Max Bridge charges* International charges* International toll-free access levy

Development Database. Compiled by James W. McGuire Department of Government Wesleyan University Summer

FORMS MUST BE COMPLETED PRIOR TO THE START OF YOUR FIRST SEMESTER

Stakeholders consultation on strengthened cooperation against vaccine preventable diseases

Tipping the dependency

STUDENT HEALTH SERVICES NEW STUDENT QUESTIONNAIRE

The IB Diploma Programme Statistical Bulletin. November 2015 Examination Session. Education for a better world

Calls from home residential tariffs

Social Capital Achievement: 2009 Country Rankings

Country-wise and Item-wise Exports of Animal By Products Value Rs. Lakh Quantity in '000 Unit: Kgs Source: MoC Export Import Data Bank

REQUIRED COLLEGIATE START. (High school students/ early entry only not for undergraduates) IMMUNIZATION FORM THIS IS REQUIRED INFORMATION

STUDENT MEDICAL REPORT For Graduate and Part-time Undergraduate Students The State of Connecticut General Statutes Section 10a and Fairfield

Supplementary Online Content

Malnutrition prevalences by country and year, from survey data and interpolated for reference years (1990, 1995, 2000)

DEMOGRAPHIC CHANGES IN DEVELOPED AND DEVELOPING COUNTRIES

Copyright 2010 Joint United Nations Programme on HIV/AIDS (UNAIDS) All rights reserved

I. THE TRANSITION TO LOW FERTILITY AND ITS IMPLICATIONS FOR THE FUTURE

WORLD COUNCIL OF CREDIT UNIONS 2017 STATISTICAL REPORT

Global EHS Resource Center

Donor Support for Contraceptives and Condoms for STI/HIV Prevention

Global Fund ARV Fact Sheet 1 st June, 2009

Undetectable = Untransmittable. Mariah Wilberg Communications Specialist

STUDENT HEALTH FORM *IMPORTANT DEADLINES: DUE AUGUST 1 FOR AUGUST ENTRY (FALL SEMESTER) Due January 1 for January Entry(spring semester)

UNDERGRADUATE STUDENT HEALTH PACKET

Terms and Conditions. VISA Global Customer Assistance Services

FACTS AND FIGURES 2015 OUR VILLAGE

Country Profiles for Population and Reproductive Health: Policy Developments and Indicators 2003

ACCESS 7. TOWARDS UNIVERSAL ACCESS: THE WAY FORWARD

WHO Global Status Report on Alcohol 2004

Supplementary Online Content

JOINT TB AND HIV PROGRAMMING

Dear New Student and Family,

Health Services Immunization and Health Information

Global Fund Mid-2013 Results

all incoming UWL students MUST submit an up-to-date immunization history, including vaccination dates.

The Immunization Record is available to download from the Health Insurance and Immunizations website at drexel.edu/hii/forms.

NON-STANDARD PRICE GUIDE FOR EE SMALL BUSINESS More information about out-of-bundle charges for our small business customers

HIV and development challenges for Africa Catherine Hankins, Associate Director & Chief Scientific Adviser to UNAIDS

SEATTLE UNIVERSITY. Name. Address Street City State Zip Code. Date of Entry / Date of Birth / / School ID# M Y M D Y

Epidemiological Estimates for Haemoglobin Disorders: WHO South East Asian Region by Country

Welcomes New Students

Name DOB / / LAST FIRST MI Home Address: Street City: State: Zip: Name of Parent/Guardian(Emergency Contact) Relationship Contact Phone Number

Seizures of ATS (excluding ecstasy ), 2010

THE SOCIAL FOUNDATIONS OF WORLD HAPPINESS

Developed for the Global Initiative for Asthma

Tracking progress in achieving the global nutrition targets May 2014

Student Health Center Mandatory Immunization Information

PUBLIC HEALTH FACT SHEET

Update: Xpert MTB/RIF system for rapid diagnosis of TB and MDR-TB

Global, regional, and national burden of Parkinson s disease, : a systematic analysis for the Global Burden of Disease Study 2016

CIGARETTE PACKAGE HEALTH WARNINGS

Decline in Human Fertility:

ESPEN Congress Geneva 2014 FOOD: THE FACTOR RESHAPING THE SIZE OF THE PLANET

Annual prevalence estimates of cannabis use in the late 1990s

The Prematriculation Health Status Report must be completed in its entirety prior to arriving at Allegheny College. Please PRINT legibly.

Transcription:

STAT/SOC/CSSS 1 Statistical Concepts and Methods for the Social Sciences Introduction to Mulitple Regression Christopher Adolph Department of Political Science and Center for Statistics and the Social Sciences University of Washington, Seattle Chris Adolph (UW) Multiple Regression 1 / 33

Motivating Example: Cross-national determinants of fertility We have cross-national data from several sources: Fertility The average number of children born per adult female, in 000 (United Nations) Education Ratio The ratio of girls to boys in primary and secondary education, in 000 (Word Bank Development Indicators) GDP per capita Economic activity in thousands of dollars, purchasing power parity in 000 (Penn World Tables) Agricultural Labor Percentage of the labor force working in agriculture in 000 (International Labor Organization) Note the addition of a fourth variable Chris Adolph (UW) Multiple Regression / 33

Motivating Example: Cross-national determinants of fertility All three independent variables might cause the fertility rate More agricultural nations may have more children to bolster the labor force on family farms Let s look at the univariate summaries & bivariate regression results for this new covariate Chris Adolph (UW) Multiple Regression 3 / 33

Summary of Univariate Distribution: Agricultural Labor Frequency 0 10 0 30 40 Median = 8.1% Mean = 16.0 % std dev = 17.9% 0 0 40 60 80 Agriculture workers as % of labor force Chris Adolph (UW) Multiple Regression 4 / 33

Summary of Univariate Distribution: Agricultural Labor Frequency 0 10 0 30 40 Median = 8.1% Mean = 16.0 % std dev = 17.9% How would you describe this distribution? 0 0 40 60 80 Agriculture workers as % of labor force Chris Adolph (UW) Multiple Regression 4 / 33

Regression of Fertility on Agricultural Labor Variable Estimates se t-stat p-value Intercept 1.83 (0.15) 1.34 < 0.001 Agricultural Labor 0.0 (0.01) 3.5 < 0.001 N 7 R 0.15 RMSE 0.93 How do we read this table? Chris Adolph (UW) Multiple Regression 5 / 33

Regression of Fertility on Agricultural Labor Variable Estimates se t-stat p-value Intercept 1.83 (0.15) 1.34 < 0.001 Agricultural Labor 0.0 (0.01) 3.5 < 0.001 N 7 R 0.15 RMSE 0.93 How do we read this table? Note the reduction in N: lots of cases are missing data on agricultural labor Any cases missing any covariates need to be deleted from the data before using regression (listwise deletion) Chris Adolph (UW) Multiple Regression 5 / 33

0 10 0 30 40 50 4 6 8 Agricultural workers as % labor force Fertility Rate What looks different about this scatterplot? Chris Adolph (UW) Multiple Regression 6 / 33

Fertility Rate 8 6 4 0 10 0 30 40 50 Agricultural workers as % labor force What looks different about this scatterplot? The high fertility cases seem to be missing (deleted due to missing data) Chris Adolph (UW) Multiple Regression 6 / 33

8 Fertility Rate 6 4 Guatemala Oman Bolivia Jordan Namibia Paraguay Botswana Israel Ecuador Peru Malaysia El Salvador entina ombia ited Arab Maldives South Emirates Panama Africa Mexico Costa Jamaica BrazilRica Iceland Azerbaijan Denmark nds Uruguay ed New States Australia Barbados elgium Cyprus Antilles Ireland Zealand Mongo etherlands embourg Malta Finland Canada nidad Norway apore weden Kingdom Tobago Austria Croatia Cuba Mol Ge Slovak Switzerland ermany Japan Hungary Estonia Korea, Republic Portugal Spain Slovenia Rep. Latvia Greece Lithuania Poland Ukraine Bulgaria Romania o, China 0 10 0 30 40 50 Agricultural workers as % labor force Chris Adolph (UW) Multiple Regression 7 / 33

0 10 0 30 40 50 4 6 8 Agricultural workers as % labor force Fertility Rate Is this a strong relationship? Chris Adolph (UW) Multiple Regression 8 / 33

0 10 0 30 40 50 4 6 8 Agricultural workers as % labor force Fertility Rate Is this a strong relationship? How many datapoints would have to move to reduce the slope to 0? Chris Adolph (UW) Multiple Regression 8 / 33

0 10 0 30 40 50 4 6 8 Agricultural workers as % labor force Fertility Rate Which are larger, the residuals or the explained variance? Chris Adolph (UW) Multiple Regression 9 / 33

Density 0.4 0.3 0. What is the standard deviation of this distribution called? 0.1 1 0 1 3 Residuals from Fertility vs Agriculture Chris Adolph (UW) Multiple Regression 10 / 33

Density 0.4 0.3 0. 0.1 1 0 1 3 Residuals from Fertility vs Agriculture What is the standard deviation of this distribution called? The RMSE, or standard error of the regression: how much predictions from this model tend to miss by Chris Adolph (UW) Multiple Regression 10 / 33

0 10 0 30 40 50 4 6 8 Residuals from Fertility vs Agriculture Fertility Rate How confident are we that this line has a positive slope? Chris Adolph (UW) Multiple Regression 11 / 33

Fertility Rate 8 6 4 0 10 0 30 40 50 Residuals from Fertility vs Agriculture How confident are we that this line has a positive slope? Are we as confident as we were for the other models? Chris Adolph (UW) Multiple Regression 11 / 33

Confounders and Omitted Variable Bias Which (if any) of the three models we ve looked at are right? Do Education, GDP, and Ag Labor all affect Fertility? Chris Adolph (UW) Multiple Regression 1 / 33

Confounders and Omitted Variable Bias Which (if any) of the three models we ve looked at are right? Do Education, GDP, and Ag Labor all affect Fertility? What if Education, GDP, and Ag Labor are correlated? Chris Adolph (UW) Multiple Regression 1 / 33

Confounders and Omitted Variable Bias Which (if any) of the three models we ve looked at are right? Do Education, GDP, and Ag Labor all affect Fertility? What if Education, GDP, and Ag Labor are correlated? If we regress Fertility on Education, and Education is correlated with GDP and Ag, might it proxy all three variables? Chris Adolph (UW) Multiple Regression 1 / 33

Confounders and Omitted Variable Bias Which (if any) of the three models we ve looked at are right? Do Education, GDP, and Ag Labor all affect Fertility? What if Education, GDP, and Ag Labor are correlated? If we regress Fertility on Education, and Education is correlated with GDP and Ag, might it proxy all three variables? Yes: if countries which educate women also tend to be rich and have few ag workers, then the bivariate results will blur all three relationships Chris Adolph (UW) Multiple Regression 1 / 33

Confounders and Omitted Variable Bias Should we be worried? Correlation between: Education & GDP is 0.46 Chris Adolph (UW) Multiple Regression 13 / 33

Confounders and Omitted Variable Bias Should we be worried? Correlation between: Education & GDP is 0.46 Correlation between GDP & Ag is -0.64 Chris Adolph (UW) Multiple Regression 13 / 33

Confounders and Omitted Variable Bias Should we be worried? Correlation between: Education & GDP is 0.46 Correlation between GDP & Ag is -0.64 Correlation between Education & Ag is -0.41 (What do these numbers mean?) Chris Adolph (UW) Multiple Regression 13 / 33

Confounders and Omitted Variable Bias Should we be worried? Correlation between: Education & GDP is 0.46 Correlation between GDP & Ag is -0.64 Correlation between Education & Ag is -0.41 (What do these numbers mean?) Chris Adolph (UW) Multiple Regression 13 / 33

Confounders and Omitted Variable Bias Should we be worried? Correlation between: Education & GDP is 0.46 Correlation between GDP & Ag is -0.64 Correlation between Education & Ag is -0.41 (What do these numbers mean?) Omitted variable bias: Leaving any of these variables out of our model could lead to misleading estimates of the effects of any variables we do include Chris Adolph (UW) Multiple Regression 13 / 33

The linear regression model, redux y i = β 0 + β 1 x 1i + β x i +... + β k x ki + ε i Our dependent variable likely depends on many covariates For example, x 1 might be the education ratio, x might be GDP per capita, and so on for as many covariates as we have, up to our kth covariate This leads to the above model, with multiple partial slopes β 1, β,... β 3 Chris Adolph (UW) Multiple Regression 14 / 33

The linear regression model, redux y i = β 0 + β 1 x 1i + β x i +... + β k x ki + ε i This model is still a linear regression model. Sometimes called a multiple regression model to distinguish from a bivariate regression, but mathematically, they are equivalent Henceforth, we will assume a linear regression can have many covariates How many covariates are allowed? Up to N 1, where N is the number of observations Each covariate added uses up a degree of freedom; once they are gone, there is nothing left for an additional covariate to explain Chris Adolph (UW) Multiple Regression 15 / 33

The linear regression model, redux y i = β 0 + β 1 x 1i + β x i +... + β k x ki + ε i How do we interpret the β s? Just as before. Chris Adolph (UW) Multiple Regression 16 / 33

The linear regression model, redux y i = β 0 + β 1 x 1i + β x i +... + β k x ki + ε i How do we interpret the β s? Just as before. The β s are still slopes, or the amount y i changes on average for a 1 unit increase in x, all else held equal Chris Adolph (UW) Multiple Regression 16 / 33

The linear regression model, redux y i = β 0 + β 1 x 1i + β x i +... + β k x ki + ε i How do we interpret the β s? Just as before. The β s are still slopes, or the amount y i changes on average for a 1 unit increase in x, all else held equal If we increase x 1 by 1 unit, and hold x fixed at its present level, then y goes up by β 1 Chris Adolph (UW) Multiple Regression 16 / 33

The linear regression model, redux y i = β 0 + β 1 x 1i + β x i +... + β k x ki + ε i How do we interpret the β s? Just as before. The β s are still slopes, or the amount y i changes on average for a 1 unit increase in x, all else held equal If we increase x 1 by 1 unit, and hold x fixed at its present level, then y goes up by β 1 Finally found a way to control for confounders using observational data! Aside for calculus-users: The β s are partial derivatives with respect to the x they multiply Chris Adolph (UW) Multiple Regression 16 / 33

Multiple regression: just like bivariate 1 Our estimates, ˆβ k, are the β k s that minimize the sum of the squared residuals (least squares) Chris Adolph (UW) Multiple Regression 17 / 33

Multiple regression: just like bivariate 1 Our estimates, ˆβ k, are the β k s that minimize the sum of the squared residuals (least squares) The uncertainty of each ˆβ k is given by its standard error Chris Adolph (UW) Multiple Regression 17 / 33

Multiple regression: just like bivariate 1 Our estimates, ˆβ k, are the β k s that minimize the sum of the squared residuals (least squares) The uncertainty of each ˆβ k is given by its standard error 3 We can still perform t-tests and calculate confidence intervals for each ˆβ k Chris Adolph (UW) Multiple Regression 17 / 33

Multiple regression: just like bivariate 1 Our estimates, ˆβ k, are the β k s that minimize the sum of the squared residuals (least squares) The uncertainty of each ˆβ k is given by its standard error 3 We can still perform t-tests and calculate confidence intervals for each ˆβ k 4 We can still calculate the fitted value ŷ i of any observation i: this is the model prediction for that case Chris Adolph (UW) Multiple Regression 17 / 33

Multiple regression: just like bivariate 1 Our estimates, ˆβ k, are the β k s that minimize the sum of the squared residuals (least squares) The uncertainty of each ˆβ k is given by its standard error 3 We can still perform t-tests and calculate confidence intervals for each ˆβ k 4 We can still calculate the fitted value ŷ i of any observation i: this is the model prediction for that case 5 We can still summarize goodness of fit using such measures as RMSE and R Chris Adolph (UW) Multiple Regression 17 / 33

Fertility as function of Education and GDP per capita Let s start small: a model with two covariates: Fertility i = ˆβ 0 + ˆβ 1 EduRatio i + ˆβ GDPpc i Fertility i = 11.4 0.08 EduRatio 1 0.05 GDPpc i We can present this result in several ways: 1 In a table by itself In a table compared to other models 3 Through graphics Chris Adolph (UW) Multiple Regression 18 / 33

Regression of Fertility on Education Ratio & GDP Variable Estimates se t-stat p-value Intercept 11.5 (0.73) 15.46 < 0.001 Education Ratio 0.08 (0.01) 9.93 < 0.001 GDP per capita ($k) 0.05 (0.01) 5.3 < 0.001 N 130 R 0.64 RMSE 1.01 How do we interpret the above? Chris Adolph (UW) Multiple Regression 19 / 33

Three regression models of fertility Model Variable 1 3 Intercept 1.59 4.13 11.5 (0.75) (0.17) (0.73) Education Ratio 0.10 0.08 (0.01) (0.01) GDP per capita 0.10 0.05 (0.01) (0.01) N 130 130 130 R 0.55 0.35 0.64 RMSE 1.1 1.35 1.01 Standard errors in parentheses How do we interpret the above table? Chris Adolph (UW) Multiple Regression 0 / 33

Three regression models of fertility Model Variable 1 3 Intercept 1.59 4.13 11.5 [11.11, 14.08] [3.80, 4.46] [9.81, 1.69] Education Ratio -0.10-0.08 [-0.1, -0.08] [-0.10, -0.06] GDP per capita -0.10-0.05 [-0.1, -0.08] [-0.07, -0.03] N 130 130 130 R 0.55 0.35 0.64 RMSE 1.1 1.35 1.01 95% confidence intervals in brackets This table presents the same information, but is easier to digest Chris Adolph (UW) Multiple Regression 1 / 33

4 6 8 4 6 8 Model fitted values, Fertility hat Actual data, Fertility To see the residuals, compare the model fit with reality Chris Adolph (UW) Multiple Regression / 33

4 6 8 4 6 8 Model fitted values, Fertility hat Actual data, Fertility Note that in the multivariate case, we need to plot against ŷ i, not x i, because there is more than one x i Chris Adolph (UW) Multiple Regression 3 / 33

Actual data, Fertility 8 6 4 Niger Uganda Chad Somalia Malawi Burkina Faso Yemen, Rep. Zambia Ethiopia Benin Guinea Rwanda Liberia Guinea Bissau Equatorial Guinea Mali Mozambique Senegal Eritrea Mauritania Cote Togo d'ivoire Kenya Iraq Guatemala Congo, Rep. Djibouti Solomon Ghana Islands Samoa OmanVanuatu Comoros Tonga Swaziland Lesotho Bolivia Namibia Gabon Tajikistan Nepal JordanZimbabwe Cambodia Bhutan Paraguay Belize Botswana Nicaragua India Qatar Bangladesh Fiji Israel Malaysia Ecuador South El Salvador Maldives AfricaPeru United Arab Emirates Panama Bahrain Colombia Morocco Jamaica Kuwait Brunei Argentina Mexico Costa Brazil Rica Guyana LebanonIndonesia Vietnam Uruguay Albania Netherlands Mongolia Iceland Chile Antilles Tunisia United Azerbaijan Ireland New States ZealandMauritius Kazakhstan Norway Denmark Finland Australia France ourg Netherlands Cyprus Belgium United Singapore Malta Macedonia, Cuba Georgia FYR weden Trinidad Kingdom and Tobago Canada Moldova Switzerland Portugal Barbados Croatia Germany Japan Austria Greece Korea, Estonia Belarus Hungary Lithuania Romania Rep. Slovak Republic Poland Slovenia Spain Latvia Ukraine Bulgaria Macao, China Examining which cases are big outliers may suggest additional variables to include as covariates 4 6 8 Model fitted values, Fertility hat Chris Adolph (UW) Multiple Regression 4 / 33

Actual data, Fertility 8 6 4 Niger Uganda Chad Somalia Malawi Burkina Faso Yemen, Rep. Zambia Ethiopia Benin Guinea Rwanda Liberia Guinea Bissau Equatorial Guinea Mali Mozambique Senegal Eritrea Mauritania Cote Togo d'ivoire Kenya Iraq Guatemala Congo, Rep. Djibouti Solomon Ghana Islands Samoa OmanVanuatu Comoros Tonga Swaziland Lesotho Bolivia Namibia Gabon Tajikistan Nepal JordanZimbabwe Cambodia Bhutan Paraguay Belize Botswana Nicaragua India Qatar Bangladesh Fiji Israel Malaysia Ecuador South El Salvador Maldives AfricaPeru United Arab Emirates Panama Bahrain Colombia Morocco Jamaica Kuwait Brunei Argentina Mexico Costa Brazil Rica Guyana LebanonIndonesia Vietnam Uruguay Albania Netherlands Mongolia Iceland Chile Antilles Tunisia United Azerbaijan Ireland New States ZealandMauritius Kazakhstan Norway Denmark Finland Australia France ourg Netherlands Cyprus Belgium United Singapore Malta Macedonia, Cuba Georgia FYR weden Trinidad Kingdom and Tobago Canada Moldova Switzerland Portugal Barbados Croatia Germany Japan Austria Greece Korea, Estonia Belarus Hungary Lithuania Romania Rep. Slovak Republic Poland Slovenia Spain Latvia Ukraine Bulgaria Macao, China 4 6 8 Model fitted values, Fertility hat Examining which cases are big outliers may suggest additional variables to include as covariates Think of what the missing cases have in common Chris Adolph (UW) Multiple Regression 4 / 33

linear predictor Visualizing the modelled relationship between many variables is tricky Education Ratio GDP per capita Chris Adolph (UW) Multiple Regression 5 / 33

linear predictor Visualizing the modelled relationship between many variables is tricky Education Ratio GDP per capita We can do it with a 3D plot for covariates, but not for 3 or more Chris Adolph (UW) Multiple Regression 5 / 33

8 Vary Education; GDP at mean 8 Vary GDP; Education at mean Fertility Rate 6 4 6 4 60 70 80 90 100 110 Education Ratio 10 0 30 40 50 GDP pc $k An alternative that works for any number of covariates: Plot out the model predictions as a function of each covariate, hold the other covariates fixed, e.g., at their means Then predict what Fertility rate should happen on average if the country had average GDP but variable Education (or vice versa) Chris Adolph (UW) Multiple Regression 6 / 33

8 Vary Education; GDP at mean 8 Vary GDP; Education at mean Fertility Rate 6 4 6 4 60 70 80 90 100 110 Education Ratio 10 0 30 40 50 GDP pc $k Let s compare the multiple regression estimates with the bivariate regression results (in black) How are they different? Are the bivariate results affected by omitted variable bias? Chris Adolph (UW) Multiple Regression 7 / 33

Regression models including Agricultural Labor Model Variable 1 3 4 Intercept 11.15.76 1.83 8.95 (.64) (0.18) (0.15) (.79) Education Ratio -0.09-0.06 (0.03) (0.03) GDP per capita ($k) -0.04-0.03 (0.01) (0.01) Agriculture Labor 0.0 0.004 (0.01) (0.008) N 7 7 7 7 R 0.13 0.17 0.14 0.6 RMSE 0.94 0.9 0.93 0.88 Standard errors in parentheses Chris Adolph (UW) Multiple Regression 8 / 33

Regression models including Agricultural Labor Model Variable 1 3 4 Intercept 11.15.76 1.83 8.95 [5.90, 16.41] [.39, 3.13] [1.53,.1] [3.38, 14.5] Edu Ratio -0.09-0.06 [-0.14, -0.04] [-0.1, -0.01] GDP pc -0.04-0.03 [-0.06, -0.0] [-0.06, -0.003] Ag Labor 0.0 0 0.004 [0.01, 0.03] [-0.01, 0.0] N 7 7 7 7 R 0.13 0.17 0.14 0.6 RMSE 0.94 0.9 0.93 0.88 95% confidence intervals in brackets Chris Adolph (UW) Multiple Regression 9 / 33

Vary Edu; GDP & Ag at mean 8 Vary GDP; Edu & Ag at mean 8 Vary Ag; Edu & GDP at mean 8 Fertility Rate 6 4 6 4 6 4 60 70 80 90 100 110 Education Ratio 10 0 30 40 50 GDP pc $k 0 10 0 30 40 50 60 70 Ag Labor % How do we interpret these plots? The dashed lines indicate extrapolation: no observed data have these values for the covariates Chris Adolph (UW) Multiple Regression 30 / 33

Vary Edu; GDP & Ag at mean 8 Vary GDP; Edu & Ag at mean 8 Vary Ag; Edu & GDP at mean 8 Fertility Rate 6 4 6 4 6 4 60 70 80 90 100 110 Education Ratio 10 0 30 40 50 GDP pc $k 0 10 0 30 40 50 60 70 Ag Labor % The black lines show the bivariate results. Was there omitted variable bias? Chris Adolph (UW) Multiple Regression 31 / 33

Vary Edu; GDP & Ag at mean 8 Vary GDP; Edu & Ag at mean 8 Vary Ag; Edu & GDP at mean 8 Fertility Rate 6 4 6 4 6 4 60 70 80 90 100 110 Education Ratio 10 0 30 40 50 GDP pc $k 0 10 0 30 40 50 60 70 Ag Labor % The black lines show the bivariate results. Was there omitted variable bias? YES. The apparent effect of Ag Labor was a mirage: just the omitted effect of GDP per capita. If we control for GDP, we see Ag Labor has no effect. Chris Adolph (UW) Multiple Regression 31 / 33

Warning! Linear regression is powerful, but easy to misuse We mentioned one assumption last time: That the error term is Normally distributed To this we now add two additonal assumptions Correct specification The model contains all the covariates that produce Y. If any omitted cause of Y is correlated with the included X s, then ˆβ can no longer be trusted. No endogeneity of Y None of the included X s are caused by Y Chris Adolph (UW) Multiple Regression 3 / 33

Final thoughts on linear regression Linear regression is a powerful tool for isolating conditional expectations of y given x after removing confounding variables But vulnerable to many hazards: Outliers Reverse causation Selection bias Omitted variable bias Advanced techniques can mitigate these problems, as well as deal with others Topic of a sequence of required courses in most graduate social science programs Chris Adolph (UW) Multiple Regression 33 / 33