Stat 342 - Wk 10: Regression
- Joshua Lindsey
- 5 years ago
1 Stat 342 - Wk 10: Regression
- Loading data with datalines
- Regression (Proc glm)
  - with interactions
  - with polynomial terms
  - with categorical variables
- (Proc glmselect)
  - with model selection
(This is mostly chapter 6 material.)
Stat 342 Notes. Week 10 Page 1 / 57
2 In Week 8, we saw correlations, which are the first step to regression. In Week 9, we saw ANOVA, treated as a regression on categorical variables. This week we look at a suite of examples surrounding regression and PROC GLM. As time permits, we will also look at t-tests and power analysis.
3 First, let's load the 'mtcars' dataset. Rather than relying on a .csv file, let's load it through a data step with the DATALINES statement. The advantages of loading text this way are:
1) It can be done without knowing in advance the folder structure of your system.
2) You have complete control over how variables are interpreted.
5 LENGTH Make $ 10 Model $ 22;
Set the lengths of the character variables 'make' and 'model' to 10 and 22 characters, respectively. If this is not done, SAS will assume that the variables are 8 characters long (the default for character variables read with list input) and will cut off anything after that.
INFILE DATALINES TRUNCOVER;
The input source isn't an external file, but a set of data lines given later in this data step.
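6 INFILE DATALINES TRUNCOVER;
TRUNCOVER controls what happens when a data line is shorter than the INPUT statement expects: instead of moving on to the next line to find the remaining values, SAS sets them to missing (keeping any partial value it could read), so every line of the datalines becomes exactly one observation. Other options include MISSOVER (similar, but a partially read value is set to missing rather than kept) and the default, FLOWOVER, which keeps filling variables from the next line when the current line runs out.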
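To see the difference between TRUNCOVER and the default FLOWOVER, here is a minimal sketch with two made-up data lines, the first of which is one value short. The dataset and variable names (demo_truncover, demo_flowover, a, b, c) are hypothetical.

```sas
/* Same two data lines, read two ways. Line 1 has only 2 of the 3 values. */
data demo_truncover;
  infile datalines truncover;   /* short line: c is set to missing */
  input a b c;
datalines;
1 2
3 4 5
;
run;

data demo_flowover;
  infile datalines flowover;    /* default: c is taken from the next line */
  input a b c;
datalines;
1 2
3 4 5
;
run;

proc print data=demo_truncover; run;  /* 2 obs: (1, 2, .) and (3, 4, 5) */
proc print data=demo_flowover;  run;  /* 1 obs: (1, 2, 3); "4 5" is skipped */
```

With FLOWOVER, the log also shows a note that SAS went to a new line when the INPUT statement reached past the end of a line, which is a useful warning sign that your columns and data lines do not match.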
7 INPUT Make $ Model $ mpg...
Take the following datalines and put them in the variables 'make' (character/string), 'model' (character/string), 'mpg' (numeric), ... and so on. Values are separated by blanks, so each blank-separated word goes into the next variable. You could also tell SAS to put two words (with a single space between them) into 'model' with the & modifier, such as INPUT Model & $; the value then ends only at two or more consecutive blanks.
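A minimal sketch of the & modifier, using two data lines in which each model name is followed by two blanks before the mpg value (the mpg values are the mtcars values for these two cars). The dataset name amp_demo and the $22. informat are illustrative choices.

```sas
data amp_demo;
  length Make $ 10 Model $ 22;
  infile datalines truncover;
  /* '&' lets Model contain single embedded blanks; the value ends
     at the first run of two or more blanks. */
  input Make $ Model & $22. mpg;
datalines;
Mazda RX4 Wag  21.0
Volvo 142E  21.4
;
run;
```

The first line gives Make = 'Mazda', Model = 'RX4 Wag', mpg = 21. The price of this flexibility is that the data must use two blanks wherever a value ends, which is easy to get wrong by hand.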
8 DATALINES Mazda RX Mazda RX4_Wag Volvo 142E ;
The actual data to be entered. Only one semicolon is used, on its own line at the very end of the data. (SAS has no escape sequence for a semicolon inside ordinary datalines. If your data contain semicolons, use DATALINES4 instead and end the data with a line of four semicolons, ;;;;.)
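If the data themselves contain semicolons, a sketch using DATALINES4 might look like this; the dataset name notes and its contents are made up for illustration.

```sas
data notes;
  infile datalines truncover;   /* short lines: no flowing to the next line */
  input note $40.;              /* read up to 40 characters per line */
datalines4;
first; second
no semicolon here
;;;;
run;
```

Only a line starting with four semicolons ends the data, so single semicolons inside a line are read as ordinary characters.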
9 Finally, I wanted the company ('make') to show up in the model variable as well, so I concatenated make and model together (and put the result into model).
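10 To concatenate two strings means to take one and put it on the end of the other. Three or more strings can also be concatenated.
DATA mtcars;
  SET mtcars;
  model = catx(' ', make, model);
run;
CATX strips leading and trailing blanks from each piece and joins them with the given separator (here one space). Plain CAT(make, model) keeps the trailing blanks that pad 'make' out to its 10 characters, giving values like 'Mazda     RX4'.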
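The CAT family of functions differs only in how it treats the blanks that pad fixed-length character variables. This small sketch (with a hypothetical dataset cat_demo) shows the three results side by side for a 10-character make and a 22-character model.

```sas
data cat_demo;
  length Make $ 10 Model $ 22 m_cat m_cats m_catx $ 40;
  Make  = 'Mazda';
  Model = 'RX4';
  m_cat  = cat(Make, Model);        /* keeps Make's padding: 'Mazda     RX4' */
  m_cats = cats(Make, Model);       /* strips blanks:        'MazdaRX4'      */
  m_catx = catx(' ', Make, Model);  /* strips, joins on ' ': 'Mazda RX4'     */
  put m_cat= / m_cats= / m_catx=;   /* write all three to the log */
run;
```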
11 The result:
12 <break>
13 Now let's dig into the actual regression, starting with a simple one: fuel economy vs weight.
proc glm data = mtcars;
  model mpg = weight / solution;
run;
The SOLUTION option for the model tells SAS to print the estimates of the intercept and slope coefficients. (For a model with no CLASS variables, like this one, PROC GLM prints the estimates by default; once categorical variables are involved, leaving SOLUTION out gives only the simpler ANOVA-style model summaries.)
15 For simple regression, we also get a scatterplot with a line of best fit (i.e., the least-squares line, the regression line) with two bands around it:
The inner band (shaded) shows the confidence limits for the MEAN response, also called the confidence interval. This is where the true regression line could plausibly be once the uncertainty in the estimated coefficients is taken into account (95% confidence by default).
The outer band (dotted lines) shows the prediction limits for INDIVIDUAL observations. This is where new data points could fall if we predicted them from this model (95% again). It is wider because it adds the scatter of individual points around the line to the uncertainty about the line itself.
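For reference, the standard textbook formulas behind the two bands in simple regression (not shown on the slide) are below, where b_0 and b_1 are the estimated intercept and slope, s is the root mean squared error, x_0 is the weight at which we predict, and t is the t quantile with n - 2 degrees of freedom.

```latex
\begin{aligned}
\hat{y}_0 &= b_0 + b_1 x_0 \\
\text{mean response: } &\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \\
\text{individual prediction: } &\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
\end{aligned}
```

The only difference is the extra 1 under the square root for an individual prediction. That is why the outer band is always at least t times s wide on each side of the line, no matter how much data we collect, while the inner band shrinks toward the line as n grows.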
17 The 'clparm' option gives you the confidence limits of the parameters (the intercept and slope).
model mpg = weight / solution clparm;
18 You can use alpha to change the confidence level of these limits, as well as of the confidence bands. For example, alpha=0.01 gives 99% limits:
model mpg = weight / solution clparm alpha=0.01;
19 Other options, like p and clm, add predicted values and confidence limits of the mean response for each observation. These are output into a separate table.
proc glm data = mtcars;
  model mpg = weight / p clm;
run;
21 ...and these values can be saved, with an OUTPUT statement, to a new dataset (a copy of the input data with the requested columns added) so you can do further processing.
proc glm data = mtcars;
  model mpg = weight / p clm;
  output out = mtcars_model P=predicted_mpg R=residual_mpg;
run;
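The OUTPUT statement can also save the interval limits themselves. A sketch, assuming the mtcars dataset from earlier (with variables model, mpg and weight); the output dataset name mtcars_limits and the new variable names are hypothetical, while LCLM=/UCLM= (limits for the mean) and LCL=/UCL= (limits for an individual prediction) are PROC GLM OUTPUT keywords.

```sas
proc glm data = mtcars;
  model mpg = weight;
  output out = mtcars_limits
         p    = predicted_mpg
         r    = residual_mpg
         lclm = lower_mean  uclm = upper_mean    /* inner band: mean response  */
         lcl  = lower_indiv ucl  = upper_indiv;  /* outer band: new observation */
run;
quit;  /* PROC GLM is interactive; QUIT ends it */

proc print data = mtcars_limits;
  var model mpg weight predicted_mpg lower_mean upper_mean lower_indiv upper_indiv;
run;
```

For every car, the individual limits should enclose the mean limits, which in turn enclose the predicted value.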
22 (SAS demo)
23 ...such as comparing residuals to predicted values, which is very useful for detecting unequal variance. The most typical sign of unequal variance in this plot is a fan or cone shape, where the spread of the residuals grows (or shrinks) as the predicted value increases.
proc sgplot data=mtcars_model;
  scatter x=predicted_mpg y=residual_mpg;
  ellipse x=predicted_mpg y=residual_mpg;
run;
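A common companion to this plot is a horizontal reference line at zero plus a smooth trend. This sketch reuses the mtcars_model dataset created by the OUTPUT statement; the LOESS curve is an addition not on the slide, and a clear bend in it suggests the model is missing a term (for example, a curved relationship with weight).

```sas
proc sgplot data = mtcars_model;
  scatter x = predicted_mpg y = residual_mpg;
  refline 0 / axis = y;                       /* residuals should straddle 0 evenly */
  loess   x = predicted_mpg y = residual_mpg; /* smooth trend through the residuals */
run;
```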
25 All of these options just tell SAS to add them to the list of things to calculate and/or include in the output. You can use all of them together. One drawback/feature is that 'alpha' will apply to ALL the output of that model.
proc glm data = mtcars;
  model mpg = weight / alpha = 0.01 solution clparm p clm;
run;
26 <break image>
27 Let's try a more sophisticated model of fuel economy. Instead of just looking at the weight of a car, let's also look at its horsepower and displacement.
proc glm data = mtcars;
  model mpg = weight hp displacement / solution;
run;
28 What if there's an interaction between weight and horsepower? We can include an interaction term with an asterisk. Note that I've explicitly included the main effects 'weight' and 'hp' in here as well. This is good statistical practice (the hierarchy principle: a model with an interaction should also contain the main effects that make it up).
proc glm data = mtcars;
  model mpg = weight hp weight*hp displacement / solution;
run;
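PROC GLM also has a shorthand for this: the bar operator expands to the main effects and their interaction, so the hierarchy principle is satisfied automatically. Assuming the same mtcars variables, this sketch should fit the same model as the interaction model on this slide.

```sas
proc glm data = mtcars;
  /* weight|hp expands to: weight hp weight*hp */
  model mpg = weight|hp displacement / solution;
run;
quit;
```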
29 (SAS demo, comparing these two models) (Document camera work)
30 What about polynomial terms? Option one is to make an interaction of a variable with itself. The following code will let you see how the fuel economy of a car changes with horsepower AND with horsepower squared.
proc glm data = mtcars;
  model mpg = hp hp*hp / solution;
run;
31 However, other mathematical expressions won't work in the model statement. This is not valid:
proc glm data = mtcars;
  model mpg = hp hp**2 / solution;     /* not valid: no arithmetic in MODEL */
run;
...and neither are built-in functions:
proc glm data = mtcars;
  model mpg = hp sqrt(hp) / solution;  /* not valid: no functions in MODEL */
run;
32 To regress against transformations of variables, or polynomial terms of variables, you need to create these transformations with a data step.
data mtcars;
  set mtcars;
  hp2     = hp**2;     /* squared term */
  hp_sqrt = sqrt(hp);  /* square root */
  hp_log  = log(hp);   /* natural log; needs hp > 0 */
run;
33 Then you can regress against these new variables:
proc glm data = mtcars;
  model mpg = hp hp2 hp_sqrt hp_log / solution;
run;
34 What about categorical variables, like number of cylinders? We have cars with 4, 6, or 8 cylinders, but it doesn't make sense to treat this as a continuous variable. Predicting the fuel economy of a car with 5.5 cylinders is meaningless, because no such car exists. This is where the CLASS statement from last week comes back into play.
35 We need to specify to SAS which variables are categorical. After that, we can use those variables like any other in a model. Each category's estimate is the amount the mean response is increased or decreased for observations in that category, relative to the baseline category, all else being equal.
proc glm data = mtcars;
  class cylinders;
  model mpg = weight cylinders / solution;
run;
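To compare the cylinder groups directly, an LSMEANS statement prints each group's mean mpg adjusted to the average weight, and PDIFF adds (unadjusted) p-values for the pairwise differences between groups. A sketch, assuming the same mtcars variables:

```sas
proc glm data = mtcars;
  class cylinders;
  model mpg = weight cylinders / solution;
  /* weight-adjusted mean mpg per cylinder group, with standard errors
     and p-values for each pairwise difference */
  lsmeans cylinders / stderr pdiff;
run;
quit;
```

Unlike the SOLUTION estimates, these adjusted means do not depend on which category SAS treats as the baseline.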
37 (document camera work)
38 We can even include interactions between numeric and categorical variables. This lets each category have its own slope: each weight*cylinders estimate is how much that category's slope for weight differs from the baseline category's slope.
proc glm data = mtcars;
  class cylinders;
  model mpg = weight cylinders weight*cylinders / solution;
run;
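The same fitted lines can be parameterized more directly. In this sketch, NOINT drops the common intercept so that 'cylinders' gives one intercept per group, and the nested effect weight(cylinders) gives one weight slope per group; the fitted values should match the interaction model on this slide. (With NOINT, PROC GLM computes R-square without centering, so compare the coefficients rather than the fit statistics.)

```sas
proc glm data = mtcars;
  class cylinders;
  /* one intercept and one weight slope for each cylinder group */
  model mpg = cylinders weight(cylinders) / noint solution;
run;
quit;
```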
40 Note that the LAST category is considered the baseline (its estimate is set to 0). This is the opposite of R, which by default uses the FIRST category as the baseline.
41 (Document camera work)
42 As with two-way (or multi-way) ANOVA, we can include more than one categorical term. Here, cylinders is allowed to change both the weight slope and the hp slope; additional categorical variables would each be listed in the CLASS statement.
proc glm data = mtcars;
  class cylinders;
  model mpg = weight hp cylinders weight*cylinders hp*cylinders / solution;
run;
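A sketch with a second categorical variable, assuming the 'gear' variable that is used later in these notes; every categorical variable has to appear in the CLASS statement, otherwise PROC GLM treats it as a continuous number.

```sas
proc glm data = mtcars;
  class cylinders gear;   /* list every categorical variable here */
  model mpg = weight cylinders gear / solution;
run;
quit;
```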
44 PROC GLM is very flexible, but also very general-purpose. For more detailed regression results, you can use PROC REG, which includes options like...
cross-validation-style checks, such as the leave-one-out PRESS statistic (Does a model derived from part of your data fit 'new' observations from the rest of your data?)
45 model selection (Is the model you're using now the best one? How do I efficiently compare many different models?)
diagnostics (Are some of my observations overly influential?)
PROC REG is less general, however. It has no CLASS statement: it's designed for simple and multiple regression where all the explanatory variables are continuous.
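A sketch of PROC REG's diagnostics on the earlier three-predictor model (all numeric, so no dummy variables are needed). The output dataset reg_diag and the variable names cooks_d and press_resid are hypothetical; INFLUENCE, PLOTS=DIAGNOSTICS, COOKD= and PRESS= are PROC REG options.

```sas
proc reg data = mtcars plots = diagnostics;      /* panel of residual/influence plots */
  model mpg = weight hp displacement / influence; /* per-observation influence table */
  output out   = reg_diag
         cookd = cooks_d        /* Cook's distance: overall influence of each car     */
         press = press_resid;   /* leave-one-out residual: error when the car is held out */
run;
quit;
```

Sorting reg_diag by cooks_d is a quick way to find the cars that move the fitted model the most.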
46 To incorporate categorical data, we would need to manually create dummy variables from categories using a data step.
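A minimal sketch of that data step, assuming cylinders takes only the values 4, 6 and 8. In SAS a comparison evaluates to 1 (true) or 0 (false), so each dummy is a single assignment; 8 cylinders is left out as the baseline, matching PROC GLM's last-category convention. The names mtcars_dummies, cyl4 and cyl6 are hypothetical.

```sas
data mtcars_dummies;
  set mtcars;
  cyl4 = (cylinders = 4);   /* 1 for 4-cylinder cars, else 0 */
  cyl6 = (cylinders = 6);   /* 1 for 6-cylinder cars, else 0 */
  /* 8-cylinder cars have cyl4 = cyl6 = 0: the baseline */
run;

proc reg data = mtcars_dummies;
  model mpg = weight cyl4 cyl6;
run;
quit;
```

The estimates for cyl4 and cyl6 should match the cylinders estimates from the PROC GLM model with CLASS cylinders.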
47 However... PROC GLMSELECT has the advantages of both: a CLASS statement like PROC GLM, and model selection like PROC REG.
48 Model selection methods aim to find models that do two things:
1. Fit the data well. That is, models with small residuals, high r-squared, and low root-mean-square error (RMSE).
2. Describe the data simply / parsimoniously. This means having few terms in the model, estimating few parameters, and using few degrees of freedom.
50 We can take our previous model of weight, horsepower, cylinders, and two interaction terms and apply a model selection method called 'stepwise', using AIC as the criterion, to see whether a smaller subset of these terms does better.
proc glmselect data = mtcars;
  class cylinders;
  model mpg = weight hp cylinders weight*cylinders hp*cylinders / selection=stepwise(select = AIC);
run;
53 'stepwise' is just one method of model selection. It's popular, and a nice combination of 'forward selection' and 'backward elimination', but it's outdated. A much more popular method these days is the LASSO, which is available in SAS with selection=lasso (in R it requires an add-on package, such as glmnet). The LASSO can handle HUNDREDS of different variables at once, even if there are more variables than observations!
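Rather than relying on a single information criterion, the point along the LASSO path can also be chosen by cross-validation. A sketch, assuming the same mtcars variables; the seed 342 is an arbitrary choice, and CHOOSE=CV with CVMETHOD=RANDOM(5) requests 5-fold cross-validation with randomly assigned folds.

```sas
proc glmselect data = mtcars seed = 342;   /* seed makes the random folds reproducible */
  class cylinders;
  model mpg = weight hp cylinders weight*cylinders hp*cylinders
        / selection = lasso(choose = cv stop = none)  /* follow the whole path, pick by CV */
          cvmethod  = random(5);                      /* 5 randomly assigned folds */
run;
```

With only 32 cars, each fold holds just 6 or 7 observations, so different seeds can pick somewhat different models; that instability is itself worth reporting.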
54 Likewise, the Akaike Information Criterion (AIC) is just one criterion for selecting models. Other options include:
BIC (a stronger preference towards simpler models when your sample size is large; in PROC GLMSELECT this Schwarz criterion is requested as SBC, while the keyword BIC refers to Sawa's criterion)
AICc (AIC with a small-sample correction)
ADJRSQ (adjusted R-squared: your typical coefficient of determination with a penalty per term)
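For reference, the usual textbook forms of these criteria for a linear model with n observations, p estimated parameters (including the intercept) and error sum of squares SSE are below. They are stated up to additive constants, which do not change which model wins; SAS's printed values use its own constants and scaling, so compare models only within the same output.

```latex
\begin{aligned}
\mathrm{AIC} &= n\ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2p \\
\mathrm{BIC}\ (\text{Schwarz}) &= n\ln\!\left(\frac{\mathrm{SSE}}{n}\right) + p\ln n \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2p(p+1)}{n-p-1} \\
R^2_{\mathrm{adj}} &= 1 - \left(1 - R^2\right)\frac{n-1}{n-p}
\end{aligned}
```

Because ln n > 2 once n is 8 or more, BIC charges more per extra parameter than AIC for all but tiny samples (for mtcars, n = 32 and ln 32 is about 3.47), which is why it tends to pick smaller models.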
55 Adding plots = criterionpanel to the PROC GLMSELECT statement will show you whether the other criteria agree.
56 Try something with LOTS of variables.
proc glmselect data = mtcars plots=criterionpanel;
  class cylinders gear;
  model mpg = weight hp cylinders weight*cylinders hp*cylinders hp*gear displacement*gear weight*weight / selection=lasso;
run;
57 Additional PROC GLMSELECT slides from nt/dam/sas/en_ca/user %20Group%20Presentations/Winnipeg-User- Group/SylvainTremblay-PROCGLMSELECT-Spring2012.pdf: "GLMSELECT for Model Selection", Winnipeg SAS User Group Meeting, May 11, 2012, Sylvain Tremblay, SAS Canada Education.
More informationA SAS Macro for Adaptive Regression Modeling
A SAS Macro for Adaptive Regression Modeling George J. Knafl, PhD Professor University of North Carolina at Chapel Hill School of Nursing Supported in part by NIH Grants R01 AI57043 and R03 MH086132 Overview
More informationSimple Linear Regression the model, estimation and testing
Simple Linear Regression the model, estimation and testing Lecture No. 05 Example 1 A production manager has compared the dexterity test scores of five assembly-line employees with their hourly productivity.
More informationWeek 10 Hour 1. Shapiro-Wilks Test (from last time) Cross-Validation. Week 10 Hour 2 Missing Data. Stat 302 Notes. Week 10, Hour 2, Page 1 / 32
Week 10 Hour 1 Shapiro-Wilks Test (from last time) Cross-Validation Week 10 Hour 2 Missing Data Stat 302 Notes. Week 10, Hour 2, Page 1 / 32 Cross-Validation in the Wild It s often more important to describe
More informationCHAPTER ONE CORRELATION
CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to
More information3.2A Least-Squares Regression
3.2A Least-Squares Regression Linear (straight-line) relationships between two quantitative variables are pretty common and easy to understand. Our instinct when looking at a scatterplot of data is to
More informationApplication of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties
Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties Bob Obenchain, Risk Benefit Statistics, August 2015 Our motivation for using a Cut-Point
More informationWhat Are Your Odds? : An Interactive Web Application to Visualize Health Outcomes
What Are Your Odds? : An Interactive Web Application to Visualize Health Outcomes Abstract Spreading health knowledge and promoting healthy behavior can impact the lives of many people. Our project aims
More informationMULTIPLE REGRESSION OF CPS DATA
MULTIPLE REGRESSION OF CPS DATA A further inspection of the relationship between hourly wages and education level can show whether other factors, such as gender and work experience, influence wages. Linear
More informationLesson 9: Two Factor ANOVAS
Published on Agron 513 (https://courses.agron.iastate.edu/agron513) Home > Lesson 9 Lesson 9: Two Factor ANOVAS Developed by: Ron Mowers, Marin Harbur, and Ken Moore Completion Time: 1 week Introduction
More informationScore Tests of Normality in Bivariate Probit Models
Score Tests of Normality in Bivariate Probit Models Anthony Murphy Nuffield College, Oxford OX1 1NF, UK Abstract: A relatively simple and convenient score test of normality in the bivariate probit model
More information5 To Invest or not to Invest? That is the Question.
5 To Invest or not to Invest? That is the Question. Before starting this lab, you should be familiar with these terms: response y (or dependent) and explanatory x (or independent) variables; slope and
More informationHow to analyze correlated and longitudinal data?
How to analyze correlated and longitudinal data? Niloofar Ramezani, University of Northern Colorado, Greeley, Colorado ABSTRACT Longitudinal and correlated data are extensively used across disciplines
More informationChapter 3 CORRELATION AND REGRESSION
CORRELATION AND REGRESSION TOPIC SLIDE Linear Regression Defined 2 Regression Equation 3 The Slope or b 4 The Y-Intercept or a 5 What Value of the Y-Variable Should be Predicted When r = 0? 7 The Regression
More information2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%
Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3 Capstone #2: Review of Chapter 4 Capstone #3: Review of
More informationATTACH YOUR SAS CODE WITH YOUR ANSWERS.
BSTA 6652 Survival Analysis Winter, 2017 Problem Set 5 Reading: Klein: Chapter 12; SAS textbook: Chapter 4 ATTACH YOUR SAS CODE WITH YOUR ANSWERS. The data in BMTH.txt was collected on 43 bone marrow transplant
More informationAnalysis of single gene effects 1. Quantitative analysis of single gene effects. Gregory Carey, Barbara J. Bowers, Jeanne M.
Analysis of single gene effects 1 Quantitative analysis of single gene effects Gregory Carey, Barbara J. Bowers, Jeanne M. Wehner From the Department of Psychology (GC, JMW) and Institute for Behavioral
More informationAnalysis of Covariance (ANCOVA)
Analysis of Covariance (ANCOVA) Some background ANOVA can be extended to include one or more continuous variables that predict the outcome (or dependent variable). Continuous variables such as these, that
More informationSTATS Relationships between variables: Correlation
STATS 1060 Relationships between variables: Correlation READINGS: Chapter 7 of your text book (DeVeaux, Vellman and Bock); on-line notes for correlation; on-line practice problems for correlation NOTICE:
More informationChapter 1: Exploring Data
Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!
More informationParameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX
Paper 1766-2014 Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX ABSTRACT Chunhua Cao, Yan Wang, Yi-Hsin Chen, Isaac Y. Li University
More informationStatistical Tools in Biology
Statistical Tools in Biology Research Methodology Design protocol/procedure. (2 types) Cross sectional study comparing two different grps. e.g, comparing LDL levels between athletes and couch potatoes.
More informationCLASSICAL AND. MODERN REGRESSION WITH APPLICATIONS
- CLASSICAL AND. MODERN REGRESSION WITH APPLICATIONS SECOND EDITION Raymond H. Myers Virginia Polytechnic Institute and State university 1 ~l~~l~l~~~~~~~l!~ ~~~~~l~/ll~~ Donated by Duxbury o Thomson Learning,,
More informationSPSS output for 420 midterm study
Ψ Psy Midterm Part In lab (5 points total) Your professor decides that he wants to find out how much impact amount of study time has on the first midterm. He randomly assigns students to study for hours,
More information15.301/310, Managerial Psychology Prof. Dan Ariely Recitation 8: T test and ANOVA
15.301/310, Managerial Psychology Prof. Dan Ariely Recitation 8: T test and ANOVA Statistics does all kinds of stuff to describe data Talk about baseball, other useful stuff We can calculate the probability.
More information