NORTH SOUTH UNIVERSITY TUTORIAL 2


AHMED HOSSAIN, PhD - Data Management and Analysis

Correlation Analysis INTRODUCTION

In correlation analysis, we estimate a sample correlation coefficient, more specifically the Pearson product-moment correlation coefficient. The sample correlation coefficient, denoted r, ranges between -1 and +1 and quantifies the direction and strength of the linear relationship between two variables. The sign of r indicates the direction of the association, and its magnitude indicates the strength. For example, a correlation of r = 0.9 suggests a strong, positive association between two variables, whereas a correlation of r = -0.2 suggests a weak, negative association. A value of r close to zero suggests no linear association between two continuous variables.

Limitations: There may be a non-linear association between two continuous variables, but r does not detect it.
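The definition and the limitation above can be checked with a short sketch. It uses only the Python standard library; the function name `pearson_r` and the toy data are illustrative assumptions, not part of the tutorial.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson product-moment correlation coefficient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data, symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])   # perfect positive line
r_negative = pearson_r(xs, [-x for x in xs])        # perfect negative line
r_curved = pearson_r(xs, [x * x for x in xs])       # exact parabola

print("linear:", r_linear)
print("negative:", r_negative)
print("curved:", r_curved)
```

The parabola is a perfect (non-linear) relationship, yet its r is essentially zero, which is the limitation stated above.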

Correlation Analysis SCATTER DIAGRAM

We wish to estimate the association between gestational age and infant birth weight. In this example, birth weight is the dependent variable and gestational age is the independent variable. Thus Y = birth weight and X = gestational age. Note that the independent variable is plotted on the horizontal axis (X-axis) and the dependent variable on the vertical axis (Y-axis).


Simple Linear Regression INTRODUCTION

In simple linear regression we are concerned with the relationship between two variables, X and Y. There are two components to such a relationship:

1. The strength of the relationship.
2. The direction of the relationship.

We shall also be interested in making inferences about the relationship. We assume here that the relationship between X and Y is linear (or has been linearized through transformation).

Regression INTRODUCTION

Regression is a technique for modeling and analyzing numerical data. It exploits the relationship between two or more variables so that we can gain information about one of them by knowing the values of the others. Regression can be used for prediction, estimation, hypothesis testing, and modeling causal relationships.

Simple Linear Regression ASSUMPTIONS

Suppose that we have a dataset $(y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)$. Our interest is in using our model to predict values of Y for any given value of X = x. If we know the values of $\beta_0$ and $\beta_1$, then the fitted value for the observation $y_i$ would be $\beta_0 + \beta_1 x_i$. The error in the fitted value can be measured by the vertical distance

$\varepsilon_i = y_i - \beta_0 - \beta_1 x_i$

We would like to make these errors as small as possible.
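One common way to make the errors "as small as possible" is ordinary least squares, which picks the intercept and slope that minimize the sum of squared vertical distances. The slide does not name the method, so treat the following as a minimal sketch under that assumption; the function names and the data are made up for illustration.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) minimizing sum((y - b0 - b1*x)**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    return b0, b1


def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared vertical distances e_i = y_i - b0 - b1*x_i."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (not from the tutorial).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear(xs, ys)
residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
sse_fit = sum_squared_errors(xs, ys, b0, b1)

print("intercept:", b0, "slope:", b1)
print("SSE at the fitted line:", sse_fit)
```

Nudging either coefficient away from the fitted values can only increase the sum of squared errors, which is the sense in which the fitted line has the smallest errors.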

Simple Linear Regression EXAMPLE


INTRODUCTION

Multiple linear regression extends the simple linear regression model to two or more independent variables:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$

For example, Expression = Baseline + Age + Tissue + Sex + Error.

Partial Regression Coefficients: $\beta_i$ is the effect on the dependent variable of increasing the ith independent variable by 1 unit, holding all other predictors constant.
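The "holding all other predictors constant" reading of a partial coefficient can be illustrated with a small sketch. It fits the model by solving the least-squares normal equations in plain Python; the solver, the function names, and the noise-free toy data are all assumptions made for illustration, and real analyses would normally use a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear predictors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_linear(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] via (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(betas, row):
    return betas[0] + sum(b * x for b, x in zip(betas[1:], row))


# Made-up, noise-free data generated from y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]

betas = fit_multiple_linear(rows, ys)
# Raise x1 by one unit while x2 stays fixed at 5.
partial_effect_x1 = predict(betas, (2, 5)) - predict(betas, (1, 5))

print("coefficients:", betas)
print("change in y per unit of x1, x2 held fixed:", partial_effect_x1)
```

The change in the prediction equals the fitted coefficient of x1 regardless of the value at which x2 is held, which is what the partial regression coefficient expresses.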

CATEGORICAL INDEPENDENT VARIABLES


RESULTS FROM R

Call:
lm(formula = y ~ X1 + X2)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5021 -0.8847 -0.2502  0.5476  4.3438

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.694357   1.365469   3.438  0.00146 **
X1          -0.023186   0.023210  -0.999  0.32432
X2          -0.005716   0.007608  -0.751  0.45721

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.688 on 37 degrees of freedom
Multiple R-squared: 0.03497, Adjusted R-squared: -0.0172
F-statistic: 0.6703 on 2 and 37 DF, p-value: 0.5176
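Several numbers in this output are functions of the others, which is a useful way to read it. The sketch below recomputes the t values, the adjusted R-squared, and the F-statistic from the printed estimates using the standard formulas. The variable names are illustrative, and the sample size of 40 is inferred from the 37 residual degrees of freedom plus three estimated coefficients.

```python
# Estimates and standard errors copied from the coefficient table.
coefficients = {
    "(Intercept)": (4.694357, 1.365469),
    "X1": (-0.023186, 0.023210),
    "X2": (-0.005716, 0.007608),
}

# t value = estimate / standard error.
t_values = {name: est / se for name, (est, se) in coefficients.items()}

r_squared = 0.03497
residual_df = 37
n_predictors = 2
n_obs = residual_df + n_predictors + 1  # inferred: 37 + 2 slopes + intercept

# Adjusted R-squared penalizes R-squared for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / residual_df

# Overall F-statistic for H0: all slope coefficients are zero.
f_statistic = (r_squared / n_predictors) / ((1 - r_squared) / residual_df)

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
print(f"adjusted R-squared = {adjusted_r_squared:.4f}")
print(f"F = {f_statistic:.4f}")
```

The recomputed values agree with the printed ones up to rounding. In this output neither X1 nor X2 has a p-value below 0.05, and the overall F-test has p-value 0.5176, so the model explains very little of the variability in y.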

HYPOTHESIS TESTS: INDIVIDUAL REGRESSION COEFFICIENTS

HYPOTHESIS TESTING: MODEL UTILITY TEST

THE COEFFICIENT OF DETERMINATION

The total sum of squares (SST) is a measure of the variability in $y_1, \ldots, y_n$ without taking the covariate into account. The error sum of squares (SSE) is the amount of variability left after fitting a linear regression for the covariate. The model sum of squares (SSR) is the amount of variability explained by the model. The proportion of the variability explained by the model is

$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$

In simple regression $R^2$ is the square of the sample correlation between $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$.
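The decomposition and the link to the correlation coefficient can be verified numerically. This is a minimal sketch for the simple (one-covariate) case, fitted by least squares; the function name and the data are made up for illustration.

```python
import math


def simple_regression_sums_of_squares(xs, ys):
    """Fit y = b0 + b1*x by least squares and return (SST, SSE, SSR, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * x for x in xs]
    sst = syy                                            # total variability
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # left unexplained
    ssr = sum((f - mean_y) ** 2 for f in fitted)         # explained by model
    r = sxy / math.sqrt(sxx * syy)                       # sample correlation
    return sst, sse, ssr, r


# Made-up illustrative data (not from the tutorial).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, 5.9, 8.3, 9.8, 12.4]

sst, sse, ssr, r = simple_regression_sums_of_squares(xs, ys)
r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print("SST:", sst, "SSE:", sse, "SSR:", ssr)
print("R^2:", r2_from_ssr, "r^2:", r ** 2)
```

Both forms of the formula give the same value because SST = SSR + SSE for a least-squares fit with an intercept.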

BIRTHWEIGHT IS CONTINUOUS AND CATEGORICAL INDEPENDENT VARIABLES

RESULTS

INTERACTION

Interaction effects represent the combined effects of variables on the criterion or dependent measure. When an interaction effect is present, the impact of one variable depends on the level of the other variable.

EXAMPLE 1: Interaction between adding sugar to coffee and stirring the coffee. Neither of the two individual variables has much effect on sweetness, but a combination of the two does.

EXAMPLE 2: Interaction between smoking and inhaling asbestos fibres. Both raise lung carcinoma risk, but exposure to asbestos multiplies the cancer risk in smokers and non-smokers. Here, the joint effect of inhaling asbestos and smoking is higher than the sum of both effects.
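In a regression model an interaction is often represented by a product term. The sketch below uses the coffee example with entirely hypothetical coefficients (none of these numbers come from the tutorial) to show how the effect of one variable changes with the level of the other.

```python
def sweetness(sugar, stirred,
              b0=1.0, b_sugar=0.2, b_stir=0.1, b_interaction=3.0):
    """Hypothetical model: b0 + b_sugar*sugar + b_stir*stirred
    + b_interaction*sugar*stirred, with sugar and stirred coded 0/1."""
    return (b0 + b_sugar * sugar + b_stir * stirred
            + b_interaction * sugar * stirred)


# Effect of adding sugar at each level of stirring.
sugar_effect_unstirred = sweetness(1, 0) - sweetness(0, 0)
sugar_effect_stirred = sweetness(1, 1) - sweetness(0, 1)

# Difference of differences: equals the interaction coefficient.
interaction_contrast = sugar_effect_stirred - sugar_effect_unstirred

# Joint effect of doing both versus the sum of the separate effects.
joint_effect = sweetness(1, 1) - sweetness(0, 0)
stir_effect_no_sugar = sweetness(0, 1) - sweetness(0, 0)
sum_of_separate_effects = sugar_effect_unstirred + stir_effect_no_sugar

print("sugar effect, unstirred:", sugar_effect_unstirred)
print("sugar effect, stirred:", sugar_effect_stirred)
print("joint effect:", joint_effect,
      "sum of separate effects:", sum_of_separate_effects)
```

With no interaction term (`b_interaction=0`) the two sugar effects would be identical and the joint effect would equal the sum of the separate effects.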

IDENTIFYING INTERACTION

CATEGORICAL PREDICTORS: Suppose the researcher is interested in whether the treatment is equally effective for females and males. That is, does the treatment effect differ by gender group? This is a question of interaction. Interaction results whose lines do not cross.

CONTINUOUS PREDICTORS: Single slope test.