Performance of Median and Least Squares Regression for Slightly Skewed Data

World Academy of Science, Engineering and Technology 9
Carolina Bancayrin-Baguio, PhD, MSU-Iligan Institute of Technology, Iligan City, 9, Philippines (cbbaguio@yahoo.com)
Abstract
This paper presents the concept of quantile regression, a nonparametric procedure used for prediction that is more robust than Ordinary Least Squares (OLS) for asymmetric data. Specifically, this study emphasizes the procedure for Median Regression and compares its efficiency with that of the Least Squares Method for slightly skewed data. The study used a subset of the data from the study entitled "The Profile of Mindanao State University-Iligan Institute of Technology Students and Its Effects to their Academic Performance", with a total sample size of 677 students from different colleges in the Institute for the year -. The grade point average (gpa) is the dependent variable, and academic load, entrance exam score (SASE), age, and number of study hours are the independent variables. For some predictors, the standard errors of the regression coefficient estimates were relatively smaller under median regression than under the least squares method. To validate the performance of median regression in terms of asymptotic efficiency, a bootstrapped median regression with , replicates was used to compute the estimates of the regression coefficients. The standard errors of the median and bootstrapped median regression estimates were comparable, and the pseudo R² values were the same. This result confirms the asymptotic efficiency of median regression when the bootstrap pseudo-replicates are relatively large.
Keywords: Quantile regression, robust, asymmetric distribution, nonparametric procedure, ordinary least squares, least trimmed squares.
I. INTRODUCTION
It has been observed that most studies in various fields investigate causal relationships between a response variable describing some real phenomenon and a set of predictor variables. Most often, researchers use least squares linear regression without being aware of robust regression methods. Least squares rests on several assumptions, such as normally distributed errors, independence of observations, linearity, and the absence of outliers. Many studies have shown that ordinary least squares is less robust than other methods such as Median Regression, Least Trimmed Squares, and Least Median of Squares (Yaffee). Many studies have also found Least Absolute Deviation (LAD) estimation to be better than OLS. LAD leads to Median Regression, that is, regression for the median (τ = 0.5), and more generally to regression for any quantile, the so-called Quantile Regression. It is classified as a nonparametric method because it does not assume that the data follow a particular distribution. Quantile Regression was not popular several decades ago and is still not taught in basic statistics courses because of the inherent difficulty of its computation. Unlike the OLS objective, the LAD objective function is not differentiable everywhere, so linear programming rather than calculus is used to solve the optimization problem. Today this computation is greatly facilitated by better solution methods and advances in computing. The computational burden that arises when there are many variables and data points (the "curse of dimensionality") has been addressed by algorithms other than the simplex method, notably interior point algorithms for linear programming.
II. OBJECTIVES OF THE STUDY
The main objective of this study is to investigate the potential of median regression for slightly skewed data. Specifically, it attempts to:
1) discuss the procedure for computing the estimates of Median Regression; and
2) compare Median Regression with Ordinary Least Squares (OLS) through the magnitudes of the standard errors of the regression coefficient estimates.
III. THEORY AND CONCEPTS
A. Median Regression
Median Regression estimates the median of the dependent variable conditional on the values of the independent variables, whereas Least Squares Regression estimates the conditional mean. The basic difference between the two methods is that median regression finds the regression plane that minimizes the sum of the absolute residuals, whereas OLS minimizes the sum of the squared residuals.
B. Bootstrapping
Bootstrapping is a way of assessing the reliability of estimates obtained from a data set. It creates pseudo-replicate data sets by resampling the observed data with replacement, and it is a statistical method for obtaining an estimate of the standard error. Moreover, bootstrapping allows one to assess whether the distribution of the observed values has been influenced by stochastic effects.
C. Quantile Function
Let $Y$ be a real-valued random variable characterized by its distribution function
$$F(y) = P(Y \le y). \tag{1}$$
For any $0 < \tau < 1$, the $\tau$th quantile of $Y$ can be written as
$$Q(\tau) = \inf\{\, y : F(y) \ge \tau \,\}. \tag{2}$$
The quantile function in (2) provides a complete characterization of the random variable, just as the distribution function does. Using this function, the median can be written as $Q(1/2)$.
The quantiles defined above can be formulated as the solution to a simple optimization problem. For any $0 < \tau < 1$, define the piecewise linear check function
$$\rho_\tau(v) = v\,\bigl(\tau - I(v < 0)\bigr).$$
Minimizing the expectation $E[\rho_\tau(Y - \theta)]$ with respect to $\theta$ yields solutions, the smallest of which is $Q(\tau)$ defined in (2). The sample analog $\hat q(\tau)$ of $Q(\tau)$, based on a random sample $(y_1, y_2, \dots, y_n)$ of $Y$, is called the $\tau$th sample quantile and can be found by solving
$$\hat q(\tau) = \arg\min_{\theta \in \mathbb{R}} \sum_{i=1}^{n} \rho_\tau(y_i - \theta). \tag{3}$$
Equation (3) yields a natural generalization to the regression context. The linear conditional quantile function $Q_Y(\tau \mid x) = x^\top \beta(\tau)$ can be estimated by solving, for $x_i \in \mathbb{R}^K$ and $\beta \in \mathbb{R}^K$,
$$\hat\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^K} \sum_{i=1}^{n} \rho_\tau(y_i - x_i^\top \beta). \tag{4}$$
For $\tau = 1/2$, $\rho_\tau(v) = |v|/2$, so (4) is equivalent to minimizing the sum of absolute residuals, which is median regression.
D. Computation of Quantile Regression Estimates
The quantreg package for the R programming language solves the linear program using the simplex algorithm of Barrodale and Roberts (97). The algorithm solves the linear program in two stages. The first stage picks columns of X or -X as pivotal columns. The second stage interchanges the columns of I or -I as basis or nonbasis columns. The optimal solution is obtained by executing the two stages iteratively. Because of the special structure of the constraint matrix A of the linear program, only the main data matrix needs to be stored in memory. This special version of the simplex algorithm for Median Regression can be extended to quantile regression for any given quantile, and even to the entire quantile process (Koenker and d'Orey, 99).
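The check function and the two minimization problems above (the sample quantile and its regression analogue) can be made concrete with a small, dependency-free Python sketch. It does not implement the Barrodale-Roberts simplex method. Instead it relies on two properties of the piecewise linear, convex criterion: the sample quantile is attained at one of the observations, and, for a full-rank design, an optimal regression fit can be found among the "elemental" fits that pass exactly through K observations, which correspond to basic solutions of the linear program. The sketch assumes simple regression (K = 2) and small samples, because the exhaustive search costs O(n³). All function names are hypothetical.

```python
from itertools import combinations


def check_loss(v, tau):
    """Check function rho_tau(v) = v * (tau - I(v < 0)); for tau = 0.5 it equals |v| / 2."""
    return v * (tau - (1.0 if v < 0 else 0.0))


def sample_quantile(ys, tau):
    """Smallest minimizer of sum_i rho_tau(y_i - theta): the tau-th sample quantile.

    The objective is convex and piecewise linear with kinks at the data points,
    so it is enough to evaluate it at each observation.
    """
    best_obj, best_theta = None, None
    for theta in sorted(ys):  # ascending order, so ties keep the smallest minimizer
        obj = sum(check_loss(y - theta, tau) for y in ys)
        if best_obj is None or obj < best_obj - 1e-12:
            best_obj, best_theta = obj, theta
    return best_theta


def quantile_regression_2d(xs, ys, tau=0.5):
    """Minimize sum_i rho_tau(y_i - b0 - b1 * x_i) for simple regression (K = 2).

    Brute force over elemental fits: every line through two observations with
    distinct x values. Assumes at least two distinct x values; cost is O(n^3).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # these two points do not determine a line
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        obj = sum(check_loss(y - b0 - b1 * x, tau) for x, y in zip(xs, ys))
        if best is None or obj < best[0]:
            best = (obj, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def ols_2d(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [1, 2, 3, 4, 100]  # hypothetical data with one gross outlier in y
    print(sample_quantile(ys, 0.5))        # 3
    print(quantile_regression_2d(xs, ys))  # (0.0, 1.0): the line y = x
    print(ols_2d(xs, ys))                  # (-38.0, 20.0): pulled by the outlier
```

With one gross outlier, the median regression line stays at y = x, while the least squares slope is pulled from 1 to 20. For data of realistic size, such as the study's N = 677 students and four predictors, an LP solver is the practical choice. One example is `rq()` in R's quantreg package, whose default `method = "br"` is the Barrodale-Roberts algorithm.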
This special-purpose simplex procedure greatly reduces the computing time required by a general simplex algorithm and is suitable for data sets with fewer than , observations and variables. The STATA software (version 9.) can also compute the coefficient estimates of Median Regression as well as those of Bootstrapped Median Regression.
E. Hypothesis Testing Using the Dual
Quantile regression estimation can be represented as a linear program, and confidence intervals for the coefficient estimates can be constructed from the solution of its dual problem. This approach involves rank statistics that arise from the dual problem of quantile regression. The advantage of rank tests is that the estimation of nuisance parameters is avoided and the result is robust. Hypotheses about the linearity of the quantiles in quantile regression can be tested by inverting rank tests. Gutenbrunner and Jureckova (99) showed that the solutions of the dual problem formulated for computing regression quantiles generalize the duality of ranks and quantiles to linear models. The dual solution, called the regression rank score process, establishes the link between linear rank statistics and regression quantiles. The procedure emanates from the classical theory of rank tests, which can be extended to hypothesis testing in the linear model.
IV. METHODS
A subset of the data from the study entitled "The Profile of Mindanao State University-Iligan Institute of Technology Students and its Effect to their Academic Performance", with a total sample size of 677 students from different colleges in the Institute for the year -, was used to illustrate the performance of Median Regression. The grade point average (gpa) is the dependent variable, and academic load, entrance exam score (SASE), age, and number of study hours are the independent variables. The STATA software (version 9.), licensed to Mindanao State University-Iligan Institute of Technology, Iligan City, Philippines, was used for the computation.
V. EMPIRICAL RESULTS
TABLE I
DESCRIPTIVE STATISTICS OF THE DEPENDENT AND INDEPENDENT VARIABLES
Columns: Variable, Skewness, Mean, s.d. / cv
Grade Point Average: .86.. / .87%
Entrance Exam: / .8%
Academic Load: / 6.%
Age: / 9.7%
Study Hours: / .%
s.d. = standard deviation; cv = coefficient of variation
Table I shows that the skewness values of the variables are relatively small but different from zero, which implies that the data are slightly skewed. The coefficient of variation is more than % for most of the variables, the exception being the students' ages. This means that these variables are quite dispersed about their means, whereas age is not. The histograms and normal Q-Q plots of the variables confirm that the data depart from normality and are slightly skewed.
[Figures: histograms and normal Q-Q plots of the students' academic load, grade point average (gpa), admission test (SASE) raw score, and age; Fig. 6 is the Q-Q plot of academic load, Fig. 7 the histogram of age, and Fig. 8 the Q-Q plot of age.]
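The two summary measures used in Table I, skewness and the coefficient of variation, can be computed as in the Python sketch below. The source does not state which skewness estimator was used. The sketch assumes the moment coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central sample moments; many packages report a small-sample adjusted version instead. The coefficient of variation is taken as the sample standard deviation (divisor n - 1) divided by the mean, times 100. Function names are hypothetical.

```python
from statistics import mean, stdev


def skewness_g1(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (central moments with divisor n).

    Values near 0 suggest symmetry; small nonzero values indicate slight skew.
    """
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def coefficient_of_variation(xs):
    """Sample standard deviation as a percentage of the mean (assumes the mean is nonzero)."""
    return 100.0 * stdev(xs) / mean(xs)


if __name__ == "__main__":
    symmetric = [1, 2, 3, 4, 5]
    right_skewed = [1, 1, 1, 10]
    print(skewness_g1(symmetric))               # 0.0
    print(round(skewness_g1(right_skewed), 4))  # 1.1547
    print(coefficient_of_variation([2, 4, 6]))  # 50.0
```

A value of g1 near zero, as for the symmetric list, indicates symmetry. A small positive or negative value indicates the kind of slight right or left skew described for Table I.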

[Figures: Fig. 9, histogram of the students' study hours (time spent studying per day), and the normal Q-Q plot of the students' time spent studying per day.]
TABLE II
COMPARATIVE PERFORMANCE OF MEDIAN AND LEAST SQUARES REGRESSION
Response variable: gpa; N = 677 students
Predictor        Median coeff. (s.e.)    Least squares coeff. (s.e.)
Admission Test   -.7** (.)               -.6** (.)
Academic Load    -.8** (.)               -.7** ( )
Age              .6** (.)                .6** (.)
Study Hours      -.6** (.)               -.6** ( )
Constant         .9** (.8)               .97** (.98)
Pseudo R² (median) = .6; R² (least squares) = .6
Coeff. = coefficient; s.e. = standard error; ** highly significant at the . level
TABLE III
COMPARATIVE PERFORMANCE OF MEDIAN AND BOOTSTRAPPED MEDIAN REGRESSION
Response variable: gpa; N = 677 students
Predictor        Median coeff. (s.e.)    Bootstrapped median coeff. (s.e.)
Admission Test   -.7** (.)               -.7** (.)
Academic Load    -.8** (.)               -.7** ( )
Age              .6** (.)                .6** (.)
Study Hours      -.6** (.)               -.6** ( )
Constant         .9** (.8)               .9** (.)
Pseudo R² (median) = .6; pseudo R² (bootstrapped median) = .6
Coeff. = coefficient; s.e. = standard error; ** highly significant at the . level
Table II compares the performance of Median and Least Squares Regression on the slightly skewed data, for which the normality assumption does not hold for any of the variables. Both methods identify the same significant linear predictors of the students' grade point average: admission test score, academic load, age, and study hours. The standard errors of the regression coefficients are the same under both methods for admission test score, academic load, and study hours, and they differ slightly for age and the constant. However, the R² of the Least Squares method is larger than the pseudo R² of the Median Regression.
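The goodness-of-fit measure and the bootstrap comparison behind Tables II and III can be sketched in Python as follows. The sketch assumes that the reported pseudo R² is the Koenker-Machado measure R1(τ) = 1 - V̂/Ṽ. Here V̂ is the minimized sum of check losses for the fitted model, and Ṽ is the same sum for an intercept-only model, whose optimum is the sample quantile of y. Stata's qreg reports a pseudo R² of this kind. Because R1 is built from weighted absolute deviations rather than squared deviations, it is not on the same scale as the OLS R², so the two should be compared with care. The assumed bootstrap is a pairs (case) bootstrap: (x, y) pairs are resampled with replacement, the median regression is refitted to each pseudo-replicate, and the standard deviation of the replicated coefficients is taken as the standard error. This is a hypothetical, simple-regression (K = 2) illustration, not the STATA routine used in the study.

```python
import random
from itertools import combinations
from statistics import stdev


def check_loss(v, tau):
    """Check function rho_tau(v) = v * (tau - I(v < 0))."""
    return v * (tau - (1.0 if v < 0 else 0.0))


def total_loss(xs, ys, b0, b1, tau):
    """Sum of check losses of the residuals y - b0 - b1 * x."""
    return sum(check_loss(y - b0 - b1 * x, tau) for x, y in zip(xs, ys))


def fit_quantile_2d(xs, ys, tau=0.5):
    """Exact fit of y = b0 + b1 * x: search all lines through two observations."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        obj = total_loss(xs, ys, b0, b1, tau)
        if best is None or obj < best[0]:
            best = (obj, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def pseudo_r2(xs, ys, tau=0.5):
    """Koenker-Machado R1(tau) = 1 - V_hat / V_tilde.

    V_hat: minimized check loss of the fitted line.
    V_tilde: minimized check loss of an intercept-only model; its optimum
    (a tau-th sample quantile of y) is attained at one of the observed y values.
    Assumes the y values are not all equal (otherwise V_tilde is 0).
    """
    b0, b1 = fit_quantile_2d(xs, ys, tau)
    v_hat = total_loss(xs, ys, b0, b1, tau)
    v_tilde = min(sum(check_loss(y - t, tau) for y in ys) for t in ys)
    return 1.0 - v_hat / v_tilde


def bootstrap_se(xs, ys, tau=0.5, replicates=200, seed=0):
    """Pairs-bootstrap standard errors of (b0, b1): resample (x, y) pairs with replacement."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    if replicates < 2:
        raise ValueError("need at least two replicates")
    rng = random.Random(seed)
    n = len(xs)
    b0s, b1s = [], []
    while len(b0s) < replicates:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[k] for k in idx]
        if len(set(bx)) < 2:
            continue  # all resampled x values equal: slope not identified, draw again
        b0, b1 = fit_quantile_2d(bx, [ys[k] for k in idx], tau)
        b0s.append(b0)
        b1s.append(b1)
    return stdev(b0s), stdev(b1s)


if __name__ == "__main__":
    # Hypothetical toy data (not the study's data): y roughly equal to x, one outlier.
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.1, 20.0]
    print("pseudo R2:", pseudo_r2(xs, ys))
    print("bootstrap s.e. (b0, b1):", bootstrap_se(xs, ys, replicates=100, seed=1))
```

The exhaustive line search keeps the example self-contained. For the study's four predictors and 677 students, each bootstrap replicate would instead be refitted with an LP-based solver.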
Violation of the normality assumption, however, can be tolerated by median regression but not by least squares regression. Hence, median regression is the appropriate procedure here and leads to accurate statistical results.
Table III compares the regression estimates of Median Regression and Bootstrapped Median Regression with , pseudo-replicates of the data. The standard errors of the estimates differ only slightly, and so do the significance levels. The pseudo R² values for the two methods are equal. These findings indicate that Median Regression gives asymptotically efficient estimates of the coefficients of the regression model. Hence, Median Regression can be regarded as a robust and asymptotically efficient procedure for slightly skewed data that violate the normality assumption of regression.
VI. CONCLUSION
From the findings of this study, the following inferences can be drawn:

a. Median Regression performs relatively better than Least Squares Regression for slightly skewed data, judging from the magnitudes of the standard errors of the regression coefficients.
b. Median Regression gives asymptotically efficient estimates of the coefficients of the regression model.
VII. FUTURE DIRECTION
The literature describes other robust methods for dealing with slightly skewed data. It is therefore recommended that the findings of this study be compared with the results of those methods. Other resampling plans that reduce the bias in the standard errors of the regression estimates should also be explored.
REFERENCES
[1] Barrodale, I. and Roberts, F. D. K. (97). An Improved Algorithm for Discrete l Linear Approximation. SIAM J. Numer. Anal.
[2] Gutenbrunner, C. and Jureckova, J. (99). Regression rank scores and regression quantiles. Annals of Statistics.
[3] Koenker, R. and d'Orey, V. (99). Computing Regression Quantiles. Applied Statistics.
[4] Koenker, R. and Bassett, G. W. (978). Regression Quantiles. Econometrica, 6.
[5] Yaffee, R. A. (). Robust Regression Analysis: Some Popular Statistical Package Options. Statistics, Social Science and Mapping Group, Academic Computing Services, Information Technology Services.


More information

Modeling Sentiment with Ridge Regression

Modeling Sentiment with Ridge Regression Modeling Sentiment with Ridge Regression Luke Segars 2/20/2012 The goal of this project was to generate a linear sentiment model for classifying Amazon book reviews according to their star rank. More generally,

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Assoc. Prof Dr Sarimah Abdullah Unit of Biostatistics & Research Methodology School of Medical Sciences, Health Campus Universiti Sains Malaysia Regression Regression analysis

More information

Modern Regression Methods

Modern Regression Methods Modern Regression Methods Second Edition THOMAS P. RYAN Acworth, Georgia WILEY A JOHN WILEY & SONS, INC. PUBLICATION Contents Preface 1. Introduction 1.1 Simple Linear Regression Model, 3 1.2 Uses of Regression

More information

Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions

Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions J. Harvey a,b, & A.J. van der Merwe b a Centre for Statistical Consultation Department of Statistics

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

Statistical Methods Exam I Review

Statistical Methods Exam I Review Statistical Methods Exam I Review Professor: Dr. Kathleen Suchora SI Leader: Camila M. DISCLAIMER: I have created this review sheet to supplement your studies for your first exam. I am a student here at

More information

Part 1. Online Session: Math Review and Math Preparation for Course 5 minutes Introduction 45 minutes Reading and Practice Problem Assignment

Part 1. Online Session: Math Review and Math Preparation for Course 5 minutes Introduction 45 minutes Reading and Practice Problem Assignment Course Schedule PREREQUISITE (Pre-Class) Advanced Education Diagnostic Test 10 minutes Excel 2007 Exercise SECTION 1. (Completed before face-to-face sections begin) (2 hours) Part 1. Online Session: Math

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu What you should know before you collect data BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Types and levels of study Descriptive statistics Inferential statistics How to choose a statistical test

More information

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about population. Populations are group of interest

More information

EXECUTIVE SUMMARY DATA AND PROBLEM

EXECUTIVE SUMMARY DATA AND PROBLEM EXECUTIVE SUMMARY Every morning, almost half of Americans start the day with a bowl of cereal, but choosing the right healthy breakfast is not always easy. Consumer Reports is therefore calculated by an

More information

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you? WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters

More information

10. LINEAR REGRESSION AND CORRELATION

10. LINEAR REGRESSION AND CORRELATION 1 10. LINEAR REGRESSION AND CORRELATION The contingency table describes an association between two nominal (categorical) variables (e.g., use of supplemental oxygen and mountaineer survival ). We have

More information

From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1

From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1 From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Contents Dedication... iii Acknowledgments... xi About This Book... xiii About the Author... xvii Chapter 1: Introduction...

More information

Correlation and Regression

Correlation and Regression Dublin Institute of Technology ARROW@DIT Books/Book Chapters School of Management 2012-10 Correlation and Regression Donal O'Brien Dublin Institute of Technology, donal.obrien@dit.ie Pamela Sharkey Scott

More information

Overview of Non-Parametric Statistics

Overview of Non-Parametric Statistics Overview of Non-Parametric Statistics LISA Short Course Series Mark Seiss, Dept. of Statistics April 7, 2009 Presentation Outline 1. Homework 2. Review of Parametric Statistics 3. Overview Non-Parametric

More information

CHAPTER ONE CORRELATION

CHAPTER ONE CORRELATION CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to

More information

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Lec 02: Estimation & Hypothesis Testing in Animal Ecology Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis Basic Concept: Extend the simple regression model to include additional explanatory variables: Y = β 0 + β1x1 + β2x2 +... + βp-1xp + ε p = (number of independent variables

More information

Econometric Game 2012: infants birthweight?

Econometric Game 2012: infants birthweight? Econometric Game 2012: How does maternal smoking during pregnancy affect infants birthweight? Case A April 18, 2012 1 Introduction Low birthweight is associated with adverse health related and economic

More information

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations. Statistics as a Tool A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations. Descriptive Statistics Numerical facts or observations that are organized describe

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016 STAB22H3 Statistics I, LEC 01 and LEC 02 Duration: 1 hour and 45 minutes Last Name: First Name:

More information

Making Inferences from Experiments

Making Inferences from Experiments 11.6 Making Inferences from Experiments Essential Question How can you test a hypothesis about an experiment? Resampling Data Yield (kilograms) Control Group Treatment Group 1. 1.1 1.2 1. 1.5 1.4.9 1.2

More information

Underweight Children in Ghana: Evidence of Policy Effects. Samuel Kobina Annim

Underweight Children in Ghana: Evidence of Policy Effects. Samuel Kobina Annim Underweight Children in Ghana: Evidence of Policy Effects Samuel Kobina Annim Correspondence: Economics Discipline Area School of Social Sciences University of Manchester Oxford Road, M13 9PL Manchester,

More information

Contents. Part 1 Introduction. Part 2 Cross-Sectional Selection Bias Adjustment

Contents. Part 1 Introduction. Part 2 Cross-Sectional Selection Bias Adjustment From Analysis of Observational Health Care Data Using SAS. Full book available for purchase here. Contents Preface ix Part 1 Introduction Chapter 1 Introduction to Observational Studies... 3 1.1 Observational

More information

Introduction to Quantitative Methods (SR8511) Project Report

Introduction to Quantitative Methods (SR8511) Project Report Introduction to Quantitative Methods (SR8511) Project Report Exploring the variables related to and possibly affecting the consumption of alcohol by adults Student Registration number: 554561 Word counts

More information

Statistical Techniques. Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview

Statistical Techniques. Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview 7 Applying Statistical Techniques Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview... 137 Common Functions... 141 Selecting Variables to be Analyzed... 141 Deselecting

More information

Normal Q Q. Residuals vs Fitted. Standardized residuals. Theoretical Quantiles. Fitted values. Scale Location 26. Residuals vs Leverage

Normal Q Q. Residuals vs Fitted. Standardized residuals. Theoretical Quantiles. Fitted values. Scale Location 26. Residuals vs Leverage Residuals 400 0 400 800 Residuals vs Fitted 26 42 29 Standardized residuals 2 0 1 2 3 Normal Q Q 26 42 29 360 400 440 2 1 0 1 2 Fitted values Theoretical Quantiles Standardized residuals 0.0 0.5 1.0 1.5

More information

1.4 - Linear Regression and MS Excel

1.4 - Linear Regression and MS Excel 1.4 - Linear Regression and MS Excel Regression is an analytic technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Lab 2: Non-linear regression models

Lab 2: Non-linear regression models Lab 2: Non-linear regression models The flint water crisis has received much attention in the past few months. In today s lab, we will apply the non-linear regression to analyze the lead testing results

More information

THE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE

THE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE THE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE Tal Shahor The Academic College of Emek Yezreel Emek Yezreel 19300,

More information

Recent Advances in Methods for Quantiles. Matteo Bottai, Sc.D.

Recent Advances in Methods for Quantiles. Matteo Bottai, Sc.D. Recent Advances in Methods for Quantiles Matteo Bottai, Sc.D. Many Thanks to Advisees Andrew Ortaglia Huiling Zhen Joe Holbrook Junlong Wu Li Zhou Marco Geraci Nicola Orsini Paolo Frumento Yuan Liu Collaborators

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 5, 6, 7, 8, 9 10 & 11)

More information

NPTEL Project. Econometric Modelling. Module 14: Heteroscedasticity Problem. Module 16: Heteroscedasticity Problem. Vinod Gupta School of Management

NPTEL Project. Econometric Modelling. Module 14: Heteroscedasticity Problem. Module 16: Heteroscedasticity Problem. Vinod Gupta School of Management 1 P age NPTEL Project Econometric Modelling Vinod Gupta School of Management Module 14: Heteroscedasticity Problem Module 16: Heteroscedasticity Problem Rudra P. Pradhan Vinod Gupta School of Management

More information

Linear Regression in SAS

Linear Regression in SAS 1 Suppose we wish to examine factors that predict patient s hemoglobin levels. Simulated data for six patients is used throughout this tutorial. data hgb_data; input id age race $ bmi hgb; cards; 21 25

More information

Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics :

Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics : Descriptive Statistics in SPSS When first looking at a dataset, it is wise to use descriptive statistics to get some idea of what your data look like. Here is a simple dataset, showing three different

More information

Statistical Summaries. Kerala School of MathematicsCourse in Statistics for Scientists. Descriptive Statistics. Summary Statistics

Statistical Summaries. Kerala School of MathematicsCourse in Statistics for Scientists. Descriptive Statistics. Summary Statistics Kerala School of Mathematics Course in Statistics for Scientists Statistical Summaries Descriptive Statistics T.Krishnan Strand Life Sciences, Bangalore may be single numerical summaries of a batch, such

More information

ANOVA in SPSS (Practical)

ANOVA in SPSS (Practical) ANOVA in SPSS (Practical) Analysis of Variance practical In this practical we will investigate how we model the influence of a categorical predictor on a continuous response. Centre for Multilevel Modelling

More information

Statistical techniques to evaluate the agreement degree of medicine measurements

Statistical techniques to evaluate the agreement degree of medicine measurements Statistical techniques to evaluate the agreement degree of medicine measurements Luís M. Grilo 1, Helena L. Grilo 2, António de Oliveira 3 1 lgrilo@ipt.pt, Mathematics Department, Polytechnic Institute

More information

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study STATISTICAL METHODS Epidemiology Biostatistics and Public Health - 2016, Volume 13, Number 1 Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation

More information

South Australian Research and Development Institute. Positive lot sampling for E. coli O157

South Australian Research and Development Institute. Positive lot sampling for E. coli O157 final report Project code: Prepared by: A.MFS.0158 Andreas Kiermeier Date submitted: June 2009 South Australian Research and Development Institute PUBLISHED BY Meat & Livestock Australia Limited Locked

More information

Study Guide #2: MULTIPLE REGRESSION in education

Study Guide #2: MULTIPLE REGRESSION in education Study Guide #2: MULTIPLE REGRESSION in education What is Multiple Regression? When using Multiple Regression in education, researchers use the term independent variables to identify those variables that

More information

Statistics Guide. Prepared by: Amanda J. Rockinson- Szapkiw, Ed.D.

Statistics Guide. Prepared by: Amanda J. Rockinson- Szapkiw, Ed.D. This guide contains a summary of the statistical terms and procedures. This guide can be used as a reference for course work and the dissertation process. However, it is recommended that you refer to statistical

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information

Normal Distribution. Many variables are nearly normal, but none are exactly normal Not perfect, but still useful for a variety of problems.

Normal Distribution. Many variables are nearly normal, but none are exactly normal Not perfect, but still useful for a variety of problems. Review Probability: likelihood of an event Each possible outcome can be assigned a probability If we plotted the probabilities they would follow some type a distribution Modeling the distribution is important

More information