Performance of Median and Least Squares Regression for Slightly Skewed Data
World Academy of Science, Engineering and Technology 9

Carolina Bancayrin-Baguio

Carolina Bancayrin-Baguio, PhD, is with MSU-Iligan Institute of Technology, Iligan City, 9, Philippines (cbbaguio@yahoo.com).

Abstract
This paper presents the concept of quantile regression, a nonparametric procedure used for prediction. The method is more robust than Ordinary Least Squares (OLS) for asymmetric data. Specifically, this study emphasizes the procedure for Median Regression and compares its efficiency with that of the Least Squares method for slightly skewed data. The study used a subset of the data from the study entitled The Profile of Mindanao State University-Iligan Institute of Technology Students and Its Effects to their Academic Performance, with a total sample size of 677 students from different colleges in the Institute for the year -. The grade point average (gpa) is the dependent variable, and academic load, entrance exam scores (SASE), age, and number of hours spent studying are the independent variables. The standard errors of the estimated regression coefficients were found to be relatively smaller for some predictors under median regression than under least squares. To validate the performance of median regression in terms of asymptotic efficiency, a bootstrapped median regression with, replicates was employed to compute the estimates of the regression coefficients. The standard errors of the estimates from the median and bootstrapped median regressions are comparable, and the values of the pseudo R² are the same. This result confirms the asymptotic efficiency of median regression for relatively large sample sizes of the pseudo replicates generated by bootstrapping.

Keywords
Quantile regression, robust, asymmetric distribution, nonparametric procedure, ordinary least squares, least trimmed squares.

I. INTRODUCTION
It has been observed that most studies in various fields investigate causal relationships between a response on some realistic phenomenon and a set of predictor variables. Most often, researchers use linear regression analysis by least squares without knowledge of robust regression. The least squares method rests on several assumptions, such as normally distributed errors, independence of observations, linearity, and absence of outliers. Many studies have shown that ordinary least squares is not robust compared to other methods such as Median Regression, Least Trimmed Squares, and Least Median Squares (Yaffee, R.A., ). The Least Absolute Deviation (LAD) method has been shown in many studies to be better than Ordinary Least Squares (OLS). It leads to Median Regression for the 50th quantile or, in general for any quantile, to the so-called Quantile Regression. This method is classified under nonparametric statistics since the data are not assumed to follow a particular distribution.

Quantile Regression was not popular several decades ago and is not taught in basic Statistics courses because of the inherent difficulty of its computation. Since the LAD objective function, unlike that of OLS, is not differentiable, linear programming rather than calculus is employed to solve the optimization problem. The computation is greatly facilitated nowadays by better solution methods and advances in computing. The curse of dimensionality, which arises when there are many variables and data points, has been addressed by algorithms more advanced than the simplex method, namely the interior point algorithms of linear programming.

II. OBJECTIVES OF THE STUDY
The main objective of this study is to investigate the potential of median regression for slightly skewed data. Specifically, it attempted to:
1) discuss the procedure for computing the estimates of Median Regression;
2) compare Median Regression with Ordinary Least Squares (OLS) through the magnitudes of the standard errors of the regression coefficient estimates.

III. THEORY AND CONCEPTS

A. Median Regression
Median Regression estimates the median of the dependent variable, conditional on the values of the independent variables, whereas Least Squares Regression estimates the mean of the dependent variable. The basic difference between the two methods is that median regression finds the regression plane that minimizes the sum of the absolute residuals, whereas Ordinary Least Squares minimizes the sum of the squared residuals.

B. Bootstrapping
Bootstrapping is a way of testing the reliability of a data set. It creates pseudoreplicate data sets by resampling and is a statistical method for obtaining an estimate of the standard error. Moreover, bootstrapping allows one to assess whether the distribution of characters has been influenced by stochastic effects.

C. Quantile Function
Let $Y$ be any real-valued random variable characterized by its distribution function

$$F(y) = P(Y \le y). \tag{1}$$

The $\tau$th quantile of $Y$, for any $0 < \tau < 1$, can be written as

$$Q(\tau) = \inf\{\, y : F(y) \ge \tau \,\}. \tag{2}$$

The quantile function in (2) provides a complete characterization of the random variable, just like the distribution function. Using this function, the median can be written as $Q(1/2)$. The quantiles defined above can be formulated as the solution to a simple optimization problem. For any $0 < \tau < 1$, define the piecewise linear function $\rho_\tau(v) = v\,(\tau - I(v < 0))$. Minimizing the expectation of $\rho_\tau(Y - \theta)$ with respect to $\theta$ yields solutions, the smallest of which is the $Q(\tau)$ defined in (2). The sample analog $q(\tau)$ of $Q(\tau)$, based on a random sample $(y_1, y_2, \ldots, y_n)$ of $Y$'s, is called the $\tau$th sample quantile and can be found by solving

$$\min_{\theta \in R} \sum_{i=1}^{n} \rho_\tau(y_i - \theta). \tag{3}$$

Equation (3) yields a natural generalization to the regression context. The linear conditional quantile function can be estimated by solving, for $x_i \in R^K$ and $\beta \in R^K$,

$$\min_{\beta \in R^K} \sum_{i=1}^{n} \rho_\tau(y_i - x_i'\beta). \tag{4}$$

D. Computation of Quantile Regression Estimates
The quantreg package for the R programming language solves these linear programs using the simplex algorithm of Barrodale and Roberts (97). The algorithm solves the linear program in two stages. The first stage picks columns of $X$ or $-X$ as pivotal columns. The second stage interchanges columns of $I$ or $-I$ as basis or nonbasis columns. The optimal solution is obtained by executing the two stages iteratively. Because of the special structure of the coefficient matrix $A$, only the main data matrix is stored in memory. This special version of the simplex algorithm for Median Regression can be extended to quantile regression for any given quantile, and even to the entire quantile process (Koenker and d'Orey 99).
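The optimization problems for the sample quantile and for the linear conditional quantile can be illustrated with a small pure-Python sketch. It is not the Barrodale and Roberts simplex algorithm: for a single predictor it relies on the fact that a linear program attains its optimum at a vertex, which here is a line through two observations, and simply tries every such line. The function names (`check_loss`, `sample_quantile`, `quantile_line`, `ols_line`) and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def check_loss(v, tau):
    """Piecewise linear loss rho_tau(v) = v * (tau - I(v < 0))."""
    return v * (tau - (1.0 if v < 0 else 0.0))


def sample_quantile(ys, tau):
    """Smallest minimizer of sum_i rho_tau(y_i - theta).

    The objective is convex and piecewise linear with kinks at the data,
    so a minimizer can always be found among the observations.
    """
    def objective(theta):
        return sum(check_loss(y - theta, tau) for y in ys)

    best = min(objective(y) for y in ys)
    return min(y for y in ys if abs(objective(y) - best) < 1e-12)


def quantile_line(xs, ys, tau=0.5):
    """Fit y = b0 + b1 * x by minimizing the summed check loss.

    Brute force over all lines through two observations with distinct x.
    Assumes the sample contains at least two distinct x values.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid candidate
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        loss = sum(check_loss(y - b0 - b1 * x, tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]


def ols_line(xs, ys):
    """Ordinary least squares line (b0, b1), for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


if __name__ == "__main__":
    # Four points on y = 1 + 2x plus one large response value (toy data).
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 100]
    print("median regression:", quantile_line(xs, ys, tau=0.5))
    print("least squares:    ", ols_line(xs, ys))
```

On this toy sample the median regression line stays on the four collinear points (intercept 1, slope 2), while the least squares slope is pulled to about 20.2 by the single large response. The brute-force search costs on the order of n³ operations, which is why specialized simplex or interior point methods are used in practice.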
The specialized simplex procedure greatly reduces the computing time required by a general simplex algorithm and is suitable for data sets with less than, observations and variables. The software STATA (version 9.) can also compute the coefficient estimates for Median Regression as well as for Bootstrapped Median Regression.

E. Hypothesis Testing Using the Dual
Quantile estimation can be represented as a linear program, and confidence intervals for the coefficient estimates can be constructed from the solution of its dual problem. The dual problem of quantile regression gives rise to rank statistics. The advantage of a rank test is that estimation of nuisance parameters is avoided and the result is robust. The hypothesis of linearity of the quantiles in quantile regression can be tested by the rank inverse test. Gutenbrunner and Jureckova (99) showed that the solutions of the dual problem formulated for computing regression quantiles generalize the duality of ranks and quantiles to linear models. The dual solution, called the regression rank score process, establishes the link between linear rank statistics and regression quantiles. The procedure emanates from the classical theory of rank tests, which can be extended to hypothesis tests in the linear model.

IV. METHODS
A subset of the data from the study entitled The Profile of Mindanao State University-Iligan Institute of Technology Students and its Effect to their Academic Performance, with a total sample size of 677 students from different colleges in the Institute for the year -, was used to illustrate the performance of Median Regression. The grade point average (gpa) is the dependent variable, and academic load, entrance exam scores (SASE), age, and number of hours spent studying are the independent variables. The software STATA (version 9.), licensed to Mindanao State University - Iligan Institute of Technology, Iligan City, Philippines, was used for the computation.

V. EMPIRICAL RESULTS

TABLE I
DESCRIPTIVE STATISTICS OF THE DEPENDENT AND INDEPENDENT VARIABLES
Variables              Skewness   Mean   s.d. / cv
Grade Point Average    .86        .      . / .87%
Entrance Exam                            / .8%
Academic Load                            / 6.%
Age                                      / 9.7%
Study hours                              / .%
s.e. = standard error; cv = coefficient of variation

Table I shows that the skewness values of the variables are relatively small but different from zero, which implies that the data are slightly skewed. The coefficient of variation is more than % for most of the variables, the exception being the ages of the students. This means that these variables, apart from age, are quite dispersed about the mean. Figs. to give the histograms and the Q-Q plots of the different variables. These charts confirm that the data depart from normality and are slightly skewed.

Fig. Histogram for the Students' Grade Point Average (gpa)
Fig. Q-Q Plot of the Students' Grade Point Average (gpa)
Fig. Histogram of the Students' Admission Test (SASE), raw score
Fig. Q-Q Plot of the Students' Admission Test (SASE), raw score
Fig. Histogram of the Students' Academic Load
Fig. 6 Q-Q Plot of Students' Academic Load
Fig. 7 Histogram of the Students' Age
Fig. 8 Q-Q Plot of Students' Age
Fig. 9 Histogram for Students' Study Hours (time spent for studying per day)
Fig. Q-Q Plot of Students' Time Spent for Study per Day

TABLE II
COMPARATIVE PERFORMANCE OF MEDIAN AND LEAST SQUARES
Response variable: gpa. Methods: LEAST SQUARES, MEDIAN.
Predictors        coeff.    s.e.    coeff.    s.e.
Admission Test    -.7**     .       -.6**     .
Academic Load     -.8**     .       -.7**
Age               .6**      .       .6**      .
Study Hours       -.6**     .       -.6**
Constant          .9**      .8      .97**     .98
No. of Students: N = 677
Pseudo R² = .6    R² = .6
Coeff. = coefficient; s.e. = standard error; ** highly significant at. level

TABLE III
COMPARATIVE PERFORMANCE OF MEDIAN AND BOOTSTRAPPED MEDIAN
Response variable: gpa. Methods: LEAST SQUARES, MEDIAN.
Predictors        coeff.    s.e.    coeff.    s.e.
Admission Test    -.7**     .       -.7**     .
Academic Load     -.8**     .       -.7**
Age               .6**      .       .6**      .
Study Hours       -.6**     .       -.6**
Constant          .9**      .8      .9**      .
No. of Students: N = 677
Pseudo R² = .6    R² = .6
Coeff. = coefficient; s.e. = standard error; ** highly significant at. level

Table II compares the performance of Median and Least Squares Regression for slightly skewed data, that is, with the normality assumption waived for all the variables. Under both methods the significant linear predictors of the students' grade point average are the admission test scores, academic load, age, and time spent studying in hours. The standard errors of the regression coefficients are the same under both methods for admission test score, academic load, and study hours, but differ slightly for age and the constant. However, the R² of the Least Squares method is larger than the pseudo R² of the Median Regression.
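The two quantities compared in Tables II and III, a bootstrapped standard error and a pseudo R², can be sketched in pure Python for the single-predictor case. This is only an illustration under stated assumptions: the study itself used STATA, the resampling scheme assumed here is the pairs bootstrap (resampling whole observations), and the pseudo R² is assumed to be one minus the ratio of the summed absolute deviations about the fitted median line to those about the raw median. The function names (`lad_line`, `pseudo_r2`, `bootstrap_se`) and the toy data are hypothetical.

```python
import random
from itertools import combinations
from statistics import median, stdev


def lad_line(xs, ys):
    """Median regression line (b0, b1), brute force over pairs of points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid candidate
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        loss = sum(abs(y - b0 - b1 * x) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def pseudo_r2(xs, ys):
    """1 - (abs. deviations about fitted line) / (abs. deviations about median)."""
    b0, b1 = lad_line(xs, ys)
    fitted = sum(abs(y - b0 - b1 * x) for x, y in zip(xs, ys))
    m = median(ys)
    raw = sum(abs(y - m) for y in ys)
    return 1.0 - fitted / raw


def bootstrap_se(xs, ys, replicates=200, seed=0):
    """Bootstrap standard errors of (b0, b1) by resampling (x, y) pairs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(xs)
    fits = []
    while len(fits) < replicates:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[k] for k in idx]
        by = [ys[k] for k in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: no line can be fitted
        fits.append(lad_line(bx, by))
    se_b0 = stdev([f[0] for f in fits])
    se_b1 = stdev([f[1] for f in fits])
    return se_b0, se_b1


if __name__ == "__main__":
    # Toy data, not the study data: four points on y = 1 + 2x and one large value.
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 100]
    print("coefficients:", lad_line(xs, ys))
    print("pseudo R2:   ", pseudo_r2(xs, ys))
    print("bootstrap se:", bootstrap_se(xs, ys, replicates=200, seed=0))
```

The standard deviation of the replicate coefficients serves as the bootstrap standard error. With more replicates the estimate stabilizes, which is the sense in which the bootstrapped and analytic standard errors are expected to become comparable.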
The normality assumption can be waived for median regression but not for least squares regression. Hence, median regression is the appropriate procedure for these data and leads to accurate statistical results. Table III compares the regression estimates of the Median and Bootstrapped Median Regression with, pseudo replicates of the data. The standard errors of the estimates differ only slightly, and so do the significance levels. The pseudo R² values for goodness of fit are equal for the two methods. These findings indicate that Median Regression gives asymptotically efficient estimates of the coefficients of the regression model. Hence, Median Regression is a robust and asymptotically efficient procedure for slightly skewed data that violate the normality assumption required for regression.

VI. CONCLUSION
From the findings of this study the following inferences can be drawn:
a. Median Regression is relatively better than Least Squares Regression for slightly skewed data, based on the magnitudes of the standard errors of the regression coefficients.
b. Median Regression gives asymptotically efficient estimates of the coefficients of the regression model.

VII. FUTURE DIRECTION
Other robust methods for dealing with slightly skewed data are mentioned in the literature. It is therefore recommended that the findings of this study be compared with the results from other robust methods. Other resampling plans that reduce the bias in the standard errors of the regression estimates should also be explored.

REFERENCES
[1] Barrodale, I. and Roberts, F.D.K. (97). An Improved Algorithm for Discrete $l_1$ Linear Approximation, SIAM J. Numer. Anal.,,
[2] Gutenbrunner, C. and Jureckova, J. (99). Regression Rank Scores and Regression Quantiles, Annals of Statistics (), -.
[3] Koenker, R. and d'Orey, V. (99). Computing Regression Quantiles, Applied Statistics,, -.
[4] Koenker, R. and Bassett, G.W. (978). Regression Quantiles, Econometrica, 6, -.
[5] Yaffee, R.A. (). Robust Regression Analysis: Some Popular Statistical Package Options, Statistics, Social Science and Mapping Group, Academic Computing Services, Information Technology Services.
Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)
Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it
More informationCitation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.
University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document
More informationMEA DISCUSSION PAPERS
Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de
More informationA Comparison of Robust and Nonparametric Estimators Under the Simple Linear Regression Model
Nevitt & Tam A Comparison of Robust and Nonparametric Estimators Under the Simple Linear Regression Model Jonathan Nevitt, University of Maryland, College Park Hak P. Tam, National Taiwan Normal University
More informationDr. Kelly Bradley Final Exam Summer {2 points} Name
{2 points} Name You MUST work alone no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. This exam is being scored out of 00 points.
More informationPitfalls in Linear Regression Analysis
Pitfalls in Linear Regression Analysis Due to the widespread availability of spreadsheet and statistical software for disposal, many of us do not really have a good understanding of how to use regression
More informationQuantitative Methods in Computing Education Research (A brief overview tips and techniques)
Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Dr Judy Sheard Senior Lecturer Co-Director, Computing Education Research Group Monash University judy.sheard@monash.edu
More informationResults & Statistics: Description and Correlation. I. Scales of Measurement A Review
Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize
More informationAddendum: Multiple Regression Analysis (DRAFT 8/2/07)
Addendum: Multiple Regression Analysis (DRAFT 8/2/07) When conducting a rapid ethnographic assessment, program staff may: Want to assess the relative degree to which a number of possible predictive variables
More informationWELCOME! Lecture 11 Thommy Perlinger
Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression
More informationContent. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes
Content Quantifying association between continuous variables. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General
More informationAP Statistics. Semester One Review Part 1 Chapters 1-5
AP Statistics Semester One Review Part 1 Chapters 1-5 AP Statistics Topics Describing Data Producing Data Probability Statistical Inference Describing Data Ch 1: Describing Data: Graphically and Numerically
More informationBusiness Statistics Probability
Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment
More informationList of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition
List of Figures List of Tables Preface to the Second Edition Preface to the First Edition xv xxv xxix xxxi 1 What Is R? 1 1.1 Introduction to R................................ 1 1.2 Downloading and Installing
More informationChapter 1: Explaining Behavior
Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring
More informationSTAT445 Midterm Project1
STAT445 Midterm Project1 Executive Summary This report works on the dataset of Part of This Nutritious Breakfast! In this dataset, 77 different breakfast cereals were collected. The dataset also explores
More informationSTATISTICS AND RESEARCH DESIGN
Statistics 1 STATISTICS AND RESEARCH DESIGN These are subjects that are frequently confused. Both subjects often evoke student anxiety and avoidance. To further complicate matters, both areas appear have
More informationMMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?
MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference
More information2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%
Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3 Capstone #2: Review of Chapter 4 Capstone #3: Review of
More informationOriginal Article Downloaded from jhs.mazums.ac.ir at 22: on Friday October 5th 2018 [ DOI: /acadpub.jhs ]
Iranian journal of health sciences 213;1(3):58-7 http://jhs.mazums.ac.ir Original Article Downloaded from jhs.mazums.ac.ir at 22:2 +33 on Friday October 5th 218 [ DOI: 1.18869/acadpub.jhs.1.3.58 ] A New
More information9 research designs likely for PSYC 2100
9 research designs likely for PSYC 2100 1) 1 factor, 2 levels, 1 group (one group gets both treatment levels) related samples t-test (compare means of 2 levels only) 2) 1 factor, 2 levels, 2 groups (one
More informationStill important ideas
Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement
More informationBasic Statistics 01. Describing Data. Special Program: Pre-training 1
Basic Statistics 01 Describing Data Special Program: Pre-training 1 Describing Data 1. Numerical Measures Measures of Location Measures of Dispersion Correlation Analysis 2. Frequency Distributions (Relative)
More informationStudy of cigarette sales in the United States Ge Cheng1, a,
2nd International Conference on Economics, Management Engineering and Education Technology (ICEMEET 2016) 1Department Study of cigarette sales in the United States Ge Cheng1, a, of pure mathematics and
More information11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES
Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are
More informationTitle: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection
Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John
More informationMidterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.
Midterm STAT-UB.0003 Regression and Forecasting Models The exam is closed book and notes, with the following exception: you are allowed to bring one letter-sized page of notes into the exam (front and
More informationDescribe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo
Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment
More informationSTATISTICS & PROBABILITY
STATISTICS & PROBABILITY LAWRENCE HIGH SCHOOL STATISTICS & PROBABILITY CURRICULUM MAP 2015-2016 Quarter 1 Unit 1 Collecting Data and Drawing Conclusions Unit 2 Summarizing Data Quarter 2 Unit 3 Randomness
More informationEcological Statistics
A Primer of Ecological Statistics Second Edition Nicholas J. Gotelli University of Vermont Aaron M. Ellison Harvard Forest Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Brief Contents
More informationScore Tests of Normality in Bivariate Probit Models
Score Tests of Normality in Bivariate Probit Models Anthony Murphy Nuffield College, Oxford OX1 1NF, UK Abstract: A relatively simple and convenient score test of normality in the bivariate probit model
More informationNORTH SOUTH UNIVERSITY TUTORIAL 2
NORTH SOUTH UNIVERSITY TUTORIAL 2 AHMED HOSSAIN,PhD Data Management and Analysis AHMED HOSSAIN,PhD - Data Management and Analysis 1 Correlation Analysis INTRODUCTION In correlation analysis, we estimate
More informationVarious Approaches to Szroeter s Test for Regression Quantiles
The International Scientific Conference INPROFORUM 2017, November 9, 2017, České Budějovice, 361-365, ISBN 978-80-7394-667-8. Various Approaches to Szroeter s Test for Regression Quantiles Jan Kalina,
More informationStatistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions
Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated
More informationSection on Survey Research Methods JSM 2009
Missing Data and Complex Samples: The Impact of Listwise Deletion vs. Subpopulation Analysis on Statistical Bias and Hypothesis Test Results when Data are MCAR and MAR Bethany A. Bell, Jeffrey D. Kromrey
More informationCHAPTER TWO REGRESSION
CHAPTER TWO REGRESSION 2.0 Introduction The second chapter, Regression analysis is an extension of correlation. The aim of the discussion of exercises is to enhance students capability to assess the effect
More informationExamining Relationships Least-squares regression. Sections 2.3
Examining Relationships Least-squares regression Sections 2.3 The regression line A regression line describes a one-way linear relationship between variables. An explanatory variable, x, explains variability
More informationCRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys
Multiple Regression Analysis 1 CRITERIA FOR USE Multiple regression analysis is used to test the effects of n independent (predictor) variables on a single dependent (criterion) variable. Regression tests
More informationReadings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F
Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions
More informationRussian Journal of Agricultural and Socio-Economic Sciences, 3(15)
ON THE COMPARISON OF BAYESIAN INFORMATION CRITERION AND DRAPER S INFORMATION CRITERION IN SELECTION OF AN ASYMMETRIC PRICE RELATIONSHIP: BOOTSTRAP SIMULATION RESULTS Henry de-graft Acquah, Senior Lecturer
More information11/24/2017. Do not imply a cause-and-effect relationship
Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection
More informationAnalysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach
University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School November 2015 Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach Wei Chen
More informationCHILD HEALTH AND DEVELOPMENT STUDY
CHILD HEALTH AND DEVELOPMENT STUDY 9. Diagnostics In this section various diagnostic tools will be used to evaluate the adequacy of the regression model with the five independent variables developed in
More informationbivariate analysis: The statistical analysis of the relationship between two variables.
bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for
More informationResearch Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process
Research Methods in Forest Sciences: Learning Diary Yoko Lu 285122 9 December 2016 1. Research process It is important to pursue and apply knowledge and understand the world under both natural and social
More informationCOAL COMBUSTION RESIDUALS RULE STATISTICAL METHODS CERTIFICATION SOUTHERN ILLINOIS POWER COOPERATIVE (SIPC)
Regulatory Guidance Regulatory guidance provided in 40 CFR 257.90 specifies that a CCR groundwater monitoring program must include selection of the statistical procedures to be used for evaluating groundwater
More informationChapter 1: Exploring Data
Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!
More informationInvestigating the robustness of the nonparametric Levene test with more than two groups
Psicológica (2014), 35, 361-383. Investigating the robustness of the nonparametric Levene test with more than two groups David W. Nordstokke * and S. Mitchell Colp University of Calgary, Canada Testing
More informationSimple Linear Regression the model, estimation and testing
Simple Linear Regression the model, estimation and testing Lecture No. 05 Example 1 A production manager has compared the dexterity test scores of five assembly-line employees with their hourly productivity.
More informationIn this module I provide a few illustrations of options within lavaan for handling various situations.
In this module I provide a few illustrations of options within lavaan for handling various situations. An appropriate citation for this material is Yves Rosseel (2012). lavaan: An R Package for Structural
More informationAnalysis and Interpretation of Data Part 1
Analysis and Interpretation of Data Part 1 DATA ANALYSIS: PRELIMINARY STEPS 1. Editing Field Edit Completeness Legibility Comprehensibility Consistency Uniformity Central Office Edit 2. Coding Specifying
More informationBiostatistics II
Biostatistics II 514-5509 Course Description: Modern multivariable statistical analysis based on the concept of generalized linear models. Includes linear, logistic, and Poisson regression, survival analysis,
More informationModeling Sentiment with Ridge Regression
Modeling Sentiment with Ridge Regression Luke Segars 2/20/2012 The goal of this project was to generate a linear sentiment model for classifying Amazon book reviews according to their star rank. More generally,
More informationSimple Linear Regression
Simple Linear Regression Assoc. Prof Dr Sarimah Abdullah Unit of Biostatistics & Research Methodology School of Medical Sciences, Health Campus Universiti Sains Malaysia Regression Regression analysis
More informationModern Regression Methods
Modern Regression Methods Second Edition THOMAS P. RYAN Acworth, Georgia WILEY A JOHN WILEY & SONS, INC. PUBLICATION Contents Preface 1. Introduction 1.1 Simple Linear Regression Model, 3 1.2 Uses of Regression
More informationBayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions
Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions J. Harvey a,b, & A.J. van der Merwe b a Centre for Statistical Consultation Department of Statistics
More informationStatistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions
Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated
More informationStatistical Methods Exam I Review
Statistical Methods Exam I Review Professor: Dr. Kathleen Suchora SI Leader: Camila M. DISCLAIMER: I have created this review sheet to supplement your studies for your first exam. I am a student here at
More informationPart 1. Online Session: Math Review and Math Preparation for Course 5 minutes Introduction 45 minutes Reading and Practice Problem Assignment
Course Schedule PREREQUISITE (Pre-Class) Advanced Education Diagnostic Test 10 minutes Excel 2007 Exercise SECTION 1. (Completed before face-to-face sections begin) (2 hours) Part 1. Online Session: Math
Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo
Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter
Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality
Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,
Still important ideas
Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still
What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu
What you should know before you collect data BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Types and levels of study Descriptive statistics Inferential statistics How to choose a statistical test
CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to
CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about a population. Populations are groups of interest
EXECUTIVE SUMMARY DATA AND PROBLEM
EXECUTIVE SUMMARY Every morning, almost half of Americans start the day with a bowl of cereal, but choosing the right healthy breakfast is not always easy. Consumer Reports is therefore calculated by an
WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?
WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters
10. LINEAR REGRESSION AND CORRELATION
10. LINEAR REGRESSION AND CORRELATION The contingency table describes an association between two nominal (categorical) variables (e.g., use of supplemental oxygen and mountaineer survival). We have
From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1
From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Contents Dedication... iii Acknowledgments... xi About This Book... xiii About the Author... xvii Chapter 1: Introduction...
Correlation and Regression
Dublin Institute of Technology ARROW@DIT Books/Book Chapters School of Management 2012-10 Correlation and Regression Donal O'Brien Dublin Institute of Technology, donal.obrien@dit.ie Pamela Sharkey Scott
Overview of Non-Parametric Statistics
Overview of Non-Parametric Statistics LISA Short Course Series Mark Seiss, Dept. of Statistics April 7, 2009 Presentation Outline 1. Homework 2. Review of Parametric Statistics 3. Overview Non-Parametric
CHAPTER ONE CORRELATION
CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to
Lec 02: Estimation & Hypothesis Testing in Animal Ecology
Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then
Multiple Regression Analysis
Multiple Regression Analysis Basic Concept: Extend the simple regression model to include additional explanatory variables: Y = β 0 + β1x1 + β2x2 +... + βp-1xp + ε p = (number of independent variables
Econometric Game 2012: infants birthweight?
Econometric Game 2012: How does maternal smoking during pregnancy affect infants' birthweight? Case A April 18, 2012 1 Introduction Low birthweight is associated with adverse health related and economic
Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.
Statistics as a Tool A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations. Descriptive Statistics Numerical facts or observations that are organized describe
UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016
UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016 STAB22H3 Statistics I, LEC 01 and LEC 02 Duration: 1 hour and 45 minutes Last Name: First Name:
Making Inferences from Experiments
11.6 Making Inferences from Experiments Essential Question How can you test a hypothesis about an experiment? Resampling Data Yield (kilograms) Control Group Treatment Group 1. 1.1 1.2 1. 1.5 1.4.9 1.2
Underweight Children in Ghana: Evidence of Policy Effects. Samuel Kobina Annim
Underweight Children in Ghana: Evidence of Policy Effects Samuel Kobina Annim Correspondence: Economics Discipline Area School of Social Sciences University of Manchester Oxford Road, M13 9PL Manchester,
Contents. Part 1 Introduction. Part 2 Cross-Sectional Selection Bias Adjustment
From Analysis of Observational Health Care Data Using SAS. Full book available for purchase here. Contents Preface ix Part 1 Introduction Chapter 1 Introduction to Observational Studies... 3 1.1 Observational
Introduction to Quantitative Methods (SR8511) Project Report
Introduction to Quantitative Methods (SR8511) Project Report Exploring the variables related to and possibly affecting the consumption of alcohol by adults Student Registration number: 554561 Word counts
Statistical Techniques. Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview
7 Applying Statistical Techniques Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview... 137 Common Functions... 141 Selecting Variables to be Analyzed... 141 Deselecting
[Figure: regression diagnostic plots with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage; axes: fitted values, theoretical quantiles, standardized residuals]
1.4 - Linear Regression and MS Excel
1.4 - Linear Regression and MS Excel Regression is an analytic technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear
MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES
24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter
Lab 2: Non-linear regression models
Lab 2: Non-linear regression models The Flint water crisis has received much attention in the past few months. In today's lab, we will apply non-linear regression to analyze the lead testing results
THE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE
THE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE Tal Shahor The Academic College of Emek Yezreel Emek Yezreel 19300,
Recent Advances in Methods for Quantiles. Matteo Bottai, Sc.D.
Recent Advances in Methods for Quantiles Matteo Bottai, Sc.D. Many Thanks to Advisees Andrew Ortaglia Huiling Zhen Joe Holbrook Junlong Wu Li Zhou Marco Geraci Nicola Orsini Paolo Frumento Yuan Liu Collaborators
Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo
Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 5, 6, 7, 8, 9 10 & 11)
NPTEL Project. Econometric Modelling. Module 14: Heteroscedasticity Problem. Module 16: Heteroscedasticity Problem. Vinod Gupta School of Management
NPTEL Project Econometric Modelling Vinod Gupta School of Management Module 14: Heteroscedasticity Problem Module 16: Heteroscedasticity Problem Rudra P. Pradhan Vinod Gupta School of Management
Linear Regression in SAS
Suppose we wish to examine factors that predict patient's hemoglobin levels. Simulated data for six patients is used throughout this tutorial. data hgb_data; input id age race $ bmi hgb; cards; 21 25
Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics :
Descriptive Statistics in SPSS When first looking at a dataset, it is wise to use descriptive statistics to get some idea of what your data look like. Here is a simple dataset, showing three different
Statistical Summaries. Kerala School of Mathematics Course in Statistics for Scientists. Descriptive Statistics. Summary Statistics
Kerala School of Mathematics Course in Statistics for Scientists Statistical Summaries Descriptive Statistics T.Krishnan Strand Life Sciences, Bangalore may be single numerical summaries of a batch, such
ANOVA in SPSS (Practical)
ANOVA in SPSS (Practical) Analysis of Variance practical In this practical we will investigate how we model the influence of a categorical predictor on a continuous response. Centre for Multilevel Modelling
Statistical techniques to evaluate the agreement degree of medicine measurements
Statistical techniques to evaluate the agreement degree of medicine measurements Luís M. Grilo 1, Helena L. Grilo 2, António de Oliveira 3 1 lgrilo@ipt.pt, Mathematics Department, Polytechnic Institute
Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study
STATISTICAL METHODS Epidemiology Biostatistics and Public Health - 2016, Volume 13, Number 1 Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation
South Australian Research and Development Institute. Positive lot sampling for E. coli O157
final report Project code: Prepared by: A.MFS.0158 Andreas Kiermeier Date submitted: June 2009 South Australian Research and Development Institute PUBLISHED BY Meat & Livestock Australia Limited Locked
Study Guide #2: MULTIPLE REGRESSION in education
Study Guide #2: MULTIPLE REGRESSION in education What is Multiple Regression? When using Multiple Regression in education, researchers use the term independent variables to identify those variables that
Statistics Guide. Prepared by: Amanda J. Rockinson-Szapkiw, Ed.D.
This guide contains a summary of the statistical terms and procedures. This guide can be used as a reference for course work and the dissertation process. However, it is recommended that you refer to statistical
6. Unusual and Influential Data
Sociology 740 John Fox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John Fox 1. Introduction Linear statistical models make strong assumptions about the
Normal Distribution. Many variables are nearly normal, but none are exactly normal Not perfect, but still useful for a variety of problems.
Review Probability: likelihood of an event Each possible outcome can be assigned a probability If we plotted the probabilities they would follow some type of distribution Modeling the distribution is important