BOOTSTRAPPING CONFIDENCE LEVELS FOR HYPOTHESES ABOUT REGRESSION MODELS

17 December 2009

Michael Wood
University of Portsmouth Business School
SBS Department, Richmond Building
Portland Street, Portsmouth PO1 3DE, UK
michael.wood@port.ac.uk

Abstract

This paper shows how bootstrapping using a spreadsheet can be used to derive confidence levels for hypotheses about features of regression models such as their shape, and the location of optimum values. The data used as an example leads to a confidence level of 67% that the sample comes from a population which displays the hypothesized inverted U shape. There is no obvious and satisfactory alternative way of deriving this result, or an equivalent result. In particular, null hypothesis tests cannot provide adequate support for this type of hypothesis.

Keywords: Confidence, Regression models, Curvilinear models, Bootstrapping

Introduction

Glebbeek and Bax (2004) investigated the hypothesis that there is an inverted U-shape relationship between two variables, staff turnover and organizational performance, by setting up regression models with both staff turnover and staff turnover squared as independent variables. Their sample comprised 110 branches of an employment agency in the Netherlands. In all their models they found that the coefficient for the linear term was positive and the coefficient for the squared term was negative, which is consistent with their hypothesis. One of these models is shown in Figure 1 below. (This diagram is not in Glebbeek and Bax, 2004. Dr. Glebbeek, however, was kind enough to give me access to their data, which I have used to produce Figure 1.)

Figure 1. Predicted performance from quadratic model (after adjusting for values of three control variables)

[Figure 1: Performance plotted against Turnover (% per year). The solid line is the prediction from the regression model; the scattered points are the data on which the regression model is based.]

The question now arises of whether this demonstrates that a similar pattern would occur if the analysis was done with the whole population from which the sample was drawn. Can we be sure that this is a stable result, or might another sample from the same source show a different pattern? Conventionally this question is answered by testing two null hypotheses: the first being that the coefficient of the linear term is zero, and the second being that the coefficient of the squared term is zero. In the model represented by Figure 1, neither coefficient is significantly different from zero. The evidence provides some support for the inverted U-shape hypothesis, but it is difficult to combine the two significance levels into a single figure to indicate the strength of the support for this hypothesis.

Bootstrap methods provide a way out of this difficulty. The result from the bootstrap analysis below is that the data on which Figure 1 is based suggests a confidence level of 67% for the inverted U shape hypothesis. Using a confidence level in this way has a number of further advantages, which are explained after we have shown how the bootstrap method works. The method shown is implemented on an Excel spreadsheet (available on the web) which can easily be adapted to analyse different models. The approach is mentioned briefly in Wood (2009a); the present paper extends it and analyzes it in more detail.

Bootstrapping confidence levels for hypotheses

The idea of bootstrapping is very simple. Suppose we have a random sample of size n from a specified population, and we have worked out a statistic, s, based on this sample. Now imagine that the population comprises a large number of copies of the sample, say one million of them. If we now take a series of random samples from this imaginary population, and work out s for each of these, we can investigate how variable sample values of s are, and so derive sampling error statistics such as the standard error and confidence intervals. In practice, the easiest way of doing this is to take resamples with replacement from the original sample. That is, we choose a member of the original sample at random, then replace it and choose again, until we have a sample of size n. As a result, some members of the original sample will appear in the resample more than once, and others not at all.
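The resampling step just described can be sketched in a few lines of Python. This is only an illustration and is not the spreadsheet implementation: the function name `bootstrap_statistics`, the fixed seed, and the ten data values are all hypothetical, and the statistic s is taken to be the sample mean purely for simplicity.

```python
import random
import statistics


def bootstrap_statistics(sample, statistic, n_resamples=1000, seed=0):
    """Compute `statistic` on n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    results = []
    for _ in range(n_resamples):
        # Choose n members at random, replacing each before the next draw,
        # so some members appear more than once and others not at all.
        resample = rng.choices(sample, k=n)
        results.append(statistic(resample))
    return results


# Hypothetical values, for illustration only (not the Glebbeek and Bax data).
sample = [12.0, 15.5, 9.8, 20.1, 14.3, 11.7, 17.9, 13.2, 16.4, 10.6]

means = bootstrap_statistics(sample, statistics.mean)

# The spread of the resampled statistic estimates its standard error.
standard_error = statistics.stdev(means)
print(f"bootstrap standard error of the mean: {standard_error:.2f}")
```

Any other function of a sample can be passed in place of `statistics.mean`; the resampling loop itself does not change.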
Resampling with replacement is equivalent to choosing an ordinary sample from the large constructed population, because the large size of this population means that its composition is effectively unchanged by removing each member of a sample.

This principle has been widely used with a variety of statistics. Where conventional methods are possible, the answers obtained tend to be similar, but bootstrapping does have a number of advantages, including the fact that it can be used where there is no convenient standard method (see, for example, Lunneborg, 2000; Wood, 2005).

Figure 2. Predicted performance using a quadratic model (after adjusting for values of three control variables) from the data (bold) and three resamples (dotted lines)

[Figure 2: Performance plotted against Staff turnover (% per year).]

Applying this idea to our present problem, the statistic is now a line on a graph. Figure 2 shows results from three resamples as well as the original sample. Each of the dotted lines in this figure is based on identical formulae to the solid line representing the real sample, but using the data from a resample rather than the original sample. Two of these resamples are obviously an inverted U shape; the third is not. These results come from an Excel spreadsheet at http://userweb.port.ac.uk/~woodm/brq.xls. The Resample sheet of the spreadsheet allows users to press the Recalculate button (F9) and generate a new resample and line on the graph. These can be thought of as different simulated samples from the same source.

It is then a simple matter to produce more of these resamples and count up the number which are inverted U shapes. The conclusion was that 62 of 100 resamples gave an inverted U shape (with, obviously, the top of the U at a positive value of turnover), which suggests that the confidence level for this hypothesis, based on the data, should be put at 62%. For a more stable and reliable answer, we can use a larger number of resamples: 1000 resamples yielded a confidence level of 67%.

It is very easy to use this method to obtain confidence levels for other hypotheses. The frequency of occurrence of any feature of the resample graphs can easily be worked out. For example, we might want to know the location of the optimum staff turnover. The point estimate from the regression shown in Figure 1 is that the optimum performance occurs with a staff turnover of 6%. Examining 1000 resamples gives these confidence levels for three hypotheses:

Confidence in hypothesis that the optimum is between 0% and 10% = 30%
Confidence in hypothesis that the optimum is between 10% and 20% = 37%
Confidence in hypothesis that the optimum is above 20% = 0%

Another hypothesis of interest to Glebbeek and Bax (2004) is that the relationship between performance and turnover is negative, this being the rival to the inverted U shape hypothesis. The top resample in Figure 2 illustrates the importance of defining this clearly: it shows a negative relationship for low turnover, but a positive relationship for higher turnover. If we define a negative relationship as one which is not an inverted U shape, and for which the predicted performance for Turnover = 25% is less than the prediction for Turnover = 0 (which makes the top resample in Figure 2 such a negative relationship), then the spreadsheet shows that

Confidence in hypothesis that the relationship is negative = 33%

In fact, all 1000 resamples gave either an inverted U shape or a negative relationship in this sense.

Pros and cons of this method of analysis

The main advantage is that this method gives an answer for the degree of support for the inverted U shape hypothesis (a confidence level of 67%), which the conventional p values do not. Glebbeek and Bax (2004) cited p values for two coefficients, and it is unclear how these should be combined.
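The counting procedure behind these confidence levels (fit the quadratic model to each resample, classify the fitted curve, and report the proportion in each class) can be sketched as follows. This is an illustration under stated assumptions, not the spreadsheet itself: the three control variables are omitted, the data are synthetic, and the names `solve_linear`, `fit_quadratic`, `classify` and `bootstrap_confidence` are hypothetical. The two definitions used in `classify` follow the text: an inverted U has a negative squared term with its peak at a positive turnover, and a negative relationship is any other curve whose prediction at 25% turnover is below its prediction at 0.

```python
import random


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - tail) / m[r][r]
    return solution


def fit_quadratic(points):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    return tuple(solve_linear(normal, t))


def classify(a, b, c, x_ref=25.0):
    """Label a fitted curve using the definitions given in the text."""
    if c < 0 and -b / (2 * c) > 0:
        return "inverted_u"  # peak of the curve lies at a positive x
    # Prediction at x_ref minus prediction at 0 (the intercept cancels).
    if b * x_ref + c * x_ref ** 2 < 0:
        return "negative"
    return "other"


def bootstrap_confidence(points, n_resamples=1000, seed=0):
    """Proportion of resamples whose fitted curve falls in each class."""
    rng = random.Random(seed)
    counts = {"inverted_u": 0, "negative": 0, "other": 0}
    for _ in range(n_resamples):
        resample = rng.choices(points, k=len(points))
        counts[classify(*fit_quadratic(resample))] += 1
    return {label: count / n_resamples for label, count in counts.items()}


# Synthetic data for illustration only (not the Glebbeek and Bax data).
data_rng = random.Random(1)
points = []
for _ in range(110):
    x = data_rng.uniform(0, 40)
    y = 100000 + 1500 * x - 120 * x ** 2 + data_rng.gauss(0, 40000)
    points.append((x, y))

levels = bootstrap_confidence(points)
print(levels)
```

Confidence levels for other features are obtained in the same way: for instance, the optimum of each inverted U resample is at -b / (2 * c), and counting how many of these fall in a given range gives the confidence level for that range.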
More fundamentally, it is difficult to see what null hypothesis could be tested to demonstrate an inverted U shape. A null hypothesis of no relationship between the two variables would not differentiate between the hypothesis that the relationship is linear and negative (the main competing hypothesis here) and the inverted U shape hypothesis.

The method has the further advantages of flexibility (it can easily be adapted to analyze the hypotheses about the optimum turnover above, for example) and transparency: users can literally see, by pressing the recalculate (F9) key, how variable different resamples are and so how variable real samples from the same source might have been. Figure 2 above demonstrates, graphically and clearly, the sampling error problem, and the derivation of the confidence levels from the spreadsheet is very straightforward. Against this there are some issues about the interpretation and validity of the method.

Validity of bootstrapped confidence levels

There is an obvious logical problem with the description of the bootstrap method above: the imaginary population constructed from the sample is not the real population, so using it to make inferences about the accuracy of the sample as a guide to the real population obviously entails a few assumptions. Bootstrapping essentially models the process of sampling, so it will tell us about likely discrepancies between the real population and the sample, but, as Bayes' theorem reminds us, to make inferences about the real population we need to take account of the prior probabilities of the various possibilities. In practice, this is not feasible (except in simpler cases than this; see Wood, 2009b), so we will accept the bootstrap conjecture that we can use the bootstrap-world to learn about the real world (Lunneborg, 2000), but it is important to realise that the validity of the approach is not guaranteed.

However, experience shows that in normal, well-behaved situations the bootstrap approach gives similar results to standard approaches based on probability theory. In our present example, we can check the confidence intervals from the Excel Regression Tool against bootstrapped estimates.
For the data and model used for Figures 1 and 2, the Excel Regression Tool gives

95% confidence interval for square coeff. is -230 to +57 (Regression Tool)

On the bootstrap spreadsheet, the square term is described as curvature because it measures whether the curve is a U shape (+) or an inverted U shape (-). Taking 1000 resamples and arranging them in order of curvature, the 95% confidence interval extends from the 2.5 percentile to the 97.5 percentile, which is

95% confidence interval for square coeff. is -301 to +75 (Bootstrap)

This is 31% wider than the interval produced by the Regression Tool. Furthermore, the next two intervals produced by further sets of 1000 resamples were also wider than the Excel Regression Tool interval.

We can also compare the two methods for the linear model without the coefficient for the square term (Model 3 in Glebbeek and Bax, 2004). With the Regression Tool this gives

95% confidence interval for the slope is -3060 to -495 (Regression Tool)

and the corresponding result from 1000 bootstrap samples (using http://userweb.port.ac.uk/~woodm/brl.xls) is

95% confidence interval for the slope is -3147 to -704 (Bootstrap).

In this case the bootstrap interval is 5% narrower (and the next bootstrap interval was 3% wider). The bootstrap confidence interval for the single variable model (Model 1 in Glebbeek and Bax, 2004) is similarly close to the Regression Tool estimate (a difference of less than 1% in the width of the intervals).

These results suggest that the confidence interval estimates for the linear models are very close, and not too far apart (31%) for the curvilinear model. It is obviously very difficult to be sure which estimate is the better, but the reasonably close agreement of the two methods should give us some confidence in the bootstrap method for assessing confidence for the inverted U shape hypothesis, for which we have no conventional method for comparison.

It is also very important to acknowledge that all these confidence intervals and levels presuppose a background model.
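The percentile interval and the width comparisons quoted above can be sketched as follows. The function names `percentile_interval` and `percent_wider` are hypothetical, the nearest-rank rule used to pick the percentiles is one simple choice among several and may differ from the spreadsheet's, and the interval endpoints are entered with the lower limits taken as negative, which is what the quoted width comparisons imply.

```python
def percentile_interval(values, level=0.95):
    """Interval between the lower and upper tail percentiles of `values`."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0  # 0.025 for a 95% interval
    last = len(ordered) - 1
    # Nearest-rank positions of the 2.5 and 97.5 percentiles.
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


def percent_wider(reference, other):
    """How much wider `other` is than `reference`, as a percentage."""
    reference_width = reference[1] - reference[0]
    other_width = other[1] - other[0]
    return 100.0 * (other_width / reference_width - 1.0)


# Intervals quoted in the text: (Regression Tool, Bootstrap).
square_term = percent_wider((-230, 57), (-301, 75))
slope = percent_wider((-3060, -495), (-3147, -704))

print(f"square coefficient: bootstrap interval {square_term:+.0f}% in width")
print(f"linear slope: bootstrap interval {slope:+.0f}% in width")
```

With the endpoints above, the two ratios come out at about +31% and about -5%, matching the figures quoted in the text. In use, `percentile_interval` would be applied to the list of 1000 resampled coefficients.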
For the derivation of the confidence level for the inverted U shape hypothesis we used a quadratic model, as did Glebbeek and Bax (2004) for their analysis in terms of p values. Similarly, the linear model discussed above can be used to derive a confidence level for the regression slope being negative: the bootstrap confidence level based on 3000 resamples was 99.5%. (The p value given by the Regression Tool is 0.704%, which yields a confidence level of 99.6% using the method described in Wood, 2009b.) This confidence level is much higher than the 33% obtained above for a negative relationship based on a quadratic model, largely because the definition of a negative relationship above is more restrictive: it excludes, for example, the model based on the actual data in Figure 1 because this is an inverted U shape (but if a straight line were fitted it would obviously have a negative slope).

Conclusions

The bootstrap method outlined here gives a simple, direct and transparent method of assessing the confidence for hypotheses about various features of regression models. For example, the confidence in a hypothesis of an inverted U shape based on the data used was 67%. The method is implemented by a spreadsheet at http://userweb.port.ac.uk/~woodm/brq.xls and can easily be adapted to analyze other hypotheses and models.

There is no satisfactory way to analyze the support for hypotheses like this using null hypothesis testing. The bootstrap method answers a question which cannot easily be answered by other means. It is also a transparent method based simply on simulating successive samples from the same source.

References

Glebbeek, A. C., & Bax, E. H. (2004). Is high employee turnover really harmful? An empirical test using company records. Academy of Management Journal, 47(2), 277-286.

Lunneborg, C. E. (2000). Data analysis by resampling: concepts and applications. Pacific Grove, CA, USA: Duxbury.

Wood, M. (2005). Bootstrapped confidence intervals as an approach to statistical inference. Organizational Research Methods, 8(4), 454-470.

Wood, M. (2009a). The use of statistical methods in management research: suggestions from a case study. arXiv:0908.0067v1 [stat.AP] (http://arxiv.org/abs/0908.0067v1)

Wood, M. (2009b). Liberating research from null hypotheses: confidence levels for substantive hypotheses instead of p values.