BOOTSTRAPPING CONFIDENCE LEVELS FOR HYPOTHESES ABOUT REGRESSION MODELS


17 December 2009

Michael Wood
University of Portsmouth Business School
SBS Department, Richmond Building
Portland Street, Portsmouth PO1 3DE, UK

Abstract

This paper shows how bootstrapping using a spreadsheet can be used to derive confidence levels for hypotheses about features of regression models such as their shape, and the location of optimum values. The data used as an example leads to a confidence level of 67% that the sample comes from a population which displays the hypothesized inverted U shape. There is no obvious and satisfactory alternative way of deriving this result, or an equivalent result. In particular, null hypothesis tests cannot provide adequate support for this type of hypothesis.

Keywords: Confidence, Regression models, Curvilinear models, Bootstrapping

Introduction

Glebbeek and Bax (2004) investigated the hypothesis that there is an inverted U-shape relationship between two variables, staff turnover and organizational performance, by setting up regression models with both staff turnover and staff turnover squared as independent variables. Their sample comprised 110 branches of an employment agency in the Netherlands. In all their models they found that the coefficient for the linear term was positive and the coefficient for the squared term was negative, which confirms their hypothesis. One of these models is shown in Figure 1 below. (This diagram is not in Glebbeek and Bax; Dr. Glebbeek, however, was kind enough to give me access to their data, which I have used to produce Figure 1.)

Figure 1. Predicted performance from quadratic model (after adjusting for values of three control variables). Vertical axis: Performance. Horizontal axis: Turnover (% per year). (The solid line is the prediction from the regression model; the scattered points are the data on which the regression model is based.)

The question now arises of whether this demonstrates that a similar pattern would occur if the analysis were done with the whole population from which the sample was drawn. Can we be sure that this is a stable result, or might another sample from the same source show a different pattern? Conventionally this question is answered by testing two null hypotheses: the first being that the coefficient of the linear term is zero, and the second being that the coefficient of the squared term is zero. In the model represented by Figure 1, neither coefficient is significantly different from zero. The evidence provides some support for the inverted U-shape hypothesis, but it is difficult to combine the two significance levels into a single figure to indicate the strength of the support for this hypothesis.

Bootstrap methods provide a way out of this difficulty. The result from the bootstrap analysis below is that the data on which Figure 1 is based suggest a confidence level of 67% for the inverted U shape hypothesis. Using a confidence level in this way has a number of further advantages, which are explained after we have shown how the bootstrap method works. The method shown is implemented on an Excel spreadsheet (available on the web) which can easily be adapted to analyse different models. The approach is mentioned briefly in Wood (2009a); the present paper extends it and analyzes it in more detail.

Bootstrapping confidence levels for hypotheses

The idea of bootstrapping is very simple. Suppose we have a random sample of size n from a specified population, and we have worked out a statistic, s, based on this sample. Now imagine that the population comprises a large number of copies of the sample, say one million of them. If we now take a series of random samples from this imaginary population, and work out s for each of these, we can investigate how variable sample values of s are, and so derive sampling error statistics such as the standard error and confidence intervals. In practice, the easiest way of doing this is to take resamples with replacement from the original sample. This means we choose a member of the original sample at random, then replace it and choose again, until we have a sample of size n. As a result, some members of the original sample will appear in the resample more than once, and others not at all.
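
The resampling step just described can be sketched in a few lines of code. This is a minimal illustration and not the author's spreadsheet: the data values, the function names `resample` and `bootstrap_statistics`, and the choice of the mean as the statistic s are all assumptions made for the example.

```python
import random
import statistics


def resample(sample, rng):
    """Draw one bootstrap resample: n draws with replacement from the sample."""
    return [rng.choice(sample) for _ in sample]


def bootstrap_statistics(sample, statistic, n_resamples=1000, seed=0):
    """Compute the statistic on each of n_resamples bootstrap resamples."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [statistic(resample(sample, rng)) for _ in range(n_resamples)]


# Illustrative data only (hypothetical values, not taken from the paper).
sample = [12.0, 15.5, 9.8, 14.1, 11.3, 13.7, 10.2, 16.4, 12.9, 11.8]

# Here the statistic s is the sample mean; any function of a sample would do.
boot_means = bootstrap_statistics(sample, statistics.mean)

# The spread of the resampled statistics estimates the standard error of s.
standard_error = statistics.stdev(boot_means)
print(f"Bootstrap standard error of the mean: {standard_error:.3f}")
```

Because each resample has the same size n as the original sample, the variability of `boot_means` mimics the variability that would be seen across real samples of size n from the same source.
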
Resampling with replacement is equivalent to choosing an ordinary sample from the large constructed population, because the large size of this population means that its composition is effectively unchanged by removing each member of a sample.

This principle has been widely used with a variety of statistics. Where conventional methods are possible, the answers obtained tend to be similar, but bootstrapping does have a number of advantages, including the fact that it can be used where there is no convenient standard method (see, for example, Lunneborg, 2000; Wood, 2005).

Figure 2. Predicted performance using a quadratic model (after adjusting for values of three control variables) from the data (bold) and three resamples (dotted lines). Vertical axis: Performance. Horizontal axis: Staff turnover (% per year).

Applying this idea to our present problem, the statistic is now a line on a graph. Figure 2 shows results from three resamples as well as the original sample. Each of the dotted lines in this figure is based on identical formulae to the solid line representing the real sample, but using the data from a resample rather than the original sample. Two of these resamples are obviously an inverted U shape; the third is not.

These results come from an Excel spreadsheet available on the web. The Resample sheet of the spreadsheet allows users to press the Recalculate button (F9) and generate a new resample and line on the graph. These can be thought of as different simulated samples from the same source.

It is then a simple matter to produce more of these resamples and count up the number which are inverted U shapes. The conclusion was that 62 of 100 resamples gave an inverted U shape (with, obviously, the top of the U at a positive value of turnover), which suggests that the confidence level for this hypothesis, based on the data, should be put at 62%. For a more stable and reliable answer, we can use a larger number of resamples: 1000 resamples yielded a confidence level of 67%.

It is very easy to use this method to obtain confidence levels for other hypotheses. The frequency of occurrence of any feature of the resample graphs can easily be worked out. For example, we might want to know the location of the optimum staff turnover. The point estimate from the regression shown in Figure 1 is that the optimum performance occurs with a staff turnover of 6%. Examining 1000 resamples gives these confidence levels for three hypotheses:

Confidence in hypothesis that the optimum is between 0% and 10% = 30%
Confidence in hypothesis that the optimum is between 10% and 20% = 37%
Confidence in hypothesis that the optimum is above 20% = 0%

Another hypothesis of interest to Glebbeek and Bax (2004) is that the relationship between performance and turnover is negative, this being the rival to the inverted U shape hypothesis. The top resample in Figure 2 illustrates the importance of defining this clearly: it shows a negative relationship for low turnover, but a positive relationship for higher turnover. If we define a negative relationship as one which is not an inverted U shape, and for which the predicted performance for Turnover = 25% is less than the prediction for Turnover = 0 (which makes the top resample in Figure 2 such a negative relationship), then the spreadsheet shows that

Confidence in hypothesis that the relationship is negative = 33%

In fact, all 1000 resamples gave either an inverted U shape or a negative relationship in this sense.

Pros and cons of this method of analysis

The main advantage is that this method gives an answer for the degree of support for the inverted U shape hypothesis (a confidence level of 67%), which the conventional p values do not. Glebbeek and Bax (2004) cited p values for two coefficients, and it is unclear how these should be combined.
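
By contrast, the bootstrap confidence level is a single proportion counted over resamples. The sketch below shows that count for a quadratic model y = b0 + b1*x + b2*x^2, treating a resample as an inverted U when b2 < 0 and the turning point -b1/(2*b2) lies at a positive turnover. It is only an illustration under stated assumptions: the data are synthetic, the three control variables in the original model are omitted, and the names `fit_quadratic`, `is_inverted_u` and `confidence_level` are hypothetical, not taken from the spreadsheet.

```python
import random


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations.

    Assumes the data contain at least three distinct x values.
    """
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system: row i is [S(i), S(i+1), S(i+2) | rhs(i)].
    a = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(3):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


def is_inverted_u(b1, b2):
    """True when the curve bends downwards and peaks at a positive x value."""
    return b2 < 0 and -b1 / (2 * b2) > 0


def confidence_level(xs, ys, n_resamples=1000, seed=0):
    """Proportion of bootstrap resamples whose fitted curve is an inverted U."""
    rng = random.Random(seed)
    n = len(xs)
    hits = 0
    for _ in range(n_resamples):
        # Resample whole cases (x, y pairs) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        _, b1, b2 = fit_quadratic([xs[i] for i in idx], [ys[i] for i in idx])
        hits += is_inverted_u(b1, b2)
    return hits / n_resamples


# Synthetic data for illustration only: 110 cases, NOT the Glebbeek and Bax data.
data_rng = random.Random(42)
turnover = [data_rng.uniform(0, 25) for _ in range(110)]
performance = [100 + 2.0 * x - 0.15 * x * x + data_rng.gauss(0, 20)
               for x in turnover]

level = confidence_level(turnover, performance)
print(f"Confidence in inverted U shape: {level:.0%}")
```

Confidence levels for other hypotheses, such as the optimum lying between 0% and 10%, follow the same pattern: replace `is_inverted_u` with a test of the feature of interest and count the resamples that satisfy it.
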
More fundamentally, it is difficult to see what null hypothesis could be tested to demonstrate an inverted U shape. A null hypothesis of no relationship between the two variables would not differentiate between the hypothesis that the relationship is linear and negative (the main competing hypothesis here), and the inverted U shape hypothesis.

The method has the further advantages of flexibility (it can easily be adapted to analyze the hypotheses about the optimum turnover above, for example) and transparency: users can literally see, by pressing the recalculate (F9) key, how variable different resamples are and so how variable real samples from the same source might have been. Figure 2 above demonstrates, graphically and clearly, the sampling error problem, and the derivation of the confidence levels from the spreadsheet is very straightforward.

Against this there are some issues about the interpretation and validity of the method.

Validity of bootstrapped confidence levels

There is an obvious logical problem with the description of the bootstrap method above: the imaginary population constructed from the sample is not the real population, so using it to make inferences about the accuracy of the sample as a guide to the real population obviously entails a few assumptions. Bootstrapping essentially models the process of sampling, so it will tell us about likely discrepancies between the real population and the sample, but, as Bayes' theorem reminds us, to make inferences about the real population we need to take account of the prior probabilities of the various possibilities. In practice, this is not feasible (except in simpler cases than this; see Wood, 2009b), so we will accept the bootstrap conjecture that we can use the bootstrap-world to learn about the real world (Lunneborg, 2000), but it is important to realise that the validity of the approach is not guaranteed.

However, experience shows that in normal, well-behaved situations the bootstrap approach gives similar results to standard approaches based on probability theory. In our present example, we can check the confidence intervals from the Excel Regression Tool against bootstrapped estimates.
For the data and model used for Figures 1 and 2, the Excel Regression Tool gives

95% confidence interval for the square coefficient is -230 to +57 (Regression Tool)

On the bootstrap spreadsheet, the square term is described as curvature because it measures whether the curve is a U shape (+) or an inverted U shape (-). Taking 1000 resamples and arranging them in order of curvature, the 95% confidence interval extends from the 2.5th percentile to the 97.5th percentile, which is

95% confidence interval for the square coefficient is -301 to +75 (Bootstrap)

This is 31% wider than the interval produced by the Regression Tool. Furthermore, the next two intervals produced by further sets of 1000 resamples were also wider than the Excel Regression Tool interval.

We can also compare the two methods for the linear model without the coefficient for the square term (Model 3 in Glebbeek and Bax, 2004). With the Regression Tool this gives

95% confidence interval for the slope is -3060 to -495 (Regression Tool)

and the corresponding result from 1000 bootstrap samples is

95% confidence interval for the slope is -3147 to -704 (Bootstrap)

In this case the bootstrap interval is 5% narrower (and the next bootstrap interval was 3% wider). The bootstrap confidence interval for the single variable model (Model 1 in Glebbeek and Bax, 2004) is similarly close to the Regression Tool estimate (a difference of less than 1% in the width of the intervals).

These results suggest that the confidence interval estimates for the linear models are very close, and not too far apart (31%) for the curvilinear model. It is obviously very difficult to be sure which estimate is the better, but the reasonably close agreement of the two methods should give us some confidence in the bootstrap method for assessing confidence for the inverted U shape hypothesis, for which we have no conventional method for comparison.

It is also very important to acknowledge that all these confidence intervals and levels presuppose a background model.
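
The percentile intervals and width comparisons in this section can be reproduced along the following lines. This is a sketch under assumptions: `percentile_interval` uses one simple indexing convention among several in common use, the function names are hypothetical, and the interval endpoints are the figures quoted in the text, with the lower limit of the square coefficient and both limits of the slope read as negative.

```python
def percentile_interval(values, level=0.95):
    """Percentile bootstrap interval from a list of resampled statistics.

    Sorts the values and trims a fraction (1 - level) / 2 from each end.
    """
    ordered = sorted(values)
    n = len(ordered)
    tail = (1 - level) / 2
    lower_index = max(0, round(tail * n))
    upper_index = min(n - 1, round((1 - tail) * n) - 1)
    return ordered[lower_index], ordered[upper_index]


def relative_width_change(reference, other):
    """Fractional change in interval width of `other` relative to `reference`."""
    reference_width = reference[1] - reference[0]
    other_width = other[1] - other[0]
    return other_width / reference_width - 1


# Square coefficient: Regression Tool interval versus bootstrap interval.
square_change = relative_width_change((-230, 57), (-301, 75))
print(f"Square coefficient: {square_change:+.0%}")  # about +31% (wider)

# Slope of the linear model: Regression Tool versus bootstrap.
slope_change = relative_width_change((-3060, -495), (-3147, -704))
print(f"Slope: {slope_change:+.0%}")  # about -5% (narrower)
```

With 1000 resampled values, `percentile_interval` returns the 26th and 975th values in sorted order, so that 2.5% of the resamples fall outside the interval at each end.
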
For the derivation of the confidence level for the inverted U shape hypothesis we used a quadratic model, as did Glebbeek and Bax (2004) for their analysis in terms of p values. Similarly, the linear model discussed above can be used to derive a confidence level for the regression slope being negative: the bootstrap confidence level based on 3000 resamples was 99.5%. (The p value given by the Regression Tool is 0.704%, which yields a confidence level of 99.6% using the method described in Wood, 2009b.) This confidence level is much higher than the 33% obtained above for a negative relationship based on a quadratic model, largely because the definition of a negative relationship above is more restrictive: it excludes, for example, the model based on the actual data in Figure 1 because this is an inverted U shape (but if a straight line were fitted it would obviously have a negative slope).

Conclusions

The bootstrap method outlined here gives a simple, direct and transparent method of assessing the confidence for hypotheses about various features of regression models. For example, the confidence in a hypothesis of an inverted U shape based on the data used was 67%. The method is implemented by a spreadsheet, available on the web, which can easily be adapted to analyze other hypotheses and models.

There is no satisfactory way to analyze the support for hypotheses like this using null hypothesis testing. The bootstrap method answers a question which cannot easily be answered by other means. It is also a transparent method based simply on simulating successive samples from the same source.

References

Glebbeek, A. C., & Bax, E. H. (2004). Is high employee turnover really harmful? An empirical test using company records. Academy of Management Journal, 47(2).

Lunneborg, C. E. (2000). Data analysis by resampling: concepts and applications. Pacific Grove, CA, USA: Duxbury.

Wood, M. (2005). Bootstrapped confidence intervals as an approach to statistical inference. Organizational Research Methods, 8(4).

Wood, M. (2009a). The use of statistical methods in management research: suggestions from a case study. arXiv: v1 [stat.AP].

Wood, M. (2009b). Liberating research from null hypotheses: confidence levels for substantive hypotheses instead of p values.
