Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at


The Choice of Variables in Observational Studies
Author(s): D. R. Cox and E. J. Snell
Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 23, No. 1 (1974)
Published by: Wiley for the Royal Statistical Society

Appl. Statist. (1974), 23, No. 1, p. 51

The Choice of Variables in Observational Studies†

By D. R. Cox and E. J. Snell
Imperial College, London

SUMMARY

A review is given of considerations affecting the choice of explanatory variables in observational studies. Aspects of both design and analysis are considered. In particular the choice of explanatory variables in multiple regression is discussed and some recommendations made.

Keywords: MULTIPLE REGRESSION; ANALYTICAL SURVEYS; MEDICAL APPLICATION; SELECTION OF VARIABLES; DESIGN OF INVESTIGATIONS; OBSERVATIONAL STUDIES

1. INTRODUCTION

This paper reviews some general aspects of the choice of variables in observational studies. To keep the paper concise only outline examples have been included and to be specific these are medical, although the ideas apply widely.

Observational studies, where they are not purely descriptive, have as their objective the explanation or prediction of some response in terms of explanatory or predictor variables. It is useful to have two examples in mind.

Example 1. Consider an investigation into the incidence of a respiratory disease among a certain group of workers. The response variable may be severity of the disease, with possible explanatory variables being the worker's age, physical status, working conditions, previous employment, etc. Some variables may be more important than others in explaining the severity of the disease.

Example 2. A different situation is one of trying to predict the time to death among patients known to be suffering from a progressive and fatal disease. Possible predictive variables are type of treatment, treatment variables such as dose, clinical and biochemical measurements made on diagnosis, etc.

Although careful discussion of the most appropriate way to measure response is always important, and often several different measures will be called for, nevertheless what response variables to consider is frequently fairly clear-cut. Thus in Example 1, severity may be assessed radiologically and graded according to standard levels. In Example 2, time to death is likely to be measured from time of diagnosis.

In this paper we concentrate on the explanatory variables: how many such variables should be measured and, if many are observed, how should the analysis be handled to find the most relevant ones? These are difficult issues. Many of the following points are rather trite when put in general terms and do not lend themselves very well to quantitative discussion. On the other hand, the decision as to what to do in any particular investigation can be hard.

† This paper is based on one prepared for the Division of Research in Epidemiology and Communications Science, World Health Organization.

2. SELECTION OF EXPLANATORY VARIABLES FOR MEASUREMENT

The following general aspects of a study will influence the nature and number of explanatory variables measured:

(a) Whether the study is intended to investigate some rather specific hypothesis about the phenomenon or whether it is designed to screen out the most important variables from rather a large number of possibly relevant variables, the important variables to be examined in detail subsequently, possibly by experiments rather than by observational studies. In the former case it is important to try to anticipate the main explanations competing with the hypothesis under test and to measure relevant variables.

(b) Whether the response variables are observed quite quickly, so that the later parts of the study can be modified, if necessary, in the light of the earlier results. If this is not possible, it is more likely to be necessary to measure many variables on each individual.

(c) Questions of economy of time, ease of setting up instruments, difficulty of contacting individuals, loss of accuracy arising from increased work-load, availability of "good" official statistics, etc. will often be crucial in deciding how many variables can be measured.

(d) Variables may be included primarily to establish comparisons with previous related studies.

In many studies binary explanatory variables will be adequate in analysis, except for the most important variables, provided that the split between the two categories is appropriately made. On the other hand, poorly defined binary variables may be virtually useless and for this reason it will often be essential to record on a more than two-point scale.
Usually it will be sensible to arrange that binary explanatory variables are constructed to have roughly a 50-50 split, in order for the effect on response to be as clear-cut as possible, but if the response also is binary and its effect appreciable or if very non-linear effects are involved the position is more complicated (Cox, 1969).

Multiplicity in explanatory variables can arise in two rather different ways. We may measure a number of quite different properties, or we may have a number of ways of measuring what is essentially one property. For example, occurrence of particular symptoms may be elicited by a single question or probed by a battery of related questions.

Many possibilities exist for more sophisticated design, especially where the total number of explanatory variables is large and accuracy of measurement is likely to drop if all variables are measured on all subjects. One possibility is to measure only a subset of variables on each subject, the variables being chosen in a suitably balanced way. Another possibility, where some of the variables are arranged in batteries as indicated above, is not to measure each full battery on each subject, but to measure detailed variables only on subsets of individuals. Used with care these ideas should be fruitful, especially in very large-scale investigations. Of course simplicity in design remains a vitally important requirement.

3. BROAD PROBLEMS OF ANALYSIS

Two main kinds of response variable commonly encountered are binary (e.g. occurrence, non-occurrence) and measurements that, possibly after transformation, lead to approximately normally distributed data. An important point is that while the precise techniques of analysis will be different for different kinds of response, the broad strategy to be adopted and the difficulties of interpretation likely to be encountered are the same. We shall concentrate in Section 4 largely on the techniques for normal theory multiple regression, simply because this is the most thoroughly investigated case.

The following general points have to be considered:

(a) There is a working distinction between producing (i) a fit to the data useful for future prediction in the absence of major changes in the system and (ii) an "explanation" which will link with other studies, e.g. fundamental laboratory work, and will predict under quite different circumstances. For (i), two quite different models, involving different explanatory variables, are equally acceptable if they fit the data equally well. If a choice has to be made between them, it may be done on the basis of simplicity, e.g. in terms of the number of explanatory variables necessary or the ease with which the relevant variables can be measured; for a quantitative decision theoretic analysis, see Lindley (1968). In (ii), however, it is usually of central importance to find which explanatory variables have important effects.

(b) Even in the first case of prediction in the narrow sense it will not normally be wise to include all predictor variables. This is both for reasons of simplicity and because typically the mean square error of prediction will be raised by including too many variables.

(c) The main difficulties in dealing with observational studies stem from two rather different sources, the omission of relevant variables from those measured and the presence of fairly high dependencies among the explanatory variables. As a simple illustration of the first situation consider the relation between time to death y and the level of some prescribed "dose" x.
If the dose level is determined by the severity of disease, the dependence of y upon x as given by a simple linear regression cannot be interpreted as predicting the change in y for a particular individual given a change in x. Only if the omitted variable, severity of disease, is included as a further explanatory variable is a "causal" interpretation at all feasible. The second situation would, for example, arise in measuring the percentage of substances A, B,... in a compound, where there will be an exact linear relationship between the percentages in a compound. Although this is an extreme example, close dependencies are often unavoidable if a large number of explanatory variables are measured. Interpretation is difficult because many apparently different models may fit the data almost equally well. A principal component analysis of the explanatory variables may be tried in these circumstances (Jeffers, 1967). In designed experiments, balance and randomization largely overcome these difficulties; the omission and hence randomization of an important variable leads to an increased error variance and to a seriously incomplete understanding, but not to a "biased" conclusion. It may sometimes be possible to take additional observations at values of the explanatory variables chosen so as to reduce non-orthogonalities in the data (Dykstra, 1966; Gaylor and Merrill, 1968; Silvey, 1969). In observational studies in which the objective is the comparison of, say, a treatment with a control, matching individuals to remove bias is likely to be useful (Cochran, 1965, 1972). Of course in a very large-scale study the amount of data collected may be so great that quite apart from the difficulties of principle alluded to above there may be limitations imposed on computational grounds, or because of human limitations in what can be absorbed.
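The dose example in point (c) can be made concrete with a small simulation. The sketch below is not from the paper: the coefficients, the sample size and the names `severity`, `dose` and `time_to_death` are assumptions chosen purely for illustration, and the least-squares helper is a minimal normal-equations solver written with the Python standard library only. By construction a higher dose lengthens survival at fixed severity, yet sicker patients receive higher doses.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][j] * x[j] for j in range(c + 1, n))) / M[c][c]
    return x


def ols(X, y):
    """Least-squares coefficients for design rows X, via the normal equations."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 2000

# Assumed data-generating mechanism (illustrative only):
# dose is largely determined by severity; true dose effect at fixed severity is +0.5.
severity = [rng.gauss(0, 1) for _ in range(n)]
dose = [2 * s + rng.gauss(0, 0.5) for s in severity]
time_to_death = [10 - 3 * s + 0.5 * x + rng.gauss(0, 1)
                 for s, x in zip(severity, dose)]

# Simple regression of y on x alone (severity omitted).
slope_simple = ols([[1.0, x] for x in dose], time_to_death)[1]

# Regression of y on x with severity included as a further explanatory variable.
slope_adjusted = ols([[1.0, x, s] for x, s in zip(dose, severity)],
                     time_to_death)[1]

print(f"dose slope, severity omitted:  {slope_simple:+.2f}")
print(f"dose slope, severity included: {slope_adjusted:+.2f}")
```

Under these assumed values the slope from the simple regression is negative, because dose acts as a proxy for severity, while the slope with severity included is close to the value built into the simulation. The simulation only restates the argument of the text; it says nothing about any real data.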

In deciding what variables to include it is important to take account of additional information, for example as to which variables are likely to be alternatives to one another and which it is almost certain to be necessary to include. It is equally important that any such "prior" knowledge inserted into the analysis should be tested for consistency with the data.

There are two approaches to the inclusion of general classification variables like age, sex, etc. One is to make separate analyses for men and women, combining the conclusions at a later stage if they seem compatible. The other is to fit a composite model in which say the sex difference is represented by a single parameter; this assumes that in some sense there is no interaction between the main explanatory variables and sex, an assumption that can be tested at least informally. With large sets of data it will, however, frequently be sensible to analyse in a series of sections, merging the analyses in a second stage. In any case the examination of the consistency of conclusions from independent sets of data is an important and simple technique for assessing precision.

Note that there is a distinction implied here between genuine explanatory variables and classification variables that serve merely to define major subclasses of individuals. This is relevant when we are looking for proper "causal" relations. That is, it is not an "explanation" to say that a death rate for men is greater than that for women.

4. MORE DETAILED PROBLEMS OF ANALYSIS

We now consider in more detail situations where for each individual there is a continuous response variable $y$ and a number of explanatory variables $x_1, \ldots, x_p$. Suppose that we work provisionally with the assumption that the expected value of $y$ is a linear function of the explanatory variables. It is assumed that any preliminary transformation of the response variable, e.g. from response time to its logarithm, has been made, and also that the data have been edited to remove gross errors and to isolate suspect values. We shall not describe the large body of statistical methods and theory associated with the linear model; for an introductory account, see Draper and Smith (1966) and for general comments Cox (1968).

Formal significance tests are a useful guide to the importance of different explanatory variables, but should not be followed too rigidly. One reason for caution is that the tabulated significance levels of the F distribution refer to a single test carried out in isolation; in practice we are nearly always concerned with a chain of related tests and this makes the interpretation of the ordinary significance levels indirect (Draper et al., 1971; Pope and Webster, 1972; Spjøtvoll, 1972).

The difficulties of interpretation caused by non-orthogonality are less important if interest is purely in prediction over the range of explanatory variables covered by the data. Although several rather different looking equations will often have similar residual mean squares, it may be unimportant which equation is used. An equation with few explanatory variables will, however, give biased estimates if the omitted variables are at all relevant. The extent of bias, averaged over the observed distribution of the explanatory variables, in an equation with $k$ variables is indicated by the statistic suggested by C. L. Mallows (Gorman and Toman, 1966)

$$C_k = \frac{\text{residual sum of squares}}{\hat{\sigma}^2} - (n - 2k),$$

where $n$ denotes sample size and $\hat{\sigma}^2$ is a separate estimate of $\sigma^2$; in the absence of bias, $E(C_k) \simeq k$. Given several equations with similar residual mean squares, one with small bias is likely to be preferred. This does not necessarily mean one with many explanatory variables; increasing the number of variables may reduce the bias but at the expense of increasing the total error of prediction. If predicting outside the observed region of the explanatory variables, different equations will give vastly different predictions. Methods for selecting single well-fitting equations from a large set will be reviewed at the end of this section.

In most applications, however, the particular variables affecting response and the directions of their effects are of intrinsic interest and then the selection of just one well-fitting equation from among many is unsatisfactory and possibly very misleading. In principle, the following procedure seems a sensibly cautious approach in such situations. All possible $2^p$ equations are fitted (Garside, 1965; Schatzoff et al., 1968; Morgan and Tatar, 1972) and those clearly inconsistent with the data rejected; that is, equations with a residual mean square significantly greater than the mean square residual from the full model are rejected. Typically if an equation involving a subset $\mathcal{S}$ of explanatory variables is consistent with the data, so is that based on a larger subset $\mathcal{S}'$, $\mathcal{S}' \supset \mathcal{S}$. (Any exceptions to this will be minor ones depending on the particular levels of significance used.) Such a subset we call primitive. A program to find the primitive models and associated information has been written at Imperial College by Mrs M. Ansell and is available for use on the CDC 6400 computer. If there is only one primitive model, the situation is fairly clear-cut; where there is more than one, a choice between them can be made only on the basis of additional information.
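The all-subsets procedure and the statistic $C_k$ can be sketched in a few lines for small $p$. This is an illustration under stated assumptions, not the Imperial College program mentioned above: $k$ is taken to count all fitted coefficients including the intercept; $\hat{\sigma}^2$ is taken as the residual mean square of the full model; "consistent with the data" is judged by the usual extra-sum-of-squares F statistic against a single hypothetical threshold `f_crit` (in practice the critical value would depend on the degrees of freedom and be read from F tables); and "primitive" is read as a consistent subset none of whose proper subsets is consistent. The simulated data and all function names are likewise invented for the example.

```python
import itertools
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][j] * x[j] for j in range(c + 1, n))) / M[c][c]
    return x


def rss(X, y):
    """Residual sum of squares of the least-squares fit to design rows X."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def all_subsets(X, y, f_crit=4.0):
    """Fit all 2^p subsets; return {subset: (C_k, consistent)} and primitive subsets."""
    n, p = len(X), len(X[0])

    def fit(subset):
        # Every equation includes an intercept column.
        return rss([[1.0] + [r[j] for j in subset] for r in X], y)

    rss_full = fit(tuple(range(p)))
    sigma2 = rss_full / (n - (p + 1))  # residual mean square of the full model
    results = {}
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            k = size + 1  # assumption: k counts the intercept as well
            c_k = fit(subset) / sigma2 - (n - 2 * k)
            dropped = p - size
            f = 0.0 if dropped == 0 else (fit(subset) - rss_full) / dropped / sigma2
            results[subset] = (c_k, f <= f_crit)
    consistent = [s for s, (_, ok) in results.items() if ok]
    primitive = [s for s in consistent
                 if not any(set(t) < set(s) for t in consistent)]
    return results, primitive


# Illustrative data: only the first two of four variables affect the response.
rng = random.Random(1)
n = 60
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [1 + 2 * r[0] + 1.5 * r[1] + rng.gauss(0, 1) for r in X]

results, primitive = all_subsets(X, y)
for subset in primitive:
    print("primitive subset:", subset, "C_k = %.2f" % results[subset][0])
```

With $\hat{\sigma}^2$ taken from the full model, $C_k$ for the full equation equals its number of coefficients exactly, which gives a quick check on the arithmetic. The loop over all $2^p$ subsets is practical only for small $p$, which is the limitation discussed in the text that follows.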
Unfortunately this procedure, even with sophisticated numerical analytic and programming techniques (Wampler, 1970; Mullet and Murray, 1971), does not seem feasible for more than about 10-15 explanatory variables. If more explanatory variables are available, as will often be the case for example in large epidemiological studies, it follows that some reduction will be essential before the above method can be used. The use of several alternative reductions will usually be desirable. The main methods for such reduction are as follows.

(a) We may examine sets of explanatory variables. If the data contain batteries of questions, some form of total score may be adequate. This can be tested for consistency with the data.

(b) Some variables may be specified for definite inclusion, for example if interest lies primarily in the supplementary effect of other variables.

(c) Classification variables (such as sex, age, etc.) may be used to split the data into sections for separate analysis.

(d) It may be thought on general grounds that a regression coefficient or regression coefficients, associated with a particular variable, even one not of primary importance, should be of a certain sign, e.g. should not be negative. Occasionally this may be helpful in clarifying the relationship. In any fitted model for which the regression coefficient is of the wrong sign, but not significantly different from zero, the coefficient is replaced by zero, i.e. the variable is in effect omitted. An estimate significantly different from zero and of the wrong sign implies that the prior assumption is wrong, or that the wrong form of relation is being fitted, or that an important variable has been omitted, or that by chance an extreme fluctuation has occurred.

(e) When special relationships can be postulated among the explanatory variables the methods of path analysis can be used; see, for example, Turner and Stevens (1959). These methods, originally developed in connection with genetics, have more recently been examined by sociologists, for example, by Blalock and Blalock (1968). The general idea is partly that the postulation or discovery of a series of special relationships between the variables will clarify the whole problem and partly that such relationships will increase the precision of estimates and hence help to resolve ambiguities.

The above approaches involve injecting some further external information. The remaining devices are essentially general computational devices; see Draper and Smith (1969) for a review up to that date.

(f) The most commonly used procedures for progressive selection of variables are forward selection, backward elimination (Hamaker, 1962; Oosterhoff, 1963; Abt, 1967; Mantel, 1970), stepwise regression (Efroymson, 1960; Breaux, 1968; Goodman, 1971) and "optimum" regression. These will not be described in detail. "Optimum" regression finds that equation which for a specified number of explanatory variables has the minimum residual sum of squares. An algorithm by Beale et al. (1967) makes it unnecessary to evaluate all regressions; the procedure is claimed to be manageable provided the number of variables is not much in excess of 20 (Beale, 1970).

(g) Newton and Spurrell (1967a, b) have proposed a method called element analysis for assessing the information provided by all $2^p$ fits; $2^p - 1$ elements, to be used in conjunction with certain rules, are calculated from the sums of squares attributable to regression.

(h) A suggestion of Gorman and Toman (1966) is to calculate a fractional factorial of the $2^p$ possible regressions and to select variables by a subjective inspection of the values of the residual mean square or the statistic $C_k$ for the computed regressions. Further evidence is needed on the efficiency of this procedure; an example is given in Daniel and Wood (1971).
Hocking and Leslie (1967) and La Motte and Hocking (1970) consider a technique to minimize $C_k$ for given $k$, calculating a subset of the regressions; see also Rothman (1968).

(i) A procedure based on estimates which differ from the usual least squares estimates is that of ridge regression (Hoerl and Kennard, 1970a, b; Marquardt, 1970; Lindley and Smith, 1972). The least squares equations are modified to give estimates which are stable and which, although biased, give smaller mean square error of prediction. This is helpful when the main emphasis is on prediction and especially appropriate when the regression coefficients (or a subset of them) are generated by a random mechanism. It is not clear how useful the method is in the isolation of important variables.

We consider that the procedures (f) and (h) should be used, if at all, only where the particular variables selected are not of intrinsic interest or as a preliminary device in the reduction of variables, so that the recommended techniques for up to 10 variables can be followed; several different reductions should then normally be examined.

The possibility of interactions between the effects of different explanatory variables has usually to be borne in mind. They can be detected in essentially three ways: by graphical analysis of residuals, by fitting an extended model usually with fairly simple forms of interaction represented by cross products of primary explanatory variables, or by analysing the data in sections. Of course in a problem with many explanatory variables the number of possible interaction terms, even of the simplest kind, is large. Then attention will often have to be restricted to those interactions thought particularly likely on general grounds and to interactions among variables with large "ordinary" effects.

With binary response variables essentially the same problems arise.

5. SOME MORE COMPLEX PROBLEMS

The difficulties discussed in Section 4 arise in the context even of the simplest multiple regression model. There are, of course, many other sources of difficulty of analysis. In addition to those arising from different kinds of response, e.g. binary, some further problems associated with normal theory regression that need caution are as follows:

(a) Missing values among the explanatory variables are a common source of difficulty (Buck, 1960; Afifi and Elashoff, 1966, 1967, 1969; Dagenais, 1971; Hartley and Hocking, 1971; Orchard and Woodbury, 1972). Current unpublished work by E. M. L. Beale and R. J. A. Little at Imperial College supports the method of Orchard and Woodbury.

(b) There may be a need for models non-linear in the parameters or variables.

(c) Major problems can arise when the individuals are arranged in groups. For example, the regressions between and within groups are likely to be different, and the errors of different individuals are unlikely to be mutually independent. The groups may be characterized by random variables and involve models with components of variance.

(d) There may be appreciable rounding or measurement errors in the explanatory variables (Swindel and Bower, 1972).

6. SOME RECOMMENDATIONS

It is difficult to give specific recommendations because of the widely differing situations that can arise in application. Some of the main points can be summarized as follows:

In design

(a) The nature of the study, and considerations of accuracy and economy determine how many variables are sensible.

(b) Divide the variables into batteries, where relevant, and consider the possibility of a special design to omit some measurements.
In analysis, given that multiple regression techniques are applied,

(c) The distinction between predicting future observations and interpreting the data can influence the choice of variables.

(d) If interpretation is the objective, and $p$ is not greater than 10-15, compute all $2^p$ regressions and examine those consistent with the data. Larger values of $p$ should in some way be reduced to make the computations feasible.

(e) Automatic selection procedures, such as are commonly used in many generally available computer programs, should be used only as a preliminary device or if the particular variables selected are not of intrinsic interest.

(f) Use of supplementary information and assumptions may be crucial in clarifying relationships. Any such assumptions should, however, be tested for consistency with the data and the conclusions with and without the supplementary information should normally be compared.

(g) The possibility of interactions between the effects of different explanatory variables should be considered.
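Recommendation (g) refers back to the second of the three ways of detecting interactions described in Section 4: fitting an extended model in which the interaction is represented by a cross product. A minimal sketch of that comparison follows. It is an illustration only: the simulated data, the coefficient values and the names `x1`, `x2` and `f_stat` are assumptions, and the F statistic shown is the usual one-degree-of-freedom comparison of residual sums of squares, to be referred to F tables rather than to any threshold built into the code.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][j] * x[j] for j in range(c + 1, n))) / M[c][c]
    return x


def ols(X, y):
    """Return (coefficients, residual sum of squares) for design rows X."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, rss


rng = random.Random(2)  # fixed seed so the run is reproducible
n = 200

# Assumed mechanism (illustrative): main effects of 1 and an interaction of 2.
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + a + b + 2 * a * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Main-effects model, then the extended model with the cross product x1 * x2.
_, rss_main = ols([[1.0, a, b] for a, b in zip(x1, x2)], y)
beta_ext, rss_ext = ols([[1.0, a, b, a * b] for a, b in zip(x1, x2)], y)

# F statistic on (1, n - 4) degrees of freedom for the added cross product.
f_stat = (rss_main - rss_ext) / (rss_ext / (n - 4))

print(f"cross-product coefficient: {beta_ext[3]:.2f}")
print(f"F statistic for interaction: {f_stat:.1f}")
```

With many explanatory variables the number of candidate cross products grows quickly, so in practice a check of this kind would be confined to the interactions thought likely on general grounds, as the text recommends.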

REFERENCES

ABT, K. (1967). On the identification of the significant independent variables in linear models. I, II. Metrika, 12, 1-15,
AFIFI, A. A. and ELASHOFF, R. M. (1966, 1967, 1969). Missing observations in multivariate statistics. I-IV. J. Am. Statist. Assoc., 61, ; 62, 10-29; 64, ,
BEALE, E. M. L. (1970). Note on procedures for variable selection in multiple regression. Technometrics, 12,
BEALE, E. M. L., KENDALL, M. G. and MANN, D. W. (1967). The discarding of variables in multivariate analysis. Biometrika, 54,
BLALOCK, H. M. and BLALOCK, A. (editors) (1968). Methodology in Social Research. New York: McGraw Hill.
BREAUX, H. J. (1968). A modification of Efroymson's technique for stepwise regression analysis. Comm. ACM, 11,
BUCK, S. F. (1960). A method of estimation of missing values in multivariate data suitable for use with an electronic computer. J. R. Statist. Soc. B, 22,
COCHRAN, W. G. (1965). The planning of observational studies of human populations. J. R. Statist. Soc. A, 128,
COCHRAN, W. G. (1972). Observational studies. In Statistical Papers in Honor of George W. Snedecor (T. A. Bancroft, ed.), pp Iowa: Iowa State Press.
COX, D. R. (1968). Notes on some aspects of regression analysis. J. R. Statist. Soc. A, 131,
COX, D. R. (1969). Analysis of Binary Data. London: Methuen.
DAGENAIS, M. G. (1971). Further suggestions concerning the utilization of incomplete observations in regression analysis. J. Am. Statist. Assoc., 66,
DANIEL, C. and WOOD, F. S. (1971). Fitting Equations to Data. New York: Wiley-Interscience.
DRAPER, N. R., GUTTMAN, I. and KANEMASU, H. (1971). The distribution of certain regression statistics. Biometrika, 58,
DRAPER, N. and SMITH, H. (1966). Applied Regression Analysis. New York: Wiley.
DRAPER, N. and SMITH, H. (1969). Methods for selecting variables from a given set of variables for regression analysis. Bull. Inst. Int. Statist., 43,
DYKSTRA, O. (1966). The orthogonalization of undesigned experiments. Technometrics, 6,
EFROYMSON, M. A. (1960). Multiple regression analysis. In Mathematical Methods for Digital Computers (A. Ralston and H. S. Wilf, eds), Chapter 17. New York: Wiley.
GARSIDE, M. J. (1965). The best subset in multiple regression analysis. Appl. Statist., 14,
GAYLOR, D. W. and MERRILL, J. A. (1968). Augmenting existing data in multiple regression. Technometrics, 10,
GOODMAN, L. A. (1971). The analysis of multidimensional contingency tables: stepwise procedures and direct estimation methods for building models for multiple classification. Technometrics, 13,
GORMAN, J. W. and TOMAN, R. J. (1966). Selection of variables for fitting equations to data. Technometrics, 8,
HAMAKER, H. C. (1962). On multiple regression analysis. Statist. Neerlandica, 16,
HARTLEY, H. O. and HOCKING, R. R. (1971). The analysis of incomplete data. Biometrics, 27,
HOCKING, R. R. and LESLIE, R. W. (1967). Selection of the best subset in regression analysis. Technometrics, 9,
HOERL, A. E. and KENNARD, R. W. (1970a). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12,
HOERL, A. E. and KENNARD, R. W. (1970b). Ridge regression: applications to nonorthogonal problems. Technometrics, 12,
JEFFERS, J. N. R. (1967). Two case studies in the application of principal component analysis. Appl. Statist., 16,
LA MOTTE, L. R. and HOCKING, R. R. (1970). Computational efficiency in the selection of regression variables. Technometrics, 12,
LINDLEY, D. V. (1968). The choice of variables in multiple regression. J. R. Statist. Soc. B, 30,
LINDLEY, D. V. and SMITH, A. F. M. (1972). Bayes estimates for the linear model (with Discussion). J. R. Statist. Soc. B, 34, 1-41.
MANTEL, N. (1970). Why stepdown procedures in variable selection. Technometrics, 12,
MARQUARDT, D. W. (1970). Generalized inverses, ridge regression, biased linear estimation and non-linear estimation. Technometrics, 12,
MORGAN, J. A. and TATAR, J. F. (1972). Calculation of the residual sum of squares for all possible regressions. Technometrics, 14,
MULLET, G. M. and MURRAY, T. W. (1971). A new method for examining rounding error in least-squares regression computer programs. J. Am. Statist. Assoc., 66,
NEWTON, R. G. and SPURRELL, D. J. (1967a). A development of multiple regression for the analysis of routine data. Appl. Statist., 16,
NEWTON, R. G. and SPURRELL, D. J. (1967b). Examples of the use of elements for clarifying regression analysis. Appl. Statist., 16,
OOSTERHOFF, J. (1963). On the selection of independent variables in a regression equation. Report 319. Math. Centre, Amsterdam.
ORCHARD, T. and WOODBURY, M. A. (1972). A missing information principle, theory and applications. Proc. 6th Berkeley Symp., 1,
POPE, P. T. and WEBSTER, J. T. (1972). The use of an F-statistic in stepwise regression procedures. Technometrics, 14,
ROTHMAN, D. (1968). Comment on Hocking and Leslie's paper. Technometrics, 10, 432.
SCHATZOFF, M., TSAO, R. and FIENBERG, S. (1968). Efficient calculation of all possible regressions. Technometrics, 10,
SILVEY, S. D. (1969). On choosing additional values of explanatory variables to counter multicollinearity. Bull. Inst. Int. Statist., 43,
SPJØTVOLL, E. (1972). Multiple comparison of regression functions. Ann. Math. Statist., 43,
SWINDEL, B. F. and BOWER, D. R. (1972). Rounding errors in the independent variables in a general linear model. Technometrics, 14,
TURNER, M. E. and STEVENS, C. D. (1959). The regression analysis of causal paths. Biometrics, 15,
WAMPLER, R. H. (1970). A report on the accuracy of some widely used least squares computer programs. J. Am. Statist. Assoc., 65,


Cochrane Pregnancy and Childbirth Group Methodological Guidelines

Cochrane Pregnancy and Childbirth Group Methodological Guidelines Cochrane Pregnancy and Childbirth Group Methodological Guidelines [Prepared by Simon Gates: July 2009, updated July 2012] These guidelines are intended to aid quality and consistency across the reviews

More information

RESPONSE SURFACE MODELING AND OPTIMIZATION TO ELUCIDATE THE DIFFERENTIAL EFFECTS OF DEMOGRAPHIC CHARACTERISTICS ON HIV PREVALENCE IN SOUTH AFRICA

RESPONSE SURFACE MODELING AND OPTIMIZATION TO ELUCIDATE THE DIFFERENTIAL EFFECTS OF DEMOGRAPHIC CHARACTERISTICS ON HIV PREVALENCE IN SOUTH AFRICA RESPONSE SURFACE MODELING AND OPTIMIZATION TO ELUCIDATE THE DIFFERENTIAL EFFECTS OF DEMOGRAPHIC CHARACTERISTICS ON HIV PREVALENCE IN SOUTH AFRICA W. Sibanda 1* and P. Pretorius 2 1 DST/NWU Pre-clinical

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Module 14: Missing Data Concepts

Module 14: Missing Data Concepts Module 14: Missing Data Concepts Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724 Pre-requisites Module 3

More information

Quality Digest Daily, March 3, 2014 Manuscript 266. Statistics and SPC. Two things sharing a common name can still be different. Donald J.

Quality Digest Daily, March 3, 2014 Manuscript 266. Statistics and SPC. Two things sharing a common name can still be different. Donald J. Quality Digest Daily, March 3, 2014 Manuscript 266 Statistics and SPC Two things sharing a common name can still be different Donald J. Wheeler Students typically encounter many obstacles while learning

More information

CLASSICAL AND. MODERN REGRESSION WITH APPLICATIONS

CLASSICAL AND. MODERN REGRESSION WITH APPLICATIONS - CLASSICAL AND. MODERN REGRESSION WITH APPLICATIONS SECOND EDITION Raymond H. Myers Virginia Polytechnic Institute and State university 1 ~l~~l~l~~~~~~~l!~ ~~~~~l~/ll~~ Donated by Duxbury o Thomson Learning,,

More information

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about population. Populations are group of interest

More information

Context of Best Subset Regression

Context of Best Subset Regression Estimation of the Squared Cross-Validity Coefficient in the Context of Best Subset Regression Eugene Kennedy South Carolina Department of Education A monte carlo study was conducted to examine the performance

More information

6. A theory that has been substantially verified is sometimes called a a. law. b. model.

6. A theory that has been substantially verified is sometimes called a a. law. b. model. Chapter 2 Multiple Choice Questions 1. A theory is a(n) a. a plausible or scientifically acceptable, well-substantiated explanation of some aspect of the natural world. b. a well-substantiated explanation

More information

All Possible Regressions Using IBM SPSS: A Practitioner s Guide to Automatic Linear Modeling

All Possible Regressions Using IBM SPSS: A Practitioner s Guide to Automatic Linear Modeling Georgia Southern University Digital Commons@Georgia Southern Georgia Educational Research Association Conference Oct 7th, 1:45 PM - 3:00 PM All Possible Regressions Using IBM SPSS: A Practitioner s Guide

More information

EXPERIMENTAL RESEARCH DESIGNS

EXPERIMENTAL RESEARCH DESIGNS ARTHUR PSYC 204 (EXPERIMENTAL PSYCHOLOGY) 14A LECTURE NOTES [02/28/14] EXPERIMENTAL RESEARCH DESIGNS PAGE 1 Topic #5 EXPERIMENTAL RESEARCH DESIGNS As a strict technical definition, an experiment is a study

More information

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS) Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it

More information

Chapter 02 Developing and Evaluating Theories of Behavior

Chapter 02 Developing and Evaluating Theories of Behavior Chapter 02 Developing and Evaluating Theories of Behavior Multiple Choice Questions 1. A theory is a(n): A. plausible or scientifically acceptable, well-substantiated explanation of some aspect of the

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis EFSA/EBTC Colloquium, 25 October 2017 Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis Julian Higgins University of Bristol 1 Introduction to concepts Standard

More information

Correlation and Regression

Correlation and Regression Dublin Institute of Technology ARROW@DIT Books/Book Chapters School of Management 2012-10 Correlation and Regression Donal O'Brien Dublin Institute of Technology, donal.obrien@dit.ie Pamela Sharkey Scott

More information

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN)

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN) UNIT 4 OTHER DESIGNS (CORRELATIONAL DESIGN AND COMPARATIVE DESIGN) Quasi Experimental Design Structure 4.0 Introduction 4.1 Objectives 4.2 Definition of Correlational Research Design 4.3 Types of Correlational

More information

How to describe bivariate data

How to describe bivariate data Statistics Corner How to describe bivariate data Alessandro Bertani 1, Gioacchino Di Paola 2, Emanuele Russo 1, Fabio Tuzzolino 2 1 Department for the Treatment and Study of Cardiothoracic Diseases and

More information

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * by J. RICHARD LANDIS** and GARY G. KOCH** 4 Methods proposed for nominal and ordinal data Many

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump?

Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump? The Economic and Social Review, Vol. 38, No. 2, Summer/Autumn, 2007, pp. 259 274 Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump? DAVID MADDEN University College Dublin Abstract:

More information

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY Chapter 2- Methods of Psychology OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology -observation, case study, surveys, psychological tests, experimentation

More information

Applying Machine Learning Methods in Medical Research Studies

Applying Machine Learning Methods in Medical Research Studies Applying Machine Learning Methods in Medical Research Studies Daniel Stahl Department of Biostatistics and Health Informatics Psychiatry, Psychology & Neuroscience (IoPPN), King s College London daniel.r.stahl@kcl.ac.uk

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Revised Cochrane risk of bias tool for randomized trials (RoB 2.0) Additional considerations for cross-over trials

Revised Cochrane risk of bias tool for randomized trials (RoB 2.0) Additional considerations for cross-over trials Revised Cochrane risk of bias tool for randomized trials (RoB 2.0) Additional considerations for cross-over trials Edited by Julian PT Higgins on behalf of the RoB 2.0 working group on cross-over trials

More information

Interpretation of Data and Statistical Fallacies

Interpretation of Data and Statistical Fallacies ISSN: 2349-7637 (Online) RESEARCH HUB International Multidisciplinary Research Journal Research Paper Available online at: www.rhimrj.com Interpretation of Data and Statistical Fallacies Prof. Usha Jogi

More information

Chapter 17 Sensitivity Analysis and Model Validation

Chapter 17 Sensitivity Analysis and Model Validation Chapter 17 Sensitivity Analysis and Model Validation Justin D. Salciccioli, Yves Crutain, Matthieu Komorowski and Dominic C. Marshall Learning Objectives Appreciate that all models possess inherent limitations

More information

STEP II Conceptualising a Research Design

STEP II Conceptualising a Research Design STEP II Conceptualising a Research Design This operational step includes two chapters: Chapter 7: The research design Chapter 8: Selecting a study design CHAPTER 7 The Research Design In this chapter you

More information

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies PART 1: OVERVIEW Slide 1: Overview Welcome to Qualitative Comparative Analysis in Implementation Studies. This narrated powerpoint

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Survival Skills for Researchers. Study Design

Survival Skills for Researchers. Study Design Survival Skills for Researchers Study Design Typical Process in Research Design study Collect information Generate hypotheses Analyze & interpret findings Develop tentative new theories Purpose What is

More information

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER Introduction, 639. Factor analysis, 639. Discriminant analysis, 644. INTRODUCTION

More information

Chapter 11. Experimental Design: One-Way Independent Samples Design

Chapter 11. Experimental Design: One-Way Independent Samples Design 11-1 Chapter 11. Experimental Design: One-Way Independent Samples Design Advantages and Limitations Comparing Two Groups Comparing t Test to ANOVA Independent Samples t Test Independent Samples ANOVA Comparing

More information

Adjustments for Rater Effects in

Adjustments for Rater Effects in Adjustments for Rater Effects in Performance Assessment Walter M. Houston, Mark R. Raymond, and Joseph C. Svec American College Testing Alternative methods to correct for rater leniency/stringency effects

More information

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering Meta-Analysis Zifei Liu What is a meta-analysis; why perform a metaanalysis? How a meta-analysis work some basic concepts and principles Steps of Meta-analysis Cautions on meta-analysis 2 What is Meta-analysis

More information

UNIT 5 - Association Causation, Effect Modification and Validity

UNIT 5 - Association Causation, Effect Modification and Validity 5 UNIT 5 - Association Causation, Effect Modification and Validity Introduction In Unit 1 we introduced the concept of causality in epidemiology and presented different ways in which causes can be understood

More information

Basis for Conclusions: ISA 230 (Redrafted), Audit Documentation

Basis for Conclusions: ISA 230 (Redrafted), Audit Documentation Basis for Conclusions: ISA 230 (Redrafted), Audit Documentation Prepared by the Staff of the International Auditing and Assurance Standards Board December 2007 , AUDIT DOCUMENTATION This Basis for Conclusions

More information

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned?

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? BARRY MARKOVSKY University of South Carolina KIMMO ERIKSSON Mälardalen University We appreciate the opportunity to comment

More information

Journal of Political Economy, Vol. 93, No. 2 (Apr., 1985)

Journal of Political Economy, Vol. 93, No. 2 (Apr., 1985) Confirmations and Contradictions Journal of Political Economy, Vol. 93, No. 2 (Apr., 1985) Estimates of the Deterrent Effect of Capital Punishment: The Importance of the Researcher's Prior Beliefs Walter

More information

Sage Publications, Inc. and American Sociological Association are collaborating with JSTOR to digitize, preserve and extend access to Sociometry.

Sage Publications, Inc. and American Sociological Association are collaborating with JSTOR to digitize, preserve and extend access to Sociometry. A Probability Model for Conformity Author(s): Bernard P. Cohen Source: Sociometry, Vol. 21, No. 1 (Mar., 1958), pp. 69-81 Published by: American Sociological Association Stable URL: http://www.jstor.org/stable/2786059

More information

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA BIOSTATISTICAL METHODS AND RESEARCH DESIGNS Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA Keywords: Case-control study, Cohort study, Cross-Sectional Study, Generalized

More information

International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: Volume: 4 Issue:

International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: Volume: 4 Issue: Application of the Variance Function of the Difference Between two estimated responses in regulating Blood Sugar Level in a Diabetic patient using Herbal Formula Karanjah Anthony N. School of Science Maasai

More information

Bayesian and Frequentist Approaches

Bayesian and Frequentist Approaches Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law

More information

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover). STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical methods 2 Course code: EC2402 Examiner: Per Pettersson-Lidbom Number of credits: 7,5 credits Date of exam: Sunday 21 February 2010 Examination

More information

Confidence Intervals On Subsets May Be Misleading

Confidence Intervals On Subsets May Be Misleading Journal of Modern Applied Statistical Methods Volume 3 Issue 2 Article 2 11-1-2004 Confidence Intervals On Subsets May Be Misleading Juliet Popper Shaffer University of California, Berkeley, shaffer@stat.berkeley.edu

More information

Anale. Seria Informatică. Vol. XVI fasc Annals. Computer Science Series. 16 th Tome 1 st Fasc. 2018

Anale. Seria Informatică. Vol. XVI fasc Annals. Computer Science Series. 16 th Tome 1 st Fasc. 2018 HANDLING MULTICOLLINEARITY; A COMPARATIVE STUDY OF THE PREDICTION PERFORMANCE OF SOME METHODS BASED ON SOME PROBABILITY DISTRIBUTIONS Zakari Y., Yau S. A., Usman U. Department of Mathematics, Usmanu Danfodiyo

More information

Session 3: Dealing with Reverse Causality

Session 3: Dealing with Reverse Causality Principal, Developing Trade Consultants Ltd. ARTNeT Capacity Building Workshop for Trade Research: Gravity Modeling Thursday, August 26, 2010 Outline Introduction 1 Introduction Overview Endogeneity and

More information

Lessons in biostatistics

Lessons in biostatistics Lessons in biostatistics The test of independence Mary L. McHugh Department of Nursing, School of Health and Human Services, National University, Aero Court, San Diego, California, USA Corresponding author:

More information

CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH

CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH Why Randomize? This case study is based on Training Disadvantaged Youth in Latin America: Evidence from a Randomized Trial by Orazio Attanasio,

More information

1. The Role of Sample Survey Design

1. The Role of Sample Survey Design Vista's Approach to Sample Survey Design 1978, 1988, 2006, 2007, 2009 Joseph George Caldwell. All Rights Reserved. Posted at Internet website http://www.foundationwebsite.org. Updated 20 March 2009 (two

More information

Carrying out an Empirical Project

Carrying out an Empirical Project Carrying out an Empirical Project Empirical Analysis & Style Hint Special program: Pre-training 1 Carrying out an Empirical Project 1. Posing a Question 2. Literature Review 3. Data Collection 4. Econometric

More information

August 29, Introduction and Overview

August 29, Introduction and Overview August 29, 2018 Introduction and Overview Why are we here? Haavelmo(1944): to become master of the happenings of real life. Theoretical models are necessary tools in our attempts to understand and explain

More information

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15) ON THE COMPARISON OF BAYESIAN INFORMATION CRITERION AND DRAPER S INFORMATION CRITERION IN SELECTION OF AN ASYMMETRIC PRICE RELATIONSHIP: BOOTSTRAP SIMULATION RESULTS Henry de-graft Acquah, Senior Lecturer

More information

Alan S. Gerber Donald P. Green Yale University. January 4, 2003

Alan S. Gerber Donald P. Green Yale University. January 4, 2003 Technical Note on the Conditions Under Which It is Efficient to Discard Observations Assigned to Multiple Treatments in an Experiment Using a Factorial Design Alan S. Gerber Donald P. Green Yale University

More information

Introduction to Econometrics

Introduction to Econometrics Global edition Introduction to Econometrics Updated Third edition James H. Stock Mark W. Watson MyEconLab of Practice Provides the Power Optimize your study time with MyEconLab, the online assessment and

More information

Chapter-2 RESEARCH DESIGN

Chapter-2 RESEARCH DESIGN Chapter-2 RESEARCH DESIGN 33 2.1 Introduction to Research Methodology: The general meaning of research is the search for knowledge. Research is also defined as a careful investigation or inquiry, especially

More information

investigate. educate. inform.

investigate. educate. inform. investigate. educate. inform. Research Design What drives your research design? The battle between Qualitative and Quantitative is over Think before you leap What SHOULD drive your research design. Advanced

More information

PLANNING THE RESEARCH PROJECT

PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm page 1 Part I PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm

More information

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research 2012 CCPRC Meeting Methodology Presession Workshop October 23, 2012, 2:00-5:00 p.m. Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy

More information

Study of cigarette sales in the United States Ge Cheng1, a,

Study of cigarette sales in the United States Ge Cheng1, a, 2nd International Conference on Economics, Management Engineering and Education Technology (ICEMEET 2016) 1Department Study of cigarette sales in the United States Ge Cheng1, a, of pure mathematics and

More information

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity Measurement & Variables - Initial step is to conceptualize and clarify the concepts embedded in a hypothesis or research question with

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis?

How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis? How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis? Richards J. Heuer, Jr. Version 1.2, October 16, 2005 This document is from a collection of works by Richards J. Heuer, Jr.

More information

Lab 2: The Scientific Method. Summary

Lab 2: The Scientific Method. Summary Lab 2: The Scientific Method Summary Today we will venture outside to the University pond to develop your ability to apply the scientific method to the study of animal behavior. It s not the African savannah,

More information

The Pretest! Pretest! Pretest! Assignment (Example 2)

The Pretest! Pretest! Pretest! Assignment (Example 2) The Pretest! Pretest! Pretest! Assignment (Example 2) May 19, 2003 1 Statement of Purpose and Description of Pretest Procedure When one designs a Math 10 exam one hopes to measure whether a student s ability

More information

Analysis and Interpretation of Data Part 1

Analysis and Interpretation of Data Part 1 Analysis and Interpretation of Data Part 1 DATA ANALYSIS: PRELIMINARY STEPS 1. Editing Field Edit Completeness Legibility Comprehensibility Consistency Uniformity Central Office Edit 2. Coding Specifying

More information

A Survey of Techniques for Optimizing Multiresponse Experiments

A Survey of Techniques for Optimizing Multiresponse Experiments A Survey of Techniques for Optimizing Multiresponse Experiments Flavio S. Fogliatto, Ph.D. PPGEP / UFRGS Praça Argentina, 9/ Sala 402 Porto Alegre, RS 90040-020 Abstract Most industrial processes and products

More information

Correlation vs. Causation - and What Are the Implications for Our Project? By Michael Reames and Gabriel Kemeny

Correlation vs. Causation - and What Are the Implications for Our Project? By Michael Reames and Gabriel Kemeny Correlation vs. Causation - and What Are the Implications for Our Project? By Michael Reames and Gabriel Kemeny In problem solving, accurately establishing and validating root causes are vital to improving

More information

IAPT: Regression. Regression analyses

IAPT: Regression. Regression analyses Regression analyses IAPT: Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a student project

More information

Marno Verbeek Erasmus University, the Netherlands. Cons. Pros

Marno Verbeek Erasmus University, the Netherlands. Cons. Pros Marno Verbeek Erasmus University, the Netherlands Using linear regression to establish empirical relationships Linear regression is a powerful tool for estimating the relationship between one variable

More information

This exam consists of three parts. Provide answers to ALL THREE sections.

This exam consists of three parts. Provide answers to ALL THREE sections. Empirical Analysis and Research Methodology Examination Yale University Department of Political Science January 2008 This exam consists of three parts. Provide answers to ALL THREE sections. Your answers

More information

Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl

Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl Judea Pearl University of California, Los Angeles Computer Science Department Los

More information

What is an Impact Factor? How variable is the impact factor? Time after publication (Years) Figure 1. Generalized Citation Curve

What is an Impact Factor? How variable is the impact factor? Time after publication (Years) Figure 1. Generalized Citation Curve Reprinted with pemission from Perspectives in Puplishing, No 1, Oct 2000, http://www.elsevier.com/framework_editors/pdfs/perspectives 1.pdf What is an Impact Factor? The impact factor is only one of three

More information

Simultaneous Equation and Instrumental Variable Models for Sexiness and Power/Status

Simultaneous Equation and Instrumental Variable Models for Sexiness and Power/Status Simultaneous Equation and Instrumental Variable Models for Seiness and Power/Status We would like ideally to determine whether power is indeed sey, or whether seiness is powerful. We here describe the

More information

Modern Regression Methods

Modern Regression Methods Modern Regression Methods Second Edition THOMAS P. RYAN Acworth, Georgia WILEY A JOHN WILEY & SONS, INC. PUBLICATION Contents Preface 1. Introduction 1.1 Simple Linear Regression Model, 3 1.2 Uses of Regression

More information

COMMITTEE FOR PROPRIETARY MEDICINAL PRODUCTS (CPMP) POINTS TO CONSIDER ON MISSING DATA

COMMITTEE FOR PROPRIETARY MEDICINAL PRODUCTS (CPMP) POINTS TO CONSIDER ON MISSING DATA The European Agency for the Evaluation of Medicinal Products Evaluation of Medicines for Human Use London, 15 November 2001 CPMP/EWP/1776/99 COMMITTEE FOR PROPRIETARY MEDICINAL PRODUCTS (CPMP) POINTS TO

More information

Part 8 Logistic Regression

Part 8 Logistic Regression 1 Quantitative Methods for Health Research A Practical Interactive Guide to Epidemiology and Statistics Practical Course in Quantitative Data Handling SPSS (Statistical Package for the Social Sciences)

More information

MATCHED SAMPLES IN MEDICAL INVESTIGATIONS

MATCHED SAMPLES IN MEDICAL INVESTIGATIONS Brit. J. prev. soc. Med. (1964), 18, 167-173 MATCHED SAMPLES IN MEDICAL INVESTIGATIONS BY Research Group in Biometric Medicine and Obstetric Medicine Research Unit (M.R.C.), University of Aberdeen In investigating

More information

MBA SEMESTER III. MB0050 Research Methodology- 4 Credits. (Book ID: B1206 ) Assignment Set- 1 (60 Marks)

MBA SEMESTER III. MB0050 Research Methodology- 4 Credits. (Book ID: B1206 ) Assignment Set- 1 (60 Marks) MBA SEMESTER III MB0050 Research Methodology- 4 Credits (Book ID: B1206 ) Assignment Set- 1 (60 Marks) Note: Each question carries 10 Marks. Answer all the questions Q1. a. Differentiate between nominal,

More information

Clinical research in AKI Timing of initiation of dialysis in AKI

Clinical research in AKI Timing of initiation of dialysis in AKI Clinical research in AKI Timing of initiation of dialysis in AKI Josée Bouchard, MD Krescent Workshop December 10 th, 2011 1 Acute kidney injury in ICU 15 25% of critically ill patients experience AKI

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Instrumental Variables Estimation: An Introduction

Instrumental Variables Estimation: An Introduction Instrumental Variables Estimation: An Introduction Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA The Problem The Problem Suppose you wish to

More information
