C.7 Multilevel Modeling

S.V. Subramanian

C.7.1 Introduction

Individuals are organized within a nearly infinite number of levels of organization, from the individual up (for example, families, neighborhoods, counties, states), from the individual down (for example, body organs, cellular matrices, DNA), and for overlapping units (for example, area of residence and work environment). It is necessary, therefore, that links be made between these possible levels of analysis. The term multilevel refers to the distinct levels or units of analysis, which usually, but not always, consist of individuals (at the lower level) who are nested within contextual/aggregate units (at the higher level). Multilevel methods consist of statistical procedures that are pertinent when (i) the observations being analyzed are correlated or clustered, (ii) the causal processes are thought to operate simultaneously at more than one level, and/or (iii) there is an intrinsic interest in describing the variability and heterogeneity in the phenomenon, over and above the focus on the average (Diez Roux 2002; Subramanian et al. 2003; Subramanian 2004a, 2004b). Multilevel statistical models are also used in areas such as image processing and remote sensing (Kolaczyk et al. 2005).

Multilevel methods are specifically geared towards the statistical analysis of data that have a nested structure. The nesting is typically, but not always, hierarchical. For instance, a two-level structure would have many level-1 units nested within a smaller number of level-2 units. In educational research, the field that provided the impetus for multilevel methods, level-1 usually consists of pupils who are nested within schools at level-2. Such structures arise routinely in the health and social sciences, where the level-1 and level-2 units could be workers in organizations, patients in hospitals, or individuals in neighborhoods, respectively.
In this chapter, for exemplification, we will consider the structure of individuals nested within neighborhoods (used to reflect one practical realization of place). The existence of nested data structures is neither random nor ignorable; for instance, individuals differ but so do the neighborhoods. Differences among neighborhoods could be directly due to the differences among the individuals who live in them, or groupings based on neighborhoods may arise for reasons less strongly associated with the characteristics of the individuals who live in them. Regardless, once such groupings are established, even if their establishment is random, they will tend to become differentiated. This implies that the group (for example, neighborhoods) and its members (for example, individual residents) can exert influence on each other, suggesting different sources of variation (for example, individual-induced and neighborhood-induced) in the outcome of interest and thus compelling analysts to consider covariates at both the individual and the neighborhood level. Ignoring this multilevel structure of variations not only risks overlooking the importance of neighborhood effects, but also has implications for statistical validity.

To put this in perspective, in an influential study of progress among primary school children, Bennett (1976), using single-level multiple regression analysis, claimed that children exposed to a formal style of teaching exhibited more progress than those who were not. The analysis, while recognizing individual children as units of analysis, ignored their grouping into teachers/classes. In what was the first important example of multilevel analysis using social science data, Aitkin et al. (1981) reanalyzed the data and demonstrated that when the analysis accounted properly for the grouping of children (at the lower level) into classes (at the higher level), the progress of formally taught children could not be shown to differ significantly from that of the others. What was occurring here was that children within any one class/teacher, because they were taught together, tended to be similar in their performance, thereby providing much less information than would have been the case if the same number of children had been taught separately. More formally, the individual samples (for example, children) were correlated or clustered.
Such clustered samples do not contain as much information as simple random samples of similar size. As was shown by Aitkin et al. (1981), ignoring this autocorrelation and clustering resulted in an increased risk of finding differences and relationships where none existed. Clustered data also arise as a result of sampling strategies. For instance, while planning large-scale survey data collection, for reasons of cost and efficiency, it is usual to adopt a multistage sampling design. A national population survey, for example, might involve a three-stage design, with regions sampled first, then neighborhoods, and then individuals. A design of this kind generates a three-level hierarchically clustered structure of individuals at level-1 nested within neighborhoods at level-2, which in turn are nested in regions at level-3. Individuals living in the same neighborhood can be expected to be more alike than they would be if the sample were truly random. Similar correlation can be expected for neighborhoods within a region. Much documentation exists on measuring this design effect and correcting for it. Indeed, clustered designs (for example, individuals at level-1 nested in neighborhoods at level-2 nested in regions at level-3) are often treated as a nuisance in traditional analysis. However, individuals, neighborhoods and regions can be seen as distinct structures that exist in the population and that should be measured and modeled.
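The loss of information from clustering can be made concrete with a small calculation. The sketch below uses the standard approximation for equal-sized clusters, design effect = 1 + (m - 1)ρ, where m is the cluster size and ρ is the within-cluster correlation. The formula is a textbook result rather than something derived in this chapter, and the function names and numbers are illustrative assumptions.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: int, cluster_size: float, icc: float) -> float:
    """Size of a simple random sample carrying roughly the same information."""
    return n_total / design_effect(cluster_size, icc)


# Illustrative values (assumptions): 1000 individuals sampled as
# 50 neighborhoods of 20, with a within-neighborhood correlation of 0.10.
n_total, cluster_size, icc = 1000, 20, 0.10

deff = design_effect(cluster_size, icc)
n_eff = effective_sample_size(n_total, cluster_size, icc)
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.1f} of {n_total}")
```

Even a modest within-neighborhood correlation shrinks the effective sample size considerably, which is why standard errors from a single-level analysis of such data tend to be too small.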

C.7.2 Multilevel framework: A necessity for understanding ecological effects

Figure C.7.1 identifies a typology of designs for data collection and analyses (Blakely and Woodward 2000; Kawachi and Subramanian 2006; Subramanian et al. 2007) where the rows indicate the level or unit at which the outcome variable is being measured [that is, at the individual level (y) or the ecological level (Y)], and the columns indicate whether the exposure is being measured at the individual level (x) or the ecological level (X). The ecological level, in this illustration, relates to the neighborhood level. Study-type (y, x) is most commonly encountered when the researcher aims to link exposure to outcomes, with both being measured at the individual level. Study-type (y, x) typically ignores ecological effects (either implicitly or explicitly).

| Outcome \ Exposure | Individual (x), measured at individual level | Ecologic (X), measured at ecological level |
| Individual (y) | (y, x): Traditional risk factor study | (y, X): Multilevel study |
| Ecologic (Y) | (Y, x) (a) | (Y, X): Ecological study |

Notes: (a) This type of study is impossible to specify as it stands. Practically speaking, it will either take the form of (Y, X), that is, an ecological study, where X will now simply be a central tendency of x; or, if disaggregation of Y is possible, so that we can observe y, it will be equivalent to (y, x). Source: (Subramanian et al. 2008)

Fig. C.7.1. Typology of studies (Subramanian et al. 2007)

Conversely, study-type (Y, X), referred to as an ecological study, may seem intuitively appropriate for research where higher levels (for instance, neighborhoods, regions, states, schools and so on) are the targets of interest. However, study-type (Y, X) conflates the genuinely ecological and the aggregate or compositional (Moon et al. 2005), and precludes the possibility of testing heterogeneous contextual effects on different types of individuals.
Ecological effects reflect predictors and associated mechanisms operating primarily at the contextual level. The search for such measures and their scientific validation and assessment is an area of active research (Raudenbush 2003). Aggregate effects, in contrast, equate the effect of a neighborhood with the sum of the individual effects associated with the people living within the neighborhood. In this situation the interpretative question becomes particularly relevant. If common membership of a neighborhood by a set of individuals brings about an effect that is over and above those resulting from individual characteristics, then there may indeed be an ecological effect.

Study-type (y, X) provides a multilevel approach in which an ecological exposure is linked to an individual outcome. A more complete representation would be type (y, x, X), whereby we have an individual outcome, individual confounders (x), and a neighborhood exposure, reflecting a multilevel structure of individuals nested within neighborhoods. A fundamental motivation for study-type (y, x, X) is to distinguish neighborhood differences from the difference a neighborhood makes (Moon et al. 2005). Stated differently, ecological effects on the individual outcome should be ascertained after individual factors that reflect the composition of the places (and may be potential confounders) have been controlled. Indeed, compositional explanations for ecological variations in health are common. It nonetheless makes intuitive sense to test for the possibility of ecological effects. Besides their direct impact on individual outcomes, the effects of compositional factors may themselves vary by context. Thus, unless contextual variables are considered, their direct effects and any indirect mediation through compositional variables remain unidentified. Moreover, composition itself has an intrinsic ecologic dimension; the very fact that individual (compositional) factors may explain ecologic variations serves as a reminder that the real understanding of ecologic effects is likely to be complex.
The multilevel framework, with its simultaneous examination of the characteristics of the individuals at one level and the context or ecologies in which they are located at another level, accordingly offers a comprehensive framework for understanding the ways in which places can affect people (contextual) and/or people can affect places (composition). It likewise allows for a more precise distinction between the aggregative fallacy and ecologic effects (Subramanian et al. 2008).

C.7.3 A typology of multilevel data structures

The idea of multilevel structure can be recast, with great advantage, to address a range of circumstances where one may anticipate clustering. Outcomes as well as their causal mechanisms are rarely stable and invariant over time, producing data structures that involve repeated measures, which can be considered a special case of multilevel clustered data structures. Consider the repeated cross-sectional design, which can be structured in multilevel terms with neighborhoods at level-3, year/time at level-2 and individuals at level-1. In this example, level-2 represents repeated measurements on the neighborhoods (level-3) over time. Such a structure can be used to investigate what sorts of individuals and what sorts of neighborhoods have changed with respect to the outcome. Alternatively, there is the classic longitudinal or panel design in which level-1 is the measurement occasion, level-2 is the individual and level-3 is the neighborhood. This time, the individuals are repeatedly measured at different time intervals so that it becomes possible to model changing individual behaviors within a contextual setting of, say, neighborhoods.

When different responses/outcomes are correlated, this lends itself to a multivariate multilevel data structure in which level-1 units are sets of response variables measured on individuals at level-2 nested in neighborhoods at level-3. The multivariate responses could be, for instance, different aspects of, say, health behavior (for example, smoking and drinking). In addition, such responses could be a mixture of quality (do you smoke/do you drink) and quantity (how many/how much), producing mixed multivariate responses. The substantive benefit of this approach is that it is possible to assess whether different types of behavior, and the qualitative and quantitative aspects of each behavior, are related to individual characteristics in the same or different ways. Additionally, we can ascertain whether neighborhoods that are high for one behavior are also high for another, and whether neighborhoods with a high prevalence of smoking, for instance, are also high in terms of the number of cigarettes smoked.

While the previous examples are strictly hierarchical, in that all level-1 units that form a level-2 grouping are always in the same group at any higher level, data structures could be non-hierarchical. For example, a model of health behavior (for instance, smoking) could be formulated with individuals at level-1 and both residential neighborhoods and workplaces at level-2, not nested but crossed; these are called cross-classified structures. Individuals are then seen as occupying more than one set of contexts, each of which may have an important influence.
For instance, individuals in a particular workplace may come from different neighborhoods, and individuals in a neighborhood may go to several worksites. A related structure occurs where, for a single level-2 classification (for example, neighborhoods), level-1 units (for example, individuals) may belong to more than one level-2 unit; these are referred to as multiple membership designs. The individual can be considered to belong simultaneously to several neighborhoods, with the contribution of each neighborhood being weighted in relation to its distance (if the interest is spatial) from the individual. In summary, combinations of hierarchical structures, cross-classified structures and multiple membership can represent a great deal of the complexity that is imprinted, either explicitly or implicitly, in data, and this complexity can be incorporated via multilevel models.

C.7.4 The distinction between levels and variables

Each of the levels discussed in the previous section (for example, neighborhoods) can be considered as variables in a regression equation, with an indicator variable specified for each neighborhood. Conversely, why are categorical variables such as gender, ethnicity/race or social class not levels? Critical to treating neighborhoods, for example, as a level is that neighborhoods are treated as a population of units from which we have observed one random sample. This enables us to draw generalizations for a particular level (for example, neighborhoods) based on an observed sample of neighborhoods. Further, it is more efficient to model neighborhoods as a random variable given the (likely) large number of neighborhoods. Gender, on the other hand, is not a level because it is not a sample out of all possible gender categories. Rather, it is an attribute of individuals. Thus, male and female in our gender example are fixed discrete categories of a variable, with the specific categories contributing only to their respective means. They are not a random sample of gender categories from a population of gender groupings. Further, we would usually wish to ascribe a fixed effect to each gender, but not to each neighborhood. Rather, we wish to model an ecologic attribute at the neighborhood level. It is possible to consider levels as variables. When neighborhoods are considered as a variable, they are typically reflective of a fixed classification. While this may be useful in certain circumstances, doing so robs the researcher of the ability to generalize to all neighborhoods; inferences are only possible for the specific neighborhoods observed in the sample.

C.7.5 Multilevel analysis

There are three constitutive components of multilevel analysis, which are now discussed.

Evaluating sources of variation: Compositional and/or contextual. A fundamental application of multilevel methods is disentangling the different sources of variation in the outcome. Evidence for variations in poor health, for example, between different neighborhoods can be due to factors that are intrinsic to, and are measured at, the neighborhood level. In other words, the variation is due to what can be described as contextual, or neighborhood, effects.
Alternatively, variations between neighborhoods may be compositional, that is, certain types of people who are more likely to be in poor health due to their individual characteristics happen to be clustered in certain neighborhoods. The issue, therefore, is not whether variations between different neighborhoods exist (they usually do), but what the primary source of these variations is. Put simply, are there significant contextual differences in health between neighborhoods, after taking into account the individual compositional characteristics of the neighborhood? The notions of contextual and compositional sources of variation have general relevance, and they are applicable whether the context is administrative (for example, political boundaries), temporal (for example, different time periods), or institutional (for example, schools or hospitals).

Describing contextual heterogeneity. Contextual differences may be complex, such that they may not be the same for all types of people. Describing such contextual heterogeneity is another aspect of multilevel analysis and can have two interpretative dimensions. First, there may be a different amount of neighborhood variation, such that, for example, for high social class individuals it may not matter in which neighborhoods they live (thus a lower between-neighborhood variation), but it matters a great deal for the low social class, which as such shows a large between-neighborhood variation. Second, there may be a differential ordering: neighborhoods that are high for one group are low for the other and vice versa. Stated simply, the multilevel analytical question is whether the contextual neighborhood differences in poor health, after taking into account the individual composition of the neighborhood, are different for different types of population groups.

Characterizing and explaining the contextual variations. Contextual differences, in addition to people's characteristics, may also be influenced by the different characteristics of neighborhoods. Stated differently, individual differences may interact with context, and ascertaining the relative importance of individual and neighborhood covariates is another key aspect of a multilevel analysis. For example, over and above social class (an individual characteristic), health may depend upon the poverty levels of the neighborhoods (a neighborhood characteristic). The contextual effect of poverty can be the same for both the high and low social class, suggesting that while neighborhood poverty explains the prevalence of poor health, it does not influence the social class inequalities in health. On the other hand, the contextual effects of poverty may be different for different groups, such that neighborhood poverty adversely affects the low social class, but does the opposite for the high social class. Thus, neighborhood-level poverty may not only be related to average health achievements but may also shape social inequalities in health. The analytical question of interest is whether the effect of neighborhood-level socioeconomic characteristics on health is different for different types of people.
In the presence of multilevel data, as described in Section C.7.3, and with motivations as discussed above, there are substantive as well as technical reasons to use multilevel statistical models to analyze such data (Raudenbush and Bryk 2002; Goldstein 2003). We shall not review the basic principles of multilevel modeling here, as they have been described elsewhere in the context of health research (Subramanian et al. 2003; Moon et al. 2005; Blakely and Subramanian 2006), but rather provide a brief overview of the type of models invoked for identifying the ecologic effects discussed in this section.

C.7.6 Multilevel statistical models

Like all statistical regression equations, multilevel models have the same underlying function, which can be expressed as:

RESPONSE = FIXED/AVERAGE PARAMETERS + (RANDOM/VARIANCE PARAMETERS).

While in a conventional regression model the random part of the model is usually restricted to a single term (called the error term or residual), in the multilevel regression model the focus is on expanding the random part of a statistical model. In order to exemplify multilevel models we consider the following example. Suppose we are interested in studying the variation in health score as a function of certain individual and neighborhood predictors. Let us assume that the researcher collected data on a sample of 50 neighborhoods and, for each of these neighborhoods, a random sample of individuals. We then have a two-level structure where the outcome is a health score (with a higher score indicating better health), $y_{ij}$, for individual $i$ in neighborhood $j$. We will restrict this exemplification to one individual-level predictor, poverty, $x_{1ij}$, coded as zero if not poor and one if poor, for every individual $i$ in neighborhood $j$; and one neighborhood predictor, $w_{1j}$, a socioeconomic deprivation index in neighborhood $j$.

Variance component or random intercepts model. Multilevel models operate by developing regression equations at each level of analysis. In the illustration considered here, models would have to be specified at two levels, level-1 and level-2. The model at level-1 can be formally expressed as

$$y_{ij} = \beta_{0j} + \beta_1 x_{1ij} + e_{0ij} \tag{C.7.1}$$

where $\beta_{0j}$ (associated with a constant, $x_{0ij}$, which is a set of ones and is therefore not written) is the mean health score for the $j$th neighborhood for the non-poor group; $\beta_1$ is the average differential in health score associated with individual poverty status ($x_{1ij}$) across all neighborhoods; and $e_{0ij}$ is the individual or level-1 residual term. To make this a genuine two-level model we let $\beta_{0j}$ become a random variable as

$$\beta_{0j} = \beta_0 + u_{0j} \tag{C.7.2}$$

where $u_{0j}$ is the random neighborhood-specific displacement associated with the overall mean health score ($\beta_0$) for the non-poor group.
Since we do not allow, at this stage, the average differential for the poor and non-poor group ($\beta_1$) to vary across neighborhoods, $u_{0j}$ is assumed to be the same for both groups. Equation (C.7.2) is then the level-2 between-neighborhood model. It is worth emphasizing that the neighborhood effect, $u_{0j}$, can be treated in one of two ways. One can estimate each neighborhood separately as a fixed effect (that is, treat neighborhoods as a variable; with 50 neighborhoods there will be 49 additional parameters to be estimated). Such a strategy may be appropriate if the interest is in making inferences about just those sampled neighborhoods. On the other hand, if neighborhoods are treated as a (random) sample from a population of neighborhoods (which might include neighborhoods in future studies if one has complete population data), the target of inference is the variation between neighborhoods in general. Adopting this multilevel statistical approach makes $u_{0j}$ a random variable at level-2 in a two-level statistical model. Substituting Eq. (C.7.2) into Eq. (C.7.1) and grouping the terms into fixed and random part components (the latter shown in brackets) yields the following random-intercepts or variance components model

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + e_{0ij}). \tag{C.7.3}$$

We have now expressed the response $y_{ij}$ as the sum of a fixed part and a random part. Assuming a normal distribution with zero mean, we can estimate a variance at level-1 ($\sigma^2_{e0}$: the between-individual within-neighborhood variation) and at level-2 ($\sigma^2_{u0}$: the between-neighborhood variation), both conditional on fixed poverty differences in health score. It is the presence of more than one residual term (or the structure of the random part more generally) that distinguishes the multilevel model from standard linear regression models or analysis of variance type analysis. The underlying random structure (variance-covariance) of the model specified in Eq. (C.7.3) is

$$u_{0j} \sim N(0, \sigma^2_{u0}) \tag{C.7.4a}$$

$$e_{0ij} \sim N(0, \sigma^2_{e0}) \tag{C.7.4b}$$

$$\operatorname{cov}(u_{0j}, e_{0ij}) = 0. \tag{C.7.4c}$$

It is this aspect of the regression model that requires special estimation procedures in order to obtain satisfactory parameter estimates (Goldstein 2003). The model specified in Eq. (C.7.3) with the above random structure is typically used to partition variation according to the different levels, with the variance in $y_{ij}$ (conditional on the fixed part) being the sum of $\sigma^2_{u0}$ and $\sigma^2_{e0}$. This leads to a statistic known as the intra-class correlation, or intra-unit correlation, or more generally the variance partitioning coefficient (Goldstein 2002), representing the degree of similarity between two randomly chosen individuals within a neighborhood. This can be expressed as

$$\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}}. \tag{C.7.5}$$
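To see the variance partition at work, the following sketch simulates a two-level data set and recovers the two variance components and the resulting intra-class correlation. It makes several simplifying assumptions that are not part of the chapter: the poverty covariate is omitted (a null model with only an intercept), every neighborhood has the same number of individuals, and the variances are estimated with a simple one-way ANOVA (method-of-moments) estimator rather than the likelihood-based procedures that multilevel software normally uses. All function names and parameter values are illustrative.

```python
import numpy as np


def variance_partition_coefficient(sigma2_u0: float, sigma2_e0: float) -> float:
    """Share of the total variance that lies between neighborhoods (the ICC)."""
    return sigma2_u0 / (sigma2_u0 + sigma2_e0)


def simulate_null_model(n_groups, n_per_group, beta0, sigma2_u0, sigma2_e0, rng):
    """Draw y_ij = beta0 + u_0j + e_0ij for a balanced two-level design."""
    u = rng.normal(0.0, np.sqrt(sigma2_u0), size=(n_groups, 1))  # one draw per neighborhood
    e = rng.normal(0.0, np.sqrt(sigma2_e0), size=(n_groups, n_per_group))
    return beta0 + u + e


def anova_variance_components(y):
    """Method-of-moments estimates (sigma2_u0, sigma2_e0) from balanced data."""
    n_groups, n_per_group = y.shape
    group_means = y.mean(axis=1, keepdims=True)
    ms_within = ((y - group_means) ** 2).sum() / (n_groups * (n_per_group - 1))
    ms_between = n_per_group * ((group_means - y.mean()) ** 2).sum() / (n_groups - 1)
    # E[ms_between] = sigma2_e0 + n_per_group * sigma2_u0; truncate at zero.
    sigma2_u0_hat = max((ms_between - ms_within) / n_per_group, 0.0)
    return float(sigma2_u0_hat), float(ms_within)


rng = np.random.default_rng(0)
y = simulate_null_model(50, 30, beta0=10.0, sigma2_u0=1.0, sigma2_e0=4.0, rng=rng)
s2_u, s2_e = anova_variance_components(y)
print("true ICC:", variance_partition_coefficient(1.0, 4.0))
print("estimated ICC:", round(variance_partition_coefficient(s2_u, s2_e), 3))
```

With only 50 neighborhoods the estimate can sit some distance from the true value of 0.2, which illustrates why the number of level-2 units matters for estimating between-neighborhood variance.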

Note that Eq. (C.7.3) estimates a variance based on the observed sample of neighborhoods. While this is important to establish the overall importance of neighborhoods as a unit or level, another quantity of interest may pertain to estimating whether living in neighborhood 1, as compared to neighborhood 3, for example, predicts a different health score conditional on compositional influences of covariates. Given Eq. (C.7.3), we can estimate for each level-2 unit

$$\hat{u}_{0j} = E(u_{0j} \mid Y, \hat{\beta}, \hat{\Omega}). \tag{C.7.6}$$

The quantities $\hat{u}_{0j}$ are referred to as estimated or predicted residuals, or, using Bayesian terminology, as posterior residual estimates, and are calculated as

$$\hat{u}_{0j} = r_j \times \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}/n_j} \tag{C.7.7}$$

where $\sigma^2_{u0}$ and $\sigma^2_{e0}$ are as defined above, $r_j$ is the mean of the individual-level raw residuals for neighborhood $j$, and $n_j$ is the number of individuals within neighborhood $j$. This formula for $\hat{u}_{0j}$ uses the level-1 and level-2 variances and the number of people observed in neighborhood $j$ to scale the observed level-2 residual $r_j$. As the level-1 variance declines or the sample size increases, the scale factor approaches one, and thus $\hat{u}_{0j}$ approaches $r_j$. These neighborhood-level residuals are random variables with a distribution whose parameter values tell us about the variation among the level-2 units (Goldstein 2003). Another interpretation is that each $\hat{u}_{0j}$ estimates neighborhood $j$'s departure from the expected mean outcome. This interpretation is based on the assumption that each neighborhood belongs to a population of neighborhoods, and the distribution of the population provides information about plausible values for neighborhood $j$ (Goldstein 2003). For a neighborhood with only a few individuals, we can obtain more precise estimates by combining the population and neighborhood-specific observations than if we were to ignore the population membership assumption and use only the information from that neighborhood.
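The scaling of the raw neighborhood residual can be checked numerically. The sketch below codes the scale factor from Eq. (C.7.7) directly; the function names and the variance values are illustrative assumptions, and in practice the variances would be estimates from a fitted model rather than known constants.

```python
def shrinkage_factor(sigma2_u0: float, sigma2_e0: float, n_j: int) -> float:
    """Scale factor sigma2_u0 / (sigma2_u0 + sigma2_e0 / n_j) from Eq. (C.7.7)."""
    return sigma2_u0 / (sigma2_u0 + sigma2_e0 / n_j)


def shrunken_residual(raw_mean_residual: float, sigma2_u0: float,
                      sigma2_e0: float, n_j: int) -> float:
    """Predicted neighborhood residual: the raw mean residual r_j scaled toward zero."""
    return raw_mean_residual * shrinkage_factor(sigma2_u0, sigma2_e0, n_j)


# Illustrative values (assumptions): between-neighborhood variance 1.0,
# within-neighborhood variance 4.0, raw mean residual r_j = 2.0.
for n_j in (2, 4, 20, 200):
    factor = shrinkage_factor(1.0, 4.0, n_j)
    u_hat = shrunken_residual(2.0, 1.0, 4.0, n_j)
    print(f"n_j={n_j:4d}  scale factor={factor:.3f}  predicted residual={u_hat:.3f}")
```

The same raw residual is pulled strongly toward zero when it rests on a handful of individuals and is left almost untouched when the neighborhood sample is large.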
When the estimated residuals at higher-level units are of interest in their own right, we need to provide standard errors, interval estimates and significance tests as well as point estimates for them (Goldstein 2003).

Modeling places: fixed or random? It is worth drawing parallels between the multilevel or random-effects model given by Eq. (C.7.3) and the conventional OLS or fixed-effects regression model. Consider the fixed-effects model, whereby the neighborhood effect is estimated by including a dummy for each neighborhood, as shown by

$$y_{ij} = \beta_0 + \beta x_{ij} + \beta_j N_j + e_{0ij} \tag{C.7.8}$$

where $N_j$ is a vector of dummy variables for $N - 1$ neighborhoods. The key conceptual difference between the fixed-effects and the random-effects approach to modeling neighborhoods is that while the fixed part coefficients are estimated separately, the random part differentials ($u_{0j}$) are conceptualized as coming from a distribution (Goldstein 2003). This conceptualization results in three practical benefits (Jones and Bullen 1994): (i) pooling information between neighborhoods, with all the information in the data being used in the combined estimation of the fixed and random part; in particular, the overall regression terms are based on the information for all neighborhoods; (ii) borrowing strength, whereby neighborhood-specific relations that are imprecisely estimated benefit from the information for other neighborhoods; and (iii) precision-weighted estimation, whereby unreliable neighborhood-specific fixed estimates are differentially down-weighted or shrunk toward the overall city-wide estimate. A reliably estimated within-neighborhood relation will be largely immune to this shrinkage. The random-effects and the fixed-effects estimates for each neighborhood are related (Jones and Bullen 1994). The neighborhood-specific random intercept ($\beta_{0j}$) in a multilevel model is a weighted combination of the specific neighborhood coefficient in a fixed-effects model ($\beta^*_{0j}$) and the overall multilevel intercept ($\beta_0$), in the following way

$$\beta_{0j} = w_j \beta^*_{0j} + (1 - w_j)\beta_0 \tag{C.7.9}$$

with the overall multilevel intercept being a weighted average of all the fixed intercepts

$$\beta_0 = \frac{\sum_j w_j \beta^*_{0j}}{\sum_j w_j}. \tag{C.7.10}$$

Each neighborhood weight is the ratio of the true between-neighborhood parameter variance to the total variance, which additionally includes sampling variance resulting from observing a sample from the neighborhood. Consequently, the weights represent the reliability or precision of the fixed terms

$$w_j = \frac{\sigma^2_{u0}}{\upsilon^2_j + \sigma^2_{u0}} \tag{C.7.11}$$

where the random sampling variance of the fixed parameter is

$$\upsilon^2_j = \frac{\sigma^2_{e0}}{n_j} \tag{C.7.12}$$

with $n_j$ being the number of observations within neighborhood $j$. When there are genuine differences between the neighborhoods and the sample sizes within a neighborhood are large, the sampling variance will be small in comparison to the total variance. As a result, the associated weight will be close to one, with the fixed neighborhood effect being reliably estimated, and the random-effect neighborhood estimate will be close to the fixed neighborhood effect. As the sampling variance increases, however, the weight will be less than one and the multilevel estimate will increasingly be influenced by the overall intercept based on pooling across neighborhoods. Shrinkage estimates allow the data to determine an appropriate compromise between specific estimates for different neighborhoods and the overall fixed estimate that pools information across places over the entire sample (Jones and Bullen 1994).

Importantly, the fixed-effects approach to modeling neighborhood differences using cross-sectional data is not a choice for a typical multilevel research question, where there is an intrinsic interest in an exposure measured at the level of neighborhood such as the one specified in Eq. (C.7.23). In such instances, a multilevel modeling approach is a necessity. This is because the dummy variables associated with the neighborhoods (measuring the fixed effects of each neighborhood) and the neighborhood exposure are perfectly confounded and, as such, the latter is not identifiable (Fielding 2004). Thus, the fixed-effects specification for understanding neighborhood differences is unsuitable for the sort of complex questions that multilevel modeling can address.

The random coefficient or random slopes model. We can expand the random structure in Eq. (C.7.3) by allowing the fixed effect of individual poverty ($\beta_1$) to vary randomly across neighborhoods in the following manner

$$y_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + e_{0ij}. \tag{C.7.13}$$

At level-2, there will now be two models

$$\beta_{0j} = \beta_0 + u_{0j} \tag{C.7.14}$$

$$\beta_{1j} = \beta_1 + u_{1j}. \tag{C.7.15}$$

Substituting the level-2 models in Eqs. (C.7.14) and (C.7.15) into the level-1 model in Eq. (C.7.13) gives:

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{0ij}). \tag{C.7.16}$$

Across neighborhoods, the mean health score for the non-poor is $\beta_0$, the mean health score for the poor is $\beta_0 + \beta_1$, and the mean poverty differential is $\beta_1$. The poverty differential is no longer constant across neighborhoods, but varies by the amount $u_{1j}$ around the mean, $\beta_1$. Such models are also referred to as random-slopes or random coefficient models. These models have a more complex variance-covariance structure than before

$$\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N\left(0, \begin{bmatrix} \sigma^2_{u0} & \\ \sigma_{u0u1} & \sigma^2_{u1} \end{bmatrix}\right) \tag{C.7.17}$$

$$e_{0ij} \sim N(0, \sigma^2_{e0}). \tag{C.7.18}$$

With this formulation, it is no longer straightforward to think in terms of a summary intraclass correlation statistic $\rho$, as the level-2 variation is now a function of an individual predictor variable, $x_{1ij}$. In our exemplification, when $x_{1ij}$ is a dummy variable, we will have two variances estimated at level-2, one for the non-poor, which is $\sigma^2_{u0}$, and one for the poor, which is

$$\sigma^2_{u0} + 2\sigma_{u0u1} x_{1ij} + \sigma^2_{u1} x^2_{1ij}. \tag{C.7.19}$$

More generally, level-2 variation is a quadratic function of the individual predictor variable when $x_{1ij}$ is a continuous predictor. Thus the notion of "random intercepts and slopes", while intuitive, is not entirely appropriate. Rather, what these models are really doing is modeling variance as some function (constant, linear or quadratic) of a predictor variable (Subramanian et al. 2003).

Building on the above perspective of modeling the variance-covariance function (as opposed to "random intercepts and slopes"), we can extend the concept to modeling the variance function at level-1. It is extremely common to assume that the variance is homoskedastic in the random part at level-1 [$\sigma^2_{e0}$; Eq. (C.7.16)], and indeed researchers seldom report whether this assumption was tested. One strategy would be to model different variances for the poor and the non-poor in the following form:

\[ y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{1ij} x_{1ij} + e_{2ij} x_{2ij}) \tag{C.7.20} \]

where $x_{1ij} = 0$ for the non-poor and one for the poor, and the new variable $x_{2ij} = 1$ for the non-poor and zero for the poor, with $\operatorname{var}(e_{1ij}) = \sigma^2_{e1}$ giving the variance for the poor, $\operatorname{var}(e_{2ij}) = \sigma^2_{e2}$ giving the variance for the non-poor, and $\operatorname{cov}(e_{1ij}, e_{2ij}) = 0$. There are other parsimonious ways to model level-1 variation in the presence of a number of predictor variables (Goldstein 2003; Subramanian et al. 2003). With this specification, we do not have an interpretation of the random level-1 coefficients as "random slopes" as we did at level-2. The level-1 parameters, $\sigma^2_{e1}$ and $\sigma^2_{e2}$, describe the complexity of level-1 variation, which is no longer homoskedastic (Goldstein 2003).
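The level-1 random part of Eq. (C.7.20), $e_{1ij} x_{1ij} + e_{2ij} x_{2ij}$, can be illustrated with a small simulation. The sketch below uses only the Python standard library; the standard deviations, the sample size and the seed are assumptions made for illustration, and the helper name is hypothetical.

```python
import random
import statistics


def level1_residual(poor, sd_e1, sd_e2, rng):
    """Level-1 random part e_1ij * x_1ij + e_2ij * x_2ij of Eq. (C.7.20)."""
    x1 = 1 if poor else 0   # indicator for the poor
    x2 = 1 - x1             # indicator for the non-poor
    e1 = rng.gauss(0.0, sd_e1)
    e2 = rng.gauss(0.0, sd_e2)
    return e1 * x1 + e2 * x2


# Hypothetical level-1 standard deviations (variances 9 and 4).
SD_E1, SD_E2 = 3.0, 2.0
rng = random.Random(2024)  # fixed seed so that the run is reproducible

poor_res = [level1_residual(True, SD_E1, SD_E2, rng) for _ in range(20000)]
nonpoor_res = [level1_residual(False, SD_E1, SD_E2, rng) for _ in range(20000)]

var_poor = statistics.pvariance(poor_res)
var_nonpoor = statistics.pvariance(nonpoor_res)

print("sample variance, poor:", round(var_poor, 2))
print("sample variance, non-poor:", round(var_nonpoor, 2))
```

The two sample variances should fall close to the assumed values of $\sigma^2_{e1}$ and $\sigma^2_{e2}$, which is the sense in which the two indicator variables let the level-1 variance differ by group.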
Anticipating and modeling heteroskedasticity or heterogeneity at the individual level may be important in multilevel analysis because there may be cross-level confounding: what appears to be neighborhood heterogeneity (level-2), to be explained by some ecological variable, could be due to a failure to take account of the between-individual (within-neighborhood) heterogeneity (level-1).

Modeling the fixed-effect of a neighborhood predictor. An attractive feature of multilevel models, and the one perhaps most commonly used in social science research, is their utility in modeling neighborhood and individual characteristics, and any interaction between them, simultaneously. We will consider the underlying level-2 model related to Eq. (C.7.20), which is exactly the same as specified in Eqs. (C.7.14) and (C.7.15), but now including a level-2 predictor $w_{1j}$, the deprivation index for neighborhood $j$:

\[ \beta_{0j} = \beta_0 + \alpha_1 w_{1j} + u_{0j} \tag{C.7.21} \]

\[ \beta_{1j} = \beta_1 + \alpha_2 w_{1j} + u_{1j}. \tag{C.7.22} \]
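To see what the two level-2 models in Eqs. (C.7.21) and (C.7.22) imply, the sketch below computes the neighborhood-specific intercept and poverty differential for two neighborhoods that differ by one unit of deprivation. All coefficient values are hypothetical, the function names are illustrative, and the random departures are set to zero so that only the fixed part is visible.

```python
def neighborhood_coefficients(w1, u0, u1, beta0, beta1, alpha1, alpha2):
    """Return (beta_0j, beta_1j) from Eqs. (C.7.21) and (C.7.22)."""
    beta0_j = beta0 + alpha1 * w1 + u0
    beta1_j = beta1 + alpha2 * w1 + u1
    return beta0_j, beta1_j


def mean_score(poor, coeffs):
    """Mean health score in a neighborhood: beta_0j + beta_1j * x, with x = 1 for the poor."""
    beta0_j, beta1_j = coeffs
    return beta0_j + beta1_j * poor


# Hypothetical fixed parameters on an arbitrary health-score scale.
FIXED = dict(beta0=50.0, beta1=-5.0, alpha1=-2.0, alpha2=-1.0)

# Two hypothetical neighborhoods, one unit of deprivation apart, no random departures.
low = neighborhood_coefficients(w1=0.0, u0=0.0, u1=0.0, **FIXED)
high = neighborhood_coefficients(w1=1.0, u0=0.0, u1=0.0, **FIXED)

print("low deprivation (beta_0j, beta_1j):", low)
print("high deprivation (beta_0j, beta_1j):", high)
print("change for non-poor:", mean_score(0, high) - mean_score(0, low))
print("change for poor:", mean_score(1, high) - mean_score(1, low))
```

Under these assumed values, a one-unit increase in deprivation changes the mean score of the non-poor by $\alpha_1$ and that of the poor by $\alpha_1 + \alpha_2$.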

Note that the separate specification of micro and macro models correctly recognizes that the contextual variables ($w_{1j}$) are predictors of between-neighborhood differences. The extension of Eq. (C.7.20) will now be

\[ y_{ij} = \beta_0 + \beta_1 x_{1ij} + \alpha_1 w_{1j} + \alpha_2 w_{1j} x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{1ij} x_{1ij} + e_{2ij} x_{2ij}). \tag{C.7.23} \]

The combined formulation in Eq. (C.7.23) highlights an important feature: the presence of an interaction between a level-2 and a level-1 predictor ($w_{1j} x_{1ij}$), represented by the fixed parameter $\alpha_2$. Now, $\alpha_1$ estimates the marginal change in health score for a unit change in the neighborhood deprivation index for the non-poor, and $\alpha_2$ estimates the extent to which the marginal change in health score for a unit change in the neighborhood deprivation index is different for the poor. This multilevel statistical formulation allows cross-level effect modification, or interaction between individual and neighborhood characteristics, to be robustly specified and estimated.

In summary, multilevel models are concerned with modeling both the average and the variation around the average, at different levels. To accomplish this they consist of two sets of parameters: those summarizing the average relationship(s), and those summarizing the variation around the average at both the level of individuals and neighborhoods. Models presented in the preceding section can be easily adapted to other structures with nesting of level-1 units within level-2 units. Additionally, these models can be extended to three or more levels.

While the preceding discussion considered a single normally distributed response variable for illustration, multilevel models are capable of handling a wide range of responses. These include binary outcomes and proportions (for example, logit, log-log, and probit models); multiple categories (for example, ordered and unordered multinomial models); and counts (for example, Poisson and negative binomial distribution models).
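As one illustration of a non-normal response, consider a binary outcome (say, reporting poor health) modeled with a two-level random-intercept logit model: $y_{ij} \sim \text{Bernoulli}(\pi_{ij})$ with $\text{logit}(\pi_{ij}) = \beta_0 + \beta_1 x_{1ij} + u_{0j}$ and $u_{0j} \sim N(0, \sigma^2_{u0})$. This particular specification, the outcome and every numerical value below are assumptions introduced for the sketch rather than part of the models above.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def prob_poor_health(poor, u0, beta0, beta1):
    """P(y_ij = 1) under logit(pi_ij) = beta0 + beta1 * x_1ij + u_0j."""
    return inverse_logit(beta0 + beta1 * poor + u0)


# Hypothetical fixed effects on the log-odds scale.
BETA0, BETA1 = -2.0, 1.0

# Three hypothetical neighborhood departures u_0j.
for u0 in (-0.5, 0.0, 0.5):
    p_nonpoor = prob_poor_health(0, u0, BETA0, BETA1)
    p_poor = prob_poor_health(1, u0, BETA0, BETA1)
    print(f"u0={u0:+.1f}  non-poor={p_nonpoor:.3f}  poor={p_poor:.3f}")
```

The neighborhood departure is additive on the log-odds scale, but on the probability scale the same $u_{0j}$ shifts the poor and the non-poor by different amounts.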
In essence, models for such responses work by assuming a specific, non-Gaussian distribution for the random part at level-1, while maintaining the normality assumptions for random parts at higher levels. Consequently, the discussion presented in this entry, which focuses on the neighborhood level, would continue to hold regardless of the nature of the response variable, with some exceptions. For instance, determining the intra-class correlation or partitioning variances across individual and neighborhood levels in complex non-linear multilevel logistic models is not straightforward (for details, see Browne et al. 2005; Goldstein et al. 2002).

C.7.7 Exploiting the flexibility of multilevel models to incorporate realistic complexity

Current implementations of multilevel models have generally failed to exploit the full capabilities of the analytical framework (Subramanian 2004a; Leyland 2005; Moon et al. 2005). Much, if not all, of the current research linking neighborhoods and health is cross-sectional and assumes a hierarchical structure of individuals nested within neighborhoods. This simplistic scenario ignores, for instance, the possibility that an individual might move several times and as such reflect neighborhood effects drawn from several contexts, or that other competing contexts (for example, schools, workplaces, hospital settings) may simultaneously contribute to contextual effects.

Figure C.7.2 provides a visual illustration of one complex but realistic multilevel structure for neighborhoods and health research, where time measurements (level-1) are nested within individuals (level-2), who are in turn nested within neighborhoods (level-3). Importantly, individuals are assigned different weights for the time spent in each neighborhood. For example, individual 5 moved from neighborhood one to neighborhood 5 during the time period $t_1$-$t_2$, spending 20 percent of her time in neighborhood one and 80 percent in her new neighborhood. This multiple membership design would allow control of changing context as well as changing composition. Such designs could be extended to incorporate memberships of additional contexts, such as workplaces or schools. They can also be extended to enable consideration of weighted effects of proximate contexts (Langford et al. 1998). So, for example, the geographic distribution of disease can be seen not only as a matter of composition and the immediate context in which an outcome occurs, but also as a consequence of the impact of nearby contexts, with nearer areas being more influential than more distant ones. This is also called spatial autocorrelation and forms an important area of spatial statistical research (Lawson 2001). While such analyses require high-quality longitudinal and context-referenced data, models that incorporate such realistic complexity (Best et al. 1996) are likely to improve our understanding of true neighborhood effects.

While the foregoing discussion provides a sound rationale for adopting a multilevel analytic approach for modeling ecologic effects, it obviously does not overcome the limitations intrinsic to any observational study design, single-level or multilevel.

Fig. C.7.2. Multilevel structure of repeated measurements of individuals over time across neighborhoods, with individuals having multiple membership of different neighborhoods across the time span. Source: Subramanian (2004b)
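The weighting idea behind the multiple membership design can be sketched numerically. A common way to write the neighborhood contribution for individual $i$ is the weighted sum $\sum_j w_{ij} u_j$, with the weights $w_{ij}$ summing to one; that formulation, the function name and the neighborhood effects below are assumptions made for illustration, while the 20/80 split is the example from the text.

```python
def multiple_membership_effect(weights, effects):
    """Weighted sum of neighborhood effects, sum_j w_ij * u_j."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("membership weights must sum to one")
    return sum(w * effects[nbhd] for nbhd, w in weights.items())


# Hypothetical neighborhood effects on the health score (not estimates).
effects = {"neighborhood 1": -2.0, "neighborhood 5": 3.0}

# The mover from the text: 20 percent of the period in neighborhood 1, 80 percent in neighborhood 5.
mover = {"neighborhood 1": 0.2, "neighborhood 5": 0.8}
# A hypothetical individual who stayed in neighborhood 1 throughout.
stayer = {"neighborhood 1": 1.0}

print("mover:", multiple_membership_effect(mover, effects))
print("stayer:", multiple_membership_effect(stayer, effects))
```

A purely hierarchical analysis would assign the mover entirely to one neighborhood, whereas the weighted version lets both contexts contribute in proportion to the time spent in each.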

C.7.8 Concluding remarks

The multilevel statistical approach, an approach that explicitly models the correlated nature of the data arising either from the sampling design or because populations are clustered, has a number of substantive and technical advantages. From a substantive perspective, it circumvents the problems associated with ecological fallacy (the invalid transfer of results observed at the ecological level to the individual level), individualistic fallacy (which occurs by failing to take into account the ecology or context within which individual relationships happen), and atomistic fallacy (which arises when associations between individual variables are used to make inferences about the association between the analogous variables at the group/ecological level). The issue common to these fallacies is the failure to recognize that unique relationships are observable at multiple levels, each being important in its own right. Specifically, one can think of an individual relationship (for example, individuals who are poor are more likely to have poor health), an ecological/contextual relationship (for example, places with a high proportion of poor individuals are more likely to have higher rates of poor health), and an individual-contextual relationship (for example, the greatest likelihood of being in poor health is found for poor individuals in places with a high proportion of poor people). Multilevel models explicitly recognize the level-contingent nature of relationships.

From a technical perspective, the multilevel approach enables researchers to obtain statistically efficient estimates of fixed-effects regression coefficients. Specifically, using the clustering information, multilevel models provide correct standard errors, and thereby robust confidence intervals and significance tests. These generally will be more conservative than the traditional ones that are obtained simply by ignoring the presence of clustering. More broadly, multilevel models allow a more appropriate and realistic specification of complex variance structures at each level. Multilevel models are also precision weighted and capitalize on the advantages that accrue as a result of pooling information from all the neighborhoods to make inferences about specific neighborhoods.

While the advances in statistical research and computing have shown the potential of multilevel methods for health and social behavioral research, there are issues to be considered when developing and interpreting multilevel applications. First, it is important to clearly motivate and conceptualize the choice of higher levels in a multilevel analysis. Second, establishing the relative importance of context and composition is probably more apparent than real, and necessary caution must be exercised while conceptualizing and interpreting the compositional and contextual sources of variation. Third, it is important that the sample of neighborhoods belongs to a well-defined population of neighborhoods, such that the sample shares exchangeable properties that are essential for robust inferences. Fourth, it is important to ensure adequate sample size at all levels of analysis. In general, if the research focus is essentially on neighborhoods, then clearly the analysis requires more neighborhoods (as compared to more individuals within a neighborhood).

Lastly, the ability of multilevel models to support causal inference is limited, and innovative strategies, including randomized neighborhood-level research designs (via trials or natural experiments) in combination with a multilevel analytical strategy, may be required to convincingly demonstrate causal effects of social contexts such as neighborhoods.

References

Aitkin M, Anderson DR, Hinde J (1981) Statistical modelling of data on teaching styles (with discussion). J Roy Stat Soc A 144(4):
Bennett N (1976) Teaching styles and pupil progress. Open Books, London
Best N, Spiegelhalter DJ, Thomas A, Brayne CEG (1996) Bayesian analysis of realistically complex models. J Roy Stat Soc A 159():3-34
Blakely TA, Subramanian SV (2006) Multilevel studies. In Oakes M, Kaufman J (eds) Methods for social epidemiology. Jossey Bass, San Francisco, pp
Blakely TA, Woodward AJ (2000) Ecological effects in multi-level studies. J Epid Comm Health 54(5):
Browne WJ, Subramanian SV, Jones K, Goldstein H (2005) Variance partitioning in multilevel logistic models that exhibit overdispersion. J Roy Stat Soc A 168(3):
Diez Roux AV (2002) A glossary for multilevel analysis. J Epid Comm Health 56(8):
Fielding A (2004) The role of the Hausman test and whether higher level effects should be treated as random or fixed. Multil Mode Newsl 16():3-9
Goldstein H (2003) Multilevel statistical models. Edward Arnold, London
Goldstein H, Browne WJ, Rasbash J (2002) Partitioning variation in multilevel models. Underst Stat 1(4):3-3
Jones K, Bullen N (1994) Contextual models of urban house prices: a comparison of fixed- and random-coefficient models developed by expansion. Econ Geogr 70(3):5-7
Kawachi I, Subramanian SV (2006) Measuring and modeling the social and geographic context of trauma: a multilevel modeling approach. J Trauma Stress 19():
Kolaczyk ED, Ju J, Gopal S (2005) Multiscale, multigranular statistical image segmentation. J Am Stat Assoc 100:
Langford IH, Bentham G, McDonald AL (1998) Multilevel modelling of geographically aggregated health data: a case study on malignant melanoma mortality and UV exposure in the European Community. Stat Med 17(1):41-57
Lawson AB (2001) Statistical methods in spatial epidemiology (2nd edition). Wiley, New York, Chichester, Toronto and Brisbane
Leyland AH (2005) Assessing the impact of mobility on health: implications for life course epidemiology. J Epid Comm Health 59():90-91
Moon G, Subramanian SV, Jones K, Duncan C, Twigg L (2005) Area-based studies and the evaluation of multilevel influences on health outcomes. In Bowling A, Ebrahim S (eds) Handbook of health research methods: investigation, measurement and analysis. Open University Press, Berkshire [UK], pp 66-9
Raudenbush SW (2003) The quantitative assessment of neighborhood social environment. In Kawachi I, Berkman LF (eds) Neighborhoods and health. Oxford University Press, New York, pp
Raudenbush SW, Bryk A (2002) Hierarchical linear models: applications and data analysis methods. Sage, Thousand Oaks [CA]
Subramanian SV (2004a) Multilevel methods, theory and analysis. In Anderson N (ed) Encyclopedia on health and behavior. Sage, Thousand Oaks [CA], pp
Subramanian SV (2004b) The relevance of multilevel statistical methods for identifying causal neighborhood effects. Soc Sci Med 58(10):
Subramanian SV, Glymour MM, Kawachi I (2007) Identifying causal ecologic effects on health: a methodologic assessment. In Galea S (ed) Macrosocial determinants of population health. Springer, New York, pp
Subramanian SV, Jones K, Duncan C (2003) Multilevel methods for public health research. In Kawachi I, Berkman LF (eds) Neighborhoods and health. Oxford University Press, New York, pp
Subramanian SV, Jones K, Kaddour A, Krieger N (2009) Revisiting Robinson: the perils of individualistic and ecologic fallacy. Int J Epidem 38():34-360


More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 10: Introduction to inference (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 17 What is inference? 2 / 17 Where did our data come from? Recall our sample is: Y, the vector

More information

Placebo and Belief Effects: Optimal Design for Randomized Trials

Placebo and Belief Effects: Optimal Design for Randomized Trials Placebo and Belief Effects: Optimal Design for Randomized Trials Scott Ogawa & Ken Onishi 2 Department of Economics Northwestern University Abstract The mere possibility of receiving a placebo during a

More information

An Instrumental Variable Consistent Estimation Procedure to Overcome the Problem of Endogenous Variables in Multilevel Models

An Instrumental Variable Consistent Estimation Procedure to Overcome the Problem of Endogenous Variables in Multilevel Models An Instrumental Variable Consistent Estimation Procedure to Overcome the Problem of Endogenous Variables in Multilevel Models Neil H Spencer University of Hertfordshire Antony Fielding University of Birmingham

More information

Chapter 13 Estimating the Modified Odds Ratio

Chapter 13 Estimating the Modified Odds Ratio Chapter 13 Estimating the Modified Odds Ratio Modified odds ratio vis-à-vis modified mean difference To a large extent, this chapter replicates the content of Chapter 10 (Estimating the modified mean difference),

More information

Mixed Effect Modeling. Mixed Effects Models. Synonyms. Definition. Description

Mixed Effect Modeling. Mixed Effects Models. Synonyms. Definition. Description ixed Effects odels 4089 ixed Effect odeling Hierarchical Linear odeling ixed Effects odels atthew P. Buman 1 and Eric B. Hekler 2 1 Exercise and Wellness Program, School of Nutrition and Health Promotion

More information

Donna L. Coffman Joint Prevention Methodology Seminar

Donna L. Coffman Joint Prevention Methodology Seminar Donna L. Coffman Joint Prevention Methodology Seminar The purpose of this talk is to illustrate how to obtain propensity scores in multilevel data and use these to strengthen causal inferences about mediation.

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Ecological Statistics

Ecological Statistics A Primer of Ecological Statistics Second Edition Nicholas J. Gotelli University of Vermont Aaron M. Ellison Harvard Forest Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Brief Contents

More information

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER Introduction, 639. Factor analysis, 639. Discriminant analysis, 644. INTRODUCTION

More information

Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model

Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model Delia North Temesgen Zewotir Michael Murray Abstract In South Africa, the Department of Education allocates

More information

IAPT: Regression. Regression analyses

IAPT: Regression. Regression analyses Regression analyses IAPT: Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a student project

More information

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

Supplement 2. Use of Directed Acyclic Graphs (DAGs) Supplement 2. Use of Directed Acyclic Graphs (DAGs) Abstract This supplement describes how counterfactual theory is used to define causal effects and the conditions in which observed data can be used to

More information

Lecture 21. RNA-seq: Advanced analysis

Lecture 21. RNA-seq: Advanced analysis Lecture 21 RNA-seq: Advanced analysis Experimental design Introduction An experiment is a process or study that results in the collection of data. Statistical experiments are conducted in situations in

More information

Methods for Addressing Selection Bias in Observational Studies

Methods for Addressing Selection Bias in Observational Studies Methods for Addressing Selection Bias in Observational Studies Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA What is Selection Bias? In the regression

More information

Multivariate meta-analysis for non-linear and other multi-parameter associations

Multivariate meta-analysis for non-linear and other multi-parameter associations Research Article Received 9 August 2011, Accepted 11 May 2012 Published online 16 July 2012 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.5471 Multivariate meta-analysis for non-linear

More information

Simultaneous Equation and Instrumental Variable Models for Sexiness and Power/Status

Simultaneous Equation and Instrumental Variable Models for Sexiness and Power/Status Simultaneous Equation and Instrumental Variable Models for Seiness and Power/Status We would like ideally to determine whether power is indeed sey, or whether seiness is powerful. We here describe the

More information

The University of North Carolina at Chapel Hill School of Social Work

The University of North Carolina at Chapel Hill School of Social Work The University of North Carolina at Chapel Hill School of Social Work SOWO 918: Applied Regression Analysis and Generalized Linear Models Spring Semester, 2014 Instructor Shenyang Guo, Ph.D., Room 524j,

More information

Causal Mediation Analysis with the CAUSALMED Procedure

Causal Mediation Analysis with the CAUSALMED Procedure Paper SAS1991-2018 Causal Mediation Analysis with the CAUSALMED Procedure Yiu-Fai Yung, Michael Lamm, and Wei Zhang, SAS Institute Inc. Abstract Important policy and health care decisions often depend

More information

An Introduction to Multilevel Regression Models

An Introduction to Multilevel Regression Models A B S T R A C T Data in health research are frequently structured hierarchically. For example, data may consist of patients nested within physicians, who in turn may be nested in hospitals or geographic

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN)

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN) UNIT 4 OTHER DESIGNS (CORRELATIONAL DESIGN AND COMPARATIVE DESIGN) Quasi Experimental Design Structure 4.0 Introduction 4.1 Objectives 4.2 Definition of Correlational Research Design 4.3 Types of Correlational

More information

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA PART 1: Introduction to Factorial ANOVA ingle factor or One - Way Analysis of Variance can be used to test the null hypothesis that k or more treatment or group

More information

The Impact of Relative Standards on the Propensity to Disclose. Alessandro Acquisti, Leslie K. John, George Loewenstein WEB APPENDIX

The Impact of Relative Standards on the Propensity to Disclose. Alessandro Acquisti, Leslie K. John, George Loewenstein WEB APPENDIX The Impact of Relative Standards on the Propensity to Disclose Alessandro Acquisti, Leslie K. John, George Loewenstein WEB APPENDIX 2 Web Appendix A: Panel data estimation approach As noted in the main

More information

Cross-Lagged Panel Analysis

Cross-Lagged Panel Analysis Cross-Lagged Panel Analysis Michael W. Kearney Cross-lagged panel analysis is an analytical strategy used to describe reciprocal relationships, or directional influences, between variables over time. Cross-lagged

More information

Statistical Techniques. Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview

Statistical Techniques. Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview 7 Applying Statistical Techniques Meta-Stat provides a wealth of statistical tools to help you examine your data. Overview... 137 Common Functions... 141 Selecting Variables to be Analyzed... 141 Deselecting

More information

The Regression-Discontinuity Design

The Regression-Discontinuity Design Page 1 of 10 Home» Design» Quasi-Experimental Design» The Regression-Discontinuity Design The regression-discontinuity design. What a terrible name! In everyday language both parts of the term have connotations

More information

Structural Equation Modeling (SEM)

Structural Equation Modeling (SEM) Structural Equation Modeling (SEM) Today s topics The Big Picture of SEM What to do (and what NOT to do) when SEM breaks for you Single indicator (ASU) models Parceling indicators Using single factor scores

More information

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 3: Overview of Descriptive Statistics October 3, 2005 Lecture Outline Purpose

More information

Prediction, Causation, and Interpretation in Social Science. Duncan Watts Microsoft Research

Prediction, Causation, and Interpretation in Social Science. Duncan Watts Microsoft Research Prediction, Causation, and Interpretation in Social Science Duncan Watts Microsoft Research Explanation in Social Science: Causation or Interpretation? When social scientists talk about explanation they

More information

Discriminant Analysis with Categorical Data

Discriminant Analysis with Categorical Data - AW)a Discriminant Analysis with Categorical Data John E. Overall and J. Arthur Woodward The University of Texas Medical Branch, Galveston A method for studying relationships among groups in terms of

More information

In this chapter we discuss validity issues for quantitative research and for qualitative research.

In this chapter we discuss validity issues for quantitative research and for qualitative research. Chapter 8 Validity of Research Results (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.) In this chapter we discuss validity issues for

More information

Estimating Heterogeneous Choice Models with Stata

Estimating Heterogeneous Choice Models with Stata Estimating Heterogeneous Choice Models with Stata Richard Williams Notre Dame Sociology rwilliam@nd.edu West Coast Stata Users Group Meetings October 25, 2007 Overview When a binary or ordinal regression

More information

Meta-Analysis and Subgroups

Meta-Analysis and Subgroups Prev Sci (2013) 14:134 143 DOI 10.1007/s11121-013-0377-7 Meta-Analysis and Subgroups Michael Borenstein & Julian P. T. Higgins Published online: 13 March 2013 # Society for Prevention Research 2013 Abstract

More information

Small-area estimation of mental illness prevalence for schools

Small-area estimation of mental illness prevalence for schools Small-area estimation of mental illness prevalence for schools Fan Li 1 Alan Zaslavsky 2 1 Department of Statistical Science Duke University 2 Department of Health Care Policy Harvard Medical School March

More information

Chapter 5: Field experimental designs in agriculture

Chapter 5: Field experimental designs in agriculture Chapter 5: Field experimental designs in agriculture Jose Crossa Biometrics and Statistics Unit Crop Research Informatics Lab (CRIL) CIMMYT. Int. Apdo. Postal 6-641, 06600 Mexico, DF, Mexico Introduction

More information

Fuzzy-set Qualitative Comparative Analysis Summary 1

Fuzzy-set Qualitative Comparative Analysis Summary 1 Fuzzy-set Qualitative Comparative Analysis Summary Arthur P. Tomasino Bentley University Fuzzy-set Qualitative Comparative Analysis (fsqca) is a relatively new but accepted method for understanding case-based

More information

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA Data Analysis: Describing Data CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA In the analysis process, the researcher tries to evaluate the data collected both from written documents and from other sources such

More information

PLANNING THE RESEARCH PROJECT

PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm page 1 Part I PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm

More information

1 The conceptual underpinnings of statistical power

1 The conceptual underpinnings of statistical power 1 The conceptual underpinnings of statistical power The importance of statistical power As currently practiced in the social and health sciences, inferential statistics rest solidly upon two pillars: statistical

More information

The role of sampling assumptions in generalization with multiple categories

The role of sampling assumptions in generalization with multiple categories The role of sampling assumptions in generalization with multiple categories Wai Keen Vong (waikeen.vong@adelaide.edu.au) Andrew T. Hendrickson (drew.hendrickson@adelaide.edu.au) Amy Perfors (amy.perfors@adelaide.edu.au)

More information

Data Sources & Issues for Health Inequalities Research. J. Dunn

Data Sources & Issues for Health Inequalities Research. J. Dunn Data Sources & Issues for Health Inequalities Research J. Dunn Background & Introduction major challenge to find secondary data sources that are compatible with research questions in many instances, data

More information

Sample size calculation for a stepped wedge trial

Sample size calculation for a stepped wedge trial Baio et al. Trials (2015) 16:354 DOI 10.1186/s13063-015-0840-9 TRIALS RESEARCH Sample size calculation for a stepped wedge trial Open Access Gianluca Baio 1*,AndrewCopas 2, Gareth Ambler 1, James Hargreaves

More information

Many studies conducted in practice settings collect patient-level. Multilevel Modeling and Practice-Based Research

Many studies conducted in practice settings collect patient-level. Multilevel Modeling and Practice-Based Research Multilevel Modeling and Practice-Based Research L. Miriam Dickinson, PhD 1 Anirban Basu, PhD 2 1 Department of Family Medicine, University of Colorado Health Sciences Center, Aurora, Colo 2 Section of

More information

Clinical Trials A Practical Guide to Design, Analysis, and Reporting

Clinical Trials A Practical Guide to Design, Analysis, and Reporting Clinical Trials A Practical Guide to Design, Analysis, and Reporting Duolao Wang, PhD Ameet Bakhai, MBBS, MRCP Statistician Cardiologist Clinical Trials A Practical Guide to Design, Analysis, and Reporting

More information

In this module I provide a few illustrations of options within lavaan for handling various situations.

In this module I provide a few illustrations of options within lavaan for handling various situations. In this module I provide a few illustrations of options within lavaan for handling various situations. An appropriate citation for this material is Yves Rosseel (2012). lavaan: An R Package for Structural

More information

CHAPTER 6. Conclusions and Perspectives

CHAPTER 6. Conclusions and Perspectives CHAPTER 6 Conclusions and Perspectives In Chapter 2 of this thesis, similarities and differences among members of (mainly MZ) twin families in their blood plasma lipidomics profiles were investigated.

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information