C.7 Multilevel Modeling


S.V. Subramanian

C.7.1 Introduction

Individuals are organized within a nearly infinite number of levels of organization: from the individual up (for example, families, neighborhoods, counties, states), from the individual down (for example, body organs, cellular matrices, DNA), and across overlapping units (for example, area of residence and work environment). It is necessary, therefore, to make links between these possible levels of analysis. The term multilevel refers to the distinct levels or units of analysis, which usually, but not always, consist of individuals (at the lower level) nested within contextual/aggregate units (at the higher level). Multilevel methods consist of statistical procedures that are pertinent when (i) the observations being analyzed are correlated or clustered, (ii) the causal processes are thought to operate simultaneously at more than one level, and/or (iii) there is an intrinsic interest in describing the variability and heterogeneity in the phenomenon, over and above the focus on the average (Diez Roux 2002; Subramanian et al. 2003; Subramanian 2004a, 2004b). Multilevel statistical models are often used in areas such as image processing and remote sensing (Kolaczyk et al. 2005).

Multilevel methods are specifically geared towards the statistical analysis of data that have a nested structure. The nesting is typically, but not always, hierarchical. For instance, a two-level structure would have many level-1 units nested within a smaller number of level-2 units. In educational research, the field that provided the impetus for multilevel methods, level-1 usually consists of pupils who are nested within schools at level-2. Such structures arise routinely in the health and social sciences, where the level-1 and level-2 units could be workers and organizations, patients and hospitals, or individuals and neighborhoods, respectively.
In this chapter, for exemplification, we will consider the structure of individuals nested within neighborhoods (used to reflect one practical realization of place). The existence of nested data structures is neither random nor ignorable; for instance, individuals differ, but so do neighborhoods. Differences among neighborhoods could be directly due to the differences among the individuals who live in them; alternatively, groupings based on neighborhoods may arise for reasons less strongly associated with the characteristics of the individuals who live in them. Regardless, once such groupings are established, even if their establishment is random, they will tend to become differentiated. This implies that the group (for example, the neighborhood) and its members (for example, individual residents) can exert influence on each other, suggesting different sources of variation (for example, individual-induced and neighborhood-induced) in the outcome of interest, and thus compelling analysts to consider covariates at both the individual and the neighborhood level. Ignoring this multilevel structure of variation not only risks overlooking the importance of neighborhood effects, but also has implications for statistical validity.

To put this in perspective, in an influential study of progress among primary school children, Bennett (1976), using single-level multiple regression analysis, claimed that children exposed to a formal style of teaching exhibited more progress than those who were not. The analysis, while recognizing individual children as units of analysis, ignored their grouping into teachers/classes. In what was the first important example of multilevel analysis using social science data, Aitkin et al. (1981) reanalyzed the data and demonstrated that when the analysis properly accounted for the grouping of children (at the lower level) into classes (at the higher level), the progress of formally taught children could not be shown to differ significantly from that of the others. What was occurring here was that children within any one class/teacher, because they were taught together, tended to be similar in their performance, thereby providing much less information than would have been the case if the same number of children had been taught separately. More formally, the individual samples (for example, children) were correlated or clustered.
Such clustered samples do not contain as much information as simple random samples of similar size. As Aitkin et al. (1981) showed, ignoring this autocorrelation and clustering resulted in an increased risk of finding differences and relationships where none existed. Clustered data also arise as a result of sampling strategies. For instance, when planning large-scale survey data collection, it is usual, for reasons of cost and efficiency, to adopt a multistage sampling design. A national population survey, for example, might involve a three-stage design, with regions sampled first, then neighborhoods, and then individuals. A design of this kind generates a three-level hierarchically clustered structure of individuals at level-1 nested within neighborhoods at level-2, which in turn are nested in regions at level-3. Individuals living in the same neighborhood can be expected to be more alike than they would be if the sample were truly random. Similar correlation can be expected for neighborhoods within a region. Much documentation exists on measuring this design effect and correcting for it. Indeed, clustered designs (for example, individuals at level-1 nested in neighborhoods at level-2 nested in regions at level-3) are often treated as a nuisance in traditional analysis. However, individuals, neighborhoods and regions can be seen as distinct structures that exist in the population and that should be measured and modeled.
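The loss of information from clustering is commonly summarized by Kish's approximate design effect for equal-sized clusters, deff = 1 + (m - 1)ρ, where m is the number of individuals sampled per cluster and ρ is the intra-class correlation defined later in Eq. (C.7.5); dividing the nominal sample size by deff gives an effective sample size. This formula is a standard survey-sampling approximation rather than something derived in this chapter, the sketch assumes equal cluster sizes, and the survey figures in it are hypothetical.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Kish's approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total: int, cluster_size: int, icc: float) -> float:
    """Number of independent observations carrying roughly the same information."""
    return n_total / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical survey: 50 neighborhoods x 20 individuals, rho = 0.05.
    n, m, rho = 1000, 20, 0.05
    print(f"design effect: {design_effect(m, rho):.2f}")             # 1.95
    print(f"effective n:   {effective_sample_size(n, m, rho):.1f}")  # 512.8
```

With ρ = 0.05, each cluster of 20 people carries roughly the information of 10 independent observations, which is the sense in which clustered samples contain less information than simple random samples of similar size.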

C.7.2 Multilevel framework: A necessity for understanding ecological effects

Figure C.7.1 identifies a typology of designs for data collection and analysis (Blakely and Woodward 2000; Kawachi and Subramanian 2006; Subramanian et al. 2007), where the rows indicate the level or unit at which the outcome variable is measured [that is, at the individual level (y) or the ecological level (Y)], and the columns indicate whether the exposure is measured at the individual level (x) or the ecological level (X). The ecological level, in this illustration, relates to the neighborhood level. Study-type (y, x) is most commonly encountered when the researcher aims to link exposure to outcomes, with both being measured at the individual level. Study-type (y, x) typically ignores ecological effects (either implicitly or explicitly).

Fig. C.7.1. Typology of studies (Subramanian et al. 2007)

                                        Exposure
Outcome                                 Individual (x)                  Ecologic (X)
Individual (y)                          (y, x)                          (y, X)
(measured at individual level)          Traditional risk factor study   Multilevel study
Ecologic (Y)                            (Y, x)a                         (Y, X)
(measured at ecological level)                                          Ecological study

Notes: a This type of study is impossible to specify as it stands. Practically speaking, it will either take the form of (Y, X), that is, an ecological study, where X will now simply be the central tendency of x; or, if disaggregation of Y is possible, so that we can observe y, it will be equivalent to (y, x).
Source: (Subramanian et al. 2008)

Conversely, study-type (Y, X), referred to as an ecological study, may seem intuitively appropriate for research where higher levels (for instance, neighborhoods, regions, states, schools and so on) are the targets of interest. However, study-type (Y, X) conflates the genuinely ecological and the aggregate or compositional (Moon et al. 2005), and precludes the possibility of testing heterogeneous contextual effects on different types of individuals.
Ecological effects reflect predictors and associated mechanisms operating primarily at the contextual level. The search for such measures and their scientific validation and assessment is an area of active research (Raudenbush 2003). Aggregate effects, in contrast, equate the effect of a neighborhood with the sum of the individual effects associated with the people living within the neighborhood. In this situation the interpretative question becomes particularly relevant: if common membership of a neighborhood by a set of individuals brings about an effect over and above those resulting from individual characteristics, then there may indeed be an ecological effect.

Study-type (y, X) provides a multilevel approach in which an ecological exposure is linked to an individual outcome. A more complete representation would be type (y, x, X), whereby we have an individual outcome (y), individual confounders (x), and a neighborhood exposure (X), reflecting a multilevel structure of individuals nested within neighborhoods. A fundamental motivation for study-type (y, x, X) is to distinguish neighborhood differences from the difference a neighborhood makes (Moon et al. 2005). Stated differently, ecological effects on the individual outcome should be ascertained after individual factors that reflect the composition of the places (and may be potential confounders) have been controlled. Indeed, compositional explanations for ecological variations in health are common. It nonetheless makes intuitive sense to test for the possibility of ecological effects. Besides having an impact on individual outcomes, compositional factors may themselves vary by context. Thus, unless contextual variables are considered, their direct effects and any indirect effects mediated through compositional variables remain unidentified. Moreover, composition itself has an intrinsic ecologic dimension; the very fact that individual (compositional) factors may explain ecologic variations serves as a reminder that a real understanding of ecologic effects is likely to be complex.
The multilevel framework, with its simultaneous examination of the characteristics of individuals at one level and of the contexts or ecologies in which they are located at another level, accordingly offers a comprehensive framework for understanding the ways in which places can affect people (contextual) and/or people can affect places (composition). It likewise allows for a more precise distinction between the aggregative fallacy and ecologic effects (Subramanian et al. 2008).

C.7.3 A typology of multilevel data structures

The idea of multilevel structure can be recast, with great advantage, to address a range of circumstances where one may anticipate clustering. Outcomes as well as their causal mechanisms are rarely stable and invariant over time, producing data structures that involve repeated measures, which can be considered a special case of multilevel clustered data structures. Consider the repeated cross-sectional design, which can be structured in multilevel terms with neighborhoods at level-3, year/time at level-2 and individuals at level-1. In this example, level-2 represents repeated measurements on the neighborhoods (level-3) over time. Such a structure can be used to investigate what sorts of individuals and what sorts of neighborhoods have changed with respect to the outcome. Alternatively, there is the classic longitudinal or panel design, in which level-1 is the measurement occasion, level-2 is the individual and level-3 is the neighborhood. This time, the individuals are repeatedly measured at different time intervals, so that it becomes possible to model changing individual behaviors within a contextual setting of, say, neighborhoods.

When different responses/outcomes are correlated, this lends itself to a multivariate multilevel data structure in which level-1 consists of sets of response variables measured on individuals at level-2 nested in neighborhoods at level-3. The multivariate responses could be, for instance, different aspects of health behavior (for example, smoking and drinking). In addition, such responses could be a mixture of quality (do you smoke/do you drink) and quantity (how many/how much), producing mixed multivariate responses. The substantive benefit of this approach is that it is possible to assess whether different types of behavior, and the qualitative and quantitative aspects of each behavior, are related to individual characteristics in the same or different ways. Additionally, we can ascertain whether neighborhoods that are high for one behavior are also high for another, and whether neighborhoods with a high prevalence of smoking, for instance, are also high in terms of the number of cigarettes smoked.

While the previous examples are strictly hierarchical, in that all level-1 units that form a level-2 grouping are always in the same group at any higher level, data structures can also be non-hierarchical. For example, a model of health behavior (for instance, smoking) could be formulated with individuals at level-1 and both residential neighborhoods and workplaces at level-2, not nested but crossed; such structures are called cross-classified structures. Individuals are then seen as occupying more than one set of contexts, each of which may have an important influence.
For instance, individuals in a particular workplace may come from different neighborhoods, and individuals in a neighborhood may go to several worksites. A related structure occurs where, for a single level-2 classification (for example, neighborhoods), level-1 units (for example, individuals) may belong to more than one level-2 unit; these are referred to as multiple membership designs. The individual can be considered to belong simultaneously to several neighborhoods, with the contribution of each neighborhood weighted in relation to its distance (if the interest is spatial) from the individual. In summary, combinations of hierarchical structures, cross-classified nesting and multiple membership capture a great deal of the complexity that is imprinted, either explicitly or implicitly, in data, and all of these can be incorporated via multilevel models.

C.7.4 The distinction between levels and variables

Each of the levels discussed in the previous section (for example, neighborhoods) can be considered as a variable in a regression equation, with an indicator variable specified for each neighborhood. Conversely, why are many categorical variables, such as gender, ethnicity/race and social class, not a level? What is critical to treating neighborhoods, for example, as a level is that neighborhoods are treated as a population of units from which we have observed one random sample. This enables us to draw generalizations for a particular level (for example, neighborhoods) based on an observed sample of neighborhoods. Further, it is more efficient to model neighborhoods as a random variable given the (likely) large number of neighborhoods. On the other hand, gender, for instance, is not a level because it is not a sample out of all possible gender categories. Rather, it is an attribute of individuals. Thus, male and female in our gender example are fixed discrete categories of a variable, with the specific categories contributing only to their respective means. They are not a random sample of gender categories from a population of gender groupings. Further, we would usually wish to ascribe a fixed effect to each gender, but not to each neighborhood; rather, we wish to model an ecologic attribute at the neighborhood level. It is possible to consider levels as variables. When neighborhoods are considered as a variable, however, they are typically treated as a fixed classification. While this may be useful in certain circumstances, doing so robs the researcher of the ability to generalize to all neighborhoods, and inferences are only possible for the specific neighborhoods observed in the sample.

C.7.5 Multilevel analysis

There are three constitutive components of multilevel analysis, which are now discussed.

Evaluating sources of variation: Compositional and/or contextual. A fundamental application of multilevel methods is disentangling the different sources of variation in the outcome. Variations in poor health between different neighborhoods, for example, can be due to factors that are intrinsic to, and are measured at, the neighborhood level. In other words, the variation is due to what can be described as contextual, or neighborhood, effects.
Alternatively, variations between neighborhoods may be compositional; that is, certain types of people who are more likely to be in poor health due to their individual characteristics happen to be clustered in certain neighborhoods. The issue, therefore, is not whether variations between different neighborhoods exist (they usually do), but what the primary source of these variations is. Put simply, are there significant contextual differences in health between neighborhoods after taking into account the individual compositional characteristics of the neighborhoods? The notions of contextual and compositional sources of variation have general relevance, and they are applicable whether the context is administrative (for example, political boundaries), temporal (for example, different time periods) or institutional (for example, schools or hospitals).

Describing contextual heterogeneity. Contextual differences may be complex, such that they are not the same for all types of people. Describing such contextual heterogeneity is another aspect of multilevel analysis and can have two interpretative dimensions. First, there may be a different amount of neighborhood variation: for example, for high social class individuals it may not matter in which neighborhood they live (thus a lower between-neighborhood variation), whereas it matters a great deal for the low social class, which accordingly shows a large between-neighborhood variation. Second, there may be a differential ordering: neighborhoods that are high for one group are low for the other, and vice versa. Stated simply, the multilevel analytical question is whether the contextual neighborhood differences in poor health, after taking into account the individual composition of the neighborhood, are different for different types of population groups.

Characterizing and explaining the contextual variations. Contextual differences, in addition to people's characteristics, may also be influenced by the different characteristics of neighborhoods. Stated differently, individual differences may interact with context, and ascertaining the relative importance of individual and neighborhood covariates is another key aspect of a multilevel analysis. For example, over and above social class (an individual characteristic), health may depend upon the poverty level of the neighborhood (a neighborhood characteristic). The contextual effect of poverty can either be the same for both the high and low social class, suggesting that while neighborhood poverty explains the prevalence of poor health, it does not influence the social class inequalities in health. On the other hand, the contextual effects of poverty may be different for different groups, such that neighborhood poverty adversely affects the low social class but does the opposite for the high social class. Thus, neighborhood-level poverty may not only be related to average health achievements but also shape social inequalities in health. The analytical question of interest is whether the effect of neighborhood-level socioeconomic characteristics on health is different for different types of people.
In the presence of multilevel data, as described in Section C.7.3, and with motivations as discussed above, there are substantive as well as technical reasons to use multilevel statistical models to analyze such data (Raudenbush and Bryk 2002; Goldstein 2003). We shall not review the basic principles of multilevel modeling here, as they have been described elsewhere in the context of health research (Subramanian et al. 2003; Moon et al. 2005; Blakely and Subramanian 2006), but rather provide a brief overview of the types of models invoked for identifying the ecologic effects discussed in this section.

C.7.6 Multilevel statistical models

Like all statistical regression equations, multilevel models have the same underlying function, which can be expressed as:

RESPONSE = FIXED/AVERAGE PARAMETERS + (RANDOM/VARIANCE PARAMETERS).

While in a conventional regression model the random part of the model is usually restricted to a single term (called the error term or residual), in the multilevel regression model the focus is on expanding the random part of the statistical model. To exemplify multilevel models we consider the following example. Suppose we are interested in studying the variation in health score as a function of certain individual and neighborhood predictors. Let us assume that the researcher collected data on a sample of 50 neighborhoods and, for each of these neighborhoods, a random sample of individuals. We then have a two-level structure where the outcome is a health score (with a higher score indicating better health), $y_{ij}$, for individual $i$ in neighborhood $j$. We will restrict this exemplification to one individual-level predictor, poverty, $x_{1ij}$, coded zero if not poor and one if poor, for every individual $i$ in neighborhood $j$; and one neighborhood predictor, $w_{1j}$, a socioeconomic deprivation index for neighborhood $j$.

Variance component or random intercepts model. Multilevel models operate by developing regression equations at each level of analysis. In the illustration considered here, models have to be specified at two levels, level-1 and level-2. The model at level-1 can be formally expressed as

$$ y_{ij} = \beta_{0j} + \beta_1 x_{1ij} + e_{0ij} \tag{C.7.1} $$

where $\beta_{0j}$ (associated with a constant, $x_{0ij}$, which is a set of ones and is therefore not written) is the mean health score for the $j$th neighborhood for the non-poor group; $\beta_1$ is the average differential in health score associated with individual poverty status ($x_{1ij}$) across all neighborhoods; and $e_{0ij}$ is the individual or level-1 residual term. To make this a genuine two-level model we let $\beta_{0j}$ become a random variable,

$$ \beta_{0j} = \beta_0 + u_{0j} \tag{C.7.2} $$

where $u_{0j}$ is the random neighborhood-specific displacement associated with the overall mean health score ($\beta_0$) for the non-poor group.
Since we do not allow, at this stage, the average differential between the poor and non-poor groups ($\beta_1$) to vary across neighborhoods, $u_{0j}$ is assumed to be the same for both groups. Equation (C.7.2) is then the level-2 between-neighborhood model. It is worth emphasizing that the neighborhood effect $u_{0j}$ can be treated in one of two ways. One can estimate each neighborhood separately as a fixed effect (that is, treat neighborhoods as a variable; with 50 neighborhoods there will be 49 additional parameters to be estimated). Such a strategy may be appropriate if the interest is in making inferences about just those sampled neighborhoods. On the other hand, if neighborhoods are treated as a (random) sample from a population of neighborhoods (which might include neighborhoods in future studies if one has complete population data), the target of inference is the variation between neighborhoods in general. Adopting this multilevel statistical approach makes $u_{0j}$ a random variable at level-2 in a two-level statistical model. Substituting Eq. (C.7.2) into Eq. (C.7.1) and grouping the terms into fixed and random part components (the latter shown in brackets) yields the following random-intercepts or variance components model:

$$ y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + e_{0ij}) \tag{C.7.3} $$

We have now expressed the response $y_{ij}$ as the sum of a fixed part and a random part. Assuming normal distributions with zero mean, we can estimate a variance at level-1 ($\sigma^2_{e0}$: the between-individual, within-neighborhood variation) and at level-2 ($\sigma^2_{u0}$: the between-neighborhood variation), both conditional on the fixed poverty differences in health score. It is the presence of more than one residual term (or, more generally, the structure of the random part) that distinguishes the multilevel model from standard linear regression or analysis of variance. The underlying random (variance-covariance) structure of the model specified in Eq. (C.7.3) is

$$ u_{0j} \sim N(0, \sigma^2_{u0}) \tag{C.7.4a} $$
$$ e_{0ij} \sim N(0, \sigma^2_{e0}) \tag{C.7.4b} $$
$$ \operatorname{cov}(u_{0j}, e_{0ij}) = 0 \tag{C.7.4c} $$

It is this aspect of the regression model that requires special estimation procedures in order to obtain satisfactory parameter estimates (Goldstein 2003). The model specified in Eq. (C.7.3) with the above random structure is typically used to partition variation according to the different levels, with the variance in $y_{ij}$ being the sum of $\sigma^2_{u0}$ and $\sigma^2_{e0}$. This leads to a statistic known as the intra-class correlation, the intra-unit correlation or, more generally, the variance partitioning coefficient (Goldstein 2002), representing the degree of similarity between two randomly chosen individuals within a neighborhood. This can be expressed as

$$ \rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}} \tag{C.7.5} $$
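Equation (C.7.5) is a simple ratio, and it can help to see it recovered from data. The sketch below simulates a balanced random-intercepts design with no covariates (that is, Eq. (C.7.3) with $\beta_1 = 0$) and estimates $\sigma^2_{u0}$ and $\sigma^2_{e0}$ with the classical one-way ANOVA (method-of-moments) estimators. This is a deliberately simple, swapped-in technique that only applies to balanced data without covariates; it is not one of the special estimation procedures the chapter refers to (Goldstein 2003). Function names and parameter values are hypothetical.

```python
import random


def variance_partition_coefficient(sigma2_u0: float, sigma2_e0: float) -> float:
    """Eq. (C.7.5): share of the total variance that lies between neighborhoods."""
    return sigma2_u0 / (sigma2_u0 + sigma2_e0)


def simulate_balanced(n_groups, n_per_group, beta0, sigma2_u0, sigma2_e0, seed=1):
    """Draw y_ij = beta0 + u_0j + e_0ij for equal-sized neighborhoods (no covariates)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        u0j = rng.gauss(0.0, sigma2_u0 ** 0.5)  # neighborhood displacement u_0j
        # each individual adds an independent level-1 residual e_0ij
        data.append([beta0 + u0j + rng.gauss(0.0, sigma2_e0 ** 0.5) for _ in range(n_per_group)])
    return data


def anova_variance_components(data):
    """One-way ANOVA moment estimators of (sigma2_u0, sigma2_e0); balanced design only."""
    n_groups, n = len(data), len(data[0])
    means = [sum(group) / n for group in data]
    grand_mean = sum(means) / n_groups
    ss_within = sum((y - m) ** 2 for group, m in zip(data, means) for y in group)
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ms_within = ss_within / (n_groups * (n - 1))   # E[MS_within]  = sigma2_e0
    ms_between = ss_between / (n_groups - 1)       # E[MS_between] = sigma2_e0 + n * sigma2_u0
    sigma2_e0 = ms_within
    sigma2_u0 = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates at zero
    return sigma2_u0, sigma2_e0


if __name__ == "__main__":
    data = simulate_balanced(n_groups=500, n_per_group=20, beta0=50.0,
                             sigma2_u0=1.0, sigma2_e0=4.0)
    s2u, s2e = anova_variance_components(data)
    print(f"true rho      = {variance_partition_coefficient(1.0, 4.0):.2f}")  # 0.20
    print(f"estimated rho = {variance_partition_coefficient(s2u, s2e):.3f}")
```

With 500 neighborhoods of 20 people the estimated ρ should land close to the true value of 0.20. With unbalanced neighborhoods or covariates such as $x_{1ij}$, a dedicated mixed-model estimation routine would be needed instead of these moment estimators.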

Note that Eq. (C.7.3) estimates a variance based on the observed sample of neighborhoods. While this is important for establishing the overall importance of neighborhoods as a unit or level, another quantity of interest may be whether living in neighborhood 1, as compared to neighborhood 3, for example, predicts a different health score conditional on the compositional influences of covariates. Given Eq. (C.7.3), we can estimate for each level-2 unit

$$ \hat{u}_{0j} = E\left(u_{0j} \mid Y, \hat{\beta}, \hat{\Omega}\right) \tag{C.7.6} $$

The quantities $\hat{u}_{0j}$ are referred to as estimated or predicted residuals or, in Bayesian terminology, as posterior residual estimates, and are calculated as

$$ \hat{u}_{0j} = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}/n_j}\,\bar{r}_j \tag{C.7.7} $$

where $\sigma^2_{u0}$ and $\sigma^2_{e0}$ are as defined above, $\bar{r}_j$ is the mean of the individual-level raw residuals for neighborhood $j$, and $n_j$ is the number of individuals within neighborhood $j$. This formula for $\hat{u}_{0j}$ uses the level-1 and level-2 variances and the number of people observed in neighborhood $j$ to scale the observed level-2 residual $\bar{r}_j$. As the level-1 variance declines or the sample size increases, the scale factor approaches one, and thus $\hat{u}_{0j}$ approaches $\bar{r}_j$. These neighborhood-level residuals are random variables with a distribution whose parameter values tell us about the variation among the level-2 units (Goldstein 2003). Another interpretation is that each $\hat{u}_{0j}$ estimates neighborhood $j$'s departure from the expected mean outcome. This interpretation is based on the assumption that each neighborhood belongs to a population of neighborhoods, and that the distribution of the population provides information about plausible values for neighborhood $j$ (Goldstein 2003). For a neighborhood with only a few individuals, we can obtain more precise estimates by combining the population and neighborhood-specific observations than if we were to ignore the population membership assumption and use only the information from that neighborhood.
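Equation (C.7.7) can be computed directly once the variance components are available. The sketch below takes $\sigma^2_{u0}$ and $\sigma^2_{e0}$ as known (in practice they would be estimates), computes each neighborhood's mean raw residual $\bar{r}_j$ from hypothetical observed scores and fixed-part predictions, and applies the scale factor. Function names and numbers are illustrative assumptions.

```python
def shrinkage_factor(sigma2_u0: float, sigma2_e0: float, n_j: int) -> float:
    """Scale factor in Eq. (C.7.7): sigma2_u0 / (sigma2_u0 + sigma2_e0 / n_j)."""
    return sigma2_u0 / (sigma2_u0 + sigma2_e0 / n_j)


def mean_raw_residual(scores, fixed_predictions):
    """r_bar_j: average of y_ij minus its fixed-part prediction within neighborhood j."""
    residuals = [y - f for y, f in zip(scores, fixed_predictions)]
    return sum(residuals) / len(residuals)


def posterior_residual(scores, fixed_predictions, sigma2_u0, sigma2_e0):
    """Estimated neighborhood residual u_hat_0j from Eq. (C.7.7)."""
    n_j = len(scores)
    r_bar = mean_raw_residual(scores, fixed_predictions)
    return shrinkage_factor(sigma2_u0, sigma2_e0, n_j) * r_bar


if __name__ == "__main__":
    sigma2_u0, sigma2_e0 = 1.0, 4.0  # hypothetical variance components
    # Two hypothetical neighborhoods whose residents all score 2 points above prediction.
    small = posterior_residual([52.0] * 4, [50.0] * 4, sigma2_u0, sigma2_e0)
    large = posterior_residual([52.0] * 400, [50.0] * 400, sigma2_u0, sigma2_e0)
    print(f"n=4:   u_hat = {small:.2f}")  # 1.00
    print(f"n=400: u_hat = {large:.2f}")  # 1.98
```

Both neighborhoods have the same raw residual, but the small one is pulled halfway toward zero while the large one is barely shrunk. The same ratio reappears below as the reliability weight $w_j$ in the comparison of fixed and random neighborhood effects.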
When the estimated residuals at higher-level units are of interest in their own right, we need to provide standard errors, interval estimates and significance tests as well as point estimates for them (Goldstein 2003).

Modeling places: fixed or random? It is worth drawing parallels between the multilevel or random-effects model given by Eq. (C.7.3) and the conventional OLS or fixed-effects regression model. Consider the fixed-effects model, whereby the neighborhood effect is estimated by including a dummy variable for each neighborhood, as shown by

$$ y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta^{*}_{0j} N_j + e_{0ij} \tag{C.7.8} $$

where $N_j$ is a vector of dummy variables for $N - 1$ neighborhoods and $\beta^{*}_{0j}$ are the corresponding fixed neighborhood coefficients. The key conceptual difference between the fixed-effects and the random-effects approaches to modeling neighborhoods is that while the fixed-part coefficients are estimated separately, the random-part differentials ($u_{0j}$) are conceptualized as coming from a distribution (Goldstein 2003). This conceptualization results in three practical benefits (Jones and Bullen 1994): (i) pooling information between neighborhoods, with all the information in the data being used in the combined estimation of the fixed and random parts; in particular, the overall regression terms are based on the information for all neighborhoods; (ii) borrowing strength, whereby neighborhood-specific relations that are imprecisely estimated benefit from the information for other neighborhoods; and (iii) precision-weighted estimation, whereby unreliable neighborhood-specific fixed estimates are differentially down-weighted or shrunk toward the overall city-wide estimate. A reliably estimated within-neighborhood relation will be largely immune to this shrinkage.

The random-effects and fixed-effects estimates for each neighborhood are related (Jones and Bullen 1994). The neighborhood-specific random intercept ($\hat{\beta}_{0j}$) in a multilevel model is a weighted combination of the specific neighborhood coefficient in a fixed-effects model ($\beta^{*}_{0j}$) and the overall multilevel intercept ($\hat{\beta}_0$), in the following way:

$$ \hat{\beta}_{0j} = w_j \beta^{*}_{0j} + (1 - w_j)\,\hat{\beta}_0 \tag{C.7.9} $$

with the overall multilevel intercept being a weighted average of all the fixed intercepts:

$$ \hat{\beta}_0 = \frac{\sum_j w_j \beta^{*}_{0j}}{\sum_j w_j} \tag{C.7.10} $$

Each neighborhood weight is the ratio of the true between-neighborhood parameter variance to the total variance, which additionally includes the sampling variance resulting from observing a sample from the neighborhood. Consequently, the weights represent the reliability or precision of the fixed terms:

$$ w_j = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{\upsilon j}} \tag{C.7.11} $$

where the random sampling variance of the fixed parameter is

$$ \sigma^2_{\upsilon j} = \frac{\sigma^2_{e0}}{n_j} \tag{C.7.12} $$

with $n_j$ being the number of observations within neighborhood $j$. When there are genuine differences between the neighborhoods and the sample sizes within a neighborhood are large, the sampling variance will be small in comparison to the total variance. As a result, the associated weight will be close to one, with the fixed neighborhood effect being reliably estimated, and the random-effect neighborhood estimate will be close to the fixed neighborhood effect. As the sampling variance increases, however, the weight will fall below one, and the multilevel estimate will increasingly be influenced by the overall intercept based on pooling across neighborhoods. Shrinkage estimates allow the data to determine an appropriate compromise between specific estimates for different neighborhoods and the overall fixed estimate that pools information across places over the entire sample (Jones and Bullen 1994).

Importantly, the fixed-effects approach to modeling neighborhood differences using cross-sectional data is not an option for a typical multilevel research question, such as the one that motivates Eq. (C.7.3), where there is an intrinsic interest in an exposure measured at the neighborhood level (for example, the deprivation index $w_{1j}$). In such instances, a multilevel modeling approach is a necessity. This is because the dummy variables associated with the neighborhoods (measuring the fixed effect of each neighborhood) and the neighborhood exposure are perfectly confounded and, as such, the latter is not identifiable (Fielding 2004). Thus, the fixed-effects specification for understanding neighborhood differences is unsuitable for the sort of complex questions that multilevel modeling can address.

The random coefficient or random slopes model. We can expand the random structure in Eq. (C.7.3) by allowing the fixed effect of individual poverty ($\beta_1$) to vary randomly across neighborhoods in the following manner:

$$ y_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + e_{0ij} \tag{C.7.13} $$

At level-2, there will now be two models:

$$ \beta_{0j} = \beta_0 + u_{0j} \tag{C.7.14} $$
$$ \beta_{1j} = \beta_1 + u_{1j} \tag{C.7.15} $$

Substituting the level-2 models in Eqs. (C.7.14) and (C.7.15) into the level-1 model in Eq. (C.7.13) gives

$$ y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{0ij}) \tag{C.7.16} $$

Across neighborhoods, the mean health score for the non-poor is $\beta_0$, $\beta_0 + \beta_1$ is the mean health score for the poor, and the mean poverty differential is $\beta_1$. The poverty differential is no longer constant across neighborhoods, but varies by the amount $u_{1j}$ around the mean, $\beta_1$. Such models are also referred to as random-slopes or random coefficient models. These models have a more complex variance-covariance structure than before:

$$ \begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma^2_{u0} & \\ \sigma_{u0u1} & \sigma^2_{u1} \end{bmatrix} \right) \tag{C.7.17} $$

$$ e_{0ij} \sim N(0, \sigma^2_{e0}) \tag{C.7.18} $$

With this formulation, it is no longer straightforward to think in terms of a summary intra-class correlation statistic $\rho$, as the level-2 variation is now a function of an individual predictor variable, $x_{1ij}$. In our exemplification, where $x_{1ij}$ is a dummy variable, we will have two variances estimated at level-2, one for the non-poor, which is $\sigma^2_{u0}$, and one for the poor, which is

$$ \sigma^2_{u0} + 2\sigma_{u0u1} x_{1ij} + \sigma^2_{u1} x^2_{1ij} \tag{C.7.19} $$

More generally, level-2 variation will be a quadratic function of the individual predictor variable when $x_{1ij}$ is a continuous predictor. Thus the notion of random intercepts and slopes, while intuitive, is not entirely appropriate. Rather, what these models are really doing is modeling variance as some function (constant, linear, or quadratic) of a predictor variable (Subramanian et al. 2003). Building on this perspective of modeling the variance-covariance function (as opposed to "random intercepts and slopes"), we can extend the concept to modeling the variance function at level-1. It is extremely common to assume that the variance in the random part at level-1 is homoskedastic [$\sigma^2_{e0}$; Eq. (C.7.16)], and researchers seldom report whether this assumption was tested. One strategy would be to model different variances for the poor and the non-poor, in the following form:

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{1ij} x_{1ij} + e_{2ij} x_{2ij}) \tag{C.7.20}$$

where $x_{1ij} = 1$ for the poor and 0 for the non-poor, and the new variable $x_{2ij} = 1$ for the non-poor and 0 for the poor, with $\operatorname{var}(e_{1ij}) = \sigma^2_{e1}$ giving the variance for the poor, $\operatorname{var}(e_{2ij}) = \sigma^2_{e2}$ giving the variance for the non-poor, and $\operatorname{cov}(e_{1ij}, e_{2ij}) = 0$. There are other parsimonious ways to model level-1 variation in the presence of a number of predictor variables (Goldstein 2003; Subramanian et al. 2003). With this specification, the random level-1 coefficients cannot be interpreted as random slopes, as they were at level-2. The level-1 parameters, $\sigma^2_{e1}$ and $\sigma^2_{e2}$, describe the complexity of level-1 variation, which is no longer homoskedastic (Goldstein 2003).
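The variance functions in Eqs. (C.7.19) and (C.7.20) can be evaluated directly, which makes the point about a single summary $\rho$ concrete. The Python sketch below computes the level-2 variance, the level-1 variance, and the resulting between-neighborhood share of variance separately for the poor and the non-poor. All parameter values and function names are hypothetical, chosen only for illustration; they are not estimates from any study. As in Eq. (C.7.20), the sketch assumes $x_{2ij} = 1 - x_{1ij}$ and $\operatorname{cov}(e_{1ij}, e_{2ij}) = 0$.

```python
"""Variance functions for a random-slopes model with level-1 heteroskedasticity.

All parameter values below are hypothetical and only illustrate
Eqs. (C.7.19)-(C.7.20); they are not estimates from any study.
"""

# Hypothetical level-2 (neighborhood) variance-covariance parameters.
VAR_U0 = 0.30     # sigma^2_u0: intercept variance (level-2 variance for non-poor)
COV_U0U1 = -0.05  # sigma_u0u1: intercept-slope covariance
VAR_U1 = 0.20     # sigma^2_u1: slope variance

# Hypothetical level-1 (individual) variances, Eq. (C.7.20).
VAR_E1 = 1.00     # sigma^2_e1: level-1 variance for the poor
VAR_E2 = 0.70     # sigma^2_e2: level-1 variance for the non-poor


def level2_variance(x1: float) -> float:
    """Eq. (C.7.19): var(u0j + u1j*x1) = s2_u0 + 2*s_u0u1*x1 + s2_u1*x1**2."""
    return VAR_U0 + 2.0 * COV_U0U1 * x1 + VAR_U1 * x1 ** 2


def level1_variance(poor: bool) -> float:
    """Eq. (C.7.20) with x1 = 1 for poor, x2 = 1 - x1, and cov(e1, e2) = 0."""
    x1 = 1.0 if poor else 0.0
    x2 = 1.0 - x1
    return VAR_E1 * x1 ** 2 + VAR_E2 * x2 ** 2


def between_neighborhood_share(poor: bool) -> float:
    """Group-specific share of total variance lying at level-2."""
    v2 = level2_variance(1.0 if poor else 0.0)
    return v2 / (v2 + level1_variance(poor))


if __name__ == "__main__":
    for poor in (False, True):
        label = "poor" if poor else "non-poor"
        print(f"{label:9s} level-2 var = {level2_variance(float(poor)):.3f}, "
              f"level-1 var = {level1_variance(poor):.3f}, "
              f"level-2 share = {between_neighborhood_share(poor):.3f}")
    # With a continuous x1, the level-2 variance traces a quadratic in x1.
    for x in (0.0, 0.5, 1.0, 2.0):
        print(f"x1 = {x:.1f}: level-2 variance = {level2_variance(x):.3f}")
```

With these made-up values the level-2 variance is 0.30 for the non-poor and 0.40 for the poor, yet the between-neighborhood share is 0.30 and about 0.29 respectively, because the level-1 variance also differs between the groups. A single $\rho$, or a model that assumed a homoskedastic level-1 variance, would hide this pattern.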
Anticipating and modeling heteroskedasticity, or heterogeneity at the individual level, may be important in multilevel analysis because of possible cross-level confounding: what appears to be neighborhood heterogeneity (level-2), to be explained by some ecological variable, could instead be due to a failure to take account of between-individual (within-neighborhood) heterogeneity (level-1).

Modeling the fixed effect of a neighborhood predictor. An attractive feature of multilevel models, and perhaps the one most commonly used in social science research, is their utility in modeling neighborhood and individual characteristics, and any interaction between them, simultaneously. We will consider the underlying level-2 model related to Eq. (C.7.20), which is the same as that specified in Eqs. (C.7.14) and (C.7.15) but now includes a level-2 predictor, $w_{1j}$, the deprivation index for neighborhood $j$:

$$\beta_{0j} = \beta_0 + \alpha_1 w_{1j} + u_{0j} \tag{C.7.21}$$

$$\beta_{1j} = \beta_1 + \alpha_2 w_{1j} + u_{1j} \tag{C.7.22}$$
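Substituting Eqs. (C.7.21) and (C.7.22) into the level-1 model produces a fixed part containing a cross-level interaction. The Python sketch below checks that substitution numerically and shows how $\alpha_1$ and $\alpha_2$ are read off: the effect of a one-unit rise in deprivation is $\alpha_1$ for the non-poor and $\alpha_1 + \alpha_2$ for the poor. The coefficient values and function names are hypothetical, chosen only to illustrate the algebra; the random effects are set to zero so that only the fixed part is shown.

```python
"""Cross-level interaction implied by Eqs. (C.7.21)-(C.7.22): a numerical sketch.

All coefficient values are hypothetical and only illustrate the algebra.
"""

# Hypothetical fixed-part coefficients (not estimates from any study).
BETA0 = 50.0   # mean health score for the non-poor where w1 = 0
BETA1 = -4.0   # poverty differential where w1 = 0
ALPHA1 = -1.5  # change per unit of the deprivation index, non-poor
ALPHA2 = -0.5  # additional change per unit of the deprivation index for the poor


def level2_coefficients(w1, u0=0.0, u1=0.0):
    """Eqs. (C.7.21)-(C.7.22): neighborhood-specific intercept and poverty slope.

    u0 and u1 are the neighborhood's random effects; leaving them at zero
    gives the fixed-part values.
    """
    beta0j = BETA0 + ALPHA1 * w1 + u0
    beta1j = BETA1 + ALPHA2 * w1 + u1
    return beta0j, beta1j


def combined_fixed_part(x1, w1):
    """Fixed part after substitution: beta0 + beta1*x1 + alpha1*w1 + alpha2*w1*x1."""
    return BETA0 + BETA1 * x1 + ALPHA1 * w1 + ALPHA2 * w1 * x1


def deprivation_effect(x1):
    """Change in the fixed-part prediction for a one-unit increase in w1."""
    return combined_fixed_part(x1, 1.0) - combined_fixed_part(x1, 0.0)


if __name__ == "__main__":
    for w1 in (0.0, 2.0):
        b0, b1 = level2_coefficients(w1)
        for x1 in (0, 1):  # 0 = non-poor, 1 = poor
            # Substituting the level-2 models into the level-1 model
            # reproduces the combined fixed part.
            assert abs((b0 + b1 * x1) - combined_fixed_part(x1, w1)) < 1e-12
            print(f"w1 = {w1:.1f}, x1 = {x1}: {combined_fixed_part(x1, w1):.2f}")
    print("deprivation effect, non-poor (alpha1):", deprivation_effect(0))
    print("deprivation effect, poor (alpha1 + alpha2):", deprivation_effect(1))
```

With these made-up values, a one-unit rise in deprivation lowers the expected score by 1.5 for the non-poor and by 2.0 for the poor; the difference of 0.5 is the cross-level interaction term.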

Note that the separate specification of micro and macro models correctly recognizes that the contextual variables ($w_{1j}$) are predictors of between-neighborhood differences. The extension of Eq. (C.7.20) will now be

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \alpha_1 w_{1j} + \alpha_2 w_{1j} x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{1ij} x_{1ij} + e_{2ij} x_{2ij}) \tag{C.7.23}$$

The combined formulation in Eq. (C.7.23) highlights an important feature: the presence of an interaction between a level-2 and a level-1 predictor ($w_{1j} x_{1ij}$), represented by the fixed parameter $\alpha_2$. Here, $\alpha_1$ estimates the marginal change in health score for a unit change in the neighborhood deprivation index for the non-poor, and $\alpha_2$ estimates the extent to which this marginal change is different for the poor. This multilevel formulation allows cross-level effect modification, or interaction between individual and neighborhood characteristics, to be robustly specified and estimated.

In summary, multilevel models are concerned with modeling both the average and the variation around the average, at different levels. To accomplish this, they consist of two sets of parameters: those summarizing the average relationship(s), and those summarizing the variation around the average at both the individual and the neighborhood level. The models presented in the preceding section can easily be adapted to other structures in which level-1 units are nested within level-2 units, and they can also be extended to three or more levels.

While the preceding discussion considered a single normally distributed response variable for illustration, multilevel models are capable of handling a wide range of responses. These include binary outcomes and proportions (for example, logit, log-log, and probit models); multiple categories (for example, ordered and unordered multinomial models); and counts (for example, Poisson and negative binomial models).
In essence, these models work by assuming a specific non-Gaussian distribution for the random part at level-1, while maintaining the normality assumptions for the random parts at higher levels. Consequently, the discussion presented in this entry, with its focus on the neighborhood level, would continue to hold regardless of the nature of the response variable, with some exceptions. For instance, determining the intraclass correlation or partitioning variances across individual and neighborhood levels in complex non-linear multilevel logistic models is not straightforward (for details, see Browne et al. 2005; Goldstein et al. 2002).

C.7.7 Exploiting the flexibility of multilevel models to incorporate realistic complexity

Current implementations of multilevel models have generally failed to exploit the full capabilities of the analytical framework (Subramanian 2004a; Leyland 2005;

Moon et al. 2005). Much, if not all, of the current research linking neighborhoods and health is cross-sectional and assumes a hierarchical structure of individuals nested within neighborhoods. This simplistic scenario ignores, for instance, the possibility that an individual might move several times and thus experience neighborhood effects drawn from several contexts, or that other competing contexts (for example, schools, workplaces, hospital settings) may simultaneously contribute to contextual effects. Figure C.7.2 provides a visual illustration of one complex but realistic multilevel structure for neighborhoods and health research, in which time measurements (level-1) are nested within individuals (level-2), who are in turn nested within neighborhoods (level-3). Importantly, individuals are assigned different weights for the time spent in each neighborhood. For example, individual 5 moved from neighborhood 1 to neighborhood 5 during the period from $t_1$ to $t_2$, spending 20 percent of her time in neighborhood 1 and 80 percent in her new neighborhood. This multiple-membership design would allow control of changing context as well as changing composition. Such designs could be extended to incorporate membership in additional contexts, such as workplaces or schools. They can also be extended to enable consideration of the weighted effects of proximate contexts (Langford et al. 1998). So, for example, the geographic distribution of disease can be seen not only as a matter of composition and of the immediate context in which an outcome occurs, but also as a consequence of the impact of nearby contexts, with nearer areas being more influential than more distant ones. This is also called spatial autocorrelation and forms an important area of spatial statistical research (Lawson 2001). While such analyses require high-quality longitudinal and context-referenced data, models that incorporate such realistic complexity (Best et al.
1996) are likely to improve our understanding of true neighborhood effects. While the foregoing discussion provides a sound rationale for adopting a multilevel analytic approach to modeling ecologic effects, it obviously does not overcome the limitations intrinsic to any observational study design, single-level or multilevel.

Fig. C.7.2. Multilevel structure of repeated measurements of individuals over time across neighborhoods, with individuals having multiple membership in different neighborhoods across the time span. Source: Subramanian (2004b)
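The weighting described for Fig. C.7.2 can be sketched in a few lines. In a multiple-membership formulation of the kind described above, an individual's neighborhood contribution is taken here to be a weighted sum of the random effects of every neighborhood he or she belonged to, with weights that sum to one; if those neighborhood effects are independent with variance $\sigma^2_u$, the contribution has variance $\sigma^2_u \sum_j w_{ij}^2$. The weights below follow the example of individual 5 (20 percent in neighborhood 1, 80 percent in neighborhood 5). The neighborhood effects, $\sigma^2_u$, and the function names are hypothetical and serve only as illustration.

```python
"""Multiple-membership weighting of neighborhood effects (illustrative sketch).

Assumes membership weights that sum to one and independent neighborhood
effects; the effect values and variance below are hypothetical.
"""
from typing import Dict


def weighted_neighborhood_effect(weights: Dict[int, float],
                                 effects: Dict[int, float]) -> float:
    """Return sum_j w_ij * u_j, where the membership weights w_ij sum to one."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("membership weights must sum to 1")
    return sum(w * effects[j] for j, w in weights.items())


def membership_variance(weights: Dict[int, float], sigma2_u: float) -> float:
    """Variance of sum_j w_ij * u_j when the u_j are independent with variance sigma2_u."""
    return sigma2_u * sum(w ** 2 for w in weights.values())


if __name__ == "__main__":
    # Individual 5 in Fig. C.7.2: 20% of the period in neighborhood 1, 80% in neighborhood 5.
    weights_person5 = {1: 0.2, 5: 0.8}
    # Hypothetical neighborhood random effects u_j on the health-score scale.
    effects = {1: -0.5, 5: 1.0}
    sigma2_u = 0.25  # hypothetical level-3 (neighborhood) variance

    print("weighted effect:", weighted_neighborhood_effect(weights_person5, effects))
    print("variance, mover:", membership_variance(weights_person5, sigma2_u))
    print("variance, stayer:", membership_variance({1: 1.0}, sigma2_u))
```

Spreading membership across two neighborhoods lowers the variance of this individual's neighborhood contribution (about 0.17 here, against 0.25 for someone who stayed in one neighborhood), one reason a mover's neighborhood exposure should not be treated as if it came from a single context.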

C.7.8 Concluding remarks

The multilevel statistical approach, an approach that explicitly models the correlated nature of data arising either from the sampling design or because populations are clustered, has a number of substantive and technical advantages. From a substantive perspective, it circumvents the problems associated with the ecological fallacy (the invalid transfer of results observed at the ecological level to the individual level), the individualistic fallacy (which occurs by failing to take into account the ecology or context within which individual relationships happen), and the atomistic fallacy (which arises when associations between individual variables are used to make inferences about the association between the analogous variables at the group/ecological level). The issue common to these fallacies is the failure to recognize that distinct relationships can be observed at multiple levels, each important in its own right. Specifically, one can think of an individual relationship (for example, individuals who are poor are more likely to be in poor health), an ecological/contextual relationship (for example, places with a high proportion of poor individuals are more likely to have higher rates of poor health), and an individual-contextual relationship (for example, the greatest likelihood of being in poor health is found for poor individuals in places with a high proportion of poor people). Multilevel models explicitly recognize the level-contingent nature of relationships.

From a technical perspective, the multilevel approach enables researchers to obtain statistically efficient estimates of fixed-effects regression coefficients. Specifically, by using the clustering information, multilevel models provide correct standard errors, and thereby robust confidence intervals and significance tests.
These will generally be more conservative than the traditional ones obtained by simply ignoring the presence of clustering. More broadly, multilevel models allow a more appropriate and realistic specification of complex variance structures at each level. Multilevel models are also precision-weighted and capitalize on the advantages that accrue from pooling information from all the neighborhoods to make inferences about specific neighborhoods. While advances in statistical research and computing have shown the potential of multilevel methods for health and social behavioral research, there are issues to be considered when developing and interpreting multilevel applications. First, it is important to clearly motivate and conceptualize the choice of higher levels in a multilevel analysis. Second, the distinction between context and composition, and hence any attempt to establish their relative importance, is probably more apparent than real, and caution must be exercised when conceptualizing and interpreting compositional and contextual sources of variation. Third, the sample of neighborhoods should belong to a well-defined population of neighborhoods, so that the sample shares the exchangeable properties that are essential for robust inferences. Fourth, it is important to ensure adequate sample size at all levels of analysis. In general, if the research focus is primarily on neighborhoods, then the analysis requires more neighborhoods (as opposed to more individuals within a neighborhood).

Lastly, the ability of multilevel models to support causal inferences is limited, and innovative strategies, including randomized neighborhood-level research designs (via trials or natural experiments) in combination with a multilevel analytical strategy, may be required to convincingly demonstrate causal effects of social contexts such as neighborhoods.

References

Aitkin M, Anderson DR, Hinde J (1981) Statistical modelling of data on teaching styles (with discussion). J Roy Stat Soc A 144(4):
Bennett N (1976) Teaching styles and pupil progress. Open Books, London
Best N, Spiegelhalter DJ, Thomas A, Brayne CEG (1996) Bayesian analysis of realistically complex models. J Roy Stat Soc A 159(2):323-342
Blakely TA, Subramanian SV (2006) Multilevel studies. In Oakes M, Kaufman J (eds) Methods for social epidemiology. Jossey Bass, San Francisco, pp
Blakely TA, Woodward AJ (2000) Ecological effects in multi-level studies. J Epid Comm Health 54(5):
Browne WJ, Subramanian SV, Jones K, Goldstein H (2005) Variance partitioning in multilevel logistic models that exhibit overdispersion. J Roy Stat Soc A 168(3):
Diez Roux AV (2002) A glossary for multilevel analysis. J Epid Comm Health 56(8):
Fielding A (2004) The role of the Hausman test and whether higher level effects should be treated as random or fixed. Multil Mode Newsl 16(2):3-9
Goldstein H (2003) Multilevel statistical models. Edward Arnold, London
Goldstein H, Browne WJ, Rasbash J (2002) Partitioning variation in multilevel models. Underst Stat 1(4):3-3
Jones K, Bullen N (1994) Contextual models of urban house prices: a comparison of fixed- and random-coefficient models developed by expansion. Econ Geogr 70(3):252-272
Kawachi I, Subramanian SV (2006) Measuring and modeling the social and geographic context of trauma: a multilevel modeling approach. J Trauma Stress 19(2):
Kolaczyk ED, Ju J, Gopal S (2005) Multiscale, multigranular statistical image segmentation. J Am Stat Assoc 100:
Langford IH, Bentham G, McDonald AL (1998) Multilevel modelling of geographically aggregated health data: a case study on malignant melanoma mortality and UV exposure in the European Community. Stat Med 17(1):41-57
Lawson AB (2001) Statistical methods in spatial epidemiology (2nd edition). Wiley, New York, Chichester, Toronto and Brisbane
Leyland AH (2005) Assessing the impact of mobility on health: implications for life course epidemiology. J Epid Comm Health 59(2):90-91
Moon G, Subramanian SV, Jones K, Duncan C, Twigg L (2005) Area-based studies and the evaluation of multilevel influences on health outcomes. In Bowling A, Ebrahim S (eds) Handbook of health research methods: investigation, measurement and analysis. Open University Press, Berkshire [UK], pp.66-9
Raudenbush SW (2003) The quantitative assessment of neighborhood social environment. In Kawachi I, Berkman LF (eds) Neighborhoods and health. Oxford University Press, New York, pp
Raudenbush SW, Bryk A (2002) Hierarchical linear models: applications and data analysis methods. Sage, Thousand Oaks [CA]
Subramanian SV (2004a) Multilevel methods, theory and analysis. In Anderson N (ed) Encyclopedia on health and behavior. Sage, Thousand Oaks [CA], pp
Subramanian SV (2004b) The relevance of multilevel statistical methods for identifying causal neighborhood effects. Soc Sci Med 58(10):
Subramanian SV, Glymour MM, Kawachi I (2007) Identifying causal ecologic effects on health: a methodologic assessment. In Galea S (ed) Macrosocial determinants of population health. Springer, New York, pp
Subramanian SV, Jones K, Duncan C (2003) Multilevel methods for public health research. In Kawachi I, Berkman LF (eds) Neighborhoods and health. Oxford University Press, New York, pp
Subramanian SV, Jones K, Kaddour A, Krieger N (2009) Revisiting Robinson: the perils of individualistic and ecologic fallacy. Int J Epidem 38(2):342-360
