Cutting-Edge Statistical Methods for a Life-Course Approach¹,²


REVIEWS FROM ASN EB 2013 SYMPOSIA

Kristen L. Bub* and Larissa K. Ferretti
Auburn University, Auburn, AL

ABSTRACT

Advances in research methods, data collection and record keeping, and statistical software have substantially increased our ability to conduct rigorous research across the lifespan. In this article, we review a set of cutting-edge statistical methods that life-course researchers can use to rigorously address their research questions. For each technique, we describe the method, highlight the benefits and unique attributes of the strategy, offer a step-by-step guide on how to conduct the analysis, and illustrate the technique using data from the National Institute of Child Health and Human Development Study of Early Child Care and Youth Development. In addition, we recommend a set of technical and empirical readings for each technique. Our goal was not to address a substantive question of interest but instead to provide life-course researchers with a useful reference guide to cutting-edge statistical methods. Adv. Nutr. 5: 46-56, 2014.

Introduction

As the number of large-scale longitudinal studies grows and researchers consider an even broader range of structural, social, and cultural determinants of health-related outcomes, the need for statistical techniques that adequately address complex questions has also increased. In addition, the expanding interdisciplinary nature of our field demands greater statistical sophistication. However, even the most experienced researchers often lack the knowledge they need to use these new techniques, and, as a result, life-course research is sometimes based on less appropriate and less powerful techniques than are currently available. Indeed, a review of the literature indicates that the majority of existing life-course studies are cross-sectional, using cohorts of individuals of different ages to model an outcome across the lifespan.
Those studies that are longitudinal frequently include only 2 time points for each individual or are analyzed as if they include only 2 time points (e.g., as a difference score). However, if we are to continue to move the field forward, methods that allow us to explore change over time in both continuous and dichotomous outcomes, combine a range of related indicators into a single construct, simultaneously estimate numerous regression effects, or deal with issues of selection bias must become the norm and not the exception. As such, the purpose of this study is to provide an overview of a set of analytic techniques that may be particularly relevant for life-course researchers. These techniques are drawn from a range of other disciplines, including education, psychology, and economics, and offer a critical perspective on life-course research. Our intent is not to teach readers how to conduct these analyses; instead, our hope is to illustrate the kinds of questions we can be asking of our data and to provide researchers with the information and resources they need to get started.

¹ Presented at the symposium "Life-Course Epidemiology in Nutrition and Chronic Disease Research: A Timely Discussion" held 24 April 2013 at the American Society for Nutrition Scientific Sessions and Annual Meeting at Experimental Biology 2013 in Boston, MA. The symposium was sponsored by the American Society for Nutrition. A summary of this symposium was published in the September 2013 issue of Advances in Nutrition.
² Author disclosures: K. L. Bub and L. K. Ferretti, no conflicts of interest.
* To whom correspondence should be addressed. E-mail: klb0018@auburn.edu.
In the following, we review 6 modeling strategies that can be used to examine associations between a predictor or set of predictors and an outcome across the life course: 1) regression with covariates; 2) hazard modeling; 3) individual growth modeling; 4) structural equation modeling (SEM)³; 5) propensity score analysis (PSA); and 6) regression discontinuity design (RDD) analysis (for a summary of each technique, see Table 1). We realize that regression with covariates is a very familiar analytic strategy for life-course researchers and have included it here only as a comparison for other techniques. We also recognize that many life-course researchers are familiar with hazard modeling, and so we provide only a brief description here.

TABLE 1 Modeling strategies, purposes, sample equations, and statistical software packages for a life-course approach

Regression with covariates
  Equation: $Y_i = \beta_0 + \beta_1 X + \beta_2 Z$
  Explanation: $Y_i$ is the outcome for individual $i$; $\beta_0$ is the average outcome value when all else is 0; $\beta_1$ is the association between predictor and outcome controlling for other covariates; $X$ is the predictor of interest; $Z$ is a vector of covariates
  Program: Standard statistical packages, such as SAS (SAS Institute Inc.), SPSS (IBM), Stata (StataCorp)
  Command: Proc mixed in SAS, regression in SPSS, and xtmixed in Stata

Hazard modeling
  Equation: $\operatorname{logit} h(t_{ij}) = [\alpha_1 D_{1ij} + \alpha_2 D_{2ij} + \cdots + \alpha_J D_{Jij}] + [\beta_1 X_{1i} + \beta_2 X_{2ij}]$
  Explanation: $\operatorname{logit} h(t_{ij})$ is the value of logit hazard for individual $i$ at time $j$
  Program: Standard statistical packages, such as SAS, SPSS, Stata
  Command: Proc logistic in SAS, logistic regression in SPSS, logistic in Stata

Individual growth modeling
  Equations:
    Level 1: $Y_{ij} = [\pi_{0i} + \pi_{1i}(\mathrm{Age}_{ij})] + [\varepsilon_{ij}]$
    Level 2: $\pi_{0i} = \gamma_{00} + \gamma_{01} HQ_i + \zeta_{0i}$
             $\pi_{1i} = \gamma_{10} + \gamma_{11} HQ_i + \zeta_{1i}$
  Explanation: Level 1: $Y_{ij}$ is the outcome for individual $i$ at time $j$; $\pi_{0i}$ is the initial level or intercept value for individual $i$ when all else in the model is 0; $\pi_{1i}(\mathrm{Age}_{ij})$ is the linear rate of change for individual $i$ at time $j$, with age centered at the first time point. Level 2: $\pi_{0i}$ and $\pi_{1i}$ are the individual growth parameters; $\gamma_{01} HQ_i$ and $\gamma_{11} HQ_i$ represent the relation between early home quality and the individual growth parameters
  Program: Standard statistical packages, such as SAS, SPSS, Stata
  Command: Proc mixed in SAS, mixed in SPSS, and xtmixed in Stata

Structural equation modeling
  Program: Specialized software packages, such as MPlus (Muthen & Muthen) and AMOS (IBM)
  Command: BY for measurement model and ON for structural model in MPlus

Propensity score analysis
  Equations:
    Step 1: $T_i = \lambda_0 + \lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_n X_n + \delta_i$
    Step 2: $Y_i = \beta_0 + \beta_1 T_i + \beta_2 P_1 + \beta_3 P_2 + \cdots + \beta_n P_n + \varepsilon_i$
  Explanation: Step 1: $T_i$ is the outcome in the logistic regression analysis (treatment); $X_1$, $X_2$, etc., are selection variables identified from the literature. Step 2: $Y_i$ is the outcome for individual $i$; $\beta_1$ is the effect of the predictor of interest, controlling for the propensity blocks; $T_i$ is the treatment effect; $P_1$, $P_2$, etc., represent matched groups within a given propensity block
  Program: Preferably Stata
  Command: Pscore in Stata

Regression discontinuity analysis
  Equation: $Y_i = \beta_0 + \beta_1 T_i + g(X_i - X_c) + \varepsilon_i$
  Explanation: $Y_i$ is the outcome for individual $i$; $\beta_1$ is the causal effect of treatment; $T_i$ is a dichotomous variable representing treatment assignment; $X_i - X_c$ is the exogenous covariate used to determine treatment, centered on the cutoff point
  Program: Standard statistical packages, such as SAS, SPSS, Stata
  Command: Proc mixed in SAS, regression in SPSS, and xtmixed in Stata

³ Abbreviations used: CFI, comparative fit index; pre-k, pre-kindergarten; PSA, propensity score analysis; RDD, regression discontinuity design; RMSEA, root mean square error of approximation; SEM, structural equation modeling.
©2014 American Society for Nutrition. Adv. Nutr. 5: 46-56, 2014; doi: /an

To illustrate 4 of the 6 techniques, we used data from the National Institute of Child Health and Human Development Study of Early Child Care and Youth Development (1). We are not able to illustrate regression discontinuity analysis because we do not have an appropriate treatment variable in the dataset. However, we do provide several excellent examples from the literature.

We chose to test the hypothesis that BMI at 15 y is associated with child birth weight and early home quality. BMI at 15 y was defined as the adolescent's weight (pounds) divided by the square of his or her height (inches), multiplied by 703, i.e., $\mathrm{BMI} = 703 \times \mathrm{weight} / \mathrm{height}^2$. Child birth weight was measured in grams. Early home quality was a composite of 8 subscales measured when study children were aged 36 and 54 mo: 1) learning materials; 2) language stimulation; 3) physical environment; 4) parental responsivity; 5) learning stimulation; 6) modeling of social maturity; 7) variety in experience; and 8) acceptance of the child. Higher scores on the composite indicate better home quality. For 3 of the modeling strategies (regression with covariates, SEM, and PSA), we used birth weight and early home quality to predict BMI at 15 y; for individual growth modeling, we used birth weight and early home quality to predict changes in BMI between 24 mo and 15 y.

Our purpose here is not to answer a substantive question of interest. Instead, our goal is to illustrate each method and to demonstrate how the answer to our question may vary across analytic strategies. We also recommend that, whenever possible, researchers consider using multiple modeling strategies to address their research questions because no single technique provides a solution to all problems (2).

Regression with Covariates

Regression with covariates is one of the most commonly used strategies across a wide range of disciplines.
In general, the goal of regression analysis is to estimate the magnitude and direction of the association between a predictor or set of predictors and an outcome, controlling for an extensive set of covariates. In non-experimental studies, the inclusion of covariates is meant to help isolate the association between the predictor of interest and the outcome variable by modeling important sources of influence. Including covariates also helps rule out other plausible explanations for differences in the outcome. Nevertheless, a model can be over-controlled (i.e., include too many controls) or under-controlled (i.e., fail to include key controls). Either problem can bias estimates of the association between a predictor and an outcome and reduce the generalizability of the researchers' findings (3,4). Thus, both theory and existing literature must play a central role in the selection of the covariates that will be used in any analysis.

To illustrate regression with covariates, we regressed the outcome, BMI at 15 y, on the predictors birth weight and early home quality. There was evidence that both variables predicted BMI at age 15 y (Table 2). Specifically, children who had lower birth weights (B = 0.001, P < 0.001) and higher-quality home environments during early childhood (B = -0.18, P < 0.001) had lower BMI scores at 15 y than children who had higher birth weights or children from lower-quality home environments.

TABLE 2 Results from regression with covariate analyses for BMI at 15 y (n = 833)¹

                       Model 1            Model 2
Birth weight           0.001* (0.000)     0.001* (0.000)
Early home quality     -0.18* (0.029)     -0.10* (0.036)
Male                                      (0.331)
Maternal education                        -0.20** (0.082)
Poor (24 mo)                              0.55 (0.609)
Poor (36 mo)                              1.86*** (0.602)
Poor (54 mo)                              (0.588)
R² statistic

¹ Data are expressed as parameter estimate (SD). *P < 0.001; **P < 0.05; ***P < ...
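The way a coefficient changes when covariates are added can be illustrated with simulated data. The following is a minimal sketch, not the authors' analysis: the variable names (`home_quality`, `maternal_edu`, `bmi`), the effect sizes, and the small `ols` helper are assumptions made for illustration, and the data are synthetic rather than drawn from the study.

```python
import random

def ols(X, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations.

    X is a list of predictor rows (no intercept column); an intercept is added here.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda idx: abs(A[idx][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def r_squared(X, y, b):
    """Proportion of outcome variance explained by the fitted model."""
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Synthetic data (assumed effect sizes): maternal education is correlated
# with home quality, and both lower BMI.
random.seed(1)
n = 2000
home_quality = [random.gauss(0, 3) for _ in range(n)]
maternal_edu = [0.5 * h + random.gauss(0, 2) for h in home_quality]
bmi = [22 - 0.10 * h - 0.20 * e + random.gauss(0, 1)
       for h, e in zip(home_quality, maternal_edu)]

# Model 1: uncontrolled association. Model 2: covariate added.
X1 = [[h] for h in home_quality]
X2 = [[h, e] for h, e in zip(home_quality, maternal_edu)]
b_model1 = ols(X1, bmi)
b_model2 = ols(X2, bmi)
r2_model1 = r_squared(X1, bmi, b_model1)
r2_model2 = r_squared(X2, bmi, b_model2)

print("Model 1 home-quality B:", round(b_model1[1], 3), "R2:", round(r2_model1, 3))
print("Model 2 home-quality B:", round(b_model2[1], 3), "R2:", round(r2_model2, 3))
print("Change in R2:", round(r2_model2 - r2_model1, 3))
```

In this simulation the uncontrolled coefficient absorbs part of the covariate's effect, so it shrinks toward its true value once the covariate enters the model. Reporting both models side by side, as in Table 2, makes that dependence visible.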
Next, to control for the possible effects of other family environment variables, we added child sex (male), maternal education, and family poverty status at 24, 36, and 54 mo to the model. Both birth weight and early home quality remained statistically significant predictors of BMI at 15 y. Examination of the B values across models showed that the effect sizes of the predictors of interest were reduced only slightly when the covariates were added to the model. Note that some of the covariates were also statistically significant. Although it has not always been common to present the uncontrolled associations between the predictor and outcome variable, in a recent article published in Psychological Science, Simmons et al. (5) recommended that researchers present both the controlled and uncontrolled associations. The authors suggest that doing so makes clear the extent to which the association of interest depends on the presence of a covariate. Model fit is typically described using the R² statistic, with higher values indicating better fit, and model comparisons are made by calculating the change in R².

We recommend the study by Tabachnick and Fidell (4) for additional reading on regression with covariates. For a more technical reading on multiple regression analysis, see the study by Rubinfeld (6). All statistical software packages include commands for regression with covariates, and all are comparable.

Hazard Modeling

Like regression with covariates, hazard modeling is a relatively common statistical technique for life-course researchers. As such, we devote only a brief discussion to the topic. Broadly speaking, hazard modeling, also referred to as survival analysis or event history analysis, is used to describe event occurrence and/or reoccurrence over time, as well as to identify determinants of these event histories (7).
Researchers interested in whether an event occurs (e.g., relapse to drug or alcohol use, diagnosis of a disease, becoming overweight or obese) and, if so, when it occurs should consider using hazard modeling to address their research questions. Hazard modeling produces estimates of the unique probability that individuals will experience an event during a specified time frame (referred to as hazard probability), as well as estimates of the cumulative probability that individuals will not experience the event over time (referred to as survival probability). In addition, associations between event occurrence and time-invariant (i.e., takes on the same value at each assessment) or time-varying (i.e., can take on a different value at each assessment) predictors can be estimated.

One of the greatest challenges faced by researchers studying event occurrence is that participants may not experience the event under investigation during the study period. As a result, the researcher does not know whether the individual ever experienced the event; the researcher knows only that, if the individual did experience the event, it was outside of the data collection period (8). These participants are referred to as censored individuals. In the past, censored cases were often dropped from the analysis, and thus estimates of event occurrence and timing were based on only those individuals who actually experienced the event. The result was often an underestimate of average time to event occurrence. Others have assigned censored individuals an event occurrence time, usually equivalent to the last time period, thereby including these individuals in their analysis. However, the result of this strategy was often an overestimate of the average time to event occurrence (7). Hazard modeling addresses the problem by assigning censored cases a 0 for the event and then including them in all analyses. That is, individuals who do not experience the event contribute to the estimation of event occurrence during each assessment period. The result is a more precise estimate of the probability of event occurrence because estimates are based on all individuals, i.e., those who experienced the event and those who did not.

Hazard modeling involves several steps.
First, the researcher must create a person-period dataset with a set of time dummy variables (for more information on the person-period dataset, see the section on individual growth modeling below), as well as a dichotomous event variable. The time dummies identify the assessment period to which each row belongs: within a given row, the dummy for that period takes a value of 1 and all other time dummies take a value of 0 (e.g., in the row for assessment 1, time dummy 1 is 1 and time dummies 2, 3, and 4 are 0; in the row for assessment 2, time dummy 2 is 1 and time dummies 1, 3, and 4 are 0). The event variable indicates whether or not the individual experienced the event of interest during that time period, given that he or she had not already experienced the event. Next, using the person-period dataset, the researcher fits a basic hazard model using logistic regression. That is, the dichotomous event variable is regressed on the time dummy variables (e.g., 4 dummies if there are 4 time periods), and a set of logit values is obtained. These logits are then used to calculate either the hazard probability (i.e., the unique probability of experiencing the event during that assessment period) or the survival probability (i.e., the cumulative probability that an individual will not experience the event). It is important to note that, to obtain estimates for all of the time dummy variables, the researcher must specify that the intercept should not be estimated. As a final step, determinants of risk (i.e., predictors) are added to the model. Predictors can be time invariant, i.e., they take on the same value at each assessment point, or time varying, i.e., they can take on a different value at each assessment point. Interactions between these predictors and the time dummies can also be included to determine whether the effect of a predictor is consistent across time or whether there are periods of greater and lesser risk. Models are compared using a deviance statistic, with smaller values indicating a better fitting model.
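The first two steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than a full hazard model: the eight records in `people` are hypothetical, and the helper name `to_person_period` is made up for this example. It also relies on the fact that, when the model contains only the time dummies and no intercept, the maximum likelihood logit for each period equals the logit of the observed proportion of at-risk individuals who experienced the event, so no iterative logistic regression routine is needed here. Once predictors are added, a logistic regression procedure would be required.

```python
import math

# Hypothetical person-level records: (id, last period observed, event indicator).
# An event indicator of 0 means the person was censored.
people = [
    (1, 2, 1), (2, 4, 0), (3, 1, 1), (4, 3, 1),
    (5, 4, 1), (6, 4, 0), (7, 2, 1), (8, 3, 0),
]
N_PERIODS = 4

def to_person_period(records, n_periods):
    """Return one row per person per period at risk, with time dummies and an event flag."""
    rows = []
    for pid, last, event in records:
        for j in range(1, last + 1):
            dummies = [1 if j == t else 0 for t in range(1, n_periods + 1)]
            rows.append({
                "id": pid,
                "period": j,
                "dummies": dummies,
                # The event flag is 1 only in the final observed period of a
                # person who experienced the event; censored people are always 0.
                "event": 1 if (event == 1 and j == last) else 0,
            })
    return rows

rows = to_person_period(people, N_PERIODS)

hazard, logit, survival = [], [], []
surviving = 1.0
for j in range(1, N_PERIODS + 1):
    at_risk = [r for r in rows if r["period"] == j]
    h = sum(r["event"] for r in at_risk) / len(at_risk)  # hazard probability
    hazard.append(h)
    logit.append(math.log(h / (1 - h)))                   # logit hazard
    surviving *= 1 - h                                     # cumulative survival
    survival.append(surviving)

for j in range(N_PERIODS):
    print(f"period {j + 1}: logit={logit[j]:.3f} "
          f"hazard={hazard[j]:.3f} survival={survival[j]:.3f}")
```

Note that the censored individuals (ids 2, 6, and 8 in this made-up sample) contribute a row with an event value of 0 for every period in which they were observed, which is how they inform the hazard estimates without being assigned an event time.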
For additional technical reading on hazard modeling, see the studies by Singer and Willett (7), Alison (9), and Willett and Singer (10). For applications of hazard modeling, see the studies by Willett and Singer (11), Rank and Hirschl (12), and Gupta et al. (13).

Individual Growth Modeling

Life-course researchers are frequently interested in understanding change over time in 1 or more continuous outcomes. Individual growth modeling refers to a family of methods used to describe development in an outcome over time, to compare patterns of change across groups, and to identify determinants of change (7,14). This set of techniques provides basic information on population estimates of the average level of the outcome at a given time (intercept), the average linear rate of change or growth over time (slope), and the average nonlinear rate of change (quadratic, cubic, etc.). In addition, these techniques can tell us about the effects that substantive predictors have on the intercept and slope. Because repeated assessments of an individual are necessarily correlated, failure to adequately account for these correlations can result in biased estimates of the intercept and slope (7). Individual growth modeling methods typically account for these correlations by nesting repeated assessments within individuals, although there is considerable variation in how effectively they do so (14). There are a variety of approaches to individual growth modeling, including univariate and multivariate repeated measures, hierarchical linear modeling, latent growth-curve modeling, growth mixture modeling, and latent class analysis.
These techniques vary, to some extent, in whether they allow for repeated assessments of a predictor (i.e., time-varying predictors), whether they can handle missing data, whether they allow for individual differences in the intercept as well as the slope, whether they require equally spaced assessments for each individual, and whether they handle measurement error (14). For example, univariate repeated-measures modeling estimates individual differences only in the intercept and not in the rate of change (or slope). In contrast, multivariate repeated-measures modeling allows for individual differences in both the intercept and slope. That is, in addition to estimating fixed effects for the intercept and slope, multivariate repeated-measures modeling also estimates random effects (often referred to as variance components) for the intercept and slope. Latent growth-curve modeling separates an individual's true score from measurement error, thereby producing more precise estimates of the population average intercept and rate of change (15). In general, however, researchers will likely reach the same conclusion regardless of which individual growth modeling method they use, and, thus, selection of a specific technique is driven primarily by the researcher's goals.

Estimating individual growth curves involves numerous steps. For techniques such as univariate and multivariate repeated-measures or hierarchical linear modeling, the researcher must begin by creating a person-period dataset (note that individual growth modeling techniques such as latent growth-curve modeling, growth mixture modeling, and latent class analysis do not always require this structure) (7). Data are commonly stored so that every individual in the dataset has a single line of data (often referred to as a person-oriented dataset). Repeated assessments of the outcome (or predictors) are represented by multiple columns that are differentiated by a time or assessment index (e.g., BMI1, BMI2, BMI3, etc.). In contrast, in a person-period dataset, every individual has multiple lines of data corresponding to the number of assessment periods. For example, in a longitudinal study with 4 assessment points, every individual would have 4 lines of data. Every variable that can take on a different value at each assessment will be represented by a single column rather than by multiple columns. Once the dataset has been transformed, the researcher then fits a set of unconditional models. The unconditional means model contains only an intercept and is used to partition the outcome variance into within-individual and between-individual variance (7). This information can be used to inform model building. For example, higher within-individual variation suggests a need for more time-varying predictors (and fewer time-invariant predictors), whereas higher between-individual variation suggests a need for both.
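The restructuring step and the variance partition can be sketched as follows. This is a hedged illustration with made-up BMI values for three hypothetical children. The one-way ANOVA (method-of-moments) estimator used here is a simple stand-in for the unconditional means model, which in practice would be fit with a mixed-model procedure, and it assumes a balanced design with no missing assessments.

```python
# Hypothetical person-oriented (wide) data: one row per child,
# one BMI column per assessment.
wide = [
    {"id": 1, "BMI1": 16.0, "BMI2": 17.0, "BMI3": 19.0, "BMI4": 22.0},
    {"id": 2, "BMI1": 15.0, "BMI2": 15.5, "BMI3": 17.0, "BMI4": 19.5},
    {"id": 3, "BMI1": 17.5, "BMI2": 19.0, "BMI3": 22.0, "BMI4": 26.0},
]
N_WAVES = 4

def to_person_period(rows, stem="BMI", n_waves=N_WAVES):
    """Reshape wide rows into one row per person per assessment."""
    out = []
    for row in rows:
        for wave in range(1, n_waves + 1):
            value = row.get(f"{stem}{wave}")
            if value is not None:  # skip assessments that were not collected
                out.append({"id": row["id"], "wave": wave, stem: value})
    return out

person_period = to_person_period(wide)

# Group the outcome by person.
by_person = {}
for rec in person_period:
    by_person.setdefault(rec["id"], []).append(rec["BMI"])

n_people = len(by_person)
grand_mean = sum(r["BMI"] for r in person_period) / len(person_period)
person_means = {pid: sum(v) / len(v) for pid, v in by_person.items()}

# One-way ANOVA sums of squares (balanced design assumed).
ss_within = sum((y - person_means[pid]) ** 2
                for pid, values in by_person.items() for y in values)
ss_between = N_WAVES * sum((m - grand_mean) ** 2 for m in person_means.values())
ms_within = ss_within / (n_people * (N_WAVES - 1))
ms_between = ss_between / (n_people - 1)

var_within = ms_within
var_between = max((ms_between - ms_within) / N_WAVES, 0.0)
icc = var_between / (var_between + var_within)  # share of variance between people

print("rows in person-period dataset:", len(person_period))
print("within-individual variance:", round(var_within, 3))
print("between-individual variance:", round(var_between, 3))
print("intraclass correlation:", round(icc, 3))
```

Because the unconditional means model ignores time, systematic growth shows up as within-individual variance in this partition, which is one reason the next step adds a slope.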
The unconditional growth model contains both an intercept and a slope and tells us whether there is statistically significant linear change in our outcome variable over time. Decisions about where to center time (e.g., beginning, middle, or end of study) must be made before fitting the model (7). Assuming that there is significant linear growth and adequate data, nonlinear growth terms (e.g., quadratic, cubic, quartic, etc.) can then be added to the model. To estimate a linear growth model, at least 3 time points are needed; to estimate a quadratic model, at least 4 time points are needed; to estimate a cubic model, at least 5 time points are needed; and so on. As a final step, determinants or predictors of the intercept and slope are added to the model. Note that the inclusion of the main effect of a predictor in the model simply tests the effect of that predictor on the intercept; to test the effects of a predictor on the slope, an interaction between the predictor and time must also be added to the model. Predictors of the intercept and slope are commonly referred to as level 2 predictors in hierarchical linear modeling techniques. Nested models are compared using the deviance statistic, with smaller values indicating a better fitting model. Models that are not nested can be compared using the Akaike information criterion and Bayesian information criterion statistics, and again, smaller values indicate a better fit.

To illustrate multivariate repeated-measures analysis, one of the many individual growth modeling techniques, we return to our working example of the association between child birth weight, the early home environment, and BMI. We began by restructuring our dataset from a person-oriented dataset to a person-period dataset. Next, we centered our time variable at the last assessment point by subtracting 180 mo (~15 y) from each time period. This enables us to interpret our intercept as the average BMI at 15 y.
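The effect of centering can be checked with a small single-child example. This is only a sketch: the ages and BMI values are hypothetical, the fit is an ordinary least-squares line for one child rather than the multilevel model described in the text, and the helpers `fit_line` and `min_time_points` are names introduced here for illustration.

```python
# Hypothetical assessment ages (months) and BMI values for one child.
ages = [24, 36, 54, 84, 120, 180]
bmi = [16.5, 16.2, 16.0, 17.1, 19.4, 22.8]

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a single predictor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def min_time_points(order):
    """Minimum assessments stated in the text: 3 for linear (order 1), 4 for quadratic, ..."""
    return order + 2

# Uncentered time: the intercept is the predicted BMI at age 0 mo.
raw_intercept, raw_slope = fit_line(ages, bmi)

# Time centered at the last assessment (180 mo): the intercept is now the
# predicted BMI at 15 y, while the slope is unchanged.
centered_ages = [a - 180 for a in ages]
c_intercept, c_slope = fit_line(centered_ages, bmi)

print("slope (BMI units per month):", round(c_slope, 4))
print("intercept, uncentered:", round(raw_intercept, 2))
print("intercept, centered at 180 mo:", round(c_intercept, 2))
print("time points needed for a cubic model:", min_time_points(3))
```

Centering changes only what the intercept means, not the estimated rate of change, which is why the choice of centering point should follow from the question being asked.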
Then, using a multilevel procedure, we fit an unconditional linear growth model to determine whether there was statistically significant linear growth in BMI from 24 mo to 15 y. It is here that the researcher would also test for nonlinear growth before adding predictors to the model. However, because of space limitations, we focused only on linear growth. Finally, we predicted growth in BMI first from our question predictors (i.e., birth weight and early home environment) and then from our question predictors and control variables.

Not surprisingly, there was evidence of linear change in children's BMI between 24 mo and 15 y. On average, adolescents' BMI was (P < 0.001) at 15 y and increased between 24 mo and 15 y at a rate of 0.05 (P < 0.001) units per month (Table 3). Both birth weight and the early home environment significantly predicted the average level of BMI at 15 y, as well as the rate of change in BMI across childhood and adolescence. Higher birth weight was associated with a higher BMI at 15 y, as well as more rapid increases in BMI over time. In contrast, the quality of the early home environment was negatively associated with level and rate of change in BMI such that a higher-quality early home environment predicted a lower BMI at 15 y, as well as a less rapid increase in BMI over time (Table 3, Model 1).
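Why a predictor of the slope enters the model as an interaction with time can be seen by substituting the level 2 equations into the level 1 equation. The derivation below is a sketch that uses Greek-letter versions of the symbols in Table 1, with early home quality ($HQ_i$) as the only predictor and $\mathrm{Age}_{ij}$ centered at 15 y as in the example. It is offered as an illustration and not as the exact specification fit by the authors.

```latex
\begin{aligned}
Y_{ij} &= \pi_{0i} + \pi_{1i}\,\mathrm{Age}_{ij} + \varepsilon_{ij} \\
\pi_{0i} &= \gamma_{00} + \gamma_{01} HQ_i + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11} HQ_i + \zeta_{1i} \\
\Rightarrow\; Y_{ij} &= \gamma_{00} + \gamma_{01} HQ_i + \gamma_{10}\,\mathrm{Age}_{ij}
  + \gamma_{11}\,(HQ_i \times \mathrm{Age}_{ij})
  + \left[\zeta_{0i} + \zeta_{1i}\,\mathrm{Age}_{ij} + \varepsilon_{ij}\right]
\end{aligned}
```

Under this centering, $\gamma_{01}$ describes the association between home quality and BMI at 15 y, and $\gamma_{11}$, the coefficient on the product term, describes the association between home quality and the monthly rate of change.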
To control for other possible explanatory factors, we added to the model child sex, maternal education, and family poverty status over time. Birth weight and quality of the early home environment remained statistically significant predictors of BMI at 15 y, as well as change in BMI across childhood and adolescence (Table 3, Model 2). Note that, with the exception of the effect of child sex on the intercept, our control variables were also statistically significant.

TABLE 3 Individual growth model predicting changes in BMI from 24 mo to 15 y from birth weight and quality of the early home environment¹

                          Model 1             Model 2
Intercept (π0i)           25.72* (1.05)       26.26* (1.09)
  Birth weight            0.001* (0.0002)     0.001* (0.0002)
  Early home quality      * (0.019)           * (0.022)
  Male                                        (0.212)
  Maternal education                          * (0.051)
  Poor (24 mo)                                0.753* (0.203)
Rate of change (π1i)      0.079* (0.006)      0.078* (0.006)
  Birth weight            * (0.000)           * (0.000)
  Early home quality      * (0.0001)          * (0.0001)
  Male                                        ** (0.001)
  Maternal education                          * (0.0003)
  Poor (24 mo)                                0.009* (0.002)
Variance components
  Within individual
  Between individual
Fit statistics
  R² within
  R² between
  R² overall

¹ Data are expressed as parameter estimate (SD). *P < 0.001; **P < ...

For more information on individual growth modeling techniques more generally, see the studies by Singer and Willett (7) and Burchinal et al. (14). For specific information on hierarchical linear modeling, see the study by Bryk and Raudenbush (16). For specific information on latent growth modeling, see the studies by Willett and Bub (15) or Willett and Sayer (17). Most statistical software packages include commands for individual growth modeling, and all are comparable.

SEM

SEM, also referred to as covariance structure analysis, is an extremely versatile tool that can be used to address very complex research questions. SEM is a very large topic, and much of the work on SEM is technical in nature. As such, we provide only a very brief overview of the many ways in which SEM can be used. Common analytic strategies that can be performed within the SEM framework include confirmatory factor analysis, multivariate regression analysis, simultaneous equation estimation, path analysis, latent growth modeling, and multilevel modeling. One of the key benefits of SEM is that it explicitly models measurement error, which commonly occurs when social, psychological, and health-related indicators are measured. This disattenuation of observed variables from measurement error results in unbiased (or less biased) estimates of the relation between 2 or more variables (18). Thus, SEM is sometimes referred to as a causal effects technique.
Another benefit of SEM is that the researcher can examine multiple outcomes simultaneously rather than fitting separate models for each outcome of interest. Similarly, SEM can be used to model the effects of change in one domain on changes in another domain, something no other analytic tool can handle. Clearly, the benefits of SEM are great.

Not surprisingly, the kind of information obtained from SEM depends on the type of analysis conducted. For example, researchers interested in creating a latent construct (i.e., a theoretical model of something that has not been measured directly) will learn not only whether the indicators they have selected offer a reasonable representation of the construct they were attempting to measure but also the extent to which each indicator contributes to the construct. Similarly, for those who use SEM to estimate a regression or a set of simultaneous equations, parameter estimates describing the magnitude and direction of association between predictors or latent constructs and one or more outcomes are derived. When latent growth modeling is conducted, estimates of the true intercept and rate of change are obtained.

Assessing model fit is one of the most important steps in SEM, although there is little consensus on which indices to use and what values suggest a good fit. The most common fit indices include the χ² statistic, the comparative fit index (CFI), and the root mean square error of approximation (RMSEA). The χ² statistic, sometimes referred to as the badness of fit statistic, provides an index of the discrepancy between the sample and the fitted covariance matrices (19). When the match is poor, the χ² statistic will be larger, but when the match is good, the χ² statistic will be smaller. In general, model fit is considered good if the χ² statistic is nonsignificant. Note that the χ² statistic is sensitive to sample size; when large samples are used, the χ² statistic will frequently lead the researcher to reject the model. The CFI essentially compares the sample covariance matrix with a hypothetical null model (20). Because the CFI is minimally affected by sample size, it is a frequently reported fit statistic (21). CFI values ranging from 0.90 to 1 have been said to provide a fair fit to the data, although a more conservative value of 0.95 is now being recommended (18). Finally, the RMSEA, which is often considered one of the most informative fit indices, provides an estimate of how well the model fits the population covariance matrix (22). The RMSEA is sensitive to the number of parameters being estimated, and thus more parsimonious models will typically have lower RMSEA values. Values <0.10 are said to provide adequate fit to the data, although more recent cutoff values are 0.06 or 0.07 (18,23). It is generally recommended that researchers report multiple fit indices because each index is sensitive to different elements of the model.

It is not uncommon to obtain model fit statistics that indicate a moderate to poor fit to the data. Before concluding that the model is not viable, there are a number of strategies the researcher can use to try to improve model fit. General strategies include dropping indicators if the factor loadings are small or too large (typically determined by the R² statistic), setting error variances to 0 if they are small or negative and nonsignificant, and fixing the correlations among indicators and/or errors to be 0 when there is no theoretical or statistical reason they should be correlated. When there are a large number of items (as we have here), it is worth considering whether multiple factors might better represent the constructs of interest. In instances in which multiple latent constructs are involved, it is a good idea to estimate each one separately to determine whether one construct is more problematic than the other. Software programs, such as MPlus, provide a set of modification indices, which can be used to improve model fit. More specifically, modification indices offer guidance for parameters that can be fixed, freed, or removed to improve model fit, but they should not be used without careful consideration. For a useful review of fit indices and model respecification, see the study by Hooper et al. (24).

To estimate a latent construct using SEM, often referred to as a measurement model, the researcher must begin by identifying the observed variables he or she would like to use to represent this abstract or unmeasured construct.

Although a latent construct can be represented by as few as 2 indicators, model fit tends to be better when more indicators are used. Thus, we recommend using at least 3 measured variables. Once the indicators have been identified, the researcher must select 1 to serve as the scaling factor for the latent construct. The factor loading for this variable will be fixed to 1 (in Mplus, for example, this is done by placing it first in the list of variables), and the factor loadings (or relative contribution) for all other variables will be estimated.

Once the measurement model has been estimated, the researcher can use the latent construct to predict one or more observed variables (or use one or more observed variables to predict the latent construct). That is, the observed variable is simply regressed on the latent construct (or vice versa). One of the key benefits of SEM over other regression techniques is that, with SEM, we can simultaneously estimate the relations between a predictor (or latent construct) and multiple outcomes (or latent constructs). To do this, the researcher specifies the latent construct in the first line of the syntax and then specifies the relation of interest in the second line of syntax. This portion of the model is typically referred to as the structural model. Common extensions of SEM include investigating interactions between 2 observed variables, an observed variable and a latent construct, or 2 latent constructs, and exploring path differences across groups using a multigroup analysis.

To illustrate the SEM technique, we return to our working example. In previous analyses, the early home quality variable was created by averaging total home quality scores at 36 and 54 mo. In this analysis, we began by creating a latent construct representing early home quality that comprised 16 observed variables.
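As a compact restatement of the measurement and structural parts described above, the model can be sketched in equation form. The notation (x_j for the observed indicators, \eta for the latent construct, \lambda_j for the factor loadings, y for an observed outcome) is an assumption introduced here for illustration and does not come from the article.

```latex
% Measurement model: each observed indicator x_j loads on the latent construct \eta.
% The first loading is fixed to 1 to set the scale of \eta.
x_j = \nu_j + \lambda_j \eta + \varepsilon_j, \qquad j = 1, \dots, J, \qquad \lambda_1 = 1

% Structural model: an observed outcome y is regressed on the latent construct.
y = \beta_0 + \beta_1 \eta + \zeta
```

With J = 16 indicators and \lambda_1 fixed to 1, 15 loadings remain to be estimated.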
Indicators included the following: 1) learning materials; 2) language stimulation; 3) physical environment; 4) parental responsivity; 5) learning stimulation; 6) modeling of social maturity; 7) variety in experience; and 8) acceptance of the child, each measured when children were aged 36 and 54 mo. The factor loading for learning materials at 36 mo was fixed to 1 to provide the scaling unit. The 15 remaining observed variables loaded significantly onto the early home quality factor, although they did not contribute to the construct equally (for factor loadings, see Table 4). More specifically, standardized loadings (β) ranged from 0.66 (language stimulation at 36 mo) to 0.31 (acceptance at 54 mo). Examination of the fit statistics indicated an adequate, although not ideal, fit to the data (χ² = [missing], df = 98, P = 0.00; CFI = 0.88; RMSEA = 0.07, P = 0.00), and thus a single latent construct representing early home quality was retained for subsequent analyses.

Next, we regressed 15-y BMI on the latent construct representing early home quality, as well as the observed indicators for birth weight, child sex (male), maternal education, and poverty status at 24, 36, and 54 mo. Once again, there was evidence that birth weight and the latent construct representing early home quality predicted BMI at 15 y (Fig. 1).
As was the case with the regression with covariates analysis, children who had lower birth weights (β = 0.13, P < 0.001) and higher-quality home environments (β = −0.14, P < 0.01) had lower BMI scores at 15 y than other children. Again, some of the covariates were statistically significant as well. Examination of the fit statistics indicated that the model provided an adequate (although not ideal) fit to the data (χ² = [missing], df = 203, P = 0.00; CFI = 0.87; RMSEA = 0.06, P = 0.00).

TABLE 4 Estimated factor loadings for early home quality measurement model¹

Indicator                        Unstandardized factor loading   Standardized factor loading
Learning materials, 36 mo        1.00 (0.000)                    0.78* (0.018)
Language stimulation, 36 mo      0.39* (0.021)                   0.66* (0.023)
Physical environment, 36 mo      0.31* (0.024)                   0.47* (0.031)
Responsivity, 36 mo              0.36* (0.026)                   0.52* (0.029)
Academic stimulation, 36 mo      0.41* (0.023)                   0.64* (0.024)
Modeling, 36 mo                  0.29* (0.022)                   0.51* (0.029)
Variety, 36 mo                   0.52* (0.028)                   0.68* (0.022)
Acceptance, 36 mo                0.17* (0.017)                   0.39* (0.033)
Learning materials, 54 mo        0.47* (0.025)                   0.61* (0.027)
Language stimulation, 54 mo      0.16* (0.013)                   0.45* (0.032)
Physical environment, 54 mo      0.28* (0.020)                   0.52* (0.029)
Responsivity, 54 mo              0.25* (0.026)                   0.37* (0.034)
Academic stimulation, 54 mo      0.25* (0.021)                   0.47* (0.031)
Modeling, 54 mo                  0.25* (0.020)                   0.47* (0.031)
Variety, 54 mo                   0.41* (0.025)                   0.61* (0.026)
Acceptance, 54 mo                0.11* (0.014)                   0.31* (0.036)

Fit statistics
χ² (df)                          [missing]* (98)
CFI                              0.88
RMSEA                            0.07 (P < 0.001)

¹Data are expressed as factor loading (SE). CFI, comparative fit index; RMSEA, root mean square error of approximation. *P < [missing].

For additional technical readings on SEM, see the studies by Keiley et al. (25), Kline (26,27), and Muthén and Muthén (28). For applications of SEM, see the studies by Windle and Mason (29), (L.K. Ferretti, K.L. Bub, unpublished results), and Ferretti and Bub (30). Specialized software packages, such as Mplus, AMOS, and Stata, are required for SEM.
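To make the CFI and RMSEA values reported in this section more concrete, the following minimal Python sketch computes both indices from chi-square values using their commonly cited formulas. The function names and the input numbers are hypothetical and are not the study's values. This version divides by N − 1 in the RMSEA, whereas some programs divide by N.

```python
import math


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index from the fitted-model and null-model chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)       # misfit of the fitted model
    d_null = max(chi2_null - df_null, d_model)      # misfit of the null model
    if d_null == 0.0:
        return 1.0
    return 1.0 - d_model / d_null


def rmsea(chi2_model, df_model, n):
    """RMSEA point estimate for a sample of size n (N - 1 convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


# Hypothetical inputs, chosen only to exercise the formulas.
example_cfi = cfi(chi2_model=150.0, df_model=98, chi2_null=1000.0, df_null=120)
example_rmsea = rmsea(chi2_model=150.0, df_model=98, n=500)
print(round(example_cfi, 3), round(example_rmsea, 3))
```

Because the RMSEA divides the excess chi-square by the degrees of freedom, the same amount of misfit gives a smaller value when the model has more degrees of freedom.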
PSA

Even in the most well-designed studies, selection bias or omitted-variable bias can affect our findings. Thus, researchers across disciplines, including education and the social sciences, have begun to seek methods that allow them to more rigorously test the associations between their predictors of interest and their outcomes. One such technique is PSA (31). Broadly speaking, PSA models the probability that an individual will have a particular experience (e.g., high-quality home environment) or be in a treatment program (e.g., obesity-prevention program) given a set of background characteristics for that individual. Assuming that the model is appropriately specified (i.e., the background characteristics are correct), PSA produces unbiased estimates of the effect of the treatment on the outcome variable, corrected for selection effects (32).

PSA is a 2-stage modeling process. In the first stage, logistic regression is used to predict treatment status from a set of background characteristics (note that treatment does not necessarily imply a formal treatment program; it can also reflect a specific experience like a high-quality home environment). Each participant receives a score indicating his or her unique probability of being in a given group (e.g., high-quality home environment). Individuals who are actually in the group should have high propensity scores given their background characteristics. Individuals with similar background characteristics who were not in the group should also have high propensity scores. In this way, PSA can be used to identify a reasonable comparison group. However, it should be noted that, in non-experimental studies, a researcher can never be sure that he or she has included all of the relevant background characteristics that influenced selection into a given group. Nevertheless, a careful consideration of the characteristics increases the chances that the majority of factors were included.

In the second stage of modeling, the propensity scores are used to model associations between the predictor of interest and the outcome. One of 2 approaches can be used here. First, the continuous propensity score can be entered into a regression model to control for background or selection factors. This approach is not all that different from adding a vector of covariates to your regression model and thus is not used all that frequently. The second approach is to use the propensity scores to create groups who are matched on background characteristics and thus should be equally likely to be in a given group (e.g., high-quality early environment) as not (e.g., low-quality early environment). Groups are matched in that the mean values for all background variables within a block are similar regardless of treatment status. Once matched groups are obtained, separate regression models within each block are fitted and the coefficients for the predictor of interest are averaged across models to obtain a mean treatment effect (e.g., a mean effect of high-quality home environment).
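The 2 stages described above can be sketched in Python on simulated data. This is a minimal illustration under stated assumptions, not the procedure used in the article: there is a single hypothetical background covariate, the blocks are simple quintiles of the propensity score, and each within-block "regression" contains only the treatment indicator, so its coefficient equals the difference in mean outcomes. All names and data are invented for the example.

```python
import math
import random


def fit_logistic(x, t, iterations=25):
    """Stage 1: fit P(t = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ti - p              # gradient for the intercept
            g1 += (ti - p) * xi       # gradient for the slope
            h00 += w                  # entries of the 2 x 2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def mean(values):
    return sum(values) / len(values)


def stratified_effect(scores, t, y, n_blocks=5):
    """Stage 2: average the within-block treated-minus-control differences."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) // n_blocks
    total, weight = 0.0, 0
    for b in range(n_blocks):
        end = (b + 1) * size if b < n_blocks - 1 else len(order)
        block = order[b * size:end]
        treated = [y[i] for i in block if t[i] == 1]
        control = [y[i] for i in block if t[i] == 0]
        if not treated or not control:
            continue                  # skip blocks without a comparison group
        total += len(block) * (mean(treated) - mean(control))
        weight += len(block)
    return total / weight


# Simulated data: x drives both selection into treatment and the outcome.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
t = [1 if rng.random() < 1.0 / (1.0 + math.exp(-xi)) else 0 for xi in x]
y = [2.0 * xi - 1.0 * ti + rng.gauss(0.0, 1.0) for xi, ti in zip(x, t)]  # true effect: -1

b0, b1 = fit_logistic(x, t)
scores = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
naive = mean([yi for yi, ti in zip(y, t) if ti == 1]) - mean(
    [yi for yi, ti in zip(y, t) if ti == 0])
adjusted = stratified_effect(scores, t, y)
print(round(naive, 2), round(adjusted, 2))
```

With only 5 blocks, some confounding remains within each block, so the stratified estimate should move toward the simulated effect of −1 without necessarily matching it.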
FIGURE 1 Fitted path diagram for birth weight and early home quality on BMI at 15 y (standardized results with SEs in parentheses). Note that the circle represents the latent construct for early home quality and comprises 16 indicators. The squares represent observed variables. CFI, Comparative Fit Index; RMSEA, Root Mean Square Error of Approximation.

To illustrate PSA, we return to our working example of the association between early home quality and BMI at age 15 y. We began by dichotomizing the early home quality variable such that individuals at or above the 75th percentile were given a score of 1 to indicate a high-quality home environment, and those below the 75th percentile received a score of 0. Note that any value could be used to dichotomize your variable of interest, although the decision is typically driven by theory and previous research; we selected the 75th percentile because the range of values was relatively limited and we wanted to maximize the differences across groups. We then examined our descriptive statistics on other child and family background variables (mothers' and fathers' education, maternal age at birth of study child, poverty status at 24 mo, and child's birth order) for the 2 quality groups. Next, we fit a logistic regression model in which we predicted the probability of being in a high-quality home environment from these background variables.

In addition to calculating a propensity score for each individual (ranging from 0 to 1), PSA also creates a set of balanced blocks in which individuals in the block are equally likely to be in the treatment (in this case, high-quality home environment) as not, given a similar set of background characteristics. More specifically, through an iterative process, the continuous propensity score is divided into blocks. The number of blocks can be specified by the researcher or driven by the data. When specified by the researcher, the most common number of blocks requested is 5.
Blocks are retained and the model is considered adequate when a t test reveals that there are no differences in the average propensity score for the treatment and control groups within a block and when the number of treatment and control individuals in each block is sufficient. Within each block, the probability of being in the treatment (i.e., high-quality home environment) is approximately the same. For example, in our dataset, block 1 consists of all individuals with a propensity score < 0.20, block 2 consists of individuals with propensity scores between 0.20 and 0.40, etc. The result is a set of blocks or strata in which some individuals are in high-quality home environments and some are not, but the means on each background characteristic are similar regardless of group membership. Finally, we fit 2 linear regression models: 1) using the propensity score as a control; and 2) using the matched blocks as controls.

An examination of the descriptive statistics for those with high-quality home environments versus those without one revealed that mothers in the high-quality sample were more educated (mean = [missing] vs [missing]), had partners who were more educated (mean = [missing] vs [missing]), and were older when the study child was born (mean = [missing] vs [missing]). Families with high-quality early home environments were less likely to be poor as well. No differences between groups were found for child's birth order.

Next, we fit a regression model in which we predicted adolescents' BMI at 15 y from birth weight, our dichotomous home quality variable representing a high-quality early home environment, and the continuous propensity score. Home quality was not a statistically significant predictor of BMI, suggesting that there are no differences in BMI between those who experienced a high-quality home environment and those who did not once we account for the probability of being in a high-quality environment (Table 5).

Finally, we refit the regression model in which we included 6 dummy variables representing the matched propensity blocks rather than the continuous propensity score. Note that we requested 5 blocks, but these blocks did not meet the balancing properties described above (i.e., no difference in propensity scores between treatment and control groups within a block and/or too few individuals in the treatment and control groups within a block), and thus a sixth block was created.
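The balancing property referred to above (no difference in mean propensity score between treatment and control groups within a block, and enough individuals in each group) can be checked along the following lines. This is a simplified sketch, not the iterative algorithm a package such as Stata runs: the function names, the block edges, the minimum group size, and the rough |t| < 2 threshold are all assumptions chosen for illustration.

```python
import bisect
import math
import statistics


def welch_t(a, b):
    """Welch t statistic for the difference in means of two samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def check_balance(scores, treated, edges, min_per_group=3, t_crit=2.0):
    """Return {block number: True if the block looks balanced, else False}."""
    blocks = {}
    for score, d in zip(scores, treated):
        block = bisect.bisect_right(edges, score) + 1   # block 1 is below edges[0]
        groups = blocks.setdefault(block, ([], []))
        groups[d].append(score)                          # index 0 = control, 1 = treated
    result = {}
    for block, (control, treat) in sorted(blocks.items()):
        enough = len(control) >= min_per_group and len(treat) >= min_per_group
        result[block] = enough and abs(welch_t(treat, control)) < t_crit
    return result


# Hypothetical propensity scores: block 1 is balanced, block 2 is not.
scores = [0.10, 0.12, 0.15, 0.11, 0.13, 0.14, 0.35, 0.37, 0.39, 0.21, 0.22, 0.23]
treated = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
balance = check_balance(scores, treated, edges=[0.20, 0.40, 0.60, 0.80])
print(balance)
```

A block flagged as unbalanced is a candidate for splitting into narrower blocks, which is how a requested 5 blocks can end up as more.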
Again, high-quality home environment did not significantly predict BMI at 15 y, but 4 (blocks 3-6) of the 6 propensity blocks significantly predicted BMI, and 1 (block 7) marginally predicted BMI. The negative and statistically significant coefficients for each propensity block suggest that, relative to block 1 or the excluded group (i.e., the individuals with the lowest likelihood of being in a high-quality home environment), individuals with a higher probability of being in a high-quality home environment tend to have lower BMIs at 15 y.

TABLE 5 Results from the propensity score analysis for adolescent BMI at 15 y¹

                                       Model 1              Model 2
Intercept                              20.83* (1.21)        20.79* (1.18)
Birth weight                           0.001** (0.0003)     0.001** (0.0003)
High-quality early home environment    [missing] (0.412)    [missing] (0.416)
Continuous propensity score            −5.51* (1.01)
Block 2                                                     [missing] (0.489)
Block 3                                                     [missing]** (0.544)
Block 4                                                     [missing]* (0.575)
Block 5                                                     [missing]* (0.539)
Block 6                                                     [missing]* (0.842)
Block 7                                                     [missing] (2.44)

¹Data are expressed as parameter estimate (SD). *P < 0.001; **P < 0.01; ***P < 0.05; [marker missing] P < [missing].

For an overview of PSA, see the studies by Murnane and Willett (32) and McCartney et al. (33). For more technical readings on PSA, see the studies by D'Agostino (34), Rosenbaum and Rubin (35), and Rubin (36). For an application of PSA, see the study by Hill et al. (37). The most appropriate software program for estimating PSA is Stata.

Regression Discontinuity

The RDD is one of the most powerful quasi-experimental designs for identifying causal effects in non-experimental studies (38). Because assignment to a treatment or intervention is often endogenous (i.e., correlated with unmeasured variables and thus with the error term), estimating the causal effect of such a program can be difficult. To deal with this endogeneity, researchers using RDD assign individuals to either a treatment or control group based on a cutoff score for a given predictor variable.
As such, treatment assignment is known (rather than unknown), and any differences between the treatment and control group on a subsequent outcome can be interpreted as the average treatment effect (38). For example, using our BMI data, children whose BMI is above the obese value might be assigned to participate in a nutrition intervention, whereas children whose BMI falls below the obese value would not receive the intervention. By comparing the outcomes (e.g., BMI in adolescence or early adulthood) of children just above the cutoff (i.e., the treatment group) with the outcomes of children just below the cutoff (i.e., the control group), we can determine the causal effect of our intervention, because any differences in the outcome are presumably attributable to the intervention alone. For an illustrated example of the RDD, see Figure 2. One additional benefit of RDD is that it reduces, if not eliminates, many of the ethical challenges associated with random assignment studies. Because it is often difficult to randomly assign individuals to a given condition (e.g., we cannot ethically assign someone to an obesity condition and make them gain weight), the RDD allows researchers in health-related fields to compute stronger causal estimates of treatment effects. RDDs can be either sharp or fuzzy. In a sharp design, treatment is determined only by a cutoff score on some observable variable. Causal inferences made from a sharp RDD closely resemble those obtained in a randomized experiment (39). In a fuzzy design, treatment may be determined by a set of unknown characteristics (or by a single characteristic), but the probability of receiving the treatment is discontinuous at the cutoff (40). 
In other words, exceptions to cutoff rules (e.g., a child whose BMI is 10% of a point below the obese value might be assigned to the treatment because the child needs the nutrition intervention) can lead to biased estimates, but as long as the likelihood of receiving the treatment is discontinuous at the cutoff, the RDD holds and estimates of the average treatment effect can be computed.
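A sharp design can be sketched in Python as follows: a treatment indicator is created from the cutoff, the pretest score is centered at the cutoff, and the posttest score is regressed on both. Everything here is an assumption made for illustration: the data are simulated with a known effect of 5 points, treatment goes to those at or above a hypothetical cutoff of 50, and the small least-squares solver stands in for any statistical package.

```python
import random


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


rng = random.Random(1)
cutoff = 50.0
pretest = [30.0 + 40.0 * i / 999 for i in range(1000)]      # scores from 30 to 70

# Stage 1: dichotomous treatment variable defined only by the cutoff.
treated = [1.0 if p >= cutoff else 0.0 for p in pretest]

# Stage 2: regress the posttest on treatment and the centered pretest.
centered = [p - cutoff for p in pretest]
posttest = [10.0 + 0.8 * c + 5.0 * d + rng.gauss(0.0, 0.5)
            for c, d in zip(centered, treated)]
design = [[1.0, d, c] for d, c in zip(treated, centered)]
intercept, effect, slope = ols(design, posttest)
print(round(intercept, 2), round(effect, 2), round(slope, 2))
```

Because the pretest is centered at the cutoff, the intercept is the predicted posttest score for an untreated individual exactly at the cutoff, and the coefficient on the treatment indicator is the estimated jump at that point.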

FIGURE 2 Sample figure from a regression discontinuity analysis examining posttest scores on an outcome from pretest scores on the same construct. Solid black line represents the hypothetical control group, and the solid gray line represents the hypothetical treatment group.

Like PSA, RDD is a 2-stage process. First, the researcher must identify a cutoff score on a predictor variable and determine which participants will serve as the treatment group (e.g., those above the cutoff) and which participants will serve as the control group (e.g., those below the cutoff). Based on this determination, the researcher creates a dichotomous treatment variable (i.e., 1 for treatment and 0 for control). In the second stage, the researcher fits a basic regression model in which the posttest score is regressed on the treatment indicator as well as the pretest score centered around the cutoff score. By centering the pretest score around the cutoff, the intercept represents the predicted posttest score for the control group at the cutoff, and the coefficient for the treatment indicator estimates the treatment effect at the cutoff. As a final step, additional covariates can be added to the regression model.

As we noted previously, we are not able to demonstrate the RDD with our working example because we do not have a treatment program per se. Nevertheless, the field of education offers numerous effective examples of RDD. For example, to investigate the effect of a new school reform effort in the Chicago Public Schools, Jacob and Lefgren (41) examined the causal impact of summer school on children's reading and mathematics skills. They began by assigning children to summer school based on their pretest reading and math scores. Children above the cutoff did not receive remedial services (i.e., summer school), whereas children below the cutoff were required to attend summer school. They then compared the posttest reading and mathematics scores of children who received the treatment with the scores of children who did not receive the treatment and found that children who attended summer school had higher test scores.
They concluded that summer school led to the differences in children's test performance. Gormley et al. (42) offer another excellent example of the application of RDD to education policy issues. Using variation in state-mandated age cutoffs for entering kindergarten, the authors investigated the impact of the Oklahoma universal pre-kindergarten (pre-k) program on children's academic achievement in kindergarten. More specifically, they compared the outcomes of young kindergarten children who had just completed pre-k (i.e., treatment group) with the outcomes of old pre-k children who were just beginning pre-k (i.e., control group). They found that the universal pre-k program led to increases (as much as 3 points) in a range of language, literacy, and mathematics outcomes. Furthermore, the benefits were conferred to a diverse group of children, including ethnic minority and low-income children.

For an excellent overview of RDDs, see the study by Murnane and Willett (32). For those interested in more technical readings on RDD, see the studies by Hahn et al. (40) or Shadish et al. (43). Regression discontinuity analyses can be conducted in any software package, and all are relatively comparable.

Conclusion

In a field that is rapidly changing, the need to conduct rigorous research across the lifespan is expanding exponentially. In many instances, regression with covariates is insufficient for addressing the complex interdisciplinary questions that life-course researchers are investigating. Techniques that allow us to explore changes in an outcome over time or that carefully control for selection bias are necessary if we are going to produce results that accurately describe development or appropriately inform practice. In this study, we reviewed a set of statistical methods that life-course researchers can use to address their questions about health and development from infancy to older adulthood.
Rather than offering a technical discussion of each analytic strategy, our goal was to provide a user-friendly guide to these techniques and offer additional resources for those who would like to learn more. We conclude with 2 recommendations. First, researchers should take the time to carefully consider the analytic strategy they plan to use to address their questions. Sometimes, the data lend themselves to 1 technique over another; often, however, 1 technique is no better than another, and thus the selection of a strategy that leads to warranted conclusions is a critical step in moving our fields forward. Second, although each technique can be used alone, we encourage researchers to consider using multiple strategies to address their research questions, as we have done here. This approach to data analysis is being used more frequently, and a comparison of findings across strategies is often quite informative.

Acknowledgments

All authors read and approved the final manuscript.

Literature Cited

1. NICHD Early Child Care Research Network. Child care and child development: results from the NICHD Study of Early Child Care and Youth Development. New York: Guilford Press; [year missing].
2. Winship C, Morgan SL. The estimation of causal effects from observational data. Annu Rev Sociol. 1999;25:[pages missing].
3. Newcombe NS. Some controls control too much. Child Dev. 2003;74:[pages missing].


More information

Lecture II: Difference in Difference. Causality is difficult to Show from cross

Lecture II: Difference in Difference. Causality is difficult to Show from cross Review Lecture II: Regression Discontinuity and Difference in Difference From Lecture I Causality is difficult to Show from cross sectional observational studies What caused what? X caused Y, Y caused

More information

Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy

Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy Jee-Seon Kim and Peter M. Steiner Abstract Despite their appeal, randomized experiments cannot

More information

EXAMINING THE EDUCATION GRADIENT IN CHRONIC ILLNESS

EXAMINING THE EDUCATION GRADIENT IN CHRONIC ILLNESS EXAMINING THE EDUCATION GRADIENT IN CHRONIC ILLNESS PINKA CHATTERJI, HEESOO JOO, AND KAJAL LAHIRI Department of Economics, University at Albany: SUNY February 6, 2012 This research was supported by the

More information

Instrumental Variables Estimation: An Introduction

Instrumental Variables Estimation: An Introduction Instrumental Variables Estimation: An Introduction Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA The Problem The Problem Suppose you wish to

More information

Use of the Quantitative-Methods Approach in Scientific Inquiry. Du Feng, Ph.D. Professor School of Nursing University of Nevada, Las Vegas

Use of the Quantitative-Methods Approach in Scientific Inquiry. Du Feng, Ph.D. Professor School of Nursing University of Nevada, Las Vegas Use of the Quantitative-Methods Approach in Scientific Inquiry Du Feng, Ph.D. Professor School of Nursing University of Nevada, Las Vegas The Scientific Approach to Knowledge Two Criteria of the Scientific

More information

A critical look at the use of SEM in international business research

A critical look at the use of SEM in international business research sdss A critical look at the use of SEM in international business research Nicole F. Richter University of Southern Denmark Rudolf R. Sinkovics The University of Manchester Christian M. Ringle Hamburg University

More information

Study Guide #2: MULTIPLE REGRESSION in education

Study Guide #2: MULTIPLE REGRESSION in education Study Guide #2: MULTIPLE REGRESSION in education What is Multiple Regression? When using Multiple Regression in education, researchers use the term independent variables to identify those variables that

More information

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA STRUCTURAL EQUATION MODELING, 13(2), 186 203 Copyright 2006, Lawrence Erlbaum Associates, Inc. On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation

More information

Hierarchical Linear Models: Applications to cross-cultural comparisons of school culture

Hierarchical Linear Models: Applications to cross-cultural comparisons of school culture Hierarchical Linear Models: Applications to cross-cultural comparisons of school culture Magdalena M.C. Mok, Macquarie University & Teresa W.C. Ling, City Polytechnic of Hong Kong Paper presented at the

More information

Answers to end of chapter questions

Answers to end of chapter questions Answers to end of chapter questions Chapter 1 What are the three most important characteristics of QCA as a method of data analysis? QCA is (1) systematic, (2) flexible, and (3) it reduces data. What are

More information

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

Supplement 2. Use of Directed Acyclic Graphs (DAGs) Supplement 2. Use of Directed Acyclic Graphs (DAGs) Abstract This supplement describes how counterfactual theory is used to define causal effects and the conditions in which observed data can be used to

More information

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University.

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University. Running head: ASSESS MEASUREMENT INVARIANCE Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies Xiaowen Zhu Xi an Jiaotong University Yanjie Bian Xi an Jiaotong

More information

EXPERIMENTAL RESEARCH DESIGNS

EXPERIMENTAL RESEARCH DESIGNS ARTHUR PSYC 204 (EXPERIMENTAL PSYCHOLOGY) 14A LECTURE NOTES [02/28/14] EXPERIMENTAL RESEARCH DESIGNS PAGE 1 Topic #5 EXPERIMENTAL RESEARCH DESIGNS As a strict technical definition, an experiment is a study

More information

OLS Regression with Clustered Data

OLS Regression with Clustered Data OLS Regression with Clustered Data Analyzing Clustered Data with OLS Regression: The Effect of a Hierarchical Data Structure Daniel M. McNeish University of Maryland, College Park A previous study by Mundfrom

More information

Use of Structural Equation Modeling in Social Science Research

Use of Structural Equation Modeling in Social Science Research Asian Social Science; Vol. 11, No. 4; 2015 ISSN 1911-2017 E-ISSN 1911-2025 Published by Canadian Center of Science and Education Use of Structural Equation Modeling in Social Science Research Wali Rahman

More information

STATS8: Introduction to Biostatistics. Overview. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Overview. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Overview Babak Shahbaba Department of Statistics, UCI The role of statistical analysis in science This course discusses some biostatistical methods, which involve

More information

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity Measurement & Variables - Initial step is to conceptualize and clarify the concepts embedded in a hypothesis or research question with

More information

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

Addendum: Multiple Regression Analysis (DRAFT 8/2/07) Addendum: Multiple Regression Analysis (DRAFT 8/2/07) When conducting a rapid ethnographic assessment, program staff may: Want to assess the relative degree to which a number of possible predictive variables

More information

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis Advanced Studies in Medical Sciences, Vol. 1, 2013, no. 3, 143-156 HIKARI Ltd, www.m-hikari.com Detection of Unknown Confounders by Bayesian Confirmatory Factor Analysis Emil Kupek Department of Public

More information

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions.

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions. Greenland/Arah, Epi 200C Sp 2000 1 of 6 EPI 200C Final, June 4 th, 2009 This exam includes 24 questions. INSTRUCTIONS: Write all answers on the answer sheets supplied; PRINT YOUR NAME and STUDENT ID NUMBER

More information

EVALUATING THE INTERACTION OF GROWTH FACTORS IN THE UNIVARIATE LATENT CURVE MODEL. Stephanie T. Lane. Chapel Hill 2014

EVALUATING THE INTERACTION OF GROWTH FACTORS IN THE UNIVARIATE LATENT CURVE MODEL. Stephanie T. Lane. Chapel Hill 2014 EVALUATING THE INTERACTION OF GROWTH FACTORS IN THE UNIVARIATE LATENT CURVE MODEL Stephanie T. Lane A thesis submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment

More information

Score Tests of Normality in Bivariate Probit Models

Score Tests of Normality in Bivariate Probit Models Score Tests of Normality in Bivariate Probit Models Anthony Murphy Nuffield College, Oxford OX1 1NF, UK Abstract: A relatively simple and convenient score test of normality in the bivariate probit model

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Regression Discontinuity Design

Regression Discontinuity Design Regression Discontinuity Design Regression Discontinuity Design Units are assigned to conditions based on a cutoff score on a measured covariate, For example, employees who exceed a cutoff for absenteeism

More information

Multifactor Confirmatory Factor Analysis

Multifactor Confirmatory Factor Analysis Multifactor Confirmatory Factor Analysis Latent Trait Measurement and Structural Equation Models Lecture #9 March 13, 2013 PSYC 948: Lecture #9 Today s Class Confirmatory Factor Analysis with more than

More information

Rapid decline of female genital circumcision in Egypt: An exploration of pathways. Jenny X. Liu 1 RAND Corporation. Sepideh Modrek Stanford University

Rapid decline of female genital circumcision in Egypt: An exploration of pathways. Jenny X. Liu 1 RAND Corporation. Sepideh Modrek Stanford University Rapid decline of female genital circumcision in Egypt: An exploration of pathways Jenny X. Liu 1 RAND Corporation Sepideh Modrek Stanford University This version: February 3, 2010 Abstract Egypt is currently

More information

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Lec 02: Estimation & Hypothesis Testing in Animal Ecology Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then

More information

TWO-DAY DYADIC DATA ANALYSIS WORKSHOP Randi L. Garcia Smith College UCSF January 9 th and 10 th

TWO-DAY DYADIC DATA ANALYSIS WORKSHOP Randi L. Garcia Smith College UCSF January 9 th and 10 th TWO-DAY DYADIC DATA ANALYSIS WORKSHOP Randi L. Garcia Smith College UCSF January 9 th and 10 th @RandiLGarcia RandiLGarcia Mediation in the APIM Moderation in the APIM Dyadic Growth Curve Modeling Other

More information

26:010:557 / 26:620:557 Social Science Research Methods

26:010:557 / 26:620:557 Social Science Research Methods 26:010:557 / 26:620:557 Social Science Research Methods Dr. Peter R. Gillett Associate Professor Department of Accounting & Information Systems Rutgers Business School Newark & New Brunswick 1 Overview

More information

Small Group Presentations

Small Group Presentations Admin Assignment 1 due next Tuesday at 3pm in the Psychology course centre. Matrix Quiz during the first hour of next lecture. Assignment 2 due 13 May at 10am. I will upload and distribute these at the

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA PharmaSUG 2014 - Paper SP08 Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA ABSTRACT Randomized clinical trials serve as the

More information

Available from Deakin Research Online:

Available from Deakin Research Online: This is the published version: Richardson, Ben and Fuller Tyszkiewicz, Matthew 2014, The application of non linear multilevel models to experience sampling data, European health psychologist, vol. 16,

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

You must answer question 1.

You must answer question 1. Research Methods and Statistics Specialty Area Exam October 28, 2015 Part I: Statistics Committee: Richard Williams (Chair), Elizabeth McClintock, Sarah Mustillo You must answer question 1. 1. Suppose

More information

Ordinary Least Squares Regression

Ordinary Least Squares Regression Ordinary Least Squares Regression March 2013 Nancy Burns (nburns@isr.umich.edu) - University of Michigan From description to cause Group Sample Size Mean Health Status Standard Error Hospital 7,774 3.21.014

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Module 14: Missing Data Concepts

Module 14: Missing Data Concepts Module 14: Missing Data Concepts Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724 Pre-requisites Module 3

More information

Recent advances in non-experimental comparison group designs

Recent advances in non-experimental comparison group designs Recent advances in non-experimental comparison group designs Elizabeth Stuart Johns Hopkins Bloomberg School of Public Health Department of Mental Health Department of Biostatistics Department of Health

More information

WELCOME! Lecture 11 Thommy Perlinger

WELCOME! Lecture 11 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression

More information

Overview of Perspectives on Causal Inference: Campbell and Rubin. Stephen G. West Arizona State University Freie Universität Berlin, Germany

Overview of Perspectives on Causal Inference: Campbell and Rubin. Stephen G. West Arizona State University Freie Universität Berlin, Germany Overview of Perspectives on Causal Inference: Campbell and Rubin Stephen G. West Arizona State University Freie Universität Berlin, Germany 1 Randomized Experiment (RE) Sir Ronald Fisher E(X Treatment

More information

Impact of infant feeding on growth trajectory patterns in childhood and body composition in young adulthood

Impact of infant feeding on growth trajectory patterns in childhood and body composition in young adulthood Impact of infant feeding on growth trajectory patterns in childhood and body composition in young adulthood WP10 working group of the Early Nutrition Project Peter Rzehak*, Wendy H. Oddy,* Maria Luisa

More information

Estimating drug effects in the presence of placebo response: Causal inference using growth mixture modeling

Estimating drug effects in the presence of placebo response: Causal inference using growth mixture modeling STATISTICS IN MEDICINE Statist. Med. 2009; 28:3363 3385 Published online 3 September 2009 in Wiley InterScience (www.interscience.wiley.com).3721 Estimating drug effects in the presence of placebo response:

More information

Scale Building with Confirmatory Factor Analysis

Scale Building with Confirmatory Factor Analysis Scale Building with Confirmatory Factor Analysis Latent Trait Measurement and Structural Equation Models Lecture #7 February 27, 2013 PSYC 948: Lecture #7 Today s Class Scale building with confirmatory

More information

S P O U S A L R ES E M B L A N C E I N PSYCHOPATHOLOGY: A C O M PA R I SO N O F PA R E N T S O F C H I LD R E N W I T H A N D WITHOUT PSYCHOPATHOLOGY

S P O U S A L R ES E M B L A N C E I N PSYCHOPATHOLOGY: A C O M PA R I SO N O F PA R E N T S O F C H I LD R E N W I T H A N D WITHOUT PSYCHOPATHOLOGY Aggregation of psychopathology in a clinical sample of children and their parents S P O U S A L R ES E M B L A N C E I N PSYCHOPATHOLOGY: A C O M PA R I SO N O F PA R E N T S O F C H I LD R E N W I T H

More information

How to analyze correlated and longitudinal data?

How to analyze correlated and longitudinal data? How to analyze correlated and longitudinal data? Niloofar Ramezani, University of Northern Colorado, Greeley, Colorado ABSTRACT Longitudinal and correlated data are extensively used across disciplines

More information

Does Male Education Affect Fertility? Evidence from Mali

Does Male Education Affect Fertility? Evidence from Mali Does Male Education Affect Fertility? Evidence from Mali Raphael Godefroy (University of Montreal) Joshua Lewis (University of Montreal) April 6, 2018 Abstract This paper studies how school access affects

More information

Comprehensive Statistical Analysis of a Mathematics Placement Test

Comprehensive Statistical Analysis of a Mathematics Placement Test Comprehensive Statistical Analysis of a Mathematics Placement Test Robert J. Hall Department of Educational Psychology Texas A&M University, USA (bobhall@tamu.edu) Eunju Jung Department of Educational

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha attrition: When data are missing because we are unable to measure the outcomes of some of the

More information

The Regression-Discontinuity Design

The Regression-Discontinuity Design Page 1 of 10 Home» Design» Quasi-Experimental Design» The Regression-Discontinuity Design The regression-discontinuity design. What a terrible name! In everyday language both parts of the term have connotations

More information

Social Determinants and Consequences of Children s Non-Cognitive Skills: An Exploratory Analysis. Amy Hsin Yu Xie

Social Determinants and Consequences of Children s Non-Cognitive Skills: An Exploratory Analysis. Amy Hsin Yu Xie Social Determinants and Consequences of Children s Non-Cognitive Skills: An Exploratory Analysis Amy Hsin Yu Xie Abstract We assess the relative role of cognitive and non-cognitive skills in mediating

More information

INTRODUCTION TO ECONOMETRICS (EC212)

INTRODUCTION TO ECONOMETRICS (EC212) INTRODUCTION TO ECONOMETRICS (EC212) Course duration: 54 hours lecture and class time (Over three weeks) LSE Teaching Department: Department of Economics Lead Faculty (session two): Dr Taisuke Otsu and

More information

Isolating causality between gender and corruption: An IV approach Correa-Martínez, Wendy; Jetter, Michael

Isolating causality between gender and corruption: An IV approach Correa-Martínez, Wendy; Jetter, Michael No. 16-07 2016 Isolating causality between gender and corruption: An IV approach Correa-Martínez, Wendy; Jetter, Michael Isolating causality between gender and corruption: An IV approach 1 Wendy Correa

More information

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data TECHNICAL REPORT Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data CONTENTS Executive Summary...1 Introduction...2 Overview of Data Analysis Concepts...2

More information

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS Chapter Objectives: Understand Null Hypothesis Significance Testing (NHST) Understand statistical significance and

More information

Review of Veterinary Epidemiologic Research by Dohoo, Martin, and Stryhn

Review of Veterinary Epidemiologic Research by Dohoo, Martin, and Stryhn The Stata Journal (2004) 4, Number 1, pp. 89 92 Review of Veterinary Epidemiologic Research by Dohoo, Martin, and Stryhn Laurent Audigé AO Foundation laurent.audige@aofoundation.org Abstract. The new book

More information

Experimental Design. Dewayne E Perry ENS C Empirical Studies in Software Engineering Lecture 8

Experimental Design. Dewayne E Perry ENS C Empirical Studies in Software Engineering Lecture 8 Experimental Design Dewayne E Perry ENS 623 Perry@ece.utexas.edu 1 Problems in Experimental Design 2 True Experimental Design Goal: uncover causal mechanisms Primary characteristic: random assignment to

More information

SUMMARY AND DISCUSSION

SUMMARY AND DISCUSSION Risk factors for the development and outcome of childhood psychopathology SUMMARY AND DISCUSSION Chapter 147 In this chapter I present a summary of the results of the studies described in this thesis followed

More information

CHAPTER TWO REGRESSION

CHAPTER TWO REGRESSION CHAPTER TWO REGRESSION 2.0 Introduction The second chapter, Regression analysis is an extension of correlation. The aim of the discussion of exercises is to enhance students capability to assess the effect

More information

CHAPTER 6. Conclusions and Perspectives

CHAPTER 6. Conclusions and Perspectives CHAPTER 6 Conclusions and Perspectives In Chapter 2 of this thesis, similarities and differences among members of (mainly MZ) twin families in their blood plasma lipidomics profiles were investigated.

More information

cloglog link function to transform the (population) hazard probability into a continuous

cloglog link function to transform the (population) hazard probability into a continuous Supplementary material. Discrete time event history analysis Hazard model details. In our discrete time event history analysis, we used the asymmetric cloglog link function to transform the (population)

More information

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN)

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN) UNIT 4 OTHER DESIGNS (CORRELATIONAL DESIGN AND COMPARATIVE DESIGN) Quasi Experimental Design Structure 4.0 Introduction 4.1 Objectives 4.2 Definition of Correlational Research Design 4.3 Types of Correlational

More information

Session 3: Dealing with Reverse Causality

Session 3: Dealing with Reverse Causality Principal, Developing Trade Consultants Ltd. ARTNeT Capacity Building Workshop for Trade Research: Gravity Modeling Thursday, August 26, 2010 Outline Introduction 1 Introduction Overview Endogeneity and

More information

Multiple Regression Models

Multiple Regression Models Multiple Regression Models Advantages of multiple regression Parts of a multiple regression model & interpretation Raw score vs. Standardized models Differences between r, b biv, b mult & β mult Steps

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information