Application of Bayesian Inference using Gibbs Sampling to Item-Response Theory Modeling of Multi-Symptom Genetic Data


Behavior Genetics, Vol. 35, No. 6, November 2005 (© 2005) DOI: /s z

Lindon Eaves (1,4), Alaattin Erkanli (2,3), Judy Silberg (1), Adrian Angold (3), Hermine H. Maes (1), and Debra Foley (1)

Received 25 Mar Final 23 May 2005

Several genetic item-response theory (IRT) models are fitted to the responses of 1086 adolescent female twins to the 33 multi-category items of the Mood and Feelings Questionnaire relating to depressive symptomatology in adolescence. A Markov chain Monte Carlo (MCMC) algorithm is used within a Bayesian framework for inference using Gibbs sampling, implemented in the program WinBUGS 1.4. The final model incorporated separate genetic and non-shared environmental traits ("A" and "E") and item-specific genetic effects. Simpler models gave markedly poorer fit to the observations, as judged by the deviance information criterion (DIC). The common genetic factor showed major loadings on melancholic items, while the environmental factor loaded most highly on items relating to self-deprecation. The MCMC approach provides a convenient and flexible alternative to maximum likelihood for estimating the parameters of IRT models for relatively large numbers of items in a genetic context. Additional benefits of the IRT approach are discussed, including the estimation of latent trait scores (among them genetic factor scores) and their sampling errors.

KEY WORDS: Bayesian inference; depression; genetic; Gibbs sampling; item-response theory; Markov Chain Monte Carlo; multivariate; twins.
(1) Virginia Institute for Psychiatric and Behavioral Genetics, Department of Human Genetics, Virginia Commonwealth University, Richmond, Virginia, USA.
(2) Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, USA.
(3) Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, North Carolina, USA.
(4) To whom correspondence should be addressed at Virginia Institute for Psychiatric and Behavioral Genetics, PO Box , Virginia Commonwealth University, Richmond, VA , U.S.A. eaves@mail2.vcu.edu.

/05/ /0 © 2005 Springer Science+Business Media, Inc.

INTRODUCTION

This paper illustrates the application of Markov Chain Monte Carlo (MCMC) methods to a problem in genetic analysis that has, so far, proved tedious to solve by conventional maximum likelihood approaches, namely the synthesis of genetic modeling, especially in twin data, with modeling of the latent structure underlying large numbers of categorical items using item-response theory (IRT). We illustrate the approach by fitting a conventional IRT model to responses of female adolescent MZ and DZ twins to 33 three-category items from the Mood and Feelings Questionnaire (MFQ; Angold, 1995). The approach allows us simultaneously to estimate parameters of the IRT model and the genetic model for twin resemblance, and to estimate scores of individual subjects. The approach yields samples from the posterior distributions, given the data, of the model parameters, including the subjects' latent trait scores, from which it is possible to obtain point estimates of the parameters, such as the estimated mean, mode or median of these distributions, as well as to infer information on the credibility of the estimates represented by estimates of the standard deviation/variance of the

posterior distributions, or by credible intervals for the parameters.

Item-response theory has a long history (see e.g., chapters by Birnbaum in Lord and Novick, 1968) and provides a theoretically appealing framework for conceiving the relationship between categorical outcomes, such as responses to items on a psychological test, and latent traits. From one perspective, IRT is thus a generalization of the traditional factor model to incorporate categorical outcomes. The IRT model postulates that the probability that an individual exceeds a given response threshold, defined by the transition from one response category to the next, is a monotonic function (typically logistic or probit) of the subject's level on one or more continuous latent traits. Item parameters correspond to the category thresholds ("item difficulty parameters" or "item extremities") and loadings of the items on the latent trait or traits ("item discriminating powers"). The basic IRT framework also provides a fertile general conceptual structure for dealing with many clinically and genetically important questions, such as etiological and phenotypic heterogeneity in complex behavioral disorders indexed by multiple symptoms.

Several programs are available for fitting a variety of IRT models to data on unrelated subjects (e.g., Fraser, 1988; Muthén and Muthén, 2001; Thissen, 1995). However, as far as we can tell, none of these has yet been used to apply the IRT model to kinship data, such as data on MZ and DZ twins, and to test hypotheses about the underlying genetic and environmental influences on trait variation and covariation between relatives. Two problems typically arise in trying to fit IRT models to twin data.
First, the twins are correlated, and behavior-geneticists typically want to test models that imply a variety of constraints on the covariance structure between relatives, such as those implied by the familiar ACE model for twin resemblance (e.g., Neale and Cardon, 1992). Secondly, item parameters need to be constrained across latent traits and groups in the case of twins, to allow for the fact that the same factor structure applies to all subjects. Part of the problem of applying IRT to twin data stems from the fact that although it is relatively easy to write the likelihood for a twin IRT model (see e.g., Eaves et al., 1987), it is far harder to maximize the likelihood with respect to the parameters of a genetic IRT model because of the numerical problem of integrating the likelihood of individual pairs over the infinity of possible latent trait values in multiple dimensions. Although this is not impossible, it has proved very tedious and, to our knowledge, there are few, if any, large-scale applications of IRT to behavior-genetic data.

Recent advances in computer power have unleashed a variety of computer-intensive approaches to statistical analysis that are only just beginning to have an impact on the way behavior-geneticists and twin researchers think about their data. One approach that is starting to receive some attention is the application of Bayesian inference, implemented through Markov Chain Monte Carlo algorithms. Gilks et al. (1996) provide a good general overview of the concepts, and the freely available program WinBUGS 1.4 (Spiegelhalter et al., 2003) provides a relatively user-friendly platform for its implementation. Recent papers provide illustrations of the approach to survival analysis (Do et al., 2000), non-linear developmental change and genotype-environment interaction (Eaves and Erkanli, 2003; Eaves et al., 2003).
These three publications give more extended accounts of the conceptual and practical background to the application of MCMC methods to several different types of problem in twin data analysis. In this paper, we focus on an application to IRT modeling and provide WinBUGS code that illustrates the relative transparency and flexibility of the approach.

Briefly, the MCMC approach constructs a Markov Chain on the (parameter) space of unknown quantities such that, starting with a series of trial values (e.g., means, regressions, genetic variances, genetic and environmental effects, etc.), after an initial series of iterations (the "burn-in") successive iterations represent samples from the unknown joint distribution. This is the so-called stationary distribution of the Markov Chain and, in the Bayesian context, it is the joint posterior distribution of all the parameters. We note that in this context the parameters include not just the usual parameters of the structural model (means, genetic and environmental variances, etc.) but also the latent genetic and environmental deviations of the individual twins and any missing observed values. These MCMC iterations are furnished by simulating values for the unknown parameters, conditional upon the given data, using specially tailored transition probability kernels (also called proposal distributions) that are not only easy to simulate from, but also guarantee the convergence (in distribution) of the simulated Markov Chain to the true joint posterior distribution. After a burn-in period, the probability distribution of the unknown quantity, and moments such as the expected value of any function of it, can be obtained to any desired degree of precision

by taking an (ergodic) average of the successive values over a sufficiently large number of iterations.

The Gibbs sampler (Creutz, 1979; Gelfand and Smith, 1990; Geman and Geman, 1984; Ripley, 1979) is perhaps the most popular MCMC approach for constructing a Markov chain with the desired properties. In the Gibbs sampling approach, the Markov kernels consist of the conditional distributions of each variable of interest given all the other variables. As an example, consider random variables $X$ and $Y$ having an unknown joint distribution $[X, Y]$, and assume further that each of the conditional distributions $[X \mid Y]$ and $[Y \mid X]$ is available in analytically closed form. Here the Markov kernels are the conditionals $[X \mid Y]$ and $[Y \mid X]$. So, if $X_0$ and $Y_0$ are initial values, then the Markov Chain is constructed on the $XY$ space by simulating successively a sequence $\{X_r, Y_r\}$ from the known conditionals $[Y \mid X_{r-1}]$ and $[X \mid Y_{r-1}]$ for $r = 1, 2, \ldots, R$. Under mild regularity conditions, it can be shown that the joint distribution of the sequence $\{X_r, Y_r\}$ converges to the joint distribution $[X, Y]$ as $r$ tends toward infinity, as long as these conditionals are bona fide probability distributions. Thus, for a sufficiently large $R$, the $\{X_r, Y_r\}$ will resemble draws from the true joint distribution $[X, Y]$.

Within the Bayesian context, $X$ and $Y$ are usually unknown parameters (e.g., the mean and variance), $[X \mid Y]$ and $[Y \mid X]$ (suppressing the conditioning on the data) are the conditional posterior distributions, and $[X, Y]$ is the joint posterior distribution of $X$ and $Y$. For example, the marginal posterior distribution $[X]$ of $X$ can be approximated by Monte Carlo integration, for each $X = x$,

$$[x] \approx \frac{1}{R} \sum_{r=1}^{R} [x \mid Y_r],$$

where the summation is over $r = 1, 2, \ldots, R$. Alternatively, a kernel density approximation, or a histogram, can be constructed from the sequence $\{X_r\}$.
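The two-variable scheme can be made concrete with a toy case. The sketch below is an illustration only, not the WinBUGS implementation used in this paper: it assumes a standard bivariate normal target with correlation `rho`, chosen because both conditionals are then known normal distributions, and all function names are hypothetical. It uses the usual systematic scan, in which each draw conditions on the most recent value of the other variable, and then forms the ergodic average and the conditional-density estimate of the marginal $[x]$.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Assumed toy target: X | Y=y ~ N(rho*y, 1-rho^2), Y | X=x ~ N(rho*x, 1-rho^2).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0  # initial values X_0, Y_0
    draws = []
    for r in range(burn_in + n_iter):
        x = rng.gauss(rho * y, cond_sd)  # draw from [X | Y]
        y = rng.gauss(rho * x, cond_sd)  # draw from [Y | X]
        if r >= burn_in:  # discard the burn-in iterations
            draws.append((x, y))
    return draws


def ergodic_mean(draws, g):
    """Monte Carlo average (1/R) * sum_r g(x_r) over the retained draws."""
    return sum(g(x) for x, _ in draws) / len(draws)


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def marginal_density_x(x0, draws, rho):
    """Estimate [x0] as (1/R) * sum_r [x0 | Y_r]."""
    cond_sd = math.sqrt(1.0 - rho * rho)
    return sum(normal_pdf(x0, rho * y, cond_sd) for _, y in draws) / len(draws)


draws = gibbs_bivariate_normal(rho=0.6, n_iter=20000, burn_in=1000)
mean_x = ergodic_mean(draws, lambda x: x)
second_moment_x = ergodic_mean(draws, lambda x: x * x)
density_at_zero = marginal_density_x(0.0, draws, rho=0.6)
mean_xy = sum(x * y for x, y in draws) / len(draws)
```

For this target the true values are known (mean 0, second moment 1, marginal density $1/\sqrt{2\pi}$ at zero, and $E[XY] = \rho$), so the quality of the approximation can be checked directly, which is not possible for a real posterior.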
Similarly, the expectation of a function $g(X)$ is approximated by the ergodic Monte Carlo average

$$E\,g(X) \approx \frac{1}{R} \sum_{r=1}^{R} g(x_r).$$

Note that a desirable byproduct of the MCMC approach is that not only the expectation but the entire posterior distribution of $g(X)$ is approximated by the sequence $\{g(x_r)\}$. There are also several other ways, such as general Metropolis-Hastings algorithms, to construct an MCMC sampler; in fact, Gibbs sampling is a special case of these general algorithms. A more thorough account of the approach, which is beyond the scope of this paper, may be found in Tierney (1994), Gilks et al. (1996), and Brooks (1998). A recent paper by Besag (2000) reviews several MCMC approaches and provides a comprehensive list of references.

DATA

The data chosen to illustrate the method comprise responses of 373 MZ and 170 DZ adolescent female twin pairs aged 8-16 to the 33 items of the MFQ (Angold, 1995), designed to assess symptoms of depression. The twins completed the MFQ as part of a much more extensive psychiatric assessment in the home during the first wave of the Virginia Twin Study of Adolescent Behavioral Development (VTSABD; Eaves et al., 1997; Hewitt et al., 1997; Simonoff et al., 1997). The items and the raw endorsement frequencies for the three response categories are given for the whole sample of 1086 individual twins in Table I.

GENETIC IRT MODEL

The model combines two main features. The first is the psychometric model or "measurement model" that relates the probability of item endorsement to the latent trait(s) hypothesized in the second, the individual differences model or "structural model". In the case of twin data, the model for individual differences incorporates hypotheses about the number of latent traits and the contributions of genetic and environmental factors to variation in latent trait values. An additional layer may reflect hypotheses about the distribution of the latent traits.
In this illustrative example, we start by assuming that a single dimension of liability to depression underlies responses to all 33 items and that all genetic and environmental effects operate through a single common pathway (see e.g., Neale and Cardon, 1992). That is, we hypothesize that the same pattern of item loadings applies to both latent genetic and environmental effects. Later, we will relax this assumption and consider the possibility that genetic and environmental effects operate through independent pathways and that there might be item-specific genetic effects. Thus, if $Y_{ijk}$ is the response of the $j$th twin of the $i$th pair to the $k$th (binary) item, the model predicts that the link function relating the probability that the twin will endorse the $k$th item to the latent trait, $X$, is

$$P(Y_{ijk}) = (1 + e^{-z})^{-1},$$

where

$$z = b_k (X_{ij} - a_k).$$

$X_{ij}$ is the score of the twin on the latent trait, $a_k$ is the difficulty of the $k$th item (i.e., the trait value for which the endorsement probability is 50%; see e.g., Lord and Novick, 1968), and $b_k$ is the discriminating power of the item (i.e., the regression of endorsement probability on latent trait when $X_{ij} = a_k$). The $b_k$ correspond conceptually to factor loadings in the conventional factor model. Lord and Novick (op. cit.) provide an extensive graphical and mathematical treatment of several IRT models and further details of how to interpret the model parameters.

Table I. MFQ Item Response Frequencies (N=1086)

Item (response frequency columns: Response)
 1. Miserable or unhappy
 2. Did not enjoy anything
 3. Less hungry than usual
 4. Ate more than usual
 5. Tired/just sat around
 6. Slower movement
 7. Very restless
 8. Felt no good
 9. Blamed myself
10. Hard to decide
11. Cranky with parents
12. Talked less
13. Talked more slowly
14. Cried a lot
15. Future no good
16. Life not worth living
17. Thoughts of death
18. Family better off w/o me
19. Thought of killing myself
20. Did not want to see friends
21. Hard to concentrate
22. Bad things might happen
23. Hated myself
24. Felt I was a bad person
25. Thought I was ugly
26. Worried about aches and pains
27. Felt lonely
28. Thought nobody loved me
29. Did not have fun at school
30. Not as good as other kids
31. Did everything wrong
32. Did not sleep as well
33. Slept more than usual

The extension to multi-category items is fairly simple (see e.g., Nelder and McCullagh, 1989). Under the (logistic) IRT model, we assume that the discriminating powers are constant over the range of the latent trait, but add further difficulty parameters to reflect the difficulty levels ("response thresholds") associated with the additional categories. The simple model for individual differences assumes that there is a single latent trait with identical loadings in MZ and DZ twins and in first and second twins of each pair.
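The item response functions described above can be sketched in a few lines. This is a minimal illustration rather than the code in the appendices: the function names are hypothetical, and the multi-category version assumes a cumulative-logit form in which each threshold `a` gives the probability of exceeding that threshold, with a single discriminating power `b` shared across thresholds and thresholds supplied in increasing order.

```python
import math


def logistic(z):
    """Inverse logit: (1 + exp(-z))^-1."""
    return 1.0 / (1.0 + math.exp(-z))


def endorse_prob(x, a, b):
    """Binary item: endorsement probability with z = b * (x - a)."""
    return logistic(b * (x - a))


def category_probs(x, thresholds, b):
    """Ordered multi-category item (assumed cumulative-logit form).

    thresholds must be increasing; with n categories there are n - 1 of them.
    Returns the probability of each category, lowest to highest.
    """
    exceed = [logistic(b * (x - a)) for a in thresholds]  # P(response above threshold)
    bounds = [1.0] + exceed + [0.0]
    return [bounds[c] - bounds[c + 1] for c in range(len(thresholds) + 1)]


# A subject whose trait score equals the item difficulty endorses with probability 0.5.
p_at_difficulty = endorse_prob(x=0.5, a=0.5, b=2.0)

# Hypothetical three-category item with thresholds at -1 and 1.
probs = category_probs(x=0.0, thresholds=[-1.0, 1.0], b=1.5)
```

Because the category probabilities are differences of successive exceedance probabilities, they sum to one and stay non-negative as long as the thresholds are ordered.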
The distribution of the latent trait in the twins is assumed to be bivariate normal, with covariances determined by the contribution of genetic and environmental effects to the resemblance of MZ and DZ twins (see e.g., Neale and Cardon, 1992). Given the consensus that shared environmental effects do not contribute much to liability to depression, we have further simplified our illustrative model to assume that only additive genetic effects (A) contribute to twin similarity and the only effects of environment are those specific to individuals (E). In order to fix the scale for the item parameters, we have to fix the total variance in the latent trait. We chose ultimately to scale the item parameters so that the variance in the latent trait is unity.

ALTERNATIVE MODELS

The initial model outlined above assumes that the probability of item endorsement is a logistic function of a single heritable latent trait. This is equivalent to what has become known as the "common pathways" model in conventional structural modeling of twin data, because the scaling of loadings of responses on the latent genetic and environmental factors implies that genetic and environmental effects operate through a single underlying common pathway (e.g., a physiological or neurochemical system; cf. Martin and Eaves, 1977). This need not be the case. A more general class of models allows for several latent factors, genetic or environmental, with different patterns of loadings ("discriminating powers") on different sources of variation. In addition, the above models assume that all item-specific effects are stochastic and uncorrelated across members of twin pairs. If there are specific

genetic effects on individual items, residual effects on liability are correlated across twins within pairs for individual items. A more general form of the above model is a generalization of the genetic factor model to the IRT case. We write

$$P(Y_{ijk}) = (1 + e^{-z})^{-1},$$

where

$$z = \sum_{m} b_{km} X_{ijm} - a_{kl}, \qquad l = 1, \ldots, n-1,$$

where $n$ is the number of response categories and summation is over $m = 1, \ldots, p$ latent traits. The columns of the matrix of loadings $B$ are scaled initially so that each of the latent traits is $N[0, 1]$. Specific factors are defined by having zero loadings on all but one item. If there is one latent genetic factor, one latent environmental factor, and $t$ items, the number of latent traits is $t + 2$. Excluding structural zeros, there are $3t$ loadings and $t(n-1)$ thresholds. The latent traits are expected to correlate across twin pairs. The correlation will depend on how the various genetic and environmental differences contribute to variation in the latent traits. We consider the model for genetic and environmental contributions to variance components between and within MZ and DZ twin pairs below (pp. 11ff.).

COMPARING MODELS

In the conventional likelihood-based approaches, (nested) models are compared by computing the likelihood ratio chi-square difference between the more general and more restricted models. A number of further criteria, such as the Akaike Information Criterion, AIC (Akaike, 1974), have been developed in the attempt to optimize the trade-off between goodness of fit and parsimony. Within the Bayesian framework, Spiegelhalter et al. (2001) have proposed using the deviance information criterion, DIC, as a generalization of the AIC that assesses the ability of a model to predict a test data set. The approach uses two statistics available from iterations of the stable Markov Chain of the MCMC algorithm.
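The two statistics are the deviance averaged over the retained iterations and the deviance evaluated at the averaged parameter values. As a rough sketch of how they combine, the example below uses a deliberately simple stand-in model, not the genetic IRT model: observations assumed to be N(mu, 1) with a flat prior on mu, so that posterior draws can be generated directly instead of by MCMC. The names `deviance` and `dic_summary` and the simulated data are hypothetical.

```python
import math
import random


def deviance(mu, data):
    """-2 * log-likelihood for independent N(mu, 1) observations."""
    return sum((y - mu) ** 2 for y in data) + len(data) * math.log(2.0 * math.pi)


def dic_summary(mu_draws, data):
    """Combine posterior draws into mean deviance, plug-in deviance, pD and DIC."""
    dev = [deviance(mu, data) for mu in mu_draws]
    d_bar = sum(dev) / len(dev)                  # average deviance over iterations
    mu_bar = sum(mu_draws) / len(mu_draws)       # average parameter value
    d_hat = deviance(mu_bar, data)               # deviance at the average parameter
    p_d = d_bar - d_hat                          # effective number of parameters
    return {"Dbar": d_bar, "Dhat": d_hat, "pD": p_d, "DIC": d_bar + p_d}


rng = random.Random(1)
data = [rng.gauss(0.3, 1.0) for _ in range(50)]  # simulated toy data
y_bar = sum(data) / len(data)

# Under the assumed flat prior and known unit variance, the posterior of mu
# is N(y_bar, 1/n), so independent draws stand in for MCMC output here.
post_sd = math.sqrt(1.0 / len(data))
mu_draws = [rng.gauss(y_bar, post_sd) for _ in range(20000)]

summary = dic_summary(mu_draws, data)
```

With one free parameter, the effective number of parameters in this toy case comes out close to 1, which is a convenient sanity check on the bookkeeping before applying the same arithmetic to a model with thousands of latent quantities.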
Where $l$ is the likelihood evaluated at a given set of parameter values and $C$ is a constant, the average deviance, $\bar{D}$, is the average value over many iterations of $C - 2\ln(l)$ for a given model. $\hat{D}$ is $C - 2\ln(l)$ evaluated at the average parameter values of the same model over a large number of successive iterations. Spiegelhalter et al. show that the effective number of parameters, $p_D$, is approximately $p_D = \bar{D} - \hat{D}$, and the DIC is $\bar{D} + p_D$. Thus, as with AIC, the DIC penalizes improvement in fit for loss of parsimony.

IMPLEMENTATION IN WINBUGS

There is no single method to implement the genetic IRT model in MCMC. In the current beta version of WinBUGS 1.4 (Spiegelhalter et al., 2003) we parameterized the model for twin resemblance as a variance components model (cf. Jinks and Fulker, 1970). Apart from imparting a natural flow to the computations, this formulation is, strictly, more appropriate when the assignment to first and second twins is random rather than fixed. The WinBUGS code for the model for binary items is given in Appendix 1. In describing the model, we follow the notation in the appendix. The simplest genetic latent trait model assumes what has been called the "common pathways" model, in which the loadings of the items on the latent genetic trait are a constant multiple of the corresponding environmental loadings (see e.g., Martin and Eaves, 1977; Neale and Cardon, 1992). In this case, there is a single latent trait, $X$ (ignoring specific effects). The contributions of genes and environment may then be specified to the within- and between-pair variance components for MZ and DZ twins. Given that the within-family environmental variance (E) is scaled to unity, we allow the total (additive) genetic variance ($\sigma_g^2$) to be free.
Following Jinks and Fulker (1970) we then write the components of variance between MZ and DZ pairs as

$$\sigma_{bmz}^2 = \sigma_g^2 \quad \text{and} \quad \sigma_{bdz}^2 = \sigma_g^2 / 2.$$

Similarly, the components of variance within pairs are

$$\sigma_{wmz}^2 = 1 \quad \text{and} \quad \sigma_{wdz}^2 = 1 + \sigma_g^2 / 2.$$

It is a minor alteration to include the effects of the between-families environment (C) in the between-pair variance components if needed. The proportion of variance in the latent trait attributable to additive genetic factors is thus

$$h^2 = \sigma_g^2 / (1 + \sigma_g^2).$$

We chose the variance components formulation of the quantitative genetic model rather than the

variance-covariance formulation (now) more familiar to twin researchers. The variance components formulation of the nested analysis of variance has a more transparent relationship with the form of hierarchical mixed models that permit a wide range of extensions of the twin design to other applications. It also reflects more faithfully the assumption implicit in most genetic models for twin resemblance that like-sex twins are unordered within a pair. Furthermore, the independence of between- and within-pair effects in the ANOVA model simplifies the coding of the MCMC application.

The MCMC algorithm generates successive samples from the full conditional distribution of subject and item parameters, employing a sequence of iterations which, ultimately, are (non-independent) samples from the posterior distribution of the model parameters, including item parameters, individual subject parameters, and parameters of the genetic model. Variance components are assumed to be sampled from a gamma distribution. Other parameters are assumed to be normal. The pair means and within-pair deviations from the pair means are assumed to be normal. Means of MZ pairs are $N[0, \sigma_{bmz}^2]$. Means of DZ pairs are $N[0, \sigma_{bdz}^2]$. The corresponding within-pair deviations are assumed to be $N[0, 1]$ for MZ twins and $N[0, \sigma_{wdz}^2]$ for DZ twins. WinBUGS employs the amount of information ("precision", $\tau$) to parameterize the variability of samples. Thus, the precision of MZ pair means is $\tau_g = 1/\sigma_g^2$ and the precision of DZ pair means is $2\tau_g$. In our case, we have no prior knowledge of the parameters of the distributions of the model parameters, so we assume very broad ("uninformative") prior distributions for the item parameters and variance components.
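The variance components parameterization can be checked numerically. The sketch below is not the appendix code: it assumes a hypothetical value of the genetic variance, builds each twin's latent trait as a shared pair effect plus an independent within-pair deviation, and confirms that the implied twin correlations are $h^2$ for MZ and $h^2/2$ for DZ pairs. Function names are illustrative.

```python
import random


def twin_components(sigma2_g):
    """Between/within-pair variances, between-pair precisions and h^2 (E scaled to 1)."""
    between = {"MZ": sigma2_g, "DZ": sigma2_g / 2.0}
    within = {"MZ": 1.0, "DZ": 1.0 + sigma2_g / 2.0}
    precision_between = {zyg: 1.0 / var for zyg, var in between.items()}
    h2 = sigma2_g / (1.0 + sigma2_g)
    return between, within, precision_between, h2


def simulate_pairs(zygosity, sigma2_g, n_pairs, rng):
    """Latent trait scores for twin pairs: pair effect plus individual deviation."""
    between, within, _, _ = twin_components(sigma2_g)
    sd_b = between[zygosity] ** 0.5
    sd_w = within[zygosity] ** 0.5
    pairs = []
    for _ in range(n_pairs):
        pair_effect = rng.gauss(0.0, sd_b)
        twin1 = pair_effect + rng.gauss(0.0, sd_w)
        twin2 = pair_effect + rng.gauss(0.0, sd_w)
        pairs.append((twin1, twin2))
    return pairs


def pair_correlation(pairs):
    """Pearson correlation between first and second twins."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
    v1 = sum((p[0] - m1) ** 2 for p in pairs) / n
    v2 = sum((p[1] - m2) ** 2 for p in pairs) / n
    return cov / (v1 * v2) ** 0.5


rng = random.Random(3)
sigma2_g = 1.0  # hypothetical genetic variance, giving h^2 = 0.5
between, within, precision_between, h2 = twin_components(sigma2_g)
r_mz = pair_correlation(simulate_pairs("MZ", sigma2_g, 20000, rng))
r_dz = pair_correlation(simulate_pairs("DZ", sigma2_g, 20000, rng))
```

Note that the total variance, between plus within, is $1 + \sigma_g^2$ for both zygosity groups, which is why a single set of item parameters can apply to MZ and DZ twins alike.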
In the WinBUGS code (Appendices 1 and 2) the uninformative priors are represented by assigning very small values to the precision of the prior gamma and normal distributions (see also examples in Spiegelhalter et al., 2003). We compared the multi-category common pathway model, C(AE), with a number of alternatives. The more complex models were: (1) the common pathway model plus genetic specifics on the individual items, C(AE)+S; (2) the separate pathways model with no specifics, AE; (3) the separate pathways model with item-specific genetic effects, AE+S. Appendix 2 gives the modified WinBUGS code for the multi-category case with independent pathways for genetic and environmental effects and item-specific genetic effects. The principal alteration reflects the fact that dichotomous items represent Bernoulli trials with probability of success determined by the item parameters and the latent trait. Multi-category responses are sampled from the multinomial distribution.

RESULTS

Figures 1 and 2 illustrate the performance of the MCMC algorithm for some of the parameters of the multi-category IRT model for 1000 cycles after a 1000-iteration burn-in. Shown are the item difficulty parameters, a[15,1] and a[15,2], for the lower and upper thresholds, respectively, the sensitivity parameter b[15] for item 15 ("The future is no good"), and the proportion of variance explained by additive genetic factors, $h^2$. The absence of any apparent long-term cycling of the parameter values across the time series suggests that the MCMC algorithm is yielding reliable estimates of the parameter values (cf. Gilks et al., 1996). Figure 2 shows the sequence of iterations for the latent trait scores of a pair of MZ twins. Based on the 1000 iterations presented, the 95% credible intervals of the subjects' scores are 0.150 < 0.614 < 0.994 for twin 1 and -1.703 < -0.796 < - for twin 2.
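Intervals of this kind are read directly off the retained iterations. The sketch below shows one way to do so, under stated assumptions: the draws are synthetic stand-ins generated from normal distributions with hypothetical means and spreads, not the MCMC output for the pair in Figure 2, and `quantile` and `summarize` are illustrative helper names. Differencing the two twins' draws iteration by iteration gives posterior draws of the within-pair difference.

```python
import random


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def summarize(draws, level=0.95):
    """Posterior mean and equal-tailed interval from a list of draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    return {
        "mean": sum(s) / len(s),
        "lower": quantile(s, tail),
        "upper": quantile(s, 1.0 - tail),
    }


rng = random.Random(2)
# Synthetic stand-ins for 1000 retained draws of each twin's latent trait score.
twin1 = [rng.gauss(0.6, 0.2) for _ in range(1000)]
twin2 = [rng.gauss(-0.8, 0.4) for _ in range(1000)]

# Pair the draws by iteration to obtain draws of the within-pair difference.
diff = [t1 - t2 for t1, t2 in zip(twin1, twin2)]
diff_summary = summarize(diff)
prob_twin1_higher = sum(d > 0.0 for d in diff) / len(diff)
```

Working with the differenced draws, rather than comparing two separate intervals, respects any posterior correlation between the two twins' scores.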
For this pair of MZ twins, at least, the differences between scores on the latent trait are statistically significant and indicate a pair for which non-genetic factors are playing a major differentiating role in the etiology of depression. The fact that MCMC provides simultaneous estimates of the subjects' latent trait scores under the IRT model, analogous to the genetic and environmental factor scores of the Linear Structural Model (e.g., Boomsma et al., 1990), is a further benefit of this approach.

Figure 3 shows the autocorrelations of the MCMC sequence for the same parameters. Large autocorrelations that decay slowly as a function of lag imply poor mixing of the MCMC series and could indicate a high degree of collinearity between the parameters or lack of identification of the model. In this case, all the parameters behave quite well, especially the estimates of the subject parameters, for which the autocorrelations are virtually zero.

The results of model fitting are summarized for all the items in Tables II and III for the initial dichotomous and multi-category IRT models, respectively, assuming common genetic and environmental pathways and no item-specific genetic effects. The item parameters are highly correlated across models but by no means identical, suggesting that further modeling might usefully pay attention to the

assumption that mild and severe responses map onto a single common normally distributed latent trait. The estimates of heritability (0.53 and 0.50) do not differ markedly under the dichotomous and multi-category assessments, giving little reason to argue that the genetic contribution is greater for more severe definitions of the phenotype.

Fig. 1. Traces of 1000 MCMC iterations after 1000-cycle burn-in for selected parameters of multi-category genetic IRT model for MFQ responses. Note: Item parameters are shown for item 15 ("The future is no good"). a[15,1] and a[15,2] are the item difficulties, b[15] is the discriminating power, $h^2$ is the heritability of the latent trait.

Figures 4 and 5 illustrate a further spin-off of the use of MCMC for genetic/psychometric applications. For each of the two approaches to assessing the latent trait (dichotomous versus multi-category responses) we plot the sampling errors of the latent trait scores against latent trait score for all 1086 subjects in the sample. The U shape of the relationship between error variance and trait score in the figures shows that both assessments are most precise for subjects above the mean liability. However, errors of measurement of extreme scores derived from the dichotomized items are greater than those obtained from multi-category responses. The estimates based on the multi-category responses are generally more informative (have smaller errors) across the range than those based on the more extreme dichotomous assessments, although the two measures show quite similar precision at their respective points of optimal discrimination. As expected, the multi-category coding

yields greater precision over a greater range of latent trait values.

Fig. 2. Traces of MCMC estimates of latent trait scores for a pair of MZ twins.

Fig. 3. Autocorrelations of representative parameters for multi-category IRT model.

Table IV summarizes the statistics used in model comparison. Based on a criterion that improvements of less than 5 in the DIC are too trivial to justify a choice between models (Spiegelhalter et al., 2001), we see that each addition to the model leads to a very great improvement in the apparent predictive value of the model. Thus, the analysis clearly justifies independent pathways for genetic and environmental contributions to the profile of item responses, and the inclusion of item-specific genetic effects in the model. Results for the more complex models are summarized in Tables V and VI. Table V gives estimates

(without sampling errors) for the common pathway model including genetic specifics on the individual items and for the separate pathways model with no specifics. Estimates for the separate pathways model with item-specific genetic effects are given in Table VI.

Table II. Parameter Estimates for a Twin IRT Model for the Genetic Effects on Depression in Dichotomized MFQ Items in Adolescent Girls (2500 Updates after a 2500-Iteration Burn-in).

Item                                 Difficulty (s.e.)   Discrimination (s.e.)
Miserable or unhappy                 (0.138)             (0.272)
Did not enjoy anything               (0.436)             (0.344)
Less hungry than usual               (0.318)             (0.122)
Ate more than usual                  (0.186)             (0.122)
Tired/just sat around                (0.195)             (0.177)
Slower movement                      (0.148)             (0.270)
Very restless                        (0.167)             (0.193)
Felt no good                         (0.100)             (0.612)
Blamed myself                        (0.114)             (0.183)
Hard to decide                       (0.117)             (0.139)
Cranky with parents                  (0.166)             (0.198)
Talked less                          (0.090)             (0.228)
Talked more slowly                   (0.132)             (0.394)
Cried a lot                          (0.125)             (0.258)
Future no good                       (0.128)             (0.491)
Life not worth living                (0.128)             (0.538)
Thoughts of death                    (0.151)             (0.205)
Family better off w/o me             (0.131)             (0.565)
Thought of killing self              (0.179)             (0.712)
Did not want to see friends          (0.221)             (0.573)
Hard to concentrate                  (0.116)             (0.330)
Bad things might happen              (0.106)             (0.331)
Hated myself                         (0.092)             (0.776)
Felt I was bad person                (0.163)             (0.627)
Thought I was ugly                   (0.127)             (0.229)
Worried about aches and pains        (0.142)             (0.245)
Felt lonely                          (0.088)             (0.408)
Thought nobody loved me              (0.104)             (0.743)
Did not have fun at school           (0.165)             (0.228)
Not as good as other kids            (0.104)             (0.404)
Did everything wrong                 (0.156)             (0.714)
Did not sleep as well                (0.112)             (0.253)
Slept more than usual                (0.263)             (0.139)

Additive genetic variance            (0.063)
Non-shared environmental variance    (0.063)
Table VI reports all the principal parameters of the model together with their estimated standard errors based on 2500 samples from the stationary distribution. The item difficulty parameters do not change greatly from one model to the next and are only given for the full model (Table VI), scaled to unit total variance in the individual item liabilities. The common pathways model (Table VI) assumes that the loadings of items on the genetic and environmental factors are proportional, implying that the effects of genes and environments are mediated through a single underlying process that does not differentiate the effects of genes and environments across items. The independent pathways model implies that genes and environments are responsible for distinct profiles of item response and may be mediated through separate underlying pathways. The loadings of the items on the common and specific latent traits are expressed as proportions of reliable variance in item liability explained by each of the latent factors. The effects of sampling error in item responses are inseparable from those of the item-specific within-family environment in cross-sectional twin data. They are absorbed by the multinomial stochastic element of the genetic IRT model and are not parameterized separately. This is one difference between the genetic IRT model and the more familiar multivariate threshold model employed within the LISREL framework.

Using our analysis of the MFQ as an illustration, we summarize in Table VII the items that best characterize central core constructs relating to an etiologically based nosology underlying the 33 putative depressive symptoms. Of the models we have considered, that which assumes two independent pathways, with symptom-specific genetic effects, minimizes the DIC (Table IV). This model supports the notion that there are two independent pathways to depression: one genetic and one environmental.
Table VII lists the clusters of items that best characterize these core genetic and environmental constructs. In addition, we tabulate a handful of symptoms that seem to have large specific genetic effects and thus relate less well to the core genetic construct. Provisionally, the genetic factor appears to have the highest loadings on items that characterize endogenous or melancholic disorder, while the environmental factor appears to discriminate most between responses to items relating to self-deprecation.

Table III. Estimates of Multicategory Genetic IRT Model for MFQ Items (N = 1000 Updates after 1000-Cycle Burn-in)
Columns: Item; Threshold 0-1 (mean, s.d.); Threshold 1-2 (mean, s.d.); Sensitivity (mean, s.d.).
Rows: Miserable or unhappy; Did not enjoy anything; Less hungry than usual; Ate more than usual; Tired/just sat around; Slower movement; Very restless; Felt no good; Blamed myself; Hard to decide; Cranky with parents; Talked less; Talked more slowly; Cried a lot; Future no good; Life not worth living; Thoughts of death; Family better off w/o me; Thought of killing myself; Did not want to see friends; Hard to concentrate; Bad things might happen; Hated myself; Felt I was a bad person; Thought I was ugly; Worried about aches and pains; Felt lonely; Thought nobody loved me; Did not have fun at school; Not as good as other kids; Did everything wrong; Did not sleep as well; Slept more than usual; Additive genetic variance (%).

Table IV. Comparison of Models for Multicategory MFQ Responses
Columns: Model; D̄; D̂; pD; DIC.
Rows: C (AE); AE; C (AE)+S; AE+S.

Fig. 4. Plot of error variances of individual latent trait values against trait score: items coded as dichotomous.

Fig. 5. Error variances of latent trait scores plotted against trait values for multi-category responses.

Table V. Estimates of IRT Parameters (Omitting Item Difficulty Parameters and Standard Errors of Estimates) for Two More Complex Genetic Models for Multi-category MFQ Items (2500 Iterations after 1500-Sample Burn-in)
Entries are proportions of total variance in item liability.
Columns: Item; A, E no S (A, E); Common (A,E) plus S (A, E, S).
Rows: Miserable or unhappy; Did not enjoy anything; Less hungry than usual; Ate more than usual; Tired/just sat around; Slower movement; Very restless; Felt no good; Blamed myself; Hard to decide; Cranky with parents; Talked less; Talked more slowly; Cried a lot; Future no good; Life not worth living; Thoughts of death; Family better off w/o me; Thought of killing myself; Did not want to see friends; Hard to concentrate; Bad things might happen; Hated myself; Felt I was a bad person; Thought I was ugly; Worried about aches and pains; Felt lonely; Thought nobody loved me; Did not have fun at school; Not as good as other kids; Did everything wrong; Did not sleep as well; Slept more than usual.

DISCUSSION

The MCMC approach makes it relatively simple to fit IRT models within a genetically informative framework. This IRT problem is quite large. It involves estimation of item response characteristics for 33 multi-category items in pairs of twins with 2 latent traits and genetically correlated errors. As a byproduct, we estimate scores on the latent genetic and environmental traits of 1086 individual subjects. We also estimate the sampling errors of all the parameters and scores, together with other properties of the estimates if desired. Typically, one of the more complex models we describe easily ran overnight (c iterations) on a mid-range laptop PC (Dell Inspiron 8100). Although this seems slow, we are not aware of any corresponding benchmark using maximum-likelihood methods at this point. Furthermore, the amount of computational time is still relatively small in relation to that required to collect the data, conceive of the models and write the paper. With twin data, the algorithm appears to be well-behaved and yields a rich variety of psychometric information, including estimates of trait values and their errors. Such statistics are more tedious to obtain under other approaches and, so far, have eluded researchers in behavior genetics.

The genetic factor model we have assumed is about the simplest we could envision. However, there is no barrier in principle to the specification of multiple latent traits, inclusion of covariates, missing values, or non-linear latent trait models. There is a danger that every new method invites new demonstrations of virtuosity that add little to the substantive interpretation of actual data. In noting the possible use of MCMC as a tool for genetic IRT analysis, we hope that the additional flexibility provided will help investigators formulate and test, more concisely, critical and realistic hypotheses about the underlying structure of multi-symptom data in genetic studies. It is our goal to help rather than hinder attempts to wrestle with the roles of genes and environment in the etiology and nosology of complex behavioral disorders. Our preliminary results suggest that the approach may help identify constellations of symptoms that relate to etiologically distinct expressions of the clinical phenotype.
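The estimates of individual trait scores and their errors discussed above are summaries of the sampled values retained after burn-in. As a minimal sketch of that step, assuming the sampled score for each twin has been exported as a plain list of numbers, the posterior mean can serve as the score and the posterior variance as its error variance. The function names, the subject label and the toy chain are all hypothetical.

```python
from statistics import mean, variance


def discard_burn_in(chain, burn_in):
    """Drop the first `burn_in` draws of a sampled chain."""
    return chain[burn_in:]


def summarize_scores(draws_by_subject, burn_in=0):
    """Map each subject to (posterior mean, posterior variance) of the trait score.

    draws_by_subject: dict of subject id -> list of sampled latent trait values.
    """
    summary = {}
    for subject, chain in draws_by_subject.items():
        kept = discard_burn_in(chain, burn_in)
        summary[subject] = (mean(kept), variance(kept))
    return summary


# Toy chain: two burn-in draws followed by three retained draws.
toy = {"pair1_twin1": [9.0, 9.0, 0.5, 1.0, 1.5]}
print(summarize_scores(toy, burn_in=2))
```

Plotting the second element of each pair against the first gives the kind of display described in the captions of Figs. 4 and 5.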

Table VI. Estimates for Multi-category Genetic IRT Models for MFQ Items: Separate Genetic and Environmental Factors with Specific Genetic Effects (A,E + S, N = 2500 Updates after 1500-Cycle Burn-in)
Columns: Item; Thresholds (two thresholds, each with mean and s.d.); Proportion of variance in item liability: Common factor, Genetic (mean, s.d.); Common factor, Environmental (mean, s.d.); Specific, Genetic (mean, s.d.).
Rows: Miserable or unhappy; Did not enjoy anything; Less hungry than usual; Ate more than usual; Tired/just sat around; Slower movement; Very restless; Felt no good; Blamed myself; Hard to decide; Cranky with parents; Talked less; Talked more slowly; Cried a lot; Future no good; Life not worth living; Thoughts of death; Family better off w/o me; Thought of killing myself; Did not want to see friends; Hard to concentrate; Bad things might happen; Hated myself; Felt I was a bad person; Thought I was ugly; Worried about aches and pains; Felt lonely; Thought nobody loved me; Did not have fun at school; Not as good as other kids; Did everything wrong; Did not sleep as well; Slept more than usual.

Table VII. MFQ Symptom Content of Core Latent Genetic and Environmental Components (Seven Symptoms with Highest Loading in Each Category)

| Symptoms loading on Genetic construct | Symptoms loading on Environmental construct | Symptoms with specific genetic effects |
| --- | --- | --- |
| Slept more than usual | Family better off w/o me | Thought I was ugly |
| Ate more than usual | Life not worth living | Less hungry than usual |
| Slower movement | Hated myself | Cranky with parents |
| Talked more slowly | Thought nobody loved me | Thoughts of death |
| Tired/just sat around | Thought of killing myself | Did not have fun at school |
| Very restless | Future no good | Did everything wrong |
| Hard to decide | Not as good as other kids | Cried a lot |

Note: Loadings are the square roots of the proportions of variance given in Table VI.
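The note to Table VII links the two scales: a loading is the square root of a proportion of variance, and the Appendix 2 code forms each proportion as a squared discriminating power divided by the sum of the three squared powers for that item. The sketch below works through that arithmetic and then ranks items by loading. The function names, item labels and numbers are hypothetical and do not reproduce any published estimate.

```python
import math


def variance_proportions(gamma_a, gamma_e, gamma_s):
    """Proportions of item-liability variance due to the common genetic (A),
    common environmental (E) and item-specific genetic (S) factors."""
    total = gamma_a ** 2 + gamma_e ** 2 + gamma_s ** 2
    return gamma_a ** 2 / total, gamma_e ** 2 / total, gamma_s ** 2 / total


def loading(proportion):
    """Loading on a factor: square root of the proportion of variance."""
    return math.sqrt(proportion)


def top_items(proportion_by_item, n=7):
    """Names of the n items with the highest loading on one factor."""
    ranked = sorted(proportion_by_item,
                    key=lambda item: loading(proportion_by_item[item]),
                    reverse=True)
    return ranked[:n]


# Toy item with discriminating powers 3, 4 and 0 (made-up values).
f_a, f_e, f_s = variance_proportions(3.0, 4.0, 0.0)
print(f_a, f_e, f_s, loading(f_a), loading(f_e))
print(top_items({"item_x": 0.36, "item_y": 0.64, "item_z": 0.04}, n=2))
```

Because the square root is monotone, ranking by loading and ranking by proportion of variance select the same items; the loading scale only changes how the magnitudes read.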

APPENDIX 1

model;
{
#### Genetic IRT Model: fits IRT model to latent trait on MZ and DZ twins
#### assuming A, E model for variation in latent trait
tau.g ~ dgamma(0.001, 0.001)
tau.g1 <- 2 * tau.g
#### Simulate item parameters
for (k in 1 : item) {
  a[k] ~ dnorm(0., 1.e-06)
  b[k] ~ dnorm(0., 1.e-06)
}
#### Simulate MZs first:
for (i in 1 : nmz) {
  #### simulate pair means (genetic deviations)
  g[i] ~ dnorm(0, tau.g)
  for (j in 1 : 2) {
    #### scale initially so that non-shared environmental variance is unity
    t[i, j] ~ dnorm(g[i], 1)
  }
}
#### Now do DZs
for (i in (nmz + 1) : (nmz + ndz)) {
  #### simulate between-pair genetic effects
  g2[i] ~ dnorm(0, tau.g1)
  for (j in 1 : 2) {
    #### simulate within-pair genetic effects
    g1[i, j] ~ dnorm(g2[i], tau.g1)
    #### add in non-shared environmental deviations
    t[i, j] ~ dnorm(g1[i, j], 1)
  }
}
#### Now get IRT part
for (i in 1 : (nmz + ndz)) {
  for (j in 1 : 2) {
    for (k in 1 : item) {
      logit(p[i, j, k]) <- b[k] * (t[i, j] - a[k])
      x[i, j, k] ~ dbern(p[i, j, k])
    }
  }
}
#### Derived parameters
s2.g <- 1 / tau.g
s2.t <- 1 + s2.g
h2 <- s2.g / s2.t
e2 <- 1 / s2.t
sd <- sqrt(s2.t)
#### rescale scores and item parameters to latent trait with unit variance
for (i in 1 : (nmz + ndz)) {
  for (j in 1 : 2) {
    theta[i, j] <- t[i, j] / sd
  }
}
for (k in 1 : item) {
  alpha[k] <- a[k] / sd
  beta[k] <- b[k] * sd
}
}

APPENDIX 2

model;
{
###### Code for fitting IRT model to depression items in MZ and DZ twin data
###### AE MODEL: separate loadings for A and E factors
###### Note: the loops over twins (m in 1 : 2) are restored here; the printed
###### listing uses the index m without declaring it.
###### Components of variance and correlations for MZ and DZ pairs
tau.g <- 1
tau.g2 <- 2 * tau.g
### Generate latent trait scores for nmz MZ pairs
for (i in 1 : nmz) {
  ### c[i] ~ dnorm(0.0, tau.c)
  g1[i] ~ dnorm(0.0, tau.g)
  for (m in 1 : 2) {
    g2[i, m] <- g1[i]
    e[i, m] ~ dnorm(0, 1)
  }
}
### Generate latent trait scores for ndz DZ pairs
for (i in (nmz + 1) : (nmz + ndz)) {
  ### c[i] ~ dnorm(0.0, tau.c)
  g1[i] ~ dnorm(0.0, tau.g2)
  for (m in 1 : 2) {
    g2[i, m] ~ dnorm(g1[i], tau.g2)
    e[i, m] ~ dnorm(0, 1)
  }
}
### Generate specific genetic trait scores for nmz MZ pairs
for (j in 1 : nitem) {
  for (i in 1 : nmz) {
    sg1[j, i] ~ dnorm(0, 1)
    for (m in 1 : 2) {
      s[j, i, m] <- sg1[j, i]
    }
  }
}
### Generate specific latent trait scores for ndz DZ pairs
for (j in 1 : nitem) {
  for (i in (nmz + 1) : (nmz + ndz)) {
    sg1[j, i] ~ dnorm(0, 2)
    for (m in 1 : 2) {
      s[j, i, m] ~ dnorm(sg1[j, i], 2)
    }
  }
}
### priors for item discriminating powers
for (j in 1 : nitem) {
  gamma1[j] ~ dnorm(0.0, 1.0e-6)I(0, 6)
  gamma2[j] ~ dnorm(0.0, 1.0e-6)I(0, 6)
  gamma3[j] ~ dnorm(0.0, 1.0e-6)I(0, 6)
  #### obtain item variances and scale parameters
  vitem[j] <- gamma1[j] * gamma1[j] + gamma2[j] * gamma2[j] + gamma3[j] * gamma3[j]
  sitem[j] <- sqrt(vitem[j])
  #### Express loadings as proportions of variance in item liability
  f1[j] <- (gamma1[j] * gamma1[j]) / vitem[j]
  f2[j] <- (gamma2[j] * gamma2[j]) / vitem[j]
  f3[j] <- (gamma3[j] * gamma3[j]) / vitem[j]
  for (k in 1 : (ncat - 1)) {
    a[j, k] <- alpha[j, k] / sitem[j]
  }
}
### Generate difficulty parameters - constrained to be increasing with
### response category - parameterized in beta > 0
for (j in 1 : nitem) {
  for (k in 1 : (ncat - 1)) {
    alpha[j, k] <- alpha0 + sum(beta[j, 1 : k])
  }
}
### Generate IRT model for item responses (logits) and category probabilities
for (j in 1 : nitem) {
  for (i in 1 : (nmz + ndz)) {
    for (m in 1 : 2) {
      for (k in 1 : (ncat - 1)) {
        logit(p[i, m, j, k]) <- gamma1[j] * g2[i, m] + gamma2[j] * e[i, m] + gamma3[j] * s[j, i, m] - alpha[j, k]
      }
      q[i, m, j, 1] <- 1 - p[i, m, j, 1]
      for (k in 2 : (ncat - 1)) {
        q[i, m, j, k] <- p[i, m, j, k - 1] - p[i, m, j, k]
      }
      q[i, m, j, ncat] <- p[i, m, j, ncat - 1]
      for (k in 1 : ncat) {
        x[i, m, j, k] <- q[i, m, j, k]
      }
    }
  }
}
#### Priors for betas that are used to generate difficulty parameters (alpha)
for (j in 1 : nitem) {
  for (k in 1 : ncat) {
    beta[j, k] ~ dnorm(0.0, 1.0e-6)I(0.01, 7.0)
  }
}
for (i in 1 : (nmz + ndz)) {
  for (j in 1 : nitem) {
    r1[i, j] ~ dcat(x[i, 1, j, 1 : ncat])
    r2[i, j] ~ dcat(x[i, 2, j, 1 : ncat])
  }
}
}

ACKNOWLEDGMENTS

This work is part of the Statistical Genetics Core (P.I. L.J. Eaves) of the NIMH Center for Developmental Epidemiology (MH57761, P.I. Adrian Angold). Data collection was supported by MH45268 (P.I. L.J. Eaves). The statistical analysis in this paper was conducted using the beta version of the program WinBUGS 1.4, freely available on the Internet through the generosity of the MRC BUGS project at Cambridge, England.

Edited by Dorret Boomsma
