Strategies for Mapping Heterogeneous Recessive Traits by Allele-Sharing Methods


Am. J. Hum. Genet. 60: , 1997

Strategies for Mapping Heterogeneous Recessive Traits by Allele-Sharing Methods

Eleanor Feingold^1 and David O. Siegmund^2

^1 Department of Biostatistics, Emory University, Atlanta; and ^2 Department of Statistics, Stanford University, Stanford

Summary

We investigate strategies for detecting linkage of recessive and partially recessive traits, using sibling pairs and inbred individuals. We assume that a genomewide search is being conducted and that locus heterogeneity of the trait is likely. For sibling pairs, we evaluate the efficiency of different statistics under the assumption that one does not know the true degree of recessiveness of the trait. We recommend a sibling-pair statistic that is a linear compromise between two previously suggested statistics. We also compare the power of sibling pairs to that of more distant relatives, such as cousins. For inbred individuals, we evaluate the power of offspring of different types of matings and compare them to sibling pairs. Over a broad range of trait etiologies, sibling pairs are more powerful than inbred individuals, but for traits caused by very rare alleles, particularly in the case of heterogeneity, inbred individuals can be much more powerful. The models we develop can also be used to examine specific situations other than those we look at. We present this analysis in the idealized context of a dense set of highly polymorphic markers. In general, incorporation of real-world complexities makes inbred individuals, particularly offspring of distant relatives, look slightly less useful than our results imply.

Introduction

There have been a number of recent successes in using inbred individuals and/or sibling pairs to map recessive traits. For example, Hugot et al. (1996) used 25 sibling pairs to map a locus for Crohn disease, and Gschwend et al. (1996) used 23 pedigrees involving primarily offspring of first cousins to map a locus for Fanconi anemia.
Studies of this type increasingly involve whole-genome searches and relatively complex (at least heterogeneous) disorders. A number of data-analysis tools for such studies are now in place. Many different methods and software packages exist for analyzing sibling-pair data, and a practical computational method for carrying out analyses with inbred individuals was developed by Kruglyak et al. (1995), but there is not much literature that is of help to an investigator trying to design such a study. Are sibling pairs or inbred individuals more efficient? Are offspring of closer relatives best? What is the best statistic for sibling pairs? The purpose of this paper is to start to answer some of these kinds of study-design questions.

Received December 20, 1995; accepted for publication January 23. Address for correspondence and reprints: Dr. Eleanor Feingold, Department of Biostatistics, Emory University, 1518 Clifton Road, Room 314, Atlanta, GA. feingold@sph.emory.edu. © 1997 by The American Society of Human Genetics. All rights reserved.

We compare mapping strategies under varying assumptions about phenocopies, locus heterogeneity, and degree of recessiveness. The main issue for sibling pairs is choosing the best test statistic, given that the amount of dominance variance is likely to be known only approximately. For inbred individuals, the issue is what types of matings give the most mapping power. We also compare the utility of inbred individuals, sibling pairs, sibling triples, and cousin pairs for various trait etiologies.

We assume that the whole genome, or a large portion thereof, is being searched, probably for multiple trait loci. Our calculations are done under the assumption that whole-genome identity-by-descent (IBD) and homozygosity-by-descent (HBD) information are available, as from a dense set of highly polymorphic markers. This idealized assumption allows our questions to be answered for many interesting cases.
The strategies thus developed are generally applicable to more realistic mapping situations, with some caveats (see Discussion).

We begin our discussions by developing the necessary extensions of previously published models. Most of the details and formulas are presented in appendices so that they can be ignored by readers who wish to do so. However, it should be noted that the models we present can be used to examine many specific situations other than those we have focused on, and the appendices do include enough information to allow others to apply the models. All of the models assume the Haldane mapping function, equal lengths of the male and female maps, and dense, completely informative markers yielding continuous information about regions of IBD and HBD. We also assume throughout that there is no more than one linked locus on a given chromosome.

Models

Scanning the Genome

Feingold (1993), Feingold et al. (1993), Lander and Schork (1994), Dupuis et al. (1995), and Lander and Kruglyak (1995) described statistical methods for analyzing whole-genome IBD data for affected relative pairs. The essence is to model the number of pairs sharing IBD at any given point on the chromosome as a stochastic process, with the "time" parameter being genetic distance along the chromosome. The methods are applicable to a wide variety of simple and complex modes of inheritance (cf. Dupuis et al. 1995), but, for the important case of sibling pairs, most of this literature has concentrated on additive traits (those with zero dominance variance of the penetrances). Here, we extend the previous methods to traits with nonzero dominance variance and to affected inbred individuals.

The basic model for sibling pairs was described by Feingold (1993). The Markov process $X_t = (X_{0,t}, X_{1,t}, X_{2,t})$ gives the number of sibling pairs that share zero, one, and two alleles IBD at point $t$ on a chromosome. The three components sum to the total number of pairs, $N$. On a chromosome with no trait locus, $X_t$ has for all $t$ a multinomial distribution with parameters $(N, 1/4, 1/2, 1/4)$, although the values of $X_t$ fluctuate along the length of the chromosome. If there is exactly one trait locus on the chromosome, the probabilities for the multinomial will be different near the point where the gene lies, with more pairs sharing two alleles IBD and fewer sharing zero. The probabilities of finding zero, one, or two alleles IBD at the trait locus can be parameterized as $(1 - \alpha)/4$, $(1 - \delta)/2$, and $(1 + \alpha + 2\delta)/4$. The values of the parameters $\alpha$ and $\delta$ depend on the specifics of the trait etiology and are discussed in detail below. In the case where $N$ is large, following Feingold et al. (1993), we can instead work with the approximately Gaussian process

$$Z_t = (Z_{0,t}, Z_{1,t}, Z_{2,t}) = N^{-1/2}\,(X_{0,t} - N/4,\; X_{1,t} - N/2,\; X_{2,t} - N/4).$$
The calculations for this approximating model are always simpler than for the Markov chain but may not be as accurate for small sample sizes. The Markov assumption, which applies to both the Markov chain models and the Gaussian models, is equivalent to assuming the Haldane mapping function. In fact, the mathematical approximations that we use require only that the process be locally Markov, which is true under any standard mapping function. The effect of positive interference is to shorten the genome relative to the results we show, so the false positive rate based on the Haldane mapping function is conservative, but this effect is small. This is discussed further in Feingold et al. (1993) and I.-P. Tu and D. Siegmund (unpublished data).

The models for HBD information from inbred individuals are very similar to those used for IBD data from affected pairs. A stochastic process can be constructed to describe each type of consanguineous mating, and P values and power can be calculated by the same methods used for sibling pairs. The process, $Y_t$, records the number of offspring of a particular mating type who are HBD at each marker locus. The properties of the processes $Y_t$ for different types of inbred individuals are described in appendix C. If there is no linked gene on the chromosome, $Y_t$ has at any point on the chromosome a binomial $(N, F)$ distribution, where $F$ is the coefficient of inbreeding. If there is a single trait locus on the chromosome, the distribution at the locus is binomial $[N, F + \gamma(1 - F)]$. The parameter $\gamma$ takes values between 0 and 1, indicating the amount of excess HBD. It actually depends on $F$ and is discussed further below. Gaussian models for inbred individuals can be defined similarly to those for sibling pairs.
However, since inbred individuals tend to be most useful for rare traits, which often require only small sample sizes to detect linkage, and because $F$ is often small, the Gaussian approximation is not very accurate. It can be vastly improved by transforming the data. This theory is not discussed further in this paper. Sample size calculations for inbred individuals in this paper were performed under the Markov chain models, not the approximating Gaussian models.

We test for linkage by looking for extreme values of $X_t$ or $Z_t$. One way to do this is to look for extreme values of a stochastic process that is an appropriate one-dimensional summary of $X_t$ or $Z_t$. That is, we want to find the appropriate weightings of the deviations from expectation in the numbers of siblings sharing 0, 1, and 2 alleles IBD. In our previous work, we examined the process $Z_{2,t} + \tfrac{1}{2} Z_{1,t}$ (equivalently $2X_{2,t} + X_{1,t}$), the mean IBD statistic, which is very powerful for dominant traits (Blackwelder and Elston 1985). We also examined $Z_{2,t} + Z_{1,t}$ (equivalently $X_{2,t} + X_{1,t}$), which counts 1s and 2s equally and is appropriate if one and two alleles IBD cannot be distinguished (Nelson et al. 1993) but is a very poor statistic for recessive traits. Here we also consider $Z_{2,t}$ (equivalently $X_{2,t}$), which is useful for recessive traits. It will be helpful to introduce the more general process

$$T_t(c) = Z_{2,t} + c\,Z_{1,t}.$$

Our test statistic is the maximum value attained by $T_t(c)$ over the length of the genome, $\max_t T_t(c)$. We shall see below that the value $c = 1/4$ provides a useful compromise between $c = 0$ and $c = 1/2$ when the degree of dominance is unknown. Formulas for P values and power of these tests are given in appendix A for the Markov chain versions and in appendix B for the Gaussian versions. Other statistics that have been suggested in the literature are also discussed briefly in the Discussion.
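The sibling-pair quantities defined in this section can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the example counts are hypothetical, and the code does not implement the P-value or power formulas of appendices A and B. It computes the IBD-sharing probabilities at a trait locus from the two sib-pair parameters, the centered and scaled Gaussian coordinates, and the maximum of the statistic "Z2 + c * Z1" over a list of marker positions.

```python
import math

def ibd_probabilities(alpha, delta):
    """Probabilities that an affected sib pair shares 0, 1, 2 alleles IBD
    at a trait locus, in the (alpha, delta) parameterization."""
    return ((1 - alpha) / 4, (1 - delta) / 2, (1 + alpha + 2 * delta) / 4)

def t_statistic(x1, x2, n, c):
    """T(c) = Z2 + c * Z1 at one position, where Z1 and Z2 are the counts of
    pairs sharing 1 and 2 alleles IBD, centered at N/2 and N/4 and scaled
    by sqrt(N)."""
    z1 = (x1 - n / 2) / math.sqrt(n)
    z2 = (x2 - n / 4) / math.sqrt(n)
    return z2 + c * z1

def max_statistic(counts, n, c):
    """Maximum of T(c) over positions; counts is a list of (x0, x1, x2)."""
    return max(t_statistic(x1, x2, n, c) for (_x0, x1, x2) in counts)

# Hypothetical IBD counts for N = 100 pairs at three marker positions.
example_counts = [(25, 50, 25), (20, 50, 30), (10, 50, 40)]
example_max = max_statistic(example_counts, 100, 0.25)
```

With alpha = delta = 0 the first function returns the null probabilities (1/4, 1/2, 1/4), and with c = 1/2 the second is a rescaled version of the mean IBD statistic, since Z2 + Z1/2 is proportional to 2*X2 + X1 - N.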

Models for $\alpha$, $\delta$, and $\gamma$

The parameters $\alpha$, $\delta$, and $\gamma$ introduced above do not have direct genetic interpretations. They are statistical conveniences that are useful for exploring mapping strategies. In order to use them, however, we need to know something about the values that these parameters take under various genetic models.

The alternative hypothesis for sibling pairs is defined by the parameters $\alpha$ and $\delta$. The parameter $\alpha$ must lie between 0 and 1 and essentially contains the information about how important the locus is. The parameter $\delta$ must lie between 0 and $\alpha$ and contains, in a sense, the information about how recessive or dominant the disease allele is. If more than one locus is involved in the trait, each locus will have its own values of the parameters; $\alpha_i$ and $\delta_i$ will be associated with the $i$th trait susceptibility locus.

Suarez et al. (1978) calculated the values of $\alpha$ and $\delta$ for a single-gene, two-allele trait as

$$\alpha = \frac{V_A/2 + V_D/4}{K^2 + V_A/2 + V_D/4}, \qquad \delta = \frac{V_D/4}{K^2 + V_A/2 + V_D/4},$$

where $K$ is the population prevalence of the trait and $V_A$ and $V_D$ are the additive and dominance variances of the penetrances. It can easily be shown that the two-allele assumption is not necessary for these formulas to hold. Risch (1990b) did similar calculations in terms of relative risks to different kinds of relatives and got

$$\alpha = (\lambda_S - 1)/\lambda_S, \qquad \delta = (\lambda_S - \lambda_O)/\lambda_S, \tag{1}$$

where $\lambda_S$ is the relative risk to siblings of affecteds and $\lambda_O$ is the relative risk to offspring of affecteds. These latter representations have the advantage that, in principle, they can be estimated directly from pedigree data; their disadvantage is that they do not generalize simply and naturally to pedigrees containing more than two affecteds.

These representations by Risch (1990b) and Suarez et al. (1978) are appropriate in the presence of phenocopies, but they are not correct when there are other major genetic loci involved with the trait. Risch (1990a) (cf.
also Dupuis et al. 1995) introduced models for complex disease involving more than one susceptibility locus. A particularly interesting special case that serves as an approximate model for locus heterogeneity of a rare trait is the additive model, where the total penetrance is assumed to be the sum of penetrances of susceptibility alleles at unlinked loci in linkage equilibrium. The degree of dominance can vary from one locus to the next. Associated with the $i$th locus are parameters $\alpha_i$ and $\delta_i$, expressions for which are given in appendix D. Equation (1) continues to hold with $\alpha$ replaced by $\sum_i \alpha_i$ and $\delta$ replaced by $\sum_i \delta_i$.

A two-allele (per locus) model can be useful for simplifying the heterogeneity formulas in order to examine specific situations. Such a model can be parameterized by assigning the three possible genotypes at the $i$th locus to have probabilities $q^2$, $2pq$, and $p^2$, where $p$ is the frequency of the susceptibility allele. The penetrances can be written as $f_0$, $f_0 + f + d$, and $f_0 + 2f$. Equations for $\alpha$ and $\delta$ under this model are given in appendix D.

The parameter $\alpha$ equals one only in the limiting case of a single-gene trait as the allele frequency approaches 0. In the presence of phenocopies, $\alpha$ is smaller, approaching zero as the proportion of cases attributable to phenocopies approaches one. Under heterogeneity, the preceding remark applies to $\sum_i \alpha_i$; each individual $\alpha_i$ will be smaller. The sample sizes necessary to achieve a desired power are strongly dependent on $\alpha$, regardless of what test is used. Two examples that are illustrative here are chronic obstructive pulmonary disease (COPD) and late-onset Alzheimer disease (AD). In the case of COPD, the PiZ allele is one of many possible causes (Khoury et al. 1993). Khoury et al. give values of .05 for the trait frequency, .02 for the allele frequency, and 20 for the risk ratio to homozygotes, yielding $\alpha_i = .035$.
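The COPD figure can be approximately reproduced with a short Python sketch. It is not the appendix D calculation: it assumes a single two-allele locus with penetrances f0, f0 + f + d, and f0 + 2f, treats all other causes of the trait as the phenocopy level f0, and uses the standard quantitative-genetics expressions for the additive and dominance variances in the Suarez et al. (1978) formulas. The function name and the way the COPD inputs are converted to penetrances are illustrative assumptions.

```python
def two_allele_parameters(p, f0, f, d):
    """Return (K, alpha, delta) for one two-allele locus.

    p is the susceptibility-allele frequency; penetrances are f0, f0 + f + d,
    and f0 + 2f for 0, 1, and 2 copies of the allele. V_A and V_D are the
    textbook additive and dominance variances (an assumption of this sketch).
    """
    q = 1.0 - p
    K = q * q * f0 + 2 * p * q * (f0 + f + d) + p * p * (f0 + 2 * f)
    average_effect = f + d * (q - p)
    V_A = 2 * p * q * average_effect ** 2
    V_D = (2 * p * q * d) ** 2
    denominator = K ** 2 + V_A / 2 + V_D / 4
    alpha = (V_A / 2 + V_D / 4) / denominator
    delta = (V_D / 4) / denominator
    return K, alpha, delta

# COPD inputs quoted in the text: trait frequency .05, allele frequency .02,
# homozygote risk ratio 20, with the allele assumed to act fully recessively.
p_copd = 0.02
risk_ratio = 20.0
f0_copd = 0.05 / (1 + (risk_ratio - 1) * p_copd ** 2)  # baseline penetrance
f_copd = (risk_ratio - 1) * f0_copd / 2                # so f0 + 2f = 20 * f0
K_copd, alpha_copd, delta_copd = two_allele_parameters(
    p_copd, f0_copd, f_copd, -f_copd)
```

Under these assumptions the sketch gives a value of alpha close to the .035 quoted above, and a ratio delta/alpha a little above .9, consistent with a locus that is rare and acts recessively but explains only a small share of cases.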
The allele is relatively rare, but the level of phenocopies/heterogeneity is very high, so the value of $\alpha_i$ is very low. This value of $\alpha_i$ would require a sample size on the order of thousands, or even tens of thousands, to have a high probability of detecting the locus. If a trait is carefully defined to eliminate heterogeneity and phenocopies as much as possible, $\alpha_i$ increases and the sample size is smaller. For AD and the APOE-e4 allele, Tsai et al. (1994) give odds ratios of 3.6 for heterozygotes and 18.3 for homozygotes, and an allele frequency of .13. Given a population prevalence on the order of 10%, it seems possible that a value of $\alpha_i \approx .5$ could be achieved, requiring a sample size of approximately 100 or 200 for a hypothetical attempt to map this locus. (This calculation was done under an approximating two-allele model.)

The parameter $\delta$ takes the value 0 in the case where $V_D = 0$ and $\lambda_S = \lambda_O$. This additive case corresponds approximately to a relatively rare single-locus dominant trait. It takes its maximum value of $\alpha$ in the limiting case where $V_A \ll V_D$ and $\lambda_O \ll \lambda_S$, which is a good approximation to many recessive traits. The choice of the best statistic for finding a disease locus using sibling pairs depends on "how recessive" the locus is, as expressed in terms of the ratio $\delta/\alpha$ (see Results). Under locus heterogeneity, the $\delta_i$ of the individual loci will be smaller, but so will the $\alpha_i$, so that the ratio $\delta_i/\alpha_i$ has about the same range of values as for a single-locus trait.

The two-allele model for locus $i$ gives further insight into exactly what characteristics give the large values of $\delta_i/\alpha_i$ that make a locus "recessive" for the purposes of choosing the best statistic. It can be seen, from the equations in appendix D, that the phenocopy rate, $f_0$, does not play a role in determining the size of this ratio, while

$d$ and $f$ affect the ratio only through the ratio $d/f$. If the locus is completely recessive in the usual sense (heterozygote penetrance equals nonmutant homozygote penetrance), then $-d/f = 1$. In this case, $\delta_i/\alpha_i$ is $> 1/2$ as long as $p < .2$. Thus, a fully recessive locus is best found by a "recessive statistic" designed for large $\delta/\alpha$, as long as the susceptibility allele is moderately rare. A common allele that slightly increases the likelihood of disease, even if that increase acts recessively, will not give a large value of $\delta/\alpha$. If the locus is not fully recessive, i.e., if $-d/f < 1$, the picture changes markedly. For $d/f = -.9$, the maximum value of $\delta/\alpha$ is .52, and it can be much lower. For $d/f = -.8$, the maximum value of $\delta/\alpha$ is only .31. In the case of COPD described above, the PiZ allele acts, apparently, completely recessively, giving it the quite high value of $\delta_i/\alpha_i = .92$. For AD, depending on one's assumption about the overall trait frequency, one gets $-d/f$ of at most .7, and thus $\delta_i/\alpha_i$ of at most .19. An example of an intermediate value of $\delta_i/\alpha_i$ arises in Kruglyak and Lander's (1995) analysis of the type 1 diabetes data of Davies et al. (1994). They calculate $\delta_i/\alpha_i = .59$ at the HLA locus.

The alternative hypothesis for inbred individuals is defined by the parameter $\gamma$, which varies between 0 and 1, indicating the amount of excess HBD. At an unlinked locus, an inbred individual is HBD with probability $F$, the coefficient of inbreeding. At a trait locus, the probability of HBD is $F + (1 - F)\gamma$. The parameter $\gamma$ depends on the inbreeding type. Appendix D gives formulas for $\gamma$ under the general model and under a two-allele model. For a given trait, $\gamma$ is highest for offspring of close relatives, but that does not necessarily imply greater mapping power for closer relatives. However, for a given inbreeding type, traits with higher values of $\gamma$ will require smaller sample sizes than those with lower values of $\gamma$.
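The dependence of the sib-pair ratio delta/alpha on allele frequency and on d/f, described above for the two-allele model, can be explored with the following Python sketch. It assumes the standard expressions for additive and dominance variance at a two-allele locus rather than reproducing appendix D, and the function names and the grid of allele frequencies are illustrative choices. Because the ratio equals (V_D/4)/(V_A/2 + V_D/4), the prevalence and the phenocopy level cancel and only p and d/f remain.

```python
def recessiveness_ratio(p, d_over_f):
    """delta/alpha for a two-allele locus with allele frequency p.

    Penetrances are f0, f0 + f + d, f0 + 2f; variances are expressed in
    units of f**2, so only the ratio d/f is needed.
    """
    q = 1.0 - p
    average_effect = 1.0 + d_over_f * (q - p)
    half_additive = p * q * average_effect ** 2   # V_A / 2
    quarter_dominance = (p * q * d_over_f) ** 2   # V_D / 4
    return quarter_dominance / (half_additive + quarter_dominance)

def max_recessiveness_ratio(d_over_f, steps=999):
    """Largest delta/alpha over a grid of allele frequencies in (0, 1)."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max(recessiveness_ratio(p, d_over_f) for p in grid)

ratio_fully_recessive_at_p20 = recessiveness_ratio(0.2, -1.0)
max_ratio_d09 = max_recessiveness_ratio(-0.9)
max_ratio_d08 = max_recessiveness_ratio(-0.8)
```

Under these assumptions the fully recessive case crosses 1/2 at p = .2, and the grid maxima for d/f = -.9 and d/f = -.8 round to the .52 and .31 quoted in the text.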
In general, $\gamma$ is high when the trait allele is rare and the level of phenocopies is low. For example, for COPD, $\gamma$ for first-cousin offspring is only .02, because the level of phenocopies is so high. A trait with the same allele frequency in which only half the cases are phenocopies would give $\gamma = .61$, and with no phenocopies would give $\gamma = .75$. Under heterogeneity, values of $\gamma_i$ decrease as the number of loci increases, with $\sum_i \gamma_i$ never exceeding 1. If there are very few phenocopies and $n$ equally contributing recessive loci each with a very rare trait allele, $\gamma_i \approx 1/n$.

In order to compare the mapping power achievable using a sample of inbred individuals to that achievable with a sample of sibling pairs, it is necessary to understand the relationships between the parameters $\gamma$, $\alpha$, and $\delta$ under various genetic models. This is difficult in the general case, but appendix D gives formulas for the two-allele case with one or more loci. Table 1 gives additional examples of values of $\gamma$, $\alpha$, and $\delta$ under a two-allele model for various values of the genetic parameters.

The theory developed here for inbred individuals refers to an otherwise outbred population. For an inbred individual in an inbred population, the probability of HBD at an arbitrary locus would be, in Wright's notation (cf. Crow and Kimura 1970, p. 106),

$$F_{IT} = F_{IS} + (1 - F_{IS})\,F_{ST}.$$

Here $F_{IS}$ is the coefficient of inbreeding due to the immediate family relation, which we have been calling $F$ (1/16 for a child of first cousins), while $F_{ST}$ is the coefficient of background inbreeding of the population. To consider an admittedly extreme example, for the Hutterite population the value of $F_{ST}$ has been estimated to be .04 (cf. Crow and Kimura 1970, p. 106). In such a population, the probability of HBD in a child of first cousins is .1, or 60% larger than the .0625 of an outbred population.
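The correction for background inbreeding is simple enough to check directly. The sketch below, with hypothetical function names, evaluates Wright's formula for a child of first cousins in a population with background coefficient .04. The probability of HBD at a trait locus, F + (1 - F) * gamma, has the same algebraic form, so a second small helper is included for comparison.

```python
def total_hbd_probability(f_family, f_population):
    """Wright's F_IT = F_IS + (1 - F_IS) * F_ST for an inbred individual
    in a population with background inbreeding."""
    return f_family + (1.0 - f_family) * f_population

def trait_locus_hbd_probability(f_family, gamma):
    """Probability of HBD at a trait locus, F + (1 - F) * gamma."""
    return f_family + (1.0 - f_family) * gamma

# Child of first cousins (F = 1/16) with background inbreeding F_ST = .04.
f_it_first_cousins = total_hbd_probability(1 / 16, 0.04)
relative_increase = f_it_first_cousins / (1 / 16) - 1.0
```

The first value is .1, which is 60% above the outbred value of .0625.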
To have a valid test of the null hypothesis of no linkage, one should in principle use this corrected HBD probability. Bodmer and Cavalli-Sforza (1976, p. 371) give estimated inbreeding coefficients for a number of subgroups in different parts of the world. Their estimates are often on the order of .001, which is small enough to be ignored. In a few cases of isolated populations they are approximately .01, which probably can be ignored when dealing with offspring of first cousins but might cause problems if one's data involved largely second cousins, where $F = 1/64$, so $F_{IT} \approx .026$.

Results

Comparison of Sibling Pair Statistics

We have defined the general test statistic $T_t(c) = Z_{2,t} + c\,Z_{1,t}$. We wish to determine which value(s) of $c$ are best for mapping different kinds of traits. One interesting value is $c = (1 - \kappa)/[2(1 + \kappa)]$, which gives the score statistic when $\delta/\alpha = \kappa$. Usually this ratio will be unknown, so this statistic cannot be used in practice, but it provides an ideal against which to evaluate the performance of the other statistics. According to the Gaussian approximation, the relative sample sizes for different values of $c$, and thus the choice of the best statistic, depend on only $\kappa = \delta/\alpha$, not on $\alpha$, although the absolute sample sizes do of course depend on $\alpha$. For a false-positive error rate of .05, power of .90, and a genome length of 3,000 cM, sample sizes for the optimal $c = (1 - \kappa)/[2(1 + \kappa)]$ range from $8.5/\alpha^2$ for $\delta/\alpha = 1$ to $50.1/\alpha^2$ for $\delta/\alpha = 0$.

Figure 1 shows the performance of the statistic $T_t(c)$ for $c = 1/2$, $1/4$, and $0$, expressed in terms of the approximate percentage increase needed in the sample size compared to the optimal value. As expected, the "recessive" statistic $T_t(0)$ performs very well when $\kappa$ is close to 1, and the "dominant" statistic $T_t(1/2)$ performs very well when $\kappa$ is close to 0. The compromise statistic $T_t(1/4)$ requires a sample size no more than 12% larger than the optimum over the entire range of $\kappa$ to achieve an arbitrary amount of power. This

would seem to be a reasonable choice when the mode of inheritance is unknown. The simpler $T_t(0)$ performs very well when the trait is at least moderately recessive ($\kappa > 1/2$).

Table 1. Sample Sizes and Parameter Values for Sibling Pairs and Inbred Individuals, under Various Two-Allele Monogenic and Polygenic Models

Columns: Line; No. of Loci; $f_0 + 2f$ (a); $-d/f$ (b); $K$ (c); $f_0/K$ (d); $\alpha$ (e); $\delta$ (f); $\gamma_1$ (g); $\gamma_2$ (g); Sibling Pairs; First Cousins; Second Cousins.

[Table entries are missing from this transcription.]

NOTES. The column labeled "sibling pairs" gives the number of pairs necessary to obtain 90% power. The columns labeled "first cousins" and "second cousins" give the corresponding numbers of offspring of the indicated matings. The sample sizes for sibling pairs are computed using the Markov chain version of the statistic $T_t(0)$. They do not depend on the two-allele model or the number of loci, only on the values of $\alpha$ and $\delta$. The corresponding sample sizes for inbred individuals are specific to the two-allele model and the number of loci. The calculations behind the multilocus sample sizes assume that all loci have equal allele frequencies and penetrances. When there is more than one locus, power refers to the probability to detect a particular locus.
(a) $f_0 + 2f$ is the penetrance of the homozygote. Lines 9 and 10 show the effect of reduced penetrance.
(b) $-d/f$ is the degree of recessiveness under the two-allele model. $-d/f = 1$ indicates a fully recessive gene, i.e., no heterozygote liability above the phenocopy level. Lines 11 and 12 show the effect of such heterozygote liability.
(c) $K$ is the trait frequency.
(d) $f_0/K$ is the fraction of cases due to phenocopies.
(e) $\alpha$ is the sib-pair parameter indicating (roughly) the importance of the locus. It varies from 0 to 1.
(f) $\delta$ is the sib-pair parameter indicating (roughly) the degree of recessiveness of the disease allele. It varies from 0 to $\alpha$.
(g) $\gamma_1$ and $\gamma_2$ are the parameters for offspring of first and second cousins, respectively. They vary from 0 to 1.
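The trade-off among the sibling-pair statistics with c = 0, 1/4, and 1/2 can be illustrated with a simplified calculation. The Python sketch below is not the procedure behind figure 1: it compares statistics only through the squared mean shift of Z2 + c * Z1 at the trait locus divided by its null variance, derived from the multinomial (1/4, 1/2, 1/4) model, and it ignores the fact that the genomewide threshold also varies with c. Function names are hypothetical, and the ratio delta/alpha is written as kappa.

```python
def noncentrality(c, kappa):
    """Squared mean shift of T(c) per pair (in units of alpha**2) divided
    by the null variance of T(c) = Z2 + c * Z1."""
    mean_shift = (1 + 2 * kappa) / 4 - c * kappa / 2
    null_variance = 3 / 16 - c / 4 + c * c / 4
    return mean_shift ** 2 / null_variance

def optimal_c(kappa):
    """Weight giving the score statistic when delta/alpha = kappa."""
    return (1 - kappa) / (2 * (1 + kappa))

def percent_increase(c, kappa):
    """Approximate percentage increase in sample size for T(c) relative
    to the optimal weight, under the pointwise approximation above."""
    best = noncentrality(optimal_c(kappa), kappa)
    return 100.0 * (best / noncentrality(c, kappa) - 1.0)

kappas = [k / 100 for k in range(101)]
worst_case_compromise = max(percent_increase(0.25, k) for k in kappas)
```

Under this pointwise approximation the compromise weight c = 1/4 costs at most 12.5% in sample size (at kappa = 0), close to the figure of no more than 12% quoted above, while c = 0 and c = 1/2 are each optimal at one end of the range and increasingly costly toward the other.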
Remark 1. The simple comparisons of figure 1 are based on the Gaussian approximations, which to some extent make $T_t(0)$ look better than it actually is when the sample size is small to moderate. The reasons are that on unlinked chromosomes the distribution of $T_t(0)$ is skewed, so the correct threshold is slightly larger than the Gaussian approximation suggests, and at a trait locus the true variance of $T_t(0)$ is larger than the variance used in the Gaussian approximation, so the power is actually less than the approximation suggests. This approximation is, nevertheless, useful, because it gives us a simple qualitative picture based on the single parameter $\kappa$ instead of a more accurate but more difficult to interpret description involving $N$, $\alpha$, and $\delta$.

Remark 2. When the penetrance and trait incidence are both large, affected-unaffected sibling pairs can be quite powerful for mapping (Rogus and Krolewski 1996). The parameterization for discordant pairs is exactly the same as for affected pairs, although the values of $\alpha$ and $\delta$ are now negative and satisfy the constraint $\alpha \le \delta \le 0$. Although the values of $\alpha$ and $\delta$ are different for discordant pairs than for affected pairs, the value of $\delta/\alpha$ is the same. Thus, for a given trait, whatever test statistic of the form $T_t(c)$ is appropriate for a sample of

affected pairs, the negative of that statistic is appropriate for a sample of discordant pairs.

Figure 1. Comparison of sibling-pair test statistics. Performance of the statistic $T_t(c) = Z_{2,t} + c\,Z_{1,t}$ is shown for $c = 1/2$ ("dominant" statistic), $c = 1/4$ ("compromise" statistic), and $c = 0$ ("recessive" statistic). Performance is expressed as the approximate percentage increase in sample size needed over what is required for the statistic that is optimal if the degree of recessiveness (the parameter $\kappa$) is known.

Comparing Sibling Pairs to More Distant Relatives

Several authors (e.g., Risch 1990b; Elston 1992; Feingold et al. 1993) have noted that for additive traits, defined by the condition $\lambda_S = \lambda_O$, pairs of more distant relatives are more efficient than sibling pairs when $\lambda_S = \lambda_O$ is large. For completely recessive traits, where $\lambda_S \gg \lambda_O$, clearly unilineal relative pairs are not useful, but it is interesting to see how they compare to sibling pairs when a trait is partially recessive. Figure 2 shows the relative efficiency of sibling pairs to first-cousin pairs. For additive traits ($\lambda_S = \lambda_O$), sibling pairs are more powerful than cousin pairs only when $\lambda_S$ is small (approximately $\lambda_S < 2$), a result that agrees with the authors above. However, when $\lambda_O = (1 + \lambda_S)/2$, which by (1) would be equivalent to $\kappa = 1/2$ for a monogenic trait, sibling pairs are more efficient than cousin pairs up to about $\lambda_S = 13$ and are much more efficient for small $\lambda_S$. The relative efficiencies given in the figure also apply for a heterogeneous trait, provided $\delta_i = \alpha_i/2$ at the locus under consideration. (This means the degree of recessiveness observed at the pedigree level for the trait as a whole is the same as at the particular locus. The situation would be quite different if the overall intermediate mode of inheritance resulted from some loci behaving dominantly and others behaving recessively.)

Comparing Sibling Pairs to Inbred Individuals

Table 1 gives examples of parameter values ($\alpha$, $\delta$, and $\gamma$) and of sample sizes for sibling pairs and for offspring of first and second cousins. The parameter values are computed for the two-allele models that are shown in the table. The calculations for multilocus traits assume that all loci are equally contributing, unlinked, and in linkage equilibrium. Power to detect any particular locus is 90%. The sample sizes for sibling pairs do not depend on the two-allele model or the number of loci, only on the values of $\alpha$ and $\delta$. The corresponding sample sizes for inbred individuals are specific to the two-allele model and the number of loci. This means that the relation between $(\alpha, \delta)$ and $(\gamma_1, \gamma_2)$, hence the relative value of sibling pairs and inbred individuals, depends on the number of loci. All sample sizes are computed using the Markov chain models to give the best possible accuracy. The sample sizes for sibling pairs are computed using the Markov chain equivalent of the statistic $T_t(0)$ (see appendix A).

For a fully recessive ($-d/f = 1$) single locus with homozygote penetrance $f_0 + 2f = 1$, parameter values vary with the phenocopy level and the trait frequency (lines 1-8 of table 1). For a given trait frequency, the presence of phenocopies decreases $\alpha$ slightly, increases $\delta$ slightly, and reduces $\gamma_1$ and $\gamma_2$ moderately (see, for example, lines 4 and 5 or lines 6 and 7), indicating that phenocopies reduce the power of inbred individuals more than that of sibling pairs. Rarer traits have higher values of all parameters and thus smaller sample sizes. If the homozygote penetrance is only .5, parameter values are somewhat lowered, but not dramatically so (compare lines 1 and 5 to lines 9 and 10).
However, if the risk to heterozygotes is elevated even a small amount above the phenocopy rate (e.g., $-d/f = .9$), values of $\delta$, $\gamma_1$, and $\gamma_2$ are quite low, indicating that "recessive-trait" mapping strategies will be less useful (compare lines 1 and 5 to lines 11 and 12). It is important to note that, in the case of elevated heterozygote risk, the value of $\delta$ varies nonmonotonically with the allele frequency (see appendix D) and thus can be somewhat unpredictable (e.g., line 11).

Except for $\alpha$ near 1 (few phenocopies and a very rare allele), sibling pairs are substantially more powerful than inbred individuals, and offspring of closer relatives are more powerful than offspring of distant relatives. Lander and Botstein (1987) obtained similar results, although they focused on marker heterozygosity and distance between markers rather than on phenocopies and heterogeneity as factors governing the difficulty of detecting linkage.

The pattern is similar for heterogeneous traits, but the import is somewhat different. As long as there is no heterozygote liability and few phenocopies and the trait is moderately rare (e.g., lines 13, 14, 16, and 19-21),

inbred individuals are more efficient than sibling pairs. The advantage increases as the number of loci increases. In addition, if the trait is quite rare, $\delta_i$ is about equal to $\alpha_i$, and offspring of more distant relatives can be more powerful (lines and 19-20). Again, their advantage increases with the number of loci. If there is a high level of phenocopies (lines 15 and 18) or even a small amount of heterozygote liability, sibling pairs regain much of their relative efficiency, followed by offspring of closer relatives.

In interpreting the sample sizes for heterogeneous traits in table 1, one must keep in mind that the sample size is that required to detect linkage to a given trait-susceptibility locus and that all loci are assumed to contribute equally to the trait. The results would be qualitatively similar, although quantitatively different, if a small number of loci account for a comparatively large part of the incidence of the trait or if our requirement were different, e.g., to detect at least one of the trait loci or to detect all trait loci. The assumption that all trait loci contribute equally is a convenient conceptual device to simplify an intrinsically complex situation, but it is unlikely to be even approximately satisfied when several loci are involved. To explore the full range of possibilities there does not seem to be any substitute for performing a large number of computations under a variety of assumptions.

Figure 2. Relative statistical efficiency of sibling pairs to cousin pairs for additive and partially recessive traits. Values for the additive case, $\lambda_S = \lambda_O$, are computed using the sibling statistic $T_t(1/2)$. Values for the partially recessive case, $\lambda_O = (1 + \lambda_S)/2$, are computed using $T_t(1/4)$.

For example, Gschwend et al. (1996)
scan the genome to detect linkage for Fanconi anemia, which through complementation analysis is known to be heterogeneous and to involve at least five loci. Because of the rarity of the disease, inbreds can be expected to be more powerful than sibling pairs and offspring of second cousins more powerful than offspring of first cousins. Gschwend et al.'s sample of inbred affecteds, which was selected not to contain families segregating at a previously linked trait locus on chromosome 9 (Strathdee et al. 1992), consisted of 23 pedigrees that involved primarily offspring of first-cousin matings. Assuming for the sake of discussion that, after the chromosome 9 locus is excluded, there are exactly four equally contributing loci and that the sample consists of exactly 23 offspring of first-cousin matings, we find that the power to detect linkage to any given locus would be .45, while the power to detect at least one of the four loci would be ~$1 - (1 - .45)^4$, or ~.9. In fact, according to Gschwend et al. (1996), one locus appears responsible for a substantial majority of the total incidence, so the power to detect that locus is large, while the power to detect the other loci is very small.

Sibling Triples

We have extended our models (not shown) to briefly consider the value of sibling triples. We have considered only the case of a completely recessive trait, using a statistic that counts the number of triples where all three siblings are IBD on both chromosomes. In this case, sibling triples are more valuable than pairs, but not as much more valuable as in the case of a dominant trait of comparable incidence (Teng and Siegmund 1997, in this issue). In the examples we have examined, sibling triples are always more valuable than offspring of first cousins, even in the cases in which offspring of cousins are better than sibling pairs. The numbers of sibling triples necessary to achieve 90% power for lines 6, 7, 15, 16, and 17 of table 1 are, respectively, 10, 16, 22, 33, and 41.
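The Fanconi anemia calculation above treats the four remaining loci as equally contributing targets, so the chance of detecting at least one of them is the complement of missing all four. A minimal Python sketch of that arithmetic follows. The function name `power_at_least_one` is hypothetical, and independence of the detection events across loci is an assumption carried over from the form of the expression in the text.

```python
def power_at_least_one(per_locus_power: float, n_loci: int) -> float:
    """Probability of detecting at least one of n_loci trait loci.

    Assumes equal power at every locus and independent detection events,
    which is the simplification behind 1 - (1 - power)**n_loci.
    """
    if not 0.0 <= per_locus_power <= 1.0:
        raise ValueError("per_locus_power must lie in [0, 1]")
    if n_loci < 1:
        raise ValueError("n_loci must be a positive integer")
    return 1.0 - (1.0 - per_locus_power) ** n_loci


if __name__ == "__main__":
    # Values quoted in the text: power .45 per locus, four loci.
    print(round(power_at_least_one(0.45, 4), 3))  # prints 0.908
```

With the values quoted in the text, the result rounds to the ~.9 reported there. If one locus accounts for most of the incidence, as Gschwend et al. (1996) report, the equal-power assumption fails and this figure no longer applies.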
Discussion

Recommendations on Study Design

Our overall conclusion about study design is that sibling pairs are likely to be more useful than inbred individuals for a fairly wide range of recessive and partially recessive traits, especially if there is only one locus. In the presence of heterogeneity, inbred individuals can be much more useful than sibling pairs, as long as phenocopies are rare and the disease allele is purely recessive. This is an important case, which includes such traits of current interest as Fanconi anemia, xeroderma pigmentosum, and some loci implicated in retinitis pigmentosa. Efficiency is particularly important in the case of a heterogeneous trait, because of the large sample sizes required and hence the possibility of large savings. In contrast, efficiency is not as critical for a monogenic trait with few phenocopies, because small sample sizes will provide sufficient power. We also find that the previously known advantage of more distant relative pairs over sibling pairs for mapping additive traits does not hold for partially recessive traits unless the effect of the locus is quite large.

For inbred individuals, we find that offspring of closer relatives provide greater mapping power in most cases but that, if the level of phenocopies is low, the allele is rare, and there is no heterozygote liability, offspring of more distant relatives provide greater power, with the advantage increasing as the amount of heterogeneity increases. If there is a sparsity of markers or if unaffected family members are unavailable for typing, sibling pairs are likely to be more useful, and offspring of distant relatives less useful, than the above results indicate (see Marker Spacing and Incomplete Polymorphism, below).

Sibling-Pair Statistics

To compare the usefulness of different sibling-pair statistics, we measure the degree of recessiveness in terms of the parameter κ = δ/α, which varies between 0 and 1 and is large when a double dose of a relatively rare allele causes a large increase in liability. We find the statistic $T_t(1/4)$ to be quite efficient over almost all values of κ, and the statistic $T_t(0)$ to be very good as long as κ > 1/2. In the presence of elevated heterozygote liability (partial recessiveness), values of κ can be quite low, and $T_t(1/2)$ is likely to be a better choice. Several other statistics have been suggested for the situation where the mode of inheritance is unknown. Perhaps the most appealing is the likelihood ratio or "possible triangle" test (Holmans 1993). Faraway (1993) also proposed essentially the same test.
Within the Gaussian framework, this likelihood-ratio test is equivalent to estimating the optimal value of c in the statistic $T_t(c)$ from the marker data at the putative trait locus and using that estimated value to define an adaptive statistic. A detailed analysis is given in J. Dupuis and D. Siegmund (unpublished data). Although we have not included this test in the simple comparative analysis of figure 1, numerical calculations indicate that over a range of conditions it requires a sample size ranging from ~4% to ~10% above the minimum sample size. Overall, it behaves much like the test based on $T_t(1/4)$. While $T_t(1/4)$ would be optimal in the case δ = α/3, or equivalently $V_D = V_A$, the likelihood-ratio test does not favor any particular value of κ. As a result, the test based on $T_t(1/4)$ is slightly more efficient than the likelihood-ratio test when the mode of inheritance is indeed intermediate between additive and recessive and less efficient when the mode of inheritance is closer to one of the two extremes. The minimum power for both tests occurs at the extreme models δ = 0 and δ = α.

A rather different test, at least in spirit, is Schaid and Nick's (1990) suggestion to use $\max(T_t[0], T_t[1/2])$. This test will perform very well if one of the two extreme models is correct but less well for an intermediate case. Some rough calculations based on a Bonferroni correction indicate that it performs about as well as the likelihood-ratio test or the test using $T_t(1/4)$; it does slightly better at the extremes and slightly less well in between. Since none of these tests seems uniformly preferable to the others, our preference for $T_t(1/4)$ is to some extent arbitrary. A secondary consideration favoring statistics of the form $T_t(c)$ is that they are more easily combined with data from pedigrees containing other affected relative pairs, affected-unaffected pairs, or affected children of inbreds, when such data are available.
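The trade-off among $T_t(0)$, $T_t(1/4)$, and $T_t(1/2)$ can be sketched numerically. The Python below is only an illustration under stated assumptions: it takes the Gaussian approximation of appendix B, in which the mean of $T_t(c)$ at the trait locus is proportional to α + 2δ(1 − c) and its null variance is proportional to 2 + (2c − 1)^2, and it ignores the mild dependence of the genomewide threshold on c. The function names are hypothetical. Under these assumptions, the squared signal-to-noise ratio relative to the best c approximates the inverse of the relative sample size.

```python
import math


def snr_per_pair(alpha: float, delta: float, c: float) -> float:
    """Per-pair mean/SD of T_t(c) at the trait locus, up to a constant factor.

    Assumed form (appendix B): mean ~ alpha + 2*delta*(1 - c),
    variance ~ 2 + (2c - 1)^2.
    """
    return (alpha + 2.0 * delta * (1.0 - c)) / math.sqrt(2.0 + (2.0 * c - 1.0) ** 2)


def optimal_c(kappa: float) -> float:
    """Weight c that maximizes snr_per_pair when delta = kappa * alpha."""
    return 0.5 - kappa / (1.0 + kappa)


def relative_efficiency(kappa: float, c: float) -> float:
    """Squared signal-to-noise ratio of T_t(c) relative to the optimal c."""
    best = snr_per_pair(1.0, kappa, optimal_c(kappa))
    return (snr_per_pair(1.0, kappa, c) / best) ** 2


if __name__ == "__main__":
    for kappa in (0.0, 1.0 / 3.0, 0.5, 1.0):
        row = ", ".join(
            f"c={c:.2f}: {relative_efficiency(kappa, c):.3f}" for c in (0.0, 0.25, 0.5)
        )
        print(f"kappa={kappa:.2f} -> {row}")
```

Under these simplifications, c = 1/4 is exactly optimal at κ = 1/3 and stays above roughly 89% efficiency for all κ between 0 and 1, whereas c = 1/2 falls to about 67% at κ = 1 and c = 0 falls to about 67% at κ = 0. Sample sizes computed with the full threshold and power approximations will differ somewhat from these ratios.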
Our results may at first glance appear inconsistent with the results of Knapp et al. (1994a, 1994b), who show that the statistic $T_t(1/2)$ is in a certain sense optimal for detecting linkage of a recessive trait. However, their result is based on the assumptions of a monogenic recessive trait without phenocopies and a test of a single marker. Under these conditions, $T_t(1/2)$ can be quite efficient. If the trait is rare, only a small sample is needed and efficiency is not an important issue. If the trait is not rare, then κ is not close to one, so in that case as well $T_t(1/2)$ may be reasonable. For example, for a completely recessive trait without phenocopies and with an allele frequency p = .25, α = .84 and δ = .36; the required sample sizes to achieve 90% power are 32 for $T_t(1/2)$ and 37 for $T_t(0)$. However, for a rare heterogeneous trait $T_t(0)$ can be substantially more powerful than $T_t(1/2)$. Suppose, for example, that four loci act recessively and make equal contributions, so α ≈ δ ≈ .25 at all loci. To have 90% power to detect a given locus, one would need ~204 observations when using $T_t(1/2)$ and ~148 when using $T_t(0)$ (line 19 of table 1). Similar results would be obtained for a rare recessive trait with a high level of phenocopies.

Our analyses also put in context the observation of Knapp (1996), who notes that there may be evidence to favor linkage even when one observes "fewer alleles IBD than expected" (i.e., $T_t[1/2]$ less than expected). He gives the hypothetical example of 2/3 of the pairs sharing zero alleles IBD and the remaining 1/3 sharing two alleles IBD. For this data set, $T_t(1/2)$ is less than would be expected under the null hypothesis, but $T_t(0)$ is quite large.

Marker Spacing and Incomplete Polymorphism

The models we have used in this paper assume that all possible information is extracted from the study population, i.e., that complete genomewide information on IBD and HBD is available. In practice, of course, this is not the case. Typical studies involve typing markers every 5, 10, or 20 cM to determine identity-by-state (IBS), which is then converted to IBD probabilities by use of estimated allele frequencies, although sometimes only unequivocal IBD information is used. In general, regions that generate high LOD scores are then saturated with more markers. This sometimes allows haplotypes to be examined and IBD to be established conclusively. When siblings are used without parents, of course, only IBS information is available. All of these situations amount to losses of varying amounts of information, adding noise to the processes we have modeled. Multipoint mapping methods can regain quite a bit of that information by looking at adjoining markers to estimate IBD probabilities more accurately (Kruglyak and Lander 1995; Kruglyak et al. 1996).

The fact that real studies extract something less than the full IBD information does have implications for the interpretation of our results. The implications are quite difficult to quantify precisely, which is the reason for looking at the idealized situation as we did. Future work by J. Teng and D. Siegmund will address some of these issues quantitatively. However, several general qualitative statements are possible now. Kruglyak and Lander (1995) suggest a measure of the amount of information extracted from a sibling-pair analysis and use simulations to estimate the amount of information extracted for varying values of marker spacing and polymorphism. Their results suggest that studies using moderately closely spaced markers and multipoint methods are probably extracting very close to the full information, and thus our results as stated apply. The definition of "moderately closely spaced markers" in the above statement depends, however, on the amount of IBD that is expected.
For offspring of distant relatives, there will be a greater loss of information from distant marker spacing than for offspring of closer relatives and sibling pairs. There can be a considerable loss of information for siblings without parents, even when multipoint methods are used, but the use of inbred individuals without additional family members would involve even more information loss. Thus, in any situation where less than the full information is being extracted, inbred individuals and distant relatives will be relatively less useful than our results imply.

Other Study Design Considerations

Several factors we did not consider can complicate the comparison of the relative efficiency of sibling pairs and inbred individuals. In many cases, one could use or extend our models to examine these factors. For example, because the task of finding affected inbred individuals for linkage analysis is often facilitated by concentrating on isolated or inbreeding populations, it may be the case that, even if both sibling pairs and inbred individuals are available, the inbred individuals will come from a distinct subpopulation that is not directly comparable to the outbred population. In this case, the disease etiology parameters for the two different populations must be considered separately. Another possible complicating factor is familial aggregation of phenocopies, which would essentially mean that the population of sibling pairs has a higher phenocopy rate than does the population of inbred individuals. Qualitatively, this would make sibling pairs relatively less valuable than our results imply. The quantitative effect could be examined more precisely by extending our models.

Another issue is that the relative utility of outbred siblings and inbred individuals for gene mapping depends to some extent on their frequencies within the population.
For a monogenic recessive disease with a negligible level of phenocopies, caused by homozygosity of a rare allele of frequency p, the probability that both outbred parents of a pair of siblings are carriers is ~$(2p)^2$. The conditional probability that both siblings are affected is 1/16, so the probability that a randomly selected pair of outbred siblings are homozygous for the disease allele is ~$p^2/4$. Similarly, the probability that one or the other of the common grandparents of a pair of cousins is a carrier of the allele is ~4p. The conditional probability that the cousins are both carriers is ~1/16, and the conditional probability they would both pass this allele to their child is 1/4. Hence, if $\pi_c$ denotes the frequency of first-cousin matings, the probability that a child is the homozygous product of a first-cousin mating is approximately equal to $p\pi_c/16$. Roughly speaking then, affected children of first-cousin matings will be sufficiently common to be useful if the frequency of first-cousin matings is large compared to the allele frequency p.

Of course, the choice of whether to use inbred individuals or sibling pairs (or some other type of data) for a study is often made simply on the basis of the availability of a study population. In this case, our results will not be used to make the choice, but they can still give useful guidance in study design. For example, if inbred individuals are used, it is important to define the trait and the population carefully, in order to eliminate phenocopies as much as possible.

We have not discussed the issue of combining sibling pairs and inbred individuals into a single analysis. In principle, this is quite possible. A naive statistic is to simply add up the number of pairs and individuals showing IBD and HBD at each locus. Choosing an optimal statistic is mathematically similar to the problem of choosing the best sibling-pair statistic; it is a matter of finding weightings that are robust over a wide range of trait etiologies. One obvious caution is that, if the sibling pairs and inbred individuals are not really from the same population, it is probably not advisable to combine them, particularly in the situation of likely heterogeneity that we have presumed.

A final issue is that the genome search whose properties are described in table 1 is fundamentally a search for a single trait locus, although it will, of course, detect multiple loci if the observed statistic exceeds the threshold b at several places along the genome. It is also possible to design procedures specifically to search for multiple loci simultaneously (Lander and Botstein 1986; Dupuis et al. 1995) or sequentially (Dupuis et al. 1995; Gschwend et al. 1996). These require that one be prepared to make assumptions about the number of major loci and their mode of interaction. The resulting procedures can be considerably more powerful than a single-locus search when those assumptions are correct.

Acknowledgments

We wish to thank David Botstein and Michele Gschwend, for discussions of homozygosity mapping and in particular of Fanconi anemia, and Leonid Kruglyak and Stephanie Sherman, for their comments on preliminary versions of the manuscript. This work is supported in part by NIH grant R01-HG00848-01A1.

Appendix A

Markov Chain Models for Affected Relative Pairs

The Markov process $X_t = (X_{0,t}, X_{1,t}, X_{2,t})$ gives the number of sibling pairs that share zero, one, and two alleles IBD at point t on a chromosome. One can test for linkage by looking for extreme values of a one-dimensional summary of $X_t$. In our previous work, we examined the process $X_{2,t} + X_{1,t}$, which is appropriate if one and two alleles IBD cannot be distinguished, but is a very poor statistic for recessive traits. Others of interest include $2X_{2,t} + X_{1,t}$, which is the score statistic for an additive model (δ = 0), and $X_{2,t}$, which is the score statistic when δ = α. One can also use an arbitrary weighting, analogous to the Gaussian process $T_t(c)$.
The calculations for the arbitrary weighting of the Markov chain are complex and are not discussed here, but they can be derived by extensions of the methods described by Feingold (1993).

For any of these processes, the test statistic is the maximum value attained by the process over the length of the genome. The P value associated with an observed peak of height b is the probability that the maximum exceeds b under the null hypothesis that there is no linked gene on the genome. Feingold (1993) discussed appropriate theory for finding such P values. For a large class of affected relative pair statistics, an approximation to the P value is

$$1 - \exp\left[-\beta l \binom{N}{b} p_0^{\,b}(1-p_0)^{N-b}\,(b - Np_0)\right], \tag{A1}$$

where the binomial probability $\binom{N}{b}p_0^{\,b}(1-p_0)^{N-b}$ is the stationary probability that the process is in state b, l is the total length of the portion of the genome being examined, and β is a constant associated with the process (see appendix B). The case of $X_{2,t} + X_{1,t}$, for which (A1) applies with the values $p_0 = 3/4$ and β = 16λ/3, was fully described by Feingold (1993). The case of $X_{2,t}$ is very similar: the appropriate parameters for equation (A1) are $p_0 = 1/4$ and β = 16λ/3. The case of $2X_{2,t} + X_{1,t}$ can also be handled similarly by noting that a single sibling pair is equivalent to the sum of two independent half-sibling processes (under the null hypothesis of no linked gene on the chromosome), since the maternal and paternal meioses are independent. Thus, N sibling pairs can be treated exactly as 2N half-sibling pairs, a case that was covered by Feingold (1993). The approximate P value is a special case of (A1) with $p_0 = 1/2$, β = 4λ, and with N replaced by 2N.

The power of any of these tests can be written as

$$P\{Y_r \ge b\} + P\{Y_r < b,\ Y_t \ge b \text{ for some } t \ne r\},$$

where r is the trait locus and Y is the one-dimensional process of interest.
This expression omits the probability of finding a significant peak on chromosomes other than the one containing the trait locus r and should thus be interpreted as the power to find a peak on the true chromosome. The first term of this expression is generally easy to evaluate. The statistics $X_{2,t} + X_{1,t}$ and $X_{2,t}$ have binomial distributions. For $2X_{2,t} + X_{1,t}$, a normal approximation can be used. The first term by itself often provides a reasonable approximation to the power of the test; the more complicated second term usually adds ~5%-10% to the total. The second term of the power expression can be approximated by

$$\sum_{i=0}^{b-1} P\{Y_r = i\}\,(2\nu_i - \nu_i^2),$$

where $\nu_i$ is approximately equal to

$$\left(\frac{Q(b-1,\,b)}{Q(b-1,\,b-2)}\right)^{b-i}, \tag{A2}$$

which comes from assuming that Y acts like a simple random walk near b. The quantities Q(b − 1, b) and Q(b − 1, b − 2) are the instantaneous average transition rates away from state b − 1, as described by Feingold (1993), with the exception that they must be calculated under the alternative hypothesis. For the statistic $X_{2,t}$, Q(b − 1, b − 2) = 4λ(b − 1) and Q(b − 1, b) = 4λ(N − b + 1)[(1 − δ)/(3 − α − 2δ)]. For $X_{2,t} + X_{1,t}$, Q(b − 1, b − 2) = 4λ(b − 1)[(1 − δ)/(3 + α)] and Q(b − 1, b) = 4λ(N − b + 1). For $2X_{2,t} + X_{1,t}$, Q(b − 1, b − 2) = 2λ(b − 1) and Q(b − 1, b) = 2λ(2N − b + 1), the same as under the null hypothesis.

If the sample size is reasonably large, the computation of (A2) can be simplified with a continuous approximation, which is especially useful when $P\{Y_r = i\}$ is not a binomial probability. The continuous approximation is

$$\int_{-\infty}^{b-1/2} \frac{1}{\tau}\,\phi\!\left(\frac{x-\mu}{\tau}\right)(2\nu_x - \nu_x^2)\,dx = 2\exp\!\left[\rho(\mu - b) + \tfrac{1}{2}\rho^2\tau^2\right]\Phi\!\left(\frac{b - 1/2 - \mu - \rho\tau^2}{\tau}\right) - \exp\!\left[2\rho(\mu - b) + 2\rho^2\tau^2\right]\Phi\!\left(\frac{b - 1/2 - \mu - 2\rho\tau^2}{\tau}\right),$$

where μ and τ are the mean and standard deviation of $Y_r$ and

$$\rho = \ln\!\left(\frac{Q(b-1,\,b-2)}{Q(b-1,\,b)}\right).$$

Appendix B

Properties of the Gaussian Process $T_t(c)$

If there is a single trait locus, r, on the chromosome, the expected value of $T_t(c)$ can be written as $\xi R_1(t - r)$, where $\xi = N^{1/2}[\alpha + 2\delta(1 - c)]/4$ and

$$R_1(x) = \frac{\alpha + \delta}{\alpha + 2\delta(1-c)}\,e^{-4\lambda|x|} - \frac{(2c-1)\,\delta}{\alpha + 2\delta(1-c)}\,e^{-8\lambda|x|}.$$

The function $R_1(x)$ can be approximated as $1 - \beta_1|x|$, when x is close to zero, where $\beta_1 = 4\lambda[\alpha + \delta(3 - 4c)]/[\alpha + 2\delta(1 - c)]$. The covariance of $T_s(c)$ and $T_t(c)$, where s and t are any two points on the same side of r, can be written as $\sigma^2 R(t - s)$, where $\sigma^2 = [2 + (2c - 1)^2]/16$ and

$$R(x) = \frac{2\,e^{-4\lambda|x|} + (2c-1)^2\,e^{-8\lambda|x|}}{2 + (2c-1)^2}.$$

The function R(x) can be approximated as 1 − β|x|, where $\beta = 8\lambda[1 + (2c - 1)^2]/[2 + (2c - 1)^2]$. (This constant β is the same as that used for the corresponding Markov chains.) An approximation to the P value (Feingold et al.
1993; Lander and Schork 1994) for a test based on $T_t(c)$ is given by

$$P\{\max_t T_t(c)/\sigma \ge a\} \approx 1 - \exp\{-n[1 - \Phi(a)] - \beta l a\,\phi(a)\},$$

where φ and Φ are the standard normal density and distribution, respectively, l is the total genetic length of the region of the genome being searched, and n is the number of chromosomes. For a genome of 23 chromosomes and a total genetic length of 30 Morgans, the value a ≈ 4.1 gives a P value of ~.05. An approximation to the power of the test (Feingold et al. 1993) is given by 1 (-a o) + 0(a The parameter ξ, which is a function of N, α, and δ, is the only part of the power formula that depends on the sample size, so we do sample size calculations by numerically finding the value of ξ that gives the desired level of power, setting that equal to the algebraic expression for ξ, and solving for N.

Appendix C

Markov Chain Models for HBD in Inbred Individuals

Our test statistic is the maximum over the genome of the process, $Y_t$, that records the number of offspring of a particular mating type who are HBD at each marker locus. This process is not a Markov chain, but it can be described as a function of an underlying, unseen Markov chain in the following way. The underlying Markov chains are different for each type of consanguineous mating (e.g., parent/child, first cousin, second cousin). For the ith individual, let $Y_t^{(i)}$ equal 1 when there is HBD and 0 when there is not. Then $Y_t = \sum_i Y_t^{(i)}$.

Figure C1  Schematic of pattern of HBD in offspring of a parent/child mating (labels: chromosome 1, chromosome 2).

Figure C1 shows a sample pattern of HBD on one pair of chromosomes in the offspring of a parent/child mating. It can be seen from the diagram that the offspring is HBD at a given locus only if (1) the offspring and its maternal grandmother do not share IBD at that locus and (2) the offspring and its mother do share IBD on their respective paternally derived chromosomes. This information allows us to construct an underlying Markov chain of which $Y^{(i)}$ is a function. Let $X^{(i)} = (X^{(ia)}, X^{(ib)})$, where $X^{(ia)}$ is the IBD process between the paternally derived chromosomes of the mother and offspring that equals one if those two chromosomes are IBD and zero if they are not, and $X^{(ib)}$ is the IBD process between the inbred offspring and its maternal grandmother. The processes $X^{(ia)}$ and $X^{(ib)}$ are independent Markov chains and are described in more detail by Feingold (1993) as half-sibling and grandparent/grandchild processes, respectively. The process we actually observe, $Y^{(i)}$, equals one only when $X^{(i)} = (1, 0)$. This representation allows us to calculate an approximate P value by the methods of Feingold (1993). The approximate P value is given by equation (A1) with $p_0 = 1/4$ and β = 4λ.

For a half-sibling mating, a similar analysis shows that (A1) applies with $p_0 = 1/8$ and β = 32λ/7. The other cases we have examined do not require the description of new underlying chains, because they are identical to affected-pair processes described by Feingold (1993).
For example, the HBD process for a child of siblings is exactly the same as the IBD process for first cousins, because the child of siblings can be regarded as its own first cousin. The P values can be approximated by (A1) with parameters as given in table C1. Power and sample size calculations can be done in the same manner as for sibling pairs, by use of the fact that the distribution of $Y_r$ is binomial[N, F + γ(1 − F)].

Table C1  P-Value Parameters for Different Mating Types

| Inbreeding Type | Affected-Pair Type | $p_0$ (= F) | β |
|---|---|---|---|
| Sibling | Cousin | 1/4 | 16λ/3 |
| Avuncular | Cousin once removed | 1/8 | 40λ/7 |
| Cousin | Second cousin | 1/16 | 32λ/5 |
| Cousin once removed | Second cousin once removed | 1/32 | 224λ/31 |
| Second cousin | Third cousin | 1/64 | 512λ/63 |

Appendix D

Formulas for α, δ, and γ

Under the additive model for locus heterogeneity described by Risch (1990a), the parameters $\alpha_i$, $\delta_i$ associated with the ith trait susceptibility locus can be expressed in terms of locus-specific additive and dominance variances of the penetrances:

$$\alpha_i = \frac{V_{A_i}/2 + V_{D_i}/4}{K^2 + \sum_j [V_{A_j}/2 + V_{D_j}/4]}, \qquad \delta_i = \frac{V_{D_i}/4}{K^2 + \sum_j [V_{A_j}/2 + V_{D_j}/4]}.$$

Under a two-allele per locus model, the prevalence, K, of the trait is $f_0 + 2pqd + 2pf$, the dominance variance is $V_{D_i} = 4p^2q^2d^2$ (Kempthorne 1957), and the additive variance is $V_{A_i} = 2pq[f + d(q - p)]^2$. Then

$$\frac{\delta_i}{\alpha_i} = \frac{pq(d/f)^2}{pq(d/f)^2 + [1 + (d/f)(q - p)]^2}.$$

If −d/f = 1, this gives $\delta_i/\alpha_i = (1 - p)/(1 + 3p)$. If −d/f < 1, the ratio $\delta_i/\alpha_i$ takes a maximum value of $(d/f)^2/[4 - 3(d/f)^2]$, achieved at allele frequency p = (1 + d/f)/2.

To derive formulas for γ, it is useful to start with a general one-locus n-allele model. This sets the penetrance of an individual with alleles i and j equal to $K + f_i + f_j + d_{ij}$ (Kempthorne 1957), where K is the prevalence of the trait in the outbred population, $f_i$ is the net additive effect of allele i, and $d_{ij}$ is the net effect of the interaction between alleles i and j. Writing the frequency of allele i as $p_i$, we have $\sum_i p_i f_i = 0$ and $\sum_i p_i d_{ij} = 0$. Then the probability that an inbred individual is HBD given that he or she is affected is

$$F + \gamma(1 - F) = \frac{P\{\text{HBD}\}\,P\{\text{affected} \mid \text{HBD}\}}{P\{\text{affected}\}} = \frac{F\left(K + \sum_i p_i d_{ii}\right)}{K + F\sum_i p_i d_{ii}},$$

so

$$\gamma = \frac{F\sum_i p_i d_{ii}}{K + F\sum_i p_i d_{ii}}.$$

Thus, the relationship between values of γ for different types of inbreeding depends on F, on K, and on $\sum_i p_i d_{ii}$, making it fairly complicated. We gain some simplification under the two-allele model, where $\sum_i p_i d_{ii} = -2pqd$.

The above formula for P{HBD | affected} is correct in the presence of both phenocopies and of heterogeneity with other loci that behave additively. If there is heterogeneity with another recessive locus, we have

$$F + \gamma_1(1 - F) = P\{\text{HBD at locus 1} \mid \text{affected}\} = \frac{F\left(K + \sum_i p_i d_{ii} + F\sum_i p_i' d_{ii}'\right)}{K + F\sum_i p_i d_{ii} + F\sum_i p_i' d_{ii}'},$$

and thus

$$\gamma_1 = \frac{F\sum_i p_i d_{ii}}{K + F\sum_i p_i d_{ii} + F\sum_i p_i' d_{ii}'},$$

where the primes indicate the values for locus 2.

We derive formulas for relationships between γ, α, and δ only under the two-allele model. For the one-locus, two-allele model, the probability that an inbred individual is affected given that he or she is HBD at the trait locus is $qf_0 + p(f_0 + 2f) = f_0 + 2pf = K + \sqrt{V_D}$ for d ≤ 0. Since $V_D = 4p^2q^2d^2 = 4K^2\delta/(1 - \alpha)$, we have

$$P\{\text{HBD} \mid \text{affected}\} = \frac{F(K + \sqrt{V_D})}{F(K + \sqrt{V_D}) + (1 - F)K} = F + (1 - F)\,\frac{2F[\delta/(1-\alpha)]^{1/2}}{1 + 2F[\delta/(1-\alpha)]^{1/2}}.$$

For a heterogeneous trait, modeled by assuming additivity of penetrances between, for example, two loci in linkage equilibrium, the probability of HBD at both trait loci is $F^2 + F(1 - F)(\gamma_1 + \gamma_2)$; the probability of HBD at the first but not the second trait locus is $F(1 - F)[1 + (1 - F)\gamma_1/F - \gamma_2]$, etc. When both trait loci are diallelic,

$$\gamma_i = \frac{2F[\delta_i/(1 - \alpha_1 - \alpha_2)]^{1/2}}{1 + 2F\left[(\delta_1^{1/2} + \delta_2^{1/2})/(1 - \alpha_1 - \alpha_2)^{1/2}\right]}.$$

References

Blackwelder WC, Elston RC (1985) A comparison of sib-pair linkage tests for disease susceptibility loci. Genet Epidemiol 2:85-97
Bodmer WF, Cavalli-Sforza LL (1976) Genetics, evolution, and man. WH Freeman, San Francisco
Crow JF, Kimura M (1970) An introduction to population genetics theory. Harper & Row, New York
Davies JL, Kawaguchi Y, Bennett ST, Copeman JB, Cordell HJ, Pritchard LE, Reed PW, et al (1994) A genome-wide search for human type 1 diabetes susceptibility genes. Nature 371:
Dupuis J, Brown PO, Siegmund D (1995) Statistical methods for linkage analysis of complex traits from high resolution maps of identity by descent. Genetics 140:
Elston RC (1992) Designs for the global search of human genome by linkage analysis. Proc Int Biometric Conf
Faraway JJ (1993) Improved sib-pair linkage test for disease susceptibility loci. Genet Epidemiol 10:
Feingold E (1993) Markov processes for modeling and analyzing a new genetic mapping method. J Appl Prob 30:
Feingold E, Brown PO, Siegmund D (1993) Gaussian models for genetic linkage analysis using complete high resolution maps of identity by descent. Am J Hum Genet 53:
Gschwend M, Levran O, Kruglyak L, Ranade K, Verlander PC, Shen S, Faure S, et al (1996) A locus for Fanconi anemia on 16q determined by homozygosity mapping. Am J Hum Genet 59:
Holmans P (1993) Asymptotic properties of affected-sib-pair linkage analysis. Am J Hum Genet 52:
Hugot JP, Laurent-Puig P, Gower-Rousseau C, Olson JM, Lee JC, Beaugerie L, Naom I, et al (1996) Mapping of a susceptibility locus for Crohn's disease on chromosome 16. Nature 379:
Kempthorne O (1957) An introduction to genetic statistics. John Wiley & Sons, New York
Khoury MJ, Beaty TH, Cohen BH (1993) Fundamentals of genetic epidemiology. Oxford University Press, New York
Knapp M (1996) Even a deficit of shared marker alleles in affected sib pairs can yield evidence for linkage. Am J Hum Genet 59:
Knapp M, Seuchter SA, Baur MP (1994a) Linkage analysis in nuclear families. 1. Optimality criteria for affected sib-pair tests. Hum Hered 44:37-43
Knapp M, Seuchter SA, Baur MP (1994b) Linkage analysis in nuclear families. 2. Relationship between affected sib-pair tests and lod score analysis. Hum Hered 44:44-51
Kruglyak L, Daly MJ, Lander ES (1995) Rapid multipoint linkage analysis of recessive traits in nuclear families, including homozygosity mapping. Am J Hum Genet 56:
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:
Kruglyak L, Lander ES (1995) Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am J Hum Genet 57:
Lander ES, Botstein D (1986) Strategies for studying heterogeneous genetic traits in humans by using a linkage map of restriction fragment length polymorphisms. Proc Natl Acad Sci USA 83:
Lander ES, Botstein D (1987) Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science 236:
Lander ES, Kruglyak L (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 11:
Lander ES, Schork NJ (1994) Genetic dissection of complex traits. Science 265:
Nelson SF, McCusker JH, Sander MA, Kee Y, Modrich P, Brown PO (1993) Genomic mismatch scanning: a new approach to genetic linkage mapping. Nat Genet 4:11-18
Risch N (1990a) Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet 46:
Risch N (1990b) Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am J Hum Genet 46:
Rogus JJ, Krolewski AS (1996) Using discordant sib pairs to map loci for qualitative traits with high sibling recurrence risk. Am J Hum Genet 59:
Schaid DJ, Nick TG (1990) Sib-pair linkage tests for disease susceptibility loci: common tests vs. the asymptotically most powerful test. Genet Epidemiol 7:
Strathdee CA, Duncan AMV, Buchwald M (1992) Evidence for at least four Fanconi anaemia genes including FACC on chromosome 9. Nat Genet 1:
Suarez BK, Rice J, Reich T (1978) The generalized sib pair IBD distribution: its use in the detection of linkage. Ann Hum Genet 44:87-94
Teng J, Siegmund D (1997) Combining information within and between pedigrees in allele sharing methods of linkage analysis. Am J Hum Genet 60: (in this issue)
Tsai M-S, Tangalos EG, Petersen RC, Smith GE, Schaid DJ, Kokmen E, Ivnik RJ, et al (1994) Apolipoprotein E: risk factor for Alzheimer disease. Am J Hum Genet 54:


More information

INTERACTION BETWEEN NATURAL SELECTION FOR HETEROZYGOTES AND DIRECTIONAL SELECTION

INTERACTION BETWEEN NATURAL SELECTION FOR HETEROZYGOTES AND DIRECTIONAL SELECTION INTERACTION BETWEEN NATURAL SELECTION FOR HETEROZYGOTES AND DIRECTIONAL SELECTION MARGRITH WEHRLI VERGHESE 1228 Kingston Ridge Driue, Cary, N.C. 27511 Manuscript received May 5, 1973 Revised copy received

More information

Genetics and Genomics in Medicine Chapter 8 Questions

Genetics and Genomics in Medicine Chapter 8 Questions Genetics and Genomics in Medicine Chapter 8 Questions Linkage Analysis Question Question 8.1 Affected members of the pedigree above have an autosomal dominant disorder, and cytogenetic analyses using conventional

More information

Single Gene (Monogenic) Disorders. Mendelian Inheritance: Definitions. Mendelian Inheritance: Definitions

Single Gene (Monogenic) Disorders. Mendelian Inheritance: Definitions. Mendelian Inheritance: Definitions Single Gene (Monogenic) Disorders Mendelian Inheritance: Definitions A genetic locus is a specific position or location on a chromosome. Frequently, locus is used to refer to a specific gene. Alleles are

More information

Effects of Stratification in the Analysis of Affected-Sib-Pair Data: Benefits and Costs

Effects of Stratification in the Analysis of Affected-Sib-Pair Data: Benefits and Costs Am. J. Hum. Genet. 66:567 575, 2000 Effects of Stratification in the Analysis of Affected-Sib-Pair Data: Benefits and Costs Suzanne M. Leal and Jurg Ott Laboratory of Statistical Genetics, The Rockefeller

More information

Pedigree Analysis Why do Pedigrees? Goals of Pedigree Analysis Basic Symbols More Symbols Y-Linked Inheritance

Pedigree Analysis Why do Pedigrees? Goals of Pedigree Analysis Basic Symbols More Symbols Y-Linked Inheritance Pedigree Analysis Why do Pedigrees? Punnett squares and chi-square tests work well for organisms that have large numbers of offspring and controlled mating, but humans are quite different: Small families.

More information

Multifactorial Inheritance. Prof. Dr. Nedime Serakinci

Multifactorial Inheritance. Prof. Dr. Nedime Serakinci Multifactorial Inheritance Prof. Dr. Nedime Serakinci GENETICS I. Importance of genetics. Genetic terminology. I. Mendelian Genetics, Mendel s Laws (Law of Segregation, Law of Independent Assortment).

More information

Pedigree Construction Notes

Pedigree Construction Notes Name Date Pedigree Construction Notes GO TO à Mendelian Inheritance (http://www.uic.edu/classes/bms/bms655/lesson3.html) When human geneticists first began to publish family studies, they used a variety

More information

Selection at one locus with many alleles, fertility selection, and sexual selection

Selection at one locus with many alleles, fertility selection, and sexual selection Selection at one locus with many alleles, fertility selection, and sexual selection Introduction It s easy to extend the Hardy-Weinberg principle to multiple alleles at a single locus. In fact, we already

More information

GENETIC LINKAGE ANALYSIS

GENETIC LINKAGE ANALYSIS Atlas of Genetics and Cytogenetics in Oncology and Haematology GENETIC LINKAGE ANALYSIS * I- Recombination fraction II- Definition of the "lod score" of a family III- Test for linkage IV- Estimation of

More information

Dan Koller, Ph.D. Medical and Molecular Genetics

Dan Koller, Ph.D. Medical and Molecular Genetics Design of Genetic Studies Dan Koller, Ph.D. Research Assistant Professor Medical and Molecular Genetics Genetics and Medicine Over the past decade, advances from genetics have permeated medicine Identification

More information

A Unified Sampling Approach for Multipoint Analysis of Qualitative and Quantitative Traits in Sib Pairs

A Unified Sampling Approach for Multipoint Analysis of Qualitative and Quantitative Traits in Sib Pairs Am. J. Hum. Genet. 66:1631 1641, 000 A Unified Sampling Approach for Multipoint Analysis of Qualitative and Quantitative Traits in Sib Pairs Kung-Yee Liang, 1 Chiung-Yu Huang, 1 and Terri H. Beaty Departments

More information

Roadmap. Inbreeding How inbred is a population? What are the consequences of inbreeding?

Roadmap. Inbreeding How inbred is a population? What are the consequences of inbreeding? 1 Roadmap Quantitative traits What kinds of variation can selection work on? How much will a population respond to selection? Heritability How can response be restored? Inbreeding How inbred is a population?

More information

Any inbreeding will have similar effect, but slower. Overall, inbreeding modifies H-W by a factor F, the inbreeding coefficient.

Any inbreeding will have similar effect, but slower. Overall, inbreeding modifies H-W by a factor F, the inbreeding coefficient. Effect of finite population. Two major effects 1) inbreeding 2) genetic drift Inbreeding Does not change gene frequency; however, increases homozygotes. Consider a population where selfing is the only

More information

Will now consider in detail the effects of relaxing the assumption of infinite-population size.

Will now consider in detail the effects of relaxing the assumption of infinite-population size. FINITE POPULATION SIZE: GENETIC DRIFT READING: Nielsen & Slatkin pp. 21-27 Will now consider in detail the effects of relaxing the assumption of infinite-population size. Start with an extreme case: a

More information

Chapter 2. Linkage Analysis. JenniferH.BarrettandM.DawnTeare. Abstract. 1. Introduction

Chapter 2. Linkage Analysis. JenniferH.BarrettandM.DawnTeare. Abstract. 1. Introduction Chapter 2 Linkage Analysis JenniferH.BarrettandM.DawnTeare Abstract Linkage analysis is used to map genetic loci using observations on relatives. It can be applied to both major gene disorders (parametric

More information

When bad things happen to good genes: mutation vs. selection

When bad things happen to good genes: mutation vs. selection When bad things happen to good genes: mutation vs. selection Selection tends to increase the frequencies of alleles with higher marginal fitnesses. Does this mean that genes are perfect? No, mutation can

More information

Stat 531 Statistical Genetics I Homework 4

Stat 531 Statistical Genetics I Homework 4 Stat 531 Statistical Genetics I Homework 4 Erik Erhardt November 17, 2004 1 Duerr et al. report an association between a particular locus on chromosome 12, D12S1724, and in ammatory bowel disease (Am.

More information

Mating Systems. 1 Mating According to Index Values. 1.1 Positive Assortative Matings

Mating Systems. 1 Mating According to Index Values. 1.1 Positive Assortative Matings Mating Systems After selecting the males and females that will be used to produce the next generation of animals, the next big decision is which males should be mated to which females. Mating decisions

More information

Systems of Mating: Systems of Mating:

Systems of Mating: Systems of Mating: 8/29/2 Systems of Mating: the rules by which pairs of gametes are chosen from the local gene pool to be united in a zygote with respect to a particular locus or genetic system. Systems of Mating: A deme

More information

Replacing IBS with IBD: The MLS Method. Biostatistics 666 Lecture 15

Replacing IBS with IBD: The MLS Method. Biostatistics 666 Lecture 15 Replacing IBS with IBD: The MLS Method Biostatistics 666 Lecture 5 Previous Lecture Analysis of Affected Relative Pairs Test for Increased Sharing at Marker Expected Amount of IBS Sharing Previous Lecture:

More information

Genome Scan Meta-Analysis of Schizophrenia and Bipolar Disorder, Part I: Methods and Power Analysis

Genome Scan Meta-Analysis of Schizophrenia and Bipolar Disorder, Part I: Methods and Power Analysis Am. J. Hum. Genet. 73:17 33, 2003 Genome Scan Meta-Analysis of Schizophrenia and Bipolar Disorder, Part I: Methods and Power Analysis Douglas F. Levinson, 1 Matthew D. Levinson, 1 Ricardo Segurado, 2 and

More information

1248 Letters to the Editor

1248 Letters to the Editor 148 Letters to the Editor Badner JA, Goldin LR (1997) Bipolar disorder and chromosome 18: an analysis of multiple data sets. Genet Epidemiol 14:569 574. Meta-analysis of linkage studies. Genet Epidemiol

More information

MULTIFACTORIAL DISEASES. MG L-10 July 7 th 2014

MULTIFACTORIAL DISEASES. MG L-10 July 7 th 2014 MULTIFACTORIAL DISEASES MG L-10 July 7 th 2014 Genetic Diseases Unifactorial Chromosomal Multifactorial AD Numerical AR Structural X-linked Microdeletions Mitochondrial Spectrum of Alterations in DNA Sequence

More information

GENETIC DRIFT & EFFECTIVE POPULATION SIZE

GENETIC DRIFT & EFFECTIVE POPULATION SIZE Instructor: Dr. Martha B. Reiskind AEC 450/550: Conservation Genetics Spring 2018 Lecture Notes for Lectures 3a & b: In the past students have expressed concern about the inbreeding coefficient, so please

More information

Beef Cattle Handbook

Beef Cattle Handbook Beef Cattle Handbook BCH-1400 Product of Extension Beef Cattle Resource Committee The Genetic Principles of Crossbreeding David S. Buchanan, Oklahoma State University Sally L. Northcutt, Oklahoma State

More information

Lecture 7: Introduction to Selection. September 14, 2012

Lecture 7: Introduction to Selection. September 14, 2012 Lecture 7: Introduction to Selection September 14, 2012 Announcements Schedule of open computer lab hours on lab website No office hours for me week. Feel free to make an appointment for M-W. Guest lecture

More information

An Introduction to Quantitative Genetics I. Heather A Lawson Advanced Genetics Spring2018

An Introduction to Quantitative Genetics I. Heather A Lawson Advanced Genetics Spring2018 An Introduction to Quantitative Genetics I Heather A Lawson Advanced Genetics Spring2018 Outline What is Quantitative Genetics? Genotypic Values and Genetic Effects Heritability Linkage Disequilibrium

More information

An Introduction to Quantitative Genetics

An Introduction to Quantitative Genetics An Introduction to Quantitative Genetics Mohammad Keramatipour MD, PhD Keramatipour@tums.ac.ir ac ir 1 Mendel s work Laws of inheritance Basic Concepts Applications Predicting outcome of crosses Phenotype

More information

Lecture 6: Linkage analysis in medical genetics

Lecture 6: Linkage analysis in medical genetics Lecture 6: Linkage analysis in medical genetics Magnus Dehli Vigeland NORBIS course, 8 th 12 th of January 2018, Oslo Approaches to genetic mapping of disease Multifactorial disease Monogenic disease Syke

More information

Homozygote Incidence

Homozygote Incidence Am. J. Hum. Genet. 41:671-677, 1987 The Effects of Genetic Screening and Assortative Mating on Lethal Recessive-Allele Frequencies and Homozygote Incidence R. B. CAMPBELL Department of Mathematics and

More information

Ch. 23 The Evolution of Populations

Ch. 23 The Evolution of Populations Ch. 23 The Evolution of Populations 1 Essential question: Do populations evolve? 2 Mutation and Sexual reproduction produce genetic variation that makes evolution possible What is the smallest unit of

More information

Effect of Genetic Heterogeneity and Assortative Mating on Linkage Analysis: A Simulation Study

Effect of Genetic Heterogeneity and Assortative Mating on Linkage Analysis: A Simulation Study Am. J. Hum. Genet. 61:1169 1178, 1997 Effect of Genetic Heterogeneity and Assortative Mating on Linkage Analysis: A Simulation Study Catherine T. Falk The Lindsley F. Kimball Research Institute of The

More information

Complex Trait Genetics in Animal Models. Will Valdar Oxford University

Complex Trait Genetics in Animal Models. Will Valdar Oxford University Complex Trait Genetics in Animal Models Will Valdar Oxford University Mapping Genes for Quantitative Traits in Outbred Mice Will Valdar Oxford University What s so great about mice? Share ~99% of genes

More information

Genetics Unit Exam. Number of progeny with following phenotype Experiment Red White #1: Fish 2 (red) with Fish 3 (red) 100 0

Genetics Unit Exam. Number of progeny with following phenotype Experiment Red White #1: Fish 2 (red) with Fish 3 (red) 100 0 Genetics Unit Exam Question You are working with an ornamental fish that shows two color phenotypes, red or white. The color is controlled by a single gene. These fish are hermaphrodites meaning they can

More information

Analysis of single gene effects 1. Quantitative analysis of single gene effects. Gregory Carey, Barbara J. Bowers, Jeanne M.

Analysis of single gene effects 1. Quantitative analysis of single gene effects. Gregory Carey, Barbara J. Bowers, Jeanne M. Analysis of single gene effects 1 Quantitative analysis of single gene effects Gregory Carey, Barbara J. Bowers, Jeanne M. Wehner From the Department of Psychology (GC, JMW) and Institute for Behavioral

More information

Statistical Evaluation of Sibling Relationship

Statistical Evaluation of Sibling Relationship The Korean Communications in Statistics Vol. 14 No. 3, 2007, pp. 541 549 Statistical Evaluation of Sibling Relationship Jae Won Lee 1), Hye-Seung Lee 2), Hyo Jung Lee 3) and Juck-Joon Hwang 4) Abstract

More information

The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0

The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0 The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0 Introduction Loss of erozygosity (LOH) represents the loss of allelic differences. The SNP markers on the SNP Array 6.0 can be used

More information

p and q can be thought of as probabilities of selecting the given alleles by

p and q can be thought of as probabilities of selecting the given alleles by Lecture 26 Population Genetics Until now, we have been carrying out genetic analysis of individuals, but for the next three lectures we will consider genetics from the point of view of groups of individuals,

More information

When the deleterious allele is completely recessive the equilibrium frequency is: 0.9

When the deleterious allele is completely recessive the equilibrium frequency is: 0.9 PROBLEM SET 2 EVOLUTIONARY BIOLOGY FALL 2016 KEY Mutation, Selection, Migration, Drift (20 pts total) 1) A small amount of dominance can have a major effect in reducing the equilibrium frequency of a harmful

More information

Discontinuous Traits. Chapter 22. Quantitative Traits. Types of Quantitative Traits. Few, distinct phenotypes. Also called discrete characters

Discontinuous Traits. Chapter 22. Quantitative Traits. Types of Quantitative Traits. Few, distinct phenotypes. Also called discrete characters Discontinuous Traits Few, distinct phenotypes Chapter 22 Also called discrete characters Quantitative Genetics Examples: Pea shape, eye color in Drosophila, Flower color Quantitative Traits Phenotype is

More information

Reliability, validity, and all that jazz

Reliability, validity, and all that jazz Reliability, validity, and all that jazz Dylan Wiliam King s College London Introduction No measuring instrument is perfect. The most obvious problems relate to reliability. If we use a thermometer to

More information

PopGen4: Assortative mating

PopGen4: Assortative mating opgen4: Assortative mating Introduction Although random mating is the most important system of mating in many natural populations, non-random mating can also be an important mating system in some populations.

More information

Ch 8 Practice Questions

Ch 8 Practice Questions Ch 8 Practice Questions Multiple Choice Identify the choice that best completes the statement or answers the question. 1. What fraction of offspring of the cross Aa Aa is homozygous for the dominant allele?

More information

Allowing for Missing Parents in Genetic Studies of Case-Parent Triads

Allowing for Missing Parents in Genetic Studies of Case-Parent Triads Am. J. Hum. Genet. 64:1186 1193, 1999 Allowing for Missing Parents in Genetic Studies of Case-Parent Triads C. R. Weinberg National Institute of Environmental Health Sciences, Research Triangle Park, NC

More information

Summary. Introduction. Atypical and Duplicated Samples. Atypical Samples. Noah A. Rosenberg

Summary. Introduction. Atypical and Duplicated Samples. Atypical Samples. Noah A. Rosenberg doi: 10.1111/j.1469-1809.2006.00285.x Standardized Subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, Accounting for Atypical and Duplicated Samples and Pairs of Close Relatives Noah A. Rosenberg

More information

Genomewide Linkage of Forced Mid-Expiratory Flow in Chronic Obstructive Pulmonary Disease

Genomewide Linkage of Forced Mid-Expiratory Flow in Chronic Obstructive Pulmonary Disease ONLINE DATA SUPPLEMENT Genomewide Linkage of Forced Mid-Expiratory Flow in Chronic Obstructive Pulmonary Disease Dawn L. DeMeo, M.D., M.P.H.,Juan C. Celedón, M.D., Dr.P.H., Christoph Lange, John J. Reilly,

More information

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John

More information

Solutions to Genetics Unit Exam

Solutions to Genetics Unit Exam Solutions to Genetics Unit Exam Question 1 You are working with an ornamental fish that shows two color phenotypes, red or white. The color is controlled by a single gene. These fish are hermaphrodites

More information

Science Olympiad Heredity

Science Olympiad Heredity Science Olympiad Heredity Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. A Punnett square shows you all the ways in which can combine. a.

More information

MBG* Animal Breeding Methods Fall Final Exam

MBG* Animal Breeding Methods Fall Final Exam MBG*4030 - Animal Breeding Methods Fall 2007 - Final Exam 1 Problem Questions Mick Dundee used his financial resources to purchase the Now That s A Croc crocodile farm that had been operating for a number

More information

Lecture 6 Practice of Linkage Analysis

Lecture 6 Practice of Linkage Analysis Lecture 6 Practice of Linkage Analysis Jurg Ott http://lab.rockefeller.edu/ott/ http://www.jurgott.org/pekingu/ The LINKAGE Programs http://www.jurgott.org/linkage/linkagepc.html Input: pedfile Fam ID

More information

White Paper Estimating Genotype-Specific Incidence for One or Several Loci

White Paper Estimating Genotype-Specific Incidence for One or Several Loci White Paper 23-01 Estimating Genotype-Specific Incidence for One or Several Loci Authors: Mike Macpherson Brian Naughton Andro Hsu Joanna Mountain Created: September 5, 2007 Last Edited: November 18, 2007

More information

Problem set questions from Final Exam Human Genetics, Nondisjunction, and Cancer

Problem set questions from Final Exam Human Genetics, Nondisjunction, and Cancer Problem set questions from Final Exam Human Genetics, Nondisjunction, and ancer Mapping in humans using SSRs and LOD scores 1. You set out to genetically map the locus for color blindness with respect

More information

Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004

Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004 Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004 Correspondence to: Daniel J. Schaid, Ph.D., Harwick 775, Division of Biostatistics Mayo Clinic/Foundation,

More information

Activities to Accompany the Genetics and Evolution App for ipad and iphone

Activities to Accompany the Genetics and Evolution App for ipad and iphone Activities to Accompany the Genetics and Evolution App for ipad and iphone All of the following questions can be answered using the ipad version of the Genetics and Evolution App. When using the iphone

More information

Multifactorial Inheritance

Multifactorial Inheritance S e s s i o n 6 Medical Genetics Multifactorial Inheritance and Population Genetics J a v a d J a m s h i d i F a s a U n i v e r s i t y o f M e d i c a l S c i e n c e s, Novemb e r 2 0 1 7 Multifactorial

More information

Imaging Genetics: Heritability, Linkage & Association

Imaging Genetics: Heritability, Linkage & Association Imaging Genetics: Heritability, Linkage & Association David C. Glahn, PhD Olin Neuropsychiatry Research Center & Department of Psychiatry, Yale University July 17, 2011 Memory Activation & APOE ε4 Risk

More information

Problem 3: Simulated Rheumatoid Arthritis Data

Problem 3: Simulated Rheumatoid Arthritis Data Problem 3: Simulated Rheumatoid Arthritis Data Michael B Miller Michael Li Gregg Lind Soon-Young Jang The plan

More information

Exam #2 BSC Fall. NAME_Key correct answers in BOLD FORM A

Exam #2 BSC Fall. NAME_Key correct answers in BOLD FORM A Exam #2 BSC 2011 2004 Fall NAME_Key correct answers in BOLD FORM A Before you begin, please write your name and social security number on the computerized score sheet. Mark in the corresponding bubbles

More information

What we mean more precisely is that this gene controls the difference in seed form between the round and wrinkled strains that Mendel worked with

What we mean more precisely is that this gene controls the difference in seed form between the round and wrinkled strains that Mendel worked with 9/23/05 Mendel Revisited In typical genetical parlance the hereditary factor that determines the round/wrinkled seed difference as referred to as the gene for round or wrinkled seeds What we mean more

More information

Bio 1M: Evolutionary processes

Bio 1M: Evolutionary processes Bio 1M: Evolutionary processes Evolution by natural selection Is something missing from the story I told last chapter? Heritable variation in traits Selection (i.e., differential reproductive success)

More information

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK CHAPTER 6 DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK Genetic research aimed at the identification of new breast cancer susceptibility genes is at an interesting crossroad. On the one hand, the existence

More information

Reliability, validity, and all that jazz

Reliability, validity, and all that jazz Reliability, validity, and all that jazz Dylan Wiliam King s College London Published in Education 3-13, 29 (3) pp. 17-21 (2001) Introduction No measuring instrument is perfect. If we use a thermometer

More information

Lab 5: Testing Hypotheses about Patterns of Inheritance

Lab 5: Testing Hypotheses about Patterns of Inheritance Lab 5: Testing Hypotheses about Patterns of Inheritance How do we talk about genetic information? Each cell in living organisms contains DNA. DNA is made of nucleotide subunits arranged in very long strands.

More information

The plant of the day Pinus longaeva Pinus aristata

The plant of the day Pinus longaeva Pinus aristata The plant of the day Pinus longaeva Pinus aristata Today s Topics Non-random mating Genetic drift Population structure Big Questions What are the causes and evolutionary consequences of non-random mating?

More information

HERITABILITY INTRODUCTION. Objectives

HERITABILITY INTRODUCTION. Objectives 36 HERITABILITY In collaboration with Mary Puterbaugh and Larry Lawson Objectives Understand the concept of heritability. Differentiate between broad-sense heritability and narrowsense heritability. Learn

More information

What favorite organism of geneticists is described in the right-hand column?

What favorite organism of geneticists is described in the right-hand column? What favorite organism of geneticists is described in the right-hand column? Model Organism fruit fly?? Generation time 12 days ~ 5000 days Size 2 mm 1500-1800mm Brood size hundreds a couple dozen would

More information

Overview of Animal Breeding

Overview of Animal Breeding Overview of Animal Breeding 1 Required Information Successful animal breeding requires 1. the collection and storage of data on individually identified animals; 2. complete pedigree information about the

More information

Quantitative Trait Analysis in Sibling Pairs. Biostatistics 666

Quantitative Trait Analysis in Sibling Pairs. Biostatistics 666 Quantitative Trait Analsis in Sibling Pairs Biostatistics 666 Outline Likelihood function for bivariate data Incorporate genetic kinship coefficients Incorporate IBD probabilities The data Pairs of measurements

More information

additive genetic component [d] = rded

additive genetic component [d] = rded Heredity (1976), 36 (1), 31-40 EFFECT OF GENE DISPERSION ON ESTIMATES OF COMPONENTS OF GENERATION MEANS AND VARIANCES N. E. M. JAYASEKARA* and J. L. JINKS Department of Genetics, University of Birmingham,

More information

Inbreeding: Its Meaning, Uses and Effects on Farm Animals

Inbreeding: Its Meaning, Uses and Effects on Farm Animals 1 of 10 11/13/2009 4:33 PM University of Missouri Extension G2911, Reviewed October 1993 Inbreeding: Its Meaning, Uses and Effects on Farm Animals Dale Vogt, Helen A. Swartz and John Massey Department

More information

Lab Activity 36. Principles of Heredity. Portland Community College BI 233

Lab Activity 36. Principles of Heredity. Portland Community College BI 233 Lab Activity 36 Principles of Heredity Portland Community College BI 233 Terminology of Chromosomes Homologous chromosomes: A pair, of which you get one from mom, and one from dad. Example: the pair of

More information

Statistical power and significance testing in large-scale genetic studies

Statistical power and significance testing in large-scale genetic studies STUDY DESIGNS Statistical power and significance testing in large-scale genetic studies Pak C. Sham 1 and Shaun M. Purcell 2,3 Abstract Significance testing was developed as an objective method for summarizing

More information

INTRODUCTION TO GENETIC EPIDEMIOLOGY (EPID0754) Prof. Dr. Dr. K. Van Steen

INTRODUCTION TO GENETIC EPIDEMIOLOGY (EPID0754) Prof. Dr. Dr. K. Van Steen INTRODUCTION TO GENETIC EPIDEMIOLOGY (EPID0754) Prof. Dr. Dr. K. Van Steen DIFFERENT FACES OF GENETIC EPIDEMIOLOGY 1 Basic epidemiology 1.a Aims of epidemiology 1.b Designs in epidemiology 1.c An overview

More information

Transmission Disequilibrium Test in GWAS

Transmission Disequilibrium Test in GWAS Department of Computer Science Brown University, Providence sorin@cs.brown.edu November 10, 2010 Outline 1 Outline 2 3 4 The transmission/disequilibrium test (TDT) was intro- duced several years ago by

More information

On the Effective Size of Populations With Separate Sexes, With Particular Reference to Sex-Linked Genes

On the Effective Size of Populations With Separate Sexes, With Particular Reference to Sex-Linked Genes Copyright 0 1995 by the Genetics Society of America On the Effective Size of Populations With Separate Sexes, With Particular Reference to Sex-Linked Genes Armo Caballero Institute of Cell, Animal Population

More information

STATISTICAL ANALYSIS FOR GENETIC EPIDEMIOLOGY (S.A.G.E.) INTRODUCTION

STATISTICAL ANALYSIS FOR GENETIC EPIDEMIOLOGY (S.A.G.E.) INTRODUCTION STATISTICAL ANALYSIS FOR GENETIC EPIDEMIOLOGY (S.A.G.E.) INTRODUCTION Release 3.1 December 1997 - ii - INTRODUCTION Table of Contents 1 Changes Since Last Release... 1 2 Purpose... 3 3 Using the Programs...

More information

Your DNA extractions! 10 kb

Your DNA extractions! 10 kb Your DNA extractions! 10 kb Quantitative characters: polygenes and environment Most ecologically important quantitative traits (QTs) vary. Distributions are often unimodal and approximately normal. Offspring

More information

Introduction to the Genetics of Complex Disease

Introduction to the Genetics of Complex Disease Introduction to the Genetics of Complex Disease Jeremiah M. Scharf, MD, PhD Departments of Neurology, Psychiatry and Center for Human Genetic Research Massachusetts General Hospital Breakthroughs in Genome

More information

Case Studies in Ecology and Evolution

Case Studies in Ecology and Evolution 2 Genetics of Small Populations: the case of the Laysan Finch In 1903, rabbits were introduced to a tiny island in the Hawaiian archipelago called Laysan Island. That island is only 187 ha in size, in

More information

Mendel. The pea plant was ideal to work with and Mendel s results were so accurate because: 1) Many. Purple versus flowers, yellow versus seeds, etc.

Mendel. The pea plant was ideal to work with and Mendel s results were so accurate because: 1) Many. Purple versus flowers, yellow versus seeds, etc. Mendel A. Mendel: Before Mendel, people believed in the hypothesis. This is analogous to how blue and yellow paints blend to make. Mendel introduced the hypothesis. This deals with discrete units called

More information

Regression-based Linkage Analysis Methods

Regression-based Linkage Analysis Methods Chapter 1 Regression-based Linkage Analysis Methods Tao Wang and Robert C. Elston Department of Epidemiology and Biostatistics Case Western Reserve University, Wolstein Research Building 2103 Cornell Road,

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information

Mendelian Genetics. KEY CONCEPT Mendel s research showed that traits are inherited as discrete units.

Mendelian Genetics. KEY CONCEPT Mendel s research showed that traits are inherited as discrete units. KEY CONCEPT Mendel s research showed that traits are inherited as discrete units. Mendel laid the groundwork for genetics. Traits are distinguishing characteristics that are inherited. Genetics is the

More information
