STATISTICAL ISSUES IN THE DESIGN AND ANALYSIS OF GENE-DISEASE ASSOCIATION STUDIES. Duncan C. Thomas

Revised 5/6/02. To appear in: Human Genome Epidemiology: Scientific Foundation for Using Genetic Information to Improve Health and Prevent Disease. Edited by MJ Khoury, J Little, W Burke.

INTRODUCTION

Genetic epidemiology comprises two broad types of activity that entail the use of biological specimens: gene discovery and gene characterization. Studies of familial aggregation and segregation analysis are aimed at establishing a genetic component to a disease and inferring the mode of inheritance, but do not use any DNA analysis. Once evidence for the existence of one or more major genes has been found, geneticists use linkage analysis to localize these genes by identifying genetic markers at known chromosomal locations which appear to be transmitted within families in a manner that parallels the transmission of the disease. Once a gene has been localized in this manner, association studies can be used either to test hypotheses about possible candidate genes within the region or to further localize the region using linkage disequilibrium. Association studies are also used once a causal gene has been cloned to characterize its age-specific penetrance function and interactions with other factors. Following a brief review of methods for gene discovery to set the stage for a unified approach to gene discovery and gene characterization, the remainder of this chapter focuses on design and analysis issues in these various types of association studies. Throughout, we limit attention to binary disease traits, although many of the design and analysis issues also apply to continuous traits.

Although there is obviously a continuous spectrum of gene effects, we are accustomed to thinking in terms of two general types of genes that are potentially detectable by genetic epidemiologists: major susceptibility genes having a high penetrance, such mutations usually being rare in the general population; and common low penetrance genes, such as those
involved in metabolic activation and detoxification of carcinogens, DNA repair, and other complex pathways involving multiple genes and interactions with environmental agents. It is increasingly being recognized that the classical positional cloning approaches (i.e., linkage analysis) are more effective for discovery of major susceptibility genes than common low penetrance genes, and genome-wide association studies are now being suggested as an approach to detection of the latter type (1).

A variety of study designs are available for these various purposes. Linkage analysis of necessity requires family studies, typically either sib-pair designs (affected sib pairs for a binary disease trait) or extended pedigrees. Association studies, on the other hand, can use either family designs or population-based designs involving unrelated individuals. These various options will be discussed in greater detail below, with the suggestion that it is possible to design efficient population-based family studies that can be used for both linkage and association. We will also discuss the use of collections of high risk families in gene characterization studies and discuss a general population-based framework for discovery and characterization. Finally, we touch briefly on a number of modeling issues and future challenges that are likely to arise in studying complex diseases.

METHODS FOR GENE DISCOVERY

Linkage analysis entails a search across the genome for markers that are associated with the disease within families, i.e., markers at which pairs of affected individuals within a family tend to have the same alleles. Between families, however, different alleles may be associated
with the disease, because the marker allele per se has no causal role in the disease; the marker simply travels with the disease allele from parents to offspring because the two are close together on a chromosome. Thus, it is possible for a marker to be linked to a disease gene but not associated with it. Once one or more markers in a region have been found to be linked, additional nearby markers are then tested in an effort to localize the disease gene more precisely. Efficient multistage testing strategies for such genome scans have been discussed by Brown et al. (2) and Elston et al. (3). Multipoint linkage analyses might use several markers jointly for greater precision in estimating that location than can be obtained from a series of two-point linkage analyses one marker at a time (4).

Linkage analyses can be model-based (parametric or "lod score") or model-free ("nonparametric"). The lod score method is based on the likelihood of the observed marker and disease data in a pedigree under a model for the distribution of the unobserved disease gene. Typically, the parameters of such a model (e.g., the age- and sex-specific penetrance function and disease allele frequency) are assumed to be known from earlier segregation analyses and the likelihood is maximized with respect to the recombination fraction θ (or the location of the disease gene in a multipoint analysis). This approach can be applied to nuclear families or extended pedigrees. By modeling the conditional distribution of markers given disease phenotypes, no assumption about the families having been ascertained in any systematic statistical sampling manner is needed; indeed, heavily loaded families identified through genetic counseling clinics are often used for this purpose and are typically the most informative for linkage analysis, even though they would be in no sense population-based. The lod score method is the most powerful approach if the genetic model is correctly specified, but can lose
power or even produce false evidence of linkage under some kinds of misspecification. In contrast, nonparametric approaches do not require any assumptions about the genetic model and thus are robust to model misspecification, but are generally less powerful than lod score methods; furthermore, the choice of the optimal nonparametric test will still depend upon the presumed mode of inheritance. Nonparametric methods are based on a comparison of the proportion of alleles shared identical by descent (IBD) by pairs of affected relatives against the proportion expected based solely on their relationship (e.g., one quarter of sibling pairs would be expected to share zero, one half to share one, and one quarter to share two alleles IBD). This approach is most commonly applied to affected sib pairs (where possible their parents are also included to aid in the determination of IBD status), although it is possible to include other relative types in other forms of analysis.

The limit of resolution of linkage analysis is generally felt to be not much smaller than about 1 cM (1% recombination, corresponding to roughly one million base pairs (bp)), even with the largest pedigree studies. Thus, other techniques are needed to further localize a disease gene before undertaking massive sequencing in search of mutations. These techniques typically entail use of a very dense panel of markers, perhaps spaced on the order of 10 kb (0.01 cM) apart, which can be used in various ways. The simplest of these is to search one-by-one for markers that appear to be associated with the trait across the population, a phenomenon known as linkage disequilibrium (LD). LD can arise in a number of ways (new mutations, genetic drift, admixture, etc.) but will tend to decay over G generations by a factor of (1 - θ)^G; thus, after many generations from the event that created the LD initially, only very nearby pairs of loci will remain associated.
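
As a rough numerical illustration of this decay, the expected proportion of the initial LD remaining after G generations can be computed directly from (1 - θ)^G. This is a sketch only: the function names and the example values of θ and G are ours and are not taken from any of the cited analyses, and the model ignores drift, selection, and new mutation.

```python
import math


def ld_fraction_remaining(theta, generations):
    """Expected fraction of the initial LD remaining after `generations`
    generations between two loci with recombination fraction `theta`,
    i.e. (1 - theta) ** G."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("number of generations must be non-negative")
    return (1.0 - theta) ** generations


def generations_to_decay(theta, fraction):
    """Number of generations needed for LD to fall to `fraction` of its
    initial value, obtained by solving (1 - theta) ** G = fraction for G."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("recombination fraction must lie in (0, 0.5]")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    return math.log(fraction) / math.log(1.0 - theta)


if __name__ == "__main__":
    # Illustrative values only: loci 1 cM, 0.1 cM and 0.01 cM apart,
    # taking 1 cM as roughly 1% recombination.
    for theta in (0.01, 0.001, 0.0001):
        remaining = ld_fraction_remaining(theta, 100)
        print(f"theta={theta}: {remaining:.3f} of initial LD after 100 generations")
```

Under this idealized model, loci 1 cM apart retain only about 37% of their initial LD after 100 generations, whereas loci 0.01 cM apart retain about 99%, which is why only very closely spaced markers remain useful for LD mapping in old populations.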
The extent of detectable LD in various human populations and its usefulness as a mapping tool is an
active area of research, but in most outbred populations it is generally believed to be very short range, ~100 kb or less. (In recently admixed populations, LD will generally extend over a much wider interval, making them potentially useful for genome scans (5); LD tends to be larger in magnitude and more consistent in population isolates, making them more useful for fine mapping (6).) Even within a region of significant LD, its magnitude will be extremely variable across pairs of nearby loci, owing to chance mutation, recombination, and coalescent events in the ancestral history of the population. Therefore, there is now great interest in using haplotypes (sequences of adjacent marker alleles on a single chromosome) as a tool for fine mapping. A variety of approaches have been proposed for doing this, including searching for segments that are frequently shared by pairs of cases (7-10), association between specific haplotypes and disease (11, 12), or using coalescent methods to model the ancestry and evolution of mutation-carrying haplotypes (13-15).

DESIGNS FOR ASSOCIATION STUDIES

Whether the aim is fine mapping by LD, testing an association with a candidate gene, or characterizing a cloned gene, a number of different study designs could be used, their relative merits depending upon the context. The most important distinction between these designs concerns whether families or unrelated individuals are studied. We therefore begin by discussing the standard population-based epidemiologic case-control and cohort designs using unrelated individuals and then survey a range of family-based designs: case-control, case-parent trios, kin-cohort, and use of heavily loaded pedigrees.

Population-Based Case-Control and Cohort Designs

The majority of disease traits studied in genetic epidemiology are relatively rare, so that case-control designs are natural to consider. The design of such studies is essentially no different for a genetic risk factor than for environmental risk factors and follows well recognized principles discussed in standard epidemiologic textbooks (e.g., (16-18)). Thus, for example, cases should be representative of all cases in the population and controls should be representative of the source population of cases. This is most easily accomplished in situations where a population-based disease registry exists, such as the SEER registries in the U.S., and where there is some means of sampling from the total population. The latter is more difficult in the U.S., although some other countries maintain voter registration or other databases that are available for epidemiologic research. Absent such a register, some imagination may be needed to construct a suitable control selection procedure: techniques such as neighborhood censuses, random digit dialing, sampling from birth registries (for childhood diseases) or Medicare files (for diseases of old age), prepaid health maintenance organization rosters, or hospital controls have been used in various epidemiologic studies and their advantages and disadvantages have been widely discussed. Cases and controls are frequently individually or stratum-matched on potential confounding variables, such as age, gender, race, and possibly other established risk factors. See (19) in this volume for further discussion of some of the issues of validity and efficiency that can arise in such studies.
In some respects, one of the major drawbacks of case-control studies in conventional risk factor epidemiology, namely recall bias due to retrospective collection of exposure information, is not as much a concern in genetic epidemiology, since a subject's constitutional genotype does not vary over time and is not subject to the vagaries of an individual's memory. (Of
course, phenotypic assays of genotype could be distorted by the disease process, and other confounding or modifying factors could be misclassified.) Indeed, Clayton and McKeigue (20) have argued that because the transmission of genes from parents to offspring is random, a gene association study carries the same interpretability in terms of causality as a randomized controlled trial, at least in terms of freedom from bias and residual confounding. Thus such associations would reflect a causal effect of either the variant under study or a nearby one in linkage disequilibrium with it. This "Mendelian randomization" argument is directly applicable to the case-parent trio design discussed below, but they also apply it to ordinary case-control and cohort designs involving unrelated individuals. However, this extension of the principle requires that one address the problem of population stratification using one of the approaches discussed below.

Cohort studies have well recognized advantages and disadvantages (20). For the purpose of genetic epidemiology, few investigators would contemplate initiating a new prospective cohort study for any but the most common diseases, but there are now in excess of a million persons enrolled in various existing cohorts for whom biological specimens have already been obtained and stored. Some of these cohorts have already accrued decades of follow-up time and represent a rich resource for genetic association studies (21). The cost of genotyping everyone in a large cohort would likely be prohibitive, even with recently developed high throughput technologies, but this can be avoided using nested case-control (22) or case-cohort (23) designs. These entail comparison of cases arising in the cohort with a sample of suitably selected controls drawn from the cohort, thereby capitalizing on the inferential advantages of a cohort design at greatly reduced cost. The design of such nested studies is in principle no different when studying a
genetic association than any other risk factor and is discussed in standard textbooks (24) and review articles (25). However, a number of options for efficient sampling are available, such as multistage sampling (26-29) and countermatching (30, 31). Multistage sampling might entail an initial random sampling of cases and controls on whom a surrogate for some risk factor (e.g., family history as a surrogate for genotype) is obtained. Subjects are then subsampled using this information for the more expensive determination of genotype and perhaps other risk factors. Countermatching aims to improve the efficiency of a matched case-control design by increasing the proportion of pairs that are discordant for the risk factor(s) of interest through systematically mismatching them on a surrogate for the factor: for example, in a genetic study, a case with a positive family history might be matched with a family-history-negative control and vice versa. The inherent bias in both these designs is then accounted for by including suitable weights in the analysis.

Whether case-control, cohort, or any of these nested designs are used, any association study based on unrelated individuals potentially suffers from a form of confounding known in the genetics literature as population stratification (32). If the population comprises two or more subgroups with different allele frequencies and different baseline rates of disease, then confounding can occur (see Figure), leading to increased risk of false positive associations, and biasing relative risk estimates upwards or downwards, depending upon the direction of these two associations.

[Figure: confounding by population stratification. Ethnicity is related to the candidate gene through allele frequency and to disease through baseline disease risk.]

If these subgroups were identifiable, standard techniques such as matching or statistical adjustment
could be used to control this problem; indeed, epidemiologic studies are routinely matched or adjusted for race/ethnicity. The difficulty is that even within the broad categories of race/ethnicity that are conventionally used, there can be strong gradients in allele frequencies and baseline risks. Some authors (19, 33-35) have questioned the practical importance of this concern, at least for studies of common polymorphisms in non-Hispanic whites of European descent, arguing that any correlation between baseline rates and allele frequencies that would give rise to confounding tends to disappear as larger numbers of subgroups are combined. Hence, they argue that adherence to standard principles of sound epidemiologic study design should be adequate to address it. However, in multiethnic populations, particularly heavily admixed populations such as African-Americans or Hispanics or those with a high prevalence of multi-racial individuals as in Southern California, individuals can be difficult to classify or match appropriately (36).

A different approach to this problem which is gaining some theoretical attention (but remains to be applied widely), known as genomic control, is based on using a panel of markers unlinked to the gene under study to infer the hidden population structure and adjust for it. One such approach uses the distribution of test statistics for the markers to estimate an inflation factor by which to adjust the naïve chi-square test for the overdispersion caused by population stratification (37, 38). Another approach uses Bayesian clustering or latent class analysis methods to assign cases and controls probabilistically to strata defined by their markers and then perform a stratified analysis within these strata (39, 40).
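
The first of these genomic control approaches can be sketched as follows. This is a minimal illustration rather than the method of any particular cited paper: it assumes a median-based estimate of the inflation factor (one common choice; the description above does not specify the estimator), 1-degree-of-freedom chi-square statistics for each unlinked marker, and made-up input values. The function names are ours.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_marker_stats):
    """Estimate the inflation factor from 1-df chi-square statistics
    computed at markers unlinked to the candidate gene.

    Assumption: median-based estimator, truncated below at 1 so that
    the candidate-gene statistic is never inflated by the adjustment."""
    if not null_marker_stats:
        raise ValueError("need at least one unlinked marker statistic")
    lam = median(null_marker_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)


def adjusted_statistic(candidate_stat, null_marker_stats):
    """Divide the naive chi-square statistic for the candidate gene by
    the estimated inflation factor."""
    return candidate_stat / inflation_factor(null_marker_stats)


if __name__ == "__main__":
    # Hypothetical statistics from five unlinked markers.
    unlinked = [0.2, 0.6, 0.9098728, 1.5, 3.0]
    print(inflation_factor(unlinked))
    print(adjusted_statistic(8.0, unlinked))
```

The adjusted statistic is then referred to the usual chi-square distribution with 1 degree of freedom. In practice a much larger panel of unlinked markers than in this toy example would be needed for the inflation factor to be estimated with any precision.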
Although not the focus of this chapter, all the designs discussed in this section and the family-based designs section which follows can also be used for testing gene-environment and gene-gene interactions (see (19) for more discussion of this topic). Another approach to testing such
interaction effects is known as the case-only or case-case design (41-43). In this approach, a series of unrelated cases is used and the association between genotype and environment (or two genes) is tested. If the two factors were independently distributed in the source population, then any such association in cases would be evidence of departure from a multiplicative model for their joint effects on disease risk. Of course, with this design it is not possible to test the main effects of either factor and careful consideration is needed to judge whether the assumption of independence in the source population is tenable (44), but if it is, the design is more powerful for testing that interaction than a conventional case-control design.

Family-Based Designs

The use of family-member case-control designs is appealing because family members have a common gene pool and hence the problem of population stratification is overcome by matching. The two main variants of this idea involve the use of sibling controls or case-parent trios (also known as "parental controls" or "pseudo-siblings"). In a case-sibling study, affected individuals are the cases and their unaffected siblings the controls, the data being analyzed as a matched case-control study using standard conditional logistic regression methods. (Unaffected cousins could also be used instead of siblings, although the protection against population stratification would no longer be absolute, since they would have only one pair of grandparents in common and the other grandparents might come from different ethnic groups.) In the case-parent-trio design, cases and their parents are genotyped, but the parents themselves are not used as controls; instead, one forms the set of hypothetical "pseudo-siblings" comprising the three other genotypes that could have been transmitted from the parents; the case-pseudosib sets are then
analyzed as a 1:3 matched case-control design using conditional logistic regression (45). A variant of this design, known as the Transmission-Disequilibrium Test (TDT) (46), compares each of the two alleles for the case separately against the other allele not transmitted by that parent, treating them as two independent contributions to a 1:1 matched case-control comparison of alleles rather than a single genotype comparison. The two procedures are mathematically equivalent under a multiplicative model for allele contributions to risk, but would differ under a dominant or recessive model. Both the case-sib and the case-parent-trio designs test an alternative hypothesis of linkage and association, i.e., they will detect associations only with causal genes or with genes that are in linkage disequilibrium with a causal gene.

There are a number of drawbacks to the use of these designs. Because cases and their siblings are more likely to share genotypes (and environmental factors), their comparison tends to be less efficient than using unrelated controls (depending upon the genetic model, about 50% as efficient, meaning that double the sample size would be required to obtain the same statistical precision or power). For gene-environment interactions, however, the use of sib controls can be more efficient (47). In general, sibling controls should have survived disease-free to the age of the case's diagnosis, to rule out the possibility that they might yet become affected by that age (48), effectively limiting the pool of potential controls to older siblings for many cases; this lack of comparability on birth order, however, risks introduction of other biases, particularly if time-dependent environmental variables are to be included in the model or if a substantial proportion of cases need to be excluded for lack of a suitable sib control (47).
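
To make the TDT described above concrete, the following sketch computes the usual McNemar-type statistic from the transmissions of heterozygous parents to affected offspring. It assumes a biallelic marker, one affected offspring per family, and independent transmissions; the function name and the counts in the usage example are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """McNemar-type TDT for a biallelic marker.

    transmitted:   number of heterozygous parents who transmitted the
                   allele of interest to the affected offspring
    untransmitted: number of heterozygous parents who transmitted the
                   other allele instead

    Returns (chi-square statistic on 1 df, two-sided p-value).
    Homozygous parents are uninformative and are not counted."""
    b, c = transmitted, untransmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with at least one transmission")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical data: 60 transmissions vs. 40 non-transmissions.
    chi2, p = tdt(60, 40)
    print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Because each heterozygous parent contributes a matched pair of alleles (one transmitted, one not), the comparison is made entirely within families, which is what protects the test against population stratification.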
There are other subtleties if multiple cases or multiple controls are selected from the same family, since the possible permutations of disease status against genotypes are not equally likely under the null hypothesis
that the gene is not itself causal but is linked to a causal gene (49). Furthermore, the familial relationships amongst cases and controls must not give away who must have been the case: for example, if cousins were used as controls and one were drawn from each side of the family, it would be obvious which was the case because he or she would be the only one who was a blood relative of both of the other two. While some authors have advocated limiting such comparisons by selecting a single case and a single control from each family (for example, taking the pair with the maximally different genotypes (49)), there has been a rapidly developing literature on valid family-based association tests (FBATs) that would exploit all the possible comparisons within a family (50-52).

The case-parent-trio design generally does not suffer from the efficiency loss that the case-sib design does, and indeed can be more powerful than using unrelated controls for a recessive gene (47). However, it does require that both parents be available for genotyping, which makes it difficult to apply to diseases of middle or old age. Although some information is contained in the transmission from a single parent, if that is all that is available, care must be taken to avoid bias, since the subset of transmissions for which parental sources can be inferred unambiguously is not random (53, 54). As with the case-sib design, families with multiple cases do not contribute independent information under the null hypothesis of linkage but no association, so more sophisticated techniques are required (55). Although the trio design cannot be used to test for the main effects of environmental factors, it can in principle test for gene-environment interactions by comparing the genetic relative risks in exposed and unexposed cases (56).
This comparison, however, involves an assumption that genes and environments are independently distributed within families (i.e., conditional on parental genotypes) (43, 57), an assumption that
is similar to that for the case-only design, but somewhat weaker because it applies within families, not between families, and thus would not be influenced by such factors as family history that could potentially induce such an association. See (58, 59) for a log-linear models approach to case-parent-trio data, with particular application to testing maternal genotype effects and imprinting. For example, birth defects could involve a direct or interactive effect of maternal alleles, in which case the deleterious allele would tend to have a higher frequency in mothers than fathers.

A more complex design is the case-control-family study. Here, population-based series of unrelated cases and controls are identified, possibly matched on various factors as in a traditional case-control study, and their family members are also recruited as study participants. While more commonly used for testing familial aggregation (e.g., (60)) or segregation analysis (e.g., (61)) without use of any molecular data, the design can also be used for testing candidate gene associations or characterizing cloned genes (e.g., (62, 63)). Used in this way, it can be seen as an extension of the kin-cohort design discussed below, but its real advantage lies in its population-based nature and ability to serve as the basis for an integrated approach to gene discovery and gene characterization.

The kin-cohort design (64) entails ascertainment of a series of probands unselected with respect to family history and obtaining their genotypes and their family history in first-degree relatives (but relatives' genotypes are not needed in this approach). The probands themselves could be affected or unaffected and need not necessarily be representative of the population (provided they are not biased with respect to family history). For example, in a study of the
penetrance of the BRCA1 and BRCA2 ancestral mutations in Ashkenazi Jews, Struewing et al. (65) enrolled volunteers from Jewish community organizations in the Washington, D.C. area. The cumulative incidence curves in first-degree relatives of carrier and noncarrier probands are then estimated using standard Kaplan-Meier survival analysis methods. Because first-degree relatives of carriers have a roughly 50% probability of carrying the same mutation while first-degree relatives of noncarriers have only half the population probability of being a carrier, it is then possible to decompose the observed cumulative incidence curves into their constituent penetrance curves (cumulative incidence in carrier and noncarrier individuals) by a simple algebraic manipulation.

This design has been extended in a number of ways. The relatively simple analysis described above does not exploit all the information in the sample, so Gail et al. proposed a maximum likelihood analysis, similar to segregation analysis but conditioning on the observed genotypes (66, 67). Using this likelihood, it then becomes straightforward to extend the design to include more distant relatives, as well as measured genotype information on other relatives; they call this general approach the Genotyped Proband Design. Siegmund et al. (68) have considered the question of which members of a pedigree would be the most informative to genotype and concluded that for a common low penetrance dominant gene, genotyping additional relatives per family was more efficient than genotyping a single proband in a larger number of families (for the same total genotyping costs), while for a rare major dominant gene, the reverse was true.

Since in the process of gene discovery, the heavily loaded families typically used are not representative of all cases in the population, it is natural to inquire whether any useful
information about penetrance or modifying factors can be obtained from them at the gene characterization phase. The great advantage of such families, beyond simply the cost efficiency from having already collected the pedigree information and biological specimens, is that they will tend to have a higher prevalence of mutations, particularly for rare high penetrance genes, so that smaller sample sizes should be required. On the other hand, since such families were collected specifically because they had many cases, a naïve analysis that ignored the ascertainment process would greatly overestimate both penetrance and allele frequency, compared to their true values in the general population. While in principle, one might be able to construct a maximum likelihood analysis which conditioned on the ascertainment scheme, in practice such collections seldom can be described in terms of any well defined statistical sampling plan, and even if they could, the analysis of complex sampling schemes would likely be computationally intractable.

Fortunately, an alternative approach is available that theoretically should allow valid estimates of population parameters even from samples that are not population-based. Known as the mod score (69-71) or retrospective likelihood (72) approach, the analysis is based on the conditional likelihood of the measured genotypes given the observed distribution of phenotypes in the family. By conditioning on the phenotypes in this manner, their ascertainment is automatically controlled for, assuming that families were ascertained solely on the basis of their phenotypes, not their genotypes. This is the approach that was used in the initial estimation of BRCA1/2 penetrance from the Breast Cancer Linkage Consortium families (73, 74), which led to an estimate of risk of breast or ovarian cancer by age 70 of 83%. Subsequent estimates based on the kin-cohort and population-based case-control-family designs have been substantially lower (63, 65, 75).
This difference cannot be explained simply as an artifact of ascertainment bias, because the mod score approach was used to analyze the high risk families. However, by
limiting that analysis to the linked families (done to address the problem of genetic heterogeneity, i.e., some families' disease being due to genes other than the one under study), the assumption that families were ascertained solely on the basis of their phenotype was violated; this has been shown to lead to upwardly biased estimates of penetrance (76). Other explanations that have been offered for the discrepancy between the clinic-based and population-based estimates are that the clinic-based families may also be segregating other modifying factors (other genes or environmental factors) leading to truly higher penetrance in such families, or that the penetrance varies by specific mutations, with the more commonly occurring mutations in the general population (e.g., the Ashkenazi founder mutations) having lower penetrance than those occurring in the heavily loaded families.

INTEGRATED DESIGNS FOR DISCOVERY AND CHARACTERIZATION

With this brief tour of approaches to discovering and characterizing genes, we now turn to the question of whether it is possible and efficient to try to design a resource that can be used for both purposes. The experience from the use of heavily loaded clinic-based collections of pedigrees to estimate BRCA1/2 penetrance should be somewhat cautionary about the limitations of relying exclusively on series that are not population-based. On the other hand, they are arguably the most efficient way to assemble pedigrees that are highly informative for linkage analysis. In an attempt to bridge this gap, Zhao et al. (77, 78) have proposed a general framework based on the case-control-family design described above. Since the initial ascertainment of families is population-based, there would be no difficulty in estimating population parameters from such a design. Of course, the yield of rare major genes would be
relatively low, but multistage sampling of probands based on family history (as discussed earlier (28)) could be used to home in on the families most likely to be segregating mutations, and this could be extended following the principles of sequential sampling of pedigrees (79). Here, the basic idea is that at each stage of pedigree extension, one is entitled to use all the phenotype information already collected systematically as well as knowledge of the pedigree structure (but not anecdotal information about phenotypes) in branches not yet explored, to decide whether and in what direction to extend the pedigree; once extended, all the phenotype information obtained must then be included in the analysis, whether additional cases were identified or not. Following these simple rules, Cannings and Thompson show that the likelihood for the pedigree need be conditioned solely on the initial ascertainment of probands, not on all the decisions made subsequently.

Still, for mapping a very rare gene, it is unclear whether this process can yield a sufficient number of highly informative pedigrees, even using the most efficient approaches to multistage sampling and sequential extension of pedigrees, without requiring enrollment of a prohibitive number of probands. For genes with mutations that are not extremely rare, however, there is great merit in this approach, as it will not only provide a basis for mapping genes and then characterizing them in the same sample, but it will also provide a resource for continuing the search for additional genes after some have been discovered. For example, Antoniou et al. (80) and Cui et al. (81), using such approaches, have provided evidence for an additional major gene for breast cancer, possibly a more common recessive gene, after removing the families attributable to BRCA1 and BRCA2. Their approaches differ somewhat, Antoniou et al.
fitting a multilocus model which includes the measured genes and all families in the analysis, Cui et al. 18

excluding the families known to be segregating one of the two measured genes. On the basis of such segregation analysis results, one might then feel confident in launching a further genome scan to localize such a gene, now using the more powerful lod score approaches, which require a population-based estimate of the genetic model. Absent such knowledge, one would be forced to use the affected sib pair approach, first screening all pairs to exclude those that were carrying a known mutation.

It is this general philosophy that underlies the establishment of the NCI's Cooperative Family Registries for Breast and Colorectal Cancer Research (82-85). In order to address the aims of both discovery and characterization, this multi-center resource comprises population-based and clinic-based series of families. The population-based series are ascertained through affected probands from population-based cancer registries, stratified in various ways. Some are unselected consecutive series; others are restricted or sampled by age, race, or family history in first-degree relatives; a few registries have used multistage sampling (29, 86). The clinic-based registries are intended to provide a large series of multiple-case families for gene discovery purposes, but would not be included in analyses aimed at characterization, except perhaps using the mod-score approach.

Whatever the mode of ascertainment, all probands provide a standardized risk factor questionnaire, including extended family history, and blood samples that are being stored for genotyping and creation of cell lines. Participating centers differ in the specifics of their protocols for developing extended pedigrees, but in general as many surviving family members (affected and unaffected) as possible are enrolled as participants, providing the same risk factor information and blood samples, which are also being stored. To date, over 6000 breast and 6000 colorectal cancer families have been enrolled, comprising over 100,000
individuals in each registry. Depending upon the specific scientific aims, these families might be sampled in various ways for genotyping. A variety of studies aimed at using this resource for gene discovery and characterization are currently underway.

MODELS FOR COMPLEX DISEASES

Whether parametric linkage analysis or association analysis is planned, some form of statistical model of penetrance is needed. Amongst the complexities that must be considered are variable age at onset; the role of polygenes, other major genes, and environmental factors, including their possible interactions; residual familial aggregation due to unmeasured factors; and heterogeneity of effect for genes with multiple mutations or polymorphisms. One might also wish to take account of somatic events, such as loss of heterozygosity, genomic instability, DNA methylation, and gene expression data. Genome-wide association studies are also being proposed as a means of gene discovery, perhaps requiring something of the order of a million statistical tests (1), introducing yet another level of statistical complexity. In this brief section, we can only outline a general approach to model building, leaving the details to other papers.

For binary disease traits with variable age at onset, the techniques of survival analysis provide a natural framework for modeling penetrance. Letting $\lambda(t)$ denote the incidence rate of disease at age $t$ (the "hazard function") and $S(t) = \exp\left(-\int_0^t \lambda(u)\,du\right)$ the probability of surviving to age $t$ free of disease (the "survival function"), the likelihood contribution for a case diagnosed at age $t$ is $\lambda(t)\,S(t)$ and the contribution for a subject last seen disease-free at age $t$ is simply $S(t)$. If
we assumed that, conditional on all the measured risk factors, the outcomes of all subjects $i = 1, \ldots, n$ were independent, then the overall likelihood of the data would be simply

$$L = \prod_{i=1}^{n} \lambda_i(t_i)^{d_i}\, S_i(t_i)$$

where $d_i$ is an indicator for affected (1) or not (0). The conditional independence assumption would not pose any difficulty for unrelated individuals (e.g., a population-based case-control study), but is more problematic for family data. If not all family members have been genotyped for a major gene, then a likelihood contribution for the entire family must be constructed by summing over the possible genotypes of all the untyped individuals that are compatible with the available genotype information on other family members. This is essentially a segregation analysis, but conditional on partially measured genotype information (63). Additional familial dependencies might be caused by other as yet unidentified genes, by unmeasured environmental factors, or by correlated measurement errors in measured risk factors. Such dependencies might be taken into account using regressive models (87), latent variable approaches like frailty models (88, 89), or marginal models using Generalized Estimating Equations methods (90).

By whatever means the likelihood is constructed, a model is needed for the hazard function in relation to the various measured risk factors, genetic and environmental. One possibility is the proportional hazards model (91), which might be written as

$$\lambda(t, G, Z) = \lambda_0(t) \exp(\beta_G + Z'\gamma + \cdots)$$
where $G$ represents the major gene(s), $\beta_G$ the log relative risk associated with genotype $G$, $Z$ the measured environmental covariates, $\lambda_0(t)$ an unspecified function representing the baseline risk as a function only of age, and $\cdots$ indicates the possibility of adding further interaction terms (e.g., gene-environment, gene-gene, gene-age, etc.). However, a number of major genes such as BRCA1 seem to have much stronger effects at younger ages on a relative risk scale. While this could be addressed by adding age × genotype interaction terms, it might be preferable to reformulate the model as

$$\lambda(t, G, Z) = \lambda_G(t) \exp(Z'\gamma + \cdots)$$

i.e., with separate age-specific baseline rates for each genotype, but still assuming that environmental factors act multiplicatively on these baseline rates, unless specific interaction terms are added to the model. In either of these approaches, the form of the baseline rates might be left completely unspecified, as in the Cox partial likelihood approach (92), or some parametric form could be adopted; for example, the S.A.G.E. package assumes a logistic distribution for the ages at onset amongst the affected, coupled with a logistic model for the lifetime risk of disease, either of which could depend upon genotype and/or covariates, as in an application to smoking-gene interactions for lung cancer (93).

Other mathematical models might also be considered for the joint effects of age and genotype, such as an additive model of the form $\lambda(t, G) = \lambda_0(t) + \beta_G$ or an accelerated failure time model of the form $S(t, G) = S_0(t\,e^{\beta_G})$. For example, Peto and Mack (94) have suggested that the rate of breast cancer in co-twins of affected twins or of second cancer in the contralateral breast is virtually constant as a function of
age or time since diagnosis of the first, suggesting that an additive model for genetic effects might be appropriate.

The coding of $\beta_G$ would depend upon what is assumed about dominance. For a dominant gene, with wild type allele $a$ and mutant allele $A$, one would set $\beta_{aa} = 0$ and constrain $\beta_{aA} = \beta_{AA}$; likewise, for a recessive gene, one would set $\beta_{aa} = \beta_{aA} = 0$; for a codominant gene, one would estimate both $\beta_{aA}$ and $\beta_{AA}$. For a multiallelic gene, one might have many more parameters to estimate. Most analyses of BRCA1 penetrance have treated all mutations as equivalent, but there is some evidence that different mutations confer different risks of breast vs. ovarian cancer (95), and it remains an open question whether certain common polymorphisms in the gene also have an effect on penetrance (96). For genes like BRCA1 with hundreds of rare mutations, the prospects of ever having direct estimates of penetrance for any one of them are virtually nonexistent, so some kind of modeling approach is needed to test for systematic influences of broad classes of mutations (truncating or not, by location, etc.) as well as random between-mutation heterogeneity in effect. Hierarchical models (97) provide a natural framework for addressing such questions.

Bayesian approaches to smoothing the effects of many haplotypes (sequences of alleles on a single chromosome) within a gene have also been suggested (12). This entails the use of a multi-level model, in which the first level would be a conventional logistic model for disease as a function of a set of relative risks for all possible haplotypes, and the second level would be a model for the prior means and covariances of haplotype relative risks in terms of their structural similarities to each other.
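As a minimal sketch of how the dominance coding of the genotype effects feeds into the survival likelihood given earlier in this section, the following Python fragment evaluates the log of the product over subjects of the hazard (raised to the affection indicator) times the survival function. It assumes, purely for simplicity, a constant baseline hazard (so that the log survival function is minus the rate times age) and independent subjects; the function names, the 0/1/2 genotype coding, and the numerical values in the usage note are illustrative assumptions, not part of the methods cited above.

```python
import math

# Genotypes are coded by the number of copies of the mutant allele A:
# 0 = aa (wild type), 1 = aA, 2 = AA.  This coding is an assumption of the sketch.

def beta_g(genotype, model, beta_het=0.0, beta_hom=0.0):
    """Log relative risk for a genotype under an assumed dominance model."""
    if genotype == 0:
        return 0.0                          # beta_aa = 0 is the reference
    if model == "dominant":                 # beta_aA constrained equal to beta_AA
        return beta_hom
    if model == "recessive":                # beta_aA = 0, only AA is at raised risk
        return beta_hom if genotype == 2 else 0.0
    if model == "codominant":               # beta_aA and beta_AA both estimated
        return beta_het if genotype == 1 else beta_hom
    raise ValueError(f"unknown model: {model}")

def log_likelihood(subjects, lambda0, model, beta_het=0.0, beta_hom=0.0):
    """Log-likelihood for independent subjects with a constant baseline hazard.

    subjects: iterable of (age, affected, genotype), where age is the age at
    diagnosis (affected = 1) or the age last seen disease-free (affected = 0).
    """
    total = 0.0
    for age, affected, genotype in subjects:
        rate = lambda0 * math.exp(beta_g(genotype, model, beta_het, beta_hom))
        # d_i * log(lambda_i) + log(S_i), with log S_i = -rate * age here
        total += affected * math.log(rate) - rate * age
    return total
```

For instance, `log_likelihood([(45, 1, 1), (70, 0, 0)], lambda0=0.002, model="dominant", beta_hom=2.0)` evaluates the contribution of one affected carrier and one unaffected non-carrier. In practice the baseline hazard would depend on age, and family data would require summing over untyped genotypes rather than treating subjects as independent.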

Increasingly, gene characterization efforts have been directed towards trying to understand complex pathways involving multiple genes and multiple exposures jointly, particularly for common polymorphisms in low-penetrance metabolic genes. For example, hypothesized causes of colorectal polyps and cancer include polycyclic aromatic hydrocarbons (PAHs) and heterocyclic amines (HCAs), which derive from tobacco smoke and well-done red meat (98). The metabolic activation and detoxification of these compounds are regulated by a number of genes, including several cytochrome P-450 enzymes (such as Cyp1A1 and Cyp1A2), various glutathione S-transferases (such as GSTm3), N-acetyltransferases (NAT1 and NAT2), and microsomal epoxide hydrolase (mEH, also known as EPHX1) (99). The complexity of these pathways makes it difficult to examine the effects of these exposures or these genes one at a time, or even in pairwise interactions, without allowing for the influence of the other factors, but the problems of sparse data and multiple comparisons preclude standard approaches based on multi-way stratification.

Cortessis and Thomas (100) have proposed a Bayesian approach to such problems using physiologically based pharmacokinetic (PBPK) models. In essence, the approach entails estimating the concentrations of the various intermediate metabolites for each subject, as a function of the measured exposures and a set of unmeasured metabolic rates, which are in turn determined by the subject's genotypes at the relevant loci, and relating the estimated concentrations of the relevant metabolites to the disease risk. The distributions of the various individual parameters are determined by a set of population parameters that are the primary object of inference, e.g., regression coefficients for the contributions of exposures to pathways or of pathways to disease, means and variances of metabolic rates as a function of genotype, etc.
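The layered structure described above (genotype determines metabolic rates, rates and exposure determine a metabolite concentration, and the concentration determines risk) can be illustrated with a deliberately simplified toy. This is not the model of Cortessis and Thomas: the single-compartment steady-state assumption, the "slow"/"fast" genotype labels, the rate constants, and the logistic coefficients are all hypothetical, and the rates are fixed here rather than drawn from genotype-specific distributions as a Bayesian analysis would require.

```python
import math

# Hypothetical genotype-specific rate constants (arbitrary units).
ACTIVATION_RATE = {"slow": 0.5, "fast": 2.0}   # stands in for an activating enzyme
DETOX_RATE = {"slow": 0.5, "fast": 2.0}        # stands in for a detoxifying enzyme

def metabolite_concentration(exposure, activation_genotype, detox_genotype):
    """Steady-state level of an activated metabolite in a one-compartment toy.

    At steady state, formation (k_act * exposure) balances removal
    (k_detox * concentration), so concentration = k_act * exposure / k_detox.
    """
    k_act = ACTIVATION_RATE[activation_genotype]
    k_detox = DETOX_RATE[detox_genotype]
    return k_act * exposure / k_detox

def disease_risk(concentration, intercept=-3.0, slope=0.5):
    """Logistic model relating the (unobserved) metabolite level to disease risk."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * concentration)))
```

Under these assumed values, a subject with unit exposure who activates quickly and detoxifies slowly has a metabolite level of 4.0, against 0.25 for a slow activator and fast detoxifier, so `disease_risk` is higher for the former even though the measured exposure is identical. In a full analysis the concentrations would be latent quantities and the population-level parameters would be estimated jointly from exposure, genotype, and disease data.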

SUMMARY

Both population-based and family-based designs have their uses in testing candidate gene associations and characterizing genes once their causal connection to a disease has been established. Appropriately designed, such studies can also be a useful resource for discovering other genes that may be involved. Nonmendelian disorders may involve a complex interplay between multiple genes and multiple environmental factors, as well as age and other time-dependent factors, requiring sophisticated methods of analysis. While survival analysis techniques, such as Cox regression, can provide a flexible framework for empirical modeling of penetrance functions, mechanistic models such as PBPK models for complex metabolic networks can also be useful. Stochastic models of carcinogenesis, which have long been used to describe exposure-time-response relationships for environmental exposures, might usefully be extended to incorporate the influence of germline mutations or such epigenetic phenomena as microsatellite instability and DNA methylation.

REFERENCES

1. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:
2. Brown D, Gorin M, Weeks D. Efficient strategies for genomic searching using the affected-pedigree-member method of linkage analysis. Am J Hum Genet 1994;54:
3. Elston R, Guo X, Williams L. Two-stage global search designs for linkage analysis using pairs of affected relatives. Genet Epidemiol 1996;13:
4. Kruglyak L, Lander E. Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am J Hum Genet 1995;57:
5. Stephens J, Briscoe D, O'Brien S. Mapping by admixture linkage disequilibrium in human populations: limits and guidelines. Am J Hum Genet 1994;55:
6. Jorde L. Linkage disequilibrium as a gene-mapping tool. Am J Hum Genet 1995;56:
7. Houwen R, Baharloo S, Blankenship K, et al. Genome screening by searching for shared segments: mapping a gene for benign recurrent intrahepatic cholestasis. Nature Genet 1994;8:
8. Qian D, Thomas D. Genome scan of complex traits by haplotype sharing correlation. Genet Epidemiol 2001;21:S582-S
9. Te Meerman G, Van Der Meulen M. Genomic sharing surrounding alleles identical by descent: effects of genetic drift and population growth. Genet Epidemiol 1997;14:
10. Bourgain C, Genin E, Holopainen P, et al. Use of closely related affected individuals for the genetic study of complex diseases in founder populations. Am J Hum Genet 2001;68:
11. Chiano M, Clayton D. Fine genetic mapping using haplotype analysis and the missing data problem. Ann Hum Genet 1998;62:
12. Thomas D, Morrison J, Clayton D. Bayes estimates of haplotype effects. Genet Epidemiol 2001;21 (Suppl 1):S712-S
13. McPeek M, Strahs A. Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine-scale genetic mapping. Am J Hum Genet 1999;65:
14. Morris A, Whittaker J, Balding D. Bayesian fine-scale mapping of disease loci, by hidden Markov models. Am J Hum Genet 2000;67:
15. Niu T, Qin ZS, Xu X, Liu JS. Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet 2002;70:
16. Breslow NE, Day NE. Statistical methods in cancer research: I. The analysis of case-control studies. Lyon: IARC Scientific Publications,
17. Rothman KJ, Greenland S. Modern epidemiology. Philadelphia: Lippincott-Raven,
18. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic research: Principles and quantitative methods. Belmont, CA: Lifetime Learning Publications,
19. Garcia-Closas M, Wacholder S, Caporaso N, Rothman N. Inference issues in epidemiological studies of genetic effects and gene-environment interactions. In: Khoury MJ, Little J, Burke W, eds. Human Genome Epidemiology: Scientific basis for using genetic information to improve health and prevent disease, 2002:Chapter 7.
20. Clayton DG, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet 2001;358:
21. Langholz B, Rothman N, Wacholder S, Thomas D. Cohort studies for characterizing measured genes. Monogr Natl Cancer Inst 1999;26:
22. Mantel N. Synthetic retrospective studies and related topics. Biometrics 1973;29:
23. Prentice R. A case-cohort design for epidemiologic studies and disease prevention trials. Biometrika 1986;73:
24. Breslow NE, Day NE. Statistical methods in cancer research. II. The design and analysis of cohort studies. Lyon: IARC Scientific Publications,
25. Thomas DC. New approaches to the analysis of cohort studies. Epidemiol Rev 1998;14:
26. White J. A two stage design for the study of the relationship between a rare exposure and a rare disease. Am J Epidemiol 1982;1982:
27. Breslow N, Cain K. Logistic regression for two-stage case-control data. Biometrika 1988;75:
28. Whittemore A, Halpern J. Multi-stage sampling in genetic epidemiology. Stat Med 1997;16:
29. Siegmund K, Whittemore A, Thomas D. Multistage sampling for disease family registries. Monogr Natl Cancer Inst 1999;26:
30. Langholz B, Borgan O. Counter-matching: a stratified nested case-control sampling method. Biometrika 1995;82:
31. Andrieu N, Goldstein A, Langholz B, Thomas D. Counter-matching in gene-environment interaction studies: efficiency and feasibility. Am J Epidemiol 2001;153:
32. Lander ES, Schork NJ. Genetic dissection of complex traits. Science 1994;265:
33. Caporaso N, Rothman N, Wacholder S. Case-control studies of common alleles and environmental factors. Monogr Natl Cancer Inst 1999;26:
34. Wacholder S, Rothman N, Caporaso N. Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. JNCI 2000;92:
35. Wacholder S, Rothman N, Caporaso N. Counterpoint: Bias from population stratification is not a major threat to the validity of conclusions from epidemiologic studies of common polymorphisms and cancer. Cancer Epidemiol Biomarkers Prev 2002;11:in press.
36. Thomas D, Witte J. Population stratification: A problem for case-control studies of candidate gene associations? Cancer Epidemiol Biomarkers Prev 2001:Under review.
37. Devlin B, Roeder K. Genomic control for association studies. Biometrics 1999;55:
38. Reich DE, Goldstein DB. Detecting association in a case-control study while correcting for population stratification. Genet Epidemiol 2001;20:
39. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet 2000;67:
40. Satten GA, Flanders WD, Yang Q. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet 2001;68:
41. Umbach D, Weinberg C. Designing and analysing case-control studies to exploit independence of genotype and exposure. Stat Med 1997;16:
42. Khoury M, Flanders W. Nontraditional epidemiologic approaches in the analysis of gene-

Introduction to linkage and family based designs to study the genetic epidemiology of complex traits. Harold Snieder

Introduction to linkage and family based designs to study the genetic epidemiology of complex traits. Harold Snieder Introduction to linkage and family based designs to study the genetic epidemiology of complex traits Harold Snieder Overview of presentation Designs: population vs. family based Mendelian vs. complex diseases/traits

More information

Flexible Matching in Case-Control Studies of Gene-Environment Interactions

Flexible Matching in Case-Control Studies of Gene-Environment Interactions American Journal of Epidemiology Copyright 2004 by the Johns Hopkins Bloomberg School of Public Health All rights reserved Vol. 59, No. Printed in U.S.A. DOI: 0.093/aje/kwg250 ORIGINAL CONTRIBUTIONS Flexible

More information

Dan Koller, Ph.D. Medical and Molecular Genetics

Dan Koller, Ph.D. Medical and Molecular Genetics Design of Genetic Studies Dan Koller, Ph.D. Research Assistant Professor Medical and Molecular Genetics Genetics and Medicine Over the past decade, advances from genetics have permeated medicine Identification

More information

Nonparametric Linkage Analysis. Nonparametric Linkage Analysis

Nonparametric Linkage Analysis. Nonparametric Linkage Analysis Limitations of Parametric Linkage Analysis We previously discued parametric linkage analysis Genetic model for the disease must be specified: allele frequency parameters and penetrance parameters Lod scores

More information

BST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis

BST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis BST227 Introduction to Statistical Genetics Lecture 4: Introduction to linkage and association analysis 1 Housekeeping Homework #1 due today Homework #2 posted (due Monday) Lab at 5:30PM today (FXB G13)

More information

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK CHAPTER 6 DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK Genetic research aimed at the identification of new breast cancer susceptibility genes is at an interesting crossroad. On the one hand, the existence

More information

Non-parametric methods for linkage analysis

Non-parametric methods for linkage analysis BIOSTT516 Statistical Methods in Genetic Epidemiology utumn 005 Non-parametric methods for linkage analysis To this point, we have discussed model-based linkage analyses. These require one to specify a

More information

Chapter 2. Linkage Analysis. JenniferH.BarrettandM.DawnTeare. Abstract. 1. Introduction

Chapter 2. Linkage Analysis. JenniferH.BarrettandM.DawnTeare. Abstract. 1. Introduction Chapter 2 Linkage Analysis JenniferH.BarrettandM.DawnTeare Abstract Linkage analysis is used to map genetic loci using observations on relatives. It can be applied to both major gene disorders (parametric

More information

Complex Multifactorial Genetic Diseases

Complex Multifactorial Genetic Diseases Complex Multifactorial Genetic Diseases Nicola J Camp, University of Utah, Utah, USA Aruna Bansal, University of Utah, Utah, USA Secondary article Article Contents. Introduction. Continuous Variation.

More information

breast cancer; relative risk; risk factor; standard deviation; strength of association

breast cancer; relative risk; risk factor; standard deviation; strength of association American Journal of Epidemiology The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail:

More information

Systems of Mating: Systems of Mating:

Systems of Mating: Systems of Mating: 8/29/2 Systems of Mating: the rules by which pairs of gametes are chosen from the local gene pool to be united in a zygote with respect to a particular locus or genetic system. Systems of Mating: A deme

More information

Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004

Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004 Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004 Correspondence to: Daniel J. Schaid, Ph.D., Harwick 775, Division of Biostatistics Mayo Clinic/Foundation,

More information

CS2220 Introduction to Computational Biology

CS2220 Introduction to Computational Biology CS2220 Introduction to Computational Biology WEEK 8: GENOME-WIDE ASSOCIATION STUDIES (GWAS) 1 Dr. Mengling FENG Institute for Infocomm Research Massachusetts Institute of Technology mfeng@mit.edu PLANS

More information

A Comparison of Sample Size and Power in Case-Only Association Studies of Gene-Environment Interaction

A Comparison of Sample Size and Power in Case-Only Association Studies of Gene-Environment Interaction American Journal of Epidemiology ª The Author 2010. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. This is an Open Access article distributed under

More information

Supplementary Figure 1. Principal components analysis of European ancestry in the African American, Native Hawaiian and Latino populations.

Supplementary Figure 1. Principal components analysis of European ancestry in the African American, Native Hawaiian and Latino populations. Supplementary Figure. Principal components analysis of European ancestry in the African American, Native Hawaiian and Latino populations. a Eigenvector 2.5..5.5. African Americans European Americans e

More information

ELIMINATING BIAS IN CANCER RISK ESTIMATES A SIMULATION STUDY

ELIMINATING BIAS IN CANCER RISK ESTIMATES A SIMULATION STUDY ELIMINATING BIAS IN CANCER RISK ESTIMATES A SIMULATION STUDY by SARADHA RAJAMANI A Project submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of

More information

Human population sub-structure and genetic association studies

Human population sub-structure and genetic association studies Human population sub-structure and genetic association studies Stephanie A. Santorico, Ph.D. Department of Mathematical & Statistical Sciences Stephanie.Santorico@ucdenver.edu Global Similarity Map from

More information

Mendelian Randomization

Mendelian Randomization Mendelian Randomization Drawback with observational studies Risk factor X Y Outcome Risk factor X? Y Outcome C (Unobserved) Confounders The power of genetics Intermediate phenotype (risk factor) Genetic

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Genetics and Genomics in Medicine Chapter 8 Questions

Genetics and Genomics in Medicine Chapter 8 Questions Genetics and Genomics in Medicine Chapter 8 Questions Linkage Analysis Question Question 8.1 Affected members of the pedigree above have an autosomal dominant disorder, and cytogenetic analyses using conventional

More information

Alzheimer Disease and Complex Segregation Analysis p.1/29

Alzheimer Disease and Complex Segregation Analysis p.1/29 Alzheimer Disease and Complex Segregation Analysis Amanda Halladay Dalhousie University Alzheimer Disease and Complex Segregation Analysis p.1/29 Outline Background Information on Alzheimer Disease Alzheimer

More information

Ascertainment Through Family History of Disease Often Decreases the Power of Family-based Association Studies

Ascertainment Through Family History of Disease Often Decreases the Power of Family-based Association Studies Behav Genet (2007) 37:631 636 DOI 17/s10519-007-9149-0 ORIGINAL PAPER Ascertainment Through Family History of Disease Often Decreases the Power of Family-based Association Studies Manuel A. R. Ferreira

More information

MULTIFACTORIAL DISEASES. MG L-10 July 7 th 2014

MULTIFACTORIAL DISEASES. MG L-10 July 7 th 2014 MULTIFACTORIAL DISEASES MG L-10 July 7 th 2014 Genetic Diseases Unifactorial Chromosomal Multifactorial AD Numerical AR Structural X-linked Microdeletions Mitochondrial Spectrum of Alterations in DNA Sequence

More information

Selection Bias in the Assessment of Gene-Environment Interaction in Case-Control Studies

Selection Bias in the Assessment of Gene-Environment Interaction in Case-Control Studies American Journal of Epidemiology Copyright 2003 by the Johns Hopkins Bloomberg School of Public Health All rights reserved Vol. 158, No. 3 Printed in U.S.A. DOI: 10.1093/aje/kwg147 Selection Bias in the

More information

Transmission Disequilibrium Test in GWAS

Transmission Disequilibrium Test in GWAS Department of Computer Science Brown University, Providence sorin@cs.brown.edu November 10, 2010 Outline 1 Outline 2 3 4 The transmission/disequilibrium test (TDT) was intro- duced several years ago by

More information

Tutorial on Genome-Wide Association Studies

Tutorial on Genome-Wide Association Studies Tutorial on Genome-Wide Association Studies Assistant Professor Institute for Computational Biology Department of Epidemiology and Biostatistics Case Western Reserve University Acknowledgements Dana Crawford

More information

Developing and evaluating polygenic risk prediction models for stratified disease prevention

Developing and evaluating polygenic risk prediction models for stratified disease prevention Developing and evaluating polygenic risk prediction models for stratified disease prevention Nilanjan Chatterjee 1 3, Jianxin Shi 3 and Montserrat García-Closas 3 Abstract Knowledge of genetics and its

More information

An Introduction to Quantitative Genetics I. Heather A Lawson Advanced Genetics Spring2018

An Introduction to Quantitative Genetics I. Heather A Lawson Advanced Genetics Spring2018 An Introduction to Quantitative Genetics I Heather A Lawson Advanced Genetics Spring2018 Outline What is Quantitative Genetics? Genotypic Values and Genetic Effects Heritability Linkage Disequilibrium

More information

New Enhancements: GWAS Workflows with SVS

New Enhancements: GWAS Workflows with SVS New Enhancements: GWAS Workflows with SVS August 9 th, 2017 Gabe Rudy VP Product & Engineering 20 most promising Biotech Technology Providers Top 10 Analytics Solution Providers Hype Cycle for Life sciences

More information

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Multifactorial Inheritance. Prof. Dr. Nedime Serakinci

Multifactorial Inheritance. Prof. Dr. Nedime Serakinci Multifactorial Inheritance Prof. Dr. Nedime Serakinci GENETICS I. Importance of genetics. Genetic terminology. I. Mendelian Genetics, Mendel s Laws (Law of Segregation, Law of Independent Assortment).

More information

Allowing for Missing Parents in Genetic Studies of Case-Parent Triads

Allowing for Missing Parents in Genetic Studies of Case-Parent Triads Am. J. Hum. Genet. 64:1186 1193, 1999 Allowing for Missing Parents in Genetic Studies of Case-Parent Triads C. R. Weinberg National Institute of Environmental Health Sciences, Research Triangle Park, NC

More information

Nature Genetics: doi: /ng Supplementary Figure 1
