Lecture 6
Practice of Linkage Analysis
Jurg Ott
http://lab.rockefeller.edu/ott/
http://www.jurgott.org/pekingu/

The LINKAGE Programs
http://www.jurgott.org/linkage/linkagepc.html
Input: pedfile

    Fam  ID  fa  mo  sex  dis  marker
    AUX   1   6   7    1    1   1 2
    AUX   2   8   9    2    1   1 2
    AUX   3   1   2    2    2   1 1
    AUX   4   0   0    1    0   0 0
    AUX   5   0   0    2    0   0 0
    AUX   6   0   0    1    0   0 0
    AUX   7   4   5    2    0   0 0
    AUX   8   4   5    1    0   0 0
    AUX   9   0   0    2    0   0 0
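The pedfile columns above can be read with a few lines of Python. This is only a minimal sketch: the function names `parse_pedfile` and `founders` are made up for illustration and are not part of the LINKAGE package, and the column meaning (family, ID, father, mother, sex, disease status, two marker alleles) is taken from the slide header.

```python
# The nine pedfile records shown on the slide (pre-makeped format).
PEDFILE = """\
AUX 1 6 7 1 1 1 2
AUX 2 8 9 2 1 1 2
AUX 3 1 2 2 2 1 1
AUX 4 0 0 1 0 0 0
AUX 5 0 0 2 0 0 0
AUX 6 0 0 1 0 0 0
AUX 7 4 5 2 0 0 0
AUX 8 4 5 1 0 0 0
AUX 9 0 0 2 0 0 0
"""


def parse_pedfile(text):
    """Turn whitespace-separated pedfile lines into a list of dicts."""
    people = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ind, father, mother, sex, status = (int(x) for x in fields[1:6])
        people.append({
            "family": fields[0],
            "id": ind,
            "father": father,
            "mother": mother,
            "sex": sex,          # 1 = male, 2 = female
            "status": status,    # 0 = unknown, 1 = unaffected, 2 = affected
            "alleles": (int(fields[6]), int(fields[7])),
        })
    return people


def founders(people):
    """Individuals whose parents are both coded 0."""
    return [p["id"] for p in people if p["father"] == 0 and p["mother"] == 0]


if __name__ == "__main__":
    pedigree = parse_pedfile(PEDFILE)
    print(len(pedigree), "individuals; founders:", founders(pedigree))
```

Individuals 7 and 8 share parents 4 and 5, so the parents of individual 3 (individuals 1 and 2) are first cousins, which is why this pedigree needs a loop breaker later on.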

The LINKAGE Programs
Input: datafile (Preplink program)

    2 0 0 5 << No LOCI, RISK L, SEXL IF 1, PROGRAM
    0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ IF 1
    1 2 << Locus order
    1 2 << AFFECTION, NO ALLELES
    0.999 0.001 << GENE FREQUENCIES
    1 << LIAB CLASSES
    0 0 0.95 << PENETRANCES, recessive
    3 2 << ALLELE NUMBERS, NO. OF ALLELES
    0.1 0.9 << GENE FREQUENCIES
    0 0 << SEX DIFF, INTERFERENCE
    0.0001 << RECOMBINATION VALUES
    1 0.49 0.41 << REC VARIED, INCREMENT, FINISHING VALUE

The LINKAGE Programs
Makeped program: dis.pre -> dis.ped
o Insert pointers into data: next sib, first offspring
o Add loop breaker
o Add proband; automatic is easiest.
o risk

The LINKAGE Programs
Preprocessing: the unknown program
o Input: dis.dat
o Input: dis.ped
o Output: datafile.dat
o Output: pedfile.dat
o Output: ipedfile.dat
o Output: speedfile.dat
Run mlink or linkmap or ilink

The LINKAGE Programs
The LCP shell

Programs and Platforms
o LINKAGE programs on Linux (Fastlink)
o LINKAGE on Windows: to run 16-bit programs like makeped and LCP, install XP mode (Windows 7 Professional).
o EasyLINKAGE: http://nephrologie.uniklinikum-leipzig.de/nephrologie.site,postext,easylinkage,a_id,797.html
o Merlin (Abecasis): can handle large numbers of correlated SNPs (results depend on the r² threshold)
o Allegro (no longer available)
o Mendel (Ken Lange, Mike Boehnke, Dan Weeks)

Discussion points: pedigree size, number of markers

SNPs Genome wide
Write a shell program (Pascal, C, Fortran, Java, Perl) to analyze one SNP at a time: a program that calls mlink as a function to compute the likelihood. Transmit the resulting likelihood via a call from mlink, or inspect the output file produced by mlink.

    LINKAGE (V5.1) WITH 2-POINT AUTOSOMAL DATA
    ORDER OF LOCI: 1 2
    THETAS 0.500
    PEDIGREE    LN LIKE      LOG 10 LIKE
    1          -12.186130    -5.292369
    TOTALS     -12.186130    -5.292369
    -2 LN(LIKE) = 2.43722604246926E+001 LOD SCORE = 0.000000
    THETAS 0.000
    PEDIGREE    LN LIKE      LOG 10 LIKE
    1          -12.087855    -5.249689
    TOTALS     -12.087855    -5.249689
    -2 LN(LIKE) = 2.41757099356111E+001 LOD SCORE = 0.042680

Quality Control (QC)
The unknown program (and mlink) will catch mendelian inconsistencies, but other QC should be done prior to any calculations.
o Delete SNPs with call rates < 0.95 (error prone).
o Based on >10,000 (better >100,000) SNPs, estimate the mean number of alleles shared (PI_HAT) for all pairs of individuals. Siblings: pi = 0.5. Duplicate samples and MZ twins: pi = 1.
o Based on X-chromosomal markers, estimate sex: males should be hemizygous (never heterozygous) at all markers except those in the pseudoautosomal regions.
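Inspecting the mlink output file, as suggested above, can be sketched in Python. The sketch assumes the output looks like the listing on the slide: a `THETAS` line followed later by a `LOD SCORE =` line. The function name `parse_lod_scores` is hypothetical, and only the first recombination fraction on each `THETAS` line is read, which is enough for 2-point analysis.

```python
import re

# Sample text copied from the mlink listing on the slide.
SAMPLE_OUTPUT = """\
LINKAGE (V5.1) WITH 2-POINT AUTOSOMAL DATA
ORDER OF LOCI: 1 2
THETAS 0.500
PEDIGREE LN LIKE LOG 10 LIKE
1 -12.186130 -5.292369
TOTALS -12.186130 -5.292369
-2 LN(LIKE) = 2.43722604246926E+001 LOD SCORE = 0.000000
THETAS 0.000
PEDIGREE LN LIKE LOG 10 LIKE
1 -12.087855 -5.249689
TOTALS -12.087855 -5.249689
-2 LN(LIKE) = 2.41757099356111E+001 LOD SCORE = 0.042680
"""

THETA_RE = re.compile(r"THETAS\s+([0-9.]+)")
LOD_RE = re.compile(r"LOD SCORE\s*=\s*(-?[0-9.]+)")


def parse_lod_scores(text):
    """Return a list of (theta, lod score) pairs in file order."""
    results = []
    theta = None
    for line in text.splitlines():
        match = THETA_RE.search(line)
        if match:
            theta = float(match.group(1))
            continue
        match = LOD_RE.search(line)
        if match and theta is not None:
            results.append((theta, float(match.group(1))))
            theta = None  # wait for the next THETAS line
    return results


if __name__ == "__main__":
    for theta, lod in parse_lod_scores(SAMPLE_OUTPUT):
        print(f"theta = {theta:.3f}  lod = {lod:.6f}")
```

In a genome-wide loop, the same function would be applied to the output written for each SNP in turn. The lod score at the second theta is the difference of the two log10 likelihoods, -5.249689 - (-5.292369) = 0.042680.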

SNPs and Nuisance Parameters
o Recombination fractions are best estimated in mlink: find the max. lod score at θ = 0.001, 0.05, 0.10, 0.15, ..., 0.45.
o Other parameters such as allele frequencies (not for disease!) can be estimated iteratively by the ilink program.
o Haplotype frequencies must include all loci, including disease: estimation not suitable except in specialized programs.
o Penetrances may be estimated for all genotypes at the disease locus. For dominant disease, two penetrances cannot be handled by ilink.

    Genotype  Penetrance
    dd        f1 = 0.02
    Dd        f2 = 0.90
    DD        f2 = 0.90

o Solution to (1) estimate them and (2) handle such parameters as nuisance parameters:

    $Z(\hat{\theta}) = \log[L(\hat{\theta}, \hat{f}_1, \hat{f}_2)] - \log[L(0.5, \tilde{f}_1, \tilde{f}_2)]$

  Hats mark estimates from maximizing over all parameters; tildes mark estimates of the penetrances with θ fixed at 0.5.
o Rare dom: may set f(DD) = 0 (ilink ok).

Two Locus Inheritance
Ming & Muenke (2002) Am J Hum Genet 71, 1017 (review)
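The grid search over θ described on the nuisance-parameter slide can be illustrated with a toy likelihood. In practice mlink supplies the likelihood; here, purely as an assumption for illustration, the data are n phase-known, fully informative meioses with k recombinants, so that L(θ) = θ^k (1 - θ)^(n - k). The names `lod` and `max_lod_on_grid` are hypothetical.

```python
import math


def lod(theta, recombinants, meioses):
    """Lod score log10[L(theta) / L(0.5)] for the toy phase-known model."""
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Grid from the slide: 0.001, then 0.05 to 0.45 in steps of 0.05.
THETA_GRID = [0.001] + [round(0.05 * i, 2) for i in range(1, 10)]


def max_lod_on_grid(recombinants, meioses, grid=THETA_GRID):
    """Return the (theta, lod) pair with the highest lod score on the grid."""
    pairs = [(t, lod(t, recombinants, meioses)) for t in grid]
    return max(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    best_theta, best_lod = max_lod_on_grid(recombinants=1, meioses=10)
    print(f"max lod {best_lod:.3f} at theta = {best_theta}")
```

The grid starts at 0.001 rather than 0 so that the logarithm stays finite when recombinants are present.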

Digenic Inheritance of Severe Insulin Resistance
Savage et al (2002) Nat Genet 31, 379
"... all five family members with severe insulin resistance, and no other family members, were doubly heterozygous with respect to two frameshift mutations of these two unlinked genes."

Polygenic Nature of Schizophrenia
Purcell et al (2009) Nature 460, 748-752
(Abstract)

Inheritance Patterns
o Generalized mendelian single-locus (incomplete penetrance, phenocopies)
o Polygenic: large number of loci, each with a small contribution to the phenotype
o Oligogenic, for example, 2-locus inheritance (rows: genotype at one locus; columns: genotype at the other):

          Epistasis      Heterogeneity    Liability
          NN  ND  DD     NN  ND  DD       NN  ND  DD
    NN     0   0   0      0   0   1        0   0   1
    ND     0   0   0      0   0   1        0   1   1
    DD     0   0   1      1   1   1        1   1   1

Examples in Nature
M.W. Strickberger (1985) Genetics, 3rd edition, Macmillan
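The three 2-locus penetrance tables on the slide follow simple rules, and generating them from those rules is a quick consistency check. The rules below are an interpretation of the 0/1 patterns, not something stated on the slide: epistasis requires DD at both loci, heterogeneity requires DD at either locus, and liability requires at least two D alleles in total. All function names are illustrative.

```python
GENOTYPES = ["NN", "ND", "DD"]


def d_count(genotype):
    """Number of D alleles in a single-locus genotype."""
    return genotype.count("D")


def epistasis(g1, g2):
    """Affected only if both loci are DD."""
    return 1 if g1 == "DD" and g2 == "DD" else 0


def heterogeneity(g1, g2):
    """Affected if either locus is DD."""
    return 1 if g1 == "DD" or g2 == "DD" else 0


def liability(g1, g2):
    """Affected if the two loci carry at least two D alleles in total."""
    return 1 if d_count(g1) + d_count(g2) >= 2 else 0


def penetrance_table(rule):
    """3 x 3 table: rows are locus-1 genotypes, columns locus-2 genotypes."""
    return [[rule(g1, g2) for g2 in GENOTYPES] for g1 in GENOTYPES]


if __name__ == "__main__":
    for rule in (epistasis, heterogeneity, liability):
        print(rule.__name__)
        for genotype, row in zip(GENOTYPES, penetrance_table(rule)):
            print(" ", genotype, row)
```

All three tables are symmetric, so it does not matter which locus is assigned to rows and which to columns.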

Schizophrenia Inheritance
McGue et al. (1986) Behav Genet 16, 75-87
o Recurrence risk for MZ twin: 44.3%
o Recurrence risk for DZ twin: 12.1%
o Recurrence risk for sib: 7.3%
o Path analysis: heritability = 74.0%
Risch (1990) Am J Hum Genet 46, 222-228
o Risks decrease with increasing distance of relationship.
o Amount of drop-off suggests 2 or 3 epistatic loci.

2-Locus Model for Schizophrenia
Neuman & Rice (1992) Genet Epidemiol 9, 347
o Individuals with 3 or more "2" alleles are susceptible.
o With "2" allele frequencies of 0.20 at each locus, the trait frequency is just below 1%.
o Neuman-Rice model, slightly modified (made symmetric):

                    Locus 2
    Locus 1    1/1    1/2    2/2
    1/1        0      0      0
    1/2        0      0      0.35
    2/2        0      0.35   0.35
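The claim that the trait frequency is just below 1% can be checked numerically. The sketch assumes Hardy-Weinberg proportions at each locus and independence between the two loci, neither of which is stated on the slide; the function names are illustrative.

```python
from itertools import product


def genotype_freqs(q):
    """Hardy-Weinberg frequencies keyed by the number of '2' alleles."""
    p = 1.0 - q
    return {0: p * p, 1: 2.0 * p * q, 2: q * q}


def trait_frequency(q1=0.20, q2=0.20, penetrance=0.35, threshold=3):
    """Population trait frequency under the symmetric 2-locus model.

    Genotype pairs carrying at least `threshold` copies of the '2'
    allele are susceptible and affected with probability `penetrance`.
    """
    freqs1 = genotype_freqs(q1)
    freqs2 = genotype_freqs(q2)
    total = 0.0
    for a, b in product(freqs1, freqs2):
        if a + b >= threshold:
            total += freqs1[a] * freqs2[b] * penetrance
    return total


if __name__ == "__main__":
    print(f"trait frequency = {trait_frequency():.5f}")
```

By hand: the susceptible genotype pairs have total frequency 2 × (0.32 × 0.04) + 0.04 × 0.04 = 0.0272, and 0.0272 × 0.35 = 0.00952, which is indeed just below 1%.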

Mendelian Models for Complex Traits
Abreu et al. (1999) Am J Hum Genet 65, 847
o Recipe: analyze under dominant and recessive inheritance with 50% penetrance, choose the higher of the 2 lod scores, and subtract 0.3.
o Realistic?
o Performance: in simulations, this simple approach did better than more sophisticated approaches.
o Apply this principle to 2-locus analysis.

TLINKAGE Program
http://www.jurgott.org/linkage/tlinkage.htm
The datafile looks slightly different:

    3 0 0 5 2 <<< No. loci, risk locus, sexlinked (if 1), program code, # null loci
    0 0.0 0.0 0 << Mut Locus, Mut Rates (male, female), Hap. Frequencies (if 1)
    1 2 3
    4 2 <<- null locus, number of alleles
    0.97 0.03
    4 2 <<- null locus, number of alleles
    0.99 0.01
    1 <<- number of liability classes
    0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000
    3 4 << Allele numbers, Number of alleles
    0.25000 0.25000 0.25000 0.25000
    0 0 << Sex Difference, Interference (If 1 or 2)
    0.001 0.00000 << Recombination Values
    2 0.05 0.40 << Rec. varied, Increment, Finishing value

Relevant lod score threshold for 5% significance? Must be >3.
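Returning to the Abreu et al. recipe for complex traits: once lod scores have been computed under the dominant and the recessive model, the adjustment is plain arithmetic. The sketch below assumes each model's lod scores are supplied as a list over the θ grid (for example, read from mlink output); the function name and the input values are made up for illustration.

```python
def abreu_adjusted_lod(lods_dominant, lods_recessive, correction=0.3):
    """Higher of the two maximum lod scores, minus the 0.3 correction.

    Each argument is a sequence of lod scores over the theta grid for
    one model (dominant or recessive, both with 50% penetrance).
    """
    best_dominant = max(lods_dominant)
    best_recessive = max(lods_recessive)
    return max(best_dominant, best_recessive) - correction


if __name__ == "__main__":
    # Hypothetical lod scores over a theta grid, for illustration only.
    dominant = [-1.2, 0.8, 1.5, 1.1]
    recessive = [0.3, 2.0, 1.7, 0.9]
    print(f"adjusted lod = {abreu_adjusted_lod(dominant, recessive):.2f}")
```

The subtraction of 0.3 accounts for having picked the better of two analyses.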

Extensions to TLINKAGE
o TLINKAGE-IMPRINT: a model-based approach to performing two-locus genetic imprinting analysis. Shete S, Zhou X. Hum Hered. 2006;62(3):145-56. Epub 2006 Oct 20.
o Analysis of genes for alcoholism using two-disease-locus models. Wu CC, Shete S. BMC Genet. 2005 Dec 30;6 Suppl 1:S149.
o Efficient two-trait-locus linkage analysis through program optimization and parallelization: application to hypercholesterolemia. Dietter J, Spiegel A, an Mey D, Pflug HJ, Al-Kateb H, Hoffmann K, Wienker TF, Strauch K. Eur J Hum Genet. 2004 Jul;12(7):542-50.

Phenotype
Dubay et al. (1993) Nat Genet 3, 354
Genetic Determinants of Diastolic and Pulse Pressure Map to Different Loci in Lyon Hypertensive Rats
o Hypertension is generally defined as a compound phenotype.
o In rats, there are two components of blood pressure:
  o Diastolic pressure (steady-state regulation)
  o Pulse pressure (systolic minus diastolic pressure)
o These are controlled by two different genes.

Loops
Terwilliger & Ott, Handbook, 1994, ch. 7
Consanguinity loop
o Inbreeding: mating between relatives
o Break the loop at an individual who is both an offspring and a parent ("loop person").
o Homozygosity mapping: compute lod scores rather than only record homozygous status.
o Max. lod score = 1.14 (≈ 4 × 0.3)

Loops
Terwilliger & Ott, Handbook, 1994, ch. 7
Another consanguinity loop
o Most linkage information comes from the two affecteds.
o Max. lod score = 1.88 (≈ 6 × 0.3: why?)

Loops
Terwilliger & Ott, Handbook, 1994, ch. 7
Marriage loop
o No inbreeding; same principle as in homozygosity mapping
o Break the loop at an individual who is both an offspring and a parent!
o Max. lod score = 1.18
The loops program can detect whether a loop is present in pedigree data (Xie & Ott, Am J Hum Genet 51, A206, 1992); incorporated into the makeped command file.

Handling Loops in Likelihood Calculation
Lange & Elston (1975) Hum Hered 25, 95-105
o Need to convert a pedigree with loops into a pedigree without loops.
o Sum over all genotypes (incl. phases) of the loop person:

    $L(\theta) = P(x) = \sum_g P(x, g)$

o Mendel program: simply peels over the entire loop.

Liability Classes
Assume dominant disease with age-dependent penetrance.
Two-generation family, children listed (as usual) in birth order.
[Figure: x-axis: age; y-axis: P(affected by given age)]

Liability Classes
Penetrances assigned in liability classes. Corresponding ped and data files:

    1 2 << AFFECTION, NO ALLELES
    0.999 0.001 << GENE FREQ.
    5 << LIAB CLASSES
    0 0 0
    0 0.3 0.3
    0 0.6 0.6
    0 0.7 0.7
    0 1 1 << PENETRANCES

Interpret penetrances of 0 0 0!
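How the five liability classes enter the likelihood can be sketched as follows. The sketch assumes the usual LINKAGE coding (affection status 0 = unknown, 1 = unaffected, 2 = affected) and that each penetrance row is ordered by genotype 1/1, 1/2, 2/2; the function name `phenotype_probs` is hypothetical.

```python
# Penetrances per liability class, copied from the datafile on the slide.
# Each tuple is ordered by genotype: 1/1, 1/2, 2/2.
PENETRANCES = {
    1: (0.0, 0.0, 0.0),
    2: (0.0, 0.3, 0.3),
    3: (0.0, 0.6, 0.6),
    4: (0.0, 0.7, 0.7),
    5: (0.0, 1.0, 1.0),
}


def phenotype_probs(status, liability_class):
    """P(phenotype | genotype) for genotypes 1/1, 1/2, 2/2.

    status: 0 = unknown, 1 = unaffected, 2 = affected.
    """
    penetrance = PENETRANCES[liability_class]
    if status == 2:
        return penetrance
    if status == 1:
        return tuple(1.0 - f for f in penetrance)
    if status == 0:
        return (1.0, 1.0, 1.0)
    raise ValueError(f"unexpected affection status: {status}")


if __name__ == "__main__":
    for liability_class in sorted(PENETRANCES):
        print(liability_class, phenotype_probs(1, liability_class))
```

Under these assumptions, an unaffected individual in class 1 gets the same probabilities as an individual of unknown status, which is one way to read the 0 0 0 row.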

Generalized Usage of Penetrance Classes
Define penetrance = Prob(phenotype | genotype)
o Example: ABO blood types
o Example: allow for genotyping errors. Each marker will be an affection status locus type.
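Both examples of generalized penetrance can be written as small lookup functions. The ABO mapping is the standard one (A and B codominant, O recessive). The genotyping-error model, in which errors are spread evenly over the other genotypes at a rate of 1%, is an assumption chosen only for illustration, as are the function names.

```python
# Standard ABO genotype-to-phenotype mapping.
ABO_PHENOTYPE = {
    "AA": "A", "AO": "A",
    "BB": "B", "BO": "B",
    "AB": "AB",
    "OO": "O",
}


def abo_penetrance(phenotype, genotype):
    """Prob(blood type | ABO genotype): 1 if compatible, else 0."""
    return 1.0 if ABO_PHENOTYPE[genotype] == phenotype else 0.0


SNP_GENOTYPES = ("1/1", "1/2", "2/2")


def error_penetrance(observed, true, error_rate=0.01, genotypes=SNP_GENOTYPES):
    """Prob(observed genotype | true genotype) under a uniform error model.

    With probability 1 - error_rate the call is correct; otherwise the
    error is split evenly over the remaining genotypes (an assumption).
    """
    if observed == true:
        return 1.0 - error_rate
    return error_rate / (len(genotypes) - 1)


if __name__ == "__main__":
    print(abo_penetrance("A", "AO"), abo_penetrance("O", "AO"))
    print([error_penetrance(obs, "1/2") for obs in SNP_GENOTYPES])
```

Treating the observed marker call as a "phenotype" of the true genotype in this way is what lets a marker be coded as an affection status locus.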