Performing linkage analysis using MERLIN


David Duffy
Queensland Institute of Medical Research, Brisbane, Australia

Overview
    MERLIN and associated programs
    Error checking
    Parametric linkage analysis
    Nonparametric linkage analysis
    Variance components linkage analysis
    Calculating IBD for other programs
    Combination of SNPs in linkage disequilibrium

Goncalo Abecasis's excellent program. 506 journal citations since 2002.

    [davidd@bioinfo ~]$ merlin
    MERLIN 1.0.1 - (c) 2000-2005 Goncalo Abecasis

    References for this version of Merlin:
       Abecasis et al (2002) Nat Gen 30:97-101        [original citation]
       Fingerlin et al (2004) AJHG 74:432-43          [case selection for association studies]
       Abecasis and Wigginton (2005) AJHG 77:754-67   [ld modeling, parametric analyses]

Genetic linkage analysis: MERLIN
For linkage analysis of binary or quantitative traits and many markers
Small to moderately large families
Performs:
    parametric and non-parametric linkage analysis
    variance components linkage analysis of quantitative traits
    regression-based analysis of quantitative traits: MERLIN-REGRESS
    multimarker-based IBD and kinship estimation
    haplotyping
    error detection
    simulation of marker data under the null hypothesis of no linkage

Linkage disequilibrium between markers is allowed in models

Limitations
    Uses the Lander-Green approach to multipoint linkage, so it is not suitable for large pedigrees (more than 30 "bits")
    The maximizer for variance components analysis is a little slow if multiple fixed effects are included, and may sometimes get stuck
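To make the "bits" limit concrete: in the Lander-Green framework the size of a pedigree is commonly counted as 2n - f, where n is the number of non-founders and f the number of founders. The sketch below assumes that count (the function name is hypothetical and is not part of MERLIN); it agrees with the "Founders / Descendants / Bits" summaries that MERLIN prints in the example runs in this deck.

```python
def pedigree_bits(founders: int, nonfounders: int) -> int:
    """Inheritance-vector size, assuming the usual 2n - f count."""
    if founders < 0 or nonfounders < 0:
        raise ValueError("counts must be non-negative")
    return max(2 * nonfounders - founders, 0)


# Two parents with two children, then with three children.
print(pedigree_bits(2, 2))  # 2
print(pedigree_bits(2, 3))  # 4

# The number of inheritance vectors doubles with every extra bit,
# which is why memory and run time explode for large pedigrees.
print(2 ** pedigree_bits(2, 3))  # 16
```

Because the state space grows as 2 to the power of the bit count, each additional non-founder multiplies the work by four.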

All the MERLIN Commands

    [davidd@bioinfo ~]$ merlin

    The following parameters are in effect:
                         Data File :      merlin.dat (-dname)
                     Pedigree File :      merlin.ped (-pname)
                Missing Value Code :         -99.999 (-xname)
                          Map File :      merlin.map (-mname)
                Allele Frequencies : ALL INDIVIDUALS (-f[a|e|f|m|file])
                       Random Seed :          123456 (-r9999)

    Data Analysis Options
             General : --error, --information, --likelihood, --model [param.tbl]
          IBD States : --ibd, --kinship, --matrices, --extended, --select
         NPL Linkage : --npl, --pairs, --qtl, --deviates, --exp
          VC Linkage : --vc, --usecovariates, --ascertainment
         Haplotyping : --best, --sample, --all, --founders, --horizontal
       Recombination : --zero, --one, --two, --three, --singlepoint
           Positions : --steps, --maxstep, --minstep, --grid, --start, --stop
     Marker Clusters : --clusters [], --distance, --rsq, --cfreq
              Limits : --bits [24], --megabytes, --minutes
         Performance : --trim, --nocouplebits, --swap, --cache []
              Output : --quiet, --markernames, --frequencies, --perfamily, --pdf,
                       --prefix [merlin]
          Simulation : --simulate, --reruns, --save

The MERLIN pedigree file (.ped)
There are three essential files: the pedigree file (.ped), the locus file (.dat), and the map file (.map).
The pedigree file format is consistent with the GAS (Sib-pair) or LINKAGE formats:

    01927 0192703 x       x       m 0  x      x      1936.3404 1/2 1/1 1/1
    01927 0192704 x       x       f 0  x      x      1938.9885 1/2 1/1 1/1
    01927 0192701 0192703 0192704 m MZ 2.3300 4.0943 1968.0000 x/x x/x x/x
    01927 0192702 0192703 0192704 f MZ 2.1700 5.3293 1968.0000 1/2 1/1 1/1
    01927 0192750 0192703 0192704 f 0  x      3.8067 1966.0000 1/2 1/1 1/1
    01940 0194003 0       0       1 0  -      -      1918.7665 0 0 0 0 0 0
    01940 0194004 0       0       2 0  -      -      1921.4147 1 1 1 1 1 2
    01940 0194005 0       0       1 0  -      3.4012 1946.3317 1 1 2 1 1 1
    01940 0194001 0194003 0194004 2 0  1.8100 4.2806 1949.0000 1 1 1 1 1 2
    01940 0194002 0194003 0194004 2 0  2.7600 5.0599 1949.0000 1 1 1 1 1 1
    01940 0194008 0194005 0194001 1 0  -      3.2189 1978.0000 1 1 1 1 1 2
    01940 0194009 0194005 0194001 2 0  -      4.4998 1980.0000 1 1 2 1 1 1

The first five fields are pedigree_id, individual_id, father_id, mother_id, sex.

A character string is treated as missing (hence the use of x or - here), so alleles must be numeric. A 0 for a marker, or -99.999, is also a missing-data token. The slash between the two alleles of a genotype is optional. Monozygotic twins are flagged by MZ in the twin-indicator column.
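The missing-data and slash conventions described here can be sketched as a small parser. This is an illustration of the rules as stated in this slide, not MERLIN's own reader; the function names are hypothetical.

```python
def parse_allele(token):
    """Return an integer allele, or None for missing (0 or any non-numeric string)."""
    try:
        value = int(token)
    except ValueError:
        return None  # character strings such as 'x' or '-' count as missing
    return value if value != 0 else None


def parse_genotypes(tokens):
    """Turn genotype tokens ('1/2', or '1' followed by '2') into allele pairs."""
    alleles = []
    for token in tokens:
        alleles.extend(token.split("/"))  # the slash is optional
    if len(alleles) % 2:
        raise ValueError("odd number of alleles")
    return [(parse_allele(a), parse_allele(b))
            for a, b in zip(alleles[::2], alleles[1::2])]


print(parse_genotypes("1/2 1/1 x/x".split()))  # [(1, 2), (1, 1), (None, None)]
print(parse_genotypes("1 1 2 1 0 0".split()))  # [(1, 1), (2, 1), (None, None)]
```

Both layouts shown in the pedigree file (slash-separated and space-separated alleles) produce the same pair structure.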

The MERLIN locus file (.dat)

    Z  mztwin
    T  aat
    T  lige2
    C  yob
    M  rs1303
    M  E366K
    M  rs17580
    M  rs6647
    S2 rs2753934
    M  rs1980618

The first column defines the variable type: M is a marker, T a quantitative trait, C a covariate, Z the MZ twin indicator, and A a binary trait (taking values [12yn]); S and S2 skip one or two columns of data. The second column gives the variable name.
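A quick consistency check is to count how many columns the pedigree file should have for a given locus file. The sketch below assumes that each marker occupies two whitespace-separated allele columns (with the slash form a marker is a single token instead) and that an S followed by a number skips that many columns; the helper names are hypothetical.

```python
def columns_for(code):
    """Number of pedigree-file columns consumed by one locus-file entry."""
    code = code.upper()
    if code == "M":
        return 2  # assumption: two separate allele columns per marker
    if code.startswith("S"):
        return int(code[1:]) if len(code) > 1 else 1  # S skips 1, S2 skips 2
    if code in {"A", "T", "C", "Z"}:
        return 1
    raise ValueError(f"unknown variable type: {code!r}")


def expected_columns(dat_text):
    """Five leading pedigree fields plus the columns implied by the locus file."""
    total = 5
    for line in dat_text.splitlines():
        if line.strip():
            total += columns_for(line.split()[0])
    return total


DAT = """Z mztwin
T aat
T lige2
C yob
M rs1303
M E366K
M rs17580
M rs6647
S2 rs2753934
M rs1980618
"""
print(expected_columns(DAT))  # 21
```

Comparing this number with the field count of each pedigree line catches most column-shift mistakes before MERLIN is run.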

The MERLIN map file (.map)

    14 rs1303     93.915
    14 E366K      93.915
    14 rs17580    93.917
    14 rs6647     93.917
    14 rs709932   93.919
    14 rs17090730 93.920

The columns are chromosome, marker_name, map_position (in cM).
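A map file in this three-column layout is easy to sanity-check before an analysis. The sketch below reads the columns and reports markers whose position is smaller than that of the preceding marker on the same chromosome. It is only a convenience check with hypothetical function names; whether MERLIN itself requires a sorted map is not stated in these slides.

```python
def read_map(text):
    """Parse 'chromosome marker position' lines; skip anything else (e.g. a header)."""
    markers = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        chrom, name, pos = parts
        try:
            markers.append((chrom, name, float(pos)))
        except ValueError:
            continue  # non-numeric position, probably a header line
    return markers


def out_of_order(markers):
    """Names of markers placed before their predecessor on the same chromosome."""
    return [cur[1] for prev, cur in zip(markers, markers[1:])
            if prev[0] == cur[0] and cur[2] < prev[2]]


MAP = """14 rs1303 93.915
14 E366K 93.915
14 rs17580 93.917
14 rs6647 93.917
14 rs709932 93.919
14 rs17090730 93.920
"""
markers = read_map(MAP)
print(len(markers), out_of_order(markers))  # 6 []
```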

A typical call to MERLIN

    > merlin -d chr1.dat -m chr1.map -p chr1.ped --grid 10 --vc --pdf
    Family: 42258 - Founders: 2 - Descendants: 2 - Bits: 2
      Skipping Marker D11S2008_S [BAD INHERITANCE]
      Skipping Marker ATA27C11_M [BAD INHERITANCE]
    Family: 99008 - Founders: 2 - Descendants: 2 - Bits: 2
      Skipping Marker D11S1301_S [BAD INHERITANCE]

A typical call to MERLIN

    Phenotype: ige1 [VC] (773 families, h2 = 48.46%)
    ==================================================================
        Position       H2    ChiSq      LOD   pvalue
           4.939   37.76%     5.91     1.28    0.008
          14.939   31.25%     3.83     0.83    0.03
          24.939   10.59%     0.52     0.11    0.2
          34.939    0.00%     0.00     0.00    0.5
          44.939    0.00%     0.00     0.00    0.5
          54.939    0.00%     0.00     0.00    0.5
          64.939    1.51%     0.03     0.01    0.4
          74.939    3.38%     0.10     0.02    0.4
          84.939    0.00%     0.00     0.00    0.5

Columns are:
    Position: map position where the lod score is evaluated
    H2: heritability due to a QTL at that position (AQE model)
    ChiSq: likelihood ratio test statistic (LRTS) testing H2 = 0
    LOD: LRTS/(2 log(10))
    pvalue: one-sided P-value (as negative heritability is not legal, H0 is on the boundary)
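The relation between the ChiSq, LOD and pvalue columns can be checked numerically. The sketch below assumes the usual boundary-null treatment, in which the one-sided P-value is half the upper tail of a chi-square distribution with 1 degree of freedom; that assumption is consistent with the table, where ChiSq = 0 gives p = 0.5. The function names are hypothetical.

```python
import math


def lod_from_chisq(chisq):
    """LOD = LRTS / (2 ln 10)."""
    return chisq / (2 * math.log(10))


def one_sided_p(chisq):
    """Half the chi-square (1 df) upper tail; the tail equals erfc(sqrt(x / 2))."""
    return 0.5 * math.erfc(math.sqrt(chisq / 2))


for chisq in (5.91, 3.83, 0.00):
    print(f"ChiSq={chisq:4.2f}  LOD={lod_from_chisq(chisq):.2f}  p={one_sided_p(chisq):.3f}")
```

For the first row of the table (ChiSq = 5.91) this gives a LOD of 1.28 and a P-value that rounds to 0.008, matching the MERLIN output.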

Error checking (unlikely double recombinants)
Two or more close recombination events on the same chromosome are uncommon, both because of the chance distribution of chiasmata and because of interference. Broman and Weber [2000] suggest the probability of a double recombinant within a 20 cM interval in humans is only 2 in 1000. A genotyping error in one of the markers used to infer the location of the recombination event is the more likely explanation.

    > merlin -d chr1.dat -m chr1.map -p chr1.ped --error
    Family: 83520 - Founders: 2 - Descendants: 2 - Bits: 2
      D15S205 genotype for individual 8352001 is unlikely [2.016e-02]
      D15S205 genotype for individual 8352002 is unlikely [2.016e-02]
    Family: 85060 - Founders: 2 - Descendants: 3 - Bits: 4
      D15S978 genotype for individual 8506002 is unlikely [2.410e-02]
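When many genotypes are flagged, it can help to collect them into a table for review. The sketch below extracts marker, individual and probability from lines in the format shown in this example output; the regular expression is written against that sample only and the function name is hypothetical, so other MERLIN versions may need adjustments.

```python
import re

# Matches e.g. "D15S205 genotype for individual 8352001 is unlikely [2.016e-02]"
UNLIKELY = re.compile(
    r"(\S+) genotype for individual (\S+) is unlikely \[([0-9.eE+-]+)\]"
)


def flagged_genotypes(output):
    """Return (marker, individual, probability) for every flagged genotype."""
    return [(m.group(1), m.group(2), float(m.group(3)))
            for m in UNLIKELY.finditer(output)]


SAMPLE = """Family: 83520 - Founders: 2 - Descendants: 2 - Bits: 2
D15S205 genotype for individual 8352001 is unlikely [2.016e-02]
D15S205 genotype for individual 8352002 is unlikely [2.016e-02]
Family: 85060 - Founders: 2 - Descendants: 3 - Bits: 4
D15S978 genotype for individual 8506002 is unlikely [2.410e-02]
"""
for marker, person, prob in flagged_genotypes(SAMPLE):
    print(marker, person, prob)
```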

Parametric Linkage Analysis
This is a new addition to MERLIN's functionality. One needs an additional file of penetrances and trait allele frequencies:

    disease 0.01 0.0,0.5,0.5 dominant
    disease 0.10 0.0,0.0,0.5 recessive

Each line is trait, allele_frequency, penetrances, model_name. The job is run as:

    > merlin -d chr1.dat -m chr1.map -p chr1.ped --model pkd.model

For multipoint analysis in small pedigrees, this is very fast.
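One way to check that a model line is sensible is to compute the population prevalence it implies. The sketch below assumes Hardy-Weinberg proportions and that the three penetrances refer to 0, 1 and 2 copies of the disease allele, in that order; the function names are hypothetical.

```python
def parse_model_line(line):
    """Split 'trait allele_frequency penetrances model_name' into typed fields."""
    trait, freq, penetrances, label = line.split()
    f0, f1, f2 = (float(x) for x in penetrances.split(","))
    return trait, float(freq), (f0, f1, f2), label


def prevalence(q, penetrances):
    """Implied prevalence under Hardy-Weinberg: p^2*f0 + 2pq*f1 + q^2*f2."""
    p = 1.0 - q
    f0, f1, f2 = penetrances
    return p * p * f0 + 2 * p * q * f1 + q * q * f2


for line in ("disease 0.01 0.0,0.5,0.5 dominant",
             "disease 0.10 0.0,0.0,0.5 recessive"):
    trait, q, pens, label = parse_model_line(line)
    print(f"{label}: implied prevalence {prevalence(q, pens):.5f}")
```

Under these assumptions the dominant model implies a prevalence of about 1% and the recessive model about 0.5%, which can be compared with what is known about the trait.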

Nonparametric Binary Trait Linkage Analysis
MERLIN can carry out the Kong and Cox nonparametric affected-pedigree-member analysis using either the pairs or the all statistic, and using either the linear or the exponential model. The job is run as:

    > merlin -d chr1.dat -m chr1.map -p chr1.ped --npl --pairs

Haplotyping
The default algorithm assumes there is no linkage disequilibrium between markers. MERLIN can be asked to combine adjacent SNPs into haplotypes where r² is above a threshold. This is a necessary preliminary to using closely associated SNPs for linkage analysis, as missing parental genotypes can be wrongly inferred if LD is neglected.
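For reference, r² between two biallelic SNPs can be computed from the haplotype and allele frequencies as D²/(pA(1-pA)pB(1-pB)) with D = pAB - pA·pB. The sketch below illustrates that standard formula only; it is not MERLIN's clustering code, and the 0.1 threshold is an arbitrary value chosen for the example.

```python
def r_squared(p_ab, p_a, p_b):
    """r^2 from the AB haplotype frequency and the A and B allele frequencies."""
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


THRESHOLD = 0.1  # hypothetical cut-off, for illustration only

# Perfect LD: allele A always travels with allele B.
print(r_squared(0.5, 0.5, 0.5))   # 1.0
# Linkage equilibrium: haplotype frequency is the product of allele frequencies.
print(r_squared(0.25, 0.5, 0.5))  # 0.0
print(r_squared(0.5, 0.5, 0.5) > THRESHOLD)  # True
```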