Statistical Tests for X Chromosome Association Study with Simulations. Jian Wang, July 10, 2012

Statistical Tests
- Zheng G, et al. 2007. Testing association for markers on the X chromosome. Genetic Epidemiology 31:834-843
  - NOT assuming X-inactivation (Wrong!)
  - Six different association tests: combinations of allele-based and genotype-based tests in males and females
  - Use the minimum p value of the six tests
  - Need to adjust for the correlation between the test statistics

Allele-counting method?
- Count alleles in a 2×2 table and compare the allele counts between cases and controls.
- Assumes that the effect of one copy of a variant allele on phenotype is the same in males as in females.
- Each male contributes one allele and each female two, so a male has half the impact of a female on the result of this test.
- Assumes Hardy-Weinberg proportions (HWP) in females; the test has the wrong size if HWP fails.
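The allele-counting idea can be sketched in a few lines of Python. This is only an illustration, not the procedure of Zheng et al.: the function name `allele_count_test`, the `(sex, n_variant_alleles, is_case)` input layout, and the use of a plain Pearson chi-squared statistic on 1 df are all assumptions made for the example.

```python
import math

def allele_count_test(samples):
    """Pearson chi-squared test on a 2x2 table of X-chromosome allele counts.

    samples: iterable of (sex, n_variant_alleles, is_case), a hypothetical layout.
    Females ("F") contribute 2 alleles, males ("M") contribute 1.
    """
    # Rows: 0 = cases, 1 = controls. Columns: 0 = variant, 1 = reference allele.
    table = [[0, 0], [0, 0]]
    for sex, n_variant, is_case in samples:
        n_alleles = 2 if sex == "F" else 1
        row = 0 if is_case else 1
        table[row][0] += n_variant
        table[row][1] += n_alleles - n_variant

    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a margin of the 2x2 table is zero")

    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return table, chi2, p_value
```

Because every allele is counted as an independent observation, the statistic relies on Hardy-Weinberg proportions in females, which is the limitation noted on the slide.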

Statistical Tests
- Clayton D. 2008. Testing for association on the X chromosome. Biostatistics 9:593-600
  - Assuming X-inactivation
- PLINK: commonly used
  - NOT assuming X-inactivation
  - Using females only
  - Using the whole sample with sex as a covariate
    - genotypes of males are coded as (0,1) for a and A, respectively
    - genotypes of females are coded as (0,1,2) for aa, Aa and AA, respectively

Clayton's Approaches
- Assumptions:
  - X-inactivation
  - Allele frequencies are the same in males and females
  - Does NOT assume Hardy-Weinberg proportions in females
- Statistical tests:
  - 1 degree-of-freedom test
  - 2 degree-of-freedom test (less powerful)

1 df Test for Autosomal Loci
- Consider an autosomal diallelic locus with genotypes aa, Aa and AA coded as $A_i = (0, 1, 2)$, respectively, and a phenotype $Y_i$, $i = 1, 2, \ldots, N$.
- The score test for an additive effect of this locus on the phenotype is given by the genotype-phenotype covariance:

  $$U_A = \sum_{i=1}^{N} A_i \,(Y_i - \bar{Y}),$$

  where $\bar{Y}$ is the mean of $Y$ in the sample.
- Assume the variance $V_A$ is constant for all $A_i$.
- Consider the distribution $\Pr(A_i \mid Y_i)$.

1 df Test for Autosomal Loci
- Under the null hypothesis the statistic $U_A$ is asymptotically normally distributed with mean 0 and variance

  $$\operatorname{Var}(U_A) = V_A \sum_{i=1}^{N} (Y_i - \bar{Y})^2,$$

  where $V_A$ is the variance of the genotype $A_i$:

  $$V_A = \frac{1}{N-1} \sum_{i=1}^{N} (A_i - \bar{A})^2 .$$

- The ratio $U_A^2 / \operatorname{Var}(U_A)$ is asymptotically distributed as chi-squared on 1 df.
- Also applicable to a quantitative phenotype.
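A minimal Python sketch of this 1 df score test follows. It assumes the statistic is $U_A = \sum_i A_i (Y_i - \bar{Y})$ with variance $V_A \sum_i (Y_i - \bar{Y})^2$, and that $V_A$ is the sample variance of the genotype codes with an $N-1$ denominator; the function name `score_test_1df` is made up for the illustration.

```python
import math

def score_test_1df(a, y):
    """1 df score test of association between genotype codes a and phenotype y.

    Returns (U, Var(U), chi-squared statistic, p value).
    """
    n = len(a)
    if n != len(y) or n < 2:
        raise ValueError("a and y must have the same length (at least 2)")

    a_bar = sum(a) / n
    y_bar = sum(y) / n

    # Genotype-phenotype covariance (score statistic).
    u = sum(ai * (yi - y_bar) for ai, yi in zip(a, y))

    # Sample variance of the genotype codes (N - 1 denominator is an assumption).
    v_a = sum((ai - a_bar) ** 2 for ai in a) / (n - 1)
    var_u = v_a * sum((yi - y_bar) ** 2 for yi in y)
    if var_u == 0:
        raise ValueError("genotype or phenotype has no variation")

    chi2 = u * u / var_u
    # Upper tail of a chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return u, var_u, chi2, p_value
```

The same function accepts a binary or a quantitative `y`, since only the centred phenotype enters the statistic.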

1 df Test for X Chromosome Loci
- X-inactivation: a female will have approximately half her cells with one copy active, while the remainder of her cells have the other copy active.
- Males should therefore be equivalent to homozygous females.
- Coding for X loci:
  - Males: $A_i = (0, 2)$ for the a and A alleles, respectively
  - Females: $A_i = (0, 1, 2)$ for the aa, Aa and AA genotypes, respectively

1 df Test for X Chromosome Loci
- If the allele frequencies are the same in males and females:
  - The expectation of $A$ is equal in males and females
  - The expectation of $U_A$ remains 0
- The variance of $A$ differs between males and females
  - The variance is $2p(1-p)$ in females (under Hardy-Weinberg proportions) and $4p(1-p)$ in males, where $p$ is the minor allele frequency.
- The same 1 df chi-squared test can be conducted.
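The two variances can be checked by direct enumeration. The sketch below is illustrative only: it assumes the 0/1/2 coding for females with Hardy-Weinberg genotype proportions and the 0/2 coding for males, and the function name `genotype_moments` is hypothetical.

```python
def genotype_moments(p, sex):
    """Exact mean and variance of the coded genotype A for allele frequency p.

    Females ("F"): codes 0/1/2 with Hardy-Weinberg proportions (assumed here).
    Males ("M"): codes 0/2, so a male is treated like a homozygous female.
    """
    if sex == "F":
        dist = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    elif sex == "M":
        dist = {0: 1 - p, 2: p}
    else:
        raise ValueError("sex must be 'F' or 'M'")

    mean = sum(code * prob for code, prob in dist.items())
    var = sum((code - mean) ** 2 * prob for code, prob in dist.items())
    return mean, var
```

For any `p`, both sexes have mean `2 * p`, while the male variance is twice the female variance.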

Stratified Tests
- If the allele frequencies are NOT the same in males and females, use a stratified score test in the analysis:
  - The test statistic and its estimated variance are calculated separately in each stratum
  - Sum the test statistics and the variances over strata
  - The same 1 df chi-squared test can be performed
- The contributions of the different strata would need to be weighted appropriately
- If sex and phenotype are strongly associated, stratification results in a loss of power
  - However, in the simulation studies we find no significant power loss
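The stratified procedure can be sketched as follows. This is a simplified illustration: it takes the plain sum of the per-stratum statistics and variances without any additional weighting, uses an $N_s - 1$ denominator for the within-stratum genotype variance (an assumption), and the function name `stratified_score_test` is made up.

```python
import math
from collections import defaultdict

def stratified_score_test(a, y, stratum):
    """1 df score test with statistic and variance summed over strata (e.g. sex)."""
    groups = defaultdict(list)
    for ai, yi, si in zip(a, y, stratum):
        groups[si].append((ai, yi))

    u_total = 0.0
    var_total = 0.0
    for rows in groups.values():
        n = len(rows)
        if n < 2:
            continue  # a stratum with one subject carries no information
        a_bar = sum(ai for ai, _ in rows) / n
        y_bar = sum(yi for _, yi in rows) / n
        # Within-stratum score statistic and its estimated variance.
        u_total += sum(ai * (yi - y_bar) for ai, yi in rows)
        v_a = sum((ai - a_bar) ** 2 for ai, _ in rows) / (n - 1)
        var_total += v_a * sum((yi - y_bar) ** 2 for _, yi in rows)

    if var_total == 0:
        raise ValueError("no stratum has variation in both genotype and phenotype")

    chi2 = u_total ** 2 / var_total
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-squared upper tail, 1 df
    return u_total, var_total, chi2, p_value
```

Centring the phenotype within each stratum is what removes the effect of a sex-phenotype association from the test.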

Conditional on Genotype
- Consider the distribution $\Pr(Y_i \mid A_i)$, with phenotype variance

  $$V_Y = \frac{1}{N-1} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 .$$

- This gives an identical asymptotic test.
- Regress $Y$ on $A$ in a GLM and test the regression coefficient of $A$, using a Huber-White estimate of the variance-covariance matrix of the coefficients.
- The stratified version of the test is obtained by additionally including the stratifying factor in the GLM.

R Package for Clayton's Approaches
- R package for whole-genome association studies: http://www.bioconductor.org/packages/2.10/bioc/html/snpstats.html
- Library: snpStats
- It can use PLINK binary ped files (.bed, .bim, .fam) as input
- It incorporates the ploidy information (diploid for female genotypes)
- An example:

    library(snpStats)

    # PLINK binary fileset for the X chromosome
    genobed <- "xchrom_17.bed"
    genobim <- "xchrom_17.bim"
    genofam <- "xchrom_17.fam"
    data <- read.plink(genobed, genobim, genofam)

    # Mark females (sex == 2 in the .fam file) as diploid
    data1 <- data
    D <- data$fam$sex == 2
    data1$genotypes <- new("XSnpMatrix", data$genotypes, diploid = D)

    # 1 df and 2 df tests: not adjusted for sex (re1), stratified by sex (re2)
    re1 <- single.snp.tests(data1$fam$affected, data = data1$fam, snp.data = data1$genotypes)
    re2 <- single.snp.tests(data1$fam$affected, data1$fam$sex, data = data1$fam, snp.data = data1$genotypes)

R Package for Clayton's Approaches

                     |            Not adjusted for sex            |              Adjusted for sex
SNP         N        | Chi.squared.1.df Chi.squared.2.df P.1df    P.2df    | Chi.squared.1.df Chi.squared.2.df P.1df    P.2df
rs3120733   3480     | 7.680666         8.94684          0.005582 0.011408 | 8.095423         9.361597         0.004438 0.009272
rs525161    3481     | 12.38907         14.60077         0.000432 0.000675 | 11.75987         13.97157         0.000605 0.000925
rs559165    3480     | 11.23412         11.65569         0.000803 0.002944 | 11.21578         11.63735         0.000811 0.002972
rs28576503  3482     | 12.98154         13.78788         0.000315 0.001014 | 12.90062         13.70695         0.000328 0.001056
rs28719866  3482     | 11.11458         11.40087         0.000857 0.003345 | 11.02953         11.31582         0.000897 0.00349
rs28444648  3480     | 10.15823         10.47012         0.001437 0.005326 | 9.91551          10.2274          0.001639 0.006014
rs28599172  3467     | 10.5431          10.54374         0.001166 0.005134 | 11.97719         11.97784         0.000539 0.002506
rs5940510   3481     | 7.719249         9.57082          0.005464 0.008351 | 7.962648         9.814218         0.004775 0.007394
rs12557310  3480     | 13.14358         13.2422          0.000289 0.001332 | 13.44997         13.54858         0.000245 0.001143
rs1454268   3482     | 9.265828         11.56982         0.002335 0.003074 | 9.689129         11.99312         0.001854 0.002487
rs1084451   3482     | 8.886967         10.98022         0.002872 0.004127 | 9.270666         11.36392         0.002329 0.003407
rs5940536   3469     | 13.228           14.87472         0.000276 0.000589 | 11.50504         13.15176         0.000694 0.001394
rs5983743   3481     | 9.826408         9.888733         0.00172  0.007123 | 9.764787         9.827113         0.001779 0.007346
rs5940540   3477     | 10.39686         10.87112         0.001262 0.004359 | 11.0602          11.53446         0.000882 0.003128
rs5940403   3476     | 11.62007         11.92036         0.000652 0.002579 | 11.63626         11.93654         0.000647 0.002559
rs676920    3482     | 10.94061         12.35371         0.000941 0.002077 | 11.10087         12.51396         0.000863 0.001917
rs553678    3475     | 12.41785         12.93499         0.000425 0.001553 | 12.54723         13.06436         0.000397 0.001456

Genetic Model Assumption
- Clayton's approach considers X-inactivation but does not model other X chromosome features.
- We are trying to model X-inactivation, skewed X-inactivation, and escape from X-inactivation.

Autosomal
          Additive  Dominant  Recessive
    AA    2         1         1
    Aa    1         1         0
    aa    0         0         0

X Chromosome (Female)
          Additive  Dominant  Recessive  Escaping
    AA    1         1         1          2
    Aa    0.5       1         0          1
    aa    0         0         0          0

X Chromosome (Male)
          Additive  Dominant  Recessive  Escaping
    A     1         1         1          1
    a     0         0         0          0

- Random X-inactivation: additive
- More than half of the cells have the copy with the risk allele active: dominant
- Fewer than half of the cells have the copy with the risk allele active: recessive
- Escaping: the effect of Aa in females is the same as that of A in males (PLINK)

Gendrel and Heard. 2011. Fifty years of X-inactivation research. Development 138(23):5049-5055
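The coding tables above translate directly into a lookup. The following Python sketch is one possible representation; the names `FEMALE_CODING`, `MALE_CODING`, `code_genotype` and `all_codings`, and the scheme labels ADD, DOM, REC and PLINK (for the escaping model), are choices made for the illustration.

```python
# Female codes under each X-chromosome model (values taken from the table).
FEMALE_CODING = {
    "ADD":   {"aa": 0.0, "Aa": 0.5, "AA": 1.0},  # random X-inactivation
    "DOM":   {"aa": 0.0, "Aa": 1.0, "AA": 1.0},  # skewed towards the risk allele
    "REC":   {"aa": 0.0, "Aa": 0.0, "AA": 1.0},  # skewed away from the risk allele
    "PLINK": {"aa": 0.0, "Aa": 1.0, "AA": 2.0},  # escape from X-inactivation
}

# Male codes are the same under all four models.
MALE_CODING = {"a": 0.0, "A": 1.0}

def code_genotype(genotype, sex, scheme):
    """Return the numeric code of an X-chromosome genotype under one scheme."""
    if scheme not in FEMALE_CODING:
        raise ValueError(f"unknown scheme: {scheme}")
    if sex == "F":
        return FEMALE_CODING[scheme][genotype]
    if sex == "M":
        return MALE_CODING[genotype]
    raise ValueError("sex must be 'F' or 'M'")

def all_codings(genotype, sex):
    """Return the codes of one genotype under all four schemes."""
    return {scheme: code_genotype(genotype, sex, scheme) for scheme in FEMALE_CODING}
```

Only heterozygous females are coded differently across the additive, dominant and recessive schemes, so any difference between those three tests is driven by that genotype group.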

Possible Approach
- For each SNP on the X chromosome, we apply four different coding schemes and conduct a regression under each coding scheme
- This gives four correlated statistical tests, so we need to adjust for multiple testing
  - Take the minimum of the 4 p values
  - FDR
  - The resulting 4 p values are adjusted for multiple testing using the approach proposed by Conneely and Boehnke (AJHG, 2007): P_ACT
- Compare the results with PLINK and Clayton's 1 df approach
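To see why the minimum p value needs adjusting, the sketch below computes two simple reference adjustments. It is explicitly not the P_ACT method of Conneely and Boehnke, which accounts for the correlation between the tests; Bonferroni and Šidák are swapped in here only as easy-to-compute bounds that ignore that correlation, and the function name `min_p_adjustments` is hypothetical.

```python
def min_p_adjustments(p_values):
    """Return (minimum p, Bonferroni-adjusted, Sidak-adjusted) for k tests.

    Both adjustments ignore the correlation between the tests and are therefore
    generally conservative when the tests are correlated.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p_values must be a non-empty list of values in [0, 1]")

    p_min = min(p_values)
    bonferroni = min(1.0, k * p_min)
    sidak = 1.0 - (1.0 - p_min) ** k
    return p_min, bonferroni, sidak
```

With four strongly correlated coding schemes, a correlation-aware adjustment would be expected to fall between the unadjusted minimum and these bounds.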

Simulation Studies
- Consider 4 diallelic SNPs, $X_1$, $X_2$, $X_3$ and $X_4$, and a binary phenotype $Y = (0, 1)$

  $$\operatorname{logit}\big(\Pr(Y = 1 \mid X_1, X_2, X_3, X_4, \text{Sex})\big) = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5\,\text{Sex}$$

                  |                     OR                      |            MAF
                  | SNP1     SNP2     SNP3     SNP4     Sex      | SNP1&2        SNP3&4
    Model  b0     | exp(b1)  exp(b2)  exp(b3)  exp(b4)  exp(b5)  | F     M       F     M
    ADD    -2.55  | 1.3      1        1.3      1        1        | 0.4   0.4     0.25  0.25
    DOM    -2.55  | 1.3      1        1.3      1        1        | 0.4   0.4     0.25  0.25
    REC    -2.55  | 1.3      1        1.3      1        1        | 0.4   0.4     0.25  0.25
    PLINK  -2.55  | 1.3      1        1.3      1        1        | 0.4   0.4     0.25  0.25
    ADD    -2.55  | 1.3      1        1.3      1        1        | 0.2   0.4     0.2   0.25
    DOM    -2.55  | 1.3      1        1.3      1        1        | 0.2   0.4     0.2   0.25
    REC    -2.55  | 1.3      1        1.3      1        1        | 0.2   0.4     0.2   0.25
    PLINK  -2.55  | 1.3      1        1.3      1        1        | 0.2   0.4     0.2   0.25
    ADD    -2.55  | 1.3      1        1.3      1        1        | 0.4   0.2     0.25  0.2
    DOM    -2.55  | 1.3      1        1.3      1        1        | 0.4   0.2     0.25  0.2
    REC    -2.55  | 1.3      1        1.3      1        1        | 0.4   0.2     0.25  0.2
    PLINK  -2.55  | 1.3      1        1.3      1        1        | 0.4   0.2     0.25  0.2
    ADD    -2.95  | 1.3      1        1.3      1        1.5      | 0.4   0.4     0.25  0.25
    DOM    -2.95  | 1.3      1        1.3      1        1.5      | 0.4   0.4     0.25  0.25
    REC    -2.95  | 1.3      1        1.3      1        1.5      | 0.4   0.4     0.25  0.25
    PLINK  -2.95  | 1.3      1        1.3      1        1.5      | 0.4   0.4     0.25  0.25
    ADD    -2.95  | 1.3      1        1.3      1        1.5      | 0.2   0.4     0.2   0.25
    DOM    -2.95  | 1.3      1        1.3      1        1.5      | 0.2   0.4     0.2   0.25
    REC    -2.95  | 1.3      1        1.3      1        1.5      | 0.2   0.4     0.2   0.25
    PLINK  -2.95  | 1.3      1        1.3      1        1.5      | 0.2   0.4     0.2   0.25
    ADD    -2.95  | 1.3      1        1.3      1        1.5      | 0.4   0.2     0.25  0.2
    DOM    -2.95  | 1.3      1        1.3      1        1.5      | 0.4   0.2     0.25  0.2
    REC    -2.95  | 1.3      1        1.3      1        1.5      | 0.4   0.2     0.25  0.2
    PLINK  -2.95  | 1.3      1        1.3      1        1.5      | 0.4   0.2     0.25  0.2
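One way such data could be generated is sketched below in Python. Several details are assumptions, since the slides do not state them: sex is coded 1 for females and 0 for males, the four SNPs are drawn independently, female genotypes follow Hardy-Weinberg proportions, the minor allele is the risk allele, and the additive (random X-inactivation) coding is used. The names `disease_prob` and `simulate_subject` are hypothetical.

```python
import math
import random

# Additive (random X-inactivation) coding of the number of risk alleles in females.
FEMALE_CODE = {0: 0.0, 1: 0.5, 2: 1.0}

def disease_prob(b0, log_ors, x, b_sex, sex):
    """Pr(Y = 1) under the logistic model b0 + sum(b_j * x_j) + b_sex * sex."""
    eta = b0 + sum(b * xi for b, xi in zip(log_ors, x)) + b_sex * sex
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_subject(rng, sex, mafs, b0, odds_ratios, or_sex):
    """Simulate coded genotypes and phenotype for one subject.

    sex: 1 = female, 0 = male (assumed coding).
    mafs: minor allele frequencies of the SNPs for this subject's sex.
    """
    x = []
    for maf in mafs:
        if sex == 1:
            # Two independent allele draws (Hardy-Weinberg proportions assumed).
            n_risk = (rng.random() < maf) + (rng.random() < maf)
            x.append(FEMALE_CODE[n_risk])
        else:
            # Males carry a single X allele.
            x.append(1.0 if rng.random() < maf else 0.0)

    log_ors = [math.log(o) for o in odds_ratios]
    p = disease_prob(b0, log_ors, x, math.log(or_sex), sex)
    y = 1 if rng.random() < p else 0
    return x, y
```

For example, `simulate_subject(random.Random(1), 1, [0.4, 0.4, 0.25, 0.25], -2.55, [1.3, 1, 1.3, 1], 1.0)` draws one female under the first ADD scenario in the table; with all genotype codes at 0, the baseline risk is `1 / (1 + exp(2.55))`, roughly 7%.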

Simulation Results: Type I Errors

Simulation Results: Powers

Future Work
- Better understanding of the biological mechanisms of the X chromosome
- Other approaches to simulating data that could better represent the X chromosome
- Bayesian approaches to incorporate information on X chromosome features, such as skewness

Thanks!