Power Calculation for Testing If Disease is Associated with Marker in a Case-Control Study Using the GeneticsDesign Package

Similar documents
BST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis

For more information about how to cite these materials visit

New Enhancements: GWAS Workflows with SVS

Transmission Disequilibrium Test in GWAS

Statistical Tests for X Chromosome Association Study. with Simulations. Jian Wang July 10, 2012

CS2220 Introduction to Computational Biology

Non-parametric methods for linkage analysis

Statistical power and significance testing in large-scale genetic studies

Human population sub-structure and genetic association studies

Ascertainment Through Family History of Disease Often Decreases the Power of Family-based Association Studies

Imaging Genetics: Heritability, Linkage & Association

Indian Journal of Nephrology Indian J Nephrol 2001;11: 88-97

Roadmap. Inbreeding How inbred is a population? What are the consequences of inbreeding?

Genetics and Pharmacogenetics in Human Complex Disorders (Example of Bipolar Disorder)

Assessing Accuracy of Genotype Imputation in American Indians

Nonparametric Linkage Analysis. Nonparametric Linkage Analysis

IN SILICO EVALUATION OF DNA-POOLED ALLELOTYPING VERSUS INDIVIDUAL GENOTYPING FOR GENOME-WIDE ASSOCIATION STUDIES OF COMPLEX DISEASE.

Lecture 6: Linkage analysis in medical genetics

An Introduction to Quantitative Genetics I. Heather A Lawson Advanced Genetics Spring2018

Replacing IBS with IBD: The MLS Method. Biostatistics 666 Lecture 15

Chapter 2. Linkage Analysis. JenniferH.BarrettandM.DawnTeare. Abstract. 1. Introduction

A Comparison of Sample Size and Power in Case-Only Association Studies of Gene-Environment Interaction

Single SNP/Gene Analysis. Typical Results of GWAS Analysis (Single SNP Approach) Typical Results of GWAS Analysis (Single SNP Approach)

Haplotypes of VKORC1, NQO1 and GGCX, their effect on activity levels of vitamin K-dependent coagulation factors, and the risk of venous thrombosis

Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004

Association mapping (qualitative) Association scan, quantitative. Office hours Wednesday 3-4pm 304A Stanley Hall. Association scan, qualitative

Heritability and genetic correlations explained by common SNPs for MetS traits. Shashaank Vattikuti, Juen Guo and Carson Chow LBM/NIDDK

Association-heterogeneity mapping identifies an Asian-specific association of the GTF2I locus with rheumatoid arthritis

Decomposition of the Genotypic Value

Using ancestry estimates as tools to better understand group or individual differences in disease risk or disease outcomes

Regression-based Linkage Analysis Methods

Genetic association analysis incorporating intermediate phenotypes information for complex diseases

Performing. linkage analysis using MERLIN

Quantitative Genetic Methods to Dissect Heterogeneity in Complex Traits

Mendelian Inheritance. Jurg Ott Columbia and Rockefeller Universities New York

Association of XRCC5 polymorphisms with COPD and COPD-related phenotypes in the Han Chinese population: a case-control cohort study

Introduction to the Genetics of Complex Disease

Introduction to linkage and family based designs to study the genetic epidemiology of complex traits. Harold Snieder

Supplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22.

Supplementary Online Content

Handling Immunogenetic Data Managing and Validating HLA Data

GENETIC LINKAGE ANALYSIS

Genome-wide association studies (case/control and family-based) Heather J. Cordell, Institute of Genetic Medicine Newcastle University, UK

Introduction to Genetics and Genomics

Lecture 1 Mendelian Inheritance

The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0

Association between interleukin-17a polymorphism and coronary artery disease susceptibility in the Chinese Han population

Supplementary Methods. 1. Cancer Genetic Markers of Susceptibility (CGEMS) Prostate Cancer Genome-Wide Association Scan

Effects of age-at-diagnosis and duration of diabetes on GADA and IA-2A positivity

Statistical Genetics : Gene Mappin g through Linkag e and Associatio n

MBG* Animal Breeding Methods Fall Final Exam

(b) What is the allele frequency of the b allele in the new merged population on the island?

Research Article Power Estimation for Gene-Longevity Association Analysis Using Concordant Twins

Passing It On. QUESTION: How are inherited characteristics passed from parent to offspring? toothpicks - red and green

Alzheimer Disease and Complex Segregation Analysis p.1/29

5/2/18. After this class students should be able to: Stephanie Moon, Ph.D. - GWAS. How do we distinguish Mendelian from non-mendelian traits?

Sponsored document from The Journal of Allergy and Clinical Immunology

University of Groningen. Metabolic risk in people with psychotic disorders Bruins, Jojanneke

Supplementary Figure 1. Principal components analysis of European ancestry in the African American, Native Hawaiian and Latino populations.

The Role of Host: Genetic Variation

A Bayesian approach for constructing genetic maps when markers are miscoded

Department of Psychiatry, Henan Mental Hospital, The Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, China

Genome-wide association studies for human narcolepsy and other complex diseases

Statistical questions for statistical methods

Antigen Presentation to T lymphocytes

Title: Pinpointing resilience in Bipolar Disorder

ASSOCIATION OF KCNJ1 VARIATION WITH CHANGE IN FASTING GLUCOSE AND NEW ONSET DIABETES DURING HCTZ TREATMENT

MENDELIAN GENETICS. MENDEL RULE AND LAWS Please read and make sure you understand the following instructions and knowledge before you go on.

Supplementary Figures

Correlations between the COMT gene rs4680 polymorphism and susceptibility to ovarian cancer

Selection at one locus with many alleles, fertility selection, and sexual selection

Global variation in copy number in the human genome

Animal Science USAMV Iaşi; University of Agricultural Sciences and Veterinary Medicine Ion Ionescu de la Brad Iaşi ; Institute of Life Sciences at

Relationship between genetic polymorphisms in the DRD5 gene and paranoid schizophrenia in northern Han Chinese

SNPrints: Defining SNP signatures for prediction of onset in complex diseases

Mendelian Genetics using Fast Plants Report due Sept. 15/16. Readings: Mendelian genetics: Hartwell Chapter 2 pp , Chapter 5 pp

Hellinger Optimal Criterion and HPA- Optimum Designs for Model Discrimination and Probability-Based Optimality

Genetic variants on 17q21 are associated with asthma in a Han Chinese population

Population Genetics 4: Assortative mating

Original Article. C18orf1 located on chromosome 18p11.2 may confer susceptibility to schizophrenia

Supplementary Online Content

Nature Genetics: doi: /ng Supplementary Figure 1. Study design.

Unequal Numbers of Judges per Subject

The Making of the Fittest: Natural Selection in Humans

Tutorial on Genome-Wide Association Studies

Myoglobin A79G polymorphism association with exercise-induced skeletal muscle damage

Combining Different Marker Densities in Genomic Evaluation

Association of Single Nucleotide Polymorphisms (SNPs) in CCR6, TAGAP and TNFAIP3 with Rheumatoid Arthritis in African Americans

TADA: Analyzing De Novo, Transmission and Case-Control Sequencing Data

Drosophila melanogaster. Introduction. Drosophila melanogaster is a kind of flies fruit fly that is widely used in genetic

Systems of Mating: Systems of Mating:

Mendelian & Complex Traits. Quantitative Imaging Genomics. Genetics Terminology 2. Genetics Terminology 1. Human Genome. Genetics Terminology 3

The association between TCM syndromes and SCAP polymorphisms in subjects with non-alcoholic fatty liver disease

COMBINING PHENOTYPIC AND GENOTYPIC DATA TO DISCOVER MULTIPLE DISEASE GENES. Hannu Toivonen, Saara Hyvönen, Petteri Sevon

MULTIFACTORIAL DISEASES. MG L-10 July 7 th 2014

Rare Variant Burden Tests. Biostatistics 666

PopGen4: Assortative mating

Using Imputed Genotypes for Relative Risk Estimation in Case-Parent Studies

The Association Design and a Continuous Phenotype

Evaluating the role of the 620W allele of protein tyrosine phosphatase PTPN22 in Crohn s disease and multiple sclerosis

Transcription:

Power Calculation for Testing If Disease is Associated with Marker in a Case-Control Study Using the GeneticsDesign Package Weiliang Qiu email: weiliang.qiu@gmail.com Ross Lazarus email: ross.lazarus@channing.harvard.edu October 30, 2017 1 Introduction In genetics association studies, investigators are interested in testing the association between disease and DSL (disease susceptibility locus) in a case-control study. However, DSL is usually unknown. Hence what one can do is to test the association between disease and DSL indirectly by testing the association between disease and a marker. The power of the association between disease and DSL depends on the power of the association between disease and the marker, and on the LD (linkage disequilibrium) between disease and the marker. Assume that both DSL and the marker are biallelic and assume additive model. Given the MAFs (minor allele frequencies) for both DSL and the marker, relative risks, r 2 or D between DSL and marker, the numbers of cases and controls, and the significant level for hypothesis testing, the tools in the GeneticsDesign package can calculate the power for the association test that disease is associated with DSL in a case-control study. This information can help investigator to determine appropriate sample sizes in experiment design stage. 2 Examples To call the functions in the R package GeneticsDesign, we first need to load it into R: > library(geneticsdesign) The following is a sample code to get a table of power for different combinations of high risk allele frequency P r(a) and genotype relative risk RR(Aa aa). 1

> set1<-seq(from0.1, to0.5, by0.1) > set2<-c(1.25, 1.5, 1.75, 2.0) > len1<-length(set1) > len2<-length(set2) > mat<-matrix(0, nrowlen1, ncollen2) > rownames(mat)<-paste("maf", set1, sep"") > colnames(mat)<-paste("rraa", set2, sep"") > for(i in 1:len1) + { a<-set1[i] + for(j in 1:len2) + { b<-set2[j] + res<-gpc.default(paa,pd0.1,rraa(1+b)/2, RRAAb, Dprime1,pBa, quiett) + mat[i,j]<-res$power + } + } > print(round(mat,3)) RRAA1.25 RRAA1.5 RRAA1.75 RRAA2 MAF0.1 0.145 0.400 0.690 0.884 MAF0.2 0.214 0.594 0.877 0.977 MAF0.3 0.259 0.682 0.926 0.989 MAF0.4 0.280 0.711 0.936 0.990 MAF0.5 0.282 0.702 0.925 0.986 A Technical Details Suppose that A is the minor allele for the disease locus; a is the common allele for the disease locus; B is the minor allele for the marker locus; and b is the common allele for the marker locus. We use D to denote the disease status diseased and use D to denote the disease status not diseased. Given the minor allele frequencies P r(a) and P r(b) and Linkage disequilibrium (LD) measure D, we can get haplotype frequencies P r(ab), P r(ab), P r(ab), and P r(ab): where P r(ab) P r(a)p r(b) + D P r(ab) P r(a)p r(b) D P r(ab) P r(a)p r(b) D P r(ab) P r(a)p r(b) + D, D D d max 2

and That is, d max { min [P r(a)p r(b), P r(a)p r(b)], if D > 0, max [ P r(a)p r(b), P r(a)p r(b)], if D < 0. D P r(ab) P r(a)p r(b). Note that D > 0 means P r(ab) > P r(a)p r(b), i.e., the probability of occuring the haplotype AB is higher than the probability that the haplotype AB occurs merely by chance. D < 0 means P r(ab) < P r(a)p r(b), i.e., the probability of occuring the haplotype AB is smaller than the probability that the haplotype AB occurs merely by chance. In other words, given the same sample size, we can observe haplotype AB more often when D > 0 than when D < 0. Hence the power of association test that marker is associated with disease is larger when D > 0 than when D < 0. Hence, we assume that D > 0. D can also be rewritten as D P r(b A)P r(a) P r(a)p r(b) [P r(b A) P r(b)] P r(a). Hence D > 0 is equivalent to P r(b A) > P r(a) which indicates positive association between minor allele A in disease locus and minor allele B in marker locus. The relative risks are RR(AA aa) P r(d AA) P r(d aa) RR(Aa aa) P r(d Aa) P r(d aa) The disease prevalence can be rewritten as P r(d AA)P r(aa) + P r(d Aa)P r(aa) + P r(d aa)p r(aa). Dividing both sides by P r(d aa), we get Hence P r(d aa) P r(d AA) P r(d Aa) P r(d aa) P r(aa) + P r(aa) + P r(d aa) P r(d aa) P r(d aa) P r(aa) RR(AA aa) [P r(a)] 2 + RR(Aa aa)2p r(a)p r(a) + [P r(a)] 2. P r(d aa) RR(AA aa) [P r(a)] 2 + RR(Aa aa)2p r(a)p r(a) + [P r(a)] 2 P r(d Aa) RR(Aa aa)p r(d aa) P r(d AA) RR(AA aa)p r(d aa) 3

Once we obtain the penetrances of disease locus (P r(d aa), P r(d Aa), and P r(d AA)), we can get the sampling probabilities: P r(d, BB) P r(bb D) P r(d, BB, AA) + P r(d, BB, Aa) + P r(d, BB, aa) P r(d BB, AA)P r(bb, AA) + P r(d BB, Aa)P r(bb, Aa) + P r(d BB, aa)p r(bb, aa) P r(bb D) P r(d, Bb) P r(d, Bb, AA) + P r(d, Bb, Aa) + P r(d, Bb, aa) P r(d Bb, AA)P r(bb, AA) + P r(d Bb, Aa)P r(bb, Aa) + P r(d Bb, aa)p r(bb, aa) P r(d, bb) P r(bb D) P r(d, bb, AA) + P r(d, bb, Aa) + P r(d, bb, aa) P r(d bb, AA)P r(bb, AA) + P r(d bb, Aa)P r(bb, Aa) + P r(d bb, aa)p r(bb, aa) We assume that P r(d BB, AA) P r(d AA), P r(d BB, Aa) P r(d Aa), P r(d BB, aa) P r(d aa). We also have P r(bb, AA) [P r(ab)] 2 P r(bb, Aa) 2P r(ab)p r(ab) P r(bb, aa) [P r(ab)] 2 P r(bb, AA) 2P r(ab)p r(ab) P r(bb, Aa) 2 [P r(ab) P r(ab) + P r(ab)p r(ab)] P r(bb, aa) 2P r(ab)p r(ab) P r(bb, AA) [P r(ab)] 2 P r(bb, Aa) 2P r(ab)p r(ab) P r(bb, aa) [P r(ab)] 2 4

Hence P r(d BB, AA)P r(bb, AA) + P r(d BB, Aa)P r(bb, Aa) + P r(d BB, aa)p r(bb, aa) P r(bb D) P r(d AA) [P r(ab)]2 + P r(d Aa) [2P r(ab)p r(ab)] + P r(d aa) [P r(ab)] 2 P r(d Bb, AA)P r(bb, AA) + P r(d Bb, Aa)P r(bb, Aa) + P r(d Bb, aa)p r(bb, aa) P r(bb D) P r(d AA) [2P r(ab)p r(ab)] + P r(d Aa) {2 [P r(ab)p r(ab) + P r(ab)p r(ab)]} P r(d aa) [2P r(ab)p r(ab)] + P r(d bb, AA)P r(bb, AA) + P r(d bb, Aa)P r(bb, Aa) + P r(d bb, aa)p r(bb, aa) P r(bb D) P r(d AA) [P r(ab)]2 + P r(d Aa) [2P r(ab)p r(ab)] + P r(d aa) [P r(ab)] 2 To calculate the sampling probabilities P r(bb D), P r(bb D), and P r(bb D), we can apply Bayesian rules again: P r(bb D) P r( D BB)P r(bb) P r( D) P r(bb D) P r( D Bb)P r(bb) P r( D) P r(bb D) P r( D bb)p r(bb) P r( D) [1 P r(d BB)][P r(b)]2 1 [1 P r(d Bb)]2P r(b)p r(b) 1 [1 P r(d bb)][p r(b)]2 1 and P r(d BB) P r(d Bb) P r(d bb) P r(bb D) P r(bb D) P r(bb) [P r(b)] 2 P r(bb D) P r(bb D) P r(bb) [2P r(b)p r(b)] P r(bb D) P r(bb D) P r(bb) [P r(b)] 2. 5

The expected allele frequencies are The counts for cases P r(b D) 2n casep r(bb D) + n case P r(bb D) 2n case P r(bb D) + P r(bb D)/2 P r(b D) 2n casep r(bb D) + n case P r(bb D) 2n case P r(bb D) + P r(bb D)/2 P r(b D) P r(bb D) + P r(bb D)/2 P r(b D) P r(bb D) + P r(bb D)/2. n 2c n cases P r(bb D), n 1c n cases P r(bb D), n 0c n cases P r(bb D). The counts for controls n 2n n controls P r(bb D), n 1n n controls P r(bb D), n 0n n controls P r(bb D). Table 1: Counts for cases and controls marker genotype cases controls BB n 2c n 2n Bb n 1c n 1n bb n 0c n 0n total n cases n controls Odds ratios are Define OR(BB bb) n 2cn 0n n 2n n 0c P r(bb D)P r(bb D) P r(bb D)P r(bb D) OR(Bb bb) n 1cn 0n n 1n n 0c P r(bb D)P r(bb D) P r(bb D)P r(bb D) U 3 ( S x i N r i R ) N s i, i1 6

where We have R n 0c + n 1c + n 2c, S n 0n + n 1n + n 2n, N R + S.x 1 2, x 2 1, x 3 0, r 1 n 0c, r 2 n 1c, r 3 n 2c, s 1 n 0n, s 2 n 1n, s 3 n 2n. V V ar(u) RS N N 3 ( 3 3 ) 2 x 2 i n i x i n i. i1 i1 The Linear Trend Test Statistic 1 Z 2 T ( U V ) 2 asymptotically follows a χ 2 1 distribution under the null hypothesis. Under alternative hypothesis, Z 2 T asymptotically follows a non-central chi-square distribution χ2 1,λ T with non-centrality parameter where λ T RS [ 3 i1 x i(p 1i p 0i ) ] 2 3 i1 x2 i (Rp 0i + Sp 1i ) [ 3 i1 x i(rp 0i + Sp 1i ] 2 /N, x 1 2, x 2 1, x 3 0, n 1 n 2c + n 2n, n 2 n 1c + n 1n, n 3 n 0c + n 0n, p 01 P r(bb D), p 02 P r(bb D), p 03 P r(bb D), p 11 P r(bb D), p 12 P r(bb D), p 13 P r(bb D). Given significant level α, the power 1 β satisfies: References P r(z 2 T > c H 0 ) α P r(z 2 T > c H a ) 1 β. [1] Armitage, P. Tests for linear trends in proportions and frequencies. Biometrics, 11, 375-386, 1955. [2] Cochran, W. G. Some methods for strengthening the common chi-squared tests. Biometrics, 10, 417-451, 1954. 1 http://linkage.rockefeller.edu/pawe3d/help/linear-trend-test-ncp.html 7

[3] Gordon D. and Finch S. J. and Nothnagel M. and Ott J. Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms. Hum. Hered., 54:22-33, 2002. [4] Gordon D. and Haynes C. and Blumenfeld J. and Finch S. J. PAWE-3D: visualizing Power for Association With Error in case/control genetic studies of complex traits. Bioinformatics, 21:3935-3937, 2005. [5] Purcell S. and Cherny S. S. and Sham P. C. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex trait Bioinformatics, 19(1):149-150, 2003. [6] Sham P. Statistics in Human Genetics. Arnold Applications of Statistics, 1998. 8