Human population sub-structure and genetic association studies

1 Human population sub-structure and genetic association studies. Stephanie A. Santorico, Ph.D., Department of Mathematical & Statistical Sciences

2 Global Similarity Map from 23andme.com

3 Besides being cool, this information can help us think about genetic influence on complex traits. Why is information about ancestry important in disease mapping studies? How can we measure ancestry from genetic data? How is this information used in the context of genetic association studies and, more specifically, in genome-wide association studies?

4 Motivating data: a sample of 4,920 Native Americans of the Pima and Papago tribes [1]. Trait: type 2 diabetes. Marker: a haplotype from the Gm 3;5,13,14 system of human immunoglobulin G. Question of interest: is there an association between the haplotype and disease?

5 $H_0$: haplotype frequency in cases = haplotype frequency in controls. The haplotype appears to be protective: 8% diabetic among those with the haplotype versus 29% diabetic among those without it.

6 A BRIEF DETOUR

7 Confounding. [Diagram: Factor of Interest and Disease, with a Confounder linked to both.] A confounding variable is an extraneous variable in a statistical model that correlates with both the factor of interest (the independent variable) and the outcome (the dependent variable; here, disease).

8 Public service announcement: confounding is everywhere, and it is a phenomenon to be aware of in any scientific study.

9 Here, confounding is due to admixture. How? More generally, we can have confounding due to population substructure, also called population stratification. Population stratification is the presence of a systematic difference in allele frequencies between subpopulations in a population. This is why great care is taken in genetic association studies to match and/or control for ancestry.
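The mechanism can be made concrete with a small numerical sketch. The counts below are invented for illustration and are not the data from [1]: two hypothetical subpopulations differ in both haplotype carrier frequency and diabetes prevalence, yet within each subpopulation the haplotype and the disease are independent. Pooling them produces a strong apparent "protective" association.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical counts, in the order:
# (carriers diseased, carriers healthy, non-carriers diseased, non-carriers healthy)
subpop_a = (40, 60, 360, 540)   # 10% carriers, 40% diabetes prevalence
subpop_b = (60, 540, 40, 360)   # 60% carriers, 10% diabetes prevalence
pooled = tuple(x + y for x, y in zip(subpop_a, subpop_b))

chi2_a = chi2_2x2(*subpop_a)        # no association within subpopulation A
chi2_b = chi2_2x2(*subpop_b)        # no association within subpopulation B
chi2_pooled = chi2_2x2(*pooled)     # large: association created by pooling

# Diabetes prevalence by carrier status in the pooled sample
carrier_prev = pooled[0] / (pooled[0] + pooled[1])
noncarrier_prev = pooled[2] / (pooled[2] + pooled[3])

print(f"within A: {chi2_a:.2f}, within B: {chi2_b:.2f}, pooled: {chi2_pooled:.2f}")
print(f"pooled prevalence: carriers {carrier_prev:.1%}, non-carriers {noncarrier_prev:.1%}")
```

The pooled test statistic is large even though neither subpopulation shows any association, which is the signature of confounding by ancestry.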

10 This example motivated a good deal of research on methods for correcting for confounding due to population substructure. The issue is not specific to any one complex trait. The problem does not go away with a bigger sample or more markers. With more markers, however, we can use the additional information to adjust for population structure.

11 Context: genome-wide association studies (GWAS). Box 2, i of [4]: an example from the type 2 diabetes component of the Wellcome Trust Case Control Consortium study. From here on, we will assume that tests have been conducted for each SNP over the genome using a chi-squared test statistic, e.g., $X_1^2, X_2^2, \ldots, X_{1,817,\ldots}^2$.

12 Methods for Dealing with Population Substructure in GWAS: 1. Genomic control; 2. Principal components analysis; 3. Structured association methods; 4. Family-based studies; 5. Mixed models. With the exception of family-based studies, these are in order of increasing complexity. For each method, we will go through an overview and the pros and cons of its usage. [2] is a good review of methods and further literature.

13 1. Genomic Control. Devlin and Roeder [3] suggested the use of a genomic inflation factor, denoted by $\lambda$. Concept: population substructure inflates significance.

14 FIGURE 1 of [2]. The figure shows simulated P-P plots under three scenarios for genome-wide scans with no causal markers. (a) No stratification: p-values fit the expected distribution. (b) Stratification without unusually differentiated markers: p-values exhibit modest genome-wide inflation. (c) Stratification with unusually differentiated markers: p-values exhibit modest genome-wide inflation and severe inflation at a small number of markers.

15 1. Genomic Control. Devlin and Roeder [3] suggested the use of a genomic inflation factor, denoted by $\lambda$. Concept: population substructure inflates significance. Measure inflation using the median of the chi-squared test statistics divided by the median of the distribution expected under no inflation: $\lambda = \operatorname{median}(X_i^2) / \operatorname{median}(\chi^2_1)$, where $\operatorname{median}(\chi^2_1) \approx 0.455$ for 1-degree-of-freedom tests. If $\lambda > 1$, this indicates the presence of substructure. Adjust the chi-squared statistics by dividing each by $\lambda$; this reduces the inflation. Pros: exceptionally easy. Cons: a single adjustment for all SNPs may not be appropriate.
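The genomic control calculation can be sketched in a few lines. This is a minimal illustration that assumes 1-degree-of-freedom chi-squared tests. The function names and example statistics are made up, and capping the factor at 1 (so that statistics are never inflated) is a common convention assumed here, not something stated on the slide.

```python
from statistics import median

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN

def genomic_control_adjust(chi2_stats):
    """Divide every statistic by lambda.

    Assumption: lambda is capped below at 1 so that the adjustment
    never makes the statistics larger.
    """
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats]

# Hypothetical per-SNP chi-squared statistics
example_stats = [0.2, 0.5, 0.9, 1.4, 3.0]
lam = genomic_inflation(example_stats)
adjusted = genomic_control_adjust(example_stats)

print(f"lambda = {lam:.3f}")
print("adjusted:", [round(x, 3) for x in adjusted])
```

After adjustment the median of the statistics equals the null median, which is exactly what dividing by $\lambda$ is designed to achieve. Every SNP receives the same correction, which is the limitation noted under "Cons".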

16 Nowadays, GC is more often used as an indicator that appropriate adjustments have been made. [Figure from [2]: no adjustments made; based on 99,900 SNPs; unusually differentiated SNPs added for =0.6]

17 2. Principal Components Analysis (PCA). PCA is a general, well-studied, widely used and long-established statistical method. PCA finds uncorrelated combinations of variables that maximally explain variation. In GWAS, PCA is used to derive continuous axes of genetic variation. These PCs can be used to detect outliers and to explore population substructure. Information from PCs can be used to match cases and controls based on ancestry or as a covariate adjustment in linear or logistic regression.

18 Genetic PCs correlated with ancestry Figure 1a from [5]

19 PCA usage in GWAS: Conduct a PCA with study samples and samples from public sources representing diverse world populations, such as the 1000 Genomes Project. Determine whether study samples match their self-reported ancestry, removing those that do not belong to a homogeneous study sample. PCs computed on the study samples can then be used as covariates in subsequent analysis or for matching individuals based on ancestry. Software note: most statistical software packages will perform a PCA. A commonly used package specific to this application is EIGENSOFT, which includes the smartpca program. Pros: relatively easy with existing software; a powerful exploratory technique. Cons: requires some pre-treatment of SNPs, e.g., pruning of SNPs based on linkage disequilibrium and minor allele frequency. [Figure: principal component analysis of 3557 study subjects with 1194 HapMap controls. Color-coding distinguishes HapMap groups and the study subjects (TZ). Axis labels indicate the percentage of variance explained by each eigenvector.]
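As a rough illustration of what "deriving an axis of genetic variation" means, the sketch below computes the first principal component of a tiny, made-up genotype matrix (rows are individuals, columns are SNPs coded 0/1/2) using power iteration on the individual-by-individual Gram matrix. It is a toy under stated assumptions, not how EIGENSOFT works internally: it omits SNP pruning, per-SNP scaling and outlier removal, and the function names are hypothetical.

```python
import math

def center_columns(genotypes):
    """Subtract each SNP's mean genotype from its column."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[k] for row in genotypes) / n for k in range(m)]
    return [[row[k] - means[k] for k in range(m)] for row in genotypes]

def first_pc_scores(genotypes, iterations=200):
    """Unit-length first-PC scores, one per individual (sign is arbitrary)."""
    x = center_columns(genotypes)
    n = len(x)
    # Gram matrix of inner products between centered individuals
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    # Arbitrary start vector; it must not be orthogonal to the top eigenvector
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Hypothetical genotypes: individuals 0-2 and 3-5 come from two groups
# with very different allele frequencies.
toy_genotypes = [
    [2, 2, 0, 0],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
    [0, 0, 2, 2],
    [0, 1, 2, 1],
    [1, 0, 1, 2],
]
scores = first_pc_scores(toy_genotypes)
print([round(s, 3) for s in scores])
```

In this toy data the two groups fall on opposite sides of zero on the first PC, which is the behaviour that makes PCs useful for spotting ancestry outliers and as regression covariates.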

20 3. Structured association methods. These methods model or infer underlying population structure and assign individuals to these sub-populations or clusters. Some established methods: GEM (Genetic Matching) [6], STRUCTURE [7], ADMIXTURE [8].

21 3. Structured association methods: Genetic Matching. Figure 1 A&B from [6]: Flowchart for Genetic-Matching Algorithm Illustrated with Portions of the T1D Data. Distances between individuals are determined by the major axes of variation in the EVD representation. Outlier removal, illustrated by (A), is critical for revealing the subtle variability between individuals of similar ancestry. After major outliers are removed, clustering is used for discovery of homogeneous clusters; four distinct clusters are displayed here (B), plotted as principal component axes.

22 After matching cases and controls, testing is done within each stratum of matched cases and controls, and then evidence is combined over the strata, e.g., with the Cochran-Mantel-Haenszel test or conditional logistic regression. [Tables: for each stratum $k = 1, \ldots, K$, a 2×2 table of allele (A, a) by status (case, control), with observed count $n_{11k}$, expected count $\mu_{11k}$ and variance $\operatorname{Var}(n_{11k})$. For stratum 1 the slide gives $\operatorname{Var}(n_{111}) = 8.6$.]

23 After matching cases and controls, testing is done within each stratum of matched cases and controls, and then evidence is combined over the strata, e.g., with the Cochran-Mantel-Haenszel test or conditional logistic regression. Since testing is done within strata prior to combining information, inflation due to population substructure is controlled. Pros: allows for control of fine-level structure; enables rigorous use of public controls. Cons: a multi-step process that requires a fair amount of computation, though software exists for such analyses.
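One way to combine evidence over strata is the Cochran-Mantel-Haenszel statistic, $\left(\sum_k (n_{11k} - \mu_{11k})\right)^2 / \sum_k \operatorname{Var}(n_{11k})$. The sketch below is a minimal version without continuity correction. The function name and all counts are hypothetical, and each stratum is given as (A in cases, A in controls, a in cases, a in controls).

```python
def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel chi-squared statistic (1 df, no continuity
    correction) for a list of 2x2 tables given as (a, b, c, d)."""
    diff_sum = 0.0
    var_sum = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d      # allele A total, allele a total
        col1, col2 = a + c, b + d      # case total, control total
        mu = row1 * col1 / n           # expected top-left count under H0
        var = row1 * row2 * col1 * col2 / (n * n * (n - 1))
        diff_sum += a - mu
        var_sum += var
    return diff_sum ** 2 / var_sum

# Hypothetical strata with no within-stratum association: the statistic is 0
# even though pooling these two tables would suggest an association.
null_strata = [(40, 60, 360, 540), (60, 540, 40, 360)]
cmh_null = cmh_statistic(null_strata)

# A single hypothetical stratum with a real within-stratum association
cmh_signal = cmh_statistic([(30, 70, 10, 90)])

print(f"CMH (null strata) = {cmh_null:.3f}")
print(f"CMH (associated stratum) = {cmh_signal:.4f}")
```

Because deviations from expectation are computed inside each stratum before being summed, differences in allele frequency and disease prevalence between strata do not contribute to the statistic.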

24 4. Family-based studies. Family-based association tests focus on within-family information. They became popular in the 1990s with the transmission disequilibrium test (TDT) and numerous extensions such as the FBAT, PDT and QTDT. Matching is done by focusing on transmitted versus non-transmitted alleles from heterozygous parents. Since matching is internal to the family, population substructure is appropriately taken into account. $H_0$: the proportion of 1 alleles transmitted to diseased individuals from 1/2 parents is 0.5.

25 In [10], the TDT was used to demonstrate association of class 1 alleles of the insulin gene 5′ VNTR with insulin-dependent diabetes. Previous case-control tests had found significant association, but linkage studies were not able to find significant linkage for this marker. $H_0$: the proportion of 1 alleles transmitted to diseased individuals from 1/2 parents is 0.5.
Transmitted Class 1, not transmitted Other: 78
Transmitted Other, not transmitted Class 1: 46
$T = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}} = \frac{(78 - 46)^2}{78 + 46} = 8.3$, where $n_{12}$ and $n_{21}$ are the two counts above. The p-value can be computed from a chi-squared distribution with 1 degree of freedom.
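The TDT arithmetic for this table can be checked with a short sketch. The function name is made up for illustration. The p-value uses the identity that, for a chi-squared variable with 1 degree of freedom, the upper tail probability at $t$ equals $\operatorname{erfc}(\sqrt{t/2})$.

```python
import math

def tdt(transmitted, not_transmitted):
    """McNemar-type TDT statistic and its 1-df chi-squared p-value.

    transmitted: times heterozygous parents transmitted the allele of interest
    not_transmitted: times they transmitted the other allele instead
    """
    stat = (transmitted - not_transmitted) ** 2 / (transmitted + not_transmitted)
    # Upper tail of chi-squared(1 df): P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Counts from the slide: class 1 allele transmitted 78 times, not transmitted 46 times
stat, p_value = tdt(78, 46)
print(f"T = {stat:.2f}, p = {p_value:.4f}")
```

Only heterozygous parents contribute to the two counts, which is why the test is insensitive to allele-frequency differences between families.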

26 4. Family-based studies. Family-based association tests focus on within-family information. They became popular in the 1990s with the transmission disequilibrium test (TDT) and numerous extensions such as the FBAT, PDT and QTDT. Matching is done by focusing on transmitted versus non-transmitted alleles from heterozygous parents. Since matching is internal to the family, population substructure is appropriately taken into account. Pros: statistics are easy to compute and easy to interpret; does not require genome-wide data; tests incorporate linkage information. Cons: these tests do not use between-family information and hence are not fully powered; information from homozygous parents is not used at all. $H_0$: the proportion of 1 alleles transmitted to diseased individuals from 1/2 parents is 0.5.

27 5. Mixed Models. These models are "mixed" because they use a fixed component for a SNP effect and a random component reflecting underlying structure, e.g., population substructure and cryptic relatedness. They adjust for underlying structure by using genetic data to measure correlation between individuals: $y_i = \beta_0 + \beta_k X_{ik} + \eta_i + \varepsilon_i$, with $\operatorname{Var}(Y) = \sigma_a^2 S_N + \sigma_e^2 I$. $S_N$ is derived from the matrix $K = (k_{ij})$, where $k_{ij} = \frac{1}{M} \sum_{k=1}^{M} \frac{(n_{ik} - 2p_k)(n_{jk} - 2p_k)}{2p_k(1 - p_k)}$.

28 Mixed model example. From [9]

29 5. Mixed Models. These models are "mixed" because they use a fixed component for a SNP effect and a random component reflecting underlying structure, e.g., population substructure and cryptic relatedness. They adjust for underlying structure by using genetic data to measure correlation between individuals: $y_i = \beta_0 + \beta_k X_{ik} + \eta_i + \varepsilon_i$, with $\operatorname{Var}(Y) = \sigma_a^2 S_N + \sigma_e^2 I$. $S_N$ is derived from the matrix $K = (k_{ij})$, where $k_{ij} = \frac{1}{M} \sum_{k=1}^{M} \frac{(n_{ik} - 2p_k)(n_{jk} - 2p_k)}{2p_k(1 - p_k)}$. Pros: adjusts for fine-level structure, including relatedness. Cons: much more computationally intensive, but feasible with existing software packages such as EMMAX [9] and with the use of computing clusters.
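The matrix $K$ can be computed directly from genotypes. The sketch below assumes, as is conventional but not spelled out on the slide, that $n_{ik}$ is individual $i$'s allele count (0, 1 or 2) at SNP $k$, $p_k$ is that allele's frequency estimated from the sample, and $M$ is the number of polymorphic SNPs used. The function name and genotypes are hypothetical, and this is not the EMMAX implementation.

```python
def kinship_matrix(genotypes):
    """Genetic relationship matrix K with
    k_ij = (1/M) * sum_k (n_ik - 2 p_k)(n_jk - 2 p_k) / (2 p_k (1 - p_k)).

    genotypes: list of rows (individuals), each a list of 0/1/2 allele counts.
    Monomorphic SNPs (p_k equal to 0 or 1) are skipped to avoid dividing by zero.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    # Sample allele frequency per SNP: total allele count over 2n chromosomes
    freqs = [sum(row[k] for row in genotypes) / (2 * n) for k in range(n_snps)]
    usable = [k for k in range(n_snps) if 0.0 < freqs[k] < 1.0]
    m = len(usable)
    kin = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            total = sum(
                (genotypes[i][k] - 2 * freqs[k]) * (genotypes[j][k] - 2 * freqs[k])
                / (2 * freqs[k] * (1 - freqs[k]))
                for k in usable
            )
            kin[i][j] = kin[j][i] = total / m
    return kin

# Hypothetical genotypes: individuals 0-2 and 3-5 come from two distinct groups
toy_genotypes = [
    [2, 2, 0, 0],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
    [0, 0, 2, 2],
    [0, 1, 2, 1],
    [1, 0, 1, 2],
]
K = kinship_matrix(toy_genotypes)
for row in K:
    print([round(v, 2) for v in row])
```

Entries are positive for pairs of individuals who share alleles more than expected and negative for pairs who share less, so both population structure and cryptic relatedness show up in the same matrix.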

30 Concluding remarks and test. Confounding presents itself in genetic studies through ______, which is a concern because ______. This can be detected through use of a summary measure, ______, which can be used as a quality control step and/or as an adjustment to test statistics. Beyond family-based methods, there are a number of methods that allow for adjustment of substructure. In increasing order of complexity, four such methods are ______.

31 References
1. Knowler WC, Williams RC, Pettitt DJ, Steinberg AG. Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet. 1988;43:
2. Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010;11(7):
3. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55(4):
4. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JPA, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008;9(May):
5. Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, et al. Genes mirror geography within Europe. Nature. 2008;456(7218):
6. Luca D, Ringquist S, Klei L, Lee AB, Gieger C, Wichmann H-E, et al. On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants. Am J Hum Genet. Feb;82(2):
7. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:
8. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:
9. Kang HM, Sul JH, Service SK, Zaitlen NA, Kong S-Y, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42(4):
10. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52:


CONTENT SUPPLEMENTARY FIGURE E. INSTRUMENTAL VARIABLE ANALYSIS USING DESEASONALISED PLASMA 25-HYDROXYVITAMIN D. 7

CONTENT SUPPLEMENTARY FIGURE E. INSTRUMENTAL VARIABLE ANALYSIS USING DESEASONALISED PLASMA 25-HYDROXYVITAMIN D. 7 CONTENT FIGURES 3 SUPPLEMENTARY FIGURE A. NUMBER OF PARTICIPANTS AND EVENTS IN THE OBSERVATIONAL AND GENETIC ANALYSES. 3 SUPPLEMENTARY FIGURE B. FLOWCHART SHOWING THE SELECTION PROCESS FOR DETERMINING

More information

Biostatistics Faculty Publications

Biostatistics Faculty Publications University of Kentucky UKnowledge Biostatistics Faculty Publications Biostatistics 7-24-2009 On Quality Control Measures in Genome-Wide Association Studies: A Test to Assess the Genotyping Quality of Individual

More information

An Extension of the Regression of Offspring on Mid-Parent to Test for Association and Estimate Locus-Specific Heritability: The Revised ROMP Method

An Extension of the Regression of Offspring on Mid-Parent to Test for Association and Estimate Locus-Specific Heritability: The Revised ROMP Method doi: 10.1111/j.1469-1809.007.00401.x An Extension of the Regression of Offspring on Mid-Parent to Test for Association and Estimate Locus-Specific Heritability: The Revised ROMP Method M.-H. Roy-Gagnon

More information

Small-area estimation of mental illness prevalence for schools

Small-area estimation of mental illness prevalence for schools Small-area estimation of mental illness prevalence for schools Fan Li 1 Alan Zaslavsky 2 1 Department of Statistical Science Duke University 2 Department of Health Care Policy Harvard Medical School March

More information

Summary. Introduction. Atypical and Duplicated Samples. Atypical Samples. Noah A. Rosenberg

Summary. Introduction. Atypical and Duplicated Samples. Atypical Samples. Noah A. Rosenberg doi: 10.1111/j.1469-1809.2006.00285.x Standardized Subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, Accounting for Atypical and Duplicated Samples and Pairs of Close Relatives Noah A. Rosenberg

More information

Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels.

Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels. Supplementary Online Material Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels. John C Chambers, Weihua Zhang, Yun Li, Joban Sehmi, Mark N Wass, Delilah Zabaneh,

More information

BST227: Introduction to Statistical Genetics

BST227: Introduction to Statistical Genetics BST227: Introduction to Statistical Genetics Lecture 11: Heritability from summary statistics & epigenetic enrichments Guest Lecturer: Caleb Lareau Success of GWAS EBI Human GWAS Catalog As of this morning

More information

Using Imputed Genotypes for Relative Risk Estimation in Case-Parent Studies

Using Imputed Genotypes for Relative Risk Estimation in Case-Parent Studies American Journal of Epidemiology Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health 011. Vol. 173, No. 5 DOI: 10.1093/aje/kwq363 Advance Access publication:

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Country distribution of GME samples and designation of geographical subregions.

Nature Genetics: doi: /ng Supplementary Figure 1. Country distribution of GME samples and designation of geographical subregions. Supplementary Figure 1 Country distribution of GME samples and designation of geographical subregions. GME samples collected across 20 countries and territories from the GME. Pie size corresponds to the

More information

Performing. linkage analysis using MERLIN

Performing. linkage analysis using MERLIN Performing linkage analysis using MERLIN David Duffy Queensland Institute of Medical Research Brisbane, Australia Overview MERLIN and associated programs Error checking Parametric linkage analysis Nonparametric

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Can We Increase the Likelihood of Success for Future Association Studies in Epilepsy?

Can We Increase the Likelihood of Success for Future Association Studies in Epilepsy? Epilepsia, 47(10):1617 1621, 2006 Blackwell Publishing, Inc. C 2006 International League Against Epilepsy Editorial Commentary Can We Increase the Likelihood of Success for Future Association Studies in

More information

STATISTICAL GENETICS 98 Transmission Disequilibrium, Family Controls, and Great Expectations

STATISTICAL GENETICS 98 Transmission Disequilibrium, Family Controls, and Great Expectations Am. J. Hum. Genet. 63:935 941, 1998 STATISTICAL GENETICS 98 Transmission Disequilibrium, Family Controls, and Great Expectations Daniel J. Schaid Departments of Health Sciences Research and Medical Genetics,

More information

# For the GWAS stage, B-cell NHL cases which small numbers (N<20) were excluded from analysis.

# For the GWAS stage, B-cell NHL cases which small numbers (N<20) were excluded from analysis. Supplementary Table 1a. Subtype Breakdown of all analyzed samples Stage GWAS Singapore Validation 1 Guangzhou Validation 2 Guangzhou Validation 3 Beijing Total No. of B-Cell Cases 253 # 168^ 294^ 713^

More information

Ridge regression for risk prediction

Ridge regression for risk prediction Ridge regression for risk prediction with applications to genetic data Erika Cule and Maria De Iorio Imperial College London Department of Epidemiology and Biostatistics School of Public Health May 2012

More information

Introduction of Genome wide Complex Trait Analysis (GCTA) Presenter: Yue Ming Chen Location: Stat Gen Workshop Date: 6/7/2013

Introduction of Genome wide Complex Trait Analysis (GCTA) Presenter: Yue Ming Chen Location: Stat Gen Workshop Date: 6/7/2013 Introduction of Genome wide Complex Trait Analysis (GCTA) resenter: ue Ming Chen Location: Stat Gen Workshop Date: 6/7/013 Outline Brief review of quantitative genetics Overview of GCTA Ideas Main functions

More information

Effects of Stratification in the Analysis of Affected-Sib-Pair Data: Benefits and Costs

Effects of Stratification in the Analysis of Affected-Sib-Pair Data: Benefits and Costs Am. J. Hum. Genet. 66:567 575, 2000 Effects of Stratification in the Analysis of Affected-Sib-Pair Data: Benefits and Costs Suzanne M. Leal and Jurg Ott Laboratory of Statistical Genetics, The Rockefeller

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

White Paper Estimating Genotype-Specific Incidence for One or Several Loci

White Paper Estimating Genotype-Specific Incidence for One or Several Loci White Paper 23-01 Estimating Genotype-Specific Incidence for One or Several Loci Authors: Mike Macpherson Brian Naughton Andro Hsu Joanna Mountain Created: September 5, 2007 Last Edited: November 18, 2007

More information

Nature Genetics: doi: /ng Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1 Supplementary Figure 1 Replicability of blood eqtl effects in ileal biopsies from the RISK study. eqtls detected in the vicinity of SNPs associated with IBD tend to show concordant effect size and direction

More information

IN SILICO EVALUATION OF DNA-POOLED ALLELOTYPING VERSUS INDIVIDUAL GENOTYPING FOR GENOME-WIDE ASSOCIATION STUDIES OF COMPLEX DISEASE.

IN SILICO EVALUATION OF DNA-POOLED ALLELOTYPING VERSUS INDIVIDUAL GENOTYPING FOR GENOME-WIDE ASSOCIATION STUDIES OF COMPLEX DISEASE. IN SILICO EVALUATION OF DNA-POOLED ALLELOTYPING VERSUS INDIVIDUAL GENOTYPING FOR GENOME-WIDE ASSOCIATION STUDIES OF COMPLEX DISEASE By Siddharth Pratap Thesis Submitted to the Faculty of the Graduate School

More information

Variation in PNPLA3 is associated with outcomes. in alcoholic liver disease

Variation in PNPLA3 is associated with outcomes. in alcoholic liver disease Variation in PNPLA3 is associated with outcomes in alcoholic liver disease Chao Tian 1, Renee P. Stokowski 1, David Kershenobich 2, Dennis G. Ballinger 1,3, David A. Hinds 1 1. Perlegen, 2021 Stierlin

More information

A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES 1

A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES 1 The Annals of Applied Statistics 2017, Vol. 11, No. 4, 2027 2051 https://doi.org/10.1214/17-aoas1052 Institute of Mathematical Statistics, 2017 A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH

More information

Genome-wide Association Analysis Applied to Asthma-Susceptibility Gene. McCaw, Z., Wu, W., Hsiao, S., McKhann, A., Tracy, S.

Genome-wide Association Analysis Applied to Asthma-Susceptibility Gene. McCaw, Z., Wu, W., Hsiao, S., McKhann, A., Tracy, S. Genome-wide Association Analysis Applied to Asthma-Susceptibility Gene McCaw, Z., Wu, W., Hsiao, S., McKhann, A., Tracy, S. December 17, 2014 1 Introduction Asthma is a chronic respiratory disease affecting

More information

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John

More information

Dajiang J. Liu 1,2, Suzanne M. Leal 1,2 * Abstract. Introduction

Dajiang J. Liu 1,2, Suzanne M. Leal 1,2 * Abstract. Introduction A Novel Adaptive Method for the Analysis of Next- Generation Sequencing Data to Detect Complex Trait Associations with Rare Variants Due to Gene Main Effects and Interactions Dajiang J. Liu 1,2, Suzanne

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

MBG* Animal Breeding Methods Fall Final Exam

MBG* Animal Breeding Methods Fall Final Exam MBG*4030 - Animal Breeding Methods Fall 2007 - Final Exam 1 Problem Questions Mick Dundee used his financial resources to purchase the Now That s A Croc crocodile farm that had been operating for a number

More information

What can we contribute to cancer research and treatment from Computer Science or Mathematics? How do we adapt our expertise for them

What can we contribute to cancer research and treatment from Computer Science or Mathematics? How do we adapt our expertise for them From Bioinformatics to Health Information Technology Outline What can we contribute to cancer research and treatment from Computer Science or Mathematics? How do we adapt our expertise for them Introduction

More information

Memorial Sloan-Kettering Cancer Center

Memorial Sloan-Kettering Cancer Center Memorial Sloan-Kettering Cancer Center Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series Year 2007 Paper 14 On Comparing the Clustering of Regression Models

More information

Detecting Identity by Descent and Homozygosity Mapping in Whole-Exome Sequencing Data

Detecting Identity by Descent and Homozygosity Mapping in Whole-Exome Sequencing Data Detecting Identity by Descent and Homozygosity Mapping in Whole-Exome Sequencing Data Zhong Zhuang 1 *., Alexander Gusev 1., Judy Cho 3, Itsik e er 1,2 1 Department of Computer Science, Columbia University,

More information

Bayesian hierarchical modelling

Bayesian hierarchical modelling Bayesian hierarchical modelling Matthew Schofield Department of Mathematics and Statistics, University of Otago Bayesian hierarchical modelling Slide 1 What is a statistical model? A statistical model:

More information

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4. Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

Survey research (Lecture 1)

Survey research (Lecture 1) Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research 2012 CCPRC Meeting Methodology Presession Workshop October 23, 2012, 2:00-5:00 p.m. Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy

More information

Our Stage 1 genotype scan was performed using Illumina Human1 Beadarrays, which have a

Our Stage 1 genotype scan was performed using Illumina Human1 Beadarrays, which have a Supplementary Note Analysis of Stage 1 GWAS and design of the Stage 2 iselect array Our Stage 1 genotype scan was performed using Illumina Human1 Beadarrays, which have a gene-centric design, and Illumina

More information

2) Cases and controls were genotyped on different platforms. The comparability of the platforms should be discussed.

2) Cases and controls were genotyped on different platforms. The comparability of the platforms should be discussed. Reviewers' Comments: Reviewer #1 (Remarks to the Author) The manuscript titled 'Association of variations in HLA-class II and other loci with susceptibility to lung adenocarcinoma with EGFR mutation' evaluated

More information

Supplementary Figure 1. Quantile-quantile (Q-Q) plot of the log 10 p-value association results from logistic regression models for prostate cancer

Supplementary Figure 1. Quantile-quantile (Q-Q) plot of the log 10 p-value association results from logistic regression models for prostate cancer Supplementary Figure 1. Quantile-quantile (Q-Q) plot of the log 10 p-value association results from logistic regression models for prostate cancer risk in stage 1 (red) and after removing any SNPs within

More information

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Data Analysis in Practice-Based Research Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Multilevel Data Statistical analyses that fail to recognize

More information

Publications (* denote senior corresponding author)

Publications (* denote senior corresponding author) Publications (* denote senior corresponding author) 1. Sha Q, Zhang K, * Zhang SL (2016) A nonparametric regression approach to control for population stratification in rare variant association studies.

More information

TITLE: A Genome-wide Breast Cancer Scan in African Americans. CONTRACTING ORGANIZATION: University of Southern California, Los Angeles, CA 90033

TITLE: A Genome-wide Breast Cancer Scan in African Americans. CONTRACTING ORGANIZATION: University of Southern California, Los Angeles, CA 90033 Award Number: W81XWH-08-1-0383 TITLE: A Genome-wide Breast Cancer Scan in African Americans PRINCIPAL INVESTIGATOR: Christopher A. Haiman CONTRACTING ORGANIZATION: University of Southern California, Los

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis Basic Concept: Extend the simple regression model to include additional explanatory variables: Y = β 0 + β1x1 + β2x2 +... + βp-1xp + ε p = (number of independent variables

More information

Challenges in Developing Learning Algorithms to Personalize mhealth Treatments

Challenges in Developing Learning Algorithms to Personalize mhealth Treatments Challenges in Developing Learning Algorithms to Personalize mhealth Treatments JOOLHEALTH Bar-Fit Susan A Murphy 01.16.18 HeartSteps SARA Sense 2 Stop Continually Learning Mobile Health Intervention 1)

More information