Statistical Genetics. Matthew Stephens. Statistics Retreat, October 26th 2012


Two Stories
I. The two most influential statistical ideas in the analysis of genetic association studies.¹
II. Sequence, sequence, everywhere.
¹ With apologies to Steve Stigler.

Story I: Genetic Association Studies
Genetic association studies aim to identify genetic variants that modify the risk of common diseases or affect other phenotypes (e.g. type 1 diabetes, height, LDL cholesterol). The idea is absurdly simple: measure genetic variants (usually SNPs) and phenotypes in randomly sampled individuals, and see which SNPs are correlated with the phenotypes.

Story I: Genetic Association Studies
Typical recent genome-wide studies have typed 500K to 1M SNPs in thousands of (unrelated) phenotyped individuals. Basic analysis: test each SNP, one by one, for statistical association with each phenotype.
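A minimal sketch of this basic analysis, assuming a quantitative phenotype and an additive model (each genotype coded as 0, 1 or 2 copies of one allele): every SNP is tested separately by simple linear regression, with a normal approximation for the p-value. The function names and the simulated data are hypothetical; case/control phenotypes would instead call for, e.g., logistic regression or an allelic test.

```python
import math
import random


def regression_test(x, y):
    """Least-squares slope of y on x, its standard error, and a two-sided
    p-value from a normal approximation (adequate for thousands of samples)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:  # monomorphic SNP carries no information about an effect
        return 0.0, math.inf, 1.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


def association_scan(genotypes, phenotype):
    """genotypes[j][k]: copies of the coded allele (0, 1 or 2) at SNP j in individual k.
    Returns one (beta, se, p) triple per SNP, tested one by one."""
    return [regression_test(snp, phenotype) for snp in genotypes]


# Hypothetical simulated data: 2,000 individuals, 50 unlinked SNPs with allele
# frequency 0.3; only SNP 7 (0-based) affects the phenotype.
rng = random.Random(1)
n_ind, n_snp = 2000, 50
genotypes = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_ind)]
             for _ in range(n_snp)]
phenotype = [0.5 * genotypes[7][k] + rng.gauss(0, 1) for k in range(n_ind)]

results = association_scan(genotypes, phenotype)
top_snp = min(range(n_snp), key=lambda j: results[j][2])
print(top_snp, results[top_snp][2])
```

With a real effect of this size the causal SNP stands out by many orders of magnitude; with 1M SNPs, however, the smallest null p-values also become very small, which is why a stringent genome-wide threshold is used.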

Progress identifying variants underlying common disease
[Figure: published genome-wide associations through 09/2011: 1,617 published GWA at p ≤ 5 × 10⁻⁸ for 249 traits. Source: NHGRI GWA Catalog, www.genome.gov/gwastudies. Credit:]
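The threshold p ≤ 5 × 10⁻⁸ is commonly motivated as a Bonferroni correction: to keep the chance of even one false positive across the genome (the family-wise error rate) at about 0.05, divide by the number of tests m. Here m is taken, as a simplifying assumption, to be about one million effectively independent SNPs, the same order as the 500K to 1M SNPs typed by recent studies.

```latex
\alpha_{\text{per SNP}} = \frac{\alpha_{\text{genome-wide}}}{m} \approx \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```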

The two most influential statistical ideas in GWAS
1. Correction for unmeasured confounding (population structure).
2. Imputation to combine studies.

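Population Structure and Unmeasured Confounding
The problem in a nutshell: what would happen if you conducted a genetic association study of chopstick use in San Francisco? Any SNP whose allele frequency differs between people of East Asian and European ancestry would tend to appear associated with chopstick use, even though it has no effect on it: ancestry influences both the genotype and the trait, and so confounds the association.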
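The confounding can be reproduced in a small simulation. This is a hypothetical sketch, not data from any study: two populations, A and B, differ both in the frequency of a SNP allele and in the rate of chopstick use, while the SNP itself has no effect on the trait. An allelic test on the pooled sample finds a strong association; within each population it does not. All names and parameter values are illustrative assumptions.

```python
import math
import random


def allelic_test_p(group1, group2):
    """Two-sided p-value for a difference in allele frequency between two groups of
    genotypes (0/1/2 allele counts), treating each individual as two alleles."""
    n1, n2 = 2 * len(group1), 2 * len(group2)
    f1, f2 = sum(group1) / n1, sum(group2) / n2
    pooled = (sum(group1) + sum(group2)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return math.erfc(abs(f1 - f2) / se / math.sqrt(2))


def simulate(n_per_pop, allele_freq, p_chopsticks, rng):
    """(genotype, uses_chopsticks, population) per person. The SNP has NO effect on
    chopstick use: both depend only on the population."""
    people = []
    for pop in ("A", "B"):
        for _ in range(n_per_pop):
            genotype = (rng.random() < allele_freq[pop]) + (rng.random() < allele_freq[pop])
            uses = rng.random() < p_chopsticks[pop]
            people.append((genotype, uses, pop))
    return people


def chopstick_p_value(people):
    """Compare allele frequencies between chopstick users and non-users."""
    users = [g for g, uses, _ in people if uses]
    non_users = [g for g, uses, _ in people if not uses]
    return allelic_test_p(users, non_users)


rng = random.Random(2012)
people = simulate(1000, {"A": 0.8, "B": 0.2}, {"A": 0.9, "B": 0.1}, rng)
p_pooled = chopstick_p_value(people)
p_within_a = chopstick_p_value([person for person in people if person[2] == "A"])
p_within_b = chopstick_p_value([person for person in people if person[2] == "B"])
print(p_pooled, p_within_a, p_within_b)
```

Stratifying removes the spurious signal here only because the population labels are known; the harder case is when they are not.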

Population Structure and Unmeasured Confounding
If you know the genetic background of the individuals in your study (e.g. which continent they inherited their genes from), then you can correct for it. What if you don't know it?

Principal Components Analysis to the rescue! (Novembre et al., Nature, 2008)
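A rough sketch of how PCA can recover unknown genetic background, under simplifying assumptions: genotypes are standardized SNP by SNP, and the leading principal component (one score per individual) is found by power iteration on X Xᵀ. The simulated two-population data and all names are hypothetical; tools used in practice (such as EIGENSTRAT, from Price et al.) differ in details such as the normalization and the number of components kept.

```python
import math
import random


def standardize_columns(genotypes):
    """Center each SNP (column) and scale it to unit variance; monomorphic SNPs become 0."""
    n, m = len(genotypes), len(genotypes[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [genotypes[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((c - mean) ** 2 for c in col) / n)
        if sd > 0:
            for i in range(n):
                out[i][j] = (col[i] - mean) / sd
    return out


def top_principal_component(X, iterations=50, seed=0):
    """Leading eigenvector of X X^T (one score per individual) by power iteration."""
    n, m = len(X), len(X[0])
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(X[i][j] * u[i] for i in range(n)) for j in range(m)]  # X^T u
        u = [sum(row[j] * w[j] for j in range(m)) for row in X]  # X (X^T u)
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
    return u


# Hypothetical data: 50 individuals from each of two populations, 150 SNPs whose
# allele frequencies were drawn independently in each population.
rng = random.Random(2008)
n_snp = 150
freqs = [(rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95)) for _ in range(n_snp)]
labels = [0] * 50 + [1] * 50
genotypes = [[(rng.random() < freqs[j][pop]) + (rng.random() < freqs[j][pop])
              for j in range(n_snp)] for pop in labels]

pc1 = top_principal_component(standardize_columns(genotypes))
# The sign of an eigenvector is arbitrary, so score agreement either way round.
agree = sum((score > 0) == (pop == 1) for score, pop in zip(pc1, labels)) / len(labels)
accuracy = max(agree, 1 - agree)
print(accuracy)
```

The population labels are used only to check the result; the PC scores themselves are computed from genotypes alone.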

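Principal Components Analysis to the rescue!
Test for the significance of the genetic effect β while controlling for the effects of genetic background (α):
y = vα + xβ + ε
where y is the phenotype, x is the genotype at the SNP being tested, and v holds the leading principal components of the genotype data, which capture each individual's genetic background (Price et al., Nature Genetics, 2006).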

The two most influential statistical ideas in GWAS
1. Correction for unmeasured confounding (population structure).
2. Imputation to combine studies.
Credit: Bryan Howie

Genotype imputation background
[Figure, three build steps: (1) reference haplotypes (alleles coded 0/1) above phenotyped GWAS samples (genotypes coded 0/1/2) at the SNPs genotyped on an array; (2) the untyped SNPs in the GWAS samples, shown as "?"; (3) the untyped SNPs filled in from the reference haplotypes, revealing an association signal.]
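To make the idea concrete, here is a deliberately simplified toy, not the method used by real imputation software: for one GWAS sample, find the pair of reference haplotypes whose allele sums best match the typed SNPs, and read the untyped genotypes off that pair. Practical methods such as IMPUTE2 instead use hidden Markov models in which each sample's haplotypes are mosaics of the reference haplotypes, and they report genotype probabilities rather than a single best guess. The function name and the data are hypothetical.

```python
from itertools import combinations_with_replacement


def impute_best_pair(genotype, reference):
    """Toy imputation for one sample. `genotype` holds allele counts (0, 1, 2) with
    None at untyped SNPs; `reference` holds haplotypes as lists of 0/1 alleles.
    Picks the pair of reference haplotypes whose sums best match the typed SNPs
    (fewest allele mismatches; the first pair found wins ties) and copies its sums
    into the gaps."""
    typed = [j for j, g in enumerate(genotype) if g is not None]
    best_pair, best_cost = None, None
    for h1, h2 in combinations_with_replacement(reference, 2):
        cost = sum(abs(h1[j] + h2[j] - genotype[j]) for j in typed)
        if best_cost is None or cost < best_cost:
            best_pair, best_cost = (h1, h2), cost
    h1, h2 = best_pair
    return [g if g is not None else h1[j] + h2[j] for j, g in enumerate(genotype)]


# Hypothetical reference panel: 4 haplotypes at 6 SNPs.
reference = [
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0],
]
# One GWAS sample typed at SNPs 1, 3, 4 and 6; SNPs 2 and 5 were not on the array.
imputed = impute_best_pair([1, None, 1, 1, None, 1], reference)
print(imputed)  # → [1, 1, 1, 1, 2, 1]
```

Only the first and third reference haplotypes match the typed SNPs exactly, so the untyped positions are filled with their sums (1 and 2).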

Imputation facilitates meta-analysis
[Figure, three build steps: reference haplotypes alongside two studies, GWAS 1 and GWAS 2; after both are imputed against the same reference haplotypes, the studies can be analyzed together, revealing an association signal.]
Examples: type 1 diabetes, Cooper et al., Nov 2008 (Nature Genetics); type 2 diabetes, Zeggini et al., May 2008 (Nature Genetics); Crohn's disease, Barrett et al., Aug 2008 (Nature Genetics).
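Once both studies report effect estimates for the same imputed SNP, one common way to combine them is a fixed-effects, inverse-variance-weighted meta-analysis. This is a sketch with hypothetical numbers; it assumes both studies code the same allele and estimate the same underlying effect, and it is not necessarily the method used in the papers cited above.

```python
import math


def fixed_effects_meta(estimates):
    """Inverse-variance-weighted fixed-effects meta-analysis for one SNP.
    `estimates` is a list of (beta, se) pairs, one per study, all for the same allele.
    Returns the combined beta, its standard error and a two-sided normal p-value."""
    weights = [1 / s ** 2 for _, s in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


# Hypothetical per-study results (beta, se) at one imputed SNP.
gwas_1 = (0.10, 0.04)
gwas_2 = (0.12, 0.05)
beta, se, p = fixed_effects_meta([gwas_1, gwas_2])
print(round(beta, 4), round(se, 4), p)
```

With these numbers neither study alone reaches p < 0.01 (z = 2.5 and 2.4), but the combined estimate does, because pooling shrinks the standard error.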

Story II: Sequence, Sequence, Everywhere

Sequencing Assays and Statistical Challenges
Although DNA sequencing is best known for obtaining genome sequences, it is now routinely used to measure cellular processes, to try to understand how cells operate. For example: gene expression (RNA-seq); chromatin openness (DNase-seq); transcription factor binding (ChIP-seq); histone modifications (ChIP-seq). A key question is how and why cells differ from one another (they share the same DNA!).

Chromatin and DNA structure
[Figure from Felsenfeld and Groudine, Nature, 2003]

The Data
The basic structure of these assays is the same:
1. Do something clever to get the bits of DNA that you want (e.g. the bits that contact a modified histone, or the bits that are bound by a particular transcription factor).
2. Sequence these bits (producing millions of little sequences).
3. Work out where in the genome each sequence came from.
The number of sequences coming from each location (usually 0 or 1) is a measure of the intensity of the process at that location. Basic model: an inhomogeneous Poisson process, x_ib ~ Poi(λ_ib), where x_ib is the number of sequences from sample i mapping to location b and λ_ib is the underlying intensity.
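A small simulation of the basic model, under illustrative assumptions: per-base intensities λ_b along a hypothetical 5,000-base region (a low background with one enriched stretch), per-base counts drawn as Poisson, then pooled into 100-base bins. If λ is assumed constant within a bin, its maximum-likelihood estimate is the bin count divided by the bin width. The rates, region and names are hypothetical, and only one sample (a single i) is simulated.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def bin_counts(counts, width):
    """Sum per-base counts into consecutive bins of `width` bases."""
    return [sum(counts[start:start + width]) for start in range(0, len(counts), width)]


# Hypothetical intensity along a 5,000-base region: a low background rate with one
# enriched stretch (bases 2,000 to 2,499), e.g. where a modified histone sits.
rng = random.Random(26)
length, width = 5000, 100
rates = [0.3 if 2000 <= b < 2500 else 0.01 for b in range(length)]
x = [poisson_sample(lam, rng) for lam in rates]  # x_b ~ Poi(lambda_b): mostly 0, sometimes 1
lambda_hat = [c / width for c in bin_counts(x, width)]  # MLE if lambda is constant within a bin
print(max(x), lambda_hat[18:27])
```

Binning trades resolution for stability: single-base counts are almost all 0 or 1, while 100-base bins make the enriched stretch obvious.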

Example: Histone Modification H3K4me1
Can you spot the difference?
[Figure: H3K4me1 signal in two panels, left ventricle and right ventricle, over genomic positions 32,230,000 to 32,290,000 (vertical axis 0 to 0.08). Data from Scott Smemo, Nobrega lab.]
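One simple way to ask whether two samples differ in a given bin, offered as a baseline sketch rather than the approach proposed in this talk: if the counts are independent Poisson with intensities proportional to each sample's sequencing depth, then, conditional on the bin total n, the first sample's count is Binomial(n, d₁/(d₁ + d₂)), which gives an exact test per bin. The counts and depths are hypothetical, and no correction for testing many bins is applied.

```python
import math


def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    probs = [math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def compare_bins(counts1, counts2, depth1, depth2):
    """Per-bin test of equal depth-normalized Poisson intensity in two samples.
    If x1 ~ Poi(depth1 * mu) and x2 ~ Poi(depth2 * mu) independently, then given
    n = x1 + x2, x1 ~ Binomial(n, depth1 / (depth1 + depth2))."""
    p0 = depth1 / (depth1 + depth2)
    return [binom_two_sided_p(a, a + b, p0) for a, b in zip(counts1, counts2)]


# Hypothetical read counts in six bins for two tissues sequenced to the same depth.
left_ventricle = [10, 12, 60, 11, 9, 10]
right_ventricle = [11, 10, 15, 12, 10, 9]
p_values = compare_bins(left_ventricle, right_ventricle, 1_000_000, 1_000_000)
print([round(p, 4) for p in p_values])
```

Testing bins independently ignores the spatial structure visible in the figure (neighboring bins tend to change together), which is one motivation for multiscale methods.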

Advertisement: STAT 45800
We have preliminary ideas and methods for dealing with these data, based on wavelets for count data (work with H. Shim). In STAT 45800 we will try crowd-sourcing these ideas, to see how much further progress we can make. Aim: to combine expertise in bioinformatics, computing, biology and statistics, to make more progress together than any of us could alone!
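The slide does not describe the method itself. As a hedged illustration of the general idea behind wavelet-style models for counts, the sketch below computes a Haar-like multiscale decomposition: adjacent bins are summed repeatedly up to the region total, and at each node the left child's count is recorded. If the counts are independent Poisson, each left count given its parent is binomial, so the left/parent proportions are the quantities a smoothing method could shrink. This is an assumption-laden sketch, not the method developed with H. Shim; all names are hypothetical.

```python
def multiscale_decompose(counts):
    """Haar-style multiscale decomposition of a count vector whose length is a power of 2.
    Returns (total, lefts, parents): at each scale (finest first), lefts[s][i] is the
    count in the left half of node i and parents[s][i] the count in the whole node.
    For independent Poisson counts, lefts[s][i] given parents[s][i] is Binomial with
    success probability lambda_left / (lambda_left + lambda_right)."""
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of 2")
    lefts, parents = [], []
    current = list(counts)
    while len(current) > 1:
        lefts.append(current[0::2])
        current = [a + b for a, b in zip(current[0::2], current[1::2])]
        parents.append(current)
    return current[0], lefts, parents


def multiscale_reconstruct(total, lefts):
    """Invert the decomposition: split each parent count into (left, parent - left)."""
    current = [total]
    for level in reversed(lefts):
        current = [c for parent, left in zip(current, level) for c in (left, parent - left)]
    return current


counts = [0, 1, 0, 0, 3, 4, 5, 2]  # hypothetical binned read counts
total, lefts, parents = multiscale_decompose(counts)
# Empirical left-child proportions (None where the parent count is 0).
proportions = [[left / parent if parent else None for left, parent in zip(ls, ps)]
               for ls, ps in zip(lefts, parents)]
print(total, proportions)
```

The decomposition is lossless (the total plus the left counts recover the data exactly), so any shrinkage of the proportions maps back to a smoothed intensity estimate.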

Acknowledgements
Bryan Howie, Heejung Shim. Funding: NHGRI, NIH GTEx project, and NIH ENDGAME consortium.