BST227: Introduction to Statistical Genetics Lecture 11: Heritability from summary statistics & epigenetic enrichments Guest Lecturer: Caleb Lareau
Success of GWAS EBI Human GWAS Catalog
As of this morning EBI Human GWAS Catalog
Questions of the Post-GWAS Era Can we identify other traits that share a similar genetic basis for the specific phenotype? Can we identify the cell types most important for disease (e.g. schizophrenia) and other traits (e.g. height) where variants are acting?
Tackling the big post-GWAS questions Khan Academy; NIH Roadmap Website
Overview Part I: Omnigenic Model Part II: LD Score Regression (~1 hour) <break> Part III: Epigenetic enrichment of GWAS Part IV: Improving precision of epigenetic enrichments (~20 minutes)
Part I: Omnigenic Model
Questions: How many genes are important in a Mendelian disease (e.g. Sickle-Cell Disease)? How many genes are important in a non-Mendelian disease (e.g. schizophrenia)? How many genes are important in height?
Inflated summary statistics PGC 2014 Nature
Remove all green regions (+/- 1 Mb) PGC 2014 Nature
After removing all GWAS-hits PGC 2014 Nature
Omnigenic model Boyle et al. 2017 Cell
Low et al 2010 PLoS One; PGC 2014 Nature Contrasting Models Polygenic Omnigenic
Question: If the omnigenic model is true, which chromosome should have the most heritability?
Omnigenic model validation Shi et al., 2016 AJHG
Part II: LD Score Regression
LD Score Regression can 1. Accurately distinguish polygenicity over confounding 2. Estimate heritability from summary statistics 3. Identify traits that share a genetic basis all of which you need to discuss in your project so ask questions!
LD Score Regression can 1. Accurately distinguish polygenicity over confounding 2. Estimate heritability from summary statistics 3. Identify traits that share a genetic basis
Omnigenic association vs. confounding Inflation: Confounding: No Yes *Simulated Data Bulik-Sullivan 2015 Nature Genetics
Definitions A standard model for GWAS is: (recall: need standardization) Heritability can be defined: Heritability of a category C is: Finucane 2014 AJHG
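A sketch of these three definitions, in notation commonly used for LD score regression. It assumes a standardized phenotype y and standardized genotype columns of X; the symbols y, X, β, ε, and M (number of SNPs) are assumptions rather than notation taken from the slide.

```latex
y = X\beta + \varepsilon, \qquad
h^2 = \sum_{j=1}^{M} \beta_j^2, \qquad
h^2(C) = \sum_{j \in C} \beta_j^2
```

Standardizing the genotypes is what lets the squared per-SNP effects add up to a share of phenotypic variance.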
Polygenicity Polygenicity causes more chi-square statistic inflation in high LD regions than in low LD regions Finucane 2014 AJHG
Toy Illustration of the Genome Bulik-Sullivan 2015
Simulating a polygenic trait Bulik-Sullivan 2015
High-level overview 1. Separate the genome into bins 2. Compute the mean chi-squared statistic per bin 3. Compute the mean LD score per bin 4. Perform a regression of 2 & 3
LD Bins
LD Score Let C be the genomic bin of interest. LD Score for SNP j; χ² statistic for SNP j (copy on board) Traylor et al. 2014 PLoS Genetics
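The LD score of SNP j is usually defined as the sum of its squared correlations r² with the other SNPs. A minimal sketch of that sum, assuming a small genotype matrix that fits in memory and polymorphic SNPs only. The function name `ld_scores` and the `window` parameter (a stand-in for a distance window) are hypothetical, and this sketch uses the raw sample r² without the bias correction that real implementations typically apply.

```python
import numpy as np

def ld_scores(genotypes, window=None):
    """Return the LD score of each SNP: sum over SNPs k of r^2_jk.

    genotypes: (n_individuals, n_snps) array of allele counts (0/1/2).
    window:    hypothetical parameter; if set, only SNPs within `window`
               column positions of SNP j contribute to its score.
    """
    g = np.asarray(genotypes, dtype=float)
    # Standardize each SNP to mean 0 and variance 1 (assumes no monomorphic SNPs).
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    n = g.shape[0]
    # Squared pairwise correlations between SNPs.
    r2 = (g.T @ g / n) ** 2
    if window is not None:
        idx = np.arange(g.shape[1])
        mask = np.abs(idx[:, None] - idx[None, :]) <= window
        r2 = np.where(mask, r2, 0.0)
    # Each SNP's score includes its correlation with itself (r^2 = 1).
    return r2.sum(axis=1)
```

A SNP in a block of tightly linked neighbors gets a high score; an isolated SNP gets a score near 1.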
LD Score Regression each bin is a dot intercept is important Bulik-Sullivan et al 2015 Nature Genetics
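The four-step overview can be sketched as follows, assuming per-SNP χ² statistics and LD scores are already available and that bins are quantiles of LD score (so each bin is one dot). The function name `binned_ld_score_regression` and the default `n_bins` are hypothetical. This is an unweighted illustration, not the weighted per-SNP regression that production tools use.

```python
import numpy as np

def binned_ld_score_regression(chi2, ld, n_bins=50):
    """Regress mean chi-squared on mean LD score across LD-score bins.

    Returns (intercept, slope). Assumes len(chi2) >= n_bins.
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld = np.asarray(ld, dtype=float)
    # 1. Separate SNPs into bins of similar LD score.
    order = np.argsort(ld)
    bins = np.array_split(order, n_bins)
    # 2. Mean chi-squared statistic per bin.
    mean_chi2 = np.array([chi2[b].mean() for b in bins])
    # 3. Mean LD score per bin.
    mean_ld = np.array([ld[b].mean() for b in bins])
    # 4. Ordinary least-squares line through the bin means.
    slope, intercept = np.polyfit(mean_ld, mean_chi2, deg=1)
    return intercept, slope
```

An intercept near 1 with a positive slope points toward polygenicity; an intercept well above 1 points toward confounding.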
pause, review last slides if needed
Confounding (Population Stratification) Bulik-Sullivan et al 2015 Nature Genetics
No Confounding (Omnigenic) Bulik-Sullivan et al 2015 Nature Genetics
Intercept matters
Real GWAS PGC 2014 Nature
Bulik-Sullivan et al 2015 Nature Genetics
LD Score Regression can 1. Accurately distinguish polygenicity over confounding 2. Estimate heritability from summary statistics 3. Identify traits that share a genetic basis
LD Score Regression Slope: the slope is proportional to the heritability (write on the board)
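One way to write this relationship, following the form given in Bulik-Sullivan et al. 2015. Here N is the GWAS sample size, M the number of SNPs, ℓ_j the LD score of SNP j, and a the contribution of confounding; the symbol names are assumptions.

```latex
\mathbb{E}\left[\chi^2_j \mid \ell_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1
\quad\Longrightarrow\quad
\hat{h}^2 = \frac{M}{N}\,\widehat{\text{slope}}
```

With no confounding (a = 0) the intercept is 1, which is why the intercept separates polygenicity from confounding.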
Recall Lecture 9 requires genotypes!!!
Key point: LD Score regression can compute heritability using summary statistics Why might this be important?
From LD Hub ldsc.broadinstitute.org
LD Score Regression can 1. Accurately distinguish polygenicity over confounding 2. Estimate heritability from summary statistics 3. Identify traits that share a genetic basis
Pleiotropy Pleiotropy := the production by a single gene (or genes!) of two or more apparently unrelated phenotypes or traits.
Single Trait Bulik-Sullivan 2015
Two Traits Bulik-Sullivan 2015
Pleiotropy using LD Score Z_{1j} and Z_{2j} are the z-statistics of a single SNP j for two different traits Bulik-Sullivan et al., 2015 Nature Genetics
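In cross-trait LD score regression, the product of the two z-statistics is regressed on the LD score; the slope scales with the genetic covariance, and dividing by the two heritabilities gives a genetic correlation. A minimal unweighted sketch under those assumptions. The function name `genetic_correlation` and its arguments (`n1`, `n2` for sample sizes, `m` for the number of SNPs) are hypothetical, and it assumes both heritability estimates come out positive.

```python
import numpy as np

def genetic_correlation(z1, z2, ld, n1, n2, m):
    """Estimate genetic correlation from per-SNP z-statistics and LD scores."""
    z1, z2, ld = (np.asarray(a, dtype=float) for a in (z1, z2, ld))

    def h2(z, n):
        # Single-trait regression: slope of z^2 (chi-squared) on LD score is N*h2/M.
        slope, _ = np.polyfit(ld, z ** 2, 1)
        return slope * m / n

    # Cross-trait regression: slope of z1*z2 on LD score is sqrt(N1*N2)*rho_g/M.
    slope12, _ = np.polyfit(ld, z1 * z2, 1)
    rho_g = slope12 * m / np.sqrt(n1 * n2)
    # Normalize the genetic covariance by the two heritabilities.
    return rho_g / np.sqrt(h2(z1, n1) * h2(z2, n2))
```

Two traits with identical z-statistics give a value of 1; flipping the sign of one trait's z-statistics gives -1.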
Genetic Correlations Cor ≈ 0 vs. Cor ≈ 0.5 Bulik-Sullivan 2015
Many traits share a genetic basis! Bulik-Sullivan et al., 2015 Nature Genetics
LD Score isn't alone Bulik-Sullivan et al 2015 Nature Genetics
<break>
Part III: Epigenetic enrichment of GWAS
Epigenetics Encode Project Consortium 2012 Nature
What makes cells so different? NIH Roadmap Website
Epigenetic plots Buenrostro et al 2013 Nature Methods
Meyer and Liu 2014 Nature Reviews Genetics
Roadmap Project Roadmap Consortium 2015 Nature
Finding causal tissues for GWAS Intersecting with epigenetic annotations can find causal variants Intersecting GWAS with epigenetics can also find important tissues
Finding important tissue Encode Consortium 2012 Nature
Where is schizophrenia risk important? Boyle et al. 2017 Cell
Stratified LD Score Regression Regular LD Score Regression: Stratified LD Score Regression (sldsc): Finucane 2014 AJHG
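A sketch of the two regression equations side by side, in the form used in the stratified LD score regression literature. The symbols are assumptions: N is the sample size, M the number of SNPs, a the confounding term, τ_C the per-SNP heritability contribution of category C, and ℓ(j, C) the LD score of SNP j restricted to SNPs in C.

```latex
\begin{aligned}
\text{Regular:}\quad & \mathbb{E}\left[\chi^2_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1 \\
\text{Stratified:}\quad & \mathbb{E}\left[\chi^2_j\right] = N \sum_{C} \tau_C\,\ell(j, C) + N a + 1,
\qquad \ell(j, C) = \sum_{k \in C} r^2_{jk}
\end{aligned}
```

The stratified version is a multiple regression with one LD score column per annotation category, so each category gets its own coefficient.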
Stratifying the genome Encode Consortium 2012 Nature
Where is heritability localized? Finucane et al 2015 Nature Genetics
What cell types are important? Finucane et al 2015 Nature Genetics
LD Score Regression can 1. Accurately distinguish polygenicity over confounding 2. Estimate heritability from summary statistics 3. Identify traits that share a genetic basis 4. Identify cell types important for traits all of which you need to discuss in your project so ask questions!
Part IV: Improving precision of epigenetic enrichments
In collaboration with: Jacob Ulirsch (Harvard BBS Program); Martin Aryee, PhD (Massachusetts General Hospital); Erik Bao (Harvard Medical School); Jason Buenrostro, PhD (Broad Institute); Vijay Sankaran, MD, PhD (Boston Children's Hospital)
LD Score Regression gets us in the right zip code Finucane et al 2017 Nature Genetics
Accessibility peaks are not the same!
Main Question: Can we develop a methodology that accurately identifies the causal tissue for GWAS traits? Can we apply this approach to single cells?
Human Hematopoiesis
New method: g-chromVAR 1. Use quantitative genetic information about the core gene associations 2. Use quantitative epigenetic information about chromatin locations
Human hematopoietic traits are heritable (h²)
sldsc vs. g-chromVAR: reticulocyte count (-log10 p-value)
New method: g-chromVAR 1. Use quantitative genetic information about the core gene associations 2. Use quantitative epigenetic information about chromatin locations
g-chromVAR Results
Can we apply g-chromVAR to single cells?
Single Cell ATAC ~2,200 cells assayed
scATAC + g-chromVAR
Pseudotime
Platelet count single cell GWAS Enrichment
Ongoing efforts Pinpoint the precise cell types and stages of development where GWAS seems to matter most for a trait. Our approach, g-chromVAR, is more sensitive at distinguishing enrichments in closely related cell types.
More information EPI511 Offered Spring of 2019 Supplemental reading on the course webpage Homework 5, final projects will require running and interpreting LD Score Regression
Thanks! caleblareau@g.harvard.edu