Supplementary Figures

Similar documents
7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

Chromatin marks identify critical cell-types for fine-mapping complex trait variants

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Table S1. Total and mapped reads produced for each ChIP-seq sample

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

BIMM 143. RNA sequencing overview. Genome Informatics II. Barry Grant. Lecture In vivo. In vitro.

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Supplementary Figure 1. Quantile-quantile (Q-Q) plot of the log 10 p-value association results from logistic regression models for prostate cancer

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Nature Methods: doi: /nmeth.3115

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Experimental design and workflow utilized to generate the WMG Protein Atlas.

Figure S2. Distribution of acgh probes on all ten chromosomes of the RIL M0022

Supplemental Information For: The genetics of splicing in neuroblastoma

Supplementary Materials for

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Nature Genetics: doi: /ng Supplementary Figure 1

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

Supplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22.

Nature Structural & Molecular Biology: doi: /nsmb.2419

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

SUPPLEMENTARY FIGURE 1: f-i

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Cancer Informatics Lecture

Effects of UBL5 knockdown on cell cycle distribution and sister chromatid cohesion

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

Nature Genetics: doi: /ng.3731

ChIP-seq data analysis

Nature Genetics: doi: /ng Supplementary Figure 1

SUPPLEMENTARY APPENDIX

Global variation in copy number in the human genome

Supplemental Information. Genetic Regulation of Plasma Lipid Species. and Their Association with Metabolic Phenotypes

fl/+ KRas;Atg5 fl/+ KRas;Atg5 fl/fl KRas;Atg5 fl/fl KRas;Atg5 Supplementary Figure 1. Gene set enrichment analyses. (a) (b)

Ct=28.4 WAT 92.6% Hepatic CE (mg/g) P=3.6x10-08 Plasma Cholesterol (mg/dl)

MIR retrotransposon sequences provide insulators to the human genome

RNA interference induced hepatotoxicity results from loss of the first synthesized isoform of microrna-122 in mice

Nature Getetics: doi: /ng.3471

Expanded View Figures

New Enhancements: GWAS Workflows with SVS

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

Nature Genetics: doi: /ng Supplementary Figure 1. Somatic coding mutations identified by WES/WGS for 83 ATL cases.

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

SUPPLEMENTAL INFORMATION

Nature Genetics: doi: /ng Supplementary Figure 1. PCA for ancestry in SNV data.

Supplementary Figures

The Epigenome Tools 2: ChIP-Seq and Data Analysis

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Tutorial: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures

Predominant contribution of cis-regulatory divergence in the evolution of mouse alternative splicing

Supplementary information

MODULE 3: TRANSCRIPTION PART II

Supplementary Information

Supplementary information. Supplementary figure 1. Flow chart of study design

EXPression ANalyzer and DisplayER

Nature Genetics: doi: /ng Supplementary Figure 1. Country distribution of GME samples and designation of geographical subregions.

RNA SEQUENCING AND DATA ANALYSIS

SUPPLEMENTARY INFORMATION

Supplementary Information. Supplementary Figures

Heritability enrichment of differentially expressed genes. Hilary Finucane PGC Statistical Analysis Call January 26, 2016

SUPPLEMENTARY FIGURES: Supplementary Figure 1

Supplementary Figure 1 IL-27 IL

SUPPLEMENTARY INFORMATION

Supplementary Figures and Tables

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

An expanded view of complex traits: from polygenic to omnigenic

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY DATA. 1. Characteristics of individual studies

Data mining with Ensembl Biomart. Stéphanie Le Gras

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mrna-seq data

ARTICLE Genetic Risk Factors for Type 2 Diabetes: A Trans-Regulatory Genetic Architecture?

Analyse de données de séquençage haut débit

Lung Met 1 Lung Met 2 Lung Met Lung Met H3K4me1. Lung Met H3K27ac Primary H3K4me1

Exercises: Differential Methylation

Nature Medicine: doi: /nm.3967

Cluster Dendrogram. dist(cor(na.omit(tss.exprs.chip[, c(1:10, 24, 27, 30, 48:50, dist(cor(na.omit(tss.exprs.chip[, c(1:99, 103, 104, 109, 110,

Supplementary Information Methods Subjects The study was comprised of 84 chronic pain patients with either chronic back pain (CBP) or osteoarthritis

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space

Sirt1 Hmg20b Gm (0.17) 24 (17.3) 877 (857)

Supplementary Table 1. Association of rs with risk of obesity among participants in NHS and HPFS

ncounter TM Analysis System

Figure S1, Beyer et al.

Supplementary Materials for

Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression.

RNA-Seq Preparation Comparision Summary: Lexogen, Standard, NEB

SUPPLEMENTARY INFORMATION

Genome-wide Association Studies (GWAS) Pasieka, Science Photo Library

Transcription:

Supplementary Figures Supplementary Figure 1. Heatmap of GO terms for differentially expressed genes. The terms were hierarchically clustered using the GO term enrichment beta. Darker red, higher positive gene expression-trait association; darker blue, stronger negative association. Larger circle sizes represent more significant GO terms.

Supplementary Figure 2. Heat maps for GO terms for differential expression analyses without (a) (as in Figure 1c) and with (b) adjusting for fraction white blood cell and fraction lymph from the muscle tissue heterogeneity deconvolution analysis (see Methods). To facilitate comparison between the results of the two analyses, we show the same go terms in (b) as in (a). For each GO term and trait we see similar patterns of higher or lower gene expression with the trait for analyses with and without adjustment for muscle cell heterogeneity. The strength of associations shows some variability between the two analyses.

Supplementary Figure 3. Distance (kb) from the cis-eqtl to transcription start site (TSS) of the associated gene.

Supplementary Figure 4. Allelic bias in RNA-seq data and comparison with eqtl results. (a) Comparison of eqtl and ASE results in protein coding genes on the autosomes. To be tested for ASE the gene must have at least one SNP that passes filters, has 30x coverage, and is present in 10 samples. Significance is determined by Fisher s Combined Probability test and Storey s FDR correction (q-value 0.05). 94.57% of genes tested for an eqtl have a significant eqtl and 80.24% of genes tested for ASE have significant ASE suggesting almost all genes have genetic regulation. 407 genes were tested by ASE but not for an eqtl and 84.8% had significant ASE. (b) Volcano plot of genome-wise ASE in one muscle sample, M12001. The adjusted fraction reference allele uses a sample specific, allele pair specific expectation to adjust for residual reference allele alignment bias after filtering steps. Reference + Alternate allele count is restricted to those sites with 30x coverage.

Supplementary Figure 5. Consistency in FUSION (N=267 skeletal muscle samples) and GTEx (N=361 skeletal muscle samples) cis-eqtl results. Here we show the signed (based on the estimated regression coeffcient keyed to the same allele) log 10 (P-values) for the 95.5M matching cis-eqtl results across the studies. Note that the large majority of results have a consistent direction of effect.

Supplementary Figure 6. Original state assignments from our ChromHMM run (in each facet) were compared to chromatin state assignments in the same cell types published by Ernst et al. 2011 (a). Based on enrichment with previously published chromatin state calls, we re-assigned our states (b). The emission probabilities of our re-assigned chromatin state model are represented in (c).

Supplementary Figure 7. Enrichment of 0.1% FDR cis-eqtl in different skeletal muscle chromatin states (x-axis) using either all genes or genes split by mesi decile (y-axis). Note that the level of enrichment increases with mesi decile for the enhancer states and decreases for the polycomb repressed states.

Supplementary Figure 8. Stretch and typical enhancer fold enrichment (a) and significance (b) for overlapping eqtl from difference mesi bins or all genes across multiple cell/tissue types. For skeletal muscle there are 15,007 stretch and 54,878 typical enhancers.

Supplementary Figure 9. Skeletal muscle stretch, typical, and mid-size enhancer fold enrichment for overlapping skeletal muscle eqtl from different mesi bins. Stretch enhancer thresholds of 3.0kb, 4.2kb, and 6.2kb correspond to the previously defined thresholds of the top 10%, 5%, and 1% of enhancer lengths, respectively. Note that the enrichment trend is consistent regardless of the stretch enhancer threshold used. Typical ( 800bp; median length) and mid-size (>800bp and <3kb) enhancers are shown as controls and do not have progressively increasing enrichment trends like stretch enhancers.

Supplementary Figure 10. Skeletal muscle stretch (default 3.0kb threshold) and super enhancer base pair level overlap comparison.

Supplementary Figure 11. Super and typical enhancer (based on super enhancer definition) fold enrichment (a) and significance (b) for overlapping eqtl from difference mesi bins or all genes across multiple cell/tissue types. For skeletal muscle there are 618 super and 18,620 typical enhancers. This large difference in N allows for the skeletal muscle typical enhancers to have more significant p-values (b) even though the enrichment is lower (a) compared to skeletal muscle super enhancers.

Supplementary Figure 12. Signal correlation across frozen skeletal muscle ATAC-seq 10mg and 2mg input replicates. Peaks were called on individual replicates using MACS2 (see Methods) and then merged across each. Signal within each merged peak is quantified in RPKM units. Pearson s R = 0.815.

Supplementary Figure 13. Comparison of ATAC-seq peak calls in three cell types (columns) with chromatin states across diverse tissues (y-axis). Skeletal muscle (sample = HSM1398) combined replicate ATAC-seq peak calls (left column) show enrichment for skeletal muscle active chromatin states, which is more pronounced at TSS-distal (>5kb from TSS) regions. Adipose (middle column) and GM12878 (right column) ATAC-seq peak calls show similar trends with chromatin states from matched tissues.

Supplementary Figure 14. Enhancer length overlaps at the T2D GWAS tag SNP rs516946, which is the best cis-eqtl SNP for ANK1. The vertical dashed red line represents the default 3kb length threshold for calling stretch enhancers.

Supplementary Figure 15. Chromatin states overlapping the ANK1 cis-eqtl T2D GWAS tag SNP rs516946 and those in close (r 2 0.8) LD across diverse cells/tissues. Note that rs508419 overlaps an active promoter state in skeletal muscle and HSMM and is flanked by strong enhancer states.

rs508419 A(non-risk)/G(risk) Probe A HSMM (Nuclear Extract) Competition Lane 1 A + 2 A + A + A G 3 4 G - 5 G + 6 G + G + G A 7 8 Probe A A A A G G G C2C12 (Nuclear Extract) Competition Lane 1 + + + - + + + 2 A 3 G 4 6 G 7 A 8 5 G Supplementary Figure 16. EMSA results using human (HSMM; left) and mouse (C2C12; right) myoblast cell nuclear extract.

Supplementary Figure 17. We performed an exonqtl analysis on all exonic parts for GENCODE V19 transcripts (see Methods) and show the results for the four short ANK1 isoforms. Note that exonic part 10 is diagnostic for ENST00000522543.1 and shows decreased splicing with risk allele dosages. These results are consistent sqtl results we report in Figs. 5bc and with the RT-qPCR (Supplementary Figs. 18 and 19) and ddpcr (Supplementary Fig. 20).

Supplementary Figure 18. RT-qPCR results for ANK1 isoforms. Twelve samples were assayed, four each from each of the three possible genotype classes at rs508149 (risk allele dosage of 0, 1, or 2). Values are expressed as the inverse of the difference between the Ct values of the target and control (EMC7) genes, i.e. Ct(EMC7) - Ct(target); therefore, greater values indicate higher expression.

Supplementary Figure 19. RT-qPCR results for ANK1 isoforms for each of the three possible genotype classes at rs508149 (risk allele dosage of 0, 1, or 2). Values are expressed as the inverse of the difference between the Ct values of the target and control (EMC7) genes, i.e. Ct(EMC7) - Ct(target); therefore, greater values indicate higher expression.

Supplementary figure 20. Results of ddpcr quantification for 12 samples, 4 each with the three possible risk allele dosages. Values are expressed as normalized concentrations and were computed by first subtracting the median of the no-template control and then dividing by the median of the constant assay (NUCKS1) within each sample.

Supplementary Figure 21. Scatter plot of the first 2 principal components on the expression matrix for nuclear genome genes, before (left) and after (right) PEER adjustment. Color-labeling by sequencing batch revealed the presence of systematic batch effects (left). Principal components analysis after normalization using the PEER framework demonstrates removal of batch effects (right).

Supplementary Figure 22. cis-eqtl analyses using a diverse number of PEER factors reveals a plateau in the number of significant eqtl associations (FDR=5%) at 60, which is the number we used in our final analysis.

Analysis Differential Expression e-qtl Total NGT IFG IGT T2D Total N 271 97 36 71 67 267 OGTT Status( n(%)) NGT 97 (35.8%) 97(100%) 0 (0%) 0 (0%) 0 (0%) 97 (36.3%) IFG 36 (13.3%) 0 (0%) 36 (100%) 0 (0%) 0 (0%) 35 (13.1%) IGT 71 (26.2%) 0 (0%) 0 (0%) 71 (100%) 0 (0%) 69 (25.8%) T2D 67 (24.7%) 0 (0%) 0 (0%) 0 (0%) 67 (100%) 66 (24.7%) Sex (N, %) 160 M (59.0%) / 111 F (41.0%) 46 M (47.4%) / 51 F (52.6%) 24 M (66.7%) / 12 F (33.3%) 38 M (53.5%) / 33 F (46.5%) 52 M (77.6%) / 15 F (22.4%) 157 M (58.8%) / 110 F (41.2%) Age (mean(sd)), y 60.1 (7.0) 58.1 (7.0) 57.8 (7.4) 62.9 (5.3) 61.1 (7.3) 60.1 (7.0) Fasting glucose (mean(sd)), mmol/l 6.26 (0.77) 5.61 (0.33) 6.42 (0.28) 6.19 (0.54) 7.17 (0.65) 6.26 (0.78) Fasting insulin (mean(sd)), mu/l 8.69 (5.31) 6.72 (3.32) 7.57 (3.46) 10.07 (5.60) 10.67 (6.87) 8.65 (5.27) WHR (mean(sd)) 0.950 (0.077) 0.920 (0.077) 0.945 (0.071) 0.954 (0.073) 0.990 (0.067) 0.949 (0.078) BMI (mean(sd)), kg/m^2 27.6 (4.2) 26.2 (3.4) 26.4 (3.4) 28.3 (3.8) 29.4 (5.0) 27.6 (4.2) Supplementary Table 1. Summary statistics for individuals that participated in this study.

Genes/Exons/Isoforms Tested b Genic Analysis eqtl Exon Analysis eqtl d Isoform FPKM Analysis eqtl Isoform % of Category with % of Category with % of Category Gene Type Category a Genes Exons FPKM Genes c eqtl Exons c eqtl Transcripts c with eqtl protein_coding 15,449 353,596 63,158 14,479 93.6 269,871 76.3 34,836 55.2 pseudogene 2,147 14,528 792 1,747 80.5 11,233 77.3 433 54.7 antisense 1,667 7,398 7,597 1,535 92.0 5,878 79.5 3,755 49.4 lincrna 1,561 6,715 5,536 1,437 91.5 5,330 79.4 2,702 48.8 processed_transcript 244 8,755 845 229 93.9 7,022 80.2 435 51.5 sense_intronic 212 1,319 643 186 87.7 1,037 78.6 320 49.8 sense_overlapping 92 1,252 179 84 91.3 993 79.3 109 60.9 Total 21,420 393,563 78,750 19,697 92.0 301,364 76.6 42,590 54.1 Supplementary Table 2. Count of genes, exons, and isoforms with at least 1 significant eqtl, stratified by gene type. a Gene type from GENCODE v19; b The total number of gene, exons and isoforms tested for eqtls (genes with at least 1 tested SNP); c Number of genes, exons, or isoforms with 1 eqtl; d Exons that are annotated to more than one gene are included once in the table for each annotate gene category.