ChIP-seq data analysis
1 ChIP-seq data analysis Harri Lähdesmäki Department of Computer Science Aalto University November 24, 2017
2 Contents Background ChIP-seq protocol ChIP-seq data analysis
3 Transcriptional regulation Transcriptional regulation is largely controlled by protein-DNA interactions. Figure labels: Chromatin, Distal TFBS, Co-activator complex, Transcription initiation complex, Transcription initiation, CRM, Proximal TFBS. Figure from (Wasserman & Sandelin, 2004)
5 Protein-DNA binding A transcription factor is a protein that binds to specific DNA sequences, recruits other co-factors and controls gene transcription Can function alone or with other proteins Can activate or repress gene expression Transcription factors contain DNA-binding domain(s) (DBDs)
6 Protein-DNA binding Proteins' DNA-binding specificities are encoded in their DNA-binding domains, which specialize them to recognize and bind specific types of binding sites. Figure from (Kissinger et al., 1990)
8 Modeling transcriptional regulation The goal: An accurate method to quantify protein-DNA interactions. Challenges: The human genome contains about 3.2 billion (!) nucleotides. The human genome is physically about 2 meters long, packed in a cell with an average diameter of 10 µm. Protein-DNA binding can be studied using e.g.: Probabilistic models for biological sequences. Biological experiments + statistical analysis: ChIP-seq, protein binding microarray, high-throughput SELEX, chromatin accessibility. Biophysics: all-atom modeling.
9 Contents Background ChIP-seq protocol ChIP-seq data analysis
10 ChIP-seq So, for any given condition, how do we find the genomic locations where DNA binding proteins bind? Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the current state-of-the-art method The basic principle: Use a specific antibody to detect a protein of interest (or a post-translationally modified histone protein in nucleosomes) ChIP-seq procedure enriches DNA fragments that are bound to these proteins or nucleosomes
11 ChIP-seq protocol ChIP-seq steps: Crosslink DNA-binding proteins with DNA in vivo Shear the chromatin into small fragments ( bp) amenable for sequencing (sonication) Immunoprecipitate the DNA-protein complex with a specific antibody Reverse the crosslinks Assay enriched DNA to determine the sequences bound by the protein of interest Figure from (Visel et al., 2009)
12 ChIP-seq protocol again Figure from (Zhang et al., 2011) (Biometrics, March 2011, p. 152). Figure 1. Details of a ChIP-seq experiment. A DNA-binding protein is cross-linked to its in vivo genomic DNA targets, and the chromatin (a complex of DNA and protein) is isolated (1). The DNA with the bound proteins is extracted from the cells,
13 A data view of protein-DNA binding

[Figure schematic, from (Park, 2009): a protein or nucleosome of interest bound to DNA, with the positive strand (5' to 3') and the negative strand (3' to 5'). The 5' ends of fragments are sequenced. Short reads are aligned to the reference genome. The distribution of tags is computed. A profile is generated from the combined tags: for example, each mapped location is extended with a fragment of estimated size, and the fragments are added. Peak identification can be performed on either profile.]

Figure 5 Strand-specific profiles at enriched sites. DNA fragments from a chromatin immunoprecipitation experiment are sequenced from the 5' end. Therefore, the alignment of these tags to the genome results in two peaks (one on

Poisson model: A probability distribution that is often used to model the number of random events in a fixed interval. Given an average number of events in the interval, the probability of a given number of occurrences can be calculated.

Text visible on the figure page (Park, 2009):

... alignment, as all subsequent results are based on the aligned reads. Owing to the large number of reads, the use of conventional alignment algorithms can take hundreds or thousands of processor hours; therefore, a new generation of aligners has been developed, and more are expected soon. Every aligner is a balance between accuracy, speed, memory and flexibility, and no aligner can be best suited for all applications. Alignment for ChIP-seq should allow for a small number of mismatches due to sequencing errors, SNPs and indels or the difference between the genome of interest and the reference genome. This is simpler than in RNA-seq, for example, in which large gaps corresponding to introns must be considered. Popular aligners include: Eland, an efficient and fast aligner for short reads that was developed by Illumina and is the default aligner on that platform; Mapping and Assembly with Qualities (MAQ), a widely used aligner with a more exhaustive algorithm and excellent capabilities for detecting SNPs; and Bowtie, an extremely fast mapper that is based on an algorithm that was originally developed for file compression. These methods use the quality score that accompanies each base call to indicate its reliability. For the SOLiD di-base sequencing technology, in which two consecutive bases are read at a time, modified aligners have been developed. Many current analysis pipelines discard non-unique tags, but studies involving the repetitive regions of the genome require careful handling of these non-unique tags.

Identification of enriched regions. After sequenced reads are aligned to the genome, the next step is to identify regions that are enriched in the ChIP sample relative to the control with statistical significance. Several peak callers that scan along the genome to identify the enriched regions are currently available. In early algorithms, regions were scored by the number of tags in a window of a given size and then assessed by a set of criteria based on factors such as enrichment over the control and minimum tag density. Subsequent algorithms take advantage of the directionality of the reads. As shown in FIG. 5, the fragments are sequenced at the 5' end, and the locations of mapped reads should form two distributions, one on the positive strand and the other on the negative strand, with a consistent distance between the peaks of the distributions. In these methods, a smoothed profile of each strand is constructed and the combined profile is calculated either by shifting each distribution towards the centre or by extending each mapped position into an appropriately oriented fragment and then adding the fragments together. The latter approach should result in a more accurate profile with respect to the width of the binding, but it requires an estimate of the fragment size as well as the assumption that fragment size is uniform.

Given a combined profile, peaks can be scored in several ways. A simple fold ratio of the signal for the ChIP sample relative to that of the control sample around the peak (FIG. 3B) provides important information, but it is not adequate. A fold ratio of 5 estimated from 50 and 10 tags (from the ChIP and control experiments, respectively) has a different statistical significance to the same ratio estimated from, for example, 500 and 100 tags. A Poisson model for the tag distribution is an effective
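The fragment-extension approach described in the text above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `extended_profile`, the toy tag positions and the uniform fragment size of 60 bp are all hypothetical, and real peak callers work on far larger data with smoothing.

```python
def extended_profile(tags, fragment_size, genome_length):
    """Combined coverage profile built by extending each tag into a fragment.

    tags: iterable of (five_prime_position, strand) with strand '+' or '-'.
    A '+' tag is extended towards higher coordinates, a '-' tag towards
    lower coordinates, so both point at the bound protein.
    """
    profile = [0] * genome_length
    for pos, strand in tags:
        if strand == "+":
            start, end = pos, pos + fragment_size
        else:
            start, end = pos - fragment_size + 1, pos + 1
        for i in range(max(start, 0), min(end, genome_length)):
            profile[i] += 1
    return profile


# Toy data (hypothetical): a binding site near position 100.
tags = [(70, "+"), (75, "+"), (80, "+"), (130, "-"), (125, "-"), (120, "-")]
profile = extended_profile(tags, fragment_size=60, genome_length=200)

# Take the middle of the highest plateau as the summit.
peak_height = max(profile)
plateau = [i for i, v in enumerate(profile) if v == peak_height]
summit = (plateau[0] + plateau[-1]) // 2
print(peak_height, summit)
```

The two strand-specific tag clusters sit on either side of the site, and adding the oriented fragments produces a single peak between them.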
14 Contents Background ChIP-seq protocol ChIP-seq data analysis
16 Identification of binding sites from ChIP-seq data First steps in ChIP-seq data analysis: quality control, short read alignment. Given a read density (or read densities on both strands) along the genome, the actual data analysis task involves identification of the protein binding sites. Given the above information, we should expect to see two signal peaks on opposite DNA strands within a proper distance. This analysis is often called peak detection. But how much signal (how many reads) is considered enough to call a protein-DNA interaction site? What affects the signal strength? Protein binding in the first place. Sequencing efficiency and depth. Chromatin accessibility. Fragmentation efficiency. Mappability (i.e., uniqueness) of a local genomic region.
17 ChIP-seq controls The best way to assess significance of a signal at putative binding sites is to use a control for ChIP-seq Input-DNA: sequencing data of the genomic DNA. More/Less accessible regions are sequenced more/less ChIP-seq experiment with an unspecific antibody which does not detect any specific protein ChIP-seq controls can be used to account for many of the biases which affect the signal strength Input-DNA is currently considered the best control
18 Detecting binding sites from ChIP-seq data Early methods used a single cut-off for signal strength or a log-fold enrichment (for a given putative binding region/window): $\text{score} = \log \frac{\#\,\text{ChIP-seq reads in a window}}{\#\,\text{Input DNA reads in a window}}$. Current state-of-the-art methods are probabilistic. Figure from (Park, 2009)

[Figure panels: A, fraction of peaks recovered versus fraction of reads sampled from the data, with one curve considering all statistically significant peaks and one considering only peaks with fold enrichment above a threshold. Ba, not statistically significant: enrichment ratio 1.5 (ChIP 15, control 10). Bb, statistically significant: enrichment ratio 4 (ChIP 20, control 5) and enrichment ratio 1.5 (ChIP 150, control 100).]

Figure 3 Depth of sequencing. A To determine whether enough tags have been sequenced, a simulation can be carried out to characterize the fraction of the peaks that would be recovered if a smaller number of tags had been sequenced. In many cases, new statistically significant peaks are discovered at a steady rate with an increasing number of tags (solid curve), that is, there is no saturation of binding sites. However, when a minimum threshold is imposed for the enrichment ratio between chromatin immunoprecipitation (ChIP) and input DNA peaks, the rate at which new peaks are discovered slows down (dashed curve), that is, saturation of detected binding sites can occur when only sufficiently prominent binding positions are considered. For a given data set, multiple curves corresponding to different thresholds can be examined to identify the threshold at which the curve becomes sufficiently flat to meet the
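To see why a single fold-enrichment cut-off is not enough, the sketch below compares the log-fold score with a Poisson tail probability for example tag counts of 15 vs 10, 20 vs 5 and 150 vs 100. It is a simplified illustration: treating the control count as a known Poisson rate is an assumption made here for brevity, and the helper names are hypothetical.

```python
import math


def log_fold_score(chip_reads, control_reads):
    """Log ratio of ChIP reads to control reads in a window."""
    return math.log(chip_reads / control_reads)


def poisson_tail(x, lam):
    """P(X >= x) for X ~ Poisson(lam), summing terms computed in log space."""
    if x <= 0:
        return 1.0
    total = 0.0
    k = x
    while True:
        term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        total += term
        # Past the mode the terms shrink; stop once they no longer matter.
        if k > lam and term <= total * 1e-16:
            break
        k += 1
    return total


for chip, control in [(15, 10), (20, 5), (150, 100)]:
    score = log_fold_score(chip, control)
    p = poisson_tail(chip, control)
    print(f"ChIP={chip:4d} control={control:4d} log-fold={score:.3f} p={p:.2e}")
```

The pairs 15/10 and 150/100 share the same enrichment ratio of 1.5, yet only the second gives a small tail probability, which is the point the panels make.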
19 Model-based Analysis of ChIP-Seq (MACS) A commonly used method for detecting TF binding sites from ChIP-seq data: MACS (Zhang et al., 2008). Workflow: Figure S6. Workflow chart of MACS. Figure from (Zhang et al., 2008)
20 Model-based Analysis of ChIP-Seq (MACS) Find model peaks: mfold_low and mfold_high are high-confidence fold-enrichment parameters. MACS slides a window of width 2 × bandwidth across the genome to find regions with tags (= reads) more than mfold_low-fold (but less than mfold_high-fold, to avoid artefacts) enriched relative to a random genome-wide tag distribution. bandwidth = assumed sonicated fragment size.
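A minimal sketch of the window scan for model peaks follows. It is not the MACS implementation: the step size (one bandwidth), the toy genome and the thresholds 3 and 20 are assumptions chosen for illustration, and `candidate_model_windows` is a hypothetical name.

```python
from bisect import bisect_left


def candidate_model_windows(tag_positions, genome_length, bandwidth,
                            mfold_low, mfold_high):
    """Windows of width 2*bandwidth whose fold enrichment over a uniform
    (random) tag distribution lies strictly between mfold_low and mfold_high."""
    window = 2 * bandwidth
    tags = sorted(tag_positions)
    expected = len(tags) * window / genome_length  # tags per window if uniform
    hits = []
    for start in range(0, genome_length - window + 1, bandwidth):
        count = bisect_left(tags, start + window) - bisect_left(tags, start)
        fold = count / expected
        if mfold_low < fold < mfold_high:
            hits.append((start, start + window, count))
    return hits


# Toy data (hypothetical): even background, one real cluster, one artefact tower.
background = list(range(0, 10_000, 100))  # 100 tags
cluster = list(range(5_000, 5_040))       # 40 tags around a binding site
tower = [8_000] * 300                     # 300 identical tags (artefact)
hits = candidate_model_windows(background + cluster + tower,
                               genome_length=10_000, bandwidth=100,
                               mfold_low=3, mfold_high=20)
print(hits)
```

The upper bound is what excludes the tower at position 8000, while the genuine cluster passes both thresholds.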
21 Model-based Analysis of ChIP-Seq (MACS) Model the shift size of ChIP-seq tags: Take 1000 high-quality peaks randomly. Separate Watson and Crick tags. Align the tags by the midpoint. Find d: the distance between the modes of the Watson and Crick peaks in the alignment. [Figure panels: tag percentage (%) versus location with respect to the center of Watson and Crick peaks (bp), with Watson tags and Crick tags shown separately; annotated d = 126 bp and d = 122 bp.] Figure from (Zhang et al., 2008)
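The estimate of d can be illustrated as the distance between the most frequent Watson offset and the most frequent Crick offset, once tags from the model peaks are expressed relative to each peak midpoint. The offsets below are made-up toy values and `estimate_d` is a hypothetical helper; a practical implementation would presumably smooth the distributions before taking the modes.

```python
from collections import Counter


def estimate_d(watson_offsets, crick_offsets):
    """Distance between the modes of the Watson and Crick tag distributions.

    Offsets are tag positions relative to the midpoint of their peak,
    pooled over all model peaks.
    """
    watson_mode = Counter(watson_offsets).most_common(1)[0][0]
    crick_mode = Counter(crick_offsets).most_common(1)[0][0]
    return crick_mode - watson_mode


# Toy offsets (hypothetical), pooled over aligned peaks.
watson = [-70, -63, -63, -63, -60, -55]
crick = [58, 63, 63, 63, 66]
d = estimate_d(watson, crick)
print(d)
```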
22 Model-based Analysis of ChIP-Seq (MACS) Shift all the tags by d/2 toward the 3' ends, to the most likely protein-DNA interaction sites. Remove redundant tags: Sometimes the same tag can be sequenced repeatedly, more often than expected from a random genome-wide tag distribution. Such tags might arise from biases during ChIP-DNA amplification and sequencing library preparation (PCR duplicates). These are likely to add noise to the final peak calls. MACS removes duplicate tags in excess of what is warranted by the sequencing depth (binomial distribution p-value < $10^{-5}$). For example, for the 3.9 million ChIP-seq tags, MACS allows each genomic position to contain no more than one tag and removes all the redundancies.
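The two steps on this slide, duplicate removal and shifting by d/2, can be sketched as below. The cap of one tag per position is passed in as a plain parameter here rather than derived from a binomial test, which is a simplification; function names and toy tags are hypothetical.

```python
def remove_duplicates(tags, max_per_position=1):
    """Keep at most max_per_position copies of each (position, strand) tag."""
    seen = {}
    kept = []
    for tag in tags:
        seen[tag] = seen.get(tag, 0) + 1
        if seen[tag] <= max_per_position:
            kept.append(tag)
    return kept


def shift_tags(tags, d):
    """Shift each tag by d/2 towards its 3' end.

    On the '+' strand the 3' direction is towards higher coordinates,
    on the '-' strand towards lower coordinates.
    """
    half = d // 2
    return [pos + half if strand == "+" else pos - half
            for pos, strand in tags]


# Toy tags (hypothetical): three copies of one tag, two of another.
tags = [(100, "+"), (100, "+"), (100, "+"), (226, "-"), (226, "-"), (110, "+")]
unique_tags = remove_duplicates(tags)
shifted = shift_tags(unique_tags, d=126)
print(unique_tags, shifted)
```

After shifting, tags from both strands pile up at the same coordinate, which is where the binding site is most likely to be.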
23 Model-based Analysis of ChIP-Seq (MACS) Identifying the most likely binding sites Counting process: if reads were sampled independently from a population with given, fixed fractions of genomic locations, the read counts x in each genomic location/window would follow a multinomial distribution (binomial for a single location), which can be approximated by the Poisson distribution
24 Multinomial and binomial distributions n independent trials. Each trial leads to a success for exactly one of k categories. Each category has a fixed success probability p_k. The multinomial gives the probability of any particular combination of numbers of successes for the k categories. The binomial is the special case of 2 categories; it can be approximated with a Poisson for large n and a small success probability.
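The Poisson approximation to the binomial is easy to check numerically. The sketch below uses illustrative values (one million reads, each landing in a given window with probability 5e-6, so the expected count is 5); these numbers are assumptions, not taken from the lecture. It requires Python 3.8 or later for `math.comb`.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


n = 1_000_000  # number of reads (trials)
p = 5e-6       # probability that a read falls in the window of interest
lam = n * p    # expected read count in the window

max_abs_diff = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
                   for x in range(21))
print(f"largest pmf difference for x = 0..20: {max_abs_diff:.2e}")
```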
26 Model-based Analysis of ChIP-Seq (MACS) Let x denote the number of sequencing reads in a window (genomic region). Analysing each genomic window independently, if all genomic regions were equally likely, then $x \sim \mathrm{Poisson}(\lambda_{BG}) = \frac{\lambda_{BG}^{x}}{x!} e^{-\lambda_{BG}}, \quad x = 0, 1, 2, \ldots$ where $\lambda_{BG}$ is the rate of observing reads in the control sample. For experiments with a control, MACS linearly scales the total control tag count to be the same as the total ChIP tag count. Because ChIP-seq data has several bias sources which vary across the genome, it is better to model the data using a local/dynamic Poisson: $\lambda_{local} = \max(\lambda_{BG}, [\lambda_{1K}], \lambda_{5K}, \lambda_{10K})$, where $\lambda_{XK}$ is estimated from the X kb window of the control sample (e.g. input-DNA) centered at a specific location.
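The local rate can be sketched as follows. Several choices here are assumptions rather than statements about MACS: the bracketed 1 kb term is read as optional (`use_1k`), the genome-wide rate is taken from the scaled control, every rate is expressed as expected tags per `tag_window` bp, and all names and toy data are hypothetical.

```python
def scaled_window_rate(control_tags, center, window, scale, tag_window):
    """Expected tags per tag_window bp, from a control window of the given
    width centered at `center`, after scaling control depth to ChIP depth."""
    half = window // 2
    count = sum(1 for t in control_tags if center - half <= t < center + half)
    return scale * count * tag_window / window


def local_lambda(control_tags, n_chip, genome_length, center,
                 tag_window, use_1k=True):
    """lambda_local = max(lambda_BG, [lambda_1K], lambda_5K, lambda_10K)."""
    scale = n_chip / len(control_tags)  # linear scaling of control to ChIP
    lam_bg = n_chip * tag_window / genome_length
    sizes = ([1_000] if use_1k else []) + [5_000, 10_000]
    rates = [scaled_window_rate(control_tags, center, w, scale, tag_window)
             for w in sizes]
    return max([lam_bg] + rates)


# Toy control (hypothetical): even background plus a local hot spot at 50 kb.
control = list(range(0, 100_000, 100)) + list(range(49_975, 50_025))
lam_hot = local_lambda(control, n_chip=2_100, genome_length=100_000,
                       center=50_000, tag_window=200)
lam_flat = local_lambda(control, n_chip=2_100, genome_length=100_000,
                        center=20_000, tag_window=200)
print(lam_hot, lam_flat)
```

Taking the maximum makes the null model stricter wherever the control itself is locally enriched, so biased regions need more ChIP tags to be called.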
28 Model-based Analysis of ChIP-Seq (MACS) Assessing statistical significance of x reads (in a genomic region i) using hypothesis testing. $H_0$: the ith location is not a binding site. $H_1$: the ith location is a binding site. The p-value is the probability of observing x or more reads, assuming the null hypothesis is true: $p\text{-value} = \sum_{k=x}^{\infty} \mathrm{Poisson}(k \mid \lambda_{local})$. The location with the highest fragment pileup (summit) is predicted as the precise binding location. The ratio between the ChIP-seq tag count and $\lambda_{local}$ is reported as the fold enrichment.
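The three quantities on this slide (p-value, summit, fold enrichment) can be computed for a toy peak as sketched below. The pileup values, the tag count of 30 and the local rate of 6 are invented for illustration, and the helper name is hypothetical.

```python
import math


def poisson_p_value(x, lam):
    """Upper-tail probability P(X >= x) under Poisson(lam)."""
    if x <= 0:
        return 1.0
    total = 0.0
    k = x
    while True:
        term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        total += term
        if k > lam and term <= total * 1e-16:
            break
        k += 1
    return total


# Toy peak (hypothetical): fragment pileup along the candidate region.
pileup = [2, 3, 5, 9, 14, 11, 6, 3]
summit_index = max(range(len(pileup)), key=pileup.__getitem__)

tags_in_peak = 30  # ChIP tags counted in the region
lam_local = 6.0    # expected tags under the null, from the local Poisson rate

p_value = poisson_p_value(tags_in_peak, lam_local)
fold_enrichment = tags_in_peak / lam_local
print(summit_index, fold_enrichment, f"{p_value:.2e}")
```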
29 Model-based Analysis of ChIP-Seq (MACS) The read count in ChIP-seq versus control in 10 kb windows across the genome. Each dot represents a 10 kb window; red dots are windows containing ChIP peaks and black dots are windows containing control peaks used for FDR calculation. [Figure panels: (c), (d) tag percentage (%) versus location with respect to the center of Watson and Crick peaks (bp); control tag number (normalized) / 10 kb versus FoxA1 ChIP-Seq tag number / 10 kb; average tag number in control / kb; (e) motif occurrence in peak centers for FoxA1 ChIP-Seq; (f) spatial resolution; the last two compare MACS, MACS without local lambda, and MACS without tag shifting.] Figure from (Zhang et al., 2008)
30 ChIP-seq peak: Illustration An illustration of a strong TF binding site Figure from
31 Multiple testing correction in MACS For a ChIP-seq experiment with controls, MACS empirically estimates the false discovery rate (FDR). At each p-value, MACS uses the same parameters to find ChIP-seq peaks over control, and control peaks over ChIP-seq (i.e., a sample swap). The empirical FDR is defined as $\text{empirical FDR} = \frac{\#\,\text{control peaks}}{\#\,\text{ChIP peaks}}$
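The sample-swap FDR amounts to a ratio of two peak counts at a shared p-value threshold, as in this sketch. The p-value lists are invented toy data, the function name is hypothetical, and returning None when no ChIP peaks pass the threshold is a choice made here.

```python
def empirical_fdr(chip_peak_pvalues, control_peak_pvalues, threshold):
    """Ratio of control peaks (from the sample swap) to ChIP peaks that
    pass the same p-value threshold; None if no ChIP peak passes."""
    n_chip = sum(1 for p in chip_peak_pvalues if p <= threshold)
    n_control = sum(1 for p in control_peak_pvalues if p <= threshold)
    if n_chip == 0:
        return None
    return n_control / n_chip


# Toy p-values (hypothetical) for peaks called in each direction.
chip_peaks = [1e-12, 1e-9, 1e-7, 1e-6, 1e-4, 1e-3]  # ChIP over control
control_peaks = [1e-6, 1e-3, 1e-2]                  # control over ChIP

for threshold in (1e-8, 1e-5, 1e-2):
    print(threshold, empirical_fdr(chip_peaks, control_peaks, threshold))
```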
32 Differential binding MACS can also be applied to differential binding between two conditions by treating one of the samples as the control Differential binding analysis in MACS will only work with two samples, i.e., one biological replicate per condition Empirical FDR control will not work in such a scenario
33 Summary ChIP-seq is a powerful way to detect TF binding sites (or e.g. enriched regions of histone modifications; next lecture). ChIP-seq approaches are limited in that: Only a subset of all TFs have a ChIP-grade antibody. None of the antibodies are perfect. A single experiment will profile a single protein.
34 References
Metzker ML (2010) Sequencing technologies - the next generation, Nat Rev Genet. 11(1):
Park PJ (2009) ChIP-seq: advantages and challenges of a maturing technology, Nat Rev Genet. 10(10):
Visel A, Rubin EM, Pennacchio LA (2009) Genomic views of distant-acting enhancers, Nature 461,
Wasserman WW, Sandelin A (2004) Applied bioinformatics for the identification of regulatory elements, Nat Rev Genet. 5,
Zhang X et al. (2011) PICS: probabilistic inference for ChIP-seq, Biometrics 67(1):
Zhang Y et al. (2008) Model-based analysis of ChIP-Seq (MACS), Genome Biol. 9(9):R137.
MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1 Lecture 27: Systems Biology and Bayesian Networks Systems Biology and Regulatory Networks o Definitions o Network motifs o Examples
More informationExploring chromatin regulation by ChIP-Sequencing
Exploring chromatin regulation by ChIP-Sequencing From datasets quality assessment, enrichment patterns identification and multi-profiles integration to the reconstitution of gene regulatory wires describing
More informationDiscovery of Novel Human Gene Regulatory Modules from Gene Co-expression and
Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis Shisong Ma 1,2*, Michael Snyder 3, and Savithramma P Dinesh-Kumar 2* 1 School of Life Sciences, University
More informationP. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University.
Databases and Tools for High Throughput Sequencing Analysis P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University. HTseq Platforms Applications on Biomedical Sciences
More informationChIPSeq. Technique and science. The genome wide dynamics of the binding of ldb1 complexes during erythroid differentiation
Center for Biomics ChIPSeq Technique and science The genome wide dynamics of the binding of ldb1 complexes during erythroid differentiation Wilfred van IJcken EU Sequencing Seminar Illumina June 16 Genomics
More informationChapter 1. Introduction
Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a
More informationMIR retrotransposon sequences provide insulators to the human genome
Supplementary Information: MIR retrotransposon sequences provide insulators to the human genome Jianrong Wang, Cristina Vicente-García, Davide Seruggia, Eduardo Moltó, Ana Fernandez- Miñán, Ana Neto, Elbert
More informationGolden Helix s End-to-End Solution for Clinical Labs
Golden Helix s End-to-End Solution for Clinical Labs Steven Hystad - Field Application Scientist Nathan Fortier Senior Software Engineer 20 most promising Biotech Technology Providers Top 10 Analytics
More informationObstacles and challenges in the analysis of microrna sequencing data
Obstacles and challenges in the analysis of microrna sequencing data (mirna-seq) David Humphreys Genomics core Dr Victor Chang AC 1936-1991, Pioneering Cardiothoracic Surgeon and Humanitarian The ABCs
More informationTranscript-indexed ATAC-seq for immune profiling
Transcript-indexed ATAC-seq for immune profiling Technical Journal Club 22 nd of May 2018 Christina Müller Nature Methods, Vol.10 No.12, 2013 Nature Biotechnology, Vol.32 No.7, 2014 Nature Medicine, Vol.24,
More informationSimple, rapid, and reliable RNA sequencing
Simple, rapid, and reliable RNA sequencing RNA sequencing applications RNA sequencing provides fundamental insights into how genomes are organized and regulated, giving us valuable information about the
More informationBroad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes
Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Kaifu Chen 1,2,3,4,5,10, Zhong Chen 6,10, Dayong Wu 6, Lili Zhang 7, Xueqiu Lin 1,2,8,
More informationPhylogenomics. Antonis Rokas Department of Biological Sciences Vanderbilt University.
Phylogenomics Antonis Rokas Department of Biological Sciences Vanderbilt University http://as.vanderbilt.edu/rokaslab High-Throughput DNA Sequencing Technologies 454 / Roche 450 bp 1.5 Gbp / day Illumina
More informationTutorial. ChIP Sequencing. Sample to Insight. September 15, 2016
ChIP Sequencing September 15, 2016 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com ChIP Sequencing
More informationcn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz
Software Manual Institute of Bioinformatics, Johannes Kepler University Linz cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University
More informationSTAT1 regulates microrna transcription in interferon γ stimulated HeLa cells
CAMDA 2009 October 5, 2009 STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells Guohua Wang 1, Yadong Wang 1, Denan Zhang 1, Mingxiang Teng 1,2, Lang Li 2, and Yunlong Liu 2 Harbin
More informationEPIGENOMICS PROFILING SERVICES
EPIGENOMICS PROFILING SERVICES Chromatin analysis DNA methylation analysis RNA-seq analysis Diagenode helps you uncover the mysteries of epigenetics PAGE 3 Integrative epigenomics analysis DNA methylation
More informationcn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz
Software Manual Institute of Bioinformatics, Johannes Kepler University Linz cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University
More informationA Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis
A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis Jian Xu, Ph.D. Children s Research Institute, UTSW Introduction Outline Overview of genomic and next-gen sequencing technologies
More informationHigh-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA
Chandrananda et al. BMC Medical Genomics (2015) 8:29 DOI 10.1186/s12920-015-0107-z RESEARCH ARTICLE High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA Dineika
More informationfl/+ KRas;Atg5 fl/+ KRas;Atg5 fl/fl KRas;Atg5 fl/fl KRas;Atg5 Supplementary Figure 1. Gene set enrichment analyses. (a) (b)
KRas;At KRas;At KRas;At KRas;At a b Supplementary Figure 1. Gene set enrichment analyses. (a) GO gene sets (MSigDB v3. c5) enriched in KRas;Atg5 fl/+ as compared to KRas;Atg5 fl/fl tumors using gene set
More informationTranscriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc
Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression Chromatin Array of nuc 1 Transcriptional control in Eukaryotes: Chromatin undergoes structural changes
More informationSUPPLEMENTAL INFORMATION
SUPPLEMENTAL INFORMATION GO term analysis of differentially methylated SUMIs. GO term analysis of the 458 SUMIs with the largest differential methylation between human and chimp shows that they are more
More informationSequence and chromatin determinants of cell-type specific transcription factor binding
Method Sequence and chromatin determinants of cell-type specific transcription factor binding Aaron Arvey, 1 Phaedra Agius, 1 William Stafford Noble, 2 and Christina Leslie 1,3 1 Computational Biology
More informationGenomic structural variation
Genomic structural variation Mario Cáceres The new genomic variation DNA sequence differs across individuals much more than researchers had suspected through structural changes A huge amount of structural
More informationAnalysis of the peroxisome proliferator-activated receptor-β/δ (PPARβ/δ) cistrome reveals novel co-regulatory role of ATF4
Khozoie et al. BMC Genomics 2012, 13:665 RESEARCH ARTICLE Open Access Analysis of the peroxisome proliferator-activated receptor-β/δ (PPARβ/δ) cistrome reveals novel co-regulatory role of ATF4 Combiz Khozoie
More informationSupplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region
Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region methylation in relation to PSS and fetal coupling. A, PSS values for participants whose placentas showed low,
More informationHands-On Ten The BRCA1 Gene and Protein
Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such
More informationTranscriptome Analysis
Transcriptome Analysis Data Preprocessing Sample Preparation Illumina Sequencing Demultiplexing Raw FastQ Reference Genome (fasta) Reference Annotation (GTF) Reference Genome Analysis Tophat Accepted hits
More informationEukaryotic small RNA Small RNAseq data analysis for mirna identification
Eukaryotic small RNA Small RNAseq data analysis for mirna identification P. Bardou, C. Gaspin, S. Maman, J. Mariette, O. Rué, M. Zytnicki INRA Sigenae Toulouse INRA MIA Toulouse GenoToul Bioinfo INRA MaIAGE
More informationLecture #4: Overabundance Analysis and Class Discovery
236632 Topics in Microarray Data nalysis Winter 2004-5 November 15, 2004 Lecture #4: Overabundance nalysis and Class Discovery Lecturer: Doron Lipson Scribes: Itai Sharon & Tomer Shiran 1 Differentially
More informationEXPression ANalyzer and DisplayER
EXPression ANalyzer and DisplayER Tom Hait Aviv Steiner Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Adi Maron-Katz Dorit Sagir Eyal David Roded Sharan Israel Steinfeld Yossi Shiloh
More informationStatistical analysis of RIM data (retroviral insertional mutagenesis) Bioinformatics and Statistics The Netherlands Cancer Institute Amsterdam
Statistical analysis of RIM data (retroviral insertional mutagenesis) Lodewyk Wessels Bioinformatics and Statistics The Netherlands Cancer Institute Amsterdam Viral integration Viral integration Viral
More informationMeasuring DNA Methylation with the MinION. Winston Timp Department of Biomedical Engineering Johns Hopkins University 12/1/16
Measuring DNA Methylation with the MinION Winston Timp Department of Biomedical Engineering Johns Hopkins University 12/1/16 Epigenetics: Modern Modern Definition of epigenetics involves heritable changes
More informationCTCF-Mediated Functional Chromatin Interactome in Pluripotent Cells
SUPPLEMENTARY INFORMATION CTCF-Mediated Functional Chromatin Interactome in Pluripotent Cells Lusy Handoko 1,*, Han Xu 1,*, Guoliang Li 1,*, Chew Yee Ngan 1, Elaine Chew 1, Marie Schnapp 1, Charlie Wah
More informationMODULE 4: SPLICING. Removal of introns from messenger RNA by splicing
Last update: 05/10/2017 MODULE 4: SPLICING Lesson Plan: Title MEG LAAKSO Removal of introns from messenger RNA by splicing Objectives Identify splice donor and acceptor sites that are best supported by
More informationNeuron, Volume 63 Spatial attention decorrelates intrinsic activity fluctuations in Macaque area V4.
Neuron, Volume 63 Spatial attention decorrelates intrinsic activity fluctuations in Macaque area V4. Jude F. Mitchell, Kristy A. Sundberg, and John H. Reynolds Systems Neurobiology Lab, The Salk Institute,
More informationA complete next-generation sequencing workfl ow for circulating cell-free DNA isolation and analysis
APPLICATION NOTE Cell-Free DNA Isolation Kit A complete next-generation sequencing workfl ow for circulating cell-free DNA isolation and analysis Abstract Circulating cell-free DNA (cfdna) has been shown
More informationVariant Classification. Author: Mike Thiesen, Golden Helix, Inc.
Variant Classification Author: Mike Thiesen, Golden Helix, Inc. Overview Sequencing pipelines are able to identify rare variants not found in catalogs such as dbsnp. As a result, variants in these datasets
More informationBelow, we included the point-to-point response to the comments of both reviewers.
To the Editor and Reviewers: We would like to thank the editor and reviewers for careful reading, and constructive suggestions for our manuscript. According to comments from both reviewers, we have comprehensively
More informationThe Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0
The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0 Introduction Loss of erozygosity (LOH) represents the loss of allelic differences. The SNP markers on the SNP Array 6.0 can be used
More informationSelective depletion of abundant RNAs to enable transcriptome analysis of lowinput and highly-degraded RNA from FFPE breast cancer samples
DNA CLONING DNA AMPLIFICATION & PCR EPIGENETICS RNA ANALYSIS Selective depletion of abundant RNAs to enable transcriptome analysis of lowinput and highly-degraded RNA from FFPE breast cancer samples LIBRARY
More informationThe Importance of Coverage Uniformity Over On-Target Rate for Efficient Targeted NGS
WHITE PAPER The Importance of Coverage Uniformity Over On-Target Rate for Efficient Targeted NGS Yehudit Hasin-Brumshtein, Ph.D., Maria Celeste M. Ramirez, Ph.D., Leonardo Arbiza, Ph.D., Ramsey Zeitoun,
More informationHao D. H., Ma W. G., Sheng Y. L., Zhang J. B., Jin Y. F., Yang H. Q., Li Z. G., Wang S. S., GONG Ming*
Comparison of transcriptomes and gene expression profiles of two chilling- and drought-tolerant and intolerant Nicotiana tabacum varieties under low temperature and drought stress Hao D. H., Ma W. G.,
More informationGenerating Spontaneous Copy Number Variants (CNVs) Jennifer Freeman Assistant Professor of Toxicology School of Health Sciences Purdue University
Role of Chemical lexposure in Generating Spontaneous Copy Number Variants (CNVs) Jennifer Freeman Assistant Professor of Toxicology School of Health Sciences Purdue University CNV Discovery Reference Genetic
More informationSupervised Learner for the Prediction of Hi-C Interaction Counts and Determination of Influential Features. Tyler Yue Lab
Supervised Learner for the Prediction of Hi-C Interaction Counts and Determination of Influential Features Tyler Derr @ Yue Lab tsd5037@psu.edu Background Hi-C is a chromosome conformation capture (3C)
More informationthe reaction was stopped by adding glycine to final concentration 0.2M for 10 minutes at
Supplemental Material Material and Methods. ChIP. Cells were crosslinked with formaldehyde 0.4% for 10 min at room temperature and the reaction was stopped by adding glycine to final concentration 0.2M
More information