Study of post-transcriptional modifications using RNA-seq technology

1 Study of post-transcriptional modifications using RNA-seq technology. Ernesto Picardi, University of Bari, IBIOM-CNR

2 2001: publication of the human genome sequence. Surprisingly, the human genome contains about 23,000 distinct protein-coding genes (Ensembl GRCh38.p10), and less complex species such as the nematode Caenorhabditis elegans have a similar number of protein-coding genes (Frazer 2012, Genome Research).

3 Gene number is not correlated with organism complexity (something we learned from genome projects). [Figure: gene number vs. genome size (Mb) for human, mouse, chicken, Xenopus, zebrafish, fugu, Ciona, fly, worm and yeast; prokaryotes have up to 8,000 genes and genomes of up to 9 Mb.]

4 Explaining the paradox of gene numbers. We can define the complexity of an organism as the number of theoretical transcriptome states that its genome could achieve, where the transcriptome represents the universe of transcripts for the genome. According to the simplest model, in which each gene is either ON or OFF, a genome with N genes can (theoretically) encode 2^N states; for N = 23,000 this is 2^23,000, roughly 10^6,924 states. However, gene expression exhibits more than two states. A trivial mathematical model can thus illustrate how a relatively small number of genes could be sufficient to generate tremendous biological complexity (Claverie 2001, Science).
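The counting argument is easy to check numerically. The sketch below is an illustration added here, not part of the original talk. It works in log10 so that no integer with thousands of digits is ever built. `states_log10` is a hypothetical helper name, and the 3-states-per-gene case is only an example of "more than two states".

```python
import math


def states_log10(n_genes: int, states_per_gene: int = 2) -> float:
    """log10 of the number of theoretical transcriptome states, states_per_gene ** n_genes."""
    # log10(k ** N) = N * log10(k); working in log space avoids a number with thousands of digits
    return n_genes * math.log10(states_per_gene)


if __name__ == "__main__":
    for k in (2, 3):
        exponent = states_log10(23_000, k)
        print(f"{k} states per gene, 23,000 genes: about 10^{exponent:.0f} states")
```

With two states per gene the exponent is about 6,924. Adding a single extra state per gene pushes it above 10,000, which is the point of the argument.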

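5 Genetic flow of information. [Figure: genome → transcripts → products; two genes (Gene 1, Gene 2) each give rise to several transcripts and products built from different exon combinations (X, A, B, C). Numbers shown: 24,000 and 23,000 at the gene level, 200,000 at the product level.]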

6 Alternative splicing expands the functional potential of encoded genes: antagonistic functions of caspase 9 (CASP9) modulated by alternative splicing. The constitutive form of the protein (CASP9, 9 exons, 416 aa) induces apoptosis. It contains a caspase recruitment domain (CARD) and a Peptidase_C14 caspase domain. The shorter isoform of the protein (CASP9S, 5 exons, 266 aa) contains a CARD and a truncated Peptidase_C14 caspase domain. This isoform lacks protease activity and acts as an apoptosis inhibitor. Waltereit R, Weller M: The role of caspases 9 and 9-short (9S) in death ligand- and drug-induced apoptosis in human astrocytoma cells. Brain Res 2002, 106:42-49

7 RNA-Seq. RNA-Seq refers to experimental procedures that generate sequence reads derived from entire RNA molecules. It can be used to build a complete map of the transcriptome across all cell types, perturbations and states. [Figure: samples of interest (condition 1: normal colon; condition 2: colon tumor) → isolate RNAs → generate cDNA, fragment, size select, add linkers → sequence (tens of billions of bases) → map to genome, transcriptome and predicted exon junctions → downstream analysis.]

8 RNA-Seq: applications. [Figure: gene/transcript expression; isoform reconstruction (full RNA); alternative splicing detection (ex1-ex2-ex3 vs. ex1-ex3); gene discovery (gene 1, gene 2, gene x); RNA editing identification (RNA vs. DNA differences).]

9 RNA editing: basic introduction. RNA editing is a widespread post-transcriptional molecular phenomenon that can increase the complexity of the eukaryotic transcriptome and proteome through a variety of mechanistically and evolutionarily unrelated pathways. Primary RNAs are modified at specific positions by base substitutions or by insertions and deletions. [Figure: genomic DNA (5'...A...A...A...3') → primary mRNA with I at the edited positions → mature mRNA (I...I...I, poly(A) tail).]

10 The impact of RNA editing on the scientific community. [Figure: number of publications on RNA editing per year; data from PubMed, last updated 20/01/2018.]

11 RNA editing in mammals (C-to-U) In mammals, RNA editing occurs mainly by C-to-U or A-to-I conversions.

12 RNA editing in mammals (A-to-I). A-to-I editing is carried out by ADAR (adenosine deaminase acting on RNA) enzymes (Jepson et al., 2008, BBA; Zinshteyn et al., 2009, WIREs Syst Biol Med). ADAR1 and ADAR2 are expressed in almost all human tissues, whereas ADAR3 is expressed only in the brain.

13 RNA editing in mammals (A-to-I). ADAR enzymes deaminate adenosines located within double-stranded RNA (dsRNA) regions (Keegan et al., 2001, Nature Rev. Gen.; Nishikura 2010, Annu. Rev. Biochem; Farajollahi and Maas 2010).

14 Effects of RNA editing in mammals (A-to-I). A-to-I editing can modulate gene expression at different levels (Maas S., web site; Nishikura 2010, Annu. Rev. Biochem).

15 A-to-I RNA editing in the innate immune response. ADAR1 is present in the nucleus (ADAR1 p110) and in the cytoplasm (ADAR1 p150), and it can edit endogenous RNA. ADAR1 editing of endogenous RNA is required to prevent activation of the cytosolic pattern recognition receptor MDA5, which would otherwise induce the innate immune/interferon response. ADAR1 can also edit viral dsRNA and participates in the innate immune response as a direct interferon-stimulated gene (ADAR1 p150 isoform). The absence of ADAR1, or of ADAR1-mediated editing, leads to inappropriate activation of the MDA5-MAVS axis (Walkley and Li 2017, Genome Biol).

16 RNA editing in mammals (A-to-I). RNA editing deregulation is associated with several human diseases, including neurological disorders such as major depression, schizophrenia, epilepsy and amyotrophic lateral sclerosis (ALS), as well as cancer.

17 RNA editing: experimental detection. RNA editing changes can be detected experimentally by comparing the genomic locus with the corresponding cDNA sequenced by the classical Sanger method. [Figure: editing sites (black arrows) in the amyloid beta A4 precursor protein-binding (APBA1) gene in cerebellum; Li et al.]

18 RNA editing: bioinformatics. Detecting RNA editing in humans with conventional techniques is not feasible for large-scale experiments. Therefore, many candidate events have been identified mainly through computational analyses that align mRNAs/ESTs onto the genome of origin.

19 RNA editing and NGS. Massive RNA sequencing can facilitate the study of entire transcriptomes as well as the post-transcriptional events occurring in them, such as alternative splicing and RNA editing. With classical methods, few clones are sequenced by the Sanger method, and differences between the genome and the related cDNAs are scored as RNA editing sites. Each editing site is therefore supported by a very restricted number of transcripts. With NGS, each genomic position can be supported by a large number of sequences (short reads), which can greatly improve the detection of RNA editing substitutions. [Figure: ESTs/cDNAs vs. short reads aligned to the genome.]

20 RNA-Seq: experimental and practical considerations. a. Experimental design: biological replicates. b. Poly(A) enrichment or ribosomal RNA depletion? c. Single-end or paired-end? Always paired-end. d. Stranded or not? Prefer stranded. e. How much sequencing data to collect? At least 50M fragments.

21 RNA-Seq: analysis workflow. Raw data (FASTQ files) → quality check (FastQC, Trim Galore) → read mapping (GSNAP, STAR, TopHat2, RUM, HISAT) → downstream analyses (REDItools, JACUSA, GIREMI).

22 RNA-Seq: quality check. Per-base sequence quality after sequencing and after adaptor trimming and removal of low-quality regions (plots generated by FastQC).

23 RNA-Seq: quality check. Per-base sequence content after sequencing and after adaptor trimming and removal of low-quality regions (plots generated by FastQC).

24 RNA-Seq: read mapping. We need to align the sequence data to our genome of interest. When aligning RNA-Seq data to the genome, always pick a splice-aware aligner: STAR, TopHat2, MapSplice, SOAPSplice, PASSion, SpliceMap, RUM, ABMapper, CRAC, GSNAP, HMMSplicer, OLego, BLAT. [Figure: standard genome alignment vs. splice-aware alignment of reads.]

25 RNA-Seq: RNA editing detection. We can use NGS data (RNA-Seq, genome resequencing and exome sequencing) to study RNA editing at different levels: genome/exome vs. RNA-Seq to identify new events (REDItools); RNA-Seq alone to explore the presence of known A-to-I conversions; RNA-Seq alone to detect new editing candidates de novo. [Figure: example reads aligned to gDNA. In the genome/exome vs. RNA-Seq comparison, exome reads carry A at the highlighted position, as in the gDNA, while most RNA-Seq reads carry G. In the RNA-Seq-only example, several of nine RNA-Seq reads show G where the gDNA has A.]
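The DNA-vs-RNA comparison at a single position can be sketched in a few lines. This sketch is an assumption-laden illustration, not REDItools code. It takes the bases piled up at one position as plain strings and assumes the transcript lies on the forward strand. On the reverse strand, A-to-I shows up as T-to-C. The helper names and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def base_counts(bases: str) -> Counter:
    """Count A/C/G/T calls observed at one genomic position (other symbols are ignored)."""
    return Counter(b for b in bases.upper() if b in "ACGT")


def editing_frequency(rna_bases: str, ref: str = "A", alt: str = "G") -> float:
    """Fraction of RNA reads carrying alt among reads carrying ref or alt."""
    counts = base_counts(rna_bases)
    total = counts[ref] + counts[alt]
    return counts[alt] / total if total else 0.0


def is_a_to_g_candidate(ref: str, dna_bases: str, rna_bases: str, min_support: int = 3) -> bool:
    """A-to-G (A-to-I) candidate: genome is A, DNA reads show only A, enough RNA reads show G."""
    dna = base_counts(dna_bases)
    rna = base_counts(rna_bases)
    dna_matches_reference = set(dna) <= {ref}  # a G in DNA reads would suggest a SNP instead
    return ref == "A" and dna_matches_reference and rna["G"] >= min_support
```

In the slide's first example, the five exome reads read A at the highlighted position and the RNA-Seq reads r1-r5 read G, A, G, G, G. So `is_a_to_g_candidate("A", "AAAAA", "GAGGG")` is true and the editing frequency is 4/5.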

26 REDItools. REDItools are a suite of Python scripts for investigating RNA editing at large scale using massive RNA-Seq as well as DNA-Seq (WGS/WES) data. The starting point is a BAM file of reads aligned onto the reference genome. From the BAM file: REDItoolDnaRna.py (RNA-Seq and DNA-Seq), REDItoolKnown.py (RNA-Seq and known events), REDItoolDenovo.py (RNA-Seq only). Picardi and Pesole 2013, Bioinformatics

27 Workflow to call RNA editing with REDItools. Inputs: pre-aligned DNA-Seq reads, pre-aligned RNA-Seq reads (BAM files) and the reference genome. [Figure: DNA-Seq and RNA-Seq reads aligned to gDNA, as in the previous example.] Reads with mismatches are checked for mis-mapping by BLAT using the REDItoolBlatCorrection.py script; a list of bad reads is printed out and used as an additional input file for REDItools. Each genomic position is then explored (e.g. reference A, observed bases GGGGGAAGGGAAAGGGAGGAGAGTAAAAA). For each read position, different pieces of information can be recovered: - Read name - Position along the read - Mapping quality, and so on. Several filters are applied: - Base quality score > 25/30 - Mapping quality > 40 - Per-base coverage > 10 - Bases supporting the variation > 3 - Remove substitutions in homopolymeric regions (> 5 bases) - Remove substitutions near splice sites - Check BLAT alignments of reads supporting the variation - Use only uniquely mapping reads - Use concordant paired-end reads - Exclude PCR duplicates - Trim a few bases upstream and/or downstream of each read - Use an editing background value (0.1) - Exclude positions with multiple changes
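A few of the per-position filters listed above can be expressed as a single function. This is a simplified sketch, not the REDItools implementation. It covers only the base quality, mapping quality, coverage, supporting-base, background-frequency and multiple-change filters. Homopolymer, splice-site, BLAT, duplicate and paired-end checks are not modelled. The thresholds mirror the slide, but whether each comparison is "greater than" or "at least" is an assumption. `ReadBase` and `call_position` are hypothetical names.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class ReadBase(NamedTuple):
    base: str       # base observed in the read at this genomic position
    base_qual: int  # Phred base quality
    map_qual: int   # mapping quality of the whole read


def call_position(ref: str, reads: List[ReadBase], min_base_qual: int = 25,
                  min_map_qual: int = 40, min_cov: int = 10, min_support: int = 3,
                  min_freq: float = 0.1) -> Optional[Tuple[str, float]]:
    """Return (substitution, frequency) if the position passes all filters, else None."""
    # 1. keep only bases from reads passing the base- and mapping-quality thresholds
    kept = [r.base for r in reads if r.base_qual >= min_base_qual and r.map_qual >= min_map_qual]
    # 2. minimum per-base coverage
    if len(kept) < min_cov:
        return None
    variants = {b: n for b, n in Counter(kept).items() if b != ref}
    # 3. exclude invariant positions and positions with more than one kind of change
    if len(variants) != 1:
        return None
    alt, n_alt = next(iter(variants.items()))
    freq = n_alt / len(kept)
    # 4. minimum number of supporting bases and editing background value
    if n_alt < min_support or freq < min_freq:
        return None
    return ref + alt, freq
```

Here is an example. Eight high-quality A reads plus four G reads give `("AG", 4/12)`. Adding one C read makes the position multi-allelic, so it is rejected.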

28 RNA editing in human. To profile RNA editing in human tissues, we sequenced total RNA from 6 tissues (brain, heart, kidney, liver, lung and muscle) of 3 Caucasian, non-diseased individuals (sex- and age-matched) using the Illumina HiSeq2500 platform. Paired-end RNA-Seq reads (2x100) were generated with the strand-oriented Illumina TruSeq kit. In addition, we produced WES and WGS (20x) reads from the same samples. [Table: per-sample statistics (read pairs, percent duplication, percentage of ribosomal bases, percentage of mRNA bases, percentage of correct-strand reads) for the 18 samples (3 per tissue), obtained by Picard on GENCODE v19.] Picardi et al., Sci. Rep.

29 RNA editing in human. Most of the detected RNA editing events were A-to-G (>97%). Potential non-canonical events were rare and showed frequency values lower than …. In addition, the fraction of A-to-G changes at non-synonymous sites was notably high. [Figure: (left) frequency of each observed nucleotide change (AC, AG, AT, CA, CG, CT, GA, GC, GT, TA, TC, TG) in brain, lung, liver, kidney, heart and muscle; (right) values by tissue type: brain 0.96, kidney 0.95, liver 0.92, lung 0.92, heart 0.89, muscle 0.73.] Picardi et al., Sci. Rep.

30 The Human Inosinome. Classification of RNA editing sites. A) Partitioning of detected RNA editing sites into Alu elements (ALU, 89%), other repetitive regions (REP-NON-ALU, 8%) and non-repetitive regions (NON-REP, 3%). In agreement with previous large-scale investigations, the vast majority of RNA editing sites reside in repetitive regions (97%). B) Distribution of editing events along the gene structure (UTR5, CDS, intronic, UTR3, ncRNA, intergenic). Picardi et al., Sci. Rep.

31 The Human Inosinome. To investigate the impact of RNA editing on the human transcriptome, we mapped all detected events onto GENCODE (v19) annotations and found that 17,140 of 55,496 loci (31%) underwent RNA editing in their exons and/or introns. Interestingly, most of the detected sites (92%) occurred in protein-coding genes, modifying 13,062 of 20,173 annotated loci (65%). The remaining A-to-I events (8%) were distributed in the non-coding RNA (ncRNA) fraction. [Figure: percentage of RNA editing events across GENCODE RNAs (mRNA, ncRNA) and across ncRNA classes (snoRNA, processed_transcript, sense_overlapping, miRNA, sense_intronic).] Picardi et al., Sci. Rep.

32 The Human Inosinome. The number of detected A-to-I events varied greatly among tissues and individuals. This effect, also described in previous works, is mainly due to variation in sequencing depth, the stringent filters used to recover editing candidates and tissue-specific roles of RNA editing. [Figure: absolute number of RNA edits and RNA hyper-edits in samples BRAIN_1-3, LUNG_1-3, LIVER_1-3, KIDNEY_1-3, HEART_1-3 and MUSCLE_1-3.] Picardi et al., Sci. Rep.

33 The Human Inosinome. Despite the difference in the number of editing sites per sample, the distribution of RNA editing levels was quite similar across tissues, and the similarity was even more evident within each tissue group. Picardi et al., Sci. Rep.

34 The Human Inosinome. We investigated inosinome similarity across human tissues. Cluster analysis based on pairwise comparisons of per-sample RNA editing levels with the Spearman correlation coefficient showed well-defined tissue segregation. Picardi et al., Sci. Rep.
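The pairwise comparison that feeds such a clustering can be sketched as a Spearman correlation matrix turned into distances (1 - rho). This is a minimal sketch, not the authors' pipeline. It assumes every sample vector lists editing levels for the same sites in the same order. The clustering step itself (e.g. hierarchical clustering on the distance matrix) would normally come from a library and is not shown. Function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if either vector is constant


def spearman(x: List[float], y: List[float]) -> float:
    # Spearman's rho is the Pearson correlation of the ranks
    return pearson(average_ranks(x), average_ranks(y))


def spearman_distance_matrix(samples: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """1 - rho for every pair of samples; vectors must list the same sites in the same order."""
    names = sorted(samples)
    return {(a, b): 1.0 - spearman(samples[a], samples[b]) for a in names for b in names}
```

With toy (made-up) levels, two brain samples whose sites rank the same way get a distance near 0. A sample ranking them in reverse gets a distance near 2.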

35 The Human Inosinome. RNA editing in humans is pervasive and occurs in the vast majority of mRNA transcripts. [Figure: circular plot with rings, from the outside: brain, lung, kidney, liver, heart, muscle; tissue-exclusive events in red.] Picardi et al., Sci. Rep.

36 Is the Human Inosinome indispensable? Our genome-wide screening indicates that more than 90% of RNA editing sites reside in known protein-coding genes (affecting 65% of mRNAs) and may profoundly affect transcriptome dynamics, with a variety of functional consequences. Using our large collection of A-to-I modifications, we investigated the indispensability of RNA editing in humans by exploring the relationship between the inosinome and human diseases. We downloaded all known genes associated with human disorders from the DisGeNET database, which comprises over … associations between more than … genes and diseases. Next, we tested our set of edited protein-coding genes for enrichment in these disease genes. Surprisingly, we found that edited genes were consistently enriched in genes involved in neurological disorders and cancer. [Table: DisGeNET ID, genes in disease, genes in list, P-value and term for enriched diseases. Terms include rhabdoid tumour of the kidney (neoplastic process), intellectual disability, substance-related disorders, amyotrophic lateral sclerosis 2 (juvenile) and tobacco use disorder.]
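The slide does not say which statistical test produced the enrichment P-values. A one-sided hypergeometric test is one common choice for "is this gene list enriched in disease genes?", so the sketch below uses it as an assumption. The background universe (for example, annotated protein-coding genes) and all argument names are illustrative.

```python
from math import comb


def enrichment_pvalue(k: int, population: int, disease_genes: int, edited_genes: int) -> float:
    """One-sided hypergeometric P(X >= k).

    population:    genes in the background universe (e.g. annotated protein-coding genes)
    disease_genes: background genes associated with the disease
    edited_genes:  genes in our list (edited genes)
    k:             edited genes associated with the disease
    """
    upper = min(disease_genes, edited_genes)
    hits = sum(comb(disease_genes, i) * comb(population - disease_genes, edited_genes - i)
               for i in range(k, upper + 1))
    return hits / comb(population, edited_genes)
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so only the final division is rounded. When many diseases are tested, the resulting P-values still need a multiple-testing correction.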

37 REDIportal: The Human Inosinome. We extended RNA editing detection to 2,642 RNA-Seq experiments from 55 body sites of the GTEx Project. All events have been collected in a single, comprehensive repository called REDIportal. Computational workflow used to load RNA editing sites into REDIportal: raw data in FASTQ format are quality-checked by FastQC and aligned onto the reference human genome by STAR. REDItools are then used to interrogate multiple read alignments using a large collection of known RNA editing sites from the ATLAS repository and the RADAR database. WGS data are finally included in the REDItools tables, which are in turn stored in REDIportal. Picardi et al., Nucleic Acids Research

38 REDIportal: The Human Inosinome Picardi et al Nucleic Acids Research

39 RNA editing in neurological disorders. To evaluate RNA editing dysregulation in human neurological diseases, a total of 839 RNA-Seq experiments from 14 BioProjects were downloaded from the SRA repository. In addition, transcriptome data from 30 RNA-Seq experiments on sporadic ALS, Alzheimer's disease and Parkinson's disease were generated in our laboratory and added to the final list of samples.

40 RNA editing in neurological disorders. Global RNA editing activity in Alzheimer's Disease (AD), calculated through the Alu Editing Index (AEI) and the Recoding Editing Index (REI), shown for two different projects and brain areas: A) hippocampus; B) Brodmann area 9. P-values were calculated with the non-parametric Mann-Whitney test.
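For readers who want to see what the Mann-Whitney comparison does, here is a small self-contained sketch. It might compare, for example, per-sample AEI values in AD vs. control groups. It computes U directly and uses the large-sample normal approximation for a two-sided P-value, without tie or continuity correction. That is a simplification; in practice `scipy.stats.mannwhitneyu` is the usual route. Any values passed in would be the user's own. None are taken from the slide.

```python
import math
from typing import List, Tuple


def mann_whitney_u(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (U, two-sided p-value) using the large-sample normal approximation.

    U counts the pairs (xi, yj) with xi > yj, ties counting 1/2. No continuity or tie
    correction is applied, so p-values are rough for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p
```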

41 Direct RNA sequencing for RNA editing

42 Acknowledgments. Lab of Bioinformatics and Comparative Genomics at the University of Bari & IBIOM-CNR; Prof. Eli Eisenberg at Tel Aviv University; Dr. Erez Levanon at Bar-Ilan University; Dr. Billy Li at Stanford University. Italy Israel Actions

43 Single-cell omics. Single-cell transcriptome analyses of tissues and cell types. Cells from a healthy or pathological tissue are dissociated, analyzed independently with single-cell RNA-seq and clustered based on their gene expression profiles. Clustering of cells reveals a cell-type map that can be used to assess the composition of the tissue, including the identification of new cell types or subtypes. These rich data can be used to address many questions of gene expression and regulation within or between cell types and between tissues. [Figure: bulk tissue → cell dissociation → cell sorting and RNA isolation → NGS library and sequencing → data analysis; Darmanis et al., PNAS 2015.]

44 RNA editing detection in single cells. We profiled RNA editing in 466 single cells of human brain cortex from living individuals, for which a transcriptomic analysis had already been completed (Darmanis et al., PNAS 2015). [Figure: single-cell BAM files interrogated by REDItools (Picardi and Pesole 2013, Bioinformatics) at 4.6 million known RNA editing sites.] Picardi et al., Nucleic Acids Research

45 RNA editing in adult brain cells. We found that the number of RNA editing events per cell was strongly correlated with the number of uniquely mapped reads (Pearson r = 0.85, P-value: 1.4 x …). Strikingly, RNA editing levels (the proportion of reads supporting an editing event at each known editing site) of individual cells showed a bimodal distribution with peaks close to the extreme values (0 and 1), a sort of all-or-nothing effect. [Figure: frequency distribution of RNA editing levels.] Picardi et al., RNA 2017

46 Examples of the all-or-nothing effect. RNA editing levels calculated in some neurons, showing similar profiles with high frequencies near 0 and 1. Picardi et al., RNA 2017

47 Is this effect due to duplicated reads? This observation was not an artifact of PCR duplicate reads, as PCR duplication was globally low (affecting on average 10% of aligned reads). Furthermore, raw and deduplicated datasets shared, on average, 95% of candidate editing sites, and a position-by-position comparison of A-to-I levels showed a remarkable positive correlation (r = 0.9998, P = 0.0). Picardi et al., RNA 2017
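The raw-vs-deduplicated check can be sketched as two numbers per cell: the share of sites that survive deduplication, and the Pearson correlation of editing levels at the shared sites. How "shared" was normalised in the original analysis is not stated. This sketch assumes the denominator is the number of sites in the raw dataset. Site keys and function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compare_raw_vs_dedup(raw: Dict[Site, float], dedup: Dict[Site, float]) -> Tuple[float, float]:
    """Return (fraction of raw sites also found after deduplication, Pearson r at shared sites)."""
    shared = sorted(raw.keys() & dedup.keys())
    fraction_shared = len(shared) / len(raw)  # assumption: normalised by the raw site count
    r = pearson([raw[s] for s in shared], [dedup[s] for s in shared])
    return fraction_shared, r
```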

48 RNA editing in adult brain cells. After the removal of PCR duplicates, A-to-I editing levels of single cells continued to exhibit an extreme bimodal distribution. However, when scRNA-seq reads were merged to mimic a bulk tissue, RNA editing levels displayed a classical unimodal distribution in which most A-to-I editing levels were lower than 0.2, as previously observed in six human tissues (including brain cortex). [Figure: frequency distributions of RNA editing levels, including RNA editing levels in three bulk tissues from Picardi et al., Scientific Reports.]
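Why merging reads turns all-or-nothing levels into intermediate ones can be shown with a toy pseudo-bulk calculation. This is a simplified illustration: each cell contributes (edited reads, total reads) per site, and the merged level is computed from summed counts. The counts are made up, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, Tuple[int, int]]  # site -> (edited reads, total reads)


def per_cell_levels(cell: Counts) -> Dict[str, float]:
    """Editing level per site in one cell: edited reads / total reads."""
    return {site: e / t for site, (e, t) in cell.items() if t > 0}


def pooled_levels(cells: List[Counts]) -> Dict[str, float]:
    """Merge read counts across cells (a pseudo-bulk sample) and recompute editing levels."""
    pooled: Dict[str, Tuple[int, int]] = {}
    for cell in cells:
        for site, (e, t) in cell.items():
            pe, pt = pooled.get(site, (0, 0))
            pooled[site] = (pe + e, pt + t)
    return {site: e / t for site, (e, t) in pooled.items() if t > 0}
```

For example, three cells with counts (5, 5), (0, 4) and (0, 6) at one site have per-cell levels 1.0, 0.0 and 0.0. The merged level is 5/15, about 0.33.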

49 RNA editing levels in LCL cells. We applied the same method to profile RNA editing in LCL cells (GM12878) from Marinov et al., Genome Res. The authors sequenced total RNA extracted from 1 cell, 10 cells, 30 cells and 100 cells. RNA editing bimodality is visible in single cells and in pools of 10 cells. As the number of cells increases, the RNA editing distribution recapitulates the distribution of bulk tissues. The number of RNA-Seq reads per experiment was comparable across all samples (roughly 25 to 46 million). [Figure: editing-level distributions for 1 cell, 10 cells, 30 cells and 100 cells, each panel annotated with its read count.] Picardi et al., RNA 2017

50 Is RNA editing cell-type specific? The vast majority of RNA editing sites reside in Alu repetitive elements. To provide a more realistic estimate of global editing activity per cell, we calculated the Alu Editing Index (AEI) per cell, which represents the weighted average editing level across all expressed Alu sequences. To confirm the cell specificity of RNA editing, we performed a non-metric multidimensional scaling (NMDS) analysis, which revealed four clusters corresponding to astrocytes, neurons, oligodendrocytes and OPCs. [Figure, panels A and B: AEI by cell type.] Picardi et al., RNA 2017
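A weighted-average editing index of this kind can be written as total edited reads over total reads at adenosines inside expressed Alus. This is a minimal sketch under stated assumptions. Each site is given as (reads showing A, reads showing G) on the forward strand. Reverse-strand Alus would use T/C counts, and selecting expressed Alus and their adenosines is left out. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def alu_editing_index(sites: Iterable[Tuple[int, int]]) -> float:
    """AEI (%) = 100 * total G reads / total (A + G) reads over adenosines in expressed Alus.

    Each tuple is (reads showing A, reads showing G) at one adenosine. Summing before dividing
    makes this a coverage-weighted average of per-site editing levels.
    """
    a_total = g_total = 0
    for a_reads, g_reads in sites:
        a_total += a_reads
        g_total += g_reads
    coverage = a_total + g_total
    return 100.0 * g_total / coverage if coverage else 0.0
```

Three sites with levels 0.1, 0.5 and 0.0, each covered by 10 reads, give an AEI of 20.0. This matches their coverage-weighted mean of 0.2.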

51 Recoding RNA editing in single cells Recoding RNA editing sites comprise only a very limited fraction of inosinome (<1%). However, they may have profound functional consequences. We have explored RNA editing levels of 183 recoding sites in single cells. Picardi et al. RNA 2017

52 REDItools is slow for large datasets. Although REDItools are designed to handle massive data, they require a lot of computational time on standard non-HPC infrastructures. On a sample of 2,600 public RNA-Seq experiments, REDItools in many cases took 100 to 300 hours to complete a single experiment using a single core and 2 GB of RAM (tests performed on the INFN infrastructure in Bari, a server farm with 150 nodes and 4,000 cores [AMD and Intel]).

53 The Human Inosinome. The large number of detected RNA editing sites allowed us to investigate the sequence context flanking A-to-I changes. We observed G depletion one nucleotide upstream (-1, the 5' neighbour) of RNA editing sites and G enrichment one nucleotide downstream (+1, the 3' neighbour). Strong avoidance of G at the nucleotide immediately upstream (-1) of editing sites was also observed in other vertebrates and invertebrates. [Figure: sequence context in A) all genomic regions, B) hyper-edited regions, C) non-hyper-edited regions.]
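Counting the flanking bases is straightforward once editing sites are available as coordinates. The sketch below assumes 0-based positions on the forward strand of a single sequence and uses the usual convention that -1 is the 5' neighbour and +1 the 3' neighbour. A real analysis would also handle reverse-strand sites and compare against the neighbours of all adenosines as a background. Names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def neighbour_counts(genome: str, positions: Iterable[int]) -> Tuple[Counter, Counter]:
    """Count bases at -1 (5' neighbour) and +1 (3' neighbour) of A sites.

    Positions are 0-based on the forward strand; sites that are not A or lie at the
    sequence ends are skipped.
    """
    upstream, downstream = Counter(), Counter()
    for p in positions:
        if p <= 0 or p + 1 >= len(genome) or genome[p] != "A":
            continue
        upstream[genome[p - 1]] += 1
        downstream[genome[p + 1]] += 1
    return upstream, downstream


def g_fraction(counts: Counter) -> float:
    """Share of G among the counted neighbouring bases."""
    total = sum(counts.values())
    return counts["G"] / total if total else 0.0
```

A background can be built by calling `neighbour_counts` on every adenosine, e.g. `[i for i, b in enumerate(genome) if b == "A"]`. Its G fractions can then be compared with those of the edited sites.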

54 REDItools is slow for large datasets. [Figure: reads aligned to the genome; the reads supporting the i-th nucleotide are highlighted.] All intersecting reads are loaded all over again each time a new position is visited.

55 First optimization: serial code. [Figure: the same reads aligned to the genome.] Position 2: only reads 1 and 3 are loaded. Position 3: only read 2 is loaded. Position 4: no new reads are loaded at all. Position 5: only read 4 is loaded, and so on.
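The idea of loading each read only once can be sketched as a sweep over coordinate-sorted reads that keeps an "active" list. Reads enter when the scan reaches their start and leave once the scan passes their end. This is an illustrative sketch, not REDItools 2.0 code. Reads are simplified to gapless (start, sequence) pairs, whereas real alignments have CIGAR operations for splicing and indels.

```python
from collections import Counter
from typing import Iterator, List, Tuple

Read = Tuple[int, str]  # (0-based start on the genome, gapless read sequence)


def sweep_pileup(reads: List[Read], start: int, end: int) -> Iterator[Tuple[int, Counter]]:
    """Yield (position, base counts) for start <= position < end.

    reads must be sorted by start (as in a coordinate-sorted BAM). Each read is loaded once,
    when the scan reaches its start, and dropped once the scan moves past its end.
    """
    active: List[Read] = []
    next_read = 0
    for pos in range(start, end):
        # load only the reads that begin at or before this position and were not loaded yet
        while next_read < len(reads) and reads[next_read][0] <= pos:
            active.append(reads[next_read])
            next_read += 1
        # discard reads that end before this position
        active = [(s, seq) for s, seq in active if s + len(seq) > pos]
        yield pos, Counter(seq[pos - s] for s, seq in active)
```

The cost per position is proportional to the current coverage. Fetching all overlapping reads again from the alignment file at each position costs far more.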

56 The Human Inosinome Hierarchical clustering analysis as well as the uneven distribution of A-to-I changes across tissues indicates that RNA editing profiles are strongly tissue dependent. This behaviour may be mainly due to tissue specific regulation of ADAR enzymes and only partially to variable RNA-Seq coverage among samples.

57 REDItools 1.0 vs. REDItools 2.0: REDItools 2.0 is more than 10 times faster.

58 Second optimization: HPC code. The genome is split into intervals of different widths: high-coverage regions form thin intervals and low-coverage regions form wide intervals, so that each interval requires approximately the same computing time.
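One simple way to obtain intervals with roughly homogeneous computing time is to cut the genome greedily on cumulative coverage, using coverage as a proxy for work. This is an assumption made for illustration; the slide does not describe the exact partitioning algorithm used by REDItools 2.0. The greedy cut can leave a short last interval, and the function name is hypothetical.

```python
from typing import List, Tuple


def balanced_intervals(coverage: List[int], n_intervals: int) -> List[Tuple[int, int]]:
    """Greedily cut positions into contiguous half-open intervals of ~equal total coverage.

    Coverage is used as a proxy for computing time: high-coverage stretches end up in thin
    intervals, low-coverage stretches in wide ones. At most n_intervals are returned.
    """
    target = sum(coverage) / n_intervals
    intervals: List[Tuple[int, int]] = []
    start, acc = 0, 0
    for pos, depth in enumerate(coverage):
        acc += depth
        if acc >= target and len(intervals) < n_intervals - 1:
            intervals.append((start, pos + 1))
            start, acc = pos + 1, 0
    if start < len(coverage):
        intervals.append((start, len(coverage)))
    return intervals
```

Here is an example. With coverage `[50, 50] + [5] * 10 + [50, 50]` and three intervals, the two-base high-coverage block becomes its own thin interval, and the ten low-coverage bases are absorbed into a wide one.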

59 REDItools 2.0 performance Strong scaling curve describing the behavior of the algorithm when multiple cores are used. On x-axis, we report the number of cores, while the y-axis indicates the time elapsed (in minutes) for completing the analysis using a certain number of cores. For example, by using 72 cores, the algorithm completed in 33 minutes approximately. The semi-transparent line shows the regression line interpolating data points.
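A strong-scaling curve is usually summarised as speedup and parallel efficiency relative to a baseline run. The sketch below computes both from a table of wall-clock times. Only one point (72 cores, about 33 minutes) is given on the slide, so any timings you feed it, including those in the test, are hypothetical.

```python
from typing import Dict, Tuple


def scaling_table(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map cores -> (speedup, parallel efficiency) relative to the smallest core count measured.

    speedup    = T(base) / T(p)
    efficiency = speedup * base / p   (1.0 means perfect strong scaling)
    """
    base = min(timings)
    base_time = timings[base]
    return {p: (base_time / t, base_time / t * base / p) for p, t in sorted(timings.items())}
```

Efficiency below 1.0 shows where adding cores stops paying off. Serial parts and I/O usually cause this.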

60 Other tools to detect RNA editing Diroma et al Brief in Bioinf

61 Comparison in U87MG cell line Diroma et al Brief in Bioinf

62 RNA editing in fetal brain To further investigate the possibility that RNA editing profiles represent powerful signatures of cell type specificity, we analysed single fetal brain cells, since they are considerably different from any cell type in the adult brain and because RNA editing efficiency increases during brain development and, consequently, different editing patterns are expected between fetal and adult cells. Picardi et al. RNA 2017

63 Recoding RNA editing in fetal brain Notably, RNA editing activity at recoding sites was higher in adult than fetal neurons. In particular, the Q/R site in Gria2, linked to neurological disorders, was edited to high levels in fetal quiescent neurons but not in neuronal progenitors as previously assessed in vitro. Picardi et al. RNA 2017

64 RNA-Seq: RNA editing detection, coverage effect. Reads from GM12878 RNA-Seq (two replicates of 202M and 240M paired-end reads). From Ramaswami et al., Nature Methods.

65 RNA-Seq: RNA editing detection, read mapper effect. Ramaswami et al., Nature Methods

66 The Human Inosinome. We calculated the distribution of correlation values between the expression of ADARs (ADAR and ADARB1) and editing levels at each position. As the background distribution, we used the same dataset with editing levels randomly shuffled. Limiting the analysis to sites covered by at least 10 RNA reads, we found a striking and statistically significant positive correlation between ADAR expression and individual editing levels (Kolmogorov-Smirnov P-value = 8.83 x … against the shuffled distribution).
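The observed-vs-shuffled comparison can be sketched in three steps. First compute one correlation per site between ADAR expression and editing levels across samples. Then repeat with each site's levels permuted. Finally compare the two distributions with the two-sample Kolmogorov-Smirnov statistic D. The slide does not say whether Pearson or Spearman correlation was used; Pearson is an assumption here. For a P-value, `scipy.stats.ks_2samp` is the usual library route. Function names are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant vectors treated as r = 0 (simplification)


def site_correlations(adar_expr: Sequence[float], levels_by_site: List[List[float]],
                      shuffle: bool = False, seed: int = 0) -> List[float]:
    """One correlation per site between ADAR expression and editing levels across samples.

    With shuffle=True the editing levels of each site are permuted across samples,
    producing the background distribution.
    """
    rng = random.Random(seed)
    out = []
    for levels in levels_by_site:
        y = list(levels)
        if shuffle:
            rng.shuffle(y)
        out.append(pearson(adar_expr, y))
    return out


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect.bisect_right(a, v) / len(a) - bisect.bisect_right(b, v) / len(b))
               for v in a + b)
```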


More information

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space Whole genome sequencing Whole exome sequencing BWA alignment to reference transcriptome and genome Convert transcriptome mappings back to genome space genomes Filter on MQ, distance, Cigar string Annotate

More information

IPA Advanced Training Course

IPA Advanced Training Course IPA Advanced Training Course October 2013 Academia sinica Gene (Kuan Wen Chen) IPA Certified Analyst Agenda I. Data Upload and How to Run a Core Analysis II. Functional Interpretation in IPA Hands-on Exercises

More information

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis Jian Xu, Ph.D. Children s Research Institute, UTSW Introduction Outline Overview of genomic and next-gen sequencing technologies

More information

Methods: Biological Data

Methods: Biological Data Transcriptome analysis of short read Illumina RNA sequencing: investigating baseline variability in gene expression levels and splice variants among human brain and Lymphoblastoid samples Abstract Understanding

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Heatmap of GO terms for differentially expressed genes. The terms were hierarchically clustered using the GO term enrichment beta. Darker red, higher positive

More information

SpliceDB: database of canonical and non-canonical mammalian splice sites

SpliceDB: database of canonical and non-canonical mammalian splice sites 2001 Oxford University Press Nucleic Acids Research, 2001, Vol. 29, No. 1 255 259 SpliceDB: database of canonical and non-canonical mammalian splice sites M.Burset,I.A.Seledtsov 1 and V. V. Solovyev* The

More information

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

DNA Sequence Bioinformatics Analysis with the Galaxy Platform DNA Sequence Bioinformatics Analysis with the Galaxy Platform University of São Paulo, Brazil 28 July - 1 August 2014 Dave Clements Johns Hopkins University Robson Francisco de Souza University of São

More information

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells Ondrej Podlaha 1, Subhajyoti De 2,3,4, Mithat Gonen 5, Franziska Michor 1 * 1 Department of Biostatistics

More information

Multi-omics data integration colon cancer using proteogenomics approach

Multi-omics data integration colon cancer using proteogenomics approach Dept. of Medical Oncology Multi-omics data integration colon cancer using proteogenomics approach DTL Focus meeting, 29 August 2016 Thang Pham OncoProteomics Laboratory, Dept. of Medical Oncology VU University

More information

DNA-seq Bioinformatics Analysis: Copy Number Variation

DNA-seq Bioinformatics Analysis: Copy Number Variation DNA-seq Bioinformatics Analysis: Copy Number Variation Elodie Girard elodie.girard@curie.fr U900 institut Curie, INSERM, Mines ParisTech, PSL Research University Paris, France NGS Applications 5C HiC DNA-seq

More information

Genetics. Instructor: Dr. Jihad Abdallah Transcription of DNA

Genetics. Instructor: Dr. Jihad Abdallah Transcription of DNA Genetics Instructor: Dr. Jihad Abdallah Transcription of DNA 1 3.4 A 2 Expression of Genetic information DNA Double stranded In the nucleus Transcription mrna Single stranded Translation In the cytoplasm

More information

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Philipp Bucher Wednesday January 21, 2009 SIB graduate school course EPFL, Lausanne ChIP-seq against histone variants: Biological

More information

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data Breast cancer Inferring Transcriptional Module from Breast Cancer Profile Data Breast Cancer and Targeted Therapy Microarray Profile Data Inferring Transcriptional Module Methods CSC 177 Data Warehousing

More information

RNA SEQUENCING AND DATA ANALYSIS

RNA SEQUENCING AND DATA ANALYSIS RNA SEQUENCING AND DATA ANALYSIS Length of mrna transcripts in the human genome 5,000 5,000 4,000 3,000 2,000 4,000 1,000 0 0 200 400 600 800 3,000 2,000 1,000 0 0 2,000 4,000 6,000 8,000 10,000 Length

More information

Deploying the full transcriptome using RNA sequencing. Jo Vandesompele, CSO and co-founder The Non-Coding Genome May 12, 2016, Leuven

Deploying the full transcriptome using RNA sequencing. Jo Vandesompele, CSO and co-founder The Non-Coding Genome May 12, 2016, Leuven Deploying the full transcriptome using RNA sequencing Jo Vandesompele, CSO and co-founder The Non-Coding Genome May 12, 2016, Leuven Roadmap Biogazelle the power of RNA reasons to study non-coding RNA

More information

Small RNAs and how to analyze them using sequencing

Small RNAs and how to analyze them using sequencing Small RNAs and how to analyze them using sequencing RNA-seq Course November 8th 2017 Marc Friedländer ComputaAonal RNA Biology Group SciLifeLab / Stockholm University Special thanks to Jakub Westholm for

More information

Eukaryotic small RNA Small RNAseq data analysis for mirna identification

Eukaryotic small RNA Small RNAseq data analysis for mirna identification Eukaryotic small RNA Small RNAseq data analysis for mirna identification P. Bardou, C. Gaspin, S. Maman, J. Mariette, O. Rué, M. Zytnicki INRA Sigenae Toulouse INRA MIA Toulouse GenoToul Bioinfo INRA MaIAGE

More information

Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing

Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing PacBio Americas User Group Meeting Sample Prep Workshop June.27.2017 Tyson Clark, Ph.D. For Research Use Only. Not

More information

Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS

Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS dr sc. Ana Krivokuća Laboratory for molecular genetics Institute for Oncology and

More information

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing Last update: 05/10/2017 MODULE 4: SPLICING Lesson Plan: Title MEG LAAKSO Removal of introns from messenger RNA by splicing Objectives Identify splice donor and acceptor sites that are best supported by

More information

Cross species analysis of genomics data. Computational Prediction of mirnas and their targets

Cross species analysis of genomics data. Computational Prediction of mirnas and their targets 02-716 Cross species analysis of genomics data Computational Prediction of mirnas and their targets Outline Introduction Brief history mirna Biogenesis Why Computational Methods? Computational Methods

More information

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different

More information

EXPression ANalyzer and DisplayER

EXPression ANalyzer and DisplayER EXPression ANalyzer and DisplayER Tom Hait Aviv Steiner Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Adi Maron-Katz Dorit Sagir Eyal David Roded Sharan Israel Steinfeld Yossi Shiloh

More information

Data mining with Ensembl Biomart. Stéphanie Le Gras

Data mining with Ensembl Biomart. Stéphanie Le Gras Data mining with Ensembl Biomart Stéphanie Le Gras (slegras@igbmc.fr) Guidelines Genome data Genome browsers Getting access to genomic data: Ensembl/BioMart 2 Genome Sequencing Example: Human genome 2000:

More information

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits Accelerating clinical research Next-generation sequencing (NGS) has the ability to interrogate many different genes and detect

More information

he micrornas of Caenorhabditis elegans (Lim et al. Genes & Development 2003)

he micrornas of Caenorhabditis elegans (Lim et al. Genes & Development 2003) MicroRNAs: Genomics, Biogenesis, Mechanism, and Function (D. Bartel Cell 2004) he micrornas of Caenorhabditis elegans (Lim et al. Genes & Development 2003) Vertebrate MicroRNA Genes (Lim et al. Science

More information

The Alternative Choice of Constitutive Exons throughout Evolution

The Alternative Choice of Constitutive Exons throughout Evolution The Alternative Choice of Constitutive Exons throughout Evolution Galit Lev-Maor 1[, Amir Goren 1[, Noa Sela 1[, Eddo Kim 1, Hadas Keren 1, Adi Doron-Faigenboim 2, Shelly Leibman-Barak 3, Tal Pupko 2,

More information

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014 Not IN Our Genes - A Different Kind of Inheritance! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014 Epigenetics in Mainstream Media Epigenetics *Current definition:

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 Frequency of alternative-cassette-exon engagement with the ribosome is consistent across data from multiple human cell types and from mouse stem cells. Box plots showing AS frequency

More information

TRANSCRIPTION. DNA à mrna

TRANSCRIPTION. DNA à mrna TRANSCRIPTION DNA à mrna Central Dogma Animation DNA: The Secret of Life (from PBS) http://www.youtube.com/watch? v=41_ne5ms2ls&list=pl2b2bd56e908da696&index=3 Transcription http://highered.mcgraw-hill.com/sites/0072507470/student_view0/

More information

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles Ying-Wooi Wan 1,2,4, Claire M. Mach 2,3, Genevera I. Allen 1,7,8, Matthew L. Anderson 2,4,5 *, Zhandong Liu 1,5,6,7 * 1 Departments of Pediatrics

More information

MicroRNAs, RNA Modifications, RNA Editing. Bora E. Baysal MD, PhD Oncology for Scientists Lecture Tue, Oct 17, 2017, 3:30 PM - 5:00 PM

MicroRNAs, RNA Modifications, RNA Editing. Bora E. Baysal MD, PhD Oncology for Scientists Lecture Tue, Oct 17, 2017, 3:30 PM - 5:00 PM MicroRNAs, RNA Modifications, RNA Editing Bora E. Baysal MD, PhD Oncology for Scientists Lecture Tue, Oct 17, 2017, 3:30 PM - 5:00 PM Expanding world of RNAs mrna, messenger RNA (~20,000) trna, transfer

More information

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Application Note Authors John McGuigan, Megan Manion,

More information

MicroRNA expression profiling and functional analysis in prostate cancer. Marco Folini s.c. Ricerca Traslazionale DOSL

MicroRNA expression profiling and functional analysis in prostate cancer. Marco Folini s.c. Ricerca Traslazionale DOSL MicroRNA expression profiling and functional analysis in prostate cancer Marco Folini s.c. Ricerca Traslazionale DOSL What are micrornas? For almost three decades, the alteration of protein-coding genes

More information

High AU content: a signature of upregulated mirna in cardiac diseases

High AU content: a signature of upregulated mirna in cardiac diseases https://helda.helsinki.fi High AU content: a signature of upregulated mirna in cardiac diseases Gupta, Richa 2010-09-20 Gupta, R, Soni, N, Patnaik, P, Sood, I, Singh, R, Rawal, K & Rani, V 2010, ' High

More information

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs ChIP-seq hands-on Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs Main goals Becoming familiar with essential tools and formats Visualizing and contextualizing raw data Understand

More information

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 PGAR: ASD Candidate Gene Prioritization System Using Expression Patterns Steven Cogill and Liangjiang Wang Department of Genetics and

More information

Long non-coding RNAs

Long non-coding RNAs Long non-coding RNAs Dominic Rose Bioinformatics Group, University of Freiburg Bled, Feb. 2011 Outline De novo prediction of long non-coding RNAs (lncrnas) Genome-wide RNA gene-finding Intrinsic properties

More information

Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA

Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA www.impactjournals.com/oncotarget/ Oncotarget, Supplementary Materials 2016 Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) DNA Supplementary Materials

More information

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression Chromatin Array of nuc 1 Transcriptional control in Eukaryotes: Chromatin undergoes structural changes

More information

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction Optimization strategy of Copy Number Variant calling using Multiplicom solutions Michael Vyverman, PhD; Laura Standaert, PhD and Wouter Bossuyt, PhD Abstract Copy number variations (CNVs) represent a significant

More information

An Analysis of MDM4 Alternative Splicing and Effects Across Cancer Cell Lines

An Analysis of MDM4 Alternative Splicing and Effects Across Cancer Cell Lines An Analysis of MDM4 Alternative Splicing and Effects Across Cancer Cell Lines Kevin Hu Mentor: Dr. Mahmoud Ghandi 7th Annual MIT PRIMES Conference May 2021, 2017 Outline Introduction MDM4 Isoforms Methodology

More information

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics Precision Genomics for Immuno-Oncology Personalis, Inc. ACE ImmunoID When one biomarker doesn t tell the whole

More information

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford Lecture 8 Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology David K. Gifford 1 Lecture 8 RNA-seq Analysis RNA-seq principles How can we characterize mrna isoform

More information

MapSplice: Accurate Mapping of RNA-Seq Reads for Splice Junction Discovery

MapSplice: Accurate Mapping of RNA-Seq Reads for Splice Junction Discovery University of Kentucky UKnowledge Computer Science Faculty Publications Computer Science 0-200 MapSplice: Accurate Mapping of RNA-Seq Reads for Splice Junction Discovery Kai Wang University of Kentucky

More information

Investigating rare diseases with Agilent NGS solutions

Investigating rare diseases with Agilent NGS solutions Investigating rare diseases with Agilent NGS solutions Chitra Kotwaliwale, Ph.D. 1 Rare diseases affect 350 million people worldwide 7,000 rare diseases 80% are genetic 60 million affected in the US, Europe

More information

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first intron IGLL5 mutation depicting biallelic mutations. Red arrows highlight the presence of out of phase

More information

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory Computational aspects of ChIP-seq John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory ChIP-seq Using highthroughput sequencing to investigate DNA

More information

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature Supplemental Data Integrating omics and alternative splicing i reveals insights i into grape response to high temperature Jianfu Jiang 1, Xinna Liu 1, Guotian Liu, Chonghuih Liu*, Shaohuah Li*, and Lijun

More information

Arabidopsis thaliana small RNA Sequencing. Report

Arabidopsis thaliana small RNA Sequencing. Report Arabidopsis thaliana small RNA Sequencing Report September 2015 Project Information Client Name Client Company / Institution Macrogen Order Number Order ID Species Arabidopsis thaliana Reference UCSC hg19

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 U1 inhibition causes a shift of RNA-seq reads from exons to introns. (a) Evidence for the high purity of 4-shU-labeled RNAs used for RNA-seq. HeLa cells transfected with control

More information

NEXT GENERATION SEQUENCING. R. Piazza (MD, PhD) Dept. of Medicine and Surgery, University of Milano-Bicocca

NEXT GENERATION SEQUENCING. R. Piazza (MD, PhD) Dept. of Medicine and Surgery, University of Milano-Bicocca NEXT GENERATION SEQUENCING R. Piazza (MD, PhD) Dept. of Medicine and Surgery, University of Milano-Bicocca SANGER SEQUENCING 5 3 3 5 + Capillary Electrophoresis DNA NEXT GENERATION SEQUENCING SOLEXA-ILLUMINA

More information

Transcript reconstruction

Transcript reconstruction Transcript reconstruction Summary I Data types, file formats and utilities Annotation: Genomic regions Genes Peaks bedtools Alignment: Map reads BAM/SAM Samtools Aggregation: Summary files Wig (UCSC) TDF

More information

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits Next-generation performance in liquid biopsies 2 Accelerating clinical research From liquid biopsy to next-generation

More information

NGS in tissue and liquid biopsy

NGS in tissue and liquid biopsy NGS in tissue and liquid biopsy Ana Vivancos, PhD Referencias So, why NGS in the clinics? 2000 Sanger Sequencing (1977-) 2016 NGS (2006-) ABIPrism (Applied Biosystems) Up to 2304 per day (96 sequences

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

mirna Dr. S Hosseini-Asl

mirna Dr. S Hosseini-Asl mirna Dr. S Hosseini-Asl 1 2 MicroRNAs (mirnas) are small noncoding RNAs which enhance the cleavage or translational repression of specific mrna with recognition site(s) in the 3 - untranslated region

More information

RNA SEQUENCING AND DATA ANALYSIS

RNA SEQUENCING AND DATA ANALYSIS RNA SEQUENCING AND DATA ANALYSIS Download slides and package http://odin.mdacc.tmc.edu/~rverhaak/package.zip http://odin.mdacc.tmc.edu/~rverhaak/rna-seqlecture.zip Overview Introduction into the topic

More information

Pseudogenes transcribed in breast invasive carcinoma show subtype-specific expression and cerna potential

Pseudogenes transcribed in breast invasive carcinoma show subtype-specific expression and cerna potential Welch et al. BMC Genomics (2015) 16:113 DOI 10.1186/s12864-015-1227-8 RESEARCH ARTICLE Open Access Pseudogenes transcribed in breast invasive carcinoma show subtype-specific expression and cerna potential

More information

Elevated RNA Editing Activity Is a Major Contributor to Transcriptomic Diversity in Tumors

Elevated RNA Editing Activity Is a Major Contributor to Transcriptomic Diversity in Tumors Cell Reports Supplemental Information Elevated RNA Editing Activity Is a Major Contributor to Transcriptomic Diversity in s Nurit Paz-Yaacov, Lily Bazak, Ilana Buchumenski, Hagit T. Porath, Miri Danan-Gotthold,

More information

MEDICAL GENOMICS LABORATORY. Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG)

MEDICAL GENOMICS LABORATORY. Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG) Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG) Ordering Information Acceptable specimen types: Fresh blood sample (3-6 ml EDTA; no time limitations associated with receipt)

More information

For all of the following, you will have to use this website to determine the answers:

For all of the following, you will have to use this website to determine the answers: For all of the following, you will have to use this website to determine the answers: http://blast.ncbi.nlm.nih.gov/blast.cgi We are going to be using the programs under this heading: Answer the following

More information

Transcriptome and isoform reconstruc1on with short reads. Tangled up in reads

Transcriptome and isoform reconstruc1on with short reads. Tangled up in reads Transcriptome and isoform reconstruc1on with short reads Tangled up in reads Topics of this lecture Mapping- based reconstruc1on methods Case study: The domes1c dog De- novo reconstruc1on method Trinity

More information

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc.

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc. Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Topics Overview of Data Processing Pipeline Overview of Data Files 2 DNA Nano-Ball (DNB) Read Structure Genome : acgtacatgcattcacacatgcttagctatctctcgccag

More information

York criteria, 6 RA patients and 10 age- and gender-matched healthy controls (HCs).

York criteria, 6 RA patients and 10 age- and gender-matched healthy controls (HCs). MATERIALS AND METHODS Study population Blood samples were obtained from 15 patients with AS fulfilling the modified New York criteria, 6 RA patients and 10 age- and gender-matched healthy controls (HCs).

More information

High-throughput transcriptome sequencing

High-throughput transcriptome sequencing High-throughput transcriptome sequencing Erik Kristiansson (erik.kristiansson@zool.gu.se) Department of Zoology Department of Neuroscience and Physiology University of Gothenburg, Sweden Outline Genome

More information

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Supplementary Figure 1 7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Regions targeted by the Even and Odd ChIRP probes mapped to a secondary structure model 56 of the

More information

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( 鄧致剛 ); g ( 黄栢榕 ) Bioinformatics Center, Chang Gung University.

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( 鄧致剛 ); g ( 黄栢榕 ) Bioinformatics Center, Chang Gung University. Small RNA High Throughput Sequencing Analysis I P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( 鄧致剛 ); g ( 黄栢榕 ) Bioinformatics Center, Chang Gung University. Prominent members of the RNA family Classic RNAs mediating

More information

Introduction retroposon

Introduction retroposon 17.1 - Introduction A retrovirus is an RNA virus able to convert its sequence into DNA by reverse transcription A retroposon (retrotransposon) is a transposon that mobilizes via an RNA form; the DNA element

More information

To test the possible source of the HBV infection outside the study family, we searched the Genbank

To test the possible source of the HBV infection outside the study family, we searched the Genbank Supplementary Discussion The source of hepatitis B virus infection To test the possible source of the HBV infection outside the study family, we searched the Genbank and HBV Database (http://hbvdb.ibcp.fr),

More information

Introduction. Introduction

Introduction. Introduction Introduction We are leveraging genome sequencing data from The Cancer Genome Atlas (TCGA) to more accurately define mutated and stable genes and dysregulated metabolic pathways in solid tumors. These efforts

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality. Supplementary Figure 1 Assessment of sample purity and quality. (a) Hematoxylin and eosin staining of formaldehyde-fixed, paraffin-embedded sections from a human testis biopsy collected concurrently with

More information

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering,

More information

Cancer Informatics Lecture

Cancer Informatics Lecture Cancer Informatics Lecture Mayo-UIUC Computational Genomics Course June 22, 2018 Krishna Rani Kalari Ph.D. Associate Professor 2017 MFMER 3702274-1 Outline The Cancer Genome Atlas (TCGA) Genomic Data Commons

More information

VirusDetect pipeline - virus detection with small RNA sequencing

VirusDetect pipeline - virus detection with small RNA sequencing VirusDetect pipeline - virus detection with small RNA sequencing CSC webinar 16.1.2018 Eija Korpelainen, Kimmo Mattila, Maria Lehtivaara Big thanks to Jan Kreuze and Jari Valkonen! Outline Small interfering

More information

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation, Supplementary Information Supplementary Figures Supplementary Figure 1. a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation, gene ID and specifities are provided. Those highlighted

More information

Studying Alternative Splicing

Studying Alternative Splicing Studying Alternative Splicing Meelis Kull PhD student in the University of Tartu supervisor: Jaak Vilo CS Theory Days Rõuge 27 Overview Alternative splicing Its biological function Studying splicing Technology

More information

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6 Alternative splicing Biosciences 741: Genomics Fall, 2013 Week 6 Function(s) of RNA splicing Splicing of introns must be completed before nuclear RNAs can be exported to the cytoplasm. This led to early

More information

Inference of Isoforms from Short Sequence Reads

Inference of Isoforms from Short Sequence Reads Inference of Isoforms from Short Sequence Reads Tao Jiang Department of Computer Science and Engineering University of California, Riverside Tsinghua University Joint work with Jianxing Feng and Wei Li

More information

A Statistical Framework for Classification of Tumor Type from microrna Data

A Statistical Framework for Classification of Tumor Type from microrna Data DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2016 A Statistical Framework for Classification of Tumor Type from microrna Data JOSEFINE RÖHSS KTH ROYAL INSTITUTE OF TECHNOLOGY

More information