Small RNAs and how to analyze them using sequencing

Similar documents
Small RNAs and how to analyze them using sequencing

Bi 8 Lecture 17. interference. Ellen Rothenberg 1 March 2016

Eukaryotic small RNA Small RNAseq data analysis for mirna identification

Canadian Bioinforma1cs Workshops

RNA- seq Introduc1on. Promises and pi7alls

CRS4 Seminar series. Inferring the functional role of micrornas from gene expression data CRS4. Biomedicine. Bioinformatics. Paolo Uva July 11, 2012

MicroRNAs, RNA Modifications, RNA Editing. Bora E. Baysal MD, PhD Oncology for Scientists Lecture Tue, Oct 17, 2017, 3:30 PM - 5:00 PM

RNA-seq Introduction

he micrornas of Caenorhabditis elegans (Lim et al. Genes & Development 2003)

Prediction of micrornas and their targets

Supplementary Figure 1

Cross species analysis of genomics data. Computational Prediction of mirnas and their targets

mirna Dr. S Hosseini-Asl

MicroRNA and Male Infertility: A Potential for Diagnosis

micrornas (mirna) and Biomarkers

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

Classifica4on. CSCI1950 Z Computa4onal Methods for Biology Lecture 18. Ben Raphael April 8, hip://cs.brown.edu/courses/csci1950 z/

High AU content: a signature of upregulated mirna in cardiac diseases

Supplemental Figure 1. Small RNA size distribution from different soybean tissues.

Circular RNAs (circrnas) act a stable mirna sponges

Small RNA-Seq and profiling

NOVEL FUNCTION OF mirnas IN REGULATING GENE EXPRESSION. Ana M. Martinez

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc

Cellular MicroRNA and P Bodies Modulate Host-HIV-1 Interactions. 指導教授 : 張麗冠博士 演講者 : 黃柄翰 Date: 2009/10/19

Nature Structural & Molecular Biology: doi: /nsmb.2419

MicroRNA in Cancer Karen Dybkær 2013

Phenomena first observed in petunia

Marta Puerto Plasencia. microrna sponges

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

V16: involvement of micrornas in GRNs

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1. Differential expression of mirnas from the pri-mir-17-92a locus.

RNA interference induced hepatotoxicity results from loss of the first synthesized isoform of microrna-122 in mice

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

RNA- seq Introduc1on. Promises and pi7alls

Chapter 2. Investigation into mir-346 Regulation of the nachr α5 Subunit

RNA-Seq Preparation Comparision Summary: Lexogen, Standard, NEB

Human breast milk mirna, maternal probiotic supplementation and atopic dermatitis in offsrping

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Profiles of gene expression & diagnosis/prognosis of cancer. MCs in Advanced Genetics Ainoa Planas Riverola

Transcriptome-wide analysis of microrna expression in the malaria mosquito Anopheles gambiae

MicroRNA expression profiling and functional analysis in prostate cancer. Marco Folini s.c. Ricerca Traslazionale DOSL

microrna Presented for: Presented by: Date:

Transcript reconstruction

Transcriptome profiling of the developing male germ line identifies the mir-29 family as a global regulator during meiosis

Obstacles and challenges in the analysis of microrna sequencing data

Supplementary Figure 1

omiras: MicroRNA regulation of gene expression

Post-transcriptional regulation of an intronic microrna

mirna-guided regulation at the molecular level

MicroRNAs and Cancer

Copy Number Varia/on Detec/on. Alex Mawla UCD Genome Center Bioinforma5cs Core Tuesday June 16, 2015

Mature microrna identification via the use of a Naive Bayes classifier

Transcriptome Analysis

RNA - protein interactions in mrna decay A case study on TTP and HuR in AMD

Chapter 10 - Post-transcriptional Gene Control

Alternative RNA processing: Two examples of complex eukaryotic transcription units and the effect of mutations on expression of the encoded proteins.

Regulation of Gene Expression in Eukaryotes

Introduction to Systems Biology of Cancer Lecture 2

MicroRNA dysregulation in cancer. Systems Plant Microbiology Hyun-Hee Lee

Inferring condition-specific mirna activity from matched mirna and mrna expression data

Improved annotation of C. elegans micrornas by deep sequencing reveals structures associated with processing by Drosha and Dicer

The RNA revolution: rewriting the fundamentals of genetics

Overview: Conducting the Genetic Orchestra Prokaryotes and eukaryotes alter gene expression in response to their changing environment

Transcriptome and isoform reconstruc1on with short reads. Tangled up in reads

Intrinsic cellular defenses against virus infection

BIMM 143. RNA sequencing overview. Genome Informatics II. Barry Grant. Lecture In vivo. In vitro.

sirna count per 50 kb small RNAs matching the direct strand Repeat length (bp) per 50 kb repeats in the chromosome

Identification of mirnas in Eucalyptus globulus Plant by Computational Methods

Human Genome: Mapping, Sequencing Techniques, Diseases

Ch. 18 Regulation of Gene Expression

Prokaryotes and eukaryotes alter gene expression in response to their changing environment

10/31/2017. micrornas and cancer. From the one gene-one enzyme hypothesis to. microrna DNA RNA. Transcription factors.

ANALYSIS OF SOYBEAN SMALL RNA POPULATIONS HLAING HLAING WIN THESIS

Processing of RNA II Biochemistry 302. February 13, 2006

MicroRNAs: a new source of biomarkers in radiation response. Simone Moertl, Helmholtz Centre Munich

mirna seq of mouse brain regions

Sta$s$cs is Easy. Dennis Shasha From a book co- wri7en with Manda Wilson

MicroRNAs: novel regulators in skin research

Evidence for the biogenesis of more than 1,000 novel human micrornas

Processing of RNA II Biochemistry 302. February 14, 2005 Bob Kelm

MicroRNAs: molecular features and role in cancer.

Supplementary information

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

MicroRNAs: Regulatory Function and Potential for Gene Therapy

Epigenetic Principles and Mechanisms Underlying Nervous System Function in Health and Disease Mark F. Mehler MD, FAAN

Arabidopsis thaliana small RNA Sequencing. Report

Polyomaviridae. Spring

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( 鄧致剛 ); g ( 黄栢榕 ) Bioinformatics Center, Chang Gung University.

COMPUTATIONAL ANALYSIS OF SMALL RNAs IN MAIZE MUTANTS WITH DEFECTS IN DEVELOPMENT AND PARAMUTATION. Reza Hammond

Milk micro-rna information and lactation

Processing of RNA II Biochemistry 302. February 18, 2004 Bob Kelm

High-Throughput Sequencing Course

Zhao et al. BMC Bioinformatics (2017) 18:180 DOI /s

REGULATED AND NONCANONICAL SPLICING

Lesson Flowchart. Why Gene Expression Regulation?; microrna in Gene Regulation: an overview; microrna Genomics and Biogenesis;

Santosh Patnaik, MD, PhD! Assistant Member! Department of Thoracic Surgery! Roswell Park Cancer Institute!

Long non coding RNA in the pea aphid; iden3fica3on and compara3ve expression in sexual and asexual embryos

MicroRNAs and Their Regulatory Roles in Plants

RNA-seq: filtering, quality control and visualisation. COMBINE RNA-seq Workshop

Analysis of small RNAs from Drosophila Schneider cells using the Small RNA assay on the Agilent 2100 bioanalyzer. Application Note

Transcription:

Small RNAs and how to analyze them using sequencing Jakub Orzechowski Westholm (1) Long- term bioinforma=cs support, Science For Life Laboratory Stockholm (2) Department of Biophysics and Biochemistry, Stockholm University

Small RNAs Small RNAs are species of short non- coding RNAs, typically <100 nucleo=des micro RNAs (mirnas) short interfering RNAs (sirnas) piwi associated RNAs (pirnas) clustered regularly interspaced short palindromic repeats (CRISPRs) mirtrons, cis- natrnas, TSS- mirnas, enhancer RNAs and other strange things

1. Background on regulatory small RNAs

1991: RNA can repress gene expression wt (Napoli et al. Plant Cell, 1991.) Overexpression of pigment gene CHS The mechanism responsible for the reversible co- suppression of homologous genes in trans is unclear..

1993: The first microrna is discovered in the worm genome 1. A muta=on in the lin- 4 locus disrupts worm development. 2. The lin- 4 locus encodes a non- coding RNA that forms a hairpin structure and produces two small transcripts, 61 and 22 nt. 3. Part of this RNA is complementary to the 3 UTR of a developmental gene, lin- 14

1998: double stranded RNA can efficiently repress gene expression To our surprise, we found that double- stranded RNA was substan=ally more effec=ve at producing interference than was either strand individually. RNAi = RNA interference

2000: a second, conserved, microrna is found

2001: many micrornas are found in various animals Using: - RNA structure predic=on - Compara=ve genomics - (low throughput) sequencing

microrna biogenesis Many enzymes etc. are involved: Drosha, Exp5, Dicer,... The end result is a ~22nt RNA loaded into an Argonaute complex. The microrna directs Argonaute to target genes, through base pairing with the 3 UTR (pos 2-8). This causes repression. (Winter et. Nature Cell Biol, 2009)

Target repression by micrornas (This is in animals. micrornas in plants work differently.) (Fabian, NSMB, 2012)

How do micrornas find their targets? In animals, micrornas find their targets through pairing between the microrna seed region (nucleo=des 2-8) and the target transcript (Friedman et al. Genome Research, 2009) Such short matches are common à a microrna can have hundreds of targets. It is es=mated that over half of all genes are targeted by micrornas.

MicroRNA target predic=on Besides seed pairing, other features are used in the target predic=ons: Conserva=on (conserved target sites are more likely to be func=onal) mrna structure (it s hard for a microrna to interact with a highly structured target mrna) Sequences around the target site (AU rich sequences around targets?) Many programs exist for microrna target predic=on (targetscan, mirsvr, PicTar,..) These are not perfect. Target predic=on is hard, and a lot of details about the mechanism are s=ll not known.

MicroRNAs in animal genomes There are typically hundreds or thousands micrornas in animal genomes: Fly: ~300 microrna loci Mouse: ~1200 microrna loci Human: ~1900 microrna loci A single locus can produce mul=ple microrna forms (called isomirs). In a given =ssue, their expression can range over 6 orders of magnitude (a few to a few million reads)

micrornas regulate many biological processes and are involved in disease Development Stress response Cancer Cardiovascular disease Inflammatory disease Autoimmune disease

2. Small RNA sequencing

Sequencing Small RNA sequencing is similar to mrna sequencing, but: There is no poly- A selec=on. Instead RNA fragments are size selected (typically less than 35 nucleo=des, to avoid contamina=on by ribosomal RNA). Low complexity libraries à more sequencing problems FastQC results will look strange: Length Nucleo=de content Sequence duplica=on

Pre- processing of small RNA data I Since we are sequencing short RNA fragments, adaptor sequences end up in the reads too. Many programs available to remove adaptor sequences (cutadapt, fastx_clipper, Btrim..) We only want to keep the reads that had adaptors in them. GTTTCTGCATTTTCGTATGCCGTCTTCTGCTTGAA GTGGGTAGAACTTTGATTAATTCGTATGCCGTCTT GTTTGTAAATTCTGATCGTATGCCGTCTTCTGCTT GAATATATATAGATATATACATACATACTTATCGT GCTGACTTAGCTTGAAGCATAAATGGTCGTATGCC GACGATCTAGACGGTTTTCGCAGAATTCTGTTTAT Adapter missing

Pre- processing of small RNA data II micrornas are expected to be 20-25 nt. Short reads are probably not micrornas, and are hard to map uniquely GTTTCTGCATTTTCGTATGCCGTCTTCTGCTTGAA GTGGGTAGAACTTTGATTAATTCGTATGCCGTCTT GTTTGTAAATTCTGATCGTATGCCGTCTTCTGCTT GCTGACTTAGCTTGAAGCATAAATGGTCGTATGCC Long reads are probably not micrornas To short (Lau et al. Genome Research, 2010)

Pre- processing of small RNA data III Another useful QC step is to check which loci the reads map to: (Ling, BMC Genomics, 2011)

Example of reads mapping to a microrna locus (Berezikov et al. Genome Research, 2011.)

Quan=fying small RNA expression I A B C CGTGTCAGGAGTGCATTTGCA TAAATGCACTATCTGGTACGACA A. Count all reads mapping to a locus? - The simplest op=on, usually good for profiling. B. Count reads from each hairpin arm? - Useful if we want to correlate this with expression of target genes, or do more careful profiling. C. Only count reads that exactly match the mature microrna? - Not a good idea, because we will miss variants

Quan=fying small RNA expression II micrornas from the same family can be very similar (or iden=cal) How treat this: Keep in mind that some micrornas are hard to separate. If a read maps to several N loci, count 1/N read at each locus.

Error sources Different chemistries and protocols can have different effects on expression measurements. We only get a few different sequences from each microrna, so any biases can have big effects. (In normal RNA- seq each gene generates many different reads, so this is not a big problem) Normaliza=on doesn t fix these problems à it s hard to compare data from different plaporms etc. The amount of star=ng material can influence the results:

Normalizing small RNA expression levels I Only a few loci, and huge differences in expression levels à a few mirnas can account for the majority of all reads, and skew expression levels of all micrornas. True expression levels Nr reads sequenced mir1 mir2 mir1 mir2 mir3 mir1 mir2 mir1 mir2 mir3 Condi=on A Condi=on B Condi=on A Condi=on B Since many reads are used to sequence mir3 in condi=on B, fewer are available for mir1 and mir2. Normaliza=on needs to deal with this situa=on. Simply scaling read counts by the total number mapped reads will not solve this problem. (Spike- in are always useful for normaliza=on.)

Normalizing small RNA expression levels II Different methods for normaliza=on TMM ( trimmed mean of M- values ) normaliza=on (Robinson et al. 2010, Genome Biology, McCormick et al. 2011, Silence) In short, TMM normaliza=on works like this: Compute log ra=os of all microrna ( M- values ) Remove ( trim ) the highest and the lowest log- ra=os, and the highest and lowest expressed micrornas. Use a mean of the remaining log- ra=os to compute the scaling factors The underlying assump=on is that most micrornas have similar expression levels in the different samples, and should have similar expression levels arer normaliza=on.

Normalizing small RNA expression levels III Quan=le normaliza=on (Garmire et al. RNA, 2012) The underlying assump=on is that the overall expression distribu=on is the same in all samples. Before normaliza=on Arer normaliza=on Nr loci Nr loci Expression level Expression level Many other methods exist, developed for RNA- seq or microarrays. No consensus about which method to use à always good to try a few different methods.

Differen=al expression Arer the expression levels have been properly normalized, methods for RNA- seq differen=al expression can be used. ANOVA, t- tests DeSeq, edger, voom, limma, etc.. No consensus on which method is best. Keep in mind: Since microrna quan=fica=on is less reliable than normal RNA- seq: More replicates are needed. More valida=on experiments are required (Northern blots, in- situ hybridiza=on, etc.). Use cau=on when interpre=ng results!

3. What can we learn from microrna expression analysis?

MicroRNA expression profiles classify human cancers microrna expression profiles cluster according to cancer type. (Lu et al. Nature 2005)

microrna profiles can be used to dis=nguish cancer subtypes (Chan et al. Trends in Molecular Medicine, 2010)

microrna profiles in cell lines vs =ssues PCA plot showing that microrna profiles in most cell lines are more similar to each other than to normal =ssues. (Wen et al. Genome Research 2014)

microrna regula=on in cancer and Lin- 28 development The microrna let- 7 is involved in regula=ng developmental =ming, and is a tumor suppressor. let- 7 is repressed by the protein lin- 28. This repression is post- transcrip=onal, and is associated with uridyla=on at the 3 end of let- 7. Using sequencing, such uridyla=on can be observed for many micrornas. (It s also possible to see microrna edi=ng.) (Lehrbach et al. NSMB, 2009)

Rela=on between micrornas and their predicted targets It is possible to find sta=s=cal correla=ons between expression of micrornas and of their predicted target genes. Example: mir- 124 is expressed in the nervous system. Neural genes are depleted for mir- 124 target sites in the 3 UTRs. Genes expressed in epidermis, muscle, gut etc. are enriched for mir- 124 sites. But we cannot be sure that this is true, since we are only looking at predicted targets! (Stark et al. Cell, 2005)

Discovering new small RNA loci micrornas have very specific pauerns, when it comes to Read size and mapping RNA structure Conserva=on This makes it possible to find micrornas using small RNA sequencing data.

mirdeep(2) The most used program for finding new micrornas. Takes as input: A genome sequence Small RNA sequencing data A set of loci to exclude (op=onal) The basic idea is: Look at sequence data to find a (large) set of possible loci. Look at RNA folding and read mapping pauerns to give a score to each candidate Nr reads from both arms Fixed read ends Free energy, base pairing in the hairpin structure,..

Output is a list of microrna candidates, with scores, and a plot for each candidate: There are also other programs, e.g. shortstack (Axtell, RNA, 2013) which also finds other small RNAs. (Friedlä nder et. al. Nucleic Acids Reseach, 2011) mirdeep2 is installed on UPPMAX.

Other strange small RNAs that show up in sequencing data pirnas mirtrons hp- RNAs trna fragments TSS- micrornas cis- natrnas Some of these are func=onal Some are by products of RNA processing, and can be informa=ve (e.g. microrna loop sequences). Some are probably just noise.

The end