Obstacles and challenges in the analysis of microRNA sequencing data (miRNA-seq)

David Humphreys, Genomics Core
Dr Victor Chang AC (1936-1991), Pioneering Cardiothoracic Surgeon and Humanitarian

The ABCs of miRNAs (Annotation, Biogenesis, Curation)
miRBase (www.mirbase.org) provides:
- Mature miRNA FASTA file
- Stem-loop FASTA file
- GFF (genome coordinate file)
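miRBase distributes mature sequences in the RNA alphabet (U) and mixes all species in one file, while genome aligners expect DNA. A minimal Python sketch, assuming the first whitespace-separated token of each FASTA header is the miRNA ID and that species are identified by their three-letter prefix (here "hsa-"), extracts one species and converts U to T. The let-7f-5p entry reuses the sequence from the calibration example later in this deck; the second entry is hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def species_mature_dna(lines, prefix="hsa-"):
    """Keep entries whose ID starts with `prefix`; return {ID: DNA sequence}."""
    mature = {}
    for header, seq in read_fasta(lines):
        mirna_id = header.split()[0]  # assumption: ID is the first header token
        if mirna_id.startswith(prefix):
            mature[mirna_id] = seq.upper().replace("U", "T")
    return mature


example = [
    ">hsa-let-7f-5p",
    "UGAGGUAGUAGAUUGUAUAGUU",
    ">xyz-mir-example hypothetical entry from another species",
    "UUUGGUCCCCUUCAACCAGCUG",
]
print(species_mature_dna(example))  # {'hsa-let-7f-5p': 'TGAGGTAGTAGATTGTATAGTT'}
```

The same filter applied to the stem-loop file gives a species-specific reference for the stem-loop mapping strategy discussed further on.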

miRNA-seq applications
Discovery
- Novel miRNAs
- Isoforms
- Biogenesis: non-canonical processing, strand selection, length/non-template additions
Quantification
- Differentially expressed miRNAs
- Differential processing
The sequencing read length covers the entire mature transcript.

Experimental design
- Sample selection: species, replicates
- RNA extraction
- Library preparation

RNA extraction
                      Liquid   Column   Bead
Prep time             ++       ++++     +++
miRNA purification    +++      ++++     ++++
Recovery              ++++     +++      +++
Limitations/pitfalls:
- Low input: miRNA bias
- Early protocols: no miRNA???
- Most susceptible: miRNAs with low GC content and secondary structure
Kim et al., (2011) Molecular Cell 43, 1005-1014
Kim et al., (2012) Molecular Cell 46, 893-895
[Figure: small RNA precipitated with longer RNA; miR-141/miR-200c ratio; apparently down-regulated miRNAs: 141, 29b, 21, 106b, 15a, 34a; NO change!! Cell number: (L) = 200,000, (H) = 800,000; low confluence = 500,000 cells, high confluence = 800,000 cells]
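The slide lists low GC content and secondary structure as the features that make small RNAs most susceptible to loss. A simple way to see which of your miRNAs are potentially at risk is to screen mature sequences by GC content. This Python sketch does that; the 0.40 cut-off is an arbitrary, hypothetical value, not one taken from the cited papers, and it ignores secondary structure entirely.

```python
def gc_fraction(seq):
    """Fraction of G/C bases in an RNA or DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def flag_low_gc(mature, threshold=0.40):
    """Return IDs whose GC fraction falls below a (hypothetical) threshold."""
    return sorted(name for name, seq in mature.items()
                  if gc_fraction(seq) < threshold)


mature = {
    "let-7f-5p": "UGAGGUAGUAGAUUGUAUAGUU",      # 7 of 22 bases are G/C
    "mir-133a-1-3p": "UUUGGUCCCCUUCAACCAGCUG",  # 12 of 22 bases are G/C
}
print(flag_low_gc(mature))  # ['let-7f-5p']
```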

RNA quantification and integrity (see seqanswers.com/forums/showthread.php?t=21280)
NanoDrop: absorbance at 230, 260 and 280 nm
- Can detect salt and other contaminants!
- WARNING: accuracy poor below 50 ng/µl; be careful of concentrations > 1 µg/µl
Qubit: assays specific for DNA/RNA!
- WARNING: known biases in quantifying ssRNA < 50 ng/µl!
Agilent: quantitates size
- WARNING: quantification only accurate within the defined range (read the manual)
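The NanoDrop readings can be turned into a concentration estimate and two purity ratios. A minimal Python sketch, assuming the commonly used conversion of 40 µg/ml RNA per A260 unit (1 cm path length); for pure RNA both ratios are often quoted at around 2, and a low A260/A230 points to salt or other contaminants. The 50 ng/µl and 1 µg/µl warnings mirror the slide; the function name is illustrative.

```python
RNA_FACTOR = 40.0  # 1 A260 unit ~ 40 ug/ml RNA (1 cm path); ug/ml equals ng/ul


def nanodrop_summary(a230, a260, a280, dilution=1.0):
    """Estimate RNA concentration and purity ratios from absorbance readings."""
    conc = a260 * RNA_FACTOR * dilution  # ng/ul
    warnings = []
    if conc < 50:
        warnings.append("below 50 ng/ul: accuracy poor")
    if conc > 1000:
        warnings.append("above 1 ug/ul: be careful of high concentrations")
    return {
        "ng_per_ul": conc,
        "A260/A280": a260 / a280,
        "A260/A230": a260 / a230,  # low values suggest salt/other contaminants
        "warnings": warnings,
    }


print(nanodrop_summary(a230=0.25, a260=0.5, a280=0.25))
```

Because accuracy drops at low concentration, a second, independent method (as the summary slide recommends) remains necessary.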

Library prep kit comparison
- Sample prep (5' P-miRNA-OH 3'): input amount; pH, buffers/salts/ATP
- Adaptor ligation (sequential ligation): sequence; temperature; incubation times; steps: i) hybridisation, ii) ligation, iii) denaturation
- RT (reverse transcription)
- PCR: number of PCR cycles (OK)
Hafner et al., (2011) RNA 17(9), 1-16

Summary
- Sample selection: species, replicates
- RNA extraction: use the same method for all preps; quantify (2 methods); assess integrity
- Library preparation: consistent input; consistent ligation conditions (time/temperature); use the same kits

miRNA-seq bioinformatics (Trim - Align - Report)
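Because the read covers the whole mature miRNA, the 3' adapter has to be removed before alignment. A minimal Python sketch of exact-match 3' adapter trimming, assuming the Illumina TruSeq Small RNA 3' adapter sequence (swap in your kit's adapter) and a minimum insert length of 17 nt, chosen because the shortest miRBase entry is 17 nt. Dedicated trimmers such as cutadapt additionally tolerate sequencing errors inside the adapter, which this sketch does not.

```python
# Assumption: Illumina TruSeq Small RNA 3' adapter; substitute your kit's sequence.
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"


def trim_3p_adapter(read, adapter=ADAPTER_3P, min_overlap=6):
    """Cut the read at the adapter; also trims a partial adapter at the 3' end."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    stop = max(min_overlap, 1)  # never test an empty adapter prefix
    # Longest adapter prefix that ends the read, down to `stop` bases.
    for k in range(min(len(adapter), len(read)), stop - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_and_filter(reads, min_len=17, **kwargs):
    """Trim every read and drop inserts shorter than min_len."""
    trimmed = (trim_3p_adapter(r, **kwargs) for r in reads)
    return [t for t in trimmed if len(t) >= min_len]


reads = [
    "TGAGGTAGTAGATTGTATAGTT" + "TGGAATTCTCGGGTGCCAAGGATCG",  # full adapter
    "TGAGGTAGTAGATTGTATAGTT" + "TGGAATTCTCGG",               # partial adapter
    "ACGTACGT" + ADAPTER_3P,                                 # insert too short
]
print(trim_and_filter(reads))
```

The `min_overlap` guard avoids clipping a few bases that merely resemble the adapter start by chance.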

Anscombe's Quartet
Maths is a tool for analysis, but it can blindly ignore biases and errors in data sets: the mean, standard deviation, variance and correlation of all four data sets are the same!
Image from Wikipedia: https://en.wikipedia.org/wiki/anscombe%27s_quartet
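A quick Python check of the slide's point, using the published values of Anscombe's quartet: after rounding, all four data sets give the same means, variance and correlation, even though they look completely different when plotted. Only the standard-library `statistics` module is used; the helper names are illustrative.

```python
from statistics import mean, variance

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summary(xs, ys):
    """Mean of x and y, sample variance of y, and correlation, rounded."""
    return (round(mean(xs), 2), round(mean(ys), 2),
            round(variance(ys), 1), round(pearson(xs, ys), 2))


for name, (xs, ys) in QUARTET.items():
    print(name, summary(xs, ys))
```

Identical summaries are exactly why visual QC of miRNA-seq data (length distributions, per-miRNA read stacks) matters.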

Challenges
A sequence read covers the entire microRNA transcript, so upstream bias will have an impact on analysis:
- Sample preparation
- Library preparation
- Clonal amplification
- Sequencing
Bioinformatics:
- Multi-mappers
- Mismatches
- Aligners
- Feature counting
- Normalisation
- Visualisation
- Differential expression
- Sharing data
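Normalisation is one of the listed challenges. As an illustration only, the simplest library-size normalisation, counts per million (CPM), is sketched below in Python with hypothetical counts; it is not necessarily what the author uses, and methods such as edgeR's TMM or DESeq2's median-of-ratios are common alternatives for differential expression. Here the library size is taken as the total of aligned miRNA counts, which is itself a design choice.

```python
def counts_per_million(counts):
    """Scale raw miRNA counts by library size (counts per million aligned reads)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no aligned counts")
    return {name: n * 1_000_000 / total for name, n in counts.items()}


library = {"mir-a": 750, "mir-b": 200, "mir-c": 50}  # hypothetical counts
print(counts_per_million(library))
# {'mir-a': 750000.0, 'mir-b': 200000.0, 'mir-c': 50000.0}
```

When one miRNA dominates a library, CPM values for every other miRNA shift with it, which is one reason more robust normalisations exist.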

Choice of reference?
- Genome: better discovery; possible incorrect/loss of mappings; slower, computationally restrictive?
- miRBase stem-loop: limited discovery; forced (biased) mapping; faster, less complicated

Multi-mappers (1)
miRBase does NOT ACCURATELY report the number of times a read aligns to the genome; multi-loci miRBase entries provide only some information.
[Figure: Human multi-mappers. Number of miRs (y-axis, 0-200) vs number of mapped locations (x-axis, 0 to > 100), with mir-486 highlighted as an example. Human miRBase entries mapped using the bowtie aligner allowing all multi-mappers.]
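The plot above can be reproduced by counting how many locations each entry aligns to. A minimal Python sketch, assuming a SAM file in which the aligner reports every alignment as its own record (bowtie does this with `-k` or `-a`); it counts only what the aligner reported, so with `-k 10` counts are capped at 10. The helper names are illustrative.

```python
from collections import Counter


def locations_per_read(sam_lines):
    """Count reported alignments per read name, skipping headers and unmapped reads."""
    per_read = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:  # FLAG bit 0x4: segment unmapped
            continue
        per_read[fields[0]] += 1
    return per_read


def location_histogram(per_read):
    """Map 'number of mapped locations' -> number of reads with that many."""
    return dict(sorted(Counter(per_read.values()).items()))


sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t16\tchr2\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr3\t5\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(location_histogram(locations_per_read(sam)))  # {1: 1, 2: 1}
```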

Multi-mappers (2)
The multi-mapping rate increases as read length decreases. What should the minimum miRNA read length be? The shortest length in miRBase is 17 nt!
Example, the mir-133 family:
mir-133a-1-3p  uuugguccccuucaaccagcug
mir-133a-1-3p  uuugguccccuucaaccagcug
mir-133b-1-3p  uuugguccccuucaaccagcua
Where do you assign multi-loci counts?
- Assign to each position?
- Assign a fraction to each position?
- Intelligently assign to a position?
- Ignore?
[Figure: mir-133b and mir-133a loci]
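Of the options listed, "assign a fraction to each position" is the easiest to sketch. This Python example uses the two distinct mir-133 sequences from the slide and, as a simplifying assumption, treats a read as matching a reference only if it is an exact prefix of it (real aligners search the whole genome and tolerate mismatches). The 22-nt read ends in the distinguishing base and goes to mir-133a alone; the 20-nt read cannot distinguish mir-133a from mir-133b, so its count is split evenly.

```python
from collections import defaultdict

# Distinct mature sequences from the mir-133 family on the slide (RNA alphabet).
REFERENCES = {
    "mir-133a": "uuugguccccuucaaccagcug",
    "mir-133b": "uuugguccccuucaaccagcua",
}


def matching_refs(read, refs=REFERENCES):
    """References whose 5' end the read matches exactly (simplifying assumption)."""
    return [name for name, seq in refs.items() if seq.startswith(read)]


def assign_fractional(reads, refs=REFERENCES):
    """Split each read's single count evenly across all loci it matches."""
    counts = defaultdict(float)
    for read in reads:
        hits = matching_refs(read, refs)
        for name in hits:
            counts[name] += 1.0 / len(hits)
    return dict(counts)


reads = ["uuugguccccuucaaccagcug", "uuugguccccuucaaccagc"]
print(assign_fractional(reads))  # {'mir-133a': 1.5, 'mir-133b': 0.5}
```

Fractional assignment conserves the total read count, whereas ignoring multi-mappers discards them; "intelligent" assignment needs extra evidence, such as reads unique to one locus.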

Mismatches
Sequencing variants:
 i) Error in library prep
 ii) Variants in reference genome
 iii) Sequencer
Ohanian et al. (2013) BMC Genetics, 14:18
RNA editing:
Type          Enzyme    Comment
A to I (G)    ADAR      Predominantly on pre-miRs
C to T        APOBEC    Not identified yet?
Chawla et al., (2014) Nucleic Acids Research, 42(8): 5245-5255
Tomaselli et al., (2013) Int. J. Mol. Sci. 14, 22796-22816
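A first pass at separating the mismatch sources on this slide is to tabulate mismatches by base change. The Python sketch below assumes an ungapped alignment with read and reference on the same strand in the DNA alphabet. An A>G or C>T call only flags a candidate: the same change could equally be a library-prep or sequencer error or a genomic variant, so recurrence across reads and samples is still needed.

```python
# Mismatch signatures that may reflect RNA editing rather than error
# (DNA alphabet, read and reference on the same strand).
EDITING_SIGNATURES = {
    ("A", "G"): "possible A-to-I editing (ADAR; I is read as G)",
    ("C", "T"): "possible C-to-U editing (APOBEC)",
}


def classify_mismatches(reference, read):
    """List (position, ref base, read base, label) for an ungapped alignment."""
    if len(reference) != len(read):
        raise ValueError("expects an ungapped alignment of equal length")
    calls = []
    for pos, (ref, obs) in enumerate(zip(reference.upper(), read.upper()), start=1):
        if ref != obs:
            label = EDITING_SIGNATURES.get((ref, obs), "other: error or variant?")
            calls.append((pos, ref, obs, label))
    return calls


let7f = "TGAGGTAGTAGATTGTATAGTT"
read = "TGGGGTAGTAGATTGTATAGTC"  # A>G at position 3, T>C at position 22
for call in classify_mismatches(let7f, read):
    print(call)
```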

Aligners
(Too) many choices. Each aligner has a wide array of options with DIFFERENT default settings.
The Bowtie aligner provides error-rate and multi-mapping control:
bowtie -p 4 -n 1 -l 21 --nomaqround -k 10 --best --strata --chunkmbs 256
- Allow 1 mismatch in a length of 21 nt
- Report up to 10 multi-mappers
FASTQ calibration dataset: available for ALL species present in miRBase; features include:
 i) Each header defines the miRBase mapping location(s)
 ii) Contains all miRBase entries with all single-nucleotide mismatches
Example calibration read (read name = miRNA ID, mapping location #1, mapping location #2) and its SAM alignment:
hsa-let-7f-5p_m_chr9_94176353_94176374_+#chrx_53557246_53557267_-  0  chr9  94176353  255  22M  *  0  0  TGAGGTAGTAGATTGTATAGTT
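The calibration read names make it possible to test a pipeline automatically, because every read encodes where it should align. The Python sketch below parses the name layout as it appears in the single example on this slide (the meaning of the `_m_` separator and the coordinate convention are assumptions) and checks whether a SAM record lands on one of the encoded loci. It assumes the encoded start equals the SAM POS field (1-based leftmost position), which holds for the chr9 example.

```python
def parse_calibration_name(name):
    """Return (miRNA ID, {(chrom, start, strand), ...}) from a calibration read name.

    Assumed layout, inferred from the single example on this slide:
    <ID>_m_<chrom>_<start>_<end>_<strand>[#<chrom>_<start>_<end>_<strand>...]
    """
    mirna_id, _, loci = name.partition("_m_")
    expected = set()
    for locus in loci.split("#"):
        chrom, start, _end, strand = locus.rsplit("_", 3)
        expected.add((chrom, int(start), strand))
    return mirna_id, expected


def alignment_is_expected(sam_line):
    """True if a SAM record lands on one of the locations encoded in its read name."""
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
    strand = "-" if flag & 0x10 else "+"  # FLAG bit 0x10: reverse complemented
    _, expected = parse_calibration_name(qname)
    return (rname, pos, strand) in expected


record = "\t".join([
    "hsa-let-7f-5p_m_chr9_94176353_94176374_+#chrx_53557246_53557267_-",
    "0", "chr9", "94176353", "255", "22M", "*", "0", "0",
    "TGAGGTAGTAGATTGTATAGTT",
])
print(alignment_is_expected(record))  # True
```

Running this over every record gives a per-miRNA recovery rate, which shows directly what a given aligner and parameter set is missing.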

Non-template additions (NTA)
 i) Adenylation: <miRNA seq> + (A)n
 ii) Uridylation: <miRNA seq> + (T)n
DETECTION METHODS: Koppers-Lalic et al., (2014), Cell Reports 8, 1649-1658
Aligners tend to soft-clip 3' mismatches!!
- Remove adaptor
- Hard trim (18 nt)
- Extend alignment
- Look for mismatch clusters at the end of the read
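The "look for mismatch clusters at the end of the read" approach can be sketched as comparing a read against the genomic sequence it came from and classifying any unmatched 3' tail. Comparing against the mature sequence alone is not enough, because extra 3' bases may be templated by the precursor. In this Python sketch the downstream sequence "CGGGTC" is hypothetical, and everything after the first mismatch is treated as the tail, so an internal edit would be reported as "other".

```python
def classify_3p_tail(read, templated):
    """Classify 3' nucleotides of a read that do not match the genomic template.

    `templated` is the genomic/precursor sequence starting at the read's 5' end
    (DNA alphabet, same strand), long enough to cover the whole read.
    """
    n = 0
    while n < len(read) and n < len(templated) and read[n] == templated[n]:
        n += 1
    tail = read[n:]
    if not tail:
        return "templated", ""
    if set(tail) == {"A"}:
        return "adenylation", tail
    if set(tail) == {"T"}:
        return "uridylation", tail
    return "other (error, editing or mixed tail)", tail


let7f = "TGAGGTAGTAGATTGTATAGTT"
template = let7f + "CGGGTC"  # hypothetical downstream precursor sequence
for read in (let7f + "CG", let7f + "AA", let7f + "TT"):
    print(read, classify_3p_tail(read, template))
```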

Assigning miRNA counts
Mature miRNA analysis:
 i) 5' isomiRs
 ii) 3' isomiRs
 iii) Non-canonical
 iv) Arm switching
 v) Length
 vi) Editing
Cistronic analysis [figure panels (i) and (ii)]
Humphreys et al., 2013, NAR
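One way to assign reads to the 5' and 3' isomiR categories is to express each read's ends relative to the annotated mature coordinates on the precursor; a 5' offset shifts the seed region, so these reads are worth counting separately. This Python sketch handles exact matches only (reads with NTA, edits or errors return None) and uses a hypothetical toy precursor built around the let-7f-5p sequence; it is not the method of Humphreys et al.

```python
def isomir_offsets(read, precursor, mature_start, mature_end):
    """(5' offset, 3' offset) of a read versus the annotated mature miRNA.

    Coordinates are 0-based and end-exclusive on `precursor`; returns None when
    the read is not an exact substring (e.g. NTA, editing or sequencing error).
    """
    start = precursor.find(read)  # first exact match only
    if start == -1:
        return None
    return start - mature_start, start + len(read) - mature_end


def isomir_class(offsets):
    """Label a read from its (5', 3') offsets."""
    if offsets is None:
        return "non-templated/mismatched"
    five, three = offsets
    labels = []
    if five:
        labels.append("5' isomiR")
    if three:
        labels.append("3' isomiR")
    return " + ".join(labels) or "canonical"


let7f = "TGAGGTAGTAGATTGTATAGTT"
precursor = "CCGG" + let7f + "CGGGTC"  # hypothetical toy hairpin fragment
start, end = 4, 4 + len(let7f)
for read in (let7f, let7f[1:], let7f + "CG"):
    print(read, isomir_class(isomir_offsets(read, precursor, start, end)))
```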

miRspring (http://mirspring.victorchang.edu.au); Humphreys D.T. and Suter C.M., Nucleic Acids Research 2013.
A small (< 2 MB) HTML document that replicates the aligned miRNA sequencing data. Needs NO internet connectivity. Visualisation of sequence data + research tools = complete transparency.

Cumulative distribution of miRNA reads
[Figure: cumulative distribution of miRNA reads; datasets from AGO IP, a tissue atlas and ENCODE, including THP-1, heart, kidney, liver, lung, ovary, spleen, testes, thymus, brain, placenta, HeLa S3, A549, Ag04450, Bj, Gm1287, H1hesc, HepG2, Huvec, K562, MCF7, NheK and Sknshra]
73 miRspring documents: 895 million sequence tags in < 55 megabytes of disk space.
In most cell lines and tissues, the most abundant miRNA should comprise < 35% of all aligned miRNA sequences. Sampling bias!
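The < 35% rule of thumb is easy to check per library. A minimal Python sketch with hypothetical counts; the 0.35 limit comes from the slide, while the function names are illustrative. The cumulative fractions are the values behind a plot like the one described above.

```python
def cumulative_fractions(counts):
    """Cumulative share of aligned reads, miRNAs ranked from most to least abundant."""
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    running, cumulative = 0, []
    for n in ranked:
        running += n
        cumulative.append(running / total)
    return cumulative


def top_mirna_share(counts, limit=0.35):
    """Return (most abundant miRNA, its share, share < limit)."""
    top = max(counts, key=counts.get)
    share = counts[top] / sum(counts.values())
    return top, share, share < limit


library = {"mir-a": 600, "mir-b": 300, "mir-c": 100}  # hypothetical counts
print(cumulative_fractions(library))  # [0.6, 0.9, 1.0]
print(top_mirna_share(library))       # ('mir-a', 0.6, False)
```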

Top 100 miRNAs typically:
- are 22 nt long
- correlate well with miRBase

Conclusions
Many challenges in miRNA-seq analysis:
- Multi-mappers
- Mismatches
Best practices: be methodical
- Know the question you wish to address
- Know your species (reference/miRBase)
- Know your aligner
- Test your pipeline!
- Know what you are missing
- Quality control metrics/visualisation

If you would like a miRBase test data set for any species/reference combination, please don't hesitate to contact me: d.humphreys@victorchang.edu.au
mirspring.victorchang.edu.au:
- FASTQ synthetic data sets
- Intelligently assign multi-mappers
- R objects
Joshua Ho, Peter Szot, Catherine Suter, Diane Fatkin, St Vincent's Hospital, Chris Hayward, Kavitha, Andrew Jabbour, Thomas Priess