VirusDetect pipeline - virus detection with small RNA sequencing

Similar documents
Phenomena first observed in petunia

Intrinsic cellular defenses against virus infection

Next-generation sequencing for virus detection: covering all the bases

Hao D. H., Ma W. G., Sheng Y. L., Zhang J. B., Jin Y. F., Yang H. Q., Li Z. G., Wang S. S., GONG Ming*

VIP: an integrated pipeline for metagenomics of virus

Clinical utility of NGS for the detection of HIV and HCV resistance

For all of the following, you will have to use this website to determine the answers:

Hands-On Ten The BRCA1 Gene and Protein

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University.

Transcriptome Analysis

Alternative RNA processing: Two examples of complex eukaryotic transcription units and the effect of mutations on expression of the encoded proteins.

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc

Human Genome: Mapping, Sequencing Techniques, Diseases

Table S1. Relative abundance of AGO1/4 proteins in different organs. Table S2. Summary of smrna datasets from various samples.

High-throughput transcriptome sequencing

Virus-host interactions

Bi 8 Lecture 17. interference. Ellen Rothenberg 1 March 2016

De Novo Viral Quasispecies Assembly using Overlap Graphs

Prokaryotes and eukaryotes alter gene expression in response to their changing environment

Identification of mirnas in Eucalyptus globulus Plant by Computational Methods

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

Matthew Cotten Department of Viroscience Erasmus Medical Center Useful data analysis for clinical applications of viral next generation sequencing

Student Handout Bioinformatics

MicroRNA in Cancer Karen Dybkær 2013

Novel RNAs along the Pathway of Gene Expression. (or, The Expanding Universe of Small RNAs)

Small RNAs and Cross-Kingdom RNAi in Plant-Microbial Interaction Hailing Jin

microrna analysis Merete Molton Worren Ståle Nygård

Supplemental Figure 1. Small RNA size distribution from different soybean tissues.

Small RNA Sequencing. Project Workflow. Service Description. Sequencing Service Specification BGISEQ-500 SERVICE OVERVIEW SAMPLE PREPARATION

Characterizing intra-host influenza virus populations to predict emergence

Page 32 AP Biology: 2013 Exam Review CONCEPT 6 REGULATION

Clay nanosheets for topical delivery of RNAi for sustained protection against plant viruses

Chapter 18. Viral Genetics. AP Biology

Circular RNAs (circrnas) act a stable mirna sponges

OCCUPATIONAL HEALTH CONSIDERATIONS FOR WORK WITH VIRAL VECTORS

Journal of Agricultural Technology 2012 Vol. 8(4): Journal of Agricultural

AG Kaderali Bioinformatics / Mathematical Modeling of Infection

Bioinformatic analyses: methodology for allergen similarity search. Zoltán Divéki, Ana Gomes EFSA GMO Unit

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Framework for the evaluation of. and scientific impacts of plant viruses. technologies.

DETECTION OF LOW FREQUENCY CXCR4-USING HIV-1 WITH ULTRA-DEEP PYROSEQUENCING. John Archer. Faculty of Life Sciences University of Manchester

Characterizing the Respiratory Microbiome of Commercial Broilers on the Delmarva Peninsula

Lab Tuesday: Virus Diseases

A Quick-Start Guide for rseqdiff

General concepts of antiviral immunity. source: wikipedia

Eukaryotic small RNA Small RNAseq data analysis for mirna identification

RNAi Restricts Virus Infections

Exploring HIV Evolution: An Opportunity for Research Sam Donovan and Anton E. Weisstein

Lab Tuesday: Virus Diseases

Mechanisms of alternative splicing regulation

LESSON 4.6 WORKBOOK. Designing an antiviral drug The challenge of HIV

RNA Secondary Structures: A Case Study on Viruses Bioinformatics Senior Project John Acampado Under the guidance of Dr. Jason Wang

Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and Worldwide.

NEXT GENERATION SEQUENCING OPENS NEW VIEWS ON VIRUS EVOLUTION AND EPIDEMIOLOGY. 16th International WAVLD symposium, 10th OIE Seminar

Chapter 19: Viruses. 1. Viral Structure & Reproduction. What exactly is a Virus? 11/7/ Viral Structure & Reproduction. 2.

Ambient temperature regulated flowering time

Ch. 18 Regulation of Gene Expression

Module 3. Genomic data and annotations in public databases Exercises Custom sequence annotation

COMPUTATIONAL ANALYSIS OF SMALL RNAs IN MAIZE MUTANTS WITH DEFECTS IN DEVELOPMENT AND PARAMUTATION. Reza Hammond

AN UNUSUAL VIRUS IN TREES WITH CITRUS BLIGHT RON BRLANSKY UNIVERSITY OF FLORIDA, CREC

Supplementary Methods

Chapter 19: Viruses. 1. Viral Structure & Reproduction. 2. Bacteriophages. 3. Animal Viruses. 4. Viroids & Prions

18.2 Viruses and Prions

Reverse transcription and integration

AIDS - Knowledge and Dogma. Conditions for the Emergence and Decline of Scientific Theories Congress, July 16/ , Vienna, Austria

Analysis with SureCall 2.1

Phylogenomics. Antonis Rokas Department of Biological Sciences Vanderbilt University.

Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS

RNA-Based Plant Protection

Host-Induced Gene Silencing in the Fusarium-Wheat interaction. Wanxin Chen and Patrick Schweizer

PROTOCOL FOR INFLUENZA A VIRUS GLOBAL SWINE H1 CLADE CLASSIFICATION

ALN-HBV. Laura Sepp-Lorenzino November 11, Alnylam Pharmaceuticals, Inc.

ITS accuracy at GenBank. Conrad Schoch Barbara Robbertse

Viral genome sequencing: applications to clinical management and public health. Professor Judy Breuer

Unit 13.2: Viruses. Vocabulary capsid latency vaccine virion

Ali Alabbadi. Bann. Bann. Dr. Belal

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Golden Helix s End-to-End Solution for Clinical Labs

Lahore University of Management Sciences. BIO 314- Microbiology and Virology (Spring 2018)

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

The Open Bioinformatics Journal, 2014, 8, 1-5 1

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

The characterization of the human virome in children and adults

How to Standardise and Assemble Raw Data into Sequences: What Does it Mean for a Laboratory to Use Such Technologies?"

DNA-seq Bioinformatics Analysis: Copy Number Variation

RIP-seq of BmAgo2-associated small RNAs reveal various types of small non-coding RNAs in the silkworm, Bombyx mori

Name: Due on Wensday, December 7th Bioinformatics Take Home Exam #9 Pick one most correct answer, unless stated otherwise!

19 Viruses BIOLOGY. Outline. Structural Features and Characteristics. The Good the Bad and the Ugly. Structural Features and Characteristics

Discovery of. 1892: Russian biologist Dmitri Ivanovsky publishes. 1931: first images of viruses obtained using

High AU content: a signature of upregulated mirna in cardiac diseases

SUPPLEMENTARY INFORMATION. Divergent TLR7/9 signaling and type I interferon production distinguish

The Evolution of sirna Therapeutics Targeting HBV at Arrowhead Pharmaceuticals. Sept 25, 2018

Proposed EPPO validation of plant viral diagnostics using next generation sequencing

Microbiological Societies (IUMS) Congress This conference took place in Montreal, CANADA during

PHARMACEUTICAL MICROBIOLOGY JIGAR SHAH INSTITUTE OF PHARMACY NIRMA UNIVERSITY

AP Biology. Viral diseases Polio. Chapter 18. Smallpox. Influenza: 1918 epidemic. Emerging viruses. A sense of size

Cross species analysis of genomics data. Computational Prediction of mirnas and their targets

HIV DNA Genotyping by UDS compared with cumulative HIV RNA Genotypes in Pretreated Patients

MicroRNA and Male Infertility: A Potential for Diagnosis

Transcription:

VirusDetect pipeline - virus detection with small RNA sequencing CSC webinar 16.1.2018 Eija Korpelainen, Kimmo Mattila, Maria Lehtivaara Big thanks to Jan Kreuze and Jari Valkonen!

Outline Small interfering RNA (sirna) as an antiviral defense mechanism VirusDetect pipeline central concepts analysis steps result files Using VirusDetect in Chipster Using VirusDetect on command line in the Taito cluster

RNA interference (RNAi) RNAi is an antiviral defense mechanism in eukaryotic organisms Upon viral infection Dicer enzymes cut viral dsrna to small interfering RNA (sirna) molecules, which are 21-24 nucleotides long. sirnas associate with Argonaute proteins and guide the RNA-induced silencing complex (RISC) to degrade viral RNA sirnas are further amplified by RNA-dependent RNA polymerases (RdRP) We can sequence sirnas and assemble virus genomes from the reads virus detection and identification

Procedure Figure by Dr Jan Kreuze Extract RNA & run in 4% agarose gel Cut and purify 20-30 nt band, send to sequencing provider for processing & sequencing on Illumina HiSeq 2000

VirusDetect bioinformatics pipeline for detecting viruses in small RNA-seq data Combines several analysis tools in one tool Assembles srna reads to longer sequences (contigs) in two ways Reference-guided assembly: match reads to known virus sequences and combine matching reads together de novo assembly: match reads to each other Compares the contigs to known virus sequences Uses BLAST for similarity search If no similarity is found, reports sirna size distribution profile Read more Zheng Y et al (2017) VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs. Virology 500:130-138 http://virusdetect.feilab.net/cgi-bin/virusdetect/

Reference-guided assembly Figure by Dr Jan Kreuze Virus reference sequence, 1000 nt 600 nt 200 nt depth=7 Assembled 2 contigs, coverage is 80% (800/1000 nt)

Reference-guided assembly of contigs Align reads with the aligner BWA to reference virus database As a reference database VirusDetect uses GenBank virus sequences. They have been Classified to 8 different host kingdoms Vertebrate, invertebrate, plant, protozoa, algae, fungus, bacteria, archaea Processed to remove redundancy (sequences that are more than 95% similar have been combined) The same reference database is used later on for BLAST searches Perform reference-guided assembly of the aligned reads using Samtools

De novo assembly of contigs Remove host-derived small RNAs Align reads to host genome with BWA, keep the unaligned reads Assemble de novo contigs with Velvet Remove host-derived contigs Align contigs to host genome with BWA

VirusDetect steps for contig assembly

VirusDetect steps for virus identification

Parameters for filtering BLAST results Minimum fraction of a contig covered by virus reference The BLAST match must cover at least this fraction of the contig in order to make it significant for virus assignment. By default only contigs that match to the reference viruses for more than 75% of their length are considered. Minimum fraction of virus reference covered by contigs Virus assignment is reported only if at least this percentage of the virus reference is covered with significant matches to contigs. The default is 10%. Minimum read depth The average number of times each nucleotide of the reference sequence is covered by reads

VirusDetect log file and result tables Vd.log Info on number of contigs assembled, reads that aligned, and viruses detected blastn_matching_references.html Table listing reference viruses that have matching contigs identified by BLASTN. Virus.bn.pdf Detailed BLASTN result for each virus reference match. undetermined.html Table listing the length, sirna size distribution and 21-22nt percentage of undetermined contigs. Potential virus contigs (21-22 nt > 50%) indicated in green. undetermined_blast.html Table listing contigs that have hits in the virus reference database but not assigned to any reference viruses because they did not pass the 3 filtering criteria. blastn_matches.tsv Table listing contigs listing that have hits in the virus reference database.

Vd.log

Vd.log continued

blastn_matching_references.html coverage = bases (percentage) of the reference that is covered by contigs depth = the average number of times each reference base is covered by reads depth norm = normalized depth: the average number of times each reference base is covered by reads, per million of total reads % identity = the average percentage of sequence identity to the reference of all contigs aligned to that reference % iden max = maximum percentage of sequence identity of the contigs to the reference Good match has high coverage and high depth

Virus.bn.pdf Blue = virus sequence Red = contigs (lighter color indicates lower identity)

undetermined.html Green color indicates potential virus contigs (21-22nt >50 %)

VirusDetect sequence files and BAM virusdetect_contigs.fa Sequences of non-redundant contigs derived through reference-guided and de novo assemblies. contigs_with_blastn_matches.fa Sequences of contigs that match to virus references by BLASTN. undetermined_contigs.fa Sequences of contigs that do not match to virus references. blastn_matching_references.fa and.fai. Virus reference sequences that produced BLASTN hits, and fasta index file blastn_matches.bam and.bai. BAM file containing the BLASTN alignment of each contig to its corresponding virus reference sequence, and BAM index file.

How many reads should I have? More reads give better reference coverage, longer contigs and more depth Figure from Zheng Y et al (2017) VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs. Virology 500:130-138

VirusDetect in Chipster Two tools in the Small RNA-seq category VirusDetect VirusDetect with own host genome As a result you get the Vd.log file and a tar package file The tar package contains all the other result files. You can extract them using the tool Utilities / Extract tar file If you used the tool VirusDetect with own host genome, the tar package contains also an index file for your host genome. Use it in the next analysis instead of the fasta genome file (genome indexing can take a couple of hours) You can practise with example sessions course_virusdetect_potato_done course_virusdetect_raspberry_done