P. Tang (鄧致剛); PJ Huang (黄栢榕). Bioinformatics Center, Chang Gung University.


Databases and Tools for High Throughput Sequencing Analysis

HTseq Platforms

Applications on Biomedical Sciences

Analysis Strategies: Reference Sequence Alignment (Mapping) vs De novo Assembly or transcriptome

HTseq Experiment

Great, I got my data. Now what?
Data and information management in genomics science is slowly moving out of infancy and is now at the toddler stage.
The good news: some data formats are being widely accepted.
The bad news: there are still many competing standards in some areas; interoperability of data standards is almost non-existent; governance is questionable.

Storage & Computing Power
Next-generation sequencers generate gigabase pairs (Gbp) to terabase pairs (Tbp) of data.

Data Format Types
Raw sequence data, e.g. fasta
Aligned data, e.g. BAM
Processed data, e.g. BED

Interpreting raw data

How deep should we go? (coverage)
(a) 80% of yeast genes (genome size: ~12 Mb) were detected at 4 million uniquely mapped RNA-Seq reads, and coverage reaches a plateau afterwards despite increasing sequencing depth. Expressed genes are defined as having at least four independent reads from a 50 bp window at the 3' end.
(b) The number of unique start sites detected starts to reach a plateau when the depth of sequencing reaches 80 million in two mouse transcriptomes. ES, embryonic stem cells; EB, embryoid body.
Source: Nature Reviews Genetics 10, 57-63

Genome Size
De novo assembled rice transcriptome: 1.3 Gb of RNA-Seq data (genome size: ~400 Mb); 85% of assembled unigenes were covered by gene models.

HTseq Raw Data Format
fasta (Sanger)
csfasta (SOLiD)
fastq (Solexa)
sff (454)
And about 30 other file formats: http://emboss.sourceforge.net/docs/themes/SequenceFormats.html

SOLiD Color Space

(cs)fasta / (cs)fastq
FASTA: a header line starting with ">" followed by the sequence.
FASTQ: adds quality values (QVs) encoded as single-byte ASCII codes.
Most aligners accept FASTA/Q as input.
Issue: the data are voluminous (2 bytes per base for FASTQ).
Do Phred-scaled values provide the most information?
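The FASTQ layout described above (header, sequence, separator, one quality character per base) can be sketched as a small reader. This is a minimal illustration that assumes the common four-lines-per-record layout with no line wrapping; the names `FastqRecord` and `read_fastq` and the two example reads are hypothetical and do not come from the slides.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header line without the leading "@"
    sequence: str  # base calls
    quality: str   # one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line)
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Hypothetical two-read example, not taken from the slides.
example = io.StringIO("@read1\nACGT\n+\nhhgf\n@read2\nGGCA\n+\nhhhh\n")
records = list(read_fastq(example))
```

Because each base carries one sequence character and one quality character, the "2 bytes per base" storage cost mentioned above follows directly from this layout.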

Fastq: Illumina & Sanger

Fastq: Illumina & NCBI

sff (text format): 454

454 fasta with quality file

454 base quality?

All Platforms Have Errors
Platforms: Illumina, SOLiD (ABI/Life), Roche 454, Ion Torrent
1. Removal of low-quality bases and low-complexity regions
2. Removal of adaptor sequences
3. Homopolymer-associated base-call errors (3 or more identical DNA bases) cause a higher number of (artificial) frameshifts
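The three clean-up concerns listed above can be sketched in a few lines. This is only an illustration under simple assumptions: qualities are already integer Phred scores, the adaptor is matched exactly, and the threshold of 20 and the adaptor string in the usage lines are hypothetical placeholders. Real trimming tools use more tolerant matching.

```python
import re


def trim_low_quality_3prime(sequence, qualities, min_q=20):
    """Remove bases from the 3' end until one reaches min_q."""
    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_q:
        end -= 1
    return sequence[:end], qualities[:end]


def trim_adaptor(sequence, adaptor):
    """Cut the read at the first exact occurrence of the adaptor."""
    position = sequence.find(adaptor)
    return sequence if position == -1 else sequence[:position]


def find_homopolymers(sequence, min_length=3):
    """Return (start, base, length) for runs of at least min_length identical bases."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(sequence)]


# Hypothetical reads and adaptor, for illustration only.
trimmed = trim_low_quality_3prime("ACGTAC", [30, 30, 30, 30, 10, 5])
clipped = trim_adaptor("ACGTTTAGATCGG", "AGATCGG")
runs = find_homopolymers("ACGTTTAGGGGC")
```

Flagging homopolymer runs rather than removing them is a deliberate choice in this sketch: on platforms prone to homopolymer errors, the run length is what is uncertain, so downstream frameshift calls near these positions deserve extra scrutiny.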

Trace File
High-quality region: no ambiguities (Ns)
Medium-quality region: some ambiguities (Ns)
Poor-quality region: low confidence

Quality Control Is Essential

Assessing Quality: Phred scores

454 output formats
Standard Flowgram Format: .sff
.fna
.qual

Illumina output formats
.seq.txt and .prb.txt
Illumina FASTQ (ASCII - 64 is the Illumina score)
Qseq (ASCII - 64 is the Phred score), Phred quality scores
Illumina single-line format
SCARF: Solexa Compact ASCII Read Format

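Illumina FastQ
The ASCII value of the quality character "g" is 103.
Quality of base A at position 1 = 103 - 64 = 39, where 39 is the Phred score.
(The character "h" has ASCII value 104, which corresponds to a Phred score of 40.)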
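The subtraction above generalises to any quality string. The sketch below assumes the older Illumina encoding with an offset of 64, as on this slide, and also shows the offset of 33 used by Sanger/NCBI-style files; the function names are hypothetical. The conversion to an error probability uses the standard Phred definition, Q = -10 * log10(P).

```python
def decode_qualities(quality_string, offset=64):
    """Convert ASCII quality characters to integer Phred scores."""
    return [ord(character) - offset for character in quality_string]


def error_probability(phred_score):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-phred_score / 10)


scores = decode_qualities("gh")             # 'g' is ASCII 103, 'h' is ASCII 104
sanger_scores = decode_qualities("I", 33)   # Sanger/NCBI files use offset 33
p_q20 = error_probability(20)               # 1 error in 100 base calls
```

Decoding with the wrong offset shifts every score by 31, so it is worth confirming which encoding a file uses before filtering on quality.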

Quality Control
Read quality distribution
Library insert size
Mapping rate
Duplication assessment
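Two of the metrics above, read quality distribution and duplication assessment, can be computed directly from the reads. This is a rough sketch assuming Phred+33 quality strings and exact-match duplicates; the function names and example values are hypothetical. Insert size and mapping rate need alignments and are not covered here.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Average Phred score at each read position across all reads."""
    totals, counts = [], []
    for quality in quality_strings:
        for position, character in enumerate(quality):
            if position == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[position] += ord(character) - offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]


def duplication_rate(sequences):
    """Fraction of reads that are exact copies of another read."""
    if not sequences:
        return 0.0
    return 1 - len(set(sequences)) / len(sequences)


# Hypothetical inputs: 'I' decodes to 40 and '5' to 20 with offset 33.
per_position = mean_quality_per_position(["II", "5I"])
dup_rate = duplication_rate(["ACGT", "ACGT", "GGCA", "ACGT"])
```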

Quality Control Tools

NGS QC Toolkit & FastQC
NGS QC Toolkit is for quality checking and filtering of high-quality reads.
The toolkit is a standalone, open-source application, freely available at http://www.nipgr.res.in/ngsqctoolkit.html
The application is implemented in the Perl programming language.
It performs QC of sequencing data generated on the Roche 454 and Illumina platforms.
Additional tools aid QC (sequence format converter and trimming tools) and analysis (statistics tools).
FastQC can be used only for preliminary analysis.

http://www.ncbi.nlm.nih.gov/geo/

http://www.ncbi.nlm.nih.gov/gds/
expression profiling by array
expression profiling by genome tiling array
expression profiling by high throughput sequencing
expression profiling by MPSS
expression profiling by RT-PCR
expression profiling by SAGE
expression profiling by SNP array
genome binding/occupancy profiling by array
genome binding/occupancy profiling by genome tiling array
genome binding/occupancy profiling by high throughput sequencing
genome binding/occupancy profiling by SNP array
genome variation profiling by array
genome variation profiling by genome tiling array
genome variation profiling by high throughput sequencing
genome variation profiling by SNP array
methylation profiling by array
methylation profiling by genome tiling array
methylation profiling by high throughput sequencing
methylation profiling by SNP array
non-coding RNA profiling by array
non-coding RNA profiling by genome tiling array
non-coding RNA profiling by high throughput sequencing
other
protein profiling by mass spec
protein profiling by protein array
SNP genotyping by SNP array
third party reanalysis

"Illumina Genome Analyzer" AND smallrna

http://seqanswers.com/
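The GEO DataSets query shown earlier ("Illumina Genome Analyzer" AND smallrna) can also be submitted programmatically. The slides only show the web search, so using the NCBI E-utilities ESearch endpoint with db=gds is an assumption here, and the helper name `build_geo_search_url` is hypothetical. The sketch only builds the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed route; the slides use the web form).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term, max_results=20):
    """Build an ESearch URL against the GEO DataSets database (db=gds)."""
    query = urlencode({"db": "gds", "term": term, "retmax": max_results})
    return f"{EUTILS_ESEARCH}?{query}"


url = build_geo_search_url('"Illumina Genome Analyzer" AND smallrna')
```

Fetching this URL (for example with `urllib.request.urlopen`) should return an XML list of matching record identifiers. NCBI's usage guidelines on request rates apply.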