NGS, Cancer and Bioinformatics. 20/10/15 Yannick Boursin


NGS and Clinical Oncology
- NGS in hereditary cancer genome testing: BRCA1/2 (breast/ovarian cancer), XPC (melanoma), ERCC1 (colorectal cancer)
- NGS for personalized cancer treatment. Clinical trials: MOSCATO (GR), SAFIR (GR), SHIVA (Curie). Ipilimumab (anti-CTLA4), Nivolumab (anti-PD1), Trastuzumab (anti-HER2), Cetuximab (anti-EGFR)
- Detection of chimeric transcripts. Chronic Myeloid Leukemia: Philadelphia chromosome (BCR/ABL). Non-Small-Cell Lung Cancer: EML4-ALK

NGS and Oncology
NGS is now widely used as:
- A research tool to screen a large number of cancer samples
- A clinical/diagnostic tool in daily practice
These projects require dedicated bioinformatics integration to access and analyze this huge amount of data.

Why do we need computers for NGS?
[Figure: sequencing data size evolution]
Needs to address:
- Store petabytes of data (1 PB is 1000 TB)
- Share data around the world through networks
- Analyze huge amounts of data with complex algorithms

Bioinformatics and Oncology
Problem: finding, extracting, and presenting relevant information.
Partial solution: designing workflows in order to ease data analysis.

Interdisciplinary collaboration
Bioinformatics acts as a hub between the different fields. Trust between partners is needed, and training is needed as well for efficient understanding.
[Diagram: bioinformatics at the center of four partners]
- Biology knowledge: knowledge modeling, ...
- Medical staff: clinicians, specialists, ...
- Biological staff: biologists, geneticists, ...
- Technological platforms: sequencing, microarrays, immunochemistry, ...
- Bioinformatics: raw data storage, integration of biological and clinical data, quality control, data analysis, clinical biostatistics, report for biological/medical staff

Standard Workflow for NGS Analysis
Depends on the NGS application.
[Figure: a typical NGS workflow. Sequencing & Primary Analysis -> Raw Reads -> Reads Cleaning -> Reads Mapping -> Data Analysis, with three quality-control checkpoints (QC 1, QC 2, QC 3)]

Step 1: Quality Check and improvements

NGS Data: what do they look like?
A raw data file (.fastq, .sff, .fa, .csfasta/.qual) with millions of short reads of the same size (SOLiD, HiSeq) or of different sizes (Ion PGM/Proton).
[Figure: enhanced view of the reads in a FASTQ file]

FASTQ format
1 sequence = 1 read = 4 lines in the file
First line = sequence identifier

FASTQ format
Fourth line = quality, ASCII encoded (reduces the file size)
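The four-lines-per-read layout can be illustrated with a minimal reader. This is a sketch only, assuming an uncompressed FASTQ file in which every record occupies exactly four lines; the names `FastqRecord` and `read_fastq` are hypothetical and do not come from the slides.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # line 1, without the leading "@"
    sequence: str    # line 2
    quality: str     # line 4, one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per group of 4 lines (assumes no line wrapping)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a 4-line FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


# Two made-up reads, for illustration only.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGN\n+\n#!!\n")
records = list(read_fastq(example))
```

Real files are usually gzip-compressed and far too large to hold in a list, which is why the reader is written as a generator.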

Sequence quality encoding
Phred scores (Q): Q scores are logarithmically related to the base-calling error probability (P).
Q = -10 log10(P)
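The relation between Q and P can be checked numerically. The sketch below assumes the common Phred+33 ASCII offset; that offset is an assumption, not stated on the slide, and older Illumina files used an offset of 64. Function names are hypothetical.

```python
import math
from typing import List


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 log10(P): P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Q = -10 log10(P)."""
    return -10 * math.log10(p)


def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Turn a FASTQ quality string into Q scores (offset 33 is assumed)."""
    return [ord(char) - offset for char in quality]


scores = decode_quality("I!")                 # 'I' is Q40, '!' is Q0
p_q20 = phred_to_error_probability(20)        # 1 error in 100 base calls
q_p001 = error_probability_to_phred(0.001)    # 1 error in 1000 base calls
```

Each step of 10 in Q divides the error probability by 10: Q20 is a 1% error rate, Q30 is 0.1%.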

Quality controls on raw reads: let's start after sequencing
A raw read is characterized by three parameters:
- Its length
- Its sequence
- Its per-base quality
Raw reads:
ACTGATTAGTCTGAATTAGANNGATAGGAT
GATCGATGCATAGCGATCAGCATCGATACG
CGGCGCTCCGCTCTCGAAACTAGCACTGAC
AGCATCAGGATCTACGATCTAGCGAACTGAC
ACTACTTACGACATCGAGGTTAGGAGCATCA
ACTAGGCATCGGCATCACGGACNNNNNNNN
ACTAGCTATCGAGCTATCAGCGAGCATCTATC
ACTAGCTACTATCGAGCGAGCGATCATCGAC
CTGACTACTATCGAGCGAGCTACTAACTGAC
ACTATCAGCTAGCGCTTCAGCATTACCGT
ACTANNGACTAGGAATTAGCTACTGAGCTAC
ACTAGCAGCTATATGAGCTACTAGCACTGAC
NNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Why look at sequencing quality?
Quality of data is very important for various downstream analyses:
- Sequence assembly or mapping
- Variant detection
- Gene expression studies
- ...
If the quality of the data is poor:
- Try to find a reason
- Can we correct/improve the quality?
- If not, it may lead to erroneous conclusions

Quality controls on raw reads: which metrics to check?
Mainly:
- Quality score per base and over the reads
But also:
- Read length distribution
- Sequence content per base and % of GC
- K-mer content
- Overrepresented sequences
- Duplicated reads
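Several of these metrics reduce to simple counting. The following is a minimal sketch, assuming the reads fit in memory as plain strings; `summarize_reads` is a hypothetical helper and is not what dedicated QC tools run internally.

```python
from collections import Counter
from typing import Dict, List


def summarize_reads(reads: List[str]) -> Dict[str, object]:
    """Compute a few raw-read metrics: lengths, %GC, %N, duplicates."""
    total_bases = sum(len(read) for read in reads)
    gc_bases = sum(read.count("G") + read.count("C") for read in reads)
    n_bases = sum(read.count("N") for read in reads)
    copies = Counter(reads)
    # A read seen k times contributes k - 1 duplicates.
    duplicated = sum(count - 1 for count in copies.values())
    return {
        "length_distribution": dict(Counter(len(read) for read in reads)),
        "gc_percent": 100 * gc_bases / total_bases if total_bases else 0.0,
        "n_percent": 100 * n_bases / total_bases if total_bases else 0.0,
        "duplicated_reads": duplicated,
    }


# Three made-up reads: two identical, one with uncalled bases.
summary = summarize_reads(["ACGT", "ACGT", "NNGG"])
```

Exact-match duplicate counting is the simplest definition; it is used here only to illustrate the metric.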

Quality scores
- Per base (box-and-whisker plot) -> to see whether base calls fall into low quality (commonly towards the end of a read)
- Per sequence (mean quality distribution) -> to see if a subset of your sequences has universally low quality values
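Both views can be derived from the quality strings. The sketch below is a simplification: it reports the mean per position rather than the full box-and-whisker statistics, and it assumes a Phred+33 encoding (an assumption, not stated on the slide). Function names are hypothetical.

```python
from typing import List


def per_base_mean_quality(qualities: List[str], offset: int = 33) -> List[float]:
    """Mean Q score at each read position (reads may differ in length)."""
    sums: List[int] = []
    counts: List[int] = []
    for quality in qualities:
        for position, char in enumerate(quality):
            if position == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[position] += ord(char) - offset
            counts[position] += 1
    return [total / count for total, count in zip(sums, counts)]


def per_read_mean_quality(qualities: List[str], offset: int = 33) -> List[float]:
    """Mean Q score of each non-empty read."""
    return [
        sum(ord(char) - offset for char in quality) / len(quality)
        for quality in qualities
        if quality
    ]


example_qualities = ["II", "I!#"]  # made-up quality strings
by_position = per_base_mean_quality(example_qualities)
by_read = per_read_mean_quality(example_qualities)
```

A drop in the per-position values towards the end of the read is the pattern the slide describes; a cluster of low per-read values points to a subset of poor reads.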

Quality scores
[Figures: quality score plots for PGM run A and PGM run B]

Quality scores
[Figures: quality score plots for Illumina run C and Illumina run D]

Quality control on raw reads: adapter removal
An adapter is a small piece of known DNA located at the end of the reads.
Adapter roles:
- Attach the read to the sequencer flowcell
- Allow a specific PCR enrichment of reads carrying an adapter
- Used in multiplex sequencing (samples in a mix)
Available tools to trim adapters:
- Cutadapt
- SeqPrep
- RmAdapter
[Figure: in blue, adapters; in orange, the informative part of the read]
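The idea behind adapter trimming can be sketched with exact string matching. This is an illustration only: it is not the algorithm used by Cutadapt, SeqPrep or RmAdapter, which tolerate sequencing errors, and both the function name and the adapter sequence below are hypothetical.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter found in full, or partially at the read end."""
    # Case 1: the whole adapter occurs inside the read.
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Case 2: the read ends with the first bases of the adapter.
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read  # no adapter detected


ADAPTER = "AGATCGGAAG"  # made-up adapter, for illustration only
full = trim_3prime_adapter("ACGTACGTAGATCGGAAG", ADAPTER)
partial = trim_3prime_adapter("ACGTACGTAGATC", ADAPTER)
untouched = trim_3prime_adapter("ACGTACGT", ADAPTER)
```

The `min_overlap` parameter guards against trimming one or two bases that match the adapter start by chance.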

Quality controls on raw reads: let's start after sequencing
A first quality control of raw reads is mandatory and can be set up according to the application ('N' bases, adapter sequences, barcodes, contamination, etc.)
[Figure: processed reads; blue parts are to be kept, green and red parts to be removed]
ACTGATTAGTCTGAATTAGANNGATAGGAT
GATCGATGCATAGCGATCAGCATCGATACG
CGGCGCTCCGCTCTCGAAACTAGCATCGAC ACTGAC
AGCATCAGGATCTACGATCTAGCGAACTGAC ACTGAC
ACTACTTACGACATCGAGGTTAGGAGCATCA
ACTAGGCATCGGCATCACGGACNNNNNNNN
ACTAGCTATCGAGCTATCAGCGAGCATCTATC
ACTAGCTACTATCGAGCGAGCGATCATCGAC
CTGACTACTATCGAGCGAGCTACTAACTGAC ACTGAC
ACTATCAGCTAGCGCTTCAGCATTACCGT
ACTANNGACTAGGAATTAGCTACTGAGCTAC
ACTAGCAGCTATATGAGCTACTAGCACTGAC ACTGAC
NNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Quality controls: Standard Workflow for NGS Analysis
Depends on the NGS application.
[Figure: a typical NGS workflow. Sequencing & Primary Analysis -> Raw Reads -> Reads Cleaning -> Reads Mapping -> Data Analysis, with three quality-control checkpoints (QC 1, QC 2, QC 3)]

Step 2: Short Reads Alignment

Reads alignment - Vocabulary
- Alignment (mapping): read alignment aims at transforming the information from individual reads into an organized and reduced set of information.
- Mismatch: disagreement between two nucleotides.
- Reference genome: a known sequence, supposed to be as close as possible to the input genome, which is used as an anchor to organize the information from individual reads.
- Gap: bridge within the read alignment (i.e. a small insertion/deletion).
- Mappability: uniqueness of a region (repeated region = low mappability, unique region = good mappability).
- Indels: insertions/deletions relative to the reference genome.

Reads alignment - Two strategies
Read alignment aims at transforming the information from individual reads into an organized and reduced set of information. Two strategies can be applied:
- De novo read assembly: used when no reference genome is available. It aims at reconstructing long scaffolds from the information in individual reads.
- Alignment on a reference genome: the reads are directly compared to a known reference genome.

Alignment on a reference genome
The reference genome is a known sequence, supposed to be as close as possible to the input genome, which is used as an anchor to organize the information from individual reads.
[Figure: unordered reads, then alignment of the reads against the reference genome sequence]

Alignment on a reference genome
The reference genome is a known sequence, supposed to be as close as possible to the input genome, which is used as an anchor to organize the information from individual reads.
[Figure: alignment of the reads against the reference genome sequence, revealing a homozygous polymorphism (T/C)]

Alignment on a reference genome - Challenges
New alignment algorithms must address the requirements and characteristics of NGS reads:
- Millions of reads per run (30x genome coverage)
- Reads of different sizes (35 bp - 200 bp)
- Different types of reads (single-end, paired-end, mate-pair, etc.)
- Base-calling quality factors
- Sequencing errors (~1%)
- Repetitive regions
- Sequenced organism vs. reference genome
- Must adjust to evolving sequencing technologies and data formats

Alignment on a reference genome - Bioinformatics tools
[Figure: mappers timeline (since 2001)]

Finding the best alignment - Rationale
Given a reference and a set of reads, report at least one good local alignment for each read, if one exists.
What is good? For now, we concentrate on:
- Fewer mismatches is better: TGATCATA... / GATCAA is better than TGATCATA... / GAGAAT
- Failing to align a low-quality base is better than failing to align a high-quality base (lowercase = low quality): TGATATTA... / GATcaT is better than TGATcaTA... / GTACAT
Based on a scoring system, e.g. score for a match (1), mismatch penalty (3), gap open penalty (5), gap extension penalty (2). The best alignment is the one with the highest score.
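The scoring system can be made concrete by scoring two candidate alignments that are already laid out column by column. This is a sketch under stated assumptions: the first position of a gap costs the open penalty and each further position costs the extension penalty (real aligners differ on this convention), and `alignment_score` is a hypothetical name.

```python
def alignment_score(
    reference: str,
    read: str,
    match: int = 1,
    mismatch: int = 3,
    gap_open: int = 5,
    gap_extend: int = 2,
) -> int:
    """Score a pairwise alignment given as two equal-length strings.

    Gaps are written as '-' in either string.
    """
    if len(reference) != len(read):
        raise ValueError("aligned strings must have the same length")
    score = 0
    gap_side = None  # which string the current gap run is in, if any
    for ref_base, read_base in zip(reference, read):
        if ref_base == "-" or read_base == "-":
            side = "reference" if ref_base == "-" else "read"
            score -= gap_extend if side == gap_side else gap_open
            gap_side = side
        else:
            gap_side = None
            score += match if ref_base == read_base else -mismatch
    return score


one_mismatch = alignment_score("GATCAT", "GATCAA")    # 5 matches, 1 mismatch
two_mismatches = alignment_score("GATCAT", "GAGAAT")  # 4 matches, 2 mismatches
with_gap = alignment_score("ACGTTA", "AC--TA")        # 4 matches, 2-base gap
```

With these values the alignment with one mismatch outscores the one with two, which matches the "fewer mismatches is better" rule. Taking base quality into account would require a mismatch penalty that depends on the Q score, which this sketch does not model.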

Alignment key parameters - Repeats
Approximately 50% of the human genome is comprised of repeats.
Treangen T.J. and Salzberg S.L. 2012. Nature Reviews Genetics 13, 36-46

Alignment key parameters - Repeats
Close proximity with genes: intergenic and intragenic positions
[Figure: BRCA2, a mosaic of repeated regions]

Alignment key parameters - Repeats - 3 strategies
- 1: Report only unique alignments
- 2: Report best alignments and randomly assign reads across equally good loci
- 3: Report all (best) alignments
[Figure: reads assigned to repeat copies A and B under strategies 1, 2 and 3]
Treangen T.J. and Salzberg S.L. 2012. Nature Reviews Genetics 13, 36-46
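The three reporting strategies can be sketched for a single read whose candidate alignments are already scored. In this illustration "unique" is taken to mean that exactly one locus reaches the best score; that reading, the strategy labels and the function name are assumptions, not definitions from the slide.

```python
import random
from typing import List, Optional, Tuple

Hit = Tuple[str, int]  # (locus name, alignment score)


def report_alignments(
    hits: List[Hit], strategy: str, rng: Optional[random.Random] = None
) -> List[Hit]:
    """Select which alignments of one read to report."""
    if not hits:
        return []
    best_score = max(score for _, score in hits)
    best_hits = [hit for hit in hits if hit[1] == best_score]
    if strategy == "unique":   # strategy 1: drop ambiguous reads
        return best_hits if len(best_hits) == 1 else []
    if strategy == "random":   # strategy 2: pick one best locus at random
        return [(rng or random).choice(best_hits)]
    if strategy == "all":      # strategy 3: keep every best locus
        return best_hits
    raise ValueError(f"unknown strategy: {strategy}")


# A read matching repeat copies A and B equally well.
repeat_hits = [("A", 10), ("B", 10)]
unique_result = report_alignments(repeat_hits, "unique")
random_result = report_alignments(repeat_hits, "random", random.Random(0))
all_result = report_alignments(repeat_hits, "all")
clear_result = report_alignments([("A", 10), ("B", 7)], "unique")
```

Strategy 1 loses coverage in repeats, strategy 2 spreads it evenly across copies, and strategy 3 counts the read at every copy, so the choice directly affects depth in repeated regions.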

Alignment key parameters - Using single or paired-end reads?
The type of sequencing (i.e. single or paired-end reads) is often driven by the application. Example: finding large indels, genomic rearrangements, ...
However, in most cases, the pair information can improve the mapping specificity.
[Figure: single-end alignment, where the read falls in a repeated sequence of the reference genome, versus paired-end alignment, where the pair identifies a unique sequence]

Key points - Alignment on a reference genome
- The alignment is a crucial step of the NGS analysis.
- The reference genome has to be carefully chosen.
- The mappability of the region of interest has to be taken into account (primer design).
- The scoring method has to be chosen according to the sequencing error rate and the quality of the raw reads.
- The alignment parameters have to be set properly.

Limitations of Alignment Tools
Even though we now have good tools to align reads on a reference genome, several issues remain important:
- Homopolymer mapping
- Efficient alignment of small indels
- Alignment on several genomes
- Alignment on repeated sequences
- ...

Alignment formats
Many formats exist:
- SAM
- BAM
- ELAND (Illumina specific)
- MAQ map
SAM and BAM are now the standard for aligned data.

SAM format
SAM stands for Sequence Alignment/Map
- Tab-delimited text file
- 1 line per read
- Each line is composed of 11 fields (minimum)

SAM format
11695_6 0 chr1 3292760 255 20M * 0 0 AAGAGATCTGGAACCATAGA DGDFCDGFFGBEFFGFDEEF XA:i:0 MD:Z:20 NM:i:0 XX:i:3984
9985_1 0 chr1 3292761 255 19M * 0 0 AGAGATCTGGAACCATAGA IIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:19 NM:i:0 XX:i:3990
4226_1 0 chr1 3296594 255 22M * 0 0 TCTGCAAGGCAAAAGACACTGT GHHHHHGHGHHHGHHHHBHBGG XA:i:0 MD:Z:22 NM:i:0 XX:i:4194
7001_1 0 chr1 3328828 255 20M * 0 0 AAGAAAGAGAACTTCAGACC GGGG+GGGGGGIIIIIBHII XA:i:0 MD:Z:20 NM:i:0 XX:i:2357
1042_1 0 chr1 3334731 255 21M * 0 0 GGGACTCAGCAGAACTTAGGA ?@GGGDGGGG>DDGGGGGGDB XA:i:0 MD:Z:21 NM:i:0 XX:i:1027
14647_1 0 chr1 3334756 255 23M * 0 0 AGTCTGAACAGGTTAGAGGGTGC IIIIIIEGIHIGID<DBDGDBGB XA:i:0 MD:Z:23 NM:i:0 XX:i:1910
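The 11 mandatory fields can be named explicitly when reading such a line. This is a minimal sketch: the field names follow the SAM specification, the sample line reuses the first record shown above with tab separators, and `SamRecord` and `parse_sam_line` are hypothetical names. For real work a dedicated library is the safer choice.

```python
from typing import List, NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description, e.g. 20M
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # ASCII-encoded base qualities
    tags: List[str]  # optional fields beyond the 11 mandatory ones


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line (not a header line) into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], fields[11:],
    )


sam_line = "\t".join([
    "11695_6", "0", "chr1", "3292760", "255", "20M", "*", "0", "0",
    "AAGAGATCTGGAACCATAGA", "DGDFCDGFFGBEFFGFDEEF",
    "XA:i:0", "MD:Z:20", "NM:i:0", "XX:i:3984",
])
record = parse_sam_line(sam_line)
```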

SAM format
The second field (FLAG) can be used to quickly filter the file, with Samtools (command line) and its -f and -F options.
Useful webpage: http://broadinstitute.github.io/picard/explain-flags.html
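The flag is a sum of bits, and the -f and -F options of `samtools view` amount to "all of these bits set" and "none of these bits set". The sketch below reproduces that logic; the bit values come from the SAM specification, while the short labels and function names are hypothetical.

```python
from typing import List

# Bit values defined by the SAM specification; labels are informal.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag: int) -> List[str]:
    """List the properties encoded in a SAM flag."""
    return [label for bit, label in FLAG_BITS if flag & bit]


def passes_filter(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Mimic -f (all required bits set) and -F (no excluded bit set)."""
    return (flag & require) == require and (flag & exclude) == 0


labels_99 = decode_flag(99)  # 99 = 1 + 2 + 32 + 64
keep_proper = passes_filter(99, require=0x2, exclude=0x4)
keep_unmapped = passes_filter(4, exclude=0x4)
```

For example, excluding bit 0x4 keeps only mapped reads, and requiring bit 0x2 keeps only properly paired reads.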

BAM format
BAM stands for Binary Alignment/Map
- Corresponds to the SAM format compressed as BGZF
- Reduces the size of the alignment file about 5-fold
- Not directly readable, unlike SAM: requires Samtools
- Best format for sharing alignment files
- Coupled with an index file (BAI), which avoids a sequential read of the complete file

Quality controls on aligned data: Standard Workflow for NGS Analysis
Depends on the NGS application.
[Figure: a typical NGS workflow. Sequencing & Primary Analysis -> Raw Reads -> Reads Cleaning -> Reads Mapping -> Data Analysis, with three quality-control checkpoints (QC 1, QC 2, QC 3)]

QC 3: Which metrics to check?
In practice, how do I validate my alignment?
- Be aware of the mapping strategy used
- Look at simple descriptive statistics:
  - Number of aligned reads
  - Coverage/depth
  - Mapping quality
  - Number of normal/abnormal pairs for paired-end data
  - Strand bias
  - ...
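Two of these descriptive statistics are plain arithmetic. The sketch below uses the usual approximation depth = aligned reads x read length / target size; the function names and the example numbers are hypothetical and chosen only to land on the 30x figure mentioned earlier.

```python
def mean_depth(aligned_reads: int, read_length: int, target_size: int) -> float:
    """Approximate mean depth, assuming reads align over their full length."""
    return aligned_reads * read_length / target_size


def aligned_percent(aligned_reads: int, total_reads: int) -> float:
    """Share of reads that were aligned, in percent."""
    return 100 * aligned_reads / total_reads if total_reads else 0.0


# Illustrative numbers: 900 million aligned 100 bp reads on a 3 Gb genome.
depth = mean_depth(900_000_000, 100, 3_000_000_000)
mapped = aligned_percent(900_000_000, 1_000_000_000)
```

This is an average: actual depth varies along the genome, notably in low-mappability regions, so per-position depth should still be inspected.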

Paired-end mapping - Insert-size checking
- % of "All Good" = both reads in the pair have aligned and "the pair is properly aligned", meaning that they mapped within a proper distance from each other
- % of "All Bad" = neither the read nor its mate mapped
- % of "Only one read maps" = only one read in the pair is mapped
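These categories can be derived from the SAM flag of one read per pair. This is a sketch under assumptions: it relies on the aligner having set the "proper pair" bit (0x2), it expects one flag per pair (for example the first-in-pair records only), and the extra "improper_pair" label for pairs where both reads map at an unexpected distance or orientation is hypothetical.

```python
from collections import Counter
from typing import Dict, List

PROPER_PAIR = 0x2
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify_pair(flag: int) -> str:
    """Assign one pair to a category from the flag of one of its reads."""
    read_unmapped = bool(flag & READ_UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "all_bad"
    if read_unmapped or mate_unmapped:
        return "one_read_maps"
    if flag & PROPER_PAIR:
        return "all_good"
    return "improper_pair"  # both mapped, but not as a proper pair


def pair_percentages(flags: List[int]) -> Dict[str, float]:
    """Percentage of pairs in each category (one flag per pair)."""
    counts = Counter(classify_pair(flag) for flag in flags)
    total = len(flags)
    return {category: 100 * n / total for category, n in counts.items()}


# Flags 99 (proper pair), 77 (both unmapped), 73 (mate unmapped).
percentages = pair_percentages([99, 99, 77, 73])
```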

NGS Analysis: How can I work with my NGS data?
Difficult on a personal computer (lack of resources):
- 1 alignment = 4 processors + 15 GB RAM (to be multiplied by the number of samples)
- Impossible to open the files in software such as a text editor
- Need for a very large storage capacity
- Data backup administration
Application server connected to a computing cluster and a storage array:
- Commercial solutions (CLC Bio, NextGene, ...)
- Galaxy server: https://galaxy.gustaveroussy.fr/galaxyprod

Data analysis
Depends on the NGS application.
[Figure: a typical NGS workflow. Sequencing & Primary Analysis -> Raw Reads -> Reads Cleaning -> Reads Mapping -> Data Analysis, with three quality-control checkpoints (QC 1, QC 2, QC 3)]

Data Analyses in Cancer
- Chimeric transcript search
- Alternative transcripts study
- Differential expression study
- Methylation study
- Detection of genomic variants
- Detection of copy-number variation

Chimeric transcripts
Do the tumor cells express any chimeric transcript?
[Figure: history of the BCR-ABL fusion]

Alternative transcripts

Differential expression
Are there genes that are strongly expressed in one kind of tumor but not in the other?
Can we group tumors according to their expression profiles?
[Figure: clustering of differential expression in breast tumours]

Methylome
Is there any difference between DNA methylation in tumors and in normal cells?
How does methylation promote cancer?

Detection of copy-number variations
Are there any copy-number alterations (gain or loss of chromosomal regions, amplifications) that could explain tumorigenesis?
[Figure: copy-number variations in cancer. MYC and KRAS are amplified.]

Detection of genomic variants
Are there mutational events that are specific to the tumor genome?
Could tumorigenesis be explained by them?
Is there any drug targeting those mutations?
[Figure: pancreatic adenocarcinoma, from normal cells to tumor cells]

Limitations: Detection of genomic variants
Between 1.4 and 8.9% of the variants are technology specific.

Limitations: Detection of genomic variants
[Figure: common genomic variants between different variant callers]

Conclusion
Nowadays, NGS is widely used in cancer centers in order to categorize cancers and match patients with personalized treatments (Precision Medicine).
NGS is also used in cancer research, in order to discover new oncogenetic mechanisms, to understand the way a treatment works, and to link biological and genetic traits.
Due to technical and how-the-universe-works-related issues, using NGS might not solve your problems. It is important to know that the technique is limited:
- A) by the question you asked at first. If a cancer cannot be explained by mutational events, it might be explained by other mechanisms. In that case, there is nothing to be found in the data.
- B) by technical issues. Sequencers and software are prone to errors. Statistically, there will be at least one error in your analysis. You can often reduce the impact of this limitation by making biological and technical replicates.

Galaxy: a web-based genome analysis platform
Galaxy is an open-source framework for integrating various computational tools and databases into a cohesive workspace.
https://main.g2.bx.psu.edu/
- A web-based service that provides and integrates many popular tools and resources for comparative genomics
- A completely self-contained application for building your own Galaxy-style sites
29 January 2015, NGS & Cancer training - Exome analyses

Galaxy: the instant web-based tool and data resource integration platform
- Open-source downloadable package that can be deployed in individual labs
- Modularized: add new tools, integrate new data sources, easy to plug in your own components
- Straightforward to run your own private Galaxy server

Galaxy: the one-stop shop for genome analysis
Analyze
- Retrieve data shared between Galaxy users or upload your own
- Interactively manipulate genomic data with a comprehensive and expanding best-practices toolset
- Galaxy is designed to work with many different datatypes: http://wiki.galaxyproject.org/learn/datatypes
Visualize
- Visual analysis environment for your data and your analysis workflows
Publish and Share
- Results and step-by-step analysis record (Data Libraries and Histories)
- Customizable pipelines (Workflows)
- Complete protocols/documentation (Pages)

https://galaxy.gustaveroussy.fr/galaxyprod

Data libraries
Datasets are accessible from Galaxy or for download.

History
Histories record all the steps in the process and the settings used. Histories can be imported into your session and rerun as is or modified.

Workflows
Workflows specify the steps in a process (a suite of ordered tools). Workflows are analyses that are meant to be run, each time with different user-provided datasets.

User account
Galaxy public Main or Test instances
- An account is not required to access them
- But if one is used, the data quota is increased and full functionality across sessions opens up, such as naming, saving, sharing, and publishing Galaxy objects (Histories, Workflows, Datasets, Pages).
Galaxy @ GR: https://galaxy.gustaveroussy.fr/galaxyprod
- An account is required to access it
- Full functionality across sessions opens up, such as naming, saving, sharing, and publishing Galaxy objects (Histories, Workflows, Datasets, Pages).
