RNA SEQUENCING AND DATA ANALYSIS


Download slides and package http://odin.mdacc.tmc.edu/~rverhaak/package.zip http://odin.mdacc.tmc.edu/~rverhaak/rna-seqlecture.zip

Overview
- Introduction to the topic
- RNA species
- Experimental design considerations
- Analytical approaches
- Discussion of our analysis pipeline
- Technical details
- Application to TCGA data sets
- Results
- Hands-on

Not all RNA is the same
Types of RNA:
- Messenger RNA
- Micro RNA
- Long non-coding RNA
- Ribosomal RNA
- Other

Methods for RNA enrichment prior to library construction
- Poly(A)-RNA selection: by hybridization to oligo-dT beads
  - mature mRNA highly enriched
  - efficient for quantification of gene expression level and so on
  - limitation: 3' bias correlating with RNA degradation
- rRNA depletion: by hybridization to bead-bound rRNA probes
  - rRNA sequence-dependent and species-specific
  - all non-rRNA retained: premature mRNA, long non-coding RNA
- Small RNA extraction
  - specific kits required to retain small RNA
  - optional fine size-selection by gel or column
This lecture focuses on mRNA sequencing.

Length of mRNA transcripts in the human genome
[Figure: histogram of mRNA transcript lengths, shown for 0-10,000 bp with an inset for 0-800 bp.]
What is the optimal insert and read size for mRNA sequencing?

Alignment versus assembly
- Assembly: Trinity, Cufflinks, ABySS
  - Particularly useful when no reference genome is available, as in bacterial transcriptomes
- Alignment: Bowtie, BWA, MOSAIK
  - Maximum sensitivity, fewer false positives

Sequencing parameters
Read type, typically 36/51/76/101 bp:
- Single-end read: for efficient counting of transcript copy number and splicing sites
- Paired-end read: longer cDNA fragment and read length help to determine transcript structure, especially within gene families

Applications of RNA-sequencing

RNA sequencing applications
- Quantification of transcript expression levels
- Detection of splice variation/different isoforms of the same gene
- Allele-specific expression levels
- Detection of fusion transcripts (such as BCR-ABL in CML)
- Detection of sequence variation (limited application)
- Validation of DNA sequence variants

RNA-seq expression levels remain linear in ranges where microarrays are saturated or insensitive.
Expression is measured as reads per kilobase of transcript per million mapped reads (RPKM) to normalize for gene length and library size.
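The RPKM normalization can be written out as a short calculation. This is a minimal sketch using the standard definition (reads mapped to the gene, scaled by gene length in kilobases and by library size in millions of mapped reads); the function name and the example numbers are illustrative and do not come from the slides.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Illustrative values: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
print(example_rpkm)
```

Doubling either the gene length or the library size halves the value, which is what makes RPKM comparable across genes and samples.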

Identification of fusion transcripts
Popular methods search for:
- Read pairs that map to two different genes
  - Need to correct for gene homology
- Reads that span fusion junctions
  - Split reads in half and align the separate halves
  - Make a database of all possible fusion junctions and align full reads
Tools: PRADA, MapSplice, TopHat

Variant detection
[Figure: all DNA mutations from the TCGA clear cell renal cell carcinoma project.]
- Approximately 35% of mutations are covered sufficiently to be detected, at a validation rate of ~80-90%.
- The reverse transcriptase step that converts RNA to cDNA complicates detection of RNA edits and mutations.

Sequencing parameters
Read depth:
- Minimum mapped reads: 10 million for quantitative analysis of a mammalian transcriptome
- More reads are needed for splicing variant discovery and differential comparison among samples
- Current output: 120-180 million raw reads / lane
- Multiplex level: 4-12 libraries / lane recommended
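The read-depth figures above can be combined into a quick planning check. This is a rough sketch: the lane output, multiplex levels and 10 million minimum are taken from the slide, while the mapping rate of 0.8 is a hypothetical value chosen only for illustration.

```python
MIN_MAPPED_READS = 10_000_000  # minimum for quantitative analysis (from the slide)


def expected_mapped_reads(lane_raw_reads: int, libraries_per_lane: int,
                          mapping_rate: float) -> float:
    """Expected mapped reads per library, assuming reads split evenly across libraries."""
    if libraries_per_lane <= 0:
        raise ValueError("libraries_per_lane must be positive")
    if not 0.0 <= mapping_rate <= 1.0:
        raise ValueError("mapping_rate must be between 0 and 1")
    return lane_raw_reads / libraries_per_lane * mapping_rate


ASSUMED_MAPPING_RATE = 0.8  # hypothetical, not from the slides

for lane_raw_reads in (120_000_000, 180_000_000):
    for plex in (4, 12):
        mapped = expected_mapped_reads(lane_raw_reads, plex, ASSUMED_MAPPING_RATE)
        verdict = "ok" if mapped >= MIN_MAPPED_READS else "below minimum"
        print(f"{lane_raw_reads:>11,} raw reads, {plex:>2}-plex: "
              f"{mapped:>12,.0f} mapped reads per library ({verdict})")
```

Under these assumptions, 12-plex on a low-yield lane falls short of the minimum, while 4-plex leaves headroom for splicing analysis.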

RNA sequencing in The Cancer Genome Atlas
- mRNA: poly-A mRNA is purified from total RNA using poly-T oligo-attached magnetic beads.
- miRNA: total RNA is mixed with oligo(dT) MicroBeads and loaded into a MACS column, which is then placed on a MultiMACS separator. From the flow-through, small RNAs, including miRNAs, are recovered by ethanol precipitation.

Analysis pipeline
[Diagram]
INPUTS
- Preprocessing: .bam file [PAIRED END] or .fastq files [END1 & END2]
- Config.txt [location of scripts and reference files]
MODULES
- Processing Module: read alignment, remap alignments, combine two ends, recalibrate quality scores
- Expression & QC Module (RNA-SeQC)
- Fusion Module
- GUESS-ft (-geneA -geneB)
OUTPUTS
- RPKM & QC metrics
- Fusion candidates
- Supervised search evidence
Implementation results
- Samples processed: >400 KIRC, >170 GBM
- TFG-GPR128 fusion: samples detected: 5 KIRC, >5 GBM
- Samples processed: 321 normal, 85 tumor (BLCA, BRCA, HNSC, KIRC, KIRP, LIHC, LUAD, LUSC, PRAD, THCA)

RNA sequencing read alignment in PRADA
[Figure: transcripts from the same gene.]
- Reads are aligned to all possible transcripts
- Reads are also aligned to the genome
- The final, single placement for each read is determined by re-mapping

PRADA alignments: advantages versus disadvantages
Advantage:
- Alignment to unannotated transcripts
- Alignment across exon-exon junctions
Disadvantage:
- Alignment approaches such as those used by MapSplice and Bowtie/TopHat typically split reads
- More conservative alignment than split-read

http://sourceforge.net/projects/prada/
PRADA focuses on the analysis of paired-end RNA-sequencing data.
Four modules:
1. Processing
2. Expression and Quality Control
3. Gene fusion
4. GUESS-ft: General User defined Supervised Search for fusion transcripts

Processing Module
- Sample reads are mapped to: transcriptome, genome
- Uses tools widely used by the research community: Samtools, BWA, Picard, GATK
- Enabled reference versions: hg18 with Ensembl52, hg19 with Ensembl64

Expression & QC Module
- RNA-SeQC process (Java)
- RNA-SeQC provides three types of quality control metrics: read counts, coverage, correlation
- RPKM values at transcript level, for the longest transcript

Fusion Module
- Discordant read pair: each end of the read pair maps uniquely to distinct protein-coding genes.
- Fusion spanning reads: a chimeric read that maps to a putative junction, while the mate read maps to either gene A or gene B.
[Figure: read pairs relative to Gene A and Gene B.]
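The discordant read pair definition (each end maps uniquely to a distinct protein-coding gene) can be sketched as a small predicate plus a per-gene-pair counter. This is an illustration only, not PRADA's code: the `ReadEnd` record and function names are hypothetical, and gene pairs are counted regardless of which read end hit which gene, which is a simplification.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class ReadEnd(NamedTuple):
    gene: Optional[str]  # protein-coding gene the end maps to, or None
    unique: bool         # True if the end has a single placement


def is_discordant(end1: ReadEnd, end2: ReadEnd) -> bool:
    """Both ends map uniquely, and to two different genes."""
    if end1.gene is None or end2.gene is None:
        return False
    return end1.unique and end2.unique and end1.gene != end2.gene


def count_discordant_pairs(pairs: Iterable[Tuple[ReadEnd, ReadEnd]]) -> Counter:
    """Count discordant read pairs per unordered gene pair."""
    counts: Counter = Counter()
    for end1, end2 in pairs:
        if is_discordant(end1, end2):
            counts[tuple(sorted((end1.gene, end2.gene)))] += 1
    return counts


pairs = [
    (ReadEnd("SFPQ", True), ReadEnd("TFE3", True)),   # discordant
    (ReadEnd("TFE3", True), ReadEnd("SFPQ", True)),   # discordant
    (ReadEnd("SFPQ", True), ReadEnd("SFPQ", True)),   # same gene
    (ReadEnd("SFPQ", False), ReadEnd("TFE3", True)),  # one end not unique
    (ReadEnd("SFPQ", True), ReadEnd(None, False)),    # one end unmapped
]
print(count_discordant_pairs(pairs))
```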

Fusion Module (cont'd)
Filters:
- Gene homology using blastn (bitscore 50)
- Ratio of fusion spanning and discordant reads
- Number of gene partners within a sample: remove promiscuous fusion pairs, i.e. those with a large number of partners (e.g. >25)
- Number of distinct junctions
Filtered candidates:
- Up to 1 mismatch
- Unique sequences
- Unique start positions
[Figure: read-pair geometry for the ratio filter. Labels: 49 bp, 49 bp, 50 bp, 50 bp, 80 bp, 180 bp; r_t, 49 + 49, 1.5, 80.]
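The partner-count filter can be illustrated with a few lines of code. This is a sketch under stated assumptions: candidates are represented as hypothetical (gene A, gene B) tuples for one sample, and the cutoff of 25 partners is the example value from the slide. It is not PRADA's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def remove_promiscuous(candidates: List[Tuple[str, str]],
                       max_partners: int = 25) -> List[Tuple[str, str]]:
    """Drop candidate fusions in which either gene has more than max_partners partners."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for gene_a, gene_b in candidates:
        partners[gene_a].add(gene_b)
        partners[gene_b].add(gene_a)
    return [
        (gene_a, gene_b)
        for gene_a, gene_b in candidates
        if len(partners[gene_a]) <= max_partners
        and len(partners[gene_b]) <= max_partners
    ]


# A made-up "HUB" gene paired with 26 partners is filtered out;
# the SFPQ-TFE3 candidate is kept.
candidates = [("HUB", f"P{i}") for i in range(26)] + [("SFPQ", "TFE3")]
kept = remove_promiscuous(candidates)
print(kept)
```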

Fusion Module (cont'd)
Example output record:
- SampleID: TCGA-BP-4756-01A-01R-1289-07
- GeneA: SFPQ
- GeneB: TFE3
- Discordant_Pairs: 350
- Fusion_Reads: 220
- Fusion_Junctions: 1
- HomologyScore: 26.5
- FusionDiscordant_Ratio: 0.628571429
- Positions_Consistent: PARTIALLY
- GeneA_Chr: chr1
- GeneB_Chr: chrX
- Fusion_Type: Interchromosomal
- Breakpoint_Distance: 1.00E+46
- Breakpoint(s): chr1.i.e7.e6.35427190_chr23.e.2.48785038
- Unique reads: gadiffpos: 110
- Unique reads: gbdiffpos: 119
- Unique reads: fusdiffseq: 35
- ga_withinsamplecount: 1
- gb_withinsamplecount: 1
- ExonJunction in-frame classification*: in-frame
Outputs:
- List of all annotated fusions: SampleID.annotated.candidates.txt
- List of filtered annotated fusions: SampleID.filtered.candidates.txt
Junction sequence (* marks the junction):
TAAGACGCATGGAAGAACTTCACAATCAAGAAATGCAGAAACGTAAAGAAATGCAATTGAG * CCTGAACTCTTTGCTTCCGGAATCCGGGATTG TTGCTGACATAGAATTAGAAAACGTCCTT
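The FusionDiscordant_Ratio in this record is consistent with Fusion_Reads divided by Discordant_Pairs (220 / 350). The check below is a small sketch of that arithmetic; the function name is illustrative, and the interpretation of the field is inferred from the numbers rather than stated on the slide.

```python
def fusion_discordant_ratio(fusion_reads: int, discordant_pairs: int) -> float:
    """Ratio of fusion spanning reads to discordant read pairs."""
    if discordant_pairs <= 0:
        raise ValueError("discordant_pairs must be positive")
    return fusion_reads / discordant_pairs


# Values from the SFPQ-TFE3 record in sample TCGA-BP-4756-01A-01R-1289-07.
ratio = fusion_discordant_ratio(220, 350)
print(f"{ratio:.9f}")
```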

Fusion Module (cont'd)
The identification of in-frame fusion transcripts and their predicted protein sequences.
Image source: http://upload.wikimedia.org/wikipedia/en/d/d3/mature_mrna.png
Asmann Y W et al. Nucl. Acids Res. 2011;nar.gkr362. The Author(s) 2011. Published by Oxford University Press.
Out of all the combinations, we consider only those fusion classifications that are found in primary transcripts.
- CDR-CDR: in-frame, out-of-frame
- Non CDR-CDR: 5' UTR to CDR, 5' UTR to 3' UTR, 3' UTR to 3' UTR, 5' UTR to 5' UTR, 3' UTR to 5' UTR, CDR to 5' UTR, CDR to 3' UTR
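For a CDR-CDR junction, the in-frame versus out-of-frame call reduces to a reading-frame comparison. The sketch below assumes two hypothetical inputs that an annotation step would supply: the number of coding bases the 5' gene contributes before the junction, and the number of coding bases of the 3' gene that lie upstream of the junction and are lost. It illustrates the principle only and is not the method used by PRADA.

```python
def is_in_frame(cds_bases_from_5p_gene: int, cds_bases_skipped_in_3p_gene: int) -> bool:
    """True if the 3' gene resumes in its original reading frame.

    The first retained base of the 3' gene sits at frame position
    (skipped % 3) in its own coding sequence; in the fusion it lands at
    frame position (contributed % 3). The frame is preserved when both agree.
    """
    if cds_bases_from_5p_gene < 0 or cds_bases_skipped_in_3p_gene < 0:
        raise ValueError("base counts must be non-negative")
    return cds_bases_from_5p_gene % 3 == cds_bases_skipped_in_3p_gene % 3


def classify_cdr_cdr(cds_bases_from_5p_gene: int, cds_bases_skipped_in_3p_gene: int) -> str:
    """Label a coding-to-coding junction as in-frame or out-of-frame."""
    in_frame = is_in_frame(cds_bases_from_5p_gene, cds_bases_skipped_in_3p_gene)
    return "in-frame" if in_frame else "out-of-frame"


# Illustrative numbers only.
print(classify_cdr_cdr(300, 150))  # both multiples of 3
print(classify_cdr_cdr(301, 150))  # frames differ by one base
```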

Implementation results
- Samples processed: >400 KIRC, >170 GBM
- Works well in the MDACC HPC system
- PRADA-fusion module validation rate ~85% (11 out of 13)

KIRC fusion results
We analyzed 416 RNA-seq samples from clear cell renal cell carcinoma (ccRCC), available through TCGA.
We identified 80 bona fide fusion transcripts (57 intrachromosomal, 33 interchromosomal) in 62 individual samples.
Recurrent fusions:
- SFPQ-TFE3 (n=5, chr1-chrX)
- DHX33-NLRP1 (n=2, chr17)
- TRIP12-SLC16A14 (n=2, chr2)
- TFG-GPR128 (n=4, chr3)

KIRC fusion results (cont'd)
SFPQ-TFE3
- TFE3 translocations have been linked to a rare subtype of renal cancer.
- The five samples harboring a TFE3 fusion did not contain mutations in the ten most frequently mutated genes in ccRCC (PBRM1, PTEN, VHL, SETD2, BAP1, KDM5C, MTOR, ZNF800, PIK3CA, and TP53), with the exception of one sample that carried a VHL mutation.
- This suggests that the SFPQ-TFE3 fusion plays a unique role in the cancer genomics of these patients.

KIRC fusion validation
- PRADA-fusion module validation rate: ~85% (11 out of 13)
- RT-PCR and FISH assays
- TFE3-SFPQ was validated in three individual samples

Sample ID | 5' Gene | 3' Gene | Discordant Read Pairs | Fusion Span Reads | Fusion Junction(s) | 5' Gene Chr | 3' Gene Chr | Validated?
TCGA-AK-3456-01A-02R-1325-07 | TFE3 | SFPQ | 175 | 129 | 1 | chrX | chr1 | Yes
TCGA-AK-3456-01A-02R-1325-07 | SFPQ | TFE3 | 116 | 81 | 1 | chr1 | chrX | Yes
TCGA-A3-3313-01A-02R-1325-07 | C6orf106 | LRRC1 | 90 | 40 | 2 | chr6 | chr6 | Yes
TCGA-A3-3313-01A-02R-1325-07 | CYP39A1 | LEMD2 | 37 | 9 | 1 | chr6 | chr6 | Yes
TCGA-B2-4101-01A-02R-1277-07 | FAM172A | FHIT | 17 | 4 | 1 | chr5 | chr3 | Yes
TCGA-AK-3445-01A-02R-1277-07 | KIAA0802 | LRRC41 | 14 | 6 | 1 | chr18 | chr1 | Yes
TCGA-B0-5095-01A-01R-1420-07 | GORASP2 | WIPF1 | 14 | 2 | 1 | chr2 | chr2 | Yes
TCGA-A3-3313-01A-02R-1325-07 | ZNF193 | MRPS18A | 11 | 3 | 1 | chr6 | chr6 | Yes
TCGA-A3-3313-01A-02R-1325-07 | FTSJD2 | GPX6 | 9 | 8 | 1 | chr6 | chr6 | Yes
TCGA-B0-4945-01A-01R-1420-07 | KIAA0427 | GRM4 | 8 | 5 | 1 | chr18 | chr6 | No
TCGA-B8-4143-01A-01R-1188-07 | SLC36A1 | TTC37 | 5 | 5 | 1 | chr5 | chr5 | No

KIRC fusion validation: RT-PCR
[Figure: RT-PCR results for SFPQ-TFE3 and TFE3-SFPQ.]

TFG-GPR128 has been reported in other cancers.
TCGA has thousands of RNA-seq samples. How can we quickly scan many samples for the presence of this fusion?

Supervised Search Module
GUESS-ft: General User defined Supervised Search for fusion transcripts
- Uses high-quality mapping reads only, checks that read orientation fulfills the fusion schema, and allows up to one mismatch.
Workflow, starting from the BAM file and the user-supplied genes A and B:
1. Reads mapped to A or B: collect discordant reads A-B, where the two read ends map to A and B respectively.
2. Unmapped reads: parse unmapped reads whose other end maps to A or B.
3. Map the parsed reads to a junction DB of all possible exon junctions (time-consuming step).
4. List reads with one end mapping to a junction and the other end mapping to A or B (junction spanning reads).
5. Summary report: supervised search evidence.
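The "DB of all possible exon junctions" step can be illustrated by pairing the 3' end of every exon of gene A with the 5' start of every exon of gene B. This is a minimal sketch, not GUESS-ft's code: the input format (lists of exon identifier and sequence), the function name and the default flank length are assumptions made for the example. Only the A-to-B orientation is built; the reciprocal database would come from calling the function with the arguments swapped.

```python
from itertools import product
from typing import Dict, List, Tuple

Exon = Tuple[str, str]  # (exon identifier, exon sequence on the transcript strand)


def build_junction_db(exons_a: List[Exon], exons_b: List[Exon],
                      flank: int = 49) -> Dict[str, str]:
    """Return junction sequences for every exon of A joined to every exon of B.

    Each entry concatenates the last `flank` bases of an A exon with the
    first `flank` bases of a B exon. Exons shorter than `flank` contribute
    their whole sequence.
    """
    if flank <= 0:
        raise ValueError("flank must be positive")
    db: Dict[str, str] = {}
    for (id_a, seq_a), (id_b, seq_b) in product(exons_a, exons_b):
        db[f"{id_a}_{id_b}"] = seq_a[-flank:] + seq_b[:flank]
    return db


# Toy sequences, for illustration only.
exons_a = [("A.e1", "AAAACCCC")]
exons_b = [("B.e1", "GGGGTTTT"), ("B.e2", "TTTTGGGG")]
junction_db = build_junction_db(exons_a, exons_b, flank=4)
for name, seq in junction_db.items():
    print(name, seq)
```

The number of entries grows with the product of the exon counts of the two genes, which is one reason mapping reads against such a database is the slow step.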

Tumors with the fusion have higher GPR128 expression levels
[Figure: RPKM expression pattern seen in KIRC tumors; fusion samples show higher expression of GPR128 (activation).]
- TCGA-B0-5703: 1 discordant read pair in the tumor sample, 33 discordant read pairs in the matched normal

Identification of TFG-GPR128 fusion
- All available normal samples in CGHub
- Subset of tumor samples selected based on RPKM expression pattern

Table. Samples across cancer types
Cancer Type | # of normal samples | # of tumor samples
Bladder Urothelial Carcinoma [BLCA] | 11 | 4
Breast invasive carcinoma [BRCA] | 106 | 30
Head and Neck squamous cell carcinoma [HNSC] | 27 | 12
Kidney renal clear cell carcinoma [KIRC] | 66 | 416*
Kidney renal papillary cell carcinoma [KIRP] | 15 | 4
Liver hepatocellular carcinoma [LIHC] | 9 | 2
Lung adenocarcinoma [LUAD] | 51 | 4
Lung squamous cell carcinoma [LUSC] | 17 | 18
Prostate adenocarcinoma [PRAD] | 7 | 7
Thyroid carcinoma [THCA] | 12 | 4
* All performed by PRADA fusion module.

Identification of TFG-GPR128 fusion
- All available normal samples in CGHub
- Subset of tumor samples selected based on RPKM expression pattern

Table. Samples with the fusion detected, across cancer types
Cancer Type | Normal samples with fusion (%) | Tumor samples with fusion (%)
Bladder Urothelial Carcinoma [BLCA] | 0 (0%) | 2 (3.6%)
Breast invasive carcinoma [BRCA] | 1 (0.94%) | 13 (1.6%)
Head and Neck squamous cell carcinoma [HNSC] | 0 (0%) | 6 (2.3%)
Kidney renal clear cell carcinoma [KIRC] | 1 (1.5%) | 5 (1.2%)
Kidney renal papillary cell carcinoma [KIRP] | 0 (0%) | 1 (5.9%)
Liver hepatocellular carcinoma [LIHC] | 0 (0%) | 1 (5.9%)
Lung adenocarcinoma [LUAD] | 0 (0%) | 1 (0.79%)
Lung squamous cell carcinoma [LUSC] | 0 (0%) | 9 (4%)
Prostate adenocarcinoma [PRAD] | 1 (14.3%) | 2 (1.9%)
Thyroid carcinoma [THCA] | 0 (0%) | 2 (0.89%)
* All performed by PRADA fusion module.
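The percentages for normal samples, and for KIRC tumors, match a simple detected-over-screened calculation when combined with the sample counts in the preceding table. The snippet below reproduces a few of them as a sanity check. It assumes that the counts in this table are samples in which the fusion was detected, which is inferred from the numbers rather than stated on the slide.

```python
def detection_percent(samples_with_fusion: int, samples_screened: int) -> float:
    """Percentage of screened samples in which the fusion was detected."""
    if samples_screened <= 0:
        raise ValueError("samples_screened must be positive")
    return 100.0 * samples_with_fusion / samples_screened


# (samples with fusion, samples screened), taken from the two tables.
checks = {
    "BRCA normal": (1, 106),
    "KIRC normal": (1, 66),
    "PRAD normal": (1, 7),
    "KIRC tumor": (5, 416),
}
for label, (detected, screened) in checks.items():
    print(f"{label}: {detection_percent(detected, screened):.2f}%")
```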

GUESS-ft module: TFG-GPR128 fusion (cont'd)
[Figure: raw copy number for KIRC, showing focal amplification in chr3 (TFG-GPR128).]

GUESS-ft module: TFG-GPR128 fusion (cont'd)
[Figure: GWAS.]

In GBM, the gene EGFR is frequently targeted by intragenic deletions
[Figure: GBM alterations in EGFR.]

Supervised Search Module
GUESS-ig: GUESS for intragenic rearrangements
Workflow, starting from the BAM file and the user-supplied gene A:
1. Reads mapped to A: collect discordant reads A-A.
2. Unmapped reads: parse unmapped reads whose other end maps to A.
3. Map the parsed reads to a junction DB of undefined junctions*.
4. List reads with one end mapping to an undefined junction and the other end mapping to A (junction spanning reads).
5. Summary report: supervised search evidence.

Applying GUESS-ig in GBM identifies intragenic deletion variants
[Figure: GBM alterations in EGFR.]

Thanks. http://sourceforge.net/projects/prada/