GenomeVIP: A Genomics Analysis Pipeline for Cloud Computing with Germline and Somatic Calling on Amazon's Cloud. R. Jay Mashl, The Genome Institute.


GenomeVIP: A Genomics Analysis Pipeline for Cloud Computing with Germline and Somatic Calling on Amazon's Cloud
R. Jay Mashl
The Genome Institute at Washington University
October 20, 2014

Turnkey Variant Analysis Project
Provides a collection of analysis tools and computational frameworks for streamlined discovery and interpretation of genetic variants.
Genome Variant Investigation Portal: VarScan, Pindel, BreakDancer
Features: multi-tool variant discovery, cloud computing, scalability, extensibility
Runs locally or on the cloud (AWS)
Poster #1678M (Monday)
tvap.genome.wustl.edu

Genome Variant Investigation Portal
Web server and interface for germline and somatic variant-discovery tools:
VarScan: heuristic/statistical calling of single nucleotide variants (SNVs)
Pindel: indel detection for paired reads based on local realignment
BreakDancer: structural variant (SV) detection for paired reads
GenomeSTRiP (Harvard U.): structural variant detection and genotyping
Concurrent pipelines (SNV, indel, SV) with parallelization
Launchable on local machines or on the cloud through Amazon Web Services (AWS)
Results can be downloaded from AWS via web browser

Biological Discoveries (selected)

Comprehensive molecular portraits of human breast tumours
Identified four main types by combining data from five platforms.
Nature 490, 61-70 (2012)

Clonal evolution in relapsed acute myeloid leukaemia
Cancer consists of multiple variants; the founding clone may give rise to the relapse clone; subclones may survive therapy and mutate further.
Nature 481, 506-510 (2012)

Genomic Landscape of Non-Small Cell Lung Cancer in Smokers and Never-Smokers
Of patients with lung cancer, smokers were found to have 10x more mutations than non-smokers.
Cell 150, 1121-34 (2012)

Discovery & genotyping for structural variants in populations
~14,000 deletion polymorphisms with allelic states (1000G pilot).
Nature Genetics 43, 269-276 (2011)

Application to APOL1: Demo
Representative samples from the PUR population from 1000 Genomes.
Analyze within the range chr22:36-37 Mbp for known variants:

Sample    Region          Variant        Isoforms
HG01242   22:36,661,906   A / G          G1 (non-silent)
HG01101   22:36,662,041   AATAATT / A    G2 (Δ6)
HG01049   22:36,133,448   Δ 767 bp
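As a quick sanity check before launching the pipelines, one can confirm that the known variant positions fall inside the region entered in the tool steps (22:36130000-36700000). The following is a minimal sketch only; the helper names `Region`, `parse_region`, and `contains` are hypothetical and are not part of GenomeVIP.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_region(text: str) -> Region:
    """Parse 'chrom:start-end'; commas inside the numbers are allowed."""
    chrom, span = text.replace(",", "").split(":")
    start, end = span.split("-")
    return Region(chrom.strip(), int(start), int(end))


def contains(region: Region, chrom: str, pos: int) -> bool:
    """True if the position lies inside the region (inclusive bounds)."""
    return chrom == region.chrom and region.start <= pos <= region.end


# Known variant positions copied from the demo slide.
KNOWN_VARIANTS = [
    ("HG01242", "22", 36661906),  # A / G, G1
    ("HG01101", "22", 36662041),  # AATAATT / A, G2
    ("HG01049", "22", 36133448),  # 767 bp deletion
]

# User-defined region entered in the SNV, indel, and SV steps.
region = parse_region("22:36130000-36700000")
in_region = {sample: contains(region, chrom, pos)
             for sample, chrom, pos in KNOWN_VARIANTS}

for sample, ok in in_region.items():
    print(sample, "inside region" if ok else "OUTSIDE region")
```

All three demo positions lie within the user-defined region, so a single region string can be reused across the tool pages.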

Login
1. Select AWS.
2. Click Next.

Sample & Reference Selection
1. Specify path and retrieve: copy the given URI into the path field and click Retrieve.
2. Select samples: click on all the PUR low_coverage items to transfer them to the Selected bams textbox.
3. Select reference hs37d5.chr22.fa.
4. Click Next.

SNV Detection: VarScan
1. Check VarScan.
2. Select Germline.
3. Select SNVs only.
4. Select All (pooled) samples.
5. Select User-defined region and enter 22:36130000-36700000.
6. Keep p-value: 0.99.
7. Set Output vcf: True.
8. Click Next.
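For readers who want to see roughly what these web-form settings correspond to on the command line, here is a hedged sketch that only builds (and does not run) a `samtools mpileup` piped into VarScan `mpileup2snp`. The slides do not show the commands GenomeVIP actually issues, so this mapping is an assumption; the BAM file names and the jar path are placeholders.

```python
import shlex


def varscan_germline_snv_commands(bams, reference, region,
                                  p_value=0.99, varscan_jar="VarScan.jar"):
    """Build the two halves of a 'samtools mpileup | VarScan mpileup2snp' pipe.

    Assumption: this mirrors the form settings (pooled samples, user-defined
    region, p-value 0.99, VCF output); it is not taken from GenomeVIP itself.
    """
    mpileup = ["samtools", "mpileup", "-f", reference, "-r", region, *bams]
    varscan = ["java", "-jar", varscan_jar, "mpileup2snp",
               "--p-value", str(p_value), "--output-vcf", "1"]
    return mpileup, varscan


# Placeholder BAM names; the demo uses the PUR low_coverage samples.
mpileup_cmd, varscan_cmd = varscan_germline_snv_commands(
    bams=["sample1.bam", "sample2.bam"],
    reference="hs37d5.chr22.fa",
    region="22:36130000-36700000",
)

pipeline = " | ".join(shlex.join(cmd) for cmd in (mpileup_cmd, varscan_cmd))
print(pipeline)
```

Building the commands as lists keeps quoting safe if they are later passed to `subprocess`; the printed string is for inspection only.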

Indel Detection: Pindel
1. Check Run Pindel.
2. Select All (pooled) samples.
3. Select User-defined region and enter 22:36130000-36700000.
4. Click Next.

SV Detection: BreakDancer
1. Check BreakDancer.
2. In Step 1, select All (pooled) samples.
3. In Step 3, select Intra (ITX) only, choose user-defined region, and enter 22:36130000-36700000.
4. Click Next.

SV Detection & Genotyping: GenomeSTRiP
1. Check Run GenomeSTRiP.
2. Verify reference is hs37d5.chr22.fa.
3. Select mask Hs37d5 human_g1k_v37.mask.36.fasta.chr22.
4. GC normalization: True, with cn2_mask_g1k_v37.fasta.
5. Chromosome: User-defined with 22:36130000-36700000.
6. Variant size: 100 bp to 100 kbp.
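The variant size setting determines which of the demo variants this step can be expected to report. A small sketch of that check follows, using the sizes stated on the demo slide; the function name `in_size_window` is hypothetical.

```python
# Size window entered on the GenomeSTRiP page (100 bp to 100 kbp).
MIN_SIZE_BP = 100
MAX_SIZE_BP = 100_000


def in_size_window(length_bp: int) -> bool:
    """True if a variant of this length falls inside the configured window."""
    return MIN_SIZE_BP <= abs(length_bp) <= MAX_SIZE_BP


# Deletion sizes from the demo slide.
demo_deletions = {
    "HG01049 deletion": 767,
    "HG01101 G2 deletion": 6,
}

for name, size in demo_deletions.items():
    status = "within" if in_size_window(size) else "outside"
    print(f"{name}: {size} bp is {status} the size window")
```

The 767 bp deletion falls inside the window, while the 6 bp G2 deletion is below the lower bound and so is not covered by this size setting.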

Amazon AWS Submission
1. Select machine type. Jobs have been tested to finish within a few minutes.
2. Specify where to send results.
3. Validate and submit.

Results
22 36133341 DEL_1 T <DEL> SVLEN=-762;SVTYPE=DEL
22 36662041 . AATAATT A . PASS END=36662047;HOMLEN=4;HOMSEQ=ATAA;SVLEN=-6;SVTYPE=DEL;
22 36661906 . A G . PASS ADP=7;WT=1;HET=0;HOM=0;NC=2
22 36662041 . AATAATT A . PASS ADP=4;WT=0;HET=1;HOM=0;NC=2;
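The INFO column of these records can be checked against the REF/ALT alleles. The sketch below parses the INFO string of the 6 bp deletion record shown in the results and verifies that SVLEN and END agree with the position and alleles. It uses only the values on the slide; `parse_info` is a hypothetical helper, not part of any tool mentioned here.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; bare flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:  # tolerate a trailing ';'
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Values copied from the deletion record in the results.
pos, ref, alt = 36662041, "AATAATT", "A"
info = parse_info("END=36662047;HOMLEN=4;HOMSEQ=ATAA;SVLEN=-6;SVTYPE=DEL;")

deleted_bp = len(ref) - len(alt)    # bases removed relative to the reference
expected_end = pos + len(ref) - 1   # last reference base covered by REF

consistent = (int(info["SVLEN"]) == -deleted_bp
              and int(info["END"]) == expected_end)
print("deleted bp:", deleted_bp, "| END:", expected_end,
      "| consistent:", consistent)
```

The same parser can be applied to the other INFO strings, for example to read the `ADP`, `WT`, `HET`, and `HOM` counts from the SNV records.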

Jay Mashl (rmashl@genome.wustl.edu)
Kai Ye (kye@genome.wustl.edu)
Li Ding (lding@genome.wustl.edu)
...and with thanks to the Ding Lab members
Poster #1678 / M (this afternoon)
http://tvap.genome.wustl.edu/
National Human Genome Research Institute

Alternate slides

Amazon AWS S3 Data Retrieval
Links to the actual files to be generated, along with the merged VCF. Click the links to download.
Participants will identify variants in the output.
(Left) Prepared results available, in case of technology problems.