Investigating rare diseases with Agilent NGS solutions
Chitra Kotwaliwale, Ph.D.
Rare diseases affect 350 million people worldwide
There are 7,000 rare diseases, and 80% of them are genetic. Sixty million people are affected in the US and Europe, and 50% of affected individuals are children.
Rare diseases can have a devastating impact on health
Cystic fibrosis: excessive mucus in the lungs and pancreas causes respiratory failure and an inability to digest food. Median survival age is 40 years. Affects more than 30,000 people in the US and 70,000 worldwide.
Leukodystrophy: progressive diseases that affect the brain, spinal cord and peripheral nerves, impairing movement, vision, hearing, balance, the ability to eat and more. Children affected with leukodystrophy live 5 to 10 years. Affects ~60,000 people in the US.
Retinitis pigmentosa (RP): retinal degeneration ultimately causes blindness. Most people with RP are legally blind by age 40. Affects ~100,000 people in the US and 1.5 million worldwide.
Genetic causes of rare diseases can be complex
From a single gene to many genes: cystic fibrosis (1 gene, CFTR), leukodystrophy (30 genes), retinitis pigmentosa (77 genes).
Figure: wild-type vs. mutant CFTR chloride (Cl-) channel; healthy vs. damaged neuron; rods and cones in a healthy retina vs. in RP.
Complexity of symptoms makes it difficult to detect rare diseases
Complex genetics and complex phenotypes make diagnosis slow:
Average number of physician visits before receiving a diagnosis = 7
Average time from symptom onset to accurate diagnosis = 4.8 years
Percentage of rare disease cases that are undiagnosed = ?
Source: Engel et al., Journal of Rare Disorders
Rare diseases are progressive: faster diagnosis enables early intervention and improved quality of life.
Exome sequencing has enhanced our understanding of rare diseases
Figure: number of novel rare disease genes identified, 2009 to 2017 (annotated: ~130 genes).
3,710 genes with phenotype-causing mutations in OMIM; 197,952 mutations in HGMD.
Source: Boycott et al., Nature Reviews Genetics
Present since the inception of exome sequencing
Agilent launched the first whole exome sequencing kit
Figure: 2009 to 2017 timeline marking the launch of the first whole exome sequencing kit, the introduction of exome customization, and the SureSelect Focused Exome, SureSelect Human All Exon V6 and Clinical Research Exome v2 kits.
Agilent pioneered the whole exome sequencing workflow
DNA extraction → library prep → target enrichment → sequencing → data analysis
SureSelect baits are generated using a high-fidelity oligo synthesis process
Each synthesis cycle, repeated n times, consists of 1) coupling (inkjet), 2) oxidation and 3) deblock (flood); depurination is a side reaction. The process yields RNA baits.
Long-length synthesis is achieved by improved cycle yield, which depends on coupling efficiency, depurination and consistency.
Figure: reaction scheme of the synthesis cycle.
High-fidelity process ensures superior-quality baits for target enrichment
Figure: oligo synthesis fidelity (errors per kb).
No need to QC individual oligos; more accurate capture.
$\%\mathrm{FL} = (\mathrm{CY} \times \mathrm{DY})^{n}$, where %FL = percentage of full-length product, CY = synthesis cycle yield, DY = depurination cycle yield and n = oligo length in nucleotides (nt).
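To see why per-cycle yield matters so much for long baits, the %FL relation on this slide can be evaluated directly. This is a minimal sketch that treats CY and DY as independent per-cycle probabilities compounded over one cycle per nucleotide; the function name and the yield values below are hypothetical and are not Agilent specifications.

```python
def full_length_fraction(cycle_yield: float, depurination_yield: float, length_nt: int) -> float:
    """Fraction of oligos that reach full length: (CY * DY) ** n.

    Assumes one synthesis cycle per nucleotide and that coupling success (CY)
    and survival of depurination (DY) are independent per-cycle probabilities.
    """
    for name, p in (("cycle_yield", cycle_yield), ("depurination_yield", depurination_yield)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {p}")
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    return (cycle_yield * depurination_yield) ** length_nt


# Hypothetical yields for a 200-nt oligo: a 0.5-point drop in cycle yield
# cuts the full-length fraction by roughly two thirds.
for cy in (0.995, 0.99):
    print(f"CY={cy}: %FL = {100 * full_length_fraction(cy, 0.999, 200):.1f}%")
```

Because the per-cycle terms are raised to the oligo length, small per-cycle losses compound quickly, which is why cycle yield is the lever for long-length synthesis.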
Agilent SureSelect provides the most versatile platform for target enrichment
Catalog panels, custom panels, exomes
Three pillars that guide Agilent exomes
Performance, content, flexibility
Agilent SureSelect exomes
New! SureSelect Clinical Research Exome V2 (CREv2): comprehensive exome optimized for rare and inherited disorders
SureSelect Human All Exon V6: comprehensive exome for translational and clinical research
SureSelect Focused Exome: targeted exome with optimized coverage of only the disease-associated genes
Performance, content and flexibility in CREv2
Performance: enhanced coverage of disease-associated genes
Content: optimized content, including non-coding regions associated with disease
Flexibility: design customizability to further enhance your exome
CREv2 provides enhanced coverage of disease-associated genes
Performance: 5,109 disease-associated genes; 100x average sequencing depth; 67.3 Mb design; 6.5 Gb sequencing
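The depth and sequencing figures on this slide are linked by simple arithmetic: mean depth is roughly the number of sequenced bases divided by the target size. The sketch below assumes an idealized case in which every sequenced base lands evenly on the 67.3 Mb design; `usable_fraction` is a hypothetical parameter standing in for real-world losses such as off-target reads and duplicates.

```python
def mean_depth(sequenced_bases: float, target_bases: float, usable_fraction: float = 1.0) -> float:
    """Idealized mean depth: usable bases spread evenly over the target.

    usable_fraction is a hypothetical knob for losses such as off-target
    reads and PCR duplicates; 1.0 means every sequenced base lands on target.
    """
    if target_bases <= 0:
        raise ValueError("target_bases must be positive")
    if not 0.0 <= usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be between 0 and 1")
    return sequenced_bases * usable_fraction / target_bases


# Figures from the slide: 6.5 Gb of sequence over a 67.3 Mb design.
print(f"{mean_depth(6.5e9, 67.3e6):.1f}x")
```

Under these idealized assumptions the ratio comes out at about 97x, in the same range as the ~100x average depth quoted.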
CREv2 provides high SNP and indel concordance in targeted regions
Sample     SNP concordance (hom)   SNP concordance (het)   Indel concordance
Sample 1   99.91%                  99.41%                  97.15%
Sample 2   99.91%                  99.29%                  96.63%
Sample 3   99.91%                  99.36%                  96.95%
Sample 4   99.93%                  99.40%                  97.48%
Sample 5   99.95%                  99.48%                  97.19%
Sample 6   99.91%                  99.29%                  96.5%
Sample 7   99.95%                  99.46%                  97%
Sample 8   99.92%                  99.42%                  97.32%
SNP concordance calculated using HapMap data; indel concordance calculated using dbSNP data.
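The deck does not spell out how concordance was computed. A common definition is the fraction of reference sites whose called genotype matches the reference genotype, reported separately for homozygous and heterozygous sites. The sketch below assumes that definition, ignores allele order, and skips reference sites with no call; those conventions and the function name are assumptions for illustration.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]      # (chromosome, position)
Genotype = Tuple[str, str]  # two alleles, e.g. ("A", "G")


def genotype_concordance(calls: Dict[Site, Genotype],
                         truth: Dict[Site, Genotype]) -> Dict[str, float]:
    """Fraction of reference sites whose called genotype matches, split by zygosity.

    Only sites present in both dictionaries are compared; allele order is ignored.
    """
    counts = {"hom": [0, 0], "het": [0, 0]}  # zygosity -> [matches, compared]
    for site, true_gt in truth.items():
        called = calls.get(site)
        if called is None:
            continue  # uncalled sites are skipped (an assumed convention)
        zygosity = "hom" if true_gt[0] == true_gt[1] else "het"
        counts[zygosity][1] += 1
        if sorted(called) == sorted(true_gt):
            counts[zygosity][0] += 1
    return {z: (m / n if n else float("nan")) for z, (m, n) in counts.items()}


truth = {("chr1", 100): ("A", "A"), ("chr1", 200): ("C", "T"),
         ("chr2", 50): ("G", "G"), ("chr2", 75): ("A", "G")}
calls = {("chr1", 100): ("A", "A"), ("chr1", 200): ("T", "C"),
         ("chr2", 50): ("G", "G"), ("chr2", 75): ("A", "A")}
result = genotype_concordance(calls, truth)
print(result)
```

Splitting by zygosity matters because heterozygous sites are harder to call, which is consistent with the het column being lower than the hom column in the table.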
Agilent exomes provide uniform coverage regardless of GC content
Figure: normalized coverage (Avg_Cov) across 10 GC-content bins, from low to high GC, for Agilent CREv2 (Pearson's r = 0.27) and Vendor ID (Pearson's r = 0.6).
CREv2 shows a smaller deviation from the mean across GC bins. All exomes were sequenced to the same average sequencing depth. Exons were divided into deciles based on GC content to calculate normalized coverage.
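The GC analysis on this slide can be sketched as follows: rank exons by GC content, split them into deciles, and divide each decile's mean coverage by the overall mean, so a perfectly uniform exome gives 1.0 in every bin. The normalization (bin mean divided by overall mean) and the pairing used for Pearson's r (per-exon GC against per-exon coverage) are assumptions; the deck does not state exactly what was correlated. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gc_bin_coverage(exons: Sequence[Tuple[float, float]], n_bins: int = 10) -> List[float]:
    """Mean coverage per GC bin, normalized to the overall mean coverage.

    exons: (gc_fraction, mean_coverage) per exon. Exons are ranked by GC and
    split into n_bins equal-count bins (deciles when n_bins=10).
    """
    if not exons:
        raise ValueError("no exons given")
    ranked = sorted(exons, key=lambda e: e[0])
    overall = sum(cov for _, cov in ranked) / len(ranked)
    normalized = []
    for i in range(n_bins):
        chunk = ranked[i * len(ranked) // n_bins:(i + 1) * len(ranked) // n_bins]
        if not chunk:
            normalized.append(float("nan"))  # fewer exons than bins
            continue
        normalized.append(sum(cov for _, cov in chunk) / len(chunk) / overall)
    return normalized


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: 20 exons whose coverage falls off at low GC.
toy = [(0.25 + 0.02 * i, 60.0 + 3.0 * i) for i in range(20)]
print([round(v, 2) for v in gc_bin_coverage(toy)])
print(round(pearson_r([gc for gc, _ in toy], [cov for _, cov in toy]), 2))
```

Equal-count bins keep every decile equally weighted, so a flat profile near 1.0 (and a small r) indicates GC-independent coverage.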
CREv2 provides consistent coverage in high- and low-GC regions
Figure: coverage of two pairs of neighboring exons (41% GC vs. 25% GC; 45% GC vs. 29% GC) in the Agilent CREv2 and Vendor ID exomes. The AT-rich exon has low coverage in the Vendor ID exome in both examples.
All exomes were sequenced to the same average sequencing depth.
CREv2 provides the most comprehensive coverage of disease-associated regions
Content: optimized coverage of disease-associated genes, plus coverage of splice sites, deep intronic regions and other non-coding regions associated with disease. Curated in collaboration with Dr. Madhuri Hegde, Emory University. Disease association information is available with the exome.
CREv2 provides superior disease-relevant content
A pathogenic variant associated with leukodystrophy, an A>G SNV in the 5' UTR of GJC2, is detectable by Agilent CREv2 but not by the competitor ID exome.
Figure: coverage at the GJC2 5' UTR variant, Agilent CREv2 vs. Vendor ID.
Mutations that cause retinitis pigmentosa frequently occur in non-coding regions
Figure: coverage of non-coding RP-associated regions in the Agilent CREv2, Vendor ID and Vendor R exomes (two panels).
CREv2 covers more disease-associated regions
Percentage of ClinVar pathogenic/likely pathogenic variants covered:
Exome           Leukodystrophy   Retinitis pigmentosa
Agilent CREv2   98.1%            95.3%
Competitor ID   90%              87.9%
Competitor R    90.7%            94.6%
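A "percentage of variants covered" can be computed by checking each variant position against an exome's target intervals. The deck does not say how "covered" was defined (for example, interval padding or a minimum depth), so this sketch assumes a variant counts as covered if its position lies inside a BED-style, 0-based, half-open target interval. VCF positions are 1-based and would need converting first. The function name and toy coordinates are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based, as in BED files


def fraction_covered(variants: Sequence[Tuple[str, int]],
                     targets: Dict[str, List[Interval]]) -> float:
    """Fraction of variants whose 0-based position falls inside a target interval.

    Assumes each chromosome's intervals are sorted by start and non-overlapping.
    """
    if not variants:
        raise ValueError("no variants given")
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in targets.items()}
    covered = 0
    for chrom, pos in variants:
        ivs = targets.get(chrom, [])
        # Index of the last interval starting at or before pos, or -1 if none.
        i = bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and pos < ivs[i][1]:
            covered += 1
    return covered / len(variants)


# Toy example with made-up coordinates.
targets = {"chr1": [(100, 200), (300, 400)]}
variants = [("chr1", 150), ("chr1", 250), ("chr1", 399), ("chr2", 10)]
print(f"{100 * fraction_covered(variants, targets):.1f}% covered")
```

Binary search keeps each lookup logarithmic in the number of intervals, which matters when checking tens of thousands of ClinVar records against hundreds of thousands of exome targets.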
Accelerate the detection of disease-causing mutations with CREv2
Average time to detect leukodystrophy = 8 years.
Figure: number of methods used and cost ($), minimum, average and maximum. Source: Richards et al., Neurology
Comprehensive coverage of disease-associated regions means fewer method iterations, lower cost and faster detection.
Build your perfect exome with SureSelect customization capability
Flexibility: unmatched flexibility in customization of content and formats. Use existing designs as a base to optimize the exome for your research needs.
Copy number changes in rare and inherited disorders
CNVs account for 10-15% of pathogenic variants in rare diseases
Figure: copy number variants in the HGMD database.
10-15% of pathogenic variants associated with rare disease are copy number changes.
Some samples have multiple underlying pathogenic variants
~5% of samples have multiple pathogenic variants. ~12% of samples with dual variants include a combination of CNVs and SNVs; these are missed if exome sequencing or CNV analysis is performed alone.
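A rough back-of-the-envelope reading of the two figures on this slide, assuming that the "dual variant" samples are the same ~5% of samples that carry multiple pathogenic variants:

```latex
\underbrace{0.05}_{\text{multiple pathogenic variants}} \times \underbrace{0.12}_{\text{CNV + SNV combination}} = 0.006 \approx 0.6\%\ \text{of all samples}
```

Under that assumption, roughly 6 in every 1,000 samples would carry a CNV plus SNV combination that a single-method analysis would only partly explain.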
Detect CNVs, LOH, SNVs and indels in one NGS assay
OneSeq Target Enrichment: one assay, all variants
1) Evenly spaced genome-wide baits and 2) high-density baits in ClinGen disease-associated regions detect copy number changes and LOH.
3) User-defined baits in exonic regions detect SNVs and indels.
Figure: bait placement across ClinGen disease-associated regions in Gene A and Gene B.
OneSeq: tailored for your needs
Feature                            OneSeq High Resolution       OneSeq Low Resolution
CNV resolution, genome-wide        300 kb                       1 Mb
CNV resolution, ClinGen regions    25-50 kb                     1 Mb
LOH resolution                     5 Mb                         10 Mb
Region targeted by CNV backbone    12 Mb                        2.7 Mb
Sequencer recommendation           High or medium throughput    High, medium or benchtop
SNVs and indels (both versions): combine the OneSeq CNV backbone with any SureSelect exome, ClearSeq gene panel or SureSelect custom region.
Can OneSeq detect all the important CNVs?
The OneSeq 300 kb (25-50 kb resolution in ClinGen regions) and 1 Mb backbones have sufficient resolution to detect most CNVs in the ClinGen database.
CNVs in the ClinGen database (benign, likely benign, likely pathogenic and pathogenic): 4,579 CNVs; 93% are > 300 kb; 81% are > 1 Mb.
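The 93% and 81% figures are the fractions of ClinGen CNVs larger than each backbone's genome-wide resolution. The sketch below reproduces that kind of summary on hypothetical sizes. Treating "larger than the stated resolution" as "detectable" is a simplification (real detection also depends on bait count, depth and noise), and it ignores the finer 25-50 kb resolution in ClinGen regions.

```python
from typing import Dict, Iterable


def fraction_above(sizes_bp: Iterable[int], thresholds_bp: Iterable[int]) -> Dict[int, float]:
    """For each threshold, the fraction of CNVs strictly larger than it."""
    sizes = list(sizes_bp)
    if not sizes:
        raise ValueError("no CNV sizes given")
    return {t: sum(1 for s in sizes if s > t) / len(sizes) for t in thresholds_bp}


# Hypothetical CNV sizes in bp; thresholds match the two backbone resolutions.
sizes = [50_000, 400_000, 2_000_000, 5_000_000]
print(fraction_above(sizes, [300_000, 1_000_000]))
```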
OneSeq can reliably detect CNVs identified by microarrays
Detection of 8 CNVs > 150 kb with both OneSeq 300 kb and CGH+SNP 4x180K microarrays in Coriell sample NA08254:
Chromosome   Aberration type   CGH aberration size (kb)   OneSeq aberration size (kb)   OneSeq avg log2 ratio
chr13        del               12,427                     13,335                        -0.89
chr15        del               2,240                      1,667                         -0.37
chr16        del               772                        863                           -0.43
chr14        amp               987                        544                           0.54
chr6         amp               370                        372                           0.61
chr2         del               828                        307                           -0.46
chr17        amp               163                        201                           0.49
chr22        amp               172                        191                           3.00
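The log2 ratios in the table can be read against idealized values. For a diploid genome, a heterozygous deletion halves the copy number, giving log2(1/2) = -1, and a single-copy gain gives log2(3/2) ≈ 0.58. Observed averages can be smaller in magnitude, for example when only a fraction of cells carries the change. The `cell_fraction` parameter is a hypothetical knob to illustrate that effect; this sketch is not how OneSeq or its analysis software computes ratios.

```python
import math


def expected_log2_ratio(copies: int, ploidy: int = 2, cell_fraction: float = 1.0) -> float:
    """Idealized log2 copy ratio versus a normal reference of the given ploidy.

    cell_fraction is a hypothetical parameter: the share of cells carrying the
    altered copy number (1.0 = every cell, as in a constitutional CNV).
    """
    if copies < 0 or ploidy <= 0 or not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("invalid copies, ploidy or cell_fraction")
    mean_copies = cell_fraction * copies + (1.0 - cell_fraction) * ploidy
    if mean_copies == 0:
        return float("-inf")  # homozygous deletion in every cell
    return math.log2(mean_copies / ploidy)


for label, cn in (("homozygous deletion", 0), ("heterozygous deletion", 1),
                  ("normal", 2), ("single-copy gain", 3)):
    print(f"{label:22s} {expected_log2_ratio(cn):+.2f}")
```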
OneSeq can detect intergenic CNVs
Duplication upstream of SOX9.
Figure: customer-generated array data. Case published in Vetro et al., EJHG (2014), 1-8.
OneSeq can detect CNVs in non-coding regions
Figure: comparison of 4x180K catalog design microarray data and OneSeq data.
OneSeq can detect uniparental disomy
Detection of uniparental disomy of chromosome 15 in Coriell sample NA20409.
Figure: OneSeq copy number (log2 ratio) and LOH (B allele frequency) data, with a known common CNV marked; CGH+SNP microarray confirmation data.
For Research Use Only. Not for use in diagnostic procedures.
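Uniparental disomy shows up as copy-neutral loss of heterozygosity: the log2 ratio stays near 0 while the B allele frequency loses its heterozygous band near 0.5, leaving only SNPs near 0 and 1. The sketch below flags stretches with no heterozygous-looking SNPs. The 0.2 to 0.8 heterozygous band is an assumed threshold, the 5 Mb minimum span simply echoes the OneSeq High Resolution LOH figure, and the function name is hypothetical. A deletion also removes heterozygosity, so a real caller would check copy number as well, which this sketch does not.

```python
from typing import List, Optional, Tuple


def loh_runs(snps: List[Tuple[int, float]],
             het_band: Tuple[float, float] = (0.2, 0.8),
             min_span_bp: int = 5_000_000) -> List[Tuple[int, int]]:
    """Find stretches with no heterozygous-looking SNPs on one chromosome.

    snps: (position, B allele frequency) pairs sorted by position.
    Returns (first_pos, last_pos) spans of at least min_span_bp.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end: Optional[int] = None
    for pos, baf in snps:
        if het_band[0] <= baf <= het_band[1]:
            # A heterozygous SNP ends the current homozygous stretch.
            if start is not None and end is not None and end - start >= min_span_bp:
                runs.append((start, end))
            start = end = None
        else:
            if start is None:
                start = pos
            end = pos
    if start is not None and end is not None and end - start >= min_span_bp:
        runs.append((start, end))
    return runs


# Toy chromosome: one SNP per Mb, a single heterozygous SNP at 2 Mb.
toy = [(i * 1_000_000, 0.5 if i == 2 else 1.0) for i in range(10)]
print(loh_runs(toy))
```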
Data analysis is the bottleneck in the exome sequencing workflow
FASTQ → BAM → VCF (~20,000 variants) → disease-associated variant
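Going from ~20,000 VCF records to a handful of candidates is essentially a filtering problem. The sketch below is a minimal illustration, not how Alissa or any specific tool works: it keeps PASS variants in a gene list that fall below a population-frequency cutoff. The INFO keys `GENE` and `POP_AF` are hypothetical annotation fields (real pipelines add such annotations with dedicated tools, and key names vary), and the 1% cutoff is an illustrative assumption.

```python
from typing import Dict, Iterable, List, Set


def parse_info(info: str) -> Dict[str, str]:
    """Parse a VCF INFO column (key=value pairs and flags separated by ';')."""
    parsed: Dict[str, str] = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parsed[key] = value
        elif item and item != ".":
            parsed[item] = "true"  # flag-type field
    return parsed


def prioritize(vcf_lines: Iterable[str], genes: Set[str], max_pop_af: float = 0.01) -> List[str]:
    """Keep PASS variants in `genes` whose hypothetical POP_AF is at or below max_pop_af."""
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt, _qual, filt, info = line.rstrip("\n").split("\t")[:8]
        if filt != "PASS":
            continue
        ann = parse_info(info)
        gene = ann.get("GENE")  # hypothetical annotation key
        if gene not in genes:
            continue
        if float(ann.get("POP_AF", "0")) > max_pop_af:  # hypothetical key; missing = rare
            continue
        kept.append(f"{chrom}:{pos} {ref}>{alt} {gene}")
    return kept


# Toy VCF records with made-up positions and annotations.
records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\tPASS\tGENE=GJC2;POP_AF=0.0001",
    "chr1\t2000\t.\tC\tT\t50\tPASS\tGENE=GJC2;POP_AF=0.2",
    "chr1\t3000\t.\tG\tA\t5\tLowQual\tGENE=GJC2;POP_AF=0.0001",
    "chr2\t4000\t.\tT\tC\t50\tPASS\tGENE=OTHER;POP_AF=0.0001",
]
print(prioritize(records, {"GJC2"}))
```

Filtering on a disease gene list and on rarity is one simple way to shrink the candidate set; comprehensive coverage matters because a variant outside the targeted regions never reaches this step at all.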
NGS workflow needs substantial compute infrastructure
Sample prep → library prep → target enrichment → sequencing → data analysis
Data analysis is compute-intensive, time-consuming and relies on disparate tools.
Alissa software platform: from raw data to answer
Make your work flow with Agilent Alissa Clinical Informatics for NGS:
A single platform from raw reads to draft lab reports
Comprehensive QC metrics at your fingertips
A team of experts that supports you along the way
Agilent NGS solutions for rare and inherited disorders
Sample QC